Utility functions
load_yaml
load_yaml (config_path)
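The signature suggests that load_yaml reads the YAML file at config_path and returns its parsed contents. The following is a minimal sketch of that behavior, assuming PyYAML as the parser and a plain dict as the return value; neither is stated in this reference. load_yaml_sketch is a hypothetical stand-in, not the library's implementation.

```python
import tempfile
from pathlib import Path

import yaml  # PyYAML; assumed to be the parser, not confirmed by this reference


def load_yaml_sketch(config_path):
    """Hypothetical stand-in for load_yaml: parse a YAML file into Python objects."""
    with open(config_path, "r", encoding="utf-8") as f:
        # safe_load only builds plain Python types (dict, list, str, numbers, ...)
        return yaml.safe_load(f)


# Usage: write a small config to a temporary file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.yaml"
    path.write_text(
        "optimizer:\n  lr: 0.0001\ntrainer:\n  epochs: 20\n",
        encoding="utf-8",
    )
    config = load_yaml_sketch(path)
```

The keys in the temporary file are illustrative only; the layout the library expects is not documented here.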
RLHFConfig
RLHFConfig (epsilon:float=0.1, ent_coef:float=0.01, vf_coef:float=0.1)
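As a rough illustration of how a config with these defaults can be constructed and overridden, the sketch below mirrors the signature with a standard-library dataclass. RLHFConfigSketch is a hypothetical name, and the meaning given to each field in the comments is an assumption based on common PPO terminology, not something this reference states.

```python
from dataclasses import dataclass


@dataclass
class RLHFConfigSketch:
    """Hypothetical mirror of the RLHFConfig signature above."""

    epsilon: float = 0.1    # assumed: PPO clipping range
    ent_coef: float = 0.01  # assumed: weight of the entropy term
    vf_coef: float = 0.1    # assumed: weight of the value-function loss


# Use the defaults as listed in the signature, or override individual fields.
default_cfg = RLHFConfigSketch()
custom_cfg = RLHFConfigSketch(epsilon=0.2)
```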
Reference Model
create_reference_model
create_reference_model (model)
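In RLHF setups the reference model is commonly a frozen copy of the policy, used to measure how far the policy has drifted during training. The sketch below shows that pattern with PyTorch. It assumes model is a torch.nn.Module and that freezing a deep copy is what is wanted; this reference does not describe the function body, so create_reference_model_sketch is a hypothetical stand-in rather than the library's code.

```python
import copy

import torch
from torch import nn


def create_reference_model_sketch(model: nn.Module) -> nn.Module:
    """Hypothetical stand-in: return a frozen deep copy of `model`."""
    reference = copy.deepcopy(model)
    for param in reference.parameters():
        param.requires_grad = False  # the reference is never updated
    return reference.eval()  # switch off dropout and other training-only behavior


policy = nn.Linear(4, 2)  # tiny placeholder model for illustration
reference = create_reference_model_sketch(policy)

# Changing the policy's weights leaves the reference untouched.
with torch.no_grad():
    policy.weight.add_(1.0)
```

A deep copy is used so that the reference owns its own parameter tensors; sharing tensors with the policy would let optimizer steps change both models.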
Configs
ModelConfig
ModelConfig (model_path:str)
TokenizerConfig
TokenizerConfig (tokenizer_path:str)
OptimizerConfig
OptimizerConfig (lr:float=0.0001, eps:float=1e-08, weight_decay:float=1e-06)
TrainerConfig
TrainerConfig (epochs:int=20)
PPOConfig
PPOConfig (ent_coef:float=0.01, vf_coef:float=0.5)
InstructConfig
InstructConfig (model:ModelConfig, tokenizer:TokenizerConfig)
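InstructConfig nests two of the configs listed above. The sketch below shows one way such a nested config could be assembled from a parsed dictionary, for example the result of reading a YAML file. The *Sketch classes are hypothetical dataclass mirrors of the signatures, and the dictionary layout and the "gpt2" path are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class ModelConfigSketch:
    model_path: str


@dataclass
class TokenizerConfigSketch:
    tokenizer_path: str


@dataclass
class InstructConfigSketch:
    model: ModelConfigSketch
    tokenizer: TokenizerConfigSketch


# A dict shaped like what a config file might parse to (illustrative values).
raw = {
    "model": {"model_path": "gpt2"},
    "tokenizer": {"tokenizer_path": "gpt2"},
}

# Build the inner configs first, then compose them into the outer one.
instruct_cfg = InstructConfigSketch(
    model=ModelConfigSketch(**raw["model"]),
    tokenizer=TokenizerConfigSketch(**raw["tokenizer"]),
)
```

Keeping the model and tokenizer paths in separate sub-configs lets each be changed independently.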