Dataset
Datasets for the RL-based language model and the reward model
Dataset for Reward Model
PairDataset
PairDataset (dataset:Iterable, tokenizer:Callable, max_length:int=1024)
Pairwise dataset for training the reward model.
Parameter | Type | Default | Details
---|---|---|---
dataset | typing.Iterable | | A dataset
tokenizer | typing.Callable | | The tokenizer of the reward model
max_length | int | 1024 | Maximum context length of the reward model
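A minimal sketch of what a pairwise dataset like this typically holds, assuming each record is a dict with `"chosen"` and `"rejected"` text fields (a common layout for preference data, but an assumption here, not a documented requirement of `PairDataset`). `PairDatasetSketch`, `encode`, and `toy_tokenizer` are hypothetical names, the sketch uses plain Python lists instead of tensors, and it is not the library's implementation. It shows the role of `max_length`: both completions are truncated and padded to the same fixed length.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def toy_tokenizer(text: str) -> List[int]:
    """Hypothetical stand-in tokenizer: one integer id (>= 1) per word.
    A real reward model would use its own tokenizer."""
    return [sum(map(ord, word)) % 997 + 1 for word in text.split()]


def encode(text: str, tokenizer: Callable[[str], List[int]],
           max_length: int, pad_id: int = 0) -> Tuple[List[int], List[int]]:
    """Truncate to `max_length`, then right-pad.
    The mask is 1 for real tokens and 0 for padding."""
    ids = tokenizer(text)[:max_length]
    n_pad = max_length - len(ids)
    return ids + [pad_id] * n_pad, [1] * len(ids) + [0] * n_pad


class PairDatasetSketch:
    """Hypothetical pairwise dataset: each item is
    (chosen_ids, chosen_mask, rejected_ids, rejected_mask)."""

    def __init__(self, dataset: Iterable[Dict[str, str]],
                 tokenizer: Callable[[str], List[int]], max_length: int = 1024):
        self.pairs = []
        for record in dataset:
            # Assumed schema: "chosen" is the preferred completion,
            # "rejected" the dispreferred one.
            chosen_ids, chosen_mask = encode(record["chosen"], tokenizer, max_length)
            rejected_ids, rejected_mask = encode(record["rejected"], tokenizer, max_length)
            self.pairs.append((chosen_ids, chosen_mask, rejected_ids, rejected_mask))

    def __len__(self) -> int:
        return len(self.pairs)

    def __getitem__(self, idx: int):
        return self.pairs[idx]


records = [{"chosen": "Paris is the capital of France.", "rejected": "I don't know."}]
ds = PairDatasetSketch(records, toy_tokenizer, max_length=8)
chosen_ids, chosen_mask, rejected_ids, rejected_mask = ds[0]
```

Padding both completions to the same length lets a batching utility stack them into fixed-shape arrays, and the masks let the model ignore the padded positions.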
Dataset for PPO Agent
PromptDataset
PromptDataset (dataset:Iterable, tokenizer:Callable, max_length:int=1024)
Dataset for training the RL-based language model.
Parameter | Type | Default | Details
---|---|---|---
dataset | typing.Iterable | | A dataset
tokenizer | typing.Callable | | The tokenizer of the language model
max_length | int | 1024 | Maximum context length of the language model
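A minimal sketch of a prompt-only dataset for PPO rollouts, assuming `dataset` yields raw prompt strings (an assumption; `PromptDataset` may expect a different record layout). `PromptDatasetSketch` and `toy_tokenizer` are hypothetical names, and this is not the library's implementation. One design choice is shown explicitly: prompts are left-padded, a common convention for batched generation with decoder-only models, so every prompt in a batch ends at the same position where generation begins.

```python
from typing import Callable, Iterable, List, Tuple


def toy_tokenizer(text: str) -> List[int]:
    """Hypothetical stand-in tokenizer: one integer id (>= 1) per word."""
    return [sum(map(ord, word)) % 997 + 1 for word in text.split()]


class PromptDatasetSketch:
    """Hypothetical prompt-only dataset: each item is (input_ids, attention_mask),
    left-padded to `max_length` (assumed >= 1)."""

    def __init__(self, dataset: Iterable[str], tokenizer: Callable[[str], List[int]],
                 max_length: int = 1024, pad_id: int = 0):
        self.items: List[Tuple[List[int], List[int]]] = []
        for prompt in dataset:
            # Keep the last `max_length` tokens so the end of the prompt,
            # where generation will continue, is never cut off.
            ids = tokenizer(prompt)[-max_length:]
            n_pad = max_length - len(ids)
            # Left-pad: padding goes before the prompt tokens.
            input_ids = [pad_id] * n_pad + ids
            attention_mask = [0] * n_pad + [1] * len(ids)
            self.items.append((input_ids, attention_mask))

    def __len__(self) -> int:
        return len(self.items)

    def __getitem__(self, idx: int) -> Tuple[List[int], List[int]]:
        return self.items[idx]


prompts = ["Write a poem about geese", "Hi"]
ds = PromptDatasetSketch(prompts, toy_tokenizer, max_length=6)
input_ids, attention_mask = ds[0]
```

Unlike the reward-model pairs, these items carry no completions: the policy generates the responses during PPO, and those responses are what get scored.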