Dataset

Datasets for the RL-based language model and the reward model

Dataset for Reward Model



PairDataset

 PairDataset (dataset:Iterable, tokenizer:Callable, max_length:int=1024)

Pairwise dataset for training the reward model.

| Parameter | Type | Default | Details |
|---|---|---|---|
| dataset | typing.Iterable | | A dataset |
| tokenizer | typing.Callable | | The tokenizer of the reward model |
| max_length | int | 1024 | Max context length of the reward model |
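
To make the idea of a pairwise dataset concrete, here is a minimal sketch of how such a class might be organized. It is not the library's implementation: `PairDatasetSketch`, `toy_tokenizer`, and the `"chosen"`/`"rejected"` field names are all assumptions made for illustration, and the toy tokenizer only stands in for a real one.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Encoding = Dict[str, List[int]]


def toy_tokenizer(text: str, max_length: int) -> Encoding:
    """Hypothetical stand-in for a real tokenizer.

    Maps each whitespace-separated word to its length, truncates to
    max_length, and right-pads with zeros.
    """
    ids = [len(word) for word in text.split()][:max_length]
    pad = max_length - len(ids)
    return {
        "input_ids": ids + [0] * pad,
        "attention_mask": [1] * len(ids) + [0] * pad,
    }


class PairDatasetSketch:
    """Illustrative pairwise dataset: one (chosen, rejected) pair per row."""

    def __init__(self, dataset: Iterable, tokenizer: Callable, max_length: int = 1024):
        self.chosen: List[Encoding] = []
        self.rejected: List[Encoding] = []
        for row in dataset:
            # Assumed row layout: a preferred and a dispreferred response.
            self.chosen.append(tokenizer(row["chosen"], max_length))
            self.rejected.append(tokenizer(row["rejected"], max_length))

    def __len__(self) -> int:
        return len(self.chosen)

    def __getitem__(self, idx: int) -> Tuple[Encoding, Encoding]:
        return self.chosen[idx], self.rejected[idx]


rows = [
    {"chosen": "Sure, here is a clear answer.", "rejected": "No."},
    {"chosen": "Happy to help with that.", "rejected": "Figure it out."},
]
pairs = PairDatasetSketch(rows, toy_tokenizer, max_length=8)
chosen, rejected = pairs[0]
```

Keeping the chosen and rejected encodings at the same index lets a reward-model training loop score both responses and compare them directly.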

Dataset for PPO Agent



PromptDataset

 PromptDataset (dataset:Iterable, tokenizer:Callable, max_length:int=1024)

Dataset for training the RL-based language model.

| Parameter | Type | Default | Details |
|---|---|---|---|
| dataset | typing.Iterable | | A dataset |
| tokenizer | typing.Callable | | The tokenizer of the language model |
| max_length | int | 1024 | Max context length of the language model |
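
A prompt-only dataset can be sketched in the same spirit. The example below is an illustration under stated assumptions, not the library's code: `PromptDatasetSketch`, `toy_tokenizer`, and the `"prompt"` field name are hypothetical, and the toy tokenizer merely imitates truncation and padding to `max_length`.

```python
from typing import Callable, Dict, Iterable, List

Encoding = Dict[str, List[int]]


def toy_tokenizer(text: str, max_length: int) -> Encoding:
    """Hypothetical stand-in for a real tokenizer.

    Maps each whitespace-separated word to its length, truncates to
    max_length, and right-pads with zeros.
    """
    ids = [len(word) for word in text.split()][:max_length]
    pad = max_length - len(ids)
    return {
        "input_ids": ids + [0] * pad,
        "attention_mask": [1] * len(ids) + [0] * pad,
    }


class PromptDatasetSketch:
    """Illustrative prompt dataset: one tokenized prompt per row."""

    def __init__(self, dataset: Iterable, tokenizer: Callable, max_length: int = 1024):
        # Assumed row layout: each row carries a single "prompt" string.
        self.prompts: List[Encoding] = [
            tokenizer(row["prompt"], max_length) for row in dataset
        ]

    def __len__(self) -> int:
        return len(self.prompts)

    def __getitem__(self, idx: int) -> Encoding:
        return self.prompts[idx]


rows = [
    {"prompt": "Explain what a reward model is."},
    {"prompt": "Hi"},
]
prompts = PromptDatasetSketch(rows, toy_tokenizer, max_length=4)
long_prompt, short_prompt = prompts[0], prompts[1]
```

Here the first prompt has six words and is truncated to four ids, while the second is padded up to four, so every item has the same length regardless of the input text.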