Update: GRPO scripts with verl
see verl-grpo.sh and our tech report.
Training scripts with 360-LLaMA-Factory
Usage:
-
follow installation of 360-LLaMA-Factory
-
place e.g.
train-dpo.shin your git-cloned 360-LLaMA-Factory's root directory (same hierarchy as 360-example.sh) -
register your dataset (e.g. Light-R1-DPO) in dataset_info.json
"light-r1-dpo": {
"file_name": "/path/to/dpo-pairs.json",
"ranking": true,
"formatting": "sharegpt",
"columns": {
"messages": "conversations",
"chosen": "chosen",
"rejected": "rejected"
}
},- fill in the missing arguments in
train-dpo.shandsh train-dpo.sh