
Update: GRPO scripts with verl

See verl-grpo.sh and our tech report.

Training scripts with 360-LLaMA-Factory

Usage:

  1. Follow the installation instructions for 360-LLaMA-Factory.

  2. Place the training script (e.g. train-dpo.sh) in the root directory of your git-cloned 360-LLaMA-Factory (the same level as 360-example.sh).

  3. Register your dataset (e.g. Light-R1-DPO) in dataset_info.json:

  "light-r1-dpo": {
    "file_name": "/path/to/dpo-pairs.json",
    "ranking": true,
    "formatting": "sharegpt",
    "columns": {
      "messages": "conversations",
      "chosen": "chosen",
      "rejected": "rejected"
    }
  },
  4. Fill in the missing arguments in train-dpo.sh, then run sh train-dpo.sh.
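
Step 3 can also be scripted rather than edited by hand. The following is a minimal sketch, not part of this repository: the helper names `register_dpo_dataset` and `missing_columns` are hypothetical, and the location of dataset_info.json (data/dataset_info.json in upstream LLaMA-Factory) is an assumption that should be checked against your checkout. It writes the same entry shown above and offers a quick check that a record carries the three columns that entry refers to.

```python
import json
from pathlib import Path


def register_dpo_dataset(info_path, name, data_file):
    """Add (or overwrite) a ranking-dataset entry in dataset_info.json.

    Hypothetical helper: it only mirrors the entry shown in step 3.
    """
    info_path = Path(info_path)
    # Start from the existing registry if there is one, else from an empty one.
    if info_path.exists():
        info = json.loads(info_path.read_text(encoding="utf-8"))
    else:
        info = {}
    info[name] = {
        "file_name": str(data_file),
        "ranking": True,
        "formatting": "sharegpt",
        "columns": {
            "messages": "conversations",
            "chosen": "chosen",
            "rejected": "rejected",
        },
    }
    info_path.write_text(
        json.dumps(info, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )
    return info[name]


def missing_columns(record, columns=("conversations", "chosen", "rejected")):
    """Return the column names that a single DPO record lacks."""
    return [c for c in columns if c not in record]
```

For example, `register_dpo_dataset("data/dataset_info.json", "light-r1-dpo", "/path/to/dpo-pairs.json")` would register the dataset under the name used above, and running `missing_columns` over the first few records of the pairs file is a cheap sanity check before starting a training run.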