
Comparing changes

base repository: InternLM/xtuner
base: v0.1.20
head repository: InternLM/xtuner
compare: v0.1.21
  • 5 commits
  • 41 files changed
  • 3 contributors

Commits on Jun 13, 2024

  1. [Feature] Support DPO, ORPO and Reward Model (#743)

    * Support reward model and dpo
    
    * support train reward model
    
    * fix config
    
    * fix lint
    
    * fix lint
    
    * support jsonl dataset
    
    * feat: support ORPO
    
    * reorg configs
    
    * rename collate function
    
    * rename collate function
    
    * use varlen attention in validation
    
    * fix lint
    
    * fix lint
    
    * rebase main
    
    * update
    
    * add reference and update dpo loss
    
    * inherit sft
    
    * fix broadcast
    
    * fix nan loss skip
    
    * support reward model sp
    
    * support dpo sp
    
    * support orpo sp
    
    * fix bugs
    
    * fix rebase
    
    * convert script
    
    * fix precommit
    
    * mv convert script to model
    
    * fix version check
    
    * fix import
    
    * add comments of reward token
    
    * fix orpo cfg
    
    * fix lint
    
    * fix lint
    
    * remove seed
    
    * remove seed
    
    * add sp config
    
    * add reward sp config
    
    * fix convert
    
    * fix lora reward model convert
    
    * fix qlora reward merge
    
    * update dpo loss
    
    * log reward acc and margin in dpo
    
    * update logits mask
    
    * unpack logits first
    
    * more loss setting in dpo cfgs
    
    * more loss setting in orpo cfgs
    
    ---------
    
    Co-authored-by: HIT-cwh <[email protected]>
    RangiLyu and HIT-cwh committed Jun 13, 2024
    SHA: a607fa3
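
For reference, DPO and ORPO are published objectives (Rafailov et al., 2023; Hong et al., 2024), so the standard formulations are worth restating next to this commit. With policy \pi_\theta, a frozen reference \pi_{\mathrm{ref}}, and a preference pair (y_w, y_l) for prompt x:

    \mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}
        \Bigl[ \log \sigma \Bigl( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
                                - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \Bigr) \Bigr]

    \mathcal{L}_{\mathrm{ORPO}} = \mathcal{L}_{\mathrm{SFT}}
        + \lambda \Bigl( -\log \sigma \Bigl( \log \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)} \Bigr) \Bigr),
    \qquad \mathrm{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}

The "reward acc and margin" bullets presumably refer to the implied DPO rewards \hat{r} = \beta \log(\pi_\theta / \pi_{\mathrm{ref}}): accuracy is the fraction of pairs with \hat{r}_w > \hat{r}_l, margin their mean gap. The exact loss variants behind the "more loss setting" config bullets are not spelled out in the commit message.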

Commits on Jun 17, 2024

  1. [Bugs] fix dispatch bugs (#775)

    fix dispatch bugs
    HIT-cwh committed Jun 17, 2024
    SHA: c2328a0
  2. [Bugs] Fix HFCheckpointHook bugs when training deepseekv2 and mixtral without shard moe (#774)
    
    fix HFCheckpointHook bugs when training deepseekv2 and mixtral without shard moe
    HIT-cwh committed Jun 17, 2024
    SHA: bddf85d
  3. [Feature] Support the scenario where sp size is not divisible by attn head num (#769)
    
    * Support the scenario where sp size is not divisible by attn head num
    
    * refactor attention.py
    
    * do not have to set sp_inner_size in config
    
    * rename
    
    * fix lint
    HIT-cwh committed Jun 17, 2024
    SHA: 7646e7b (a sketch of the head-divisibility workaround follows this commit list)
  4. bump version to 0.1.21 (#776)

    HIT-cwh committed Jun 17, 2024
    SHA: 6bbc274
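
On commit #769: Ulysses-style sequence parallelism all-to-alls a [seq_len / sp_size, num_heads] activation shard into [seq_len, num_heads / sp_size] before attention, which only works when num_heads is divisible by sp_size. The commit lifts that restriction (its bullets mention an sp_inner_size that no longer has to be set in the config). The snippet below is a hypothetical sketch of the divisibility workaround, not xtuner's actual implementation: scatter heads over the largest factor of sp_size that divides num_heads, and keep the sequence split over the remaining ranks.

    import math

    def split_sp_group(num_heads: int, sp_size: int) -> tuple[int, int]:
        """Factor the sequence-parallel group (hypothetical illustration).

        head_sp ranks receive whole attention heads via all-to-all;
        inner_sp ranks keep a sequence split during attention, so the
        head count never has to be divisible by the full sp_size.
        """
        head_sp = math.gcd(num_heads, sp_size)  # ranks that scatter heads
        inner_sp = sp_size // head_sp           # ranks that keep sequence sharded
        assert head_sp * inner_sp == sp_size and num_heads % head_sp == 0
        return head_sp, inner_sp

    print(split_sp_group(num_heads=8, sp_size=16))  # (8, 2): 1 head/rank, seq split by 2
    print(split_sp_group(num_heads=32, sp_size=8))  # (8, 1): plain Ulysses, 4 heads/rank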