WayXG

1 follower · 2 following

Popular repositories

Online-RLHF Public

Forked from RLHFlow/Online-RLHF

A recipe for online RLHF.

Python

RLHF-Reward-Modeling Public

Forked from RLHFlow/RLHF-Reward-Modeling

Recipes for training reward models for RLHF.

Python

ToRA Public

Forked from WeiXiongUST/ToRA

ToRA is a series of Tool-integrated Reasoning LLM Agents designed to solve challenging mathematical reasoning problems by interacting with tools [ICLR'24].

Python

RLHF4MATH_Dev Public

Python

preference-construction Public

Python

RAFT Public

Forked from RLHFlow/RAFT

This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or rejection sampling fine-tuning.

Python