Popular repositories
-
RLHF-Reward-Modeling (Public, forked from RLHFlow/RLHF-Reward-Modeling)
Recipes for training reward models for RLHF.
Python
-
ToRA (Public, forked from WeiXiongUST/ToRA)
ToRA is a series of Tool-integrated Reasoning LLM Agents designed to solve challenging mathematical reasoning problems by interacting with tools [ICLR'24].
Python
-
RAFT (Public, forked from RLHFlow/RAFT)
An official implementation of the Reward rAnked Fine-Tuning (RAFT) algorithm, also known as iterative best-of-n fine-tuning or rejection sampling fine-tuning.
Python