rlhf

Here are 159 public repositories matching this topic...

LAION-AI / Open-Assistant

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.

python machine-learning ai nextjs discord-bot assistant language-model chatgpt rlhf

Updated Aug 17, 2024
Python

hiyouga / LLaMA-Factory

Star

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

Updated Oct 30, 2024
Python

RUCAIBox / LLMSurvey

Star

The official GitHub page for the survey paper "A Survey of Large Language Models".

natural-language-processing pre-training pre-trained-language-models in-context-learning large-language-models llm llms chain-of-thought chatgpt rlhf instruction-tuning

Updated Aug 20, 2024
Python

ymcui / Chinese-LLaMA-Alpaca-2

Star

中文LLaMA-2 & Alpaca-2大模型二期项目 + 64K超长上下文模型 (Chinese LLaMA-2 & Alpaca-2 LLMs with 64K long context models)

nlp yarn llama alpaca 64k large-language-models llm rlhf flash-attention llama2 llama-2 alpaca-2 alpaca2

Updated Sep 23, 2024
Python

InternLM / InternLM

Star

Official release of InternLM2.5 base and chat models. 1M context support

chatbot chinese gpt pretrained-models llm long-context rlhf large-language-model flash-attention fine-tuning-llm

Updated Oct 10, 2024
Python

huggingface / alignment-handbook

Star

Robust recipes to align language models with human and AI preferences

transformers llm rlhf

Updated Oct 7, 2024
Python

argilla-io / argilla

Star

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets

nlp machine-learning natural-language-processing ai weak-supervision developer-tools active-learning annotation-tool text-annotation weakly-supervised-learning human-in-the-loop mlops text-labeling gpt-4 llm langchain rlhf

Updated Oct 31, 2024
Python

hiyouga / ChatGLM-Efficient-Tuning

Star

Fine-tuning ChatGLM-6B with PEFT | 基于 PEFT 的高效 ChatGLM 微调

transformers pytorch lora language-model alpaca fine-tuning peft huggingface chatgpt rlhf chatglm qlora chatglm2

Updated Oct 12, 2023
Python

opendilab / awesome-RLHF

Star

A curated list of reinforcement learning with human feedback resources (continually updated)

reinforcement-learning deep-learning deep-reinforcement-learning large-language-models human-feedback rlhf

Updated Oct 28, 2024

Docta-ai / docta

Star

A Doctor for your data

data language-model data-curation data-centric-ai data-diagnosis data-centric-machine-learning rlhf

Updated Aug 7, 2024
Python

argilla-io / distilabel

Star

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

python ai openai synthetic-data synthetic-dataset-generation huggingface llms rlhf rlaif

Updated Oct 29, 2024
Python

THUDM / WebGLM

Star

WebGLM: An Efficient Web-enhanced Question Answering System (KDD 2023)

llm chatgpt rlhf webglm

Updated Jul 29, 2023
Python

tatsu-lab / alpaca_eval

Star

An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.

nlp deep-learning leaderboard evaluation instruction-following foundation-models large-language-models rlhf

Updated Oct 23, 2024
Jupyter Notebook

PKU-Alignment / safe-rlhf

Star

Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback

Updated Jun 13, 2024
Python

THUDM / ImageReward

Star

[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation

generative-model diffusion-models human-preferences rlhf

Updated Oct 3, 2024
Python

xtreme1-io / xtreme1

Star

Xtreme1 is an all-in-one data labeling and annotation platform for multimodal data training and supports 3D LiDAR point cloud, image, and LLM.

computer-vision image-annotation annotation point-cloud image-classification annotation-tool 3d-annotation labeling-tool multimodal lidar-object-tracking image-labelling-tool lidar-object-detection lidar-camera-fusion lidar-annotation rlhf

Updated Oct 8, 2024
TypeScript

RLHFlow / RLHF-Reward-Modeling

Star

Recipes to train reward model for RLHF.

llm rlhf reward-models llama3

Updated Sep 23, 2024
Python

ContextualAI / HALOs

Star

A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).

alignment ppo halos dpo kto rlhf

Updated Oct 26, 2024
Python

princeton-nlp / SimPO

Star

[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward

alignment large-language-models rlhf preference-alignment

Updated Oct 29, 2024
Python

GaryYufei / AlignLLMHumanSurvey

Star

Aligning Large Language Models with Human: A Survey

awesome survey llama gpt-4 large-language-models llms chatgpt rlhf supervised-finetuning llama2 chinese-llama

Updated Sep 11, 2023

Improve this page

Add a description, image, and links to the rlhf topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the rlhf topic, visit your repo's landing page and select "manage topics."

Learn more

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rlhf

Here are 159 public repositories matching this topic...

LAION-AI / Open-Assistant

hiyouga / LLaMA-Factory

RUCAIBox / LLMSurvey

ymcui / Chinese-LLaMA-Alpaca-2

InternLM / InternLM

huggingface / alignment-handbook

argilla-io / argilla

hiyouga / ChatGLM-Efficient-Tuning

opendilab / awesome-RLHF

Docta-ai / docta

argilla-io / distilabel

THUDM / WebGLM

tatsu-lab / alpaca_eval

PKU-Alignment / safe-rlhf

THUDM / ImageReward

xtreme1-io / xtreme1

RLHFlow / RLHF-Reward-Modeling

ContextualAI / HALOs

princeton-nlp / SimPO

GaryYufei / AlignLLMHumanSurvey

Improve this page

Add this topic to your repo