rlaif

Here are 7 public repositories matching this topic...

argilla-io / distilabel

⚗️ distilabel is a framework for synthetic data and AI feedback for AI engineers who require high-quality outputs, full data ownership, and overall efficiency.

python ai openai synthetic-data synthetic-dataset-generation huggingface llms rlhf rlaif

Updated Aug 9, 2024
Python

mengdi-li / awesome-RLAIF

A continually updated list of literature on Reinforcement Learning from AI Feedback (RLAIF)

alignment rl llms rlhf rlaif

Updated Jun 27, 2024

vicgalle / zero-shot-reward-models

ZYN: Zero-Shot Reward Models with Yes-No Questions

reinforcement-learning zero-shot llm rlhf reward-models trlx rlaif

Updated Aug 15, 2023
Python

holarissun / Prompt-OIRL

Code for the paper "Query-Dependent Prompt Evaluation and Optimization with Offline Inverse Reinforcement Learning"

inverse-reinforcement-learning irl offline-rl large-language-models llm prompt-engineering rlhf rlaif offline-irl

Updated Mar 20, 2024
Python

vicgalle / distilled-self-critique

Distilled Self-Critique refines the outputs of an LLM using only synthetic data

synthetic-data llm rlaif self-critique

Updated Apr 11, 2024
Jupyter Notebook

vicgalle / awesome-rlaif

A curated and updated list of relevant articles and repositories on Reinforcement Learning from AI Feedback (RLAIF)

awesome research language-model llm rlhf rlaif

Updated Jan 24, 2024

zhaochen0110 / Timo

Timo: Towards Better Temporal Reasoning for Language Models (COLM 2024)

temporal-reasoning sota-model llms rlhf rlaif llm-as-a-judge llm-as-evaluator self-critic-framework

Updated Jul 3, 2024
Python
