Starred repositories
Fcitx5 input method framework and engines ported to Android
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.
An automated pipeline for evaluating LLMs for role-playing.
A natural language interface for computers
A Comprehensive Toolkit for High-Quality PDF Content Extraction
[NeurIPS 2023] We use large language models as a commonsense world model and heuristic policy within Monte Carlo Tree Search, enabling better-reasoned decision-making for daily task planning problems.
Zotero is a free, easy-to-use tool to help you collect, organize, annotate, cite, and share your research sources.
Ongoing research training transformer models at scale
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
RewardBench: the first evaluation tool for reward models.
A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization (a toy training sketch follows this list).
A unified evaluation framework for large language models
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
A prototype repo for hybrid training combining pipeline parallelism with distributed data parallelism, with comments on the core code snippets. Feel free to copy the code and start discussions about the problems you have.
nanobind: tiny and efficient C++/Python bindings
Reference implementation for DPO (Direct Preference Optimization) (a sketch of the loss function appears after this list).
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)
LlamaIndex is a data framework for your LLM applications
BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).
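As context for the BPE entry above, here is a toy sketch of byte-level BPE training in the spirit of such minimal implementations; the helper names (get_pair_counts, merge, train_bpe) are illustrative assumptions, not that repository's API.

    from collections import Counter

    def get_pair_counts(ids):
        # Count occurrences of each adjacent pair of token ids.
        return Counter(zip(ids, ids[1:]))

    def merge(ids, pair, new_id):
        # Replace every occurrence of `pair` in `ids` with `new_id`.
        out, i = [], 0
        while i < len(ids):
            if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
                out.append(new_id)
                i += 2
            else:
                out.append(ids[i])
                i += 1
        return out

    def train_bpe(text, num_merges):
        # Start from raw UTF-8 bytes (ids 0..255) and learn merges greedily.
        ids = list(text.encode("utf-8"))
        merges = {}
        for new_id in range(256, 256 + num_merges):
            counts = get_pair_counts(ids)
            if not counts:
                break
            pair = counts.most_common(1)[0][0]  # most frequent adjacent pair
            merges[pair] = new_id
            ids = merge(ids, pair, new_id)
        return merges

    # Example: learn a few merges on a tiny corpus.
    print(train_bpe("low lower lowest", 3))

Each merge step replaces the currently most frequent adjacent pair of ids with a fresh id; that greedy loop is the entirety of BPE training.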
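Similarly, for the DPO entry: the published DPO objective reduces to a logistic loss over implicit reward margins against a frozen reference model. Below is a minimal PyTorch sketch of that loss, assuming the summed per-response log-probabilities have already been computed; the function and argument names are illustrative, not the reference implementation's API.

    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps, beta=0.1):
        # Implicit rewards are beta-scaled log-ratios against the frozen reference.
        chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
        rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
        # Logistic loss on the reward margin: push chosen above rejected.
        return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

    # Toy usage with made-up log-probabilities for a batch of two preference pairs.
    loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-14.0, -11.0]),
                    torch.tensor([-12.5, -10.0]), torch.tensor([-13.5, -10.5]))
    print(loss.item())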