Stars
Recipes to train reward models for RLHF.
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Example models using DeepSpeed
A collection of awesome-prompt-datasets, awesome-instruction-dataset, to train ChatLLMs such as ChatGPT. Gathers a wide variety of instruction datasets for training ChatLLM models.
Code for "Learning to summarize from human feedback"
A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the ChatGLM architecture. Basically Ch…
Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM
Train transformer language models with reinforcement learning.
Collection of links, tutorials, and best practices on how to collect data and build an end-to-end RLHF system to fine-tune generative AI models
Code and documentation to train Stanford's Alpaca models, and generate the data.
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
A curated list of reinforcement learning with human feedback resources (continually updated)
A simple and well-styled PPO implementation. Based on my Medium series: https://medium.com/@eyyu/coding-ppo-from-scratch-with-pytorch-part-1-4-613dfc1b14c8
Simple reinforcement learning tutorials, 莫烦Python, Chinese-language AI teaching
Application for managing bookshelves on the Project Gutenberg site.
Extract data from a wide range of Internet sources into a pandas DataFrame.
Python module to get real-time stock data from the Google Finance API
For trading. Please star.
Gathers machine learning and deep learning models for Stock forecasting including trading bots and simulations