Starred repositories
SGLang is yet another fast serving framework for large language models and vision language models.
Materials for the Hugging Face Diffusion Models Course
A curated list of reinforcement learning with human feedback resources (continually updated)
Custom data types and layouts for training and inference
Self-Explore to Avoid the Pit! Improving the Reasoning Capabilities of Language Models with Fine-grained Rewards
A Native-PyTorch Library for LLM Fine-tuning
Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM
Firefly: a training toolkit for large language models, supporting Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other models
A framework for few-shot evaluation of language models.
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
Train transformer language models with reinforcement learning.
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) over 100+ datasets.
Advantage Leftover Lunch Reinforcement Learning (A-LoL RL): Improving Language Models with Advantage-based Offline Policy Gradients
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
The official implementation of Self-Play Fine-Tuning (SPIN)
A Toolkit for Distributional Control of Generative Models
A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).
Reference implementation for DPO (Direct Preference Optimization)
[Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
A framework for prompt tuning using Intent-based Prompt Calibration
Scenic: A Jax Library for Computer Vision Research and Beyond
Mixture-of-Experts for Large Vision-Language Models
Efficient implementations of state-of-the-art linear attention models in Pytorch and Triton
Implementation of various self-attention mechanisms focused on computer vision. Ongoing repository.
Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI
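Several of the repositories above (the DPO reference implementation, the HALOs library, TRL) revolve around the Direct Preference Optimization objective. As a minimal illustrative sketch (not taken from any of those repos), the DPO loss for a single preference pair can be computed from scalar log-probabilities of the chosen and rejected responses under the policy and a frozen reference model:

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair: -log sigmoid(beta * margin),
    where the margin is how much more the policy prefers the chosen
    response over the rejected one, relative to the reference model.
    Scalar inputs are a simplifying assumption for illustration; real
    implementations operate on batched per-token log-prob sums."""
    margin = (policy_chosen_logp - policy_rejected_logp) \
           - (ref_chosen_logp - ref_rejected_logp)
    x = beta * margin
    # Numerically stable -log(sigmoid(x)), i.e. softplus(-x).
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))
```

When the policy and reference agree (zero margin), the loss is log 2; it shrinks as the policy widens its preference for the chosen response beyond the reference's. The `beta` default of 0.1 mirrors a common choice but is a tunable temperature.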