Stars
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
Arena-Hard-Auto: An automatic LLM benchmark.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist
Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"
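A curated collection of open-source Chinese large language models, focusing on smaller models that can be deployed privately and trained at low cost, including base models, vertical-domain fine-tunes and applications, datasets, and tutorials.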
A Chinese NLP preprocessing and parsing toolkit: accurate, efficient, and easy to use. www.jionlp.com
Robust recipes to align language models with human and AI preferences
[ACL2023] We introduce LLM-Blender, an innovative ensembling framework to attain consistently superior performance by leveraging the diverse strengths of multiple open-source LLMs. LLM-Blender cut …
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
Towards Real-World Writing Assistance: A Chinese Character Checking Benchmark with Faked and Misspelled Characters
The repository of EMNLP 2023 "MixEdit: Revisiting Data Augmentation and Beyond for Grammatical Error Correction"
[AAAI 2024] MESED: A Multi-modal Entity Set Expansion Dataset with Fine-grained Semantic Classes and Hard Negative Entities
Automatically split your PyTorch models on multiple GPUs for training & inference
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
[ICLR 2021] Contrastive Learning with Adversarial Perturbations for Conditional Text Generation
A high-throughput and memory-efficient inference and serving engine for LLMs
Code for ACL 2024 paper "TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space"
A natural language interface for computers
Reference implementation for DPO (Direct Preference Optimization)
Dataset and evaluation script for "Evaluating Hallucinations in Chinese Large Language Models"
fanqiwan / KCA
Forked from 18907305772/KCA
Knowledge Verification to Nip Hallucination in the Bud
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
A framework for few-shot evaluation of language models.
A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation"