CurryxIaoHu
  • Shandong University

Starred repositories

[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning

Jupyter Notebook 328 26 Updated Jun 29, 2024

Rewarded soups official implementation

HTML 43 4 Updated Sep 27, 2023
Python 8 1 Updated Jul 16, 2024

[ACL'2024] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization

Python 41 3 Updated Aug 20, 2024

A lightweight library for large language model (LLM) jailbreaking defense.

Python 26 3 Updated Aug 16, 2024

Official Repository for "Tamper-Resistant Safeguards for Open-Weight LLMs"

Python 32 3 Updated Aug 21, 2024

Official repository for "Concept-based Interpretation Without Linear Assumption", published in ICLR 2023.

Python 1 Updated Feb 5, 2024
Jupyter Notebook 7 1 Updated Sep 3, 2024

Inference-Time Intervention: Eliciting Truthful Answers from a Language Model

Python 426 33 Updated Aug 25, 2024
Python 190 31 Updated Feb 5, 2024
Jupyter Notebook 16 1 Updated Aug 23, 2023
Python 5 3 Updated Jun 11, 2024

A framework for few-shot evaluation of language models.

Python 6,338 1,677 Updated Sep 7, 2024
Python 12 Updated Aug 15, 2024

ConceptVectors Benchmark and Code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces"

Jupyter Notebook 14 1 Updated Sep 3, 2024

A curated list of LLM interpretability material: tutorials, libraries, surveys, papers, blogs, etc.

105 4 Updated Sep 7, 2024

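Welcome to LLM-Dojo, an open-source place for learning about large models. It uses concise, easy-to-read code to build a model training framework (supporting mainstream models such as Qwen, Llama, and GLM), an RLHF framework (DPO/CPO/KTO/PPO), and other features. 👩‍🎓👨‍🎓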

Python 207 14 Updated Aug 29, 2024

A recipe for online RLHF.

Python 372 42 Updated Aug 21, 2024

Improving Alignment and Robustness with Circuit Breakers

Jupyter Notebook 118 15 Updated Jul 12, 2024

Comprehensive toolkit for Reinforcement Learning from Human Feedback (RLHF) training, featuring instruction fine-tuning, reward model training, and support for PPO and DPO algorithms with various c…

Python 105 9 Updated Mar 18, 2024

Directional Preference Alignment

44 2 Updated May 23, 2024

Recipes to train reward model for RLHF.

Python 609 51 Updated Aug 28, 2024

Trains and compares a variety of preference models (reward models) with different losses and datasets.

Python 1 Updated Mar 4, 2023
Jupyter Notebook 1 Updated Apr 23, 2024

LLM Unlearning

Python 111 15 Updated Oct 20, 2023

Kaggle: OTTO competition

Python 14 2 Updated Feb 13, 2023

News Recommendation with Category Description by a Large Language Model

Python 2 2 Updated May 13, 2024

(WSDM 2024) Official implementation of the paper "ONCE: Boosting Content-based Recommendation with Both Open- and Closed-source Large Language Models"

Python 65 5 Updated Apr 18, 2024