CurryxIaoHu

Zhihao Xu (CurryxIaoHu)

Student at Shandong University

2 followers · 1 following

Shandong University

Starred repositories

princeton-nlp / benign-data-breaks-safety

Python · 12 stars · Updated Jun 17, 2024

princeton-nlp / LESS

[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning

Jupyter Notebook · 328 stars · 26 forks · Updated Jun 29, 2024

alexrame / rewardedsoups

Rewarded soups official implementation

HTML · 43 stars · 4 forks · Updated Sep 27, 2023

OpenBMB / CPO

Python · 8 stars · 1 fork · Updated Jul 16, 2024

ZHZisZZ / modpo

[ACL'2024] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization

Python · 41 stars · 3 forks · Updated Aug 20, 2024

YihanWang617 / llm-jailbreaking-defense

A lightweight library for large language model (LLM) jailbreaking defense.

Python · 26 stars · 3 forks · Updated Aug 16, 2024

rishub-tamirisa / tamper-resistance

Official Repository for "Tamper-Resistant Safeguards for Open-Weight LLMs"

Python · 32 stars · 3 forks · Updated Aug 21, 2024

jybai / concept-gradients

Official repository for "Concept-based Interpretation Without Linear Assumption", published at ICLR 2023.

Python · 1 star · Updated Feb 5, 2024

sciai-lab / Truth_is_Universal

Jupyter Notebook · 7 stars · 1 fork · Updated Sep 3, 2024

likenneth / honest_llama

Inference-Time Intervention: Eliciting Truthful Answers from a Language Model

Python · 426 stars · 33 forks · Updated Aug 25, 2024

hongjunyan / pytorch-news-recommendation

Python · 2 stars · Updated Oct 4, 2022

SAI990323 / TALLRec

Python · 190 stars · 31 forks · Updated Feb 5, 2024

Xuan-ZW / LKPNR

Jupyter Notebook · 16 stars · 1 fork · Updated Aug 23, 2023

zhengzhi-1997 / LLM-TRSR

Python · 5 stars · 3 forks · Updated Jun 11, 2024

EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.

Python · 6,338 stars · 1,677 forks · Updated Sep 7, 2024

LiuAmber / RAHF

Python · 12 stars · Updated Aug 15, 2024

yihuaihong / ConceptVectors

ConceptVectors Benchmark and Code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces"

Jupyter Notebook · 14 stars · 1 fork · Updated Sep 3, 2024

cooperleong00 / Awesome-LLM-Interpretability

A curated list of LLM interpretability-related material: tutorials, libraries, surveys, papers, blogs, etc.

105 stars · 4 forks · Updated Sep 7, 2024

mst272 / LLM-Dojo

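Welcome to LLM-Dojo, an open-source place for learning about large models. It uses concise, easy-to-read code to build a model training framework (supporting mainstream models such as Qwen, Llama, GLM, and more), an RLHF framework (DPO/CPO/KTO/PPO), and other features. 👩‍🎓👨‍🎓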

Python · 207 stars · 14 forks · Updated Aug 29, 2024

RLHFlow / Online-RLHF

A recipe for online RLHF.

Python · 372 stars · 42 forks · Updated Aug 21, 2024

GraySwanAI / circuit-breakers

Improving Alignment and Robustness with Circuit Breakers

Jupyter Notebook · 118 stars · 15 forks · Updated Jul 12, 2024

raghavc / LLM-RLHF-Tuning-with-PPO-and-DPO

Comprehensive toolkit for Reinforcement Learning from Human Feedback (RLHF) training, featuring instruction fine-tuning, reward model training, and support for PPO and DPO algorithms with various c…

Python · 105 stars · 9 forks · Updated Mar 18, 2024

RLHFlow / Directional-Preference-Alignment

Directional Preference Alignment

44 stars · 2 forks · Updated May 23, 2024

RLHFlow / RLHF-Reward-Modeling

Recipes to train reward models for RLHF.

Python · 609 stars · 51 forks · Updated Aug 28, 2024

nggsam / preference_model

Trains and compares a variety of preference models (reward models) with different losses and datasets.

Python · 1 star · Updated Mar 4, 2023

sdm-4400 / Reward-Model-PPO-training

Jupyter Notebook · 1 star · Updated Apr 23, 2024

kevinyaobytedance / llm_unlearn

LLM Unlearning

Python · 111 stars · 15 forks · Updated Oct 20, 2023

makotu1208 / Otto-kaggle-solution-makotupart

Kaggle: OTTO competition

Python · 14 stars · 2 forks · Updated Feb 13, 2023

yamanalab / gpt-augmented-news-recommendation

News Recommendation with Category Description by a Large Language Model

Python · 2 stars · 2 forks · Updated May 13, 2024

Jyonn / ONCE

(WSDM 2024) Official implementation of the paper "ONCE: Boosting Content-based Recommendation with Both Open- and Closed-source Large Language Models"

Python · 65 stars · 5 forks · Updated Apr 18, 2024

Starred topics

Game engine

languages