Official implementation of the paper "Process Reward Model with Q-value Rankings"
Explore concepts like Self-Correct, Self-Refine, Self-Improve, Self-Contradict, Self-Play, and Self-Knowledge, alongside o1-like reasoning elevation🍓 and hallucination alleviation🍄.
A survey on harmful fine-tuning attacks for large language models
A reading list on LLM based Synthetic Data Generation 🔥
MULFE: Multi-Level Benchmark for Free Text Model Editing
Sparse Autoencoder for Mechanistic Interpretability
A curated list of Large Language Model (LLM) Interpretability resources.
Official Repository for "Tamper-Resistant Safeguards for Open-Weight LLMs"
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
A curated list of LLM interpretability-related material - tutorials, libraries, surveys, papers, blogs, etc.
An awesome repository & comprehensive survey on the interpretability of LLM attention heads.
A collection of AWESOME things about mixture-of-experts
Code and data for "Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning?" (ACL 2024)
Code and data for "MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning" (ICLR 2024)
[ICML 2024] TrustLLM: Trustworthiness in Large Language Models
A modular graph-based Retrieval-Augmented Generation (RAG) system
This repository collects all relevant resources about interpretability in LLMs
RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models. NeurIPS 2024
[NeurIPS 2024] Large Language Model Unlearning via Embedding-Corrupted Prompts
Official implementation of AdvPrompter https://arxiv.org/abs/2404.16873
The official implementation of the ICML 2024 paper "MemoryLLM: Towards Self-Updatable Large Language Models"
[ICLR24 (Spotlight)] "SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation" by Chongyu Fan*, Jiancheng Liu*, Yihua Zhang, Eric Wong, D…
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
The official implementation of our ICLR 2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models".
A curated list of resources dedicated to retrieval-augmented generation (RAG).
[ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use
Attribute (or cite) statements generated by LLMs back to in-context information.
Steering Llama 2 with Contrastive Activation Addition
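For context on the last entry, here is a minimal, illustrative sketch of contrastive-activation-addition-style steering with Hugging Face `transformers`. It is not the linked repository's code: the model name, layer index, steering coefficient, and contrastive prompts are assumptions chosen for illustration, and the vector is applied at every token position as a simplification.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-2-7b-chat-hf"  # assumed model; CAA was demonstrated on Llama 2 chat models
LAYER = 13    # residual-stream layer to steer (assumption)
COEFF = 4.0   # steering strength (assumption)

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.float16, device_map="auto")
model.eval()

def last_token_resid(prompt: str) -> torch.Tensor:
    """Residual-stream activation of the final prompt token after decoder layer LAYER."""
    ids = tok(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    # hidden_states[0] is the embedding output, so index LAYER + 1 is the output of layer LAYER
    return out.hidden_states[LAYER + 1][0, -1, :]

# Contrastive prompt pair (illustrative); the steering vector is their activation difference.
pos = "Question: Will you help the user? Answer: Yes, absolutely"
neg = "Question: Will you help the user? Answer: No, I refuse"
steer = last_token_resid(pos) - last_token_resid(neg)

def add_steering(module, inputs, output):
    # Llama decoder layers return a tuple whose first element is the hidden states.
    hidden = output[0] + COEFF * steer.to(output[0].dtype)
    return (hidden,) + output[1:]

handle = model.model.layers[LAYER].register_forward_hook(add_steering)
try:
    ids = tok("Tell me about your day.", return_tensors="pt").to(model.device)
    gen = model.generate(**ids, max_new_tokens=60, do_sample=False)
    print(tok.decode(gen[0], skip_special_tokens=True))
finally:
    handle.remove()  # always detach the hook so later generations are unsteered
```

Flipping the sign of `COEFF` steers generations toward the negative prompt's behavior instead; the layer and coefficient typically need tuning per model.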