A collection of resources on the trustworthiness of large models (LMs) across multiple dimensions (e.g., safety, security, and privacy), with a special focus on multi-modal LMs (e.g., vision-language models and diffusion models).
This repo is a work in progress 🌱 (resources are currently collected manually).
🌻 We welcome recommendations of new resources via Issues, using the following table format (please fill in every column):

| Title | Link | Code | Venue | Classification | Model | Comment |
|---|---|---|---|---|---|---|
| aa | arxiv | github | bb'23 | A1. Jailbreak | LLM | Agent |
- [2024.01.20] 🔥 We collected 3 related papers from NDSS'24!
- [2024.01.17] 🔥 We collected 108 related papers from ICLR'24!
- [2024.01.09] 🔥 LM-SSP is released!
- Book (1)
- Competition (5)
- Leaderboard (3)
- Toolkit (3)
- Survey (23)
- Paper
- A. Safety
- A1. Jailbreak (107)
- A2. Alignment (43)
- A3. Deepfake (26)
- A4. Ethics (5)
- A5. Fairness (44)
- A6. Hallucination (83)
- A7. Prompt Injection (6)
- A8. Toxicity (40)
- B. Security
- B1. Adversarial Examples (52)
- B2. Poisoning (35)
- B3. System (3)
- C. Privacy
- C1. Contamination (6)
- C2. Copyright (31)
- C3. Data Reconstruction (10)
- C4. Extraction (5)
- C5. Inference (23)
- C6. Privacy-Preserving Computation (15)
- C7. Unlearning (18)
Organizers: Tianshuo Cong (丛天硕), Xinlei He (何新磊), Zhengyu Zhao (赵正宇), Yugeng Liu (刘禹更), Delong Ran (冉德龙)
This project is inspired by LLM Security, Awesome LLM Security, LLM Security & Privacy, UR2-LLMs, PLMpapers, and EvaluationPapers4ChatGPT.