security
JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models [NeurIPS 2024 Datasets and Benchmarks Track]
Robust recipes to align language models with human and AI preferences
Papers and resources related to the security and privacy of LLMs 🤖
Interpretability for sequence generation models 🐛 🔍
jailbreak-evaluation is an easy-to-use Python package for evaluating language model jailbreaks.
A reading list on the safety, security, and privacy of large models (including Awesome LLM Security, Safety, etc.).
[NAACL 2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
A curated collection of awesome tools, documents, and projects about LLM security.
ICLR 2024 paper showing properties of safety tuning and exaggerated safety.