FYYFU's Stars · list: security (9 repositories)

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models [NeurIPS 2024 Datasets and Benchmarks Track]

Python · 177 stars · 16 forks · Updated Sep 26, 2024

Robust recipes to align language models with human and AI preferences

Python · 4,521 stars · 392 forks · Updated Sep 23, 2024

Papers and resources related to the security and privacy of LLMs 🤖

Python · 398 stars · 31 forks · Updated Sep 9, 2024

Interpretability for sequence generation models 🐛 🔍

Python · 362 stars · 36 forks · Updated Aug 22, 2024

jailbreak-evaluation: an easy-to-use Python package for evaluating language model jailbreaks.

Python · 19 stars · 3 forks · Updated Sep 6, 2024

A reading list for large model safety, security, and privacy (including Awesome LLM Security, Safety, etc.).

775 stars · 49 forks · Updated Sep 27, 2024

[NAACL 2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey

66 stars · 6 forks · Updated Aug 7, 2024

A curation of awesome tools, documents and projects about LLM Security.

881 stars · 86 forks · Updated Aug 29, 2024

[ICLR 2024] Paper demonstrating properties of safety tuning and exaggerated safety.

Python · 62 stars · 6 forks · Updated May 9, 2024