khipp
- Germany

Stars
- The official implementation of the ICML 2024 paper "MemoryLLM: Towards Self-Updatable Large Language Models"
- 🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
- Code at the speed of thought – Zed is a high-performance, multiplayer code editor from the creators of Atom and Tree-sitter.
- Official implementation of Goldfish Loss: Mitigating Memorization in Generative LLMs
- Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step
- Automatic "Differentiation" via Text – using large language models to backpropagate textual gradients.
- LiveBench: A Challenging, Contamination-Free LLM Benchmark
- "On the Privacy Risks of Algorithmic Recourse". Martin Pawelczyk, Himabindu Lakkaraju* and Seth Neel*. In International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR, 2023.
- [ICML 2024] Selecting High-Quality Data for Training Language Models
- A framework for few-shot evaluation of language models.
- An Open Robustness Benchmark for Jailbreaking Language Models [arXiv 2024]
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning
- Source code for the paper "Sampling-based Pseudo-Likelihood for Membership Inference Attacks".
- Enhancing small language models with LLM-generated counterfactuals.
- Min-K%++: Improved baseline for detecting pre-training data of LLMs https://arxiv.org/abs/2404.02936
- PAL: Proxy-Guided Black-Box Attack on Large Language Models
- A Comprehensive Assessment of Trustworthiness in GPT Models
- We jailbreak GPT-3.5 Turbo's safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20 via OpenAI's APIs.
- Weak-to-Strong Jailbreaking on Large Language Models
- Python package for measuring memorization in LLMs.
- A comprehensive toolbox for model inversion attacks and defenses that is easy to get started with.
- Implementation of "Membership Inference Attacks against Language Models via Neighbourhood Comparison" by Justus Mattern, Fatemehsadat Mireshghallah, Zhijing Jin, Bernhard Schölkopf, Mrinmaya Sachan…
- Official code for the ACL 2023 paper "Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation"