Stars
The official repository of the ECCV 2024 paper "Outlier-Aware Test-time Adaptation with Stable Memory Replay"
Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States
[Knowledge Editing] Must-read Papers on Knowledge Editing for Large Language Models.
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors
Code for our ICML 2024 paper "Realistic Unsupervised CLIP Fine-tuning with Universal Entropy Optimization"
Official code for ICML 2024 paper, "Connecting the Dots: Collaborative Fine-tuning for Black-Box Vision-Language Models"
MambaOut: Do We Really Need Mamba for Vision?
WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning method which reduces LLM performance on WMDP while retaining …
A curated list of papers & resources linked to data poisoning, backdoor attacks and defenses against them
[arXiv2024] Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models
Tools for merging pretrained large language models.
The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions"
[NAACL2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
A collection of model transferability estimation methods.
A comprehensive toolbox for model inversion attacks and defenses, which is easy to get started.
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
Code base of SynthCLIP: CLIP training with purely synthetic text-image pairs from LLMs and TTIs.
[ICLR 2024] Towards Eliminating Hard Label Constraints in Gradient Inversion Attacks
Official code for ICLR 2024 paper, "A Hard-to-Beat Baseline for Training-free CLIP-based Adaptation"
A reading list for large models safety, security, and privacy (including Awesome LLM Security, Safety, etc.).