Tsinghua University
Beijing, China
https://tianshuocong.github.io/
Stars
A collection of automated evaluators for assessing jailbreak attempts.
Code for LAS-AT: Adversarial Training with Learnable Attack Strategy (CVPR 2022)
[ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.
Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs
Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
Local Discriminative Regions for Scene Recognition (ACMMM 2018)
A curated list of academic events on AI Security & Privacy
A framework for few-shot evaluation of language models.
A reading list for large models safety, security, and privacy (including Awesome LLM Security, Safety, etc.).
We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20 via OpenAI’s APIs.
Jailbreaking Large Vision-language Models via Typographic Visual Prompts
Code for Backdoor Attacks Against Dataset Distillation
A drop-in replacement for CIFAR-10.
[S&P'24] Test-Time Poisoning Attacks Against Test-Time Adaptation Models
Code for "Lion: Adversarial Distillation of Proprietary Large Language Models (EMNLP 2023)"
Pretrained TorchVision models on CIFAR10 dataset (with weights)
Revisiting Transferable Adversarial Images (arXiv)
Simple yet effective targeted transferable attack (NeurIPS 2021)
The Norm Must Go On: Dynamic Unsupervised Domain Adaptation by Normalization (CVPR 2022)
Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.