Tsinghua University
Beijing, China
https://tianshuocong.github.io/
Stars
A collection of automated evaluators for assessing jailbreak attempts.
Code for LAS-AT: Adversarial Training with Learnable Attack Strategy (CVPR 2022)
[ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.
Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs
Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
Local Discriminative Regions for Scene Recognition (ACMMM 2018)
A curated list of academic events on AI Security & Privacy
A framework for few-shot evaluation of language models.
A reading list for large models safety, security, and privacy (including Awesome LLM Security, Safety, etc.).
We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20 via OpenAI’s APIs.
Jailbreaking Large Vision-language Models via Typographic Visual Prompts
Code for Backdoor Attacks Against Dataset Distillation
A drop-in replacement for CIFAR-10.
[S&P'24] Test-Time Poisoning Attacks Against Test-Time Adaptation Models
Code for "Lion: Adversarial Distillation of Proprietary Large Language Models (EMNLP 2023)"
Pretrained TorchVision models on CIFAR10 dataset (with weights)
Revisiting Transferable Adversarial Images (arXiv)
Simple yet effective targeted transferable attack (NeurIPS 2021)
The Norm Must Go On: Dynamic Unsupervised Domain Adaptation by Normalization (CVPR 2022)
Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.