tianshuocong
🏠 Working from home

A collection of automated evaluators for assessing jailbreak attempts.

Python · 72 stars · 9 forks · Updated Jun 26, 2024

Code for LAS-AT: Adversarial Training with Learnable Attack Strategy (CVPR 2022)

Python · 105 stars · 10 forks · Updated Mar 30, 2022

[ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.

Python · 45 stars · 1 fork · Updated Aug 21, 2024
TeX · 5 stars · 1 fork · Updated Mar 9, 2024

Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs

Jupyter Notebook · 180 stars · 24 forks · Updated Jun 7, 2024

Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"

1,614 stars · 127 forks · Updated Sep 19, 2023

Local Discriminative Regions for Scene Recognition (ACMMM 2018)

Python · 22 stars · 3 forks · Updated Oct 3, 2023

A curated list of academic events on AI Security & Privacy

135 stars · 15 forks · Updated Aug 22, 2024

A framework for few-shot evaluation of language models.

Python · 6,925 stars · 1,851 forks · Updated Nov 9, 2024
Jupyter Notebook · 9 stars · 2 forks · Updated Feb 15, 2023

A reading list for large model safety, security, and privacy (including Awesome LLM Security, Safety, etc.).

921 stars · 60 forks · Updated Nov 7, 2024

We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20 via OpenAI’s APIs.

Python · 239 stars · 28 forks · Updated Feb 23, 2024

Jailbreaking Large Vision-language Models via Typographic Visual Prompts

Python · 85 stars · 6 forks · Updated May 4, 2024
Python · 18 stars · 3 forks · Updated Nov 20, 2023

Code for Backdoor Attacks Against Dataset Distillation

Python · 29 stars · 4 forks · Updated Apr 19, 2023

A drop-in replacement for CIFAR-10.

Jupyter Notebook · 235 stars · 22 forks · Updated Mar 7, 2021

[S&P'24] Test-Time Poisoning Attacks Against Test-Time Adaptation Models

Python · 16 stars · 1 fork · Updated Feb 2, 2024

Code for "Lion: Adversarial Distillation of Proprietary Large Language Models (EMNLP 2023)"

Python · 201 stars · 19 forks · Updated Feb 11, 2024
Python · 141 stars · 14 forks · Updated Aug 8, 2024
Python · 8 stars · Updated Feb 26, 2023
Python · 43 stars · 6 forks · Updated Apr 25, 2023

Modular Adversarial Robustness Toolkit

Python · 17 stars · Updated Jun 6, 2024

Pretrained TorchVision models on the CIFAR-10 dataset (with weights)

Python · 645 stars · 155 forks · Updated Jun 24, 2023

Revisiting Transferable Adversarial Images (arXiv)

Python · 113 stars · 10 forks · Updated Oct 9, 2024

Simple yet effective targeted transferable attack (NeurIPS 2021)

Python · 47 stars · 7 forks · Updated Nov 17, 2022

The Norm Must Go On: Dynamic Unsupervised Domain Adaptation by Normalization (CVPR 2022)

Python · 53 stars · 4 forks · Updated Feb 2, 2023

Code for ML Doctor

Python · 86 stars · 23 forks · Updated Aug 14, 2024

Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.

Python · 28,347 stars · 3,383 forks · Updated Nov 8, 2024