# FLIP
## tl;dr
Official implementation of [FLIP](https://arxiv.org/abs/2310.18933), presented at [NeurIPS 2023](https://neurips.cc/virtual/2023/poster/70392). The implementation is a cleaned-up 'fork' of the [backdoor-suite](https://github.com/SewoongLab/backdoor-suite). Precomputed labels for our main table are available [here](https://github.com/SewoongLab/FLIP/releases/). More details are available in the paper. A more complete (messy) version of the code is available upon request.

**Authors:** [Rishi D. Jha\*](https://rishijha.com/), Jonathan Hayase\*, Sewoong Oh

---
## Abstract
In a backdoor attack, an adversary injects corrupted data into a model's training dataset in order to gain control over its predictions on images with a specific attacker-defined trigger. A typical corrupted training example requires altering both the image, by applying the trigger, and the label. Models trained on clean images, therefore, were considered safe from backdoor attacks. However, in some common machine learning scenarios, the training labels are provided by potentially malicious third-parties. This includes crowd-sourced annotation and knowledge distillation. We, hence, investigate a fundamental question: can we launch a successful backdoor attack by only corrupting labels? We introduce a novel approach to design label-only backdoor attacks, which we call FLIP, and demonstrate its strengths on three datasets (CIFAR-10, CIFAR-100, and Tiny-ImageNet) and four architectures (ResNet-32, ResNet-18, VGG-19, and Vision Transformer). With only 2\% of CIFAR-10 labels corrupted, FLIP achieves a near-perfect attack success rate of $99.4\%$ while suffering only a $1.8\%$ drop in the clean test accuracy. Our approach builds upon the recent advances in trajectory matching, originally introduced for dataset distillation.

![Diagram of algorithm.](/img/flip.png)

---

## In this repo

This repo is split into three main folders: `experiments`, `modules`, and `schemas`. The `experiments` folder (described in more detail [here](#installation)) contains subfolders and `.toml` configuration files from which an experiment may be run. The `modules` folder stores the source code for each part of an experiment; each module corresponds to a step of the FLIP algorithm and reads and writes specific inputs and outputs, as defined by the corresponding `.toml` documentation in the `schemas` folder.
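For orientation, here is a minimal sketch of what an experiment configuration might look like. The file path, table names, keys, and values below are hypothetical; the authoritative field definitions live in the `schemas` folder:

```toml
# experiments/example/config.toml -- hypothetical layout; see schemas/ for the real fields.

[train_expert]        # hypothetical module: train the expert models whose trajectories FLIP matches
dataset = "cifar-10"  # one of the three datasets evaluated in the paper
model = "resnet-32"   # hypothetical architecture identifier

[generate_labels]     # hypothetical module: compute the flipped labels
budget = 0.02         # fraction of labels to corrupt (2% in the headline CIFAR-10 result)
output = "labels.npy" # hypothetical output path consumed by the next module
```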

Additionally, the `precomputed` release provides the labels used for the main table of our paper for further analysis.
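As a usage sketch only (the actual file names and format of the released labels may differ; check the release notes), such labels could be substituted into a standard CIFAR-10 training run along these lines:

```python
# Hypothetical sketch: train on clean CIFAR-10 images with precomputed FLIP labels.
# Assumes the release ships a NumPy array aligned with torchvision's training-set
# order; the file path and format below are assumptions, not documented behavior.
import numpy as np
import torchvision

train_set = torchvision.datasets.CIFAR10(root="./data", train=True, download=True)
flip_labels = np.load("precomputed/cifar10_labels.npy")  # hypothetical path

assert len(flip_labels) == len(train_set.targets)
train_set.targets = flip_labels.tolist()  # images untouched; only labels change
```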

Please don't hesitate to file a GitHub issue or reach out with any questions or requests!
