
EAD: Elastic-Net Attacks to Deep Neural Networks

EAD is an elastic-net attack on deep neural networks (DNNs).
We formulate the attack process as an elastic-net regularized optimization problem, yielding an attack that produces L1-oriented adversarial examples and includes the state-of-the-art L2 attack (C&W) as a special case.
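
For reference, the elastic-net formulation in the paper minimizes a weighted combination of the targeted attack loss and the L1 and L2 distortions of the perturbed image x with respect to the original image x_0 (notation paraphrased here):

\min_{x} \; c \cdot f(x, t) + \beta \, \|x - x_0\|_1 + \|x - x_0\|_2^2 \quad \text{subject to } x \in [0, 1]^p

Setting \beta = 0 recovers the C&W L2 formulation, which is why EAD contains it as a special case.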

Experimental results on MNIST, CIFAR-10, and ImageNet show that EAD yields a distinct set of adversarial examples and attains similar attack performance to state-of-the-art methods in different attack scenarios. More importantly, EAD leads to improved attack transferability and complements adversarial training for DNNs, suggesting novel insights on leveraging L1 distortion in generating robust adversarial examples.
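
In practice, the L1 term is handled with iterative shrinkage-thresholding (ISTA/FISTA) steps rather than plain gradient descent. A minimal NumPy sketch of the elementwise shrinkage that pulls each iterate z back toward the original image x0 (assuming pixel values in [0, 1] and an L1 regularization weight beta; function and variable names are illustrative, not taken from this repository):

import numpy as np

def shrink(z, x0, beta):
    # Soft-threshold each pixel of z toward the corresponding pixel of x0,
    # then clip to the valid pixel range [0, 1].
    diff = z - x0
    out = np.where(diff > beta, np.minimum(z - beta, 1.0), x0)
    out = np.where(diff < -beta, np.maximum(z + beta, 0.0), out)
    return out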

For more details, please see our paper:

EAD: Elastic-Net Attacks to Deep Neural Networks via Adversarial Examples by Yash Sharma*, Pin-Yu Chen*, Huan Zhang, Jinfeng Yi, Cho-Jui Hsieh (AAAI 2018)

* Equal contribution

The attack has also been used in the following works (an incomplete list):

Attacking the Madry Defense Model with L1-based Adversarial Examples by Yash Sharma, Pin-Yu Chen (ICLR 2018 Workshop)

On the Limitation of Local Intrinsic Dimensionality for Characterizing the Subspaces of Adversarial Examples by Pei-Hsuan Lu, Pin-Yu Chen, Chia-Mu Yu (ICLR 2018 Workshop)

Bypassing Feature Squeezing by Increasing Adversary Strength by Yash Sharma, Pin-Yu Chen

On the Limitation of MagNet Defense against L1-based Adversarial Examples by Pei-Hsuan Lu, Pin-Yu Chen, Kang-Cheng Chen, Chia-Mu Yu (IEEE/IFIP DSN 2018 Workshop)

The algorithm has also been repurposed for generating contrastive explanations in:

Explanations based on the Missing: Towards Contrastive Explanations with Pertinent Negatives by Amit Dhurandhar, Pin-Yu Chen, Ronny Luss, Chun-Chen Tu, Paishun Ting, Karthikeyan Shanmugam and Payel Das (NIPS 2018)

The experiment code is based on Carlini and Wagner's L2 attack.
The attack can also be found in the CleverHans repository.
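
As a rough illustration (not code from this repository), running the attack through CleverHans might look like the sketch below. It assumes a CleverHans v2/v3-style API exposing an ElasticNetMethod attack, a KerasModelWrapper, and a TensorFlow 1.x session; parameter names and defaults should be checked against the installed version.

import tensorflow as tf
from cleverhans.attacks import ElasticNetMethod
from cleverhans.utils_keras import KerasModelWrapper

# keras_model: a trained Keras classifier; x_test: a batch of inputs scaled to [0, 1]
sess = tf.Session()
wrapped_model = KerasModelWrapper(keras_model)
attack = ElasticNetMethod(wrapped_model, sess=sess)
adv_x = attack.generate_np(x_test, beta=1e-2, max_iterations=1000,
                           clip_min=0.0, clip_max=1.0)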

Setup and train models

The code has been tested with Python 3 and TensorFlow v1.2 and v1.3. The following packages are required:

sudo apt-get install python3-pip
sudo pip3 install --upgrade pip
sudo pip3 install pillow scipy numpy tensorflow-gpu keras h5py

Prepare the MNIST and CIFAR-10 data and models for attack:

python3 train_models.py

To download the inception model (inception_v3_2016_08_28.tar.gz):

python3 setup_inception.py

To prepare the ImageNet dataset, download and unzip the following archive:

ImageNet Test Set

and put the imgs folder in ../imagesnetdata. This path can be changed in setup_inception.py.

Train defensively distilled models

Train defensively distilled MNIST and CIFAR-10 models with temperature varying from 1 to 100:

python3 train_models.py -dd

Train defensively distilled MNIST and CIFAR-10 models under specified temperatures:

python3 train_models.py -dd -t 1 10 100

Run attacks

A unified attack interface, test_attack.py, is provided. Run python3 test_attack.py -h for a list of arguments and their descriptions; note the default values as well.

To generate best-case, average-case, and worst-case statistics, add "-tg 9" to the command.
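
For example, combining flags already documented here (the exact combination is assumed to be accepted by the argument parser):

python3 test_attack.py -a L1 -d mnist -tg 9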

For computational efficiency, maximize the batch size, fix the 'initial_constant' to a large value, and set the number of binary search steps to 1.

The following are some examples of attacks:

Run the L1-oriented attack on the Inception model with 100 ImageNet images

python3 test_attack.py -a L1 -d imagenet -n 100

Run the EN-oriented attack on the defensively distilled (T=100) CIFAR-10 model with 1000 images

python3 test_attack.py -d cifar -tp 100

Save original and adversarial images in the saves directory

python3 test_attack.py -sh

Generate adversarial images on the undefended MNIST model with confidence 50, then attack the defensively distilled (T=100) MNIST model

python3 test_attack.py -cf 50 -tm dd_100

Adversarial Training

Adversarially train MNIST models by augmenting the training set with L2-, EAD(L1)-, EAD(EN)-, L2+EAD(L1)-, and L2+EAD(EN)-based examples, respectively. This uses the provided NumPy save files in the train directory.

python3 train_models.py -d mnist -a

Generate and save your own training-set examples for use in adversarial training (e.g., with the L1-oriented attack)

python3 test_attack.py -a L1 -sn -tr

Now, attack an adversarially trained model (e.g., the L1-trained network)

python3 test_attack.py -adv l1
