This repository is for the NeurIPS 2018 spotlight paper *Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples*.
Update: for the PyTorch implementation, please refer to `src/pytorch`.
- Please download the VGG-Face Caffe model from here.
- Unzip the model under the `data/` folder; a quick loading check follows below.
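After unzipping, you can sanity-check that the model loads. This is a minimal sketch; the prototxt/caffemodel file names follow the standard VGG-Face Caffe release and are assumptions here, so adjust them to match the files under `data/`:

```python
import caffe

# Load the VGG-Face network in test mode.  The file names are
# assumptions based on the standard VGG-Face Caffe release; adjust
# them if your download differs.
net = caffe.Net('data/VGG_FACE_deploy.prototxt',
                'data/VGG_FACE.caffemodel',
                caffe.TEST)
print(net.blobs['data'].data.shape)  # network input shape
```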
In `attribute_mutation.ipynb`, attribute-substituted and attribute-preserved images are produced for the base image. Four attributes are encoded with indices from 0 to 3, as listed in the following table. Use `attributes[index]` to select the corresponding attribute.
| Attribute | Index |
|---|---|
| left eye | 0 |
| right eye | 1 |
| nose | 2 |
| mouth | 3 |
Two actions are also encoded with indices, as listed in the following table. Use `actions[index]` to select the corresponding action.
| Action | Index |
|---|---|
| substitution | 0 |
| preservation | 1 |
Generated images are saved in the folder `data/attribute_mutated/[attribute]_[action]/`, as sketched below.
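A minimal sketch of how the index encodings map to the output folders. The attribute strings here are illustrative, and the actual mutation logic lives in the notebook:

```python
import os

# Attribute and action encodings from the tables above; the exact
# strings are illustrative -- the notebook defines the real lists.
attributes = ['left_eye', 'right_eye', 'nose', 'mouth']
actions = ['substitution', 'preservation']

for attribute in attributes:
    for action in actions:
        # One output folder per (attribute, action) pair:
        # data/attribute_mutated/[attribute]_[action]/
        out_dir = os.path.join('data', 'attribute_mutated',
                               '{}_{}'.format(attribute, action))
        os.makedirs(out_dir, exist_ok=True)
        # The notebook writes the mutated versions of the base image here.
```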
Attribute witnesses are extracted layer by layer from the attribute-substituted and attribute-preserved images; the implementation is in `witness_extraction.ipynb`. Extracted witnesses are saved in the folder `data/witnesses/`.
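A minimal sketch of the selection idea for a single layer, assuming its activations are available as NumPy vectors. The change measure and the threshold `tau` are assumptions; the notebook implements the paper's actual criterion.

```python
import numpy as np

def extract_witnesses(act_substituted, act_preserved, act_original, tau=0.5):
    """Pick witness neurons for one layer: neurons whose activation
    changes strongly when the attribute is substituted but stays
    stable when the attribute is preserved.  All three arguments are
    1-D activation vectors for the same layer; tau is an illustrative
    threshold, not the paper's exact criterion."""
    change_sub = np.abs(act_substituted - act_original)
    change_pre = np.abs(act_preserved - act_original)
    return np.where((change_sub > tau) & (change_pre <= tau))[0]
```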
With the extracted attribute witnesses, neuron weakening and strengthening are applied to each input during execution. An input is flagged as adversarial when the final prediction of the attribute-steered model differs from that of the original model. The detailed implementation is in `adversary_detection.ipynb`.
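A minimal sketch of the steering step and the detection rule. The scaling factors are assumptions; the notebook derives the actual transformation.

```python
import numpy as np

def steer_layer(acts, witness_idx, strengthen=1.5, weaken=0.5):
    """Attribute-steer one layer's activations: amplify witness
    neurons and dampen the rest.  The factors 1.5/0.5 are
    illustrative assumptions."""
    steered = acts * weaken                 # weaken non-witness neurons
    steered[witness_idx] = acts[witness_idx] * strengthen
    return steered

def is_adversarial(pred_original, pred_steered):
    """Detection rule: the input is deemed adversarial when the
    attribute-steered model disagrees with the original model."""
    return pred_original != pred_steered
```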
Seven adversarial attacks are included in the folder `data/attacks`. Change `attack_path` in the code to test a different attack.
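For instance, one way to enumerate the shipped attacks and point `attack_path` at one of them:

```python
import os

# List the attack folders shipped under data/attacks and pick one.
attack_dirs = sorted(os.listdir('data/attacks'))
print(attack_dirs)  # the seven included attacks
attack_path = os.path.join('data', 'attacks', attack_dirs[0])
```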
Please cite the paper for any use of this repository:
```bibtex
@inproceedings{NeurIPS2018_7998,
  title={Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples},
  author={Tao, Guanhong and Ma, Shiqing and Liu, Yingqi and Zhang, Xiangyu},
  booktitle={Advances in Neural Information Processing Systems 31},
  pages={7728--7739},
  year={2018}
}
```