Axiomatic Attribution for Deep Networks

Mukund Sundararajan; Ankur Taly; Qiqi Yan

Proceedings of the 34th International Conference on Machine Learning, PMLR 70:3319-3328, 2017.

Abstract

We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms, Sensitivity and Implementation Invariance, that attribution methods ought to satisfy. We show that they are not satisfied by most known attribution methods, which we consider to be a fundamental weakness of those methods. We use the axioms to guide the design of a new attribution method called Integrated Gradients. Our method requires no modification to the original network and is extremely simple to implement; it just needs a few calls to the standard gradient operator. We apply this method to a couple of image models, a couple of text models and a chemistry model, demonstrating its ability to debug networks, to extract rules from a network, and to enable users to engage with models better.
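To make the "few calls to the standard gradient operator" concrete, here is a minimal sketch of Integrated Gradients, which attributes to feature i the quantity (x_i - x'_i) times the average gradient along the straight line from a baseline x' to the input x. Everything specific below is an assumption for illustration: the toy `model`, its weights, the all-zeros baseline, the midpoint Riemann sum with 50 steps, and the finite-difference `gradient` helper that stands in for a framework's automatic differentiation. It is not the authors' implementation.

```python
import math

def model(x):
    # Hypothetical toy "network": a sigmoid over a weighted sum of the inputs.
    weights = [2.0, -1.0, 0.5]
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def gradient(f, x, eps=1e-6):
    # Central finite differences; a stand-in for a real gradient operator.
    grads = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        grads.append((f(hi) - f(lo)) / (2 * eps))
    return grads

def integrated_gradients(f, x, baseline, steps=50):
    # Midpoint Riemann sum of the gradients along the straight-line path
    # from the baseline to the input, scaled by (input - baseline).
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        total = [t + g for t, g in zip(total, gradient(f, point))]
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]  # assumed "uninformative" reference input
attributions = integrated_gradients(model, x, baseline)
print(attributions)
print(sum(attributions), model(x) - model(baseline))
```

For this smooth toy function, the attributions should sum approximately to the difference between the model output at the input and at the baseline, which gives a quick sanity check on the step count.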

Cite this Paper

BibTeX


@InProceedings{pmlr-v70-sundararajan17a,
  title = 	 {Axiomatic Attribution for Deep Networks},
  author =       {Mukund Sundararajan and Ankur Taly and Qiqi Yan},
  booktitle = 	 {Proceedings of the 34th International Conference on Machine Learning},
  pages = 	 {3319--3328},
  year = 	 {2017},
  editor = 	 {Precup, Doina and Teh, Yee Whye},
  volume = 	 {70},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {06--11 Aug},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v70/sundararajan17a/sundararajan17a.pdf},
  url = 	 {https://proceedings.mlr.press/v70/sundararajan17a.html},
  abstract = 	 {We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms, Sensitivity and Implementation Invariance, that attribution methods ought to satisfy. We show that they are not satisfied by most known attribution methods, which we consider to be a fundamental weakness of those methods. We use the axioms to guide the design of a new attribution method called Integrated Gradients. Our method requires no modification to the original network and is extremely simple to implement; it just needs a few calls to the standard gradient operator. We apply this method to a couple of image models, a couple of text models and a chemistry model, demonstrating its ability to debug networks, to extract rules from a network, and to enable users to engage with models better.}
}

Endnote

%0 Conference Paper
%T Axiomatic Attribution for Deep Networks
%A Mukund Sundararajan
%A Ankur Taly
%A Qiqi Yan
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-sundararajan17a
%I PMLR
%P 3319--3328
%U https://proceedings.mlr.press/v70/sundararajan17a.html
%V 70
%X We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms, Sensitivity and Implementation Invariance, that attribution methods ought to satisfy. We show that they are not satisfied by most known attribution methods, which we consider to be a fundamental weakness of those methods. We use the axioms to guide the design of a new attribution method called Integrated Gradients. Our method requires no modification to the original network and is extremely simple to implement; it just needs a few calls to the standard gradient operator. We apply this method to a couple of image models, a couple of text models and a chemistry model, demonstrating its ability to debug networks, to extract rules from a network, and to enable users to engage with models better.

APA


Sundararajan, M., Taly, A. & Yan, Q. (2017). Axiomatic Attribution for Deep Networks. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:3319-3328. Available from https://proceedings.mlr.press/v70/sundararajan17a.html.
