Adversarial explanations for understanding image classification decisions and improved neural network robustness

A preprint version of the article is available at arXiv.

Abstract

For sensitive problems, such as medical imaging or fraud detection, neural network (NN) adoption has been slow due to concerns about their reliability, leading to a number of algorithms for explaining their decisions. NNs have also been found to be vulnerable to a class of imperceptible attacks, called adversarial examples, which arbitrarily alter the output of the network. Here we demonstrate both that these attacks can invalidate previous attempts to explain the decisions of NNs, and that with very robust networks, the attacks themselves may be leveraged as explanations with greater fidelity to the model. We also show that the introduction of a novel regularization technique inspired by the Lipschitz constraint, alongside other proposed improvements including a half-Huber activation function, greatly improves the resistance of NNs to adversarial examples. On the ImageNet classification task, we demonstrate a network with an accuracy-robustness area (ARA) of 0.0053, an ARA 2.4 times greater than the previous state-of-the-art value. Improving the mechanisms by which NN decisions are understood is an important direction for both establishing trust in sensitive domains and learning more about the stimuli to which NNs respond.
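
To make the abstract's central idea concrete, the sketch below builds an adversarial perturbation of an input image with a projected gradient attack in PyTorch and returns the perturbation itself, which on a sufficiently robust network can be read as an explanation of the decision. This is a minimal sketch under stated assumptions, not the authors' algorithm: the step count, step size, the reading of ρ as a per-pixel root-mean-square L2 budget and the function name adversarial_explanation are illustrative choices; the reference implementation linked under Code availability contains the actual method.

```python
# A minimal sketch of the gradient-based attacks the abstract describes,
# written in PyTorch. This is NOT the authors' algorithm: the step count,
# step size and the reading of rho as a per-pixel RMS L2 budget are all
# illustrative assumptions for this sketch.
import torch
import torch.nn.functional as F

def adversarial_explanation(model, image, target_class,
                            steps=50, step_size=0.01, rho=0.075):
    """Perturb `image` (a 1xCxHxW tensor in [0, 1]) toward `target_class`,
    keeping the perturbation inside an L2 ball of radius rho * sqrt(#values)."""
    model.eval()
    x = image.clone().detach()
    delta = torch.zeros_like(x, requires_grad=True)
    budget = rho * (x.numel() ** 0.5)  # L2 radius implied by the per-pixel rho
    target = torch.tensor([target_class], device=x.device)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), target)
        loss.backward()
        with torch.no_grad():
            # Normalized gradient step that pushes the prediction toward the target class.
            delta -= step_size * delta.grad / (delta.grad.norm() + 1e-12)
            # Project back onto the L2 ball and keep the result a valid image.
            if delta.norm() > budget:
                delta.mul_(budget / delta.norm())
            delta.copy_((x + delta).clamp(0.0, 1.0) - x)
        delta.grad.zero_()
    # Return the perturbed image and the perturbation (the candidate explanation).
    return (x + delta).detach(), delta.detach()
```

On a conventionally trained NN such a perturbation typically looks like imperceptible noise; the robust training described in the abstract is what allows it to resemble recognizable features of the target class and therefore to carry explanatory value.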

Figures

Fig. 1: Comparing explanatory power between Grad-CAM and AEs when applied to a robust NN trained on CIFAR-10.
Fig. 2: Illustration showing how AEs might improve trust in a medical NN’s decision.
Fig. 3: Comparison of different state-of-the-art, robust NNs.
Fig. 4: Demonstration of different AEs for a car classification problem, as computed on four different NNs trained on the CIFAR-10 dataset.
Fig. 5: Different explanation techniques using ρ = 0.075 with an NN trained on the COCO dataset.
Fig. 6: Demonstration of the limited utility of attack ARA when considering if an NN has learned salient features.
Fig. 7: Illustration of the benefits of noisy training, even with a Lipschitz regularization.

Data availability

All data used in this work, including the CIFAR-10 [42], ILSVRC 2012 [33], JSRT [44] and COCO [45] datasets, are freely available.
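
As a small illustration of how one might obtain the CIFAR-10 data [42] referenced above, the snippet below uses torchvision's built-in downloader; the ./data directory and the bare ToTensor transform are arbitrary choices for this sketch, not part of the authors' training pipeline.

```python
# A minimal sketch of downloading CIFAR-10 with torchvision; the root
# directory and the ToTensor-only transform are illustrative choices.
import torchvision
import torchvision.transforms as transforms

transform = transforms.ToTensor()
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10(root="./data", train=False,
                                        download=True, transform=transform)
print(len(train_set), len(test_set))  # 50000 training and 10000 test images
```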

Code availability

A reference implementation of the techniques presented throughout this work, applied to the CIFAR-10 dataset, can be found at https://github.com/wwoods/adversarial-explanations-cifar.

References

  1. Finlayson, S. G. et al. Adversarial attacks on medical machine learning. Science 363, 1287–1289 (2019).

  2. Stilgoe, J. Machine learning, social learning and the governance of self-driving cars. Soc. Stud. Sci. 48, 25–56 (2018).

  3. Tsao, H.-Y., Chan, P.-Y. & Su, E. C.-Y. Predicting diabetic retinopathy and identifying interpretable biomedical features using machine learning algorithms. BMC Bioinform. 19, 283 (2018).

  4. Szegedy, C. et al. Intriguing properties of neural networks. Preprint at https://arxiv.org/abs/1312.6199 (2013).

  5. Papernot, N., McDaniel, P. & Goodfellow, I. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. Preprint at https://arxiv.org/abs/1605.07277 (2016).

  6. Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. In Proc. IEEE International Conference on Computer Vision 618–626 (IEEE, 2017).

  7. Ribeiro, M. T., Singh, S. & Guestrin, C. Why should I trust you? Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 1135–1144 (ACM, 2016).

  8. Simonyan, K., Vedaldi, A. & Zisserman, A. Deep inside convolutional networks: visualising image classification models and saliency maps. Preprint at https://arxiv.org/abs/1312.6034 (2013).

  9. Landecker, W. Interpretable Machine Learning and Sparse Coding for Computer Vision. PhD thesis, Portland State Univ. (2014).

  10. Bau, D., Zhou, B., Khosla, A., Oliva, A. & Torralba, A. Network dissection: quantifying interpretability of deep visual representations. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 6541–6549 (IEEE, 2017).

  11. Bau, D. et al. Visualizing and understanding generative adversarial networks. In International Conference on Learning Representations (ICLR, 2019).

  12. Murdoch, W. J., Singh, C., Kumbier, K., Abbasi-Asl, R. & Yu, B. Interpretable machine learning: definitions, methods, and applications. Preprint at https://arxiv.org/abs/1901.04592 (2019).

  13. Hong, S., You, T., Kwak, S. & Han, B. Online tracking by learning discriminative saliency map with convolutional neural network. In International Conference on Machine Learning 597–606 (ICML, 2015).

  14. Zeiler, M. D. & Fergus, R. Visualizing and understanding convolutional networks. In European Conference on Computer Vision 818–833 (Springer, 2014).

  15. Olshausen, B. A. & Field, D. J. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609 (1996).

  16. Luo, W., Li, Y., Urtasun, R. & Zemel, R. Understanding the effective receptive field in deep convolutional neural networks. In Advances in Neural Information Processing Systems 4898–4906 (NIPS, 2016).

  17. Jetley, S., Lord, N. A., Lee, N. & Torr, P. Learn to pay attention. In International Conference on Learning Representations (ICLR, 2018).

  18. Li, K., Wu, Z., Peng, K.-C., Ernst, J. & Fu, Y. Tell me where to look: guided attention inference network. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 9215–9223 (IEEE, 2018).

  19. Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems 5998–6008 (NIPS, 2017).

  20. Cui, Y. et al. Attention-over-attention neural networks for reading comprehension. In Proc. 55th Annual Meeting of the Association for Computational Linguistics Vol. 1, 593–602 (ACL, 2017).

  21. Ghorbani, A., Abid, A. & Zou, J. Interpretation of neural networks is fragile. In Proc. 33rd AAAI Conference on Artificial Intelligence 3681–3688 (AAAI, 2019).

  22. Athalye, A., Engstrom, L., Ilyas, A. & Kwok, K. Synthesizing robust adversarial examples. In Proc. 35th International Conference on Machine Learning (eds Dy, J. & Krause, A.) 284–293 (PMLR, 2018).

  23. Goodfellow, I., Shlens, J. & Szegedy, C. Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR, 2015).

  24. Chakraborty, A., Alam, M., Dey, V., Chattopadhyay, A. & Mukhopadhyay, D. Adversarial attacks and defences: a survey. Preprint at https://arxiv.org/abs/1810.00069 (2018).

  25. Athalye, A., Carlini, N. & Wagner, D. Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. In Proc. 35th International Conference on Machine Learning (eds Dy, J. & Krause, A.) 274–283 (PMLR, 2018).

  26. Khoury, M. & Hadfield-Menell, D. On the geometry of adversarial examples. Preprint at https://arxiv.org/abs/1811.00525 (2018).

  27. Tsipras, D., Santurkar, S., Engstrom, L., Turner, A. & Madry, A. Robustness may be at odds with accuracy. In International Conference on Learning Representations (ICLR, 2019).

  28. Stutz, D., Hein, M. & Schiele, B. Disentangling adversarial robustness and generalization. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 6976–6987 (IEEE, 2019).

  29. Weng, T.-W. et al. Evaluating the robustness of neural networks: an extreme value theory approach. In International Conference on Learning Representations (ICLR, 2018).

  30. Cisse, M., Bojanowski, P., Grave, E., Dauphin, Y. & Usunier, N. Parseval networks: improving robustness to adversarial examples. In Proc. 34th International Conference on Machine Learning 854–863 (JMLR, 2017).

  31. Behrmann, J., Grathwohl, W., Chen, R. T. Q., Duvenaud, D. & Jacobsen, J.-H. Invertible residual networks. In Proc. 36th International Conference on Machine Learning (eds Chaudhuri, K. & Salakhutdinov, R.) 573–582 (PMLR, 2019).

  32. Madry, A., Makelov, A., Schmidt, L., Tsipras, D. & Vladu, A. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations (ICLR, 2018).

  33. Russakovsky, O. et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 211–252 (2015).

  34. Carlini, N. & Wagner, D. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy 39–57 (IEEE, 2017).

  35. Pei, K., Cao, Y., Yang, J. & Jana, S. DeepXplore: automated whitebox testing of deep learning systems. In Proc. 26th Symposium on Operating Systems Principles 1–18 (ACM, 2017).

  36. Cohen, J., Rosenfeld, E. & Kolter, Z. Certified adversarial robustness via randomized smoothing. In Proc. 36th International Conference on Machine Learning (eds Chaudhuri, K. & Salakhutdinov, R.) 1310–1320 (PMLR, 2019).

  37. Liao, F. et al. Defense against adversarial attacks using high-level representation guided denoiser. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 1778–1787 (IEEE, 2018).

  38. Kurakin, A. et al. Adversarial attacks and defences competition. In The NIPS’17 Competition: Building Intelligent Systems 195–231 (Springer, 2018).

  39. Tramèr, F. et al. Ensemble adversarial training: attacks and defenses. In International Conference on Learning Representations (ICLR, 2018).

  40. Wong, E., Schmidt, F., Metzen, J. H. & Kolter, J. Z. Scaling provable adversarial defenses. In Advances in Neural Information Processing Systems 8400–8409 (NIPS, 2018).

  41. Su, D. et al. Is robustness the cost of accuracy? A comprehensive study on the robustness of 18 deep image classification models. In European Conference on Computer Vision 631–648 (Springer, 2018).

  42. Krizhevsky, A. & Hinton, G. Learning Multiple Layers of Features from Tiny Images. Technical report, Univ. Toronto (2009).

  43. Rony, J. et al. Decoupling direction and norm for efficient gradient-based L2 adversarial attacks and defenses. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2019).

  44. Shiraishi, J. et al. Development of a digital image database for chest radiographs with and without a lung nodule: receiver operating characteristic analysis of radiologists’ detection of pulmonary nodules. Am. J. Roentgenol. 174, 71–74 (2000).

  45. Lin, T.-Y. et al. Microsoft COCO: common objects in context. In European Conference on Computer Vision 740–755 (Springer, 2014).

  46. Carlini, N. et al. On evaluating adversarial robustness. Preprint at https://arxiv.org/abs/1902.06705 (2019).

  47. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 770–778 (IEEE, 2016).

  48. He, K., Zhang, X., Ren, S. & Sun, J. Identity mappings in deep residual networks. In European Conference on Computer Vision 630–645 (Springer, 2016).

  49. He, T. et al. Bag of tricks for image classification with convolutional neural networks. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 558–567 (IEEE, 2019).

  50. Paszke, A. et al. Automatic differentiation in PyTorch. In Proc. 31st Conference on Neural Information Processing Systems (NIPS, 2017).

  51. Yamada, Y., Iwamura, M., Akiba, T. & Kise, K. Shakedrop regularization for deep residual learning. Preprint at https://arxiv.org/abs/1802.02375 (2018).

  52. Huang, G., Sun, Y., Liu, Z., Sedra, D. & Weinberger, K. Q. Deep networks with stochastic depth. In European Conference on Computer Vision 646–661 (Springer, 2016).

Acknowledgements

This work was supported in part by the Center for Brain-Inspired Computing (C-BRIC), one of six centres in the Joint University Microelectronics Program (JUMP), a Semiconductor Research Corporation (SRC) programme sponsored by the Defense Advanced Research Projects Agency (DARPA). W.W. acknowledges additional funding from the Defense Threat Reduction Agency (DTRA) (award no. HDTRA1-18-1-0009). J.C. acknowledges funding from the Maseeh College of Engineering & Computer Science’s Undergraduate Research and Mentoring Program and the SRC Education Alliance (award no. 2009-UR-2032G). W.W. and J.C. received funding from F. Maseeh. We thank A. Madry [27,32] and J. Cohen [36] for helpful discussions and clarifications about their work. We thank FuR and A. Parise for assisting with the collection of photos for the examples throughout this work.

Author information

Contributions

W.W. contributed the original idea, algorithms, experimental design, ablation studies and some of the active learning annotations, and wrote the majority of the paper. J.C. implemented the LIME and Grad-CAM integrations, produced the majority of the active learning annotations, provided text for the active learning sections of the paper and contributed editing support. C.T. provided scope advisement, editing support and funding for the work.

Corresponding authors

Correspondence to Walt Woods, Jack Chen or Christof Teuscher.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

About this article

Cite this article

Woods, W., Chen, J. & Teuscher, C. Adversarial explanations for understanding image classification decisions and improved neural network robustness. Nat Mach Intell 1, 508–516 (2019). https://doi.org/10.1038/s42256-019-0104-6
