Adversarial explanations for understanding image classification decisions and improved neural network robustness

A preprint version of the article is available at arXiv.

Abstract

For sensitive problems, such as medical imaging or fraud detection, neural network (NN) adoption has been slow due to concerns about their reliability, leading to a number of algorithms for explaining their decisions. NNs have also been found to be vulnerable to a class of imperceptible attacks, called adversarial examples, which arbitrarily alter the output of the network. Here we demonstrate both that these attacks can invalidate previous attempts to explain the decisions of NNs, and that with very robust networks, the attacks themselves may be leveraged as explanations with greater fidelity to the model. We also show that the introduction of a novel regularization technique inspired by the Lipschitz constraint, alongside other proposed improvements including a half-Huber activation function, greatly improves the resistance of NNs to adversarial examples. On the ImageNet classification task, we demonstrate a network with an accuracy-robustness area (ARA) of 0.0053, an ARA 2.4 times greater than the previous state-of-the-art value. Improving the mechanisms by which NN decisions are understood is an important direction for both establishing trust in sensitive domains and learning more about the stimuli to which NNs respond.
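
To make the abstract's central idea concrete, the sketch below builds an adversarial perturbation of an input image with a projected gradient attack in PyTorch and returns the perturbation itself, which on a sufficiently robust network can be read as an explanation of the decision. This is a minimal sketch under stated assumptions, not the authors' algorithm: the step count, step size, the reading of ρ as a per-pixel root-mean-square L2 budget and the function name adversarial_explanation are illustrative choices; the reference implementation linked under Code availability contains the actual method.

```python
# A minimal sketch of the gradient-based attacks the abstract describes,
# written in PyTorch. This is NOT the authors' algorithm: the step count,
# step size and the reading of rho as a per-pixel RMS L2 budget are all
# illustrative assumptions for this sketch.
import torch
import torch.nn.functional as F

def adversarial_explanation(model, image, target_class,
                            steps=50, step_size=0.01, rho=0.075):
    """Perturb `image` (a 1xCxHxW tensor in [0, 1]) toward `target_class`,
    keeping the perturbation inside an L2 ball of radius rho * sqrt(#values)."""
    model.eval()
    x = image.clone().detach()
    delta = torch.zeros_like(x, requires_grad=True)
    budget = rho * (x.numel() ** 0.5)  # L2 radius implied by the per-pixel rho
    target = torch.tensor([target_class], device=x.device)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), target)
        loss.backward()
        with torch.no_grad():
            # Normalized gradient step that pushes the prediction toward the target class.
            delta -= step_size * delta.grad / (delta.grad.norm() + 1e-12)
            # Project back onto the L2 ball and keep the result a valid image.
            if delta.norm() > budget:
                delta.mul_(budget / delta.norm())
            delta.copy_((x + delta).clamp(0.0, 1.0) - x)
        delta.grad.zero_()
    # Return the perturbed image and the perturbation (the candidate explanation).
    return (x + delta).detach(), delta.detach()
```

On a conventionally trained NN such a perturbation typically looks like imperceptible noise; the robust training described in the abstract is what allows it to resemble recognizable features of the target class and therefore to carry explanatory value.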

Figures

Fig. 1: Comparing explanatory power between Grad-CAM and AEs when applied to a robust NN trained on CIFAR-10.
Fig. 2: Illustration showing how AEs might improve trust in a medical NN’s decision.
Fig. 3: Comparison of different state-of-the-art, robust NNs.
Fig. 4: Demonstration of different AEs for a car classification problem, as computed on four different NNs trained on the CIFAR-10 dataset.
Fig. 5: Different explanation techniques using ρ = 0.075 with an NN trained on the COCO dataset.
Fig. 6: Demonstration of the limited utility of attack ARA when considering if an NN has learned salient features.
Fig. 7: Illustration of the benefits of noisy training, even with a Lipschitz regularization.

Data availability

All data used in this work, including the CIFAR-10 [42], ILSVRC 2012 [33], JSRT [44] and COCO [45] datasets, are freely available.
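
As a small illustration of how one might obtain the CIFAR-10 data [42] referenced above, the snippet below uses torchvision's built-in downloader; the ./data directory and the bare ToTensor transform are arbitrary choices for this sketch, not part of the authors' training pipeline.

```python
# A minimal sketch of downloading CIFAR-10 with torchvision; the root
# directory and the ToTensor-only transform are illustrative choices.
import torchvision
import torchvision.transforms as transforms

transform = transforms.ToTensor()
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10(root="./data", train=False,
                                        download=True, transform=transform)
print(len(train_set), len(test_set))  # 50000 training and 10000 test images
```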

Code availability

A reference implementation of the techniques presented throughout this work, applied to the CIFAR-10 dataset, can be found at https://github.com/wwoods/adversarial-explanations-cifar.

References

  1. Finlayson, S. G. et al. Adversarial attacks on medical machine learning. Science 363, 1287–1289 (2019).

  2. Stilgoe, J. Machine learning, social learning and the governance of self-driving cars. Soc. Stud. Sci. 48, 25–56 (2018).

  3. Tsao, H.-Y., Chan, P.-Y. & Su, E. C.-Y. Predicting diabetic retinopathy and identifying interpretable biomedical features using machine learning algorithms. BMC Bioinform. 19, 283 (2018).

  4. Szegedy, C. et al. Intriguing properties of neural networks. Preprint at https://arxiv.org/abs/1312.6199 (2013).

  5. Papernot, N., McDaniel, P. & Goodfellow, I. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. Preprint at https://arxiv.org/abs/1605.07277 (2016).

  6. Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. In Proc. IEEE International Conference on Computer Vision 618–626 (IEEE, 2017).

  7. Ribeiro, M. T., Singh, S. & Guestrin, C. Why should I trust you? Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 1135–1144 (ACM, 2016).

  8. Simonyan, K., Vedaldi, A. & Zisserman, A. Deep inside convolutional networks: visualising image classification models and saliency maps. Preprint at https://arxiv.org/abs/1312.6034 (2013).

  9. Landecker, W. Interpretable Machine Learning and Sparse Coding for Computer Vision. PhD thesis, Portland State Univ. (2014).

  10. Bau, D., Zhou, B., Khosla, A., Oliva, A. & Torralba, A. Network dissection: quantifying interpretability of deep visual representations. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 6541–6549 (IEEE, 2017).

  11. Bau, D. et al. Visualizing and understanding generative adversarial networks. In International Conference on Learning Representations (ICLR, 2019).

  12. Murdoch, W. J., Singh, C., Kumbier, K., Abbasi-Asl, R. & Yu, B. Interpretable machine learning: definitions, methods, and applications. Preprint at https://arxiv.org/abs/1901.04592 (2019).

  13. Hong, S., You, T., Kwak, S. & Han, B. Online tracking by learning discriminative saliency map with convolutional neural network. In International Conference on Machine Learning 597–606 (ICML, 2015).

  14. Zeiler, M. D. & Fergus, R. Visualizing and understanding convolutional networks. In European Conference on Computer Vision 818–833 (Springer, 2014).

  15. Olshausen, B. A. & Field, D. J. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609 (1996).

  16. Luo, W., Li, Y., Urtasun, R. & Zemel, R. Understanding the effective receptive field in deep convolutional neural networks. In Advances in Neural Information Processing Systems 4898–4906 (NIPS, 2016).

  17. Jetley, S., Lord, N. A., Lee, N. & Torr, P. Learn to pay attention. In International Conference on Learning Representations (ICLR, 2018).

  18. Li, K., Wu, Z., Peng, K.-C., Ernst, J. & Fu, Y. Tell me where to look: guided attention inference network. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 9215–9223 (IEEE, 2018).

  19. Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems 5998–6008 (NIPS, 2017).

  20. Cui, Y. et al. Attention-over-attention neural networks for reading comprehension. In Proc. 55th Annual Meeting of the Association for Computational Linguistics Vol. 1, 593–602 (ACL, 2017).

  21. Ghorbani, A., Abid, A. & Zou, J. Interpretation of neural networks is fragile. In Proc. 33rd AAAI Conference on Artificial Intelligence 3681–3688 (AAAI, 2019).

  22. Athalye, A., Engstrom, L., Ilyas, A. & Kwok, K. Synthesizing robust adversarial examples. In Proc. 35th International Conference on Machine Learning (eds Dy, J. & Krause, A.) 284–293 (PMLR, 2018).

  23. Goodfellow, I., Shlens, J. & Szegedy, C. Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR, 2015).

  24. Chakraborty, A., Alam, M., Dey, V., Chattopadhyay, A. & Mukhopadhyay, D. Adversarial attacks and defences: a survey. Preprint at https://arxiv.org/abs/1810.00069 (2018).

  25. Athalye, A., Carlini, N. & Wagner, D. Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. In Proc. 35th International Conference on Machine Learning (eds Dy, J. & Krause, A.) 274–283 (PMLR, 2018).

  26. Khoury, M. & Hadfield-Menell, D. On the geometry of adversarial examples. Preprint at https://arxiv.org/abs/1811.00525 (2018).

  27. Tsipras, D., Santurkar, S., Engstrom, L., Turner, A. & Madry, A. Robustness may be at odds with accuracy. In International Conference on Learning Representations (ICLR, 2019).

  28. Stutz, D., Hein, M. & Schiele, B. Disentangling adversarial robustness and generalization. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 6976–6987 (IEEE, 2019).

  29. Weng, T.-W. et al. Evaluating the robustness of neural networks: an extreme value theory approach. In International Conference on Learning Representations (ICLR, 2018).

  30. Cisse, M., Bojanowski, P., Grave, E., Dauphin, Y. & Usunier, N. Parseval networks: improving robustness to adversarial examples. In Proc. 34th International Conference on Machine Learning 854–863 (JMLR, 2017).

  31. Behrmann, J., Grathwohl, W., Chen, R. T. Q., Duvenaud, D. & Jacobsen, J.-H. Invertible residual networks. In Proc. 36th International Conference on Machine Learning (eds Chaudhuri, K. & Salakhutdinov, R.) 573–582 (PMLR, 2019).

  32. Madry, A., Makelov, A., Schmidt, L., Tsipras, D. & Vladu, A. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations (ICLR, 2018).

  33. Russakovsky, O. et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 211–252 (2015).

  34. Carlini, N. & Wagner, D. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy 39–57 (IEEE, 2017).

  35. Pei, K., Cao, Y., Yang, J. & Jana, S. DeepXplore: automated whitebox testing of deep learning systems. In Proc. 26th Symposium on Operating Systems Principles 1–18 (ACM, 2017).

  36. Cohen, J., Rosenfeld, E. & Kolter, Z. Certified adversarial robustness via randomized smoothing. In Proc. 36th International Conference on Machine Learning (eds Chaudhuri, K. & Salakhutdinov, R.) 1310–1320 (PMLR, 2019).

  37. Liao, F. et al. Defense against adversarial attacks using high-level representation guided denoiser. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 1778–1787 (IEEE, 2018).

  38. Kurakin, A. et al. Adversarial attacks and defences competition. In The NIPS’17 Competition: Building Intelligent Systems 195–231 (Springer, 2018).

  39. Tramèr, F. et al. Ensemble adversarial training: attacks and defenses. In International Conference on Learning Representations (ICLR, 2018).

  40. Wong, E., Schmidt, F., Metzen, J. H. & Kolter, J. Z. Scaling provable adversarial defenses. In Advances in Neural Information Processing Systems 8400–8409 (NIPS, 2018).

  41. Su, D. et al. Is robustness the cost of accuracy? A comprehensive study on the robustness of 18 deep image classification models. In European Conference on Computer Vision 631–648 (Springer, 2018).

  42. Krizhevsky, A. & Hinton, G. Learning Multiple Layers of Features from Tiny Images. Technical report, Univ. Toronto (2009).

  43. Rony, J. et al. Decoupling direction and norm for efficient gradient-based L2 adversarial attacks and defenses. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2019).

  44. Shiraishi, J. et al. Development of a digital image database for chest radiographs with and without a lung nodule: receiver operating characteristic analysis of radiologists’ detection of pulmonary nodules. Am. J. Roentgenol. 174, 71–74 (2000).

  45. Lin, T.-Y. et al. Microsoft COCO: common objects in context. In European Conference on Computer Vision 740–755 (Springer, 2014).

  46. Carlini, N. et al. On evaluating adversarial robustness. Preprint at https://arxiv.org/abs/1902.06705 (2019).

  47. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 770–778 (IEEE, 2016).

  48. He, K., Zhang, X., Ren, S. & Sun, J. Identity mappings in deep residual networks. In European Conference on Computer Vision 630–645 (Springer, 2016).

  49. He, T. et al. Bag of tricks for image classification with convolutional neural networks. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 558–567 (IEEE, 2019).

  50. Paszke, A. et al. Automatic differentiation in PyTorch. In Proc. 31st Conference on Neural Information Processing Systems (NIPS, 2017).

  51. Yamada, Y., Iwamura, M., Akiba, T. & Kise, K. Shakedrop regularization for deep residual learning. Preprint at https://arxiv.org/abs/1802.02375 (2018).

  52. Huang, G., Sun, Y., Liu, Z., Sedra, D. & Weinberger, K. Q. Deep networks with stochastic depth. In European Conference on Computer Vision 646–661 (Springer, 2016).

Acknowledgements

This work was supported in part by the Center for Brain-Inspired Computing (C-BRIC), one of six centres in the Joint University Microelectronics Program (JUMP), a Semiconductor Research Corporation (SRC) programme sponsored by the Defense Advanced Research Projects Agency (DARPA). W.W. acknowledges additional funding from the Defense Threat Reduction Agency (DTRA) (award no. HDTRA1-18-1-0009). J.C. acknowledges funding from the Maseeh College of Engineering & Computer Science’s Undergraduate Research and Mentoring Program and the SRC Education Alliance (award no. 2009-UR-2032G). W.W. and J.C. received funding from F. Maseeh. We thank A. Madry [27,32] and J. Cohen [36] for helpful discussions and clarifications about their work. We thank FuR and A. Parise for assisting with the collection of photos for the examples throughout this work.

Author information

Contributions

W.W. contributed the original idea, algorithms, experimental design, ablation studies and some of the active learning annotations, and wrote the majority of the paper. J.C. implemented the LIME and Grad-CAM integrations, produced the majority of the active learning annotations, provided text for the active learning sections of the paper and contributed editing support. C.T. provided scope advisement, editing support and funding for the work.

Corresponding authors

Correspondence to Walt Woods, Jack Chen or Christof Teuscher.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

About this article

Cite this article

Woods, W., Chen, J. & Teuscher, C. Adversarial explanations for understanding image classification decisions and improved neural network robustness. Nat Mach Intell 1, 508–516 (2019). https://doi.org/10.1038/s42256-019-0104-6
