Skip to main content

Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

  • Perspective
  • Published:

A primer on deep learning in genomics

Abstract

Deep learning methods are a class of machine learning techniques capable of identifying highly complex patterns in large datasets. Here, we provide a perspective and primer on deep learning applications for genome analysis. We discuss successful applications in the fields of regulatory genomics, variant calling and pathogenicity scores. We include general guidance for how to effectively use deep learning methods as well as a practical guide to tools and resources. This primer is accompanied by an interactive online tutorial.

This is a preview of subscription content, access via your institution

Access options

Rent or buy this article

Prices vary by article type

from$1.95

to$39.95

Prices may be subject to local taxes which are calculated during checkout

Fig. 1: Deep learning workflow in genomics.
Fig. 2: Applications of deep learning in genomics.

Similar content being viewed by others

References

  1. Angermueller, C., Pärnamaa, T., Parts, L. & Stegle, O. Deep learning for computational biology. Mol. Syst. Biol. 12, 878 (2016).

    Article  PubMed  PubMed Central  Google Scholar 

  2. Ching, T. et al. Opportunities and obstacles for deep learning in biology and medicine. J. R. Soc. Interface 15, 20170387 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  3. Telenti, A., Lippert, C., Chang, P. C. & DePristo, M. Deep learning of genomic variation and regulatory network data. Hum. Mol. Genet. 27, R63–R71 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  4. Yue, T. & Wang, H. Deep learning for genomics: a concise overview. Preprint at https://arxiv.org/abs/1802.00810 (2018).

  5. Camacho, D. M., Collins, K. M., Powers, R. K., Costello, J. C. & Collins, J. J. Next-generation machine learning for biological networks. Cell 173, 1581–1592 (2018).

    Article  CAS  PubMed  Google Scholar 

  6. Libbrecht, M. W. & Noble, W. S. Machine learning applications in genetics and genomics. Nat. Rev. Genet. 16, 321–332 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  7. Goodfellow, I., Bengio, Y., Courville, A. & Bengio, Y. Deep Learning Vol. 1 (MIT Press, Cambridge, 2016).

    Google Scholar 

  8. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).

    Article  CAS  PubMed  Google Scholar 

  9. Krizhevsky, A., Sutskever, I. & Hinton, G. E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 1, 1097–1105 (2012).

    Google Scholar 

  10. Hinton, G. E. & Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science 313, 504–507 (2006).

    Article  CAS  PubMed  Google Scholar 

  11. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014).

    Google Scholar 

  12. Khodabandelou, G., Mozziconacci, J. & Routhier, E. Genome functional annotation using deep convolutional neural network. Preprint at https://www.biorxiv.org/content/early/2018/05/25/330308 (2018).

  13. Zhou, J. & Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods 12, 931–934 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  14. Powers, D. M. W. Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. J. Mach. Learn. Technol. 2, 37–63 (2011).

    Google Scholar 

  15. Kelley, D. R., Snoek, J. & Rinn, J. L. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res. 26, 990–999 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  16. Hastie, T., Tibshirani, R. & Friedman, J. H. The Elements of Statistical Learning Vol. 1 (Springer Science+Business Media, New York, 2001).

    Book  Google Scholar 

  17. Quang, D. & Xie, X. DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences. Nucleic Acids Res. 44, e107 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  18. Sundararajan, M., Taly, A. & Yan, Q. Axiomatic attribution for deep networks. Preprint at https://arxiv.org/abs/1703.01365v2 (2017).

  19. Shrikumar, A., Greenside, P. & Kundaje, A. Learning important features through propagating activation differences. Proc. Int. Conf. Mach. Learn. 70, 3145–3153 (2017).

    Google Scholar 

  20. Ribeiro, M. T., Singh, S. & Guestrin, C. “Why should I trust you?”: explaining the predictions of any classifier. in KDD 1135–1144 (AAAI Press, Menlo Park, CA, USA, 2016).

  21. Park, Y. & Kellis, M. Deep learning for regulatory genomics. Nat. Biotechnol. 33, 825–826 (2015).

    Google Scholar 

  22. Alipanahi, B., Delong, A., Weirauch, M. T. & Frey, B. J. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotechnol. 33, 831–838 (2015).

    Article  CAS  PubMed  Google Scholar 

  23. Lanchantin, J., Singh, R., Wang, B. & Qi, Y. Deep motif dashboard: visualizing and understanding genomic sequences using deep neural networks. Pac. Symp. Biocomput. 22, 254–265 (2017).

    PubMed  PubMed Central  Google Scholar 

  24. Zeng, H., Edwards, M. D., Liu, G. & Gifford, D. K. Convolutional neural network architectures for predicting DNA-protein binding. Bioinformatics 32, i121–i127 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  25. Liu, F., Li, H., Ren, C., Bo, X. & Shu, W. PEDLA: predicting enhancers with a deep learning-based algorithmic framework. Sci. Rep. 6, 28517 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  26. Kleftogiannis, D., Kalnis, P. & Bajic, V. B. DEEP: a general computational framework for predicting enhancers. Nucleic Acids Res. 43, e6 (2015).

    Article  CAS  PubMed  Google Scholar 

  27. Min, X. et al. Predicting enhancers with deep convolutional neural networks. BMC Bioinformatics 18 (Suppl. 13), 478 (2017).

  28. Eser, U. & Stirling Churchman, L. FIDDLE: an integrative deep learning framework for functional genomic data inference. Preprint at https://www.biorxiv.org/content/early/2016/10/17/081380 (2016).

  29. Li, Y., Shi, W. & Wasserman, W. W. Genome-wide prediction of cis-regulatory regions using supervised deep learning methods. BMC Bioinformatics 19, 202 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  30. Wang, Y. et al. Predicting DNA methylation state of CpG dinucleotide using genome topological features and deep networks. Sci. Rep. 6, 19598 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  31. Schreiber, J., Libbrecht, M., Bilmes, J. & Noble, W. Nucleotide sequence and DNaseI sensitivity are predictive of 3D chromatin architecture. Preprint at https://www.biorxiv.org/content/early/2017/01/30/103614 (2017).

  32. Zeng, W., Wu, M. & Jiang, R. Prediction of enhancer-promoter interactions via natural language processing. BMC Genomics 19 (Suppl. 2), 84 (2018).

  33. Shrikumar, A., Greenside, P. & Kundaje, A. Reverse-complement parameter sharing improves deep learning models for genomics. Preprint at https://www.biorxiv.org/content/early/2017/01/27/103663 (2017).

  34. Tan, J., Hammond, J. H., Hogan, D. A. & Greene, C. S. ADAGE-based integration of publicly available Pseudomonas aeruginosa gene expression data with denoising autoencoders illuminates microbe-host interactions. mSystems 1, e00025-15 (2016).

    Article  PubMed  PubMed Central  Google Scholar 

  35. Chen, Y., Li, Y., Narayan, R., Subramanian, A. & Xie, X. Gene expression inference with deep learning. Bioinformatics 32, 1832–1839 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  36. Chen, L., Cai, C., Chen, V. & Lu, X. Learning a hierarchical representation of the yeast transcriptomic machinery using an autoencoder model. BMC Bioinformatics 17 (Suppl. 1), 9 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  37. Cui, H. et al. Boosting gene expression clustering with system-wide biological information: a robust autoencoder approach. Preprint at https://www.biorxiv.org/content/early/2017/11/05/214122 (2017).

  38. Xie, R., Wen, J., Quitadamo, A., Cheng, J. & Shi, X. A deep auto-encoder model for gene expression prediction. BMC Genomics 18 (Suppl. 9), 845 (2017).

  39. Jha, A., Gazzara, M. R. & Barash, Y. Integrative deep models for alternative splicing. Bioinformatics 33, i274–i282 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  40. Tripathi, R., Patel, S., Kumari, V., Chakraborty, P. & Varadwaj, P. K. DeepLNC, a long non-coding RNA prediction tool using deep neural network. Netw. Model. Anal. Health Inform. Bioinform. 5, 21 (2016).

    Article  Google Scholar 

  41. Yu, N., Yu, Z. & Pan, Y. A deep learning method for lincRNA detection using auto-encoder algorithm. BMC Bioinformatics 18 (Suppl. 15), 511 (2017).

  42. Hill, S. T. et al. A deep recurrent neural network discovers complex biological rules to decipher RNA protein-coding potential. Nucleic Acids Res. 46, 8105–8113 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  43. Angermueller, C., Lee, H. J., Reik, W. & Stegle, O. DeepCpG: accurate prediction of single-cell DNA methylation states using deep learning. Genome Biol. 18, 67 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  44. Shaham, U. et al. Removal of batch effects using distribution-matching residual networks. Bioinformatics 33, 2539–2546 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  45. Lin, C., Jain, S., Kim, H. & Bar-Joseph, Z. Using neural networks for reducing the dimensions of single-cell RNA-Seq data. Nucleic Acids Res. 45, e156 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  46. Poplin, R. et al. Creating a universal SNP and small indel variant caller with deep neural networks. Preprint at https://www.biorxiv.org/content/early/2018/03/20/092890 (2017).

  47. Luo, R., Sedlazeck, F.J., Lam, T.-W. & Schatz, M. Clairvoyante: a multi-task convolutional deep neural network for variant calling in single molecule sequencing. Preprint at https://www.biorxiv.org/content/early/2018/09/26/310458 (2018).

  48. Luo, R., Lam, T.-W. & Schatz, M. Skyhawk: an artificial neural network-based discriminator for reviewing clinically significant genomic variants. Preprint at https://www.biorxiv.org/content/early/2018/05/01/311985 (2018).

  49. Torracinta, R. et al. Adaptive somatic mutations calls with deep learning and semi-simulated data. Preprint at https://www.biorxiv.org/content/early/2016/10/04/079087 (2016).

  50. Boža, V., Brejová, B. & Vinař, T. DeepNano: deep recurrent neural networks for base calling in MinION nanopore reads. PLoS One 12, e0178751 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  51. Teng, H., Hall, M.B., Duarte, T., Cao, M.D. & Coin, L. Chiron: translating nanopore raw signal directly into nucleotide sequence using deep learning. Preprint at https://www.biorxiv.org/content/early/2017/08/23/179531 (2017).

  52. Qi, H. et al. MVP: predicting pathogenicity of missense variants by deep neural networks. Preprint at https://www.biorxiv.org/content/early/2018/02/02/259390 (2018).

  53. Quang, D., Chen, Y. & Xie, X. DANN: a deep learning approach for annotating the pathogenicity of genetic variants. Bioinformatics 31, 761–763 (2015).

    Article  CAS  PubMed  Google Scholar 

  54. Korvigo, I., Afanasyev, A., Romashchenko, N. & Skoblov, M. Generalising better: applying deep learning to integrate deleteriousness prediction scores for whole-exome SNV studies. PLoS One 13, e0192829 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  55. Yuan, Y. et al. DeepGene: an advanced cancer type classifier based on deep learning and somatic point mutations. BMC Bioinformatics 17, 476 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  56. Yousefi, S. et al. Predicting clinical outcomes from large scale cancer genomic profiles with deep survival models. Sci. Rep. 7, 11707 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  57. Ma, W., Qiu, Z., Song, J., Cheng, Q. & Ma, C. DeepGS: predicting phenotypes from genotypes using deep learning. Preprint at https://www.biorxiv.org/content/early/2017/12/31/241414 (2017).

  58. Zhou, J. et al. Whole-genome deep learning analysis reveals causal role of noncoding mutations in autism. Preprint at https://www.biorxiv.org/content/early/2018/05/11/319681 (2018).

  59. Zhou, J. et al. Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nat. Genet. 50, 1171–1179 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  60. Sundaram, L. et al. Predicting the clinical impact of human mutation with deep neural networks. Nat. Genet. 50, 1161–1170 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  61. Avsec, Z. et al. Kipoi: accelerating the community exchange and reuse of predictive models for genomics. Preprint at https://www.biorxiv.org/content/early/2018/07/24/375345 (2018).

  62. Webb, S. Deep learning for biology. Nature 554, 555–557 (2018).

    Article  CAS  PubMed  Google Scholar 

  63. Ghorbani, A., Abid, A. & Zou, J. Interpretation of neural networks is fragile. Preprint at https://arxiv.org/abs/1710.10547 (2017).

  64. Gupta, A. & Zou, J. Feedback GAN (FBGAN) for DNA: a novel feedback-loop architecture for optimizing protein functions. Preprint at https://arxiv.org/abs/1804.01694 (2018).

  65. Stranger, B. et al.; eGTEx Project. Enhancing GTEx by bridging the gaps between genotype, gene expression, and disease. Nat. Genet. 49, 1664–1670 (2017).

Download references

Acknowledgements

We thank N. Wineinger, R. Dias, J. di Iulio and D. Evans for comments on the paper. The work of A. Telenti, A. Torkamani and P.M. is supported by the Qualcomm Foundation and the NIH Center for Translational Science Award (CTSA, grant UL1TR002550). Further support to A. Torkamani is from U54GM114833 and U24TR002306. J.Z. is supported by a Chan–Zuckerberg Biohub Investigator grant and National Science Foundation (NSF) grant CRII 1657155.

Author information

Authors and Affiliations

Authors

Contributions

All authors conceived and designed the project. J.Z., M.H., P.M., A. Torkamani and A. Telenti wrote the manuscript. J.Z. and A.A. wrote the online tutorial.

Corresponding authors

Correspondence to James Zou or Amalio Telenti.

Ethics declarations

Competing interests

M.H. is an employee of Peltarion.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Zou, J., Huss, M., Abid, A. et al. A primer on deep learning in genomics. Nat Genet 51, 12–18 (2019). https://doi.org/10.1038/s41588-018-0295-5

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1038/s41588-018-0295-5

This article is cited by

Search

Quick links

Nature Briefing

Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.

Get the most important science stories of the day, free in your inbox. Sign up for Nature Briefing