  • Review Article

Privacy challenges and research opportunities for genomic data sharing

Abstract

The sharing of genomic data holds great promise in advancing precision medicine and providing personalized treatments and other types of interventions. However, these opportunities come with privacy concerns, and data misuse could potentially lead to privacy infringement for individuals and their blood relatives. With the rapid growth and increased availability of genomic datasets, understanding the current genome privacy landscape and identifying the challenges in developing effective privacy-protecting solutions are imperative. In this work, we provide an overview of major privacy threats identified by the research community and examine the privacy challenges in the context of emerging direct-to-consumer genetic-testing applications. We additionally present general privacy-protection techniques for genomic data sharing and their potential applications in direct-to-consumer genomic testing and forensic analyses. Finally, we discuss limitations in current privacy-protection methods, highlight possible mitigation strategies and suggest future research opportunities for advancing genomic data sharing.

Fig. 1: Taxonomy of known privacy attacks in genomic data sharing.
Fig. 2: Membership disclosure attack by Homer et al. (ref. 10), in which an adversary aims to determine whether the target is present in the mixture (for example, the case group).
Fig. 3: Genetic genealogy search framework for forensic analysis.
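
The membership test summarized in the Fig. 2 caption can be sketched in a few lines. This is a minimal illustration, not the article's code: it uses the per-SNP distance and t statistic described by Homer et al. (ref. 10), applied to simulated genotypes under simplifying assumptions (independent SNPs, a reference panel of the same size as the mixture). The names `homer_statistic` and `simulate`, and all parameter values, are hypothetical.

```python
import math
import random
import statistics


def homer_statistic(target, mixture_freq, reference_freq):
    """One-sample t statistic over per-SNP distances D_j.

    D_j = |y_j - ref_j| - |y_j - mix_j|, with y_j the target's allele
    frequency in {0, 0.5, 1}. Large positive values suggest the target is
    closer to the mixture than to the reference population.
    """
    d = [abs(y - r) - abs(y - m)
         for y, m, r in zip(target, mixture_freq, reference_freq)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


def simulate(num_snps=20000, pool_size=20, seed=0):
    """Simulate one pool member and one outsider (toy data, assumed setup)."""
    rng = random.Random(seed)
    # Hypothetical population allele frequencies, one per independent SNP.
    freqs = [rng.uniform(0.1, 0.9) for _ in range(num_snps)]

    def person():
        # Two independent allele draws per SNP, scaled to {0, 0.5, 1}.
        return [((rng.random() < p) + (rng.random() < p)) / 2 for p in freqs]

    pool = [person() for _ in range(pool_size)]        # the mixture
    reference = [person() for _ in range(pool_size)]   # public reference panel
    outsider = person()                                # not in the mixture

    mix = [sum(col) / pool_size for col in zip(*pool)]
    ref = [sum(col) / pool_size for col in zip(*reference)]
    return (homer_statistic(pool[0], mix, ref),
            homer_statistic(outsider, mix, ref))


if __name__ == "__main__":
    member_t, outsider_t = simulate()
    print(f"member T = {member_t:.1f}, outsider T = {outsider_t:.1f}")
```

In this toy setting the statistic for a pool member should be clearly positive, while the outsider's value should stay near zero. Real data would add complications such as linkage disequilibrium and population structure, which this sketch ignores.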

References

  1. Mardis, E. R. A decade’s perspective on DNA sequencing technology. Nature 470, 198–203 (2011).

  2. Metzker, M. L. Sequencing technologies—the next generation. Nat. Rev. Genet. 11, 31–46 (2010).

  3. Denny, J. C. et al. The “All of Us” Research Program. N. Engl. J. Med. 381, 668–676 (2019).

  4. Green, R. C. et al. Disclosure of APOE genotype for risk of Alzheimer’s disease. N. Engl. J. Med. 361, 245–254 (2009).

  5. Goldman, J. S. et al. Genetic counseling and testing for Alzheimer disease: joint practice guidelines of the American College of Medical Genetics and the National Society of Genetic Counselors. Genet. Med. 13, 597–605 (2011).

  6. Heeney, C., Hawkins, N., de Vries, J., Boddington, P. & Kaye, J. Assessing the privacy risks of data sharing in genomics. Public Health Genomics 14, 17–25 (2011).

  7. Wang, S. et al. Genome privacy: challenges, technical approaches to mitigate risk, and ethical considerations in the United States. Ann. NY Acad. Sci 1387, 73–83 (2017).

  8. Lin, Z., Owen, A. B. & Altman, R. B. Genomic research and human subject privacy. Science 305, 183 (2004).

  9. Sankararaman, S., Obozinski, G., Jordan, M. I. & Halperin, E. Genomic privacy and limits of individual detection in a pool. Nat. Genet. 41, 965–967 (2009).

  10. Homer, N. et al. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet. 4, e1000167 (2008).

  11. Humbert, M., Ayday, E., Hubaux, J.-P. & Telenti, A. Addressing the concerns of the Lacks family: quantification of kin genomic privacy. In Proc. 2013 ACM SIGSAC Conference on Computer & Communications Security 1141–1152 (ACM, 2013).

  12. Gymrek, M., McGuire, A. L., Golan, D., Halperin, E. & Erlich, Y. Identifying personal genomes by surname inference. Science 339, 321–324 (2013).

  13. Lippert, C. et al. Identification of individuals by trait prediction using whole-genome sequencing data. Proc. Natl Acad. Sci. USA 114, 10166–10171 (2017).

  14. McGuire, A. L. et al. To share or not to share: a randomized trial of consent for data sharing in genome research. Genet. Med. 13, 948–955 (2011).

  15. Oliver, J. M. et al. Balancing the risks and benefits of genomic data sharing: genome research participants’ perspectives. Public Health Genomics 15, 106–114 (2012).

  16. Health Insurance Portability and Accountability Act of 1996, 18 USC §264. (1996).

  17. Rocher, L., Hendrickx, J. M. & de Montjoye, Y.-A. Estimating the success of re-identifications in incomplete datasets using generative models. Nat. Commun. 10, 3069 (2019).

  18. Na, L. et al. Feasibility of reidentifying individuals in large national physical activity data sets from which protected health information has been removed with use of machine learning. JAMA Netw. Open 1, e186040 (2018).

  19. The Genetic Information Nondiscrimination Act of 2008 (2008); https://www.eeoc.gov/laws/statutes/gina.cfm

  20. European Parliament and Council. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the Protection Of Natural Persons With Regard to the Processing of Personal Data and on the Free Movement of Such Data, and Repealing Directive 95/46/EC (General Data Protection Regulation). Off. J. Eur. Union 119, 1–88 (2016).

  21. Erlich, Y. & Narayanan, A. Routes for breaching and protecting genetic privacy. Nat. Rev. Genet. 15, 409–421 (2014).

  22. Naveed, M. et al. Privacy in the genomic era. ACM Comput. Surv. 48, 6 (2015).

  23. Mittos, A., Malin, B. & De Cristofaro, E. Systematizing genome privacy research: a privacy-enhancing technologies perspective. Proc. Priv. Enhancing Technol. 2019, 87–107 (2019).

  24. Akgün, M., Bayrak, A. O., Ozer, B. & Sağıroğlu, M. Ş. Privacy preserving processing of genomic data: a survey. J. Biomed. Inform 56, 103–111 (2015).

  25. Sweeney, L., Abu, A. & Winn, J. Identifying participants in the personal genome project by name (2013); http://dataprivacylab.org/projects/pgp/1021-1.pdf

  26. Gitschier, J. Inferential genotyping of Y chromosomes in Latter-Day Saints founders and comparison to Utah samples in the HapMap project. Am. J. Hum. Genet. 84, 251–258 (2009).

  27. Malin, B. Re-identification of familial database records. In AMIA Annual Symposium Proc., Vol. 2006, 524 (American Medical Informatics Association, 2006).

  28. Malin, B. & Sweeney, L. How (not) to protect genomic data privacy in a distributed network: using trail re-identification to evaluate and design anonymity protection systems. J. Biomed. Inform. 37, 179–192 (2004).

  29. Malin, B. & Sweeney, L. Determining the identifiability of DNA database entries. In Proc. AMIA Symposium, Vol. 537 (American Medical Informatics Association, 2000).

  30. Erlich, Y., Shor, T., Pe’er, I. & Carmi, S. Identity inference of genomic data using long-range familial searches. Science 362, 690–694 (2018).

  31. Kahn, S. D. On the future of genomic data. Science 331, 728–729 (2011).

  32. Areheart, B. A. & Roberts, J. L. GINA, big data, and the future of employee privacy. Yale Law J 128, 3 (2019).

  33. Soo-Jin Lee, S. & Borgelt, E. Protecting posted genes: social networking and the limits of GINA. Am. J. Bioeth 14, 32–44 (2014).

  34. Wheeler, D. A. et al. The complete genome of an individual by massively parallel DNA sequencing. Nature 452, 872–876 (2008).

  35. Nyholt, D. R., Yu, C.-E. & Visscher, P. M. On Jim Watson’s APOE status: genetic information is hard to hide. Eur. J. Hum. Genet. 17, 147–149 (2009).

  36. Humbert, M., Ayday, E., Hubaux, J.-P. & Telenti, A. Quantifying interdependent risks in genomic privacy. ACM Trans. Priv. Secur 20, 3 (2017).

  37. Ayday, E. & Humbert, M. Inference attacks against kin genomic privacy. IEEE Secur. Priv. 15, 29–37 (2017).

  38. Shringarpure, S. S. & Bustamante, C. D. Privacy risks from genomic data-sharing beacons. Am. J. Hum. Genet. 97, 631–646 (2015).

  39. Wang, R., Li, Y. F., Wang, X., Tang, H. & Zhou, X. Learning your identity and disease from research papers: information leaks in genome wide association study. In Proc. 16th ACM Conference on Computer and Communications Security 534–544 (ACM, 2009).

  40. James, R. et al. Exploring pathways to trust: a tribal perspective on data sharing. Genet. Med. 16, 820–826 (2014).

  41. Harding, A. et al. Conducting research with tribal communities: sovereignty, ethics, and data-sharing issues. Environ. Health Perspect. 120, 6–10 (2012).

  42. Arquette, M. et al. Holistic risk-based environmental decision making: a Native perspective. Environ. Health Perspect. 110 (Suppl. 2), 259–264 (2002).

  43. Mello, M. M. & Wolf, L. E. The Havasupai Indian tribe case—lessons for research involving stored biologic samples. N. Engl. J. Med. 363, 204–207 (2010).

  44. Christofides, E. & O’Doherty, K. Company disclosure and consumer perceptions of the privacy implications of direct-to-consumer genetic testing. New Genet. Soc. 35, 101–123 (2016).

  45. Laestadius, L. I., Rich, J. R. & Auer, P. L. All your data (effectively) belong to us: data practices among direct-to-consumer genetic testing firms. Genet. Med. 19, 513–520 (2017).

  46. Niemiec, E. & Howard, H. C. Ethical issues in consumer genome sequencing: use of consumers’ samples and data. Appl. Transl. Genom. 8, 23–30 (2016).

  47. 23andMe. Terms of Service (accessed 11 June 2020); https://www.23andme.com/about/tos/

  48. Allyse, M. 23 and me, we, and you: direct-to-consumer genetics, intellectual property, and informed consent. Trends Biotechnol. 31, 68–69 (2013).

  49. Eriksson, N. et al. Web-based, participant-driven studies yield novel genetic associations for common traits. PLoS Genet. 6, e1000993 (2010).

  50. Ram, N., Guerrini, C. J. & McGuire, A. L. Genealogy databases and the future of criminal investigation. Science 360, 1078–1079 (2018).

  51. Greytak, E. M., Kaye, D. H., Budowle, B., Moore, C. & Armentrout, S. L. Privacy and genetic genealogy data. Science 361, 857 (2018).

  52. Berkman, B. E., Miller, W. K. & Grady, C. Is it ethical to use genealogy data to solve crimes? Ann. Intern. Med. 169, 333–334 (2018).

  53. GEDmatch. GEDmatch.Com Terms of Service and Privacy Policy (accessed 11 June 2020); https://www.gedmatch.com/tos.htm

  54. Erlich, Y. et al. Redefining genomic privacy: trust and empowerment. PLoS Biol. 12, e1001983 (2014).

  55. Lauter, K., López-Alt, A. & Naehrig, M. Private computation on encrypted genomic data. In Progress in Cryptology - LATINCRYPT 2014, Vol. 8895, 3–27 (Springer, 2015).

  56. Wang, S. et al. HEALER: homomorphic computation of ExAct Logistic rEgRession for secure rare disease variants analysis in GWAS. Bioinformatics 32, 211–218 (2016).

  57. He, D. et al. Identifying genetic relatives without compromising privacy. Genome Res. 24, 664–672 (2014).

  58. Bohannon, P., Jakobsson, M. & Srikwan, S. Cryptographic approaches to privacy in forensic DNA databases. In Int. Workshop on Public Key Cryptography 373–390 (Springer, 2000).

  59. Sousa, J. S. et al. Efficient and secure outsourcing of genomic data storage. BMC Med. Genomics 10 (Suppl. 2), 46 (2017).

  60. Deuber, D. et al. My genome belongs to me: controlling third party computation on genomic data. Proc. Priv. Enh. Technol. 2019, 108–132 (2019).

  61. Ayday, E., Raisaro, J. L., Hubaux, J.-P. & Rougemont, J. Protecting and evaluating genomic privacy in medical tests and personalized medicine. In Proc. 12th ACM Workshop on Privacy in the Electronic Society 95–106 (ACM, 2013).

  62. Constable, S. D., Tang, Y., Wang, S., Jiang, X. & Chapin, S. Privacy-preserving GWAS analysis on federated genomic datasets. BMC Med. Inform. Decis. Mak. 15 (Suppl. 5), S2 (2015).

  63. Zhang, Y., Dai, W., Jiang, X., Xiong, H. & Wang, S. FORESEE: fully outsourced secure genome study based on homomorphic encryption. BMC Med. Inform. Decis. Mak. 15 (Suppl. 5), S5 (2015).

  64. Chen, F. et al. PRINCESS: privacy-protecting rare disease international network collaboration via encryption through software guard extensions. Bioinformatics 33, 871–878 (2017).

  65. Goodrich, M. T. The mastermind attack on genomic data. In Security and Privacy, 2009 30th IEEE Symposium 204–218 (IEEE, 2009).

  66. Atallah, M. J., Kerschbaum, F. & Du, W. Secure and private sequence comparisons. In Proc. 2003 ACM Workshop on Privacy in the Electronic Society 39–44 (ACM, 2003).

  67. Jha, S., Kruger, L. & Shmatikov, V. Towards practical privacy for genomic computation. In Proc. 2008 IEEE Symposium on Security and Privacy 216–230 (IEEE, 2008).

  68. Bruekers, F., Katzenbeisser, S., Kursawe, K. & Tuyls, P. Privacy-preserving matching of DNA profiles. IACR Cryptol 2008, 203 (2008).

  69. Danezis, G. & De Cristofaro, E. Fast and private genomic testing for disease susceptibility. In Proc. 13th Workshop on Privacy in the Electronic Society 31–34 (ACM, 2014).

  70. Duverle, D. A., Kawasaki, S., Yamada, Y., Sakuma, J. & Tsuda, K. Privacy-preserving statistical analysis by exact logistic regression. In Proc. 2015 IEEE Security and Privacy Workshops 7–16 (IEEE, 2015).

  71. Kamm, L., Bogdanov, D., Laur, S. & Vilo, J. A new way to protect privacy in large-scale genome-wide association studies. Bioinformatics 29, 886–893 (2013).

  72. Cho, H., Wu, D. J. & Berger, B. Secure genome-wide association analysis using multiparty computation. Nat. Biotechnol. 36, 547–551 (2018).

  73. Sweeney, L. k-anonymity: a model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 10, 557–570 (2002).

  74. Malin, B. A. An evaluation of the current state of genomic data privacy protection technology and a roadmap for the future. J. Am. Med. Inform. Assoc 12, 28–34 (2005).

  75. Li, N., Qardaji, W. & Su, D. On sampling, anonymization, and differential privacy or, k-anonymization meets differential privacy. In Proc. 7th ACM Symposium on Information, Computer and Communications Security 32–33 (ACM, 2012).

  76. Malin, B. A. Protecting genomic sequence anonymity with generalization lattices. Methods Inf. Med. 44, 687–692 (2005).

  77. Dwork, C. Differential privacy. Int. Colloq. Autom. Lang. Program 4052, 1–12 (2006).

  78. Simmons, S. & Berger, B. Realizing privacy preserving genome-wide association studies. Bioinformatics 32, 1293–1300 (2016).

  79. Johnson, A. & Shmatikov, V. Privacy-preserving data exploration in genome-wide association studies. In Proc. 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’13 1079 (ACM, 2013).

  80. Yu, F. & Ji, Z. Scalable privacy-preserving data sharing methodology for genome-wide association studies: an application to iDASH healthcare privacy protection challenge. BMC Med. Inform. Decis. Mak. 14 (Suppl. 1), S3 (2014).

  81. Uhler, C., Slavković, A. & Fienberg, S. E. Privacy-preserving data sharing for genome-wide association studies. J. Priv. Confid. 5, 137–166 (2013).

  82. Backes, M., Berrang, P., Humbert, M. & Manoharan, P. Membership privacy in MicroRNA-based studies. In Proc. 2016 ACM SIGSAC Conference on Computer and Communications Security 319–330 (ACM, 2016).

  83. Tramèr, F., Huang, Z., Hubaux, J.-P. & Ayday, E. Differential privacy with bounded priors: reconciling utility and privacy in genome-wide association studies. In Proc. 22nd ACM SIGSAC Conference on Computer and Communications Security 1286–1297 (ACM, 2015).

  84. Raisaro, J. L. et al. Protecting privacy and security of genomic data in I2B2 with homomorphic encryption and differential privacy. IEEE/ACM Trans. Comput. Bioinform 15, 1413–1426 (2018).

  85. Huang, Z., Ayday, E., Fellay, J., Hubaux, J.-P. & Juels, A. GenoGuard: protecting genomic data against brute-force attacks. In 36th IEEE Symposium on Security and Privacy (2015).

  86. Juels, A. & Ristenpart, T. Honey encryption: security beyond the brute-force bound. In Annual International Conference on the Theory and Applications of Cryptographic Techniques 293–310 (Springer, 2014).

  87. Humbert, M., Ayday, E., Hubaux, J.-P. & Telenti, A. Reconciling utility with privacy in genomics. In Proc. 13th Workshop on Privacy in the Electronic Society 11–20 (ACM, 2014).

  88. Allyse, M. A., Robinson, D. H., Ferber, M. J. & Sharp, R. R. Direct-to-consumer testing 2.0: emerging models of direct-to-consumer genetic testing. In Mayo Clinic Proc., Vol. 93, 113–120 (Elsevier, 2018).

  89. Future of Privacy Forum. Privacy best practices for consumer genetic testing services (2018); https://fpf.org/wp-content/uploads/2018/07/Privacy-Best-Practices-for-Consumer-Genetic-Testing-Services-FINAL.pdf

  90. Wee, R., Henaghan, M. & Winship, I. Dynamic consent in the digital age of biology: online initiatives and regulatory considerations. J. Prim. Health Care 5, 341–347 (2013).

  91. Mackey, T. K. et al. ‘Fit-for-purpose?’—challenges and opportunities for applications of blockchain technology in the future of healthcare. BMC Med. 17, 68 (2019).

  92. Maxmen, A. AI researchers embrace Bitcoin technology to share medical data. Nature 555, 293–294 (2018).

  93. Lawler, M. et al. All the world’s a stage: facilitating discovery science and improved cancer care through the global alliance for genomics and health. Cancer Discov 5, 1133–1136 (2015).

  94. Phillips, A. M. ‘Only a click away—DTC genetics for ancestry, health, love…and more: a view of the business and regulatory landscape’. Appl. Transl. Genom 8, 16–22 (2016).

  95. Simmons, S., Sahinalp, C. & Berger, B. Enabling privacy-preserving GWASs in heterogeneous human populations. Cell Syst 3, 54–61 (2016).

  96. Yu, F., Fienberg, S. E., Slavković, A. B. & Uhler, C. Scalable privacy-preserving data sharing methodology for genome-wide association studies. J. Biomed. Inform. 50, 133–141 (2014).

Acknowledgements

This work was supported by National Human Genome Research Institute grant K99HG010493 to L.B., National Institute of General Medical Sciences grant R01GM118609 and National Heart, Lung, and Blood Institute grant R01HL136835 to L.O.-M.

Author information

Contributions

L.B. conducted the literature review, drafted the organization of the article and contributed most of the writing. Y.H. contributed to the sections on data sharing in DTC genetic testing and provided helpful comments on the presentation. L.O.-M. provided the motivation for this work, and provided detailed edits and critical suggestions on the organization and structure of the article.

Corresponding author

Correspondence to Luca Bonomi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Bonomi, L., Huang, Y. & Ohno-Machado, L. Privacy challenges and research opportunities for genomic data sharing. Nat Genet 52, 646–654 (2020). https://doi.org/10.1038/s41588-020-0651-0

  • DOI: https://doi.org/10.1038/s41588-020-0651-0
