Abstract
Drug repurposing is an effective strategy to identify new uses for existing drugs, providing the quickest possible transition from bench to bedside. Real-world data, such as electronic health records and insurance claims, provide information on large cohorts of users for many drugs. Here we present an efficient and easily customized framework for generating and testing multiple candidates for drug repurposing using a retrospective analysis of real-world data. Building upon well-established causal inference and deep learning methods, our framework emulates randomized clinical trials for drugs present in a large-scale medical claims database. We demonstrate our framework on a coronary artery disease cohort of millions of patients. We successfully identify drugs and drug combinations that substantially improve coronary artery disease outcomes but have not been indicated for treating coronary artery disease, paving the way for drug repurposing.
Data availability
The data we use are from MarketScan Commercial Claims and Encounters (CCAE; more than 100 million patients, from 2012 to 2017). Details of the source data structure and a demo of the preprocessed input data are available at the GitHub repository https://github.com/ruoqi-liu/DeepIPW. Access to the MarketScan data analysed in this manuscript is provided by the Ohio State University. The dataset is available from IBM at https://www.ibm.com/products/marketscan-research-databases.
Code availability
The source code for this paper can be downloaded from the GitHub repository at https://github.com/ruoqi-liu/DeepIPW or the Zenodo repository at https://doi.org/10.5281/zenodo.4079391.
Acknowledgements
This work was funded in part by the National Center for Advancing Translational Research of the National Institutes of Health under award number CTSA Grant UL1TR002733. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Author information
Contributions
P.Z. conceived the project. R.L. and P.Z. developed the method. R.L. conducted the experiments. R.L., L.W. and P.Z. analysed the results. R.L., L.W. and P.Z. wrote the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature Machine Intelligence thanks Daniel Merk and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 CAD cohorts characteristics.
a, The distribution of patients’ total time in the database. b, The distribution of patients’ time before and after the CAD initiation date. c, The growth in the number of patients developing outcomes after the CAD initiation date. d, The gender distribution by age at the CAD initiation date.
Extended Data Fig. 2 Performance comparison of LSTM-IPTW and LR-IPTW using drug candidate: diltiazem (with known CAD indication).
The three panels on the top show results obtained from LSTM-IPTW, and the three panels on the bottom show results from LR-IPTW. a,d, The absolute SMD of each covariate in the original data (orange triangles) and in the weighted data (blue circles). b,e, The distribution of estimated propensity scores over the user (orange area) and non-user (blue area) cohorts. c,f, The ROC curves for the propensity model (orange), expected value (green) and weighted propensity (blue).
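The covariate-balance check in panels a and d can be illustrated with a short calculation. The following is a minimal sketch, not the authors' implementation: the function names, the pooled-variance form of the standardized mean difference (SMD) and the toy data are assumptions made for illustration only.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean of values under non-negative weights."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def weighted_variance(values, weights):
    """Weighted (population-style) variance of values under the weights."""
    mean = weighted_mean(values, weights)
    total = sum(weights)
    return sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total


def abs_smd(x_user, x_nonuser, w_user=None, w_nonuser=None):
    """Absolute SMD of one covariate between user and non-user cohorts.

    Without weights this is the SMD in the original data; with IPTW
    weights it is the SMD in the weighted data. The pooled-variance
    denominator is an assumption of this sketch.
    """
    if w_user is None:
        w_user = [1.0] * len(x_user)
    if w_nonuser is None:
        w_nonuser = [1.0] * len(x_nonuser)
    diff = weighted_mean(x_user, w_user) - weighted_mean(x_nonuser, w_nonuser)
    pooled_sd = math.sqrt(
        (weighted_variance(x_user, w_user) + weighted_variance(x_nonuser, w_nonuser)) / 2.0
    )
    if pooled_sd == 0.0:
        return 0.0 if diff == 0.0 else math.inf
    return abs(diff) / pooled_sd


# Toy binary covariate (hypothetical data): 1 = condition present.
x_user = [1, 1, 1, 0]
x_nonuser = [1, 0, 0, 0]
smd_original = abs_smd(x_user, x_nonuser)
# Hypothetical weights chosen so that both weighted means equal 0.5.
smd_weighted = abs_smd(x_user, x_nonuser, [1, 1, 1, 3], [3, 1, 1, 1])
print(smd_original, smd_weighted)
```

In this toy case the weights remove the imbalance entirely; with real data the weighted SMD would typically shrink towards zero rather than reach it.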
Extended Data Fig. 3 Distribution of estimated ATE of drug classes on defined outcomes across the 50 bootstrap samples.
All drug classes shown satisfy two conditions: an adjusted p-value of less than 0.05 and a post unbalanced ratio of less than 2%. Within each boxplot, the central line denotes the median, and the bottom and top edges denote the 25th (Q1) and 75th (Q3) percentiles, respectively. The whiskers extend to 1.5 times the interquartile range.
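How an ATE distribution over bootstrap samples and its boxplot summary might be produced is sketched below. This is an illustrative sketch under stated assumptions, not the authors' code: it uses a normalized inverse-probability-weighted estimator, holds the propensity scores fixed across resamples (a simplification, since a propensity model could instead be refit on every sample), and all names and data are hypothetical.

```python
import random
import statistics


def iptw_ate(treated, outcome, propensity):
    """Normalized IPTW estimate of the average treatment effect.

    treated: 0/1 flags; outcome: numeric outcomes;
    propensity: estimated P(treated = 1 | covariates), strictly in (0, 1).
    """
    num1 = den1 = num0 = den0 = 0.0
    for t, y, p in zip(treated, outcome, propensity):
        if t == 1:
            w = 1.0 / p
            num1 += w * y
            den1 += w
        else:
            w = 1.0 / (1.0 - p)
            num0 += w * y
            den0 += w
    return num1 / den1 - num0 / den0


def bootstrap_ates(treated, outcome, propensity, n_boot=50, seed=0):
    """ATE estimates over n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(treated)
    ates = []
    while len(ates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        t = [treated[i] for i in idx]
        if 0 < sum(t) < n:  # skip resamples that miss one of the two cohorts
            y = [outcome[i] for i in idx]
            p = [propensity[i] for i in idx]
            ates.append(iptw_ate(t, y, p))
    return ates


def box_summary(values):
    """Median, quartiles, and whiskers at the most extreme points within 1.5 IQR."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low = min(v for v in values if v >= q1 - 1.5 * iqr)
    high = max(v for v in values if v <= q3 + 1.5 * iqr)
    return {"q1": q1, "median": median, "q3": q3, "whisker_low": low, "whisker_high": high}


# Hypothetical toy cohort: 8 patients with constant propensity 0.5.
treated = [1, 1, 1, 1, 0, 0, 0, 0]
outcome = [1, 0, 0, 0, 1, 1, 0, 1]
propensity = [0.5] * 8
ates = bootstrap_ates(treated, outcome, propensity, n_boot=50, seed=0)
print(iptw_ate(treated, outcome, propensity), box_summary(ates))
```

With a constant propensity score the estimator reduces to a difference in cohort means, which makes the toy result easy to check by hand.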
Extended Data Fig. 4 The list of significant drug classes.
The drug classes are denoted using ATC code and corresponding names.
Extended Data Fig. 5 The estimated treatment effects for CAD over balanced and statistically significant drug combinations.
The drug combinations are ranked by the estimated ATE values.
Extended Data Fig. 6 Performance comparison of proposed method and three pre-clinical methods evaluated by Precision@K.
The values of K are selected from {6, 9}.
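For readers unfamiliar with the metric, Precision@K is the fraction of the top K ranked candidates that are counted as correct. The sketch below is illustrative only: the drug names are placeholders, and what counts as a "known positive" is an assumption of this example rather than a definition taken from the article.

```python
def precision_at_k(ranked_drugs, known_positives, k):
    """Fraction of the top-k ranked drugs that appear in known_positives."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_drugs[:k]
    hits = sum(1 for drug in top_k if drug in known_positives)
    return hits / k


# Hypothetical ranking of nine candidates and a hypothetical positive set.
ranking = ["drug_a", "drug_b", "drug_c", "drug_d", "drug_e",
           "drug_f", "drug_g", "drug_h", "drug_i"]
positives = {"drug_a", "drug_c", "drug_d", "drug_h"}

for k in (6, 9):
    print(k, precision_at_k(ranking, positives, k))
```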
Extended Data Fig. 7 Additional repurposing candidates retrieved under different threshold settings.
The adjusted p-value threshold is relaxed to 0.15, and the post unbalanced ratio threshold remains the same as in the previous setting (less than 2%).
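The effect of relaxing the adjusted p-value threshold can be sketched as a simple two-condition filter. This is a hedged illustration: the Benjamini-Hochberg procedure is used here as one common way to adjust p-values and is an assumption of this sketch, as are the function names, the candidate names and all numbers.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_candidates(names, pvalues, unbalanced_ratios, alpha, max_unbalanced=0.02):
    """Keep candidates with adjusted p < alpha and unbalanced ratio < max_unbalanced."""
    adjusted = bh_adjust(pvalues)
    return [
        name
        for name, adj_p, ratio in zip(names, adjusted, unbalanced_ratios)
        if adj_p < alpha and ratio < max_unbalanced
    ]


# Hypothetical candidates, raw p-values and post-weighting unbalanced ratios.
names = ["drug_a", "drug_b", "drug_c", "drug_d"]
pvalues = [0.01, 0.04, 0.03, 0.20]
ratios = [0.00, 0.01, 0.05, 0.00]

strict = select_candidates(names, pvalues, ratios, alpha=0.05)
relaxed = select_candidates(names, pvalues, ratios, alpha=0.15)
print(strict, relaxed)
```

In this toy example the relaxed threshold admits one further candidate, while a candidate that fails the balance condition stays excluded at either threshold.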
Extended Data Fig. 8 The definition of user and non-user cohorts.
The index date refers to the first prescription of the trial’s drug (user cohort) or the alternative drug (non-user cohort). The time period before the index date is the baseline period, and the time after the index date is the follow-up period. The patient covariates are collected during the baseline period and the treatment effects are evaluated during the follow-up period.
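The cohort definition above can be made concrete with a small example. This is a minimal sketch under stated assumptions, not the authors' preprocessing code: the record layout, the function names and the choice to place events that fall on the index date itself in the follow-up period are all assumptions of this illustration.

```python
from datetime import date


def find_index_date(prescriptions, drug):
    """Date of the first prescription of `drug`, or None if never prescribed.

    prescriptions: list of (date, drug_name) tuples.
    """
    dates = [d for d, name in prescriptions if name == drug]
    return min(dates) if dates else None


def split_by_index_date(events, index_date):
    """Split (date, code) events into baseline and follow-up periods.

    Events strictly before the index date form the baseline period.
    Events on or after it are placed in follow-up (an assumption here;
    the caption does not say how same-day events are handled).
    """
    baseline = [e for e in events if e[0] < index_date]
    follow_up = [e for e in events if e[0] >= index_date]
    return baseline, follow_up


# Hypothetical patient history.
prescriptions = [
    (date(2014, 3, 1), "drug_x"),
    (date(2013, 6, 15), "drug_y"),
    (date(2014, 9, 1), "drug_x"),
]
events = [
    (date(2013, 1, 10), "diagnosis_1"),
    (date(2014, 2, 20), "diagnosis_2"),
    (date(2015, 5, 5), "outcome_event"),
]

index = find_index_date(prescriptions, "drug_x")
baseline, follow_up = split_by_index_date(events, index)
print(index, len(baseline), len(follow_up))
```

Covariates would then be derived from `baseline` only and outcomes from `follow_up` only, which keeps post-treatment information out of the propensity model.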
Supplementary information
Supplementary Tables 1–6 and Figs. 1 and 2.
About this article
Cite this article
Liu, R., Wei, L. & Zhang, P. A deep learning framework for drug repurposing via emulating clinical trials on real-world patient data. Nat Mach Intell 3, 68–75 (2021). https://doi.org/10.1038/s42256-020-00276-w
DOI: https://doi.org/10.1038/s42256-020-00276-w