Main

Lung-function abnormality predicts mortality and is a diagnostic criterion for chronic obstructive pulmonary disease (COPD)1, which is the most prevalent respiratory disease globally2 and lacks disease-modifying treatments. Although smoking and other environmental risk factors for COPD are well known and genetic susceptibility is recognized, the molecular pathways underlying COPD are incompletely understood. As with other complex traits there has been a lack of ancestral diversity in genome-wide association studies (GWAS)3 of lung function4,5,6. Multi-ancestry studies improve the power and fine-mapping resolution of GWAS and increase the prospects for prediction, prevention, diagnosis and treatment in diverse populations3,4,7.

Understanding of the genes, proteins and pathways involved in disease-related traits underpins modern drug development. A high yield of genetic-association signals, improved signal resolution and integration with functional evidence assist confident identification of causal genes as well as the variants and pathways that impact gene function and regulation. Although datasets and in silico tools to connect GWAS signals to causal genes are improving, the findings from different datasets and tools have lacked consensus8,9, highlighting a need for frameworks to integrate functional evidence types and compare findings10.

Aggregation of lung-function-associated genetic variants into a genetic risk score (GRS) provides a tool for COPD prediction5. When a GRS comprises many variants, partitioning the GRS according to the biological pathways the variants influence could provide a tool to explore their aggregated consequences across different traits through phenome-wide association studies (PheWAS). Just as PheWAS of individual genetic variants predicts the consequences of perturbations of specific protein targets, informing assessment of drug efficacy, drug safety and drug repurposing11, PheWAS of pathway-partitioned GRS could inform the understanding of the consequences of perturbations of specific pathways.

Through the largest global assembly of lung-function genomics studies to date we: (1) undertook a multi-ancestry GWAS meta-analysis of lung-function traits in 588,452 individuals to detect novel signals, improve fine mapping and estimate heterogeneity in allelic effects attributable to ancestry; (2) tested whether lung-function signals are age- or smoking-dependent, and assessed their relationship to height; (3) investigated cell-type and functional specificity of lung-function association signals; (4) fine-mapped signals through annotation-informed credible sets, integrating functional data such as respiratory cell-specific chromatin accessibility signatures; (5) applied a consensus-based framework to systematically investigate and identify putative causal genes, integrating eight locus-based or similarity-based criteria; (6) developed and applied a GRS for the ratio of forced expiratory volume in 1 s (FEV1) to forced vital capacity (FVC) in different ancestries in the UK Biobank and COPD case–control studies; and (7) applied PheWAS to individual variants, GRS for each lung-function trait and GRS partitioned by pathway. Through these approaches, we aimed to detect novel lung-function signals and putative causal genes as well as provide new insights into the mechanistic pathways underlying lung function, some of which may be amenable to drug therapy.

Results

We undertook genome-wide association analyses of FEV1, FVC, FEV1/FVC and peak expiratory flow rate (PEF) from 49 cohorts (Methods and Supplementary Tables 1,2). Our study of up to 588,452 participants comprised individuals of African (AFR; n = 8,590), American/Hispanic (AMR; n = 14,668), East Asian (EAS; n = 85,279), South Asian (SAS; n = 4,270) and European ancestry (EUR; n = 475,645; Supplementary Fig. 1a,b). In cohort-specific analyses we adjusted for age, age squared, sex and height, accounting for population structure and relatedness (Methods and Supplementary Tables 2–4), and then applied genomic control using the linkage disequilibrium (LD) score regression intercept12. After filtering and meta-analysis across multi-ancestry cohorts, 66.8 million variants were available in each of four lung-function traits, with genomic inflation factors λ of 1.025, 1.022, 0.984 and 0.996 for FEV1, FVC, FEV1/FVC and PEF, respectively (Supplementary Figs. 2,3 and Supplementary Table 5).
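As an illustration of the genomic-control step, the following Python sketch rescales standard errors by the square root of an LD score regression intercept and computes a genomic inflation factor λ as the median observed χ² statistic divided by the median of a 1-d.f. χ² distribution. It is a minimal sketch, not the pipeline used in the study: the function names, the choice to leave standard errors unchanged when the intercept is at or below 1 and the toy numbers are all assumptions.

```python
import math
from statistics import median

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 d.f.

def genomic_control(ses, ldsc_intercept):
    """Inflate standard errors by sqrt(intercept); no change if intercept <= 1 (assumed)."""
    factor = math.sqrt(max(ldsc_intercept, 1.0))
    return [se * factor for se in ses]

def inflation_factor(betas, ses):
    """Lambda: median observed chi-square over its expected median under the null."""
    chi2 = [(b / se) ** 2 for b, se in zip(betas, ses)]
    return median(chi2) / CHI2_1DF_MEDIAN

# Toy values, not study data.
betas = [0.10, -0.02, 0.03]
ses = [0.05, 0.04, 0.03]
adjusted_ses = genomic_control(ses, ldsc_intercept=1.21)
lam = inflation_factor(betas, adjusted_ses)
```

Values of λ close to 1, such as those reported above, suggest little residual inflation of test statistics after correction.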

1,020 signals for lung function

After excluding eight signals associated with smoking behavior (Supplementary Table 26) and combining signals that co-localized across traits, we identified 1,020 distinct signals for lung function using a stringent threshold of P < 5 × 10−9 (ref. 13; Fig. 1a). Of these, 713 are novel with respect to the signals and studies described in the Supplementary Note (Supplementary Table 6). These 1,020 signals explain 33.0% of FEV1/FVC heritability (21.3% for FEV1, 17.3% for FVC and 21.4% for PEF; Methods).

Fig. 1: Study overview.

a, Discovery meta-analysis. *For signals present in more than one trait, the signal is only counted once (for the most significant trait). b, Pathway analyses, GRS analyses and PheWAS studies.

To facilitate fine mapping, we included larger, more diverse populations than previous lung-function GWAS. We performed multi-ancestry meta-regression with MR-MEGA7, which incorporates axes of genetic ancestry as covariates to model heterogeneity (Methods). We then incorporated functional annotation for chromatin accessibility and transcription-factor binding sites in respiratory-relevant cells and tissues, and enriched genomic annotations14 to weight prior causal probabilities of association for putative causal variants (Methods). Overall reductions in credible set size and higher maximum posterior probabilities of association for the most likely causal variants were evident after multi-ancestry meta-regression and after functional annotations were incorporated (Supplementary Fig. 4). Following fine mapping, 438 (43%) signals had a single putative causal variant (posterior probability > 50%) and the median credible set size was nine variants (Supplementary Note).
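The idea of weighting prior causal probabilities by annotation and then forming a credible set can be sketched in a few lines of Python. This is a hedged illustration only: the 95% coverage level, the function name `credible_set`, and the Bayes factors and priors in the toy example are assumptions, and the study's actual fine-mapping procedure is described in the Methods and Supplementary Note.

```python
def credible_set(bayes_factors, priors, coverage=0.95):
    """Return (indices, posteriors) for the smallest variant set reaching `coverage`."""
    weighted = [bf * p for bf, p in zip(bayes_factors, priors)]
    total = sum(weighted)
    posteriors = [w / total for w in weighted]
    # Add variants in order of decreasing posterior probability.
    order = sorted(range(len(posteriors)), key=lambda i: posteriors[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += posteriors[i]
        if cumulative >= coverage:
            break
    return chosen, posteriors

# Toy signal with four variants (hypothetical Bayes factors).
bfs = [100.0, 50.0, 10.0, 1.0]
flat_set, flat_post = credible_set(bfs, [0.25, 0.25, 0.25, 0.25])
# Up-weight the first variant, e.g. because it lies in open chromatin (assumed prior).
annot_set, annot_post = credible_set(bfs, [0.7, 0.1, 0.1, 0.1])
```

In this toy example the annotation-informed prior shrinks the credible set from three variants to two and raises the top posterior probability, which mirrors the direction of the changes described above.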

Of the 960 sentinels represented in ≥7 cohorts, 109 signals showed heterogeneity attributable to ancestry (PHet < 0.05; Supplementary Fig. 5 and Supplementary Table 7), which was more than expected (binomial test, P = 3.93 × 10−15). Among these, five signals (rs9393688, rs28574670 (LTBP4), rs7183859 (THSD4), rs59985551 (EFEMP1) and rs78101726 (MECOM)) showed significant ancestry-correlated heterogeneity (Bonferroni correction for 960 signals tested, PHet < 5.21 × 10−5; Supplementary Fig. 6a–e). The intronic variant rs7183859 in THSD4, which we previously implicated in lung function15, showed larger effect-size estimates in non-EUR ancestries and in particular AFR ancestries (PHet = 3.33 × 10−5; Supplementary Fig. 6c).
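The two thresholds used here follow from simple arithmetic, sketched below in Python. The Bonferroni threshold is 0.05 divided by the number of signals tested, and the excess of nominally significant signals can be assessed with a binomial upper tail. The function names are hypothetical, and treating the binomial test as one-sided against a 5% null rate is an assumption, not something stated in the text.

```python
from math import exp, lgamma, log

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for stability."""
    def log_pmf(i):
        return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                + i * log(p) + (n - i) * log(1 - p))
    return sum(exp(log_pmf(i)) for i in range(k, n + 1))

threshold = bonferroni_threshold(0.05, 960)      # about 5.21e-5, as quoted in the text
excess_p = binomial_upper_tail(109, 960, 0.05)   # chance of >= 109 nominal hits out of 960
```

Under the null, about 48 of 960 signals would be expected to reach PHet < 0.05, so observing 109 yields a very small tail probability.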

We examined associations of lung-function-associated SNPs in children’s cohorts (Supplementary Table 8) and tested for differences in the estimated effect sizes of lung-function-associated SNPs between children and adults as well as between ever-smokers and never-smokers in EUR individuals (Methods). The effect-size estimates between children and adults were correlated (r from 0.51 for FEV1/FVC to 0.79 for FEV1; Supplementary Fig. 7), although 113 signals showed nominal evidence (P < 0.05) of age-dependent effects (more than expected, binomial P = 2.56 × 10−13). Three signals (rs7977418 (CCDC91), rs34712979 (NPNT) and rs931794 (HYKK)) showed age-dependent effects (Bonferroni-corrected P < 4.64 × 10−5; Supplementary Table 9). We observed nominal evidence (P < 0.05) of smoking-dependent effects for 69 of 1,020 signals (Supplementary Fig. 8), more than expected (binomial P = 0.0079). The intronic SNP rs7733410 in HTR4, a signal we previously reported for lung function15, showed a 76.2% larger effect on FEV1 in ever-smokers compared with never-smokers (P = 4.09 × 10−5; Supplementary Table 10). As height is a determinant of lung growth, we compared height and lung-function associations, and tested the impact of additional height adjustments for sentinel SNPs. We found no correlation between estimated effect sizes for height and lung function (Supplementary Fig. 9), and the addition of height squared and height cubed covariates had little impact on effect-size estimates (Supplementary Fig. 10).

Cell-type and functional specificity

We assessed whether our association signals were enriched for regulatory or functional features in specific cell types. Using stratified LD-score regression16, we found enrichment of all histone marks tested (H3K27ac, H3K9ac, H3K4me3 and H3K4me1) in lung- and smooth-muscle-containing cell lines (Supplementary Table 11). Using GARFIELD17, we assessed enrichment of our signals for DNase I hypersensitivity sites and chromatin accessibility peaks, showing enrichment in a wide variety of cell types, including higher enrichment in both fetal and adult lung and blood for FEV1, FEV1/FVC and PEF as well as fibroblast enrichment for FVC (Supplementary Fig. 11a). Our signals were enriched for transcription-factor footprints in fetal lung for FEV1, FEV1/FVC and PEF, for footprints in skin for FVC and also in blood for PEF (Supplementary Fig. 11b). Genic annotation enrichment patterns were similar across all traits, with enrichment mainly in exonic, 3′ UTR and 5′ UTR regions (Supplementary Fig. 11c). For all traits, we saw enrichment for transcription start sites, weak enhancers, enhancers and promoter flanks, with cell types for weak enhancer enrichment including endothelial cells for FEV1, FEV1/FVC and PEF (Supplementary Fig. 11d). For transcription-factor binding sites, we observed a similar enrichment pattern across all of the lung-function traits, with the largest fold enrichment observed for endothelial cells (Supplementary Fig. 11e). Our signals were enriched for assay for transposase-accessible chromatin using sequencing (ATAC–seq) peaks (Supplementary Note) in matrix fibroblast 1 for FVC, matrix fibroblast 2 for FEV1, myofibroblast for FEV1, FEV1/FVC and PEF, and alveolar type 1 cells in FEV1/FVC; furthermore, genic annotations showed enrichment of exon variants for FEV1, FEV1/FVC and 3′ UTR variants for FEV1 and FVC. We also found enrichment of transcription-factor binding sites in lung across all phenotypes and in bronchus for FEV1/FVC (Supplementary Table 12).

Identification of putative causal genes and variants

To identify putative causal genes, we systematically integrated orthogonal evidence using eight locus- or similarity-based criteria (Supplementary Note): (1) the nearest gene to the sentinel SNP, (2) co-localization of the GWAS signal and expression quantitative trait loci (eQTL) or (3) protein quantitative trait loci (pQTL) signals in relevant tissues (Methods), (4) rare variant association in whole-exome sequencing in the UK Biobank, (5) proximity to a gene for a Mendelian disease with a respiratory phenotype (±500 kb), (6) proximity to a human ortholog of a mouse-knockout gene with a respiratory phenotype (±500 kb), (7) an annotation-informed credible set14 containing a missense/deleterious/damaging variant with a posterior probability of association >50% and (8) the gene with the highest polygenic priority score (PoPS)9. We identified 559 putative causal genes satisfying at least two criteria, of which 135 were supported by at least three criteria (Figs. 1b, 2 and Supplementary Fig. 12). Among the 20 genes supported by ≥4 criteria (Supplementary Table 13), six previously implicated genes (TGFB2, NPNT, LTBP4, TNS1, SMAD3 and AP3B1)5,15,18,19,20 were supported by additional criteria compared with the original reports. Fourteen of the 20 genes supported by ≥4 criteria have not been previously confidently implicated in lung function (CYTL1, HMCN1, GATA5, ADAMTS10, IGHMBP2, SCMH1, GLI3, ABCA3, STIM1, CFH, FGFR1, LRBA, CLDN18 and IGF2BP2). These are involved in smooth-muscle function (FGFR1, GATA5 and STIM1), tissue organization (ADAMTS10), alveolar and epithelial function (ABCA3 and CLDN18), and inflammation and immune response to infection (CFH, CYTL1, HMCN1, LRBA and STIM1).
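The consensus step amounts to counting, for each gene, how many distinct criteria implicate it and keeping genes that meet a minimum count. A minimal Python sketch follows; the gene names, criterion labels and function names are hypothetical placeholders, and the real framework (Supplementary Note) involves evidence-specific processing that is not shown here.

```python
from collections import defaultdict

def count_criteria(evidence):
    """evidence: iterable of (gene, criterion) pairs; returns gene -> set of criteria."""
    support = defaultdict(set)
    for gene, criterion in evidence:
        support[gene].add(criterion)  # repeated hits on one criterion count once
    return support

def prioritize(evidence, min_criteria=2):
    """Genes supported by at least `min_criteria` distinct criteria, best supported first."""
    support = count_criteria(evidence)
    kept = [(gene, len(crit)) for gene, crit in support.items() if len(crit) >= min_criteria]
    return sorted(kept, key=lambda item: (-item[1], item[0]))

# Toy evidence table with placeholder genes and criterion labels.
evidence = [
    ("GENE_A", "nearest_gene"), ("GENE_A", "eQTL"), ("GENE_A", "PoPS"),
    ("GENE_B", "nearest_gene"), ("GENE_B", "nearest_gene"),
    ("GENE_C", "pQTL"), ("GENE_C", "mouse_knockout"),
]
ranked = prioritize(evidence, min_criteria=2)
```

Raising `min_criteria` to 3 or 4 would correspond to the more stringent gene tiers discussed above.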

Fig. 2: 135 genes prioritized with ≥3 variant-to-gene criteria.

The number of variant-to-gene criteria implicating the gene is in brackets after the gene name. The gray in the first eight columns indicates that at least one variant implicates the gene as causal via the evidence for that column. The last four columns indicate the level of association of the most significant variant implicating the gene as causal with respect to the FEV1/FVC decreasing allele; red indicates that this association is in the same direction of effect as the FEV1/FVC decreasing allele and blue indicates the opposite direction with the shade indicating P < the corresponding value in the legend.

To supplement our understanding of the biological pathways and clinical phenotypes influenced by lung-function-associated variants, we undertook PheWAS of selected individual variants. We selected 27 putative causal genes implicated by ≥4 criteria (20 genes) or by a single putative causal missense variant that was deleterious (five genes: ACAN, ADGRG6, SCARF2, CACNA1S and HIST1H2BE) or rare (two genes: SOS2 and ADRB2; Supplementary Table 14). We interpreted the PheWAS findings (shown in full in Supplementary Fig. 13 and Supplementary Table 15) alongside literature reviews (Supplementary Table 16) and highlight examples below.

The putative causal deleterious missense ABCA3 rs149989682 (A allele; frequency of 0.6%) variant associated with reduced FEV1/FVC was reported to cause pediatric interstitial lung disease21. ABCA3, which is expressed in alveolar type II cells and localized to lamellar bodies, is involved in surfactant-phospholipid metabolism, and ABCA3 mutations cause severe neonatal surfactant deficiency22. The putative causal deleterious missense GATA5 rs200383755 (C allele, frequency of 0.6%) variant associated with lower FEV1 was associated with increased asthma risk, higher blood pressure and reduced risk of benign prostatic hyperplasia (Supplementary Fig. 13i). GATA5 associations have not been previously noted in asthma GWAS, although Gata5-deficient mice show airway hyperresponsiveness23. GATA5 encodes a transcription factor expressed in bronchial smooth muscle, bladder and prostate; a previous benign prostatic hyperplasia GWAS reported a GATA5 signal23,24. CLDN18 was implicated by four criteria, including a mouse knockout with abnormal pulmonary alveolar epithelium morphology25. Through calcium-independent cell adhesion, CLDN18 influences epithelial-barrier function through tight-junction-specific obliteration of the intercellular space26. Its splice variant, CLDN18.1, is predominantly expressed in the lung27. Reduced CLDN18 expression was reported in asthma26. However, our PheWAS showed no association with asthma susceptibility or other traits (CLDN18 rs182770 in Supplementary Table 15). LRBA was also implicated by four criteria. Mutations resulting in LRBA deficiency cause common variable immunodeficiency-8 with autoimmunity, which can include coughing, respiratory infections, bronchiectasis and interstitial lung disease28,29. The putative causal LRBA tolerated missense variant rs2290846 (posterior probability of 56.3%) was associated with 31 traits (false discovery rate (FDR) < 1%; Supplementary Fig. 13n and Supplementary Table 15); the G allele, associated with lower FVC and lower FEV1, was associated with lower neutrophils as well as lower risk of cholelithiasis, cholecystitis30 and diverticular disease.

FGFR1, encoding fibroblast growth factor receptor 1, has roles in lung development and regeneration31. Loss-of-function FGFR1 mutations cause hypogonadotropic hypogonadism32. The T allele of rs881299, associated with lower FEV1/FVC and higher FVC, was strongly associated with higher testosterone (particularly in males) and higher sex-hormone-binding globulin (SHBG), lower body-mass index (BMI) as well as lower levels of alanine transaminase and urate (Supplementary Fig. 13w–y and Supplementary Table 15). The missense SOS2 variant rs72681869 also showed association with SHBG; in both sexes, the G allele, associated with lower FVC and lower FEV1, was associated with lower SHBG, higher alanine aminotransferase (ALT) and aspartate aminotransferase (AST), higher fat mass, HbA1c and higher systolic and diastolic blood pressure, higher urate and creatinine, and in males lower testosterone and reduced inguinal hernia risk (Supplementary Fig. 13z–bb). Mutations in SOS2 have been reported in individuals with Noonan syndrome. The A allele of rs7514261 implicating CFH, associated with lower FVC, was strongly associated with reduced risk of macular degeneration33 as well as raised albumin (Supplementary Fig. 13g).

CACNA1S is one of several putative causal genes encoding calcium voltage-gated channel subunits in skeletal muscle (CACNA1S, CACNA1D and CACNA2D3 supported by ≥2 criteria; CACNA1C was supported by PoPS). CACNA1S mutations have been reported in hypokalemic periodic paralysis34 and malignant hyperthermia35. CACNA1S is strongly expressed in skeletal muscle but at much lower levels in airway smooth muscle. The common CACNA1S missense variant rs3850625 (A allele, frequency of 11.8% in EUR and 21.4% in SAS) was associated with lower FVC, lower FEV1, lower whole body fat-free mass, reduced hand grip strength as well as lower AST and creatinine levels (Supplementary Fig. 13f). CACNA1S and CACNA1D are targeted by dihydropyridine calcium channel blockers, which previously produced small improvements in lung function in asthma36. For the low-frequency missense ADRB2 variant rs1800888 (T; 1.49% in EUR), associated with lower FEV1 and lower FEV1/FVC, the strongest PheWAS association was with increased eosinophil count (Supplementary Fig. 13d).

Druggable targets

Using the Drug Gene Interaction Database, we surveyed 559 genes supported by ≥2 criteria. ChEMBL interactions identified 292 drugs mapping to 55 genes (Supplementary Table 17), including ITGA2, which encodes integrin subunit alpha 2. The reduced expression of ITGA2 in lung tissue associated with the C allele of rs12522114 mimics vatelizumab-induced ITGA2 inhibition; this allele is associated with higher FEV1 and FEV1/FVC, indicating the potential to repurpose vatelizumab, which increases T regulatory cell populations37, for COPD treatment.

Pathway analysis

Using ConsensusPathDB38, we tested biological pathway enrichment for 559 causal genes supported by ≥2 criteria, highlighting pathways relevant for development, tissue integrity and remodeling (Supplementary Table 18). These include pathways not previously implicated in pathway enrichment analyses for lung function—such as PI3K–Akt signaling, integrin pathways, endochondral ossification, calcium signaling, hypertrophic cardiomyopathy and dilated cardiomyopathy—as well as those previously implicated via individual genes5 such as TNF signaling, actin cytoskeleton, AGE–RAGE signaling, Hedgehog signaling and cancers. We found strengthened enrichment through newly identified genes in previously described pathways, such as extracellular matrix organization (34 new genes), elastic fiber formation (eight new genes) and TGF–Core (four new genes). Consistent with our ConsensusPathDB findings, Ingenuity Pathway Analysis (https://digitalinsights.qiagen.com/IPA)39 highlighted enrichment of cardiac hypertrophy signaling and osteoarthritis pathways and also implicated pulmonary and hepatic fibrosis signaling pathways, axonal guidance and PTEN signaling as well as the upstream regulators TGFB1 and IGF-1 (Supplementary Table 19).

Multi-ancestry GRS for FEV1/FVC and COPD

We built multi-ancestry and ancestry-specific GRSs weighted by FEV1/FVC effect sizes and tested association with FEV1/FVC and COPD (GOLD stage 2–4) within groups of individuals of different ancestries in the UK Biobank (Methods). Our new GRS improved lung-function and COPD prediction compared with a previous GRS based only on individuals of EUR ancestry5 (Fig. 3a,b and Supplementary Table 20), and the multi-ancestry GRS outperformed the ancestry-specific GRS in all UK Biobank ancestries. We then tested the multi-ancestry GRS in five independent COPD case–control studies (Supplementary Table 21 and Methods). Stronger COPD susceptibility associations were observed across five EUR-ancestry studies compared with a previous GRS5 (Fig. 3c and Supplementary Table 22). In the meta-analysis of these EUR studies, the odds ratio for COPD per s.d. of GRS increase was 1.63 (95% confidence interval (CI), 1.56–1.71; P = 7.1 × 10−93); members of the highest GRS decile had a 5.16-fold higher COPD risk than the lowest decile (95% CI, 4.14–6.42; P = 1.0 × 10−48; Fig. 3d and Supplementary Table 23). The results for individuals in the SPIROMICS study of AFR ancestry were comparable to individuals from the UK Biobank with AFR ancestry but lower in magnitude compared with the COPDGene AFR population (Fig. 3c).
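A weighted GRS of this kind is, at its core, a sum of risk-allele dosages multiplied by per-variant effect sizes, which is then standardized so that associations can be reported per s.d. The Python sketch below illustrates that calculation together with an unadjusted odds ratio comparing two GRS groups. It is a simplified illustration under stated assumptions: the function names and toy values are hypothetical, and the study's estimates come from covariate-adjusted logistic regression, which this sketch does not reproduce.

```python
from statistics import mean, pstdev

def genetic_risk_score(dosages, weights):
    """dosages: per-person lists of risk-allele counts (0-2); weights: per-variant effects."""
    return [sum(w * d for w, d in zip(weights, person)) for person in dosages]

def standardize(scores):
    """Rescale scores to mean 0 and s.d. 1 so effects can be expressed per s.d. of GRS."""
    mu, sd = mean(scores), pstdev(scores)
    return [(s - mu) / sd for s in scores]

def odds_ratio(cases_high, controls_high, cases_low, controls_low):
    """Unadjusted odds ratio comparing a high-GRS group with a low-GRS reference group."""
    return (cases_high / controls_high) / (cases_low / controls_low)

# Toy data: three people, three variants (hypothetical weights).
weights = [0.1, 0.2, 0.3]
dosages = [[0, 1, 2], [2, 2, 0], [1, 0, 1]]
scores = genetic_risk_score(dosages, weights)
z_scores = standardize(scores)
toy_or = odds_ratio(cases_high=30, controls_high=70, cases_low=10, controls_low=90)
```

In practice the weights would be the FEV1/FVC effect sizes and the comparison groups would be GRS deciles, as in the decile analysis reported above.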

Fig. 3: GRS performance.

a, Prediction performance of three GRSs across ancestry groups for FEV1/FVC shown as the s.d. change in FEV1/FVC per s.d. increase in GRS for individuals in the UK Biobank grouped according to ancestry. Sample sizes: AFR, n = 4,227; AMR, n = 2,798; EAS, n = 1,564; and EUR, n = 320,656. b, Prediction performance of three GRSs for COPD shown as COPD odds ratio per s.d. increase in GRS. Sample sizes: AFR, 250 cases and 3,977 controls; AMR, 151 cases and 2,647 controls; EUR, 24,062 cases and 296,594 controls. UKB, UK Biobank. c, Odds ratio for COPD per s.d. change in GRS in COPD case–control studies. P values were calculated from a logistic regression adjusted for age, age squared, sex, height and principal components, followed by fixed-effect meta-analysis. d, Decile analysis meta-analyzed across five EUR studies shown as the COPD odds ratio compared between members of each decile and the reference decile. n = 11,074 (4,328 cases and 6,746 controls). Statistical tests were two-sided; the height of each bar shows the point estimate of the effect and the whiskers show the 95% CI. OR, odds ratio.

PheWAS of trait-specific GRSs

To study the aggregate effects of lung-function-associated genetic variants on a wide range of diseases and disease-relevant traits, we created GRSs for FEV1, FVC, FEV1/FVC and PEF, each comprising sentinel variants (P < 5 × 10−9) with weights estimated from the multi-ancestry meta-regression (Methods), and tested these in PheWAS. These GRS values showed distinct patterns of associations with respiratory and non-respiratory phenotypes (Supplementary Fig. 14 and Supplementary Table 24). A GRS for lower FEV1 was most strongly associated with increased risk of asthma and COPD, family history of chronic bronchitis/emphysema, lower hand grip strength, increased fat mass, increased HbA1c and type 2 diabetes risk, and elevated C-reactive protein. In addition, associations were observed with increased asthma exacerbations and lower age of onset for COPD (Supplementary Fig. 14a). The GRS for lower FEV1/FVC was associated with key respiratory phenotypes: increased risk of COPD and asthma, family history of chronic bronchitis/emphysema, increased emphysema risk, increased risk of respiratory insufficiency or respiratory failure and younger age of onset for COPD but a slightly lower risk of COPD exacerbations (Supplementary Fig. 14b). In contrast, the GRS for lower FVC was strongly associated with many traits—among the strongest associations were high C-reactive protein, increased fat mass, raised HbA1c and type 2 diabetes, raised systolic blood pressure, lower hand grip strength and raised ALT as well as increased risk of clinical codes for asthma and COPD (Supplementary Fig. 14c). Although the GRS for lower FEV1/FVC was associated with increased standing and sitting height, the GRSs for lower FEV1 and FVC were associated with increased standing height but reduced sitting height. Broadly similar phenome-wide associations were seen for the PEF and the FEV1 GRS (Supplementary Fig. 14d).

PheWAS of GRSs partitioned by pathway

Finally, we hypothesized that partitioning our lung-function GRS into pathway-specific GRSs according to the biological pathways the variants influence could inform understanding of mechanisms underlying impaired lung function, and the probable consequences of perturbing specific pathways. Informed by the above prioritization of putative causal genes and classification of these genes by pathway (‘Pathway analysis’ section), we undertook PheWAS for FEV1/FVC-weighted GRSs partitioned by each of the 29 pathways enriched (FDR < 10−5) for the 559 genes implicated by ≥2 criteria (Methods). Partitioning of GRSs in this way highlighted markedly different patterns of phenome-wide associations (Supplementary Fig. 15 and Supplementary Table 25). Figures 4–7 highlight four pathway-specific GRS examples; all demonstrated association with COPD clinical codes and family history of chronic bronchitis/emphysema, although the associations with other traits varied. The GRS for lower FEV1/FVC specific to elastic fiber formation was associated with increased risk of inguinal, abdominal, diaphragmatic and femoral hernia; diverticulosis; arthropathies; hallux valgus as well as genital prolapse; reduced carpal tunnel syndrome risk and BMI; and increased asthma risk (Fig. 4). In contrast, the GRS for lower FEV1/FVC specific to PI3K–Akt signaling was associated with increased asthma risk, lower IGF-1, lower liver enzymes (ALT, AST and gamma glutamyltransferase (GGT)), lower lymphocyte counts, raised eosinophils, lower fat mass and BMI, and reduced diabetes risk (Fig. 5). The GRS for lower FEV1/FVC specific to the hypertrophic cardiomyopathy pathway was associated with reduced liver enzymes (ALT and GGT) as well as lower apolipoprotein B, LDL, IGF-1 and mean platelet volume (Fig. 6). The GRS associations for lower FEV1/FVC partitioned to signal transduction were specific to respiratory traits, including asthma and emphysema (Fig. 7). Variable height associations were evident: the GRS for lower FEV1/FVC showed association with increased height when partitioned to elastic fiber formation or hypertrophic cardiomyopathy (Figs. 4 and 6), reduced height when partitioned to ESC pluripotency (Supplementary Fig. 15g) and no height association when partitioned to PI3K–Akt signaling or signal transduction (Figs. 5 and 7).
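Partitioning a GRS by pathway can be pictured as restricting the weighted sum to variants whose assigned causal gene belongs to a given pathway. The following Python sketch shows one way this might be organized; the variant, gene and pathway identifiers are placeholders, the function names are hypothetical, and allowing a gene to contribute to more than one pathway is an assumption made for illustration.

```python
def partition_weights(variant_gene, gene_pathways, weights):
    """Return pathway -> {variant: weight} for variants whose gene is in that pathway."""
    partitions = {}
    for variant, gene in variant_gene.items():
        for pathway in gene_pathways.get(gene, ()):
            partitions.setdefault(pathway, {})[variant] = weights[variant]
    return partitions

def pathway_grs(dosage_by_variant, partition):
    """Weighted sum of risk-allele dosages over the variants in one pathway partition."""
    return sum(w * dosage_by_variant.get(v, 0.0) for v, w in partition.items())

# Toy mappings with placeholder identifiers.
variant_gene = {"var1": "GENE_A", "var2": "GENE_B", "var3": "GENE_A"}
gene_pathways = {"GENE_A": ["pathway_X"], "GENE_B": ["pathway_X", "pathway_Y"]}
weights = {"var1": 0.1, "var2": 0.2, "var3": 0.3}

partitions = partition_weights(variant_gene, gene_pathways, weights)
person = {"var1": 2, "var2": 1, "var3": 0}  # one individual's risk-allele dosages
score_x = pathway_grs(person, partitions["pathway_X"])
score_y = pathway_grs(person, partitions["pathway_Y"])
```

Each pathway-specific score can then be carried into a PheWAS in the same way as the overall GRS, which is what allows the contrasting association patterns described above to be compared.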

Fig. 4: PheWAS for FEV1/FVC-weighted GRS partitioned according to elastic fiber formation.

Reactome pathway database. CP, composite phenotype; DFP, Data-Field ID phenotype (Methods). The peach-colored line means FDR 1%.

Fig. 5: PheWAS for FEV1/FVC-weighted GRS partitioned according to the PI3K–Akt signaling pathway in Homo sapiens.

Kyoto Encyclopedia of Genes and Genomes. CP, composite phenotype; DFP, Data-Field ID phenotype (Methods). The peach-colored line means FDR 1%.

Fig. 6: PheWAS for FEV1/FVC-weighted GRS partitioned according to hypertrophic cardiomyopathy in H. sapiens.

Kyoto Encyclopedia of Genes and Genomes. CP, composite phenotype; DFP, Data-Field ID phenotype (Methods). The peach-colored line means FDR 1%.

Fig. 7: PheWAS for FEV1/FVC-weighted GRS partitioned according to signal transduction.

Reactome pathway database. CP, composite phenotype (Methods). The peach-colored line means FDR 1%.

We hypothesized that individuals may have high GRS for ≥1 pathways and low GRS for other pathways. Comparisons of the GRSs of individuals across pairs of pathways for each of the 29 pathways (Supplementary Fig. 16a) and in detail for the elastic fiber, PI3K–Akt signaling, hypertrophic cardiomyopathy and signal transduction pathways (Supplementary Fig. 16b) demonstrated how GRS profiles may be concordant or discordant across pathways, which could have implications for the choice of therapy.

Discussion

We present a large ancestrally diverse lung-function GWAS and a comprehensive initiative to relate lung-function- and COPD-associated variants to functional annotations, cell types, genes and pathways. It is the first to investigate possible consequences of intervening in relevant pathways through PheWAS studies, utilizing pathway-partitioned GRS.

The 1,020 signals identified were enriched in functionally active regions in alveolar type 1 cells, fibroblasts, myofibroblasts, bronchial epithelial cells, and adult and fetal lung. We showed effect heterogeneity attributable to ancestry for 109 signals (including LTBP4, THSD4, EFEMP1 and MECOM), between ever-smokers and never-smokers (HTR4), and differences in effects between adults and children (including CCDC91 and NPNT). We mapped lung-function signals to 559 putatively causal genes meeting ≥2 independent criteria. Exemplar genes supported by ≥4 criteria or by deleterious or rare putative causal missense variants implicated surfactant-phospholipid metabolism, smooth-muscle function, epithelial morphology and barrier function, innate immunity, calcium signaling, adrenoceptor signaling, and lung development and regeneration. Among the pathways enriched for putative causal genes were PI3K–Akt signaling, integrin pathways, endochondral ossification, calcium signaling, hypertrophic cardiomyopathy and dilated cardiomyopathy. These pathways have not been previously implicated in lung function using GWAS.

Combined as a GRS weighted by FEV1/FVC effect size, the 1,020 variants strongly predicted COPD in the UK Biobank and in COPD case–control studies, with a more than fivefold change in risk between the highest and lowest GRS deciles. This GRS more strongly predicted FEV1/FVC and COPD across all ancestries than a previous GRS5. Partitioning the FEV1/FVC GRS by the pathways defined by specific variants, informed by detailed, systematic variant-to-gene mapping and pathway analyses, and using our new Deep-PheWAS platform40, illustrated unique patterns of phenotype associations for each pathway GRS. These patterns of PheWAS findings are relevant to the potential efficacy and side effects of intervention in these pathways. As a proof-of-concept, the GRS associated with lower FEV1/FVC specific to PI3K–Akt signaling was associated with increased risk of COPD but a lower risk of diabetes; PI3K inhibition impairs glucose uptake in muscle and increases hepatic gluconeogenesis, contributing to glucose intolerance and diabetes41. The PheWAS and druggability analyses we conducted have the potential to identify drug repurposing opportunities for COPD.

The patterns of pleiotropy we show through PheWAS for individual variants, trait-specific GRS and pathway-partitioned GRS may help explain variants and pathways that increase susceptibility to more than one disease and thereby predispose to particular patterns of multimorbidity. For example, the elastic fiber pathway GRS was associated with increased risk of muscular (for example, hernias) and musculoskeletal conditions related to connective-tissue laxity. Our findings also further inform the complex relationship between height, BMI and obesity, and lung function and their genetic determinants5,42. Lung-function and height associations were uncorrelated, and height relationships differed between GRS for different lung-function traits, and even between sitting and standing height for the same trait. The pathway-partitioned GRS analyses indicate that the relationship between genetic variants, height and lung-function traits depends on the pathways through which the variants act.

The last comprehensive attempt to map lung-function-associated variants to genes identified 107 putative causal genes, mostly through eQTLs only, and only eight genes were then implicated by ≥2 criteria5. In contrast, we implicated 559 causal genes meeting ≥2 criteria by drawing on new data and methodologies, such as single-cell epigenome data, rare variant associations identified in sequencing data in the UK Biobank and the similarity-based approach PoPS9. Nevertheless, our study has limitations. We focused on multi-ancestry rather than ancestry-specific signals, as the sample sizes for lung-function genomics studies in all non-EUR ancestry groups were limited, particularly for the AFR ancestries4. Non-EUR ancestries are under-represented in genomic studies3, constraining GWAS and PheWAS studies in these populations. Correcting this will require substantial global investment in suitably phenotyped and genotyped studies, with appropriate community participation and workforce development. Improved sample sizes across all ancestries would improve power in ancestry-specific studies42 and fine mapping of multi-ancestry meta-analysis signals.

Strategies for in silico mapping of association signals to causal genes are evolving and difficult to evaluate without a reference set of fully functionally characterized lung-function-associated variants and causal genes. Our variant-to-gene mapping framework parallels one that was recently adopted10 and could help prioritization of genes for functional experiments such as gene editing in relevant organoids with appropriate readouts to confirm mechanism. An additional limitation is that classifications of pathways may be imperfect; we used multiple pathway classifications as it is unclear which is superior across all component pathways and we present the pathway-partitioned PheWAS results as a resource to others.

In summary, our multi-ancestry study highlights new putative causal variants, genes and pathways, some of which are targeted by existing drug compounds. These findings bring us closer to understanding mechanisms underlying lung function and COPD and will inform functional genomics experiments to confirm mechanisms and consequently guide the development of therapies for impaired lung function and COPD.

Methods

GWAS in each cohort

Following cohort-level quality control of the lung-function phenotypes (Supplementary Note), all phenotypes were rank inverse-normal transformed after adjustment for age, sex, height, smoking, ancestry principal components and relatedness (mixed models in BOLT-LMM or SAIGE). Quality control of the imputation and association summary statistics in each cohort was performed by the central analysis team (Supplementary Note). We assigned each cohort to one of the five 1000 Genomes super-populations—EUR, AFR, AMR, EAS or SAS—based on self-reported ancestry, apart from the UK Biobank (57.4% of the total sample size), where we used ADMIXTURE v1.3.0 (ref. 43) to determine ancestry (Supplementary Note and Supplementary Table 4). We also acquired lung-function-association results from each cohort using untransformed phenotypes for analysis using MR-MEGA.

Meta-analysis

Before meta-analysis, association statistics in each cohort were adjusted by the LD-score regression intercept calculated in each cohort to adjust for any residual confounding (Supplementary Table 5); the appropriate ancestry-specific LD reference was used for each cohort (10,000 UK Biobank samples for EUR and 1000 Genomes Project samples for AFR, AMR, SAS and EAS). Before meta-analysis, variants with imputation INFO < 0.5 or minor-allele counts (MAC) < 3 were excluded. As transformed effects were not on comparable scales, we meta-analyzed across cohorts using sample-size weighted Z-score meta-analysis with METAL (released version 28 August 2018)44. No genomic control was applied post meta-analysis. Following meta-analysis, variants with MAC < 20 were excluded.
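The two steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the METAL implementation: it assumes the LD-score intercept correction divides the chi-square statistic by the intercept (so the Z-score is divided by its square root), that intercepts below one are left uncorrected, and that each cohort is weighted by the square root of its sample size. The function names are hypothetical.

```python
import math

def adjust_z(z, ldsc_intercept):
    """Deflate a Z-score by the LD-score regression intercept.

    Assumption: chi-square is divided by the intercept, and intercepts
    below 1 are treated as 1 (no inflation of the statistic).
    """
    return z / math.sqrt(max(ldsc_intercept, 1.0))

def sample_size_weighted_z(z_scores, sample_sizes):
    """Combine per-cohort Z-scores with weights sqrt(N_i)."""
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator

def two_sided_p(z):
    """Two-sided P value for a standard normal Z-score."""
    return math.erfc(abs(z) / math.sqrt(2))
```

Because only Z-scores and sample sizes enter the calculation, the combined statistic does not depend on the scale of the per-cohort effect estimates.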

Signal selection and conditional analysis

We chose a genome-wide significance threshold of P < 5 × 10−9, as recommended from sequencing studies13. We selected 2-Mb regions centered on the most significant variant for all regions containing a variant with P < 5 × 10−9. Regions within 500 kb of each other were merged for conditional analysis. Stepwise conditional analysis was run in each region in each cohort using GCTA v1.93.2beta45 with an ancestry-specific LD reference for each cohort (Supplementary Note), and then the conditional results were meta-analyzed across cohorts and any new conditionally independent signals with P < 5 × 10−9 were added to our list of signals. We used moloc v0.1.0 (ref. 46) to co-localize signals across the four lung-function traits to obtain a set of distinct signals, which were then co-localized with previously reported signals to obtain a set of novel lung-function signals (Supplementary Note).
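As an illustration of the region definition, the sketch below builds a 2-Mb window around a lead variant and merges windows on the same chromosome that lie within 500 kb of each other. It is a simplified sketch with hypothetical function names; positions are assumed to be base-pair coordinates on a single chromosome.

```python
def region_around(position, width=2_000_000):
    """Return a (start, end) window of the given width centered on position."""
    half = width // 2
    return (max(0, position - half), position + half)

def merge_regions(regions, max_gap=500_000):
    """Merge (start, end) regions whose gap is at most max_gap base pairs."""
    merged = []
    for start, end in sorted(regions):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

For example, lead variants at 5.0 Mb and 7.2 Mb give windows separated by 200 kb, which would be merged into one region for conditional analysis.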

Exclusion of smoking signals from follow-up

We checked our sentinels for association with the smoking quantitative traits ‘age of initiation’ (n = 262,990) and ‘cigarettes per day’ (n = 263,954), and the binary traits ‘smoking cessation’ (n = 139,453 cases and n = 407,766 controls) and ‘smoking initiation’ (n = 557,337 cases and n = 674,754 controls) in the GWAS and Sequencing Consortium of Alcohol and Nicotine use (GSCAN) consortium47 (proxies with a squared correlation coefficient (r2) > 0.8 were checked for sentinels not present in GSCAN). We excluded eight lung-function signals from further analysis, which we determined to be primarily driven by smoking behavior (Supplementary Table 26), according to the following criteria: (1) P < 4.86 × 10−5 (Bonferroni-corrected 5% threshold for 1,028 signals) for association with any smoking trait and (2) the same ‘risk’ allele that increases smoking exposure behavior and decreases lung function.
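The two exclusion criteria can be expressed as a simple filter. The sketch below is illustrative only: it assumes that every smoking effect has already been oriented so that a positive value means greater smoking exposure for the same allele that is used for the lung-function effect. The function name and input layout are hypothetical.

```python
# Bonferroni-corrected 5% threshold for 1,028 signals (about 4.86e-5).
BONFERRONI = 0.05 / 1028

def smoking_driven(lung_beta, smoking_results, threshold=BONFERRONI):
    """Flag a signal as primarily driven by smoking behavior.

    smoking_results: iterable of (p_value, exposure_beta) pairs, one per
    smoking trait, with exposure_beta oriented so that positive means more
    smoking exposure for the same allele as lung_beta (an assumption).
    """
    for p_value, exposure_beta in smoking_results:
        # The allele that raises exposure must lower lung function.
        opposite = exposure_beta * lung_beta < 0
        if p_value < threshold and opposite:
            return True
    return False
```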

Heritability estimate

We calculated the proportion of variance explained by the sentinels reported for each trait using the formula

$$\frac{{\mathop {\sum }\nolimits_{i = 1}^n 2f_{\mathrm{i}}(1 - f_{\mathrm{i}})\beta _{\mathrm{i}}^2}}{V}$$

where n is the number of variants, fi and βi are the allele frequency and effect estimate of the ith variant, respectively, from the UK Biobank European ancestry rank inverse-normal transformed results, and V is the phenotypic variance (always one as our phenotypes were inverse-normal transformed). We assumed a heritability of 40% (refs. 48,49) to estimate the proportion of additive polygenic variance.
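The formula translates directly into code. The following minimal sketch computes the proportion of variance explained and, under the assumed heritability of 40%, the share of additive polygenic variance accounted for; the function names are hypothetical.

```python
def variance_explained(freqs, betas, phenotypic_variance=1.0):
    """Sum of 2 f (1 - f) beta^2 over variants, divided by V."""
    total = sum(2 * f * (1 - f) * b * b for f, b in zip(freqs, betas))
    return total / phenotypic_variance

def share_of_heritability(pve, heritability=0.40):
    """Proportion of the assumed heritability captured by the variants."""
    return pve / heritability
```

For a single variant with allele frequency 0.5 and an effect of 0.1 s.d., the variance explained is 2 × 0.5 × 0.5 × 0.01 = 0.005.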

Ancestry-adjusted trans-ethnic meta-analysis using MR-MEGA

To improve the fine-mapping resolution using LD differences between ancestries and to estimate the heterogeneity of variant associations attributable to ancestry, we undertook multi-ancestry meta-regression using MR-MEGA v0.2 (ref. 7), which incorporates axes of genetic ancestry as covariates. MR-MEGA uses multidimensional scaling of allele frequencies across cohorts to derive principal axes of genetic variation to use for ancestry adjustment (Supplementary Note). The location of the cohorts on the first two multidimensional scaling-derived principal components, plotted in Supplementary Fig. 17, shows clustering in accordance with the assigned ancestry groups. We used four principal components for ancestry adjustment, as this captured most of the variance. MR-MEGA implements genomic control at study level; therefore, no further genomic control was applied. We ran MR-MEGA at each locus containing ≥1 signals; in the loci with multiple signals, we ran MR-MEGA multiple times, each time conditioning on all except one signal at the locus. For each sentinel, we obtained an estimated ancestry-associated (P-value_ancestry_het) and residual (P-value_residual_het) heterogeneity. In addition, MR-MEGA reports the log-transformed Bayes factor, which can be used for the construction of credible sets.
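To illustrate how log-transformed Bayes factors can be turned into a credible set, the sketch below converts them to posterior probabilities and accumulates variants until the requested coverage is reached. It assumes natural-log Bayes factors, a uniform prior across variants and a single causal variant per signal; it does not include the annotation-informed priors used elsewhere in this study. Names are hypothetical.

```python
import math

def credible_set(log_bfs, coverage=0.95):
    """Build a credible set from natural-log Bayes factors.

    log_bfs: dict mapping variant ID to ln(Bayes factor).
    Returns (members, posteriors), with members ordered by posterior.
    """
    top = max(log_bfs.values())
    # Subtract the maximum before exponentiating for numerical stability.
    weights = {v: math.exp(lbf - top) for v, lbf in log_bfs.items()}
    total = sum(weights.values())
    posteriors = {v: w / total for v, w in weights.items()}

    members, cumulative = [], 0.0
    for variant in sorted(posteriors, key=posteriors.get, reverse=True):
        members.append(variant)
        cumulative += posteriors[variant]
        if cumulative >= coverage:
            break
    return members, posteriors
```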

Effects in children

To obtain unbiased effect estimates for comparison between adults and children, we first redefined 1,077 lead SNPs for lung function in the UK Biobank EUR population (n = 320,656) by selecting 1-Mb regions centered on the most significant variant for regions containing a variant with P < 5 × 10−8. For these SNPs, we then took the untransformed effect estimates from the meta-analysis of the non-UK Biobank EUR cohorts (34 cohorts for FEV1 and FVC, n = 128,071; 33 cohorts for FEV1/FVC, n = 123,429; 15 cohorts for PEF, n = 60,122). Next, we meta-analyzed two EUR-ancestry children’s cohorts—ALSPAC and Raine Study (age, 13–15 yr, n = 6,070)—to obtain effect estimates in children at the new lead SNPs. To investigate the age-dependent effects of genetic variants on lung function, we compared the effect sizes estimated in adults and children using a Welch’s t-test; a Bonferroni significance threshold for 1,077 tests was applied (P < 4.64 × 10−5).

Cell-type and functional specificity

Stratified LD-score regression

We tested for enrichment of regulatory features at variants overlapping four histone marks (H3K27ac, H3K9ac, H3K4me3 and H3K4me1) that are specific to adult lung, fetal lung, and peripheral blood mononuclear primary and smooth-muscle-containing cell lines (colon and stomach) using stratified LD-score regression12. We only considered EUR-specific meta-analysis with 39 cohorts for FVC, FEV1 and FEV1/FVC (17 cohorts for PEF). For the analysis of cell-type-specific annotations, we assessed statistical significance at the 0.05 level after Bonferroni correction for 60 hypotheses tested. Given that these annotations are not independent, a Bonferroni correction is conservative. We also report results with FDR < 0.05 using the Benjamini–Hochberg method.
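The two multiple-testing rules mentioned above differ in how strictly they control error. The sketch below is a plain implementation of the Bonferroni rule and the Benjamini–Hochberg step-up procedure for a list of P values; it is a generic illustration rather than the code used in the analysis.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject hypotheses with P below alpha divided by the number of tests."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]

def benjamini_hochberg(p_values, fdr=0.05):
    """Benjamini-Hochberg step-up procedure.

    Finds the largest rank k with p_(k) <= k * fdr / m and rejects the
    k hypotheses with the smallest P values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected
```

With P values of 0.001, 0.02, 0.03 and 0.5, Bonferroni rejects only the first hypothesis, whereas Benjamini–Hochberg rejects the first three.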

Regulatory and functional enrichment using GARFIELD

We tested enrichment of SNPs at functionally annotated regions (DNase I hypersensitivity hotspots, open chromatin peaks, transcription-factor footprints and formaldehyde-assisted isolation of regulatory elements, histone modifications, chromatin segmentation states, genic annotations and transcription-factor binding sites) using GARFIELD17. We used the EUR meta-analysis with 17 cohorts for PEF and 39 cohorts for FVC, FEV1 and FEV1/FVC. We applied GARFIELD to DNase I hypersensitivity hotspot annotation in 424 cell lines and primary cell types from ENCODE and Roadmap Epigenomics and derived enrichment estimates at trait-genotype association P-value thresholds of P < 5 × 10−5 and P < 5 × 10−9.

Enrichment of annotations in respiratory-relevant cell types and tissues

We curated annotations from assays of respiratory-relevant cells and tissues—that is, (1) single-cell genome ATAC–seq data50 from 19 cell types (myofibroblast, pericyte, ciliated, T cell, club, capillary endothelial 1 and 2, basal, matrix fibroblast 1 and 2, arterial endothelial, pulmonary neuroendocrine, natural killer cell, macrophage, B cell, erythrocyte, lymphatic endothelial, alveolar type 1 and 2 (downloaded from https://www.lungepigenome.org/)), (2) ATAC–seq data for five human primary lung-cell types implicated in COPD pathobiology51 (large and small airway epithelial cells, alveolar type 2, pneumocytes and lung fibroblasts (downloaded from http://www.copdconsortium.org/)) and (3) tissue-specific transcription-factor binding sites from DNase-seq footprinting of 589 human transcription factors in lung and bronchus52. We tested for cell- and tissue-specific enrichment of these annotations at our lung-function signals using functional GWAS (fGWAS)14 (Supplementary Note).

Identification of putative causal genes and variants

eQTL and pQTL co-localization

Three eQTL resources were used for co-localization of lung-function signals with gene expression signals: (1) GTEx V8 (downloaded from https://www.gtexportal.org/, July 2020; tissues: stomach, small-intestine terminal ileum, lung, esophagus muscularis, esophagus gastroesophageal junction, colon transverse, colon sigmoid, artery tibial, artery coronary and artery aorta), (2) eQTLgen53 blood eQTLs and (3) UBC lung eQTL54. Two blood pQTL resources were used to co-localize with associations with protein levels, that is, INTERVAL pQTL55 and SCALLOP pQTL. The coloc_susie method56 was used to test eQTL and pQTL co-localization (Supplementary Note).

Rare variants from exome sequencing

We checked for rare (MAF < 1%) exonic associations near (±500 kb) our lung-function sentinels using both single-variant and gene-based collapsing tests from (1) 281,104 UK Biobank exomes from the AstraZeneca PheWAS Portal57 (https://azphewas.com/), (2) loss-of-function and missense variants in 454,787 UK Biobank participants58 and (3) gene-based tests on whole-exome imputation in 500,000 UK Biobank participants59. We used a threshold of P < 5 × 10−6 for both single-variant and gene-based tests (Supplementary Note).

Nearby Mendelian respiratory-disease genes

We selected rare Mendelian-disease genes from ORPHANET (https://www.orpha.net/) within ±500 kb of a lung-function sentinel that were associated with respiratory terms matching any of the following regular-expression patterns: respir, lung, pulm, asthma, COPD, pneum, eosin, immunodef, cili, autoimm, leukopenia, neutropenia and Alagille syndrome. We implicated a gene if it had a corresponding respiratory term match in the disease name or if a matching term occurred frequently in the human phenotype ontology terms for that disease (Supplementary Note).
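The term matching can be illustrated with Python's standard `re` module. This sketch assumes case-insensitive substring matching of the listed patterns against a disease name; the exact matching rules used in the study are described in the Supplementary Note, and the names below are hypothetical.

```python
import re

RESPIRATORY_TERMS = [
    "respir", "lung", "pulm", "asthma", "COPD", "pneum", "eosin",
    "immunodef", "cili", "autoimm", "leukopenia", "neutropenia",
    "Alagille syndrome",
]

# One alternation pattern; each term is escaped and matched as a substring.
RESPIRATORY_PATTERN = re.compile(
    "|".join(re.escape(term) for term in RESPIRATORY_TERMS),
    re.IGNORECASE,
)

def is_respiratory(disease_name):
    """Return True if any respiratory pattern occurs in the disease name."""
    return RESPIRATORY_PATTERN.search(disease_name) is not None
```

Substring patterns such as `cili` capture variants like "ciliary" and "cilia", at the cost of occasional unrelated matches that would need manual review.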

Nearby mouse-knockout orthologs with a respiratory phenotype

We selected human orthologs of mouse-knockout genes with phenotypes in the ‘respiratory’ category, as listed in the International Mouse Phenotyping consortium (https://www.mousephenotype.org/), within ±500 kb of a lung-function sentinel (Supplementary Note).

PoPS

We calculated a gene-level PoPS9 based on the assumption that if the associations enriched in genes share functional characteristics with a gene near to a lung-function signal, then that gene is more likely to be causal. The full set of gene features used in the analysis included 57,543 total features—40,546 derived from gene expression data, 8,718 extracted from a protein–protein interaction network and 8,479 based on pathway membership. In this study we prioritized genes for all autosomal lung-function signals within a 500-kb (±250 kb) window of the sentinel and reported the top prioritized genes in the region. For the signals that did not have prioritized genes within the 500-kb window, we looked for prioritized genes using a 1-Mb (±500 kb) window (Supplementary Note).

Annotation-informed credible sets

We used the enriched annotations in respiratory-relevant cell types and tissues and enriched genic annotations (Supplementary Table 12) to create annotation-informed 95% credible sets using fGWAS based on the MR-MEGA ancestry-adjusted meta-regression results (Supplementary Note). We implicated a putative causal missense variant if it accounted for >50% of the posterior probability in the credible set and annotated these using Ensembl Variant Effect Predictor60 to check for a deleterious effect by the SIFT, PolyPhen or CADD metrics.

Allocation of genes prioritized with ≥3 variant-to-gene criteria to lung-function biology categories

We allocated prioritized genes with ≥3 criteria to different lung-function roles (epithelial, inflammatory, peripheral lung (including alveolus and endothelial), lung remodeling (including connective tissue), chest-wall movement and lung development) based on literature reviews, including GeneCards (https://www.genecards.org) and PubMed (https://pubmed.ncbi.nlm.nih.gov). Eighteen of the genes were difficult to assign to a specific category on this basis, mainly because they were involved in generic processes such as transcriptional control in a wide variety of cell types; these are not shown in Supplementary Fig. 12 but are included in Supplementary Table 13.

Interaction with smoking

Association testing for lung-function traits (FEV1, FVC, FEV1/FVC and PEF) was performed separately in ever- and never-smoker subgroups and meta-analyzed across EUR-ancestry cohorts. We included untransformed phenotypes with ever- and never-smoking summary statistics (n = 28 cohorts) comprising 206,162 ever-smokers and 229,046 never-smokers. A z-test was used to compare genetic effects between the untransformed association results for the ever- and never-smokers:

$$z = \frac{{\beta _1 - \beta _2}}{{\sqrt {\mathrm{se}_1^2 + \mathrm{se}_2^2} }}$$

where β1 and β2 are the effect estimates in the two smoking subgroups and se1 and se2 are the corresponding standard errors. We considered any signal with P < 4.9 × 10−5 (5% Bonferroni-corrected for 1,020 signals tested) to show a significant interaction.
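A minimal sketch of this comparison is given below. It assumes the two subgroup estimates are independent, so that the variance of their difference is the sum of the two squared standard errors, and it uses a two-sided normal P value. The function names are hypothetical.

```python
import math

# 5% Bonferroni-corrected threshold for 1,020 signals (about 4.9e-5).
INTERACTION_THRESHOLD = 0.05 / 1020

def interaction_z(beta_ever, se_ever, beta_never, se_never):
    """Z statistic for the difference between two independent effect estimates."""
    return (beta_ever - beta_never) / math.sqrt(se_ever ** 2 + se_never ** 2)

def two_sided_p(z):
    """Two-sided P value for a standard normal Z-score."""
    return math.erfc(abs(z) / math.sqrt(2))

def significant_interaction(beta_ever, se_ever, beta_never, se_never):
    """True if the subgroup effects differ at the Bonferroni threshold."""
    z = interaction_z(beta_ever, se_ever, beta_never, se_never)
    return two_sided_p(z) < INTERACTION_THRESHOLD
```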

GRS

We selected four ancestry groups in the UK Biobank (UKB) as test datasets (SAS was excluded from GRS analyses because UKB SAS was the only cohort in the multi-ancestry analysis for SAS): UKB EUR, UKB AMR, UKB EAS and UKB AFR. All of the other cohorts except UKB SAS and Qatar Biobank were used as discovery datasets.

We repeated the multi-ancestry meta-regression (MR-MEGA), after excluding the four test GWAS, incorporating the same four axes of genetic variation as covariates to account for ancestry. Autosomal signals for each lung-function trait that were reported in the target ancestry population were included in downstream analysis for each ancestry. For ancestry j (j = EUR, AMR, EAS or AFR), we estimated ancestry-specific predicted allelic effects for the ith SNP to be used as weights in the multi-ancestry GRS by

$$\hat b_{\mathrm{ij}} = \alpha _{0{\mathrm{i}}} + \mathop {\sum }\limits_{k = 1}^4 \alpha _{\mathrm{ki}}\bar x_{\mathrm{kj}}$$

where \(\bar x_{\mathrm{kj}}\) is the averaged position of discovery studies with ancestry j on the kth axis of genetic variation from multi-ancestry meta-regression, and \(\alpha _{0{\mathrm{i}}}\) and \(\alpha _{\mathrm{ki}}\) denote the intercept and effect of the kth axis of genetic variation for the ith SNP from the multi-ancestry meta-regression.
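The weight calculation can be sketched as follows. The example assumes four axes of genetic variation, per-SNP meta-regression coefficients supplied as plain lists, and an additive score computed as the weighted sum of allele dosages; all names are hypothetical.

```python
def mean_positions(study_positions):
    """Average the axis coordinates of the discovery studies of one ancestry.

    study_positions: list of per-study coordinate lists, one value per axis.
    """
    n_studies = len(study_positions)
    return [sum(axis_values) / n_studies for axis_values in zip(*study_positions)]

def predicted_effect(intercept, axis_effects, mean_axis_positions):
    """Ancestry-specific allelic effect: intercept + sum_k alpha_k * mean x_k."""
    if len(axis_effects) != len(mean_axis_positions):
        raise ValueError("one axis effect is needed per axis position")
    return intercept + sum(a * x for a, x in zip(axis_effects, mean_axis_positions))

def weighted_grs(dosages, weights):
    """Weighted genetic risk score for one individual."""
    return sum(d * w for d, w in zip(dosages, weights))
```

The same SNP therefore receives a different weight in each target ancestry, determined by where that ancestry's discovery studies sit on the axes of genetic variation.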

We ran each of the ancestry-specific fixed-effect meta-analyses in METAL with the inverse-variance weighting method, after excluding the test GWAS from the ancestry group. For comparison, the SNPs included in the multi-ancestry GRS were used to build ancestry-specific GRS for each ancestry.
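For reference, inverse-variance weighted fixed-effect meta-analysis combines per-cohort estimates as in the short sketch below. It is a generic illustration of the method rather than METAL itself, and the function name is hypothetical.

```python
import math

def inverse_variance_meta(betas, standard_errors):
    """Fixed-effect meta-analysis with weights 1 / se^2.

    Returns the pooled effect estimate and its standard error.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se
```

Unlike the sample-size weighted Z-score scheme, this approach requires effect estimates on a common scale across cohorts.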

Testing GRS in independent COPD case–control cohorts

We tested the association of multi-ancestry GRS with COPD susceptibility in five EUR-ancestry COPD case–control studies: COPDGene (non-Hispanic white), ECLIPSE, GenKOLS, NETT/NAS and SPIROMICS (non-Hispanic EUR) (Supplementary Table 21). We also tested the association in two AFR ancestry COPD case–control studies: COPDGene (African American) and SPIROMICS (African American) (Supplementary Table 21). Associations were tested using logistic regression models, adjusted for age, age squared, sex, height and principal components. In each COPD case–control study, we divided individuals into deciles according to their weighted GRS. For each decile, logistic models were fitted to compare the risk of COPD for members of the test decile with those in the lowest decile (that is, those with the lowest genetic risk). The results were meta-analyzed by ancestry-specific study groups using the fixed-effect model.
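The decile comparison can be sketched as below. This is a deliberately simplified illustration: it assigns deciles by rank and computes a crude, unadjusted odds ratio against the lowest decile, whereas the analysis described above used logistic regression adjusted for covariates. Function names are hypothetical.

```python
def assign_deciles(scores):
    """Assign each individual a decile (1 = lowest GRS, 10 = highest) by rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    deciles = [0] * n
    for rank, idx in enumerate(order):
        deciles[idx] = rank * 10 // n + 1
    return deciles

def crude_odds_ratio(cases_test, controls_test, cases_ref, controls_ref):
    """Unadjusted odds ratio of a test decile relative to the reference decile."""
    return (cases_test / controls_test) / (cases_ref / controls_ref)
```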

PheWAS

We used Deep-PheWAS40, which addresses both phenotype matrix generation and efficient association testing while incorporating the following developments that are not yet available in current platforms and online resources: (1) clinically curated composite phenotypes for selected health conditions that integrate different data types (including primary and secondary care data) to study phenotypes that are not well captured by current classification trees; (2) integration of quantitative phenotypes from primary care data, such as pathology records and clinical measures; (3) clinically curated phenotype selection for traits that are extremely highly correlated and (4) GRSs. The platform includes 2,421 phenotypes in the UK Biobank, with a subset of 2,243 recommended for association testing—some phenotypes that are generated are used solely in the definition of other phenotypes. We removed the four measures of lung function and added seven phenotypes defined in-house (P4002-6) to give 2,246 as our final maximum number of phenotypes for association. Deep-PheWAS then filters these, requiring a minimum case number; we chose to keep the default settings of a 50-case minimum for binary phenotypes and a 100-case minimum for quantitative phenotypes. After limiting to EUR ancestry and filtering for case numbers, 1,909 phenotypes were left for association analysis (Supplementary Table 27). No additional phenotypes were removed when removing pairs related up to second degree (KING kinship coefficient ≥ 0.0884).

There are five types of phenotypes within Deep-PheWAS categorized according to the data and methods used to create them. Composite phenotypes are made using linked hospital and primary care data, including in some cases primary care prescription data, alongside any of the UK Biobank field-IDs (DFP), including self-reported non-cancer diagnosis and self-reported operations. Phecodes are defined using only linked hospital data (https://phewascatalog.org/phecodes_icd10). Formula phenotypes combine available data using bespoke R code per phenotype rather than the in-built functions of phenotype development available in Deep-PheWAS. Added phenotypes are lists of cases and controls that have been added to the PheWAS and not developed by the Deep-PheWAS phenotype matrix generation pipeline. More complete definitions for all non-added phenotypes can be found in the Deep-PheWAS description40. All phenotypes were adjusted for age, sex and the first ten principal components.

Single-variant PheWAS

We ran 27 single-variant PheWAS across 1,909 traits (Supplementary Table 27) in up to 430,402 unrelated EUR individuals in the UK Biobank. We selected the variant with the most significant P value for each of the 20 genes with ≥4 lines of evidence for being causal (Supplementary Table 13). A further seven variants were included in single-variant PheWAS that were putatively causal (accounted for >50% posterior probability in the credible set and had a deleterious annotation; Supplementary Table 14) but in a gene that was implicated by fewer than four lines of evidence. The single-variant PheWAS was aligned to the lung-function-trait decreasing allele. Where we noted associations with testosterone and SHBG, we also undertook sex-stratified PheWAS.

Association with trait-specific GRS

We created four GRSs for the UK Biobank EUR samples, one for each trait (FEV1, FVC, FEV1/FVC and PEF), including all conditionally independent sentinel variants for the trait that were associated with P < 5 × 10−9, yielding 425, 372, 442 and 194 variants in each trait-specific GRS, respectively. Each of the four GRSs was weighted by the effect sizes from the multi-ancestry meta-regression for the relevant trait and then checked for association with 1,909 traits in the PheWAS.

Association with pathway-specific GRS

We selected 29 pathways that were enriched at FDR < 10−5 for our 559 genes implicated by ≥2 lines of evidence (Supplementary Table 18). We created a weighted GRS (weights estimated from multi-ancestry meta-regression for FEV1/FVC) for each of the 29 pathways by including, for each gene in the pathway (as for ‘Single-variant PheWAS’), the variant with the most significant P value for the trait that implicates the gene in our variant-to-gene mapping (Supplementary Table 13). Each of the 29 GRSs was then checked for association with 1,909 traits in the PheWAS.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.