Introduction

Vascular contributions to cognitive impairment and dementia (VCID) include blood vessel injuries that can cause significant changes to memory, thinking, and behavior1. Vascular diseases are also associated with increased risk for Alzheimer’s disease (AD)2, which is the 6th leading cause of death in the US with increasing numbers and financial tolls3,4. Vascular-related dementia is the second most common cause of cognitive decline, with only AD-related dementia being more prevalent5. Several extensive cohort-based studies have found that cardiovascular disease risk factors such as tobacco smoking are associated with cognitive decline and increased risk of dementia in older age6.

Despite early reports suggesting that smoking tobacco might have beneficial effects on cognition7,8 and a reduced risk of dementia9, more recent evidence clearly suggests that active smoking has neurotoxic effects on the brain10,11,12 and is associated with a doubling of dementia risk for older adults13. Smoking tobacco might indirectly increase risk of cognitive impairment by exacerbating the subclinical risks associated with underlying vascular disease. Smoking causes vascular damage, including carotid artery disease, atherosclerotic plaque formation, increased platelet aggregation, compromised endothelial cell function, arterial stiffness and increased systolic blood pressure, which all contribute to stroke risk14,15,16. The presence of smoking with hypertension is one of the greatest risk factors for acute myocardial infarction and stroke, according to a meta-analysis of 10 cohort studies which included more than 60,000 people17. Tobacco smoking is also a major risk factor for cardiovascular disease and chronic obstructive pulmonary disease (COPD), both of which can decrease cognitive function18. Importantly, prevalence rates of any dementia and AD may be greater in women than men19,20,21. A recent study found that women had a stronger association between vascular risk factors and worse cognition than men in a middle-aged Hispanic/Latino population22. However, it is currently unknown if the effects of smoking on verbal learning and memory function are different between men and women.

Several studies have found an association between smoking, cognitive decline, and dementia12,23,24. Nevertheless, the majority of these studies do not assess biological sex as a main variable of interest but instead use sex as a covariate. As a covariate, the variance due to sex is only being controlled for and not directly explored. Thus, the question of whether smoking affects cognition in men and women differently remains unresolved. One methodological issue that might explain why so few studies have directly examined this important question pertains to the need for relatively large sample sizes to be adequately powered for interaction, or moderation, analyses. It is estimated that detecting an interaction requires at least four times the sample size needed to detect a main effect25,26.

The few studies that assess a smoking by sex interaction or test women and men separately have typically been underpowered and have found contradictory results. Some studies suggest no sex effect or that smoking may affect men’s cognition more strongly than women’s12,13,27,28,29; however, other studies have found a larger effect of smoking on cognition in women30,31. It does appear, however, that smoking often impacts women’s risk to a greater extent for diseases that occur outside of the central nervous system. A meta-analysis of more than 2.4 million people suggests that, compared with nonsmokers, women who smoke have a 25% greater relative risk of coronary heart disease than men who smoke, independent of other cardiovascular risk factors32. Women smokers may also have a greater relative risk of lung cancer than men who smoke33. Others have found that the effects of smoking in various disease models, such as bone fragility and Crohn’s disease, differ between men and women34,35. Furthermore, there appears to be a sex difference in the behavioral response to the acute effect of smoking in prepulse inhibition, a measure of startle reflex36. Importantly, human and animal studies point to many structural and functional sex differences in nicotinic acetylcholinergic brain physiology (for review see37). Collectively, these data suggest sex-specific effects should be considered a high priority of inquiry when assessing the relationship between smoking status and cognition.

The primary objective of this study was to determine whether biological sex moderates the relationship between smoking and memory performance in healthy adults aged 18–85 years old. We performed down sampling analyses to estimate the minimum sample size required to detect this interaction. In addition, given the known deleterious effects of smoking on the cardiovascular system and the established relationship between cardiovascular disease and cognitive impairment and dementia38,39,40,41, we further tested whether sex moderates the combined influence of diabetes, heart disease, hypertension, and stroke history on memory performance. We hypothesized that both smoking and an integrated cardiovascular disease index would have a greater detrimental relationship with paired-associate learning in women compared to men. For this study, we leveraged MindCrowd (www.mindcrowd.org), a web-based cohort of more than 70,000 persons aged 18 to 85 years from which memory and demographic data were collected.

Results

The entire demographic, health, lifestyle, and medical variable composition of our study cohort can be found in Supplemental Table S1. Additionally, chi-square tests of differences between the smoker and non-smoker groups for all covariates can be found in Supplemental Table S2. Briefly, the groups differed significantly in all covariates, which is expected given our large sample sizes.

Multivariate linear regression

Smoking

In women and men 18–85 years old, our linear model (LM) revealed a significant omnibus effect of smoking (F(30, 81,670) = 560.58, p = 0e+00; Fig. 1). The model also revealed a significant sex × smoking interaction (β = 0.99, SE = 0.22, p = 4e−6). Simple effects analyses revealed that women’s memory was negatively related to smoking (β = −1.01 word pairs, SE = 0.14, p = 7.83e−13; Fig. 2), whereas men’s memory was not significantly related (β = −0.27 word pairs, SE = 0.18, p = 0.135; Fig. 2).

Figure 1

Participants who self-report current smoker status perform worse on paired-associates learning (PAL) compared to non-smokers across 18–85 year olds. Linear regression fit (line fill ± 95% confidence interval (CI)) of the PAL total number of correct from 18 to 85 years old. Lines were split by Smoker versus Non-smoker. (F(30, 81,670) = 560.58, p = 0e+00, N = 81,700).

Figure 2

The main effect of self-reported smoking status on PAL performance across 18–85 year olds separately between men and women. Simple effects analyses revealed that women’s memory is negatively impacted by smoking (β =  − 1.01 word pairs, std error = 0.14, p = 7.83e−13; (B)) whereas men’s is not significantly impacted (β =  − 0.27 word pairs, std error = 0.18, p = 0.135: (A)). Shown are the linear regression fit lines (± 95% confidence interval (CI)) of the PAL total number of correct from 18 to 85 years old.

Cardiovascular disease

In the whole cohort, 77.4% of women and 78.1% of men had a CVD sum of 0, 17.9% of women and 16.6% of men had a CVD sum of 1, 4.1% of women and 4.3% of men had a CVD sum of 2, and 0.6% of women and 1% of men had a CVD sum score of 3+. In women and men 18–85 years old, our linear model revealed a significant omnibus effect of cardiovascular disease (F(31, 81,669) = 542.88, p = 0e+00; Fig. 3). The model also revealed a significant sex × cardiovascular disease interaction in the 1 group (β = −0.84 word pairs, SE = 0.14, p = 6.41e−9), 2 group (β = −1.38 word pairs, SE = 0.27, p = 2.08e−7), and 3+ group (β = −1.64 word pairs, SE = 0.54, p = 0.002). Simple effects analyses revealed that the negative relationship between cardiovascular disease and memory had slightly larger effect sizes in men compared to women in the 1 group (men: β = −0.64 word pairs, SE = 0.12, p = 1.22e−7; women: β = −0.41 word pairs, SE = 0.09, p = 3.74e−6), the 2 group (men: β = −0.99 word pairs, SE = 0.23, p = 1.2e−5; women: β = −0.67 word pairs, SE = 0.16, p = 3.18e−5), and the 3+ group (men: β = −1.80 word pairs, SE = 0.41, p = 1.3e−5; women: β = −1.27 word pairs, SE = 0.35, p = 0.0003; Fig. 4).

Figure 3

The main effect of a self-reported cardiovascular disease (CVD) composite score on PAL performance across 18–85 year olds. CVD composite was a sum score of self-reported heart disease, hypertension, diabetes, and stroke. Individuals were treated as groups based on their composite score (0 as the control group compared to 1, 2, and 3+). Linear regression fit (line fill ± 95% confidence interval (CI)) of the PAL total number of correct from 18 to 85 years old (F(31, 81,669) = 542.88, p = 0e+00).

Figure 4

The main effect of a self-reported CVD composite score on PAL performance across 18–85 year olds separately between men and women. Simple effects analyses revealed that the negative impact of cardiovascular disease on memory had slightly larger effect sizes in men compared to women in the 3+ group (men: β = −1.80 word pairs, std error = 0.41, p = 1.3e−5; women: β = −1.27 word pairs, std error = 0.35, p = 0.0003).

Propensity score matching

When collapsed across age, PSM suggested there is no relationship between smoking and memory in men (β = −0.21 (−0.62 to 0.19, 95% credible interval)) and a negative relationship in women (β = −0.54 (−0.93 to −0.14, 95% credible interval)) (Fig. 5).

Figure 5

The propensity score matching main effect of self-reported smoking on PAL performance across all ages separately between men and women. We conducted propensity score matching (PSM) analysis matching smokers and non-smokers for various health and lifestyle factors from self-report including: age, race, ethnicity, marital status, handedness, education level, number of daily medications, history of diabetes, seizures, cancer, stroke, hypertension, heart disease, family history of Alzheimer disease, drug abuse, loss of consciousness, and dizziness. The dependent variable was the total number of correct word pairs entered across the three trials of PAL tests (range of 0–36). When collapsed across age, PSM suggested there is no effect of smoking on memory in males (β =  − 0.21 (− 0.62 to 0.19, 95% credible interval)) and a negative effect in females (β =  − 0.54 ( − 0.93 to −0.14, 95% credible interval)).

Down sampling

The horizontal red line in Fig. 6A indicates the effect size estimated by the largest-sized sample. Green filled circles indicate an individual down-sampled comparison that resulted in a statistically significant association (p < 0.05). We performed 1000 of these simulations at each down-sampled size. At a cohort size of approximately 10,000 samples, 50% of down-sampled comparisons resulted in the observation of a significant smoking × sex interaction (Fig. 6B). Even at the largest sample size shown, there are still some observed down-sampled comparisons that result in a non-significant association (black circles). These data illustrate why there is a concern about small sample sizes and their ability to result in misestimated β values and the probability of type 2 errors. Additionally, note that at sample sizes below 6000 it is possible to observe significant associations in the opposite direction of the actual effect (e.g. smoking enhances performance).

Figure 6

(A) We conducted 1000 down-sample linear regression models of the MindCrowd cohort between the ages of 18 and 85 years for the interaction effect (β, y axis) of sex × smoking on paired associate learning (PAL) for each indicated total sample size (x axis). For each analysis, we had an equal number of smokers and non-smokers and of women and men. The horizontal red line indicates the effect size estimated by the total study sample. Green filled circles indicate an individual down-sampled comparison that resulted in a statistically significant association (p < 0.05); black dots are non-significant comparisons. The red arrow highlights that at sample sizes approximating 5000 one could potentially produce a significant beta value with the opposite sign from the largest sampled model. (B) shows the same data, displayed to make the positive relationship between significant betas and sample size easy to see.

Artificial error introduction

A Monte Carlo simulation (Fig. 7) was used to determine the effect of introducing artificial error (in addition to any real self-report error already in the data). The error was introduced in 1% increments (x-axis) by randomizing smoker status. Each box represents 10,000 model simulations, and plotted are the p-values for each simulation. The red line represents a significance level of α = 0.05. At 0% simulated error, we report our measured p-value. In the whole cohort (Fig. 7A) and women specifically (Fig. 7B), we measured a significant effect in > 90% of simulations after 10% additional self-report error was introduced. For the sex × smoking interaction term, 75% of simulations were statistically significant after 10% additional self-report error was introduced (Fig. 7C).

Figure 7

Artificial error introduction suggests the present smoking results are likely not due to self-report error. A Monte Carlo simulation was used to determine the effect of introducing artificial error (in addition to any real self-report error already in the data). Error was introduced in 1% increments (x-axis) by randomizing smoker status. Each box represents 10,000 model simulations, and plotted are the p-values for each simulation. The red dashed line represents a significance level of α = 0.05. At 0% simulated error, we report our measured p-value. In the whole cohort and women specifically (A,B), we measure a significant effect in > 90% of simulations after 10% additional self-report error is introduced. For the sex × smoking interaction term, 75% of simulations are statistically significant after 10% additional self-report error is introduced.
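The label-flipping idea behind this analysis can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic, the function names `flip_labels` and `mean_difference` are hypothetical, and the sketch tracks a raw smoker versus non-smoker mean difference rather than the p-values of the covariate-adjusted linear model plotted in Fig. 7.

```python
import random


def flip_labels(is_smoker, error_rate, rng):
    """Return a copy of the labels with a random `error_rate` share flipped."""
    n_flip = round(error_rate * len(is_smoker))
    flipped = list(is_smoker)
    for i in rng.sample(range(len(flipped)), n_flip):
        flipped[i] = not flipped[i]
    return flipped


def mean_difference(scores, is_smoker):
    """Mean score of labelled smokers minus mean score of labelled non-smokers."""
    smokers = [s for s, flag in zip(scores, is_smoker) if flag]
    others = [s for s, flag in zip(scores, is_smoker) if not flag]
    return sum(smokers) / len(smokers) - sum(others) / len(others)


# Synthetic cohort (assumption): about 8% smokers, who score 1 word pair lower.
rng = random.Random(42)
labels = [i < 160 for i in range(2000)]
scores = [rng.gauss(17.0 if smoker else 18.0, 6.0) for smoker in labels]

for percent in range(0, 11, 2):
    diffs = [
        mean_difference(scores, flip_labels(labels, percent / 100, rng))
        for _ in range(100)
    ]
    print(f"{percent:2d}% flipped: mean difference {sum(diffs) / len(diffs):+.2f}")
```

Because smokers are a small minority in this synthetic cohort, even a few percent of random flips fills the labelled smoker group with non-smokers, so the estimated difference shrinks toward zero as the error rate rises.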

Permutations

From one million permutations performed for the main effect of smoking in the whole cohort (Fig. 8A) and the main effect of smoking in women only (Fig. 8B), not a single t-statistic was observed to be more extreme than our reported t-statistic. This suggests that the odds of our findings being observed due to chance alone are less than one in a million. However, our reported t-value for the men-only analysis did overlap with values calculated in the permutation tests, supporting confidence in our non-significant finding (Fig. 8C). Further, we ran the same permutation tests on the sex × smoking interaction term and showed that our full model t-value did not overlap with any permutation test (Fig. 8D).

Figure 8

We examined the smoking effect on PAL through the use of permutation testing. This was performed one million times per model. The smoking data label for every participant was randomly assigned and the t-statistic for the main effect of smoking in the whole cohort (A), women only (B), and men only (C) was recalculated. We also conducted permutation tests on the interaction between sex × smoking on PAL (D). Black dashed line indicates the full model results statistic. Results from permutation testing indicate the present results are likely not due to chance.
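The logic of a label-permutation test can be illustrated with a short Python sketch. It rests on several assumptions: the data are synthetic, the helper names `welch_t` and `permutation_p_value` are hypothetical, and the statistic is a simple two-group Welch t-statistic, whereas the study recalculated the t-statistic from its full covariate-adjusted model.

```python
import random


def welch_t(scores, labels):
    """Welch t-statistic for labelled (True) versus unlabelled (False) scores."""
    a = [s for s, flag in zip(scores, labels) if flag]
    b = [s for s, flag in zip(scores, labels) if not flag]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    return (mean_a - mean_b) / (var_a / len(a) + var_b / len(b)) ** 0.5


def permutation_p_value(scores, labels, n_permutations, rng):
    """Shuffle the labels repeatedly and count statistics at least as extreme."""
    observed = welch_t(scores, labels)
    shuffled = list(labels)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(welch_t(scores, shuffled)) >= abs(observed):
            extreme += 1
    # The add-one correction keeps the estimate away from an impossible p = 0.
    return observed, (extreme + 1) / (n_permutations + 1)


# Synthetic example (assumption): 60 smokers scoring 2 word pairs lower on average.
rng = random.Random(7)
labels = [i < 60 for i in range(300)]
scores = [rng.gauss(16.0 if smoker else 18.0, 6.0) for smoker in labels]
observed_t, p_value = permutation_p_value(scores, labels, 1000, rng)
print(f"t = {observed_t:.2f}, permutation p ~ {p_value:.4f}")
```

With one million permutations and no permuted statistic as extreme as the observed one, the same add-one formula would give a p-value of roughly one in a million, which matches the interpretation given in the text.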

Discussion

In this large web-based study, we found current cigarette smoking and cardiovascular disease are associated with worse memory performance in adults as young as 18 years. Furthermore, we found significant sex-modification of these associations showing that the impact of smoking on verbal recall was worse in women whereas the impact of cardiovascular disease on memory performance was worse in men. These findings are important because according to the U.S. Department of Health and Human Services, cigarette smoking is the leading cause of preventable disease and death in the United States and accounts for about 1 in every 5 deaths42. In 2018, nearly 14 of every 100 U.S. adults aged 18 years or older (13.7%) smoked cigarettes, which translates to about 34.2 million adult smokers43. In addition, cardiovascular disease is the leading cause of morbidity and mortality worldwide, and an important predictor of cognitive decline and VCID.

We found that sex modifies the relationship between smoking and verbal recall in that women are negatively affected to a larger degree than men. This finding is in agreement with studies that reported a larger effect of smoking on cognition in women30,31, but stands in contrast with other smaller studies that found no sex effect or that smoking might affect cognition more strongly in men12,13,27,28,29. These results also align well with studies that suggest smoking impacts coronary heart disease and lung cancer more in women than in men32,33. We used down sampling analyses to demonstrate the importance of large sample sizes to ensure reproducible interaction effects of our model. These analyses suggest that a study sample of at least 10,000 is needed to observe a significant sex by smoking interaction at least 50% of the time. This finding highlights the possibility that many previous studies may have been underpowered to find the interaction.

In the United States, men (15.6%) were more likely to be current cigarette smokers than women (12.0%) according to a 2018 report43 and we replicated this finding within our MindCrowd cohort (Fig. 9). In addition, women on average smoke fewer cigarettes per day and have lower salivary cotinine levels compared with men. However, smoking rates for women have increased relative to smoking rates for men in the US and the popularity of smoking in women from low to middle-income countries may be increasing44,45. This is of particular interest since there are higher numbers of Alzheimer’s disease cases in women than men19,20,21 and these data suggest that smoking could potentially accelerate these trends. While men may be at a slightly higher risk for VCID throughout most of the lifespan, some risk factors for VCID more adversely affect women such as preeclampsia, menopause, and hormone replacement46. Not only does the risk of dementia increase with age, but normative decreases in many cognitive abilities occur across the lifespan47, therefore a better understanding of modifiable contributors, such as smoking, to cognitive function is essential.

Figure 9

A visual representation of US smoking rates by age derived from data from the National Health Interview Survey (NHIS) for years 2015–2018 and MindCrowd smoking rates per age for men and women. The NHIS is one of the major household survey-based data collection programs of the National Center for Health Statistics (NCHS), which is part of the Centers for Disease Control and Prevention (CDC). These data demonstrate that men typically have higher smoking rates compared to women of the same age and that MindCrowd, while demonstrating lower rates of smoking overall, follows the same trend as observed by the NHIS.

In addition to sex differences in smoking behaviors, sex differences in the cholinergic system are possible biological mechanistic explanations for why smoking may have a more substantial impact on cognition in women compared to men. Animal studies have shown that both nicotine-exposed and non-nicotine-exposed female rats exhibit higher nicotinic acetylcholine receptor (nAChR) densities than their male counterparts48. Since nAChR innervation influences several cognitive functions and neurotransmitters49, perhaps higher expression in women exacerbates the effects of smoking on cognition. However, it is not clear whether smoking-related changes in cognition are primarily due to nicotine exposure or the complex chemical makeup of cigarette tobacco and its additives42. This is becoming an important distinction as the prevalence of adult e-cigarette use/vaping increased from 2.8% in 2017 to 3.2% in 201843. Because we did not ask about vaping specifically, further research is needed to monitor the relationship between e-cigarette use and memory performance, since it is possible that the effects of e-cigarette use will differ compared to smoking tobacco due to the differing mix of chemical exposures.

Given the established relationships between smoking and cardiovascular disease, we also tested whether sex moderates the impact of cardiovascular disease on memory performance. Although much of the prior research in this domain has focused on cardiovascular risk scores (for review see6), we used number of CVD incidents as opposed to risk factors. Nevertheless, we can draw some comparisons between these investigations. The existing cross-sectional associations between cardiovascular disease risk and cognitive function in the literature are largely consistent with results obtained in this study between cardiovascular disease and memory performance. However, comparison with these findings for the sex effect is limited because of differences in study analyses. As with smoking studies, the majority of this research investigates sex as a covariate in the primary analyses instead of comparing men and women separately or investigating an interaction (for review see6). We found that sex moderates the relationship between cardiovascular disease and cognitive performance in that men are affected to a larger degree. However, the few studies that investigated men and women separately have found women’s cognitive performance to be slightly more impacted. For example, Kaffashian et al. investigated the Framingham cardiovascular risk profile and cognitive function and 10-year decline separately between men and women and reported larger effects in women across their cognitive batteries in 35–55 year-olds50. In a cohort of community-dwelling adults without clinical heart disease, the Framingham Cardiac Risk Score (FCRS) was associated with the rate of cognitive decline in women, but not men51. Yet another cohort of Mexican Americans demonstrated that higher predicted cardiovascular disease risk was associated with greater change in errors on multiple cognitive tests and that these associations were larger in women than men52. Lastly, a recent study found women had a higher association between vascular risk factors and worse cognition compared to men in a middle-aged Hispanic/Latino population22. It is unclear why we found larger effects in men; however, it could be due to the broad age range included in our study. The relationship between cardiovascular disease and cognitive function is primarily studied in older adults, with a few examples including 35 and older41,50,53,54, and one study including 18–30 year-olds55. Understanding the relationship between cardiovascular health and cognitive function in young adulthood may be necessary for understanding possible treatment and intervention opportunities.

Due to the large, widely available, and electronic nature of our study cohort, we rely on self-report answers to demographic, lifestyle, and health questions56. Current studies comparing self-report data given over the Internet versus data collected in-person show anywhere from a 0.3 to 20% discrepancy for height and weight measurements57,58. For some socially unacceptable measures (like smoking) internet self-report may actually have higher accuracy since the pressure to “perform” well in the presence of an investigator is removed when answering questions electronically59. To investigate the potential role that false-report error may play on our smoking effect, we re-analyzed the smoking effect after introducing additional error into the smoking self-report response. The additional error was added by randomly flipping the smoking response to various percentages of the cohort (stepwise from 1 to 10% of individuals) and re-analyzing the effect of smoking using our complete statistical model a total of 10,000 times for each error percentage. These results suggest it is unlikely that smoking self-report error is driving our results. Lastly, PAL was tested cross-sectionally in the cohort; therefore, determinations about the influence of collected factors on trajectories of change in performance across time within an individual subject are not possible. Additional longitudinal-based studies will be necessary to identify this class of variables.

There are limitations of this study to acknowledge. First, the smoking rate in MindCrowd is lower than national averages which may reflect the tendency of healthier people to participate in observational research60. However, since we used propensity matching to confirm the sex effect, this suggests our results are generalizable to the broader population. Next, the primary outcome measure was based on a single verbal memory and learning test with a ceiling and a floor effect. Using a measure with a more comprehensive score range may have tracked subtle differences between ages and our results may not be generalizable to different cognitive functions. Further, our study design is dependent on self-report of smoking and cardiovascular disease and we did not attempt to verify these with medical records. However, previous studies have reported discrepancies between self-reported smoking habits and serum cotinine concentration, a biomarker of nicotine absorption, especially in girls and women, suggesting that more women under-report smoking than do men61,62,63. A higher rate of inaccurate smoking status self-report by women in our dataset would have attenuated our results. This suggests that our results may actually underestimate the difference between men and women who smoke. Additionally, we were unable to assess a dose–response of smoking since we only asked for smoking status and did not collect requisite smoking history information in order to calculate pack years. Finally, the cross-sectional design of this study does not allow a causal conclusion and there may be other factors driving this association which were not measured (e.g. social class, diet, etc.). A longitudinal study design should follow to verify our results and assess a causal interpretation.

Despite these limitations, there are several advantages of using a large web-based study cohort. Our cohort includes a wide range of adults aged 18–85, which allowed us to assess the relationship between smoking and cardiovascular disease and verbal memory in the broadest single study age range used to date. The MindCrowd cohort has a considerable number of variables assessed, which enabled us to control for many potential confounding variables in addition to conducting propensity matching made possible due to the cohort size. Both regression and propensity analyses indicate that the independent effect of smoking over and above other health factors on verbal memory performance is robust in women. Propensity score matching attempts to mimic randomization by matching smokers and non-smokers on all observed covariates. Because smoking is correlated with many other lifestyle factors (e.g. education), propensity score matching helps isolate the effect of smoking on memory performance. Another advantage of using a web-based study design is the ability to use an identical study wide protocol as opposed to attempting to harmonize protocols and data across sites, which is often required when combining smaller, local cohorts. Furthermore, this cohort is geographically diverse with participants in both rural and urban settings, which allows a higher degree of generalizability. In addition to smoking status, we also explored the synergistic effects of cardiovascular disease on verbal learning and memory using a composite. Composites have the advantage of weighing multiple risks with a single summary variable and can be more sensitive and robust than single variables64. Therefore, composite scores may be more biologically relevant and have advantages in both clinical practice and cardiovascular research.

In summary, we report that sex moderates the relationship between smoking status and verbal learning and memory performance based on results from the largest study to date. Furthermore, we report down sampling tests that suggest the minimum sample required to detect this interaction 50% of the time is 10,000. Based on these results, prior studies may have produced less reliable results because of small sample sizes. Our results highlight the importance of investigating sex as a variable of interest in understanding environmental influences on verbal learning and memory performance across development and aging. The results suggest smoking may impact women’s cognitive health to a greater degree than men’s.

Methods

Study participants

In January of 2013 we launched our observational study site at www.mindcrowd.org. Website visitors, who were 18 years or older, were asked to consent to our study before any data collection via an electronic consent form. Approval for this study was obtained from the Western Institutional Review Board (WIRB study number 1129241) and all experiments were performed in accordance with the Declaration of Helsinki.

As of March 17, 2020 MindCrowd has recruited 84,260 qualified participants from around the world aged 18–85 with 64.3% women and 35.7% men. An overrepresentation of women has been previously described in studies drawn from the general population65. Across the entire sample, 7.9% of participants reported being a current smoker (Fig. 9).

After consenting to the study and answering five demographic questions (age, sex, years of education, primary language, and country of residence), participants completed a web-based paired-associates learning (PAL) task. For this cognitive task, during the learning phase, participants were presented 12 word-pairs, one word-pair at a time (2 s/word-pair). During the recall phase, participants were presented with the first word of each pair and were asked to use their keyboard to type (i.e., recall) the missing word. This learning-recall procedure was repeated for two additional trials. Before beginning the task, each participant received one practice trial consisting of three word-pairs not contained in the 12 used during the test. Word-pairs were presented in different random orders during each learning and each recall phase. The same word pairs and orders of presentation were used for all participants. The dependent variable/criterion was the total number of correct word pairs entered across the three trials (i.e., 36 is a perfect score). Upon completing the PAL task, participants were directed to a webpage asking them to fill out an additional 17 demographic and health/disease risk factor questions including if they are a current smoker. Other questions included: marital status, handedness, race, ethnicity, number of daily prescription medications, a first-degree family history of dementia, and yes/no responses to the following: seizures, dizzy spells, loss of consciousness (more than 10 min), high blood pressure, diabetes, heart disease, cancer, stroke, alcohol/drug abuse, brain disease, and memory problems. Next, participants were shown their results and were provided with different comparisons to other test takers based on the average scores across all participants, as well as across sex, age, education, etc. On the same page, participants were also provided with the option to provide contact information if they wanted to be recontacted for future research.

Statistical analyses

Multivariate linear regression

To investigate the sex × smoking interaction on memory performance, we ran a linear model (LM) controlling for various health and lifestyle factors from self-report including: age, race, ethnicity, marital status, handedness, educational attainment, number of daily medications, history of diabetes, seizures, cancer, stroke, hypertension, heart disease, family history of Alzheimer disease, drug abuse, loss of consciousness, and dizziness. In order to justify the use of all covariates in both models, we ran stability selection using a Lasso regression model. We chose a common cut-off value for stability selection of 70%; all covariates passed this cutoff. The dependent variable was the total number of correct word pairs entered across the three trials of PAL tests (range of 0–36). We report standardized beta coefficients, standard error (SE), and p values.
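For orientation, a moderation model of this kind can be written schematically as below. The notation is an assumption introduced here for illustration (the source does not give the model equation or the variable coding): Smoker and Sex are taken to be 0/1 indicators, and the X terms stand for the listed covariates.

```latex
\mathrm{PAL}_i = \beta_0
  + \beta_1\,\mathrm{Smoker}_i
  + \beta_2\,\mathrm{Sex}_i
  + \beta_3\,(\mathrm{Smoker}_i \times \mathrm{Sex}_i)
  + \sum_{k} \gamma_k X_{ik}
  + \varepsilon_i
```

Under this coding, the smoking effect in the reference sex (Sex = 0) is \beta_1, the smoking effect in the other sex is \beta_1 + \beta_3, and a non-zero \beta_3 is what is meant by sex moderating the smoking effect.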

To assess the cardiovascular disease × sex interaction on PAL performance, we created a cardiovascular disease (CVD) composite score by summing the number of CVD factors reported, including heart disease, hypertension, diabetes, and stroke history (range of 0–4). Participants were grouped by composite score, with a score of 0 serving as the control group for comparison with groups reporting 1, 2, and 3+ of these conditions. Participants with a score of 3 or 4 were combined into one group to create an adequate sample size. We ran a linear model (LM) controlling for age and education level. We report standardized beta coefficients, standard errors (SE), and p values.
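
The composite score and grouping rule described above can be sketched as follows. The field names and the function name `cvd_group` are hypothetical; only the four factors and the 0, 1, 2, 3+ grouping come from the text.

```python
# Hypothetical field names for the four self-reported yes/no CVD factors.
CVD_FACTORS = ("heart_disease", "hypertension", "diabetes", "stroke")


def cvd_group(participant):
    """Return the CVD group label ("0", "1", "2", or "3+") for one participant.

    participant: dict mapping factor name -> bool (True means "yes").
    A missing factor is treated as "no" (an assumption for this sketch).
    """
    score = sum(1 for factor in CVD_FACTORS if participant.get(factor, False))
    # Scores of 3 and 4 are pooled into a single group.
    return "3+" if score >= 3 else str(score)


example = {"heart_disease": False, "hypertension": True,
           "diabetes": True, "stroke": False}
group = cvd_group(example)  # two factors reported
```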

Propensity score matching

In addition to the multivariate linear regression model, we conducted propensity score matching (PSM). PSM has the benefit of reducing bias and variance66; however, this is often achieved at the expense of sample size compared to regression analysis. We matched participants on all variables listed as covariates in the multivariate linear model. Matching was performed using the R package MatchIt (version 3.0.2). Effect sizes were estimated from the matched cohort using the R package Zelig (version 5.1.6.1). Zelig uses least squares regression on matched data to estimate the partial effect on an outcome of interest, in our case, total word pairs correct67,68. Due to constraints on sample sizes, we ran the PSM model separately for men and women, collapsed across all ages. We also report effect sizes with credible intervals. The credible interval is the Bayesian analogue of a confidence interval and can be interpreted as a probability (i.e., there is a 95% probability that the effect size is between X and Y word pairs).
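
To make the matching step concrete, the sketch below pairs each smoker with the non-smoker whose propensity score is closest. It is a generic greedy 1:1 nearest-neighbour illustration in Python, not the MatchIt implementation: the text does not state which matching method or settings were used, and the propensity scores here are made-up values assumed to have been estimated beforehand. The function name `match_nearest` and the optional `caliper` argument are hypothetical.

```python
def match_nearest(treated, controls, caliper=None):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    treated, controls: dicts mapping participant id -> propensity score.
    caliper: optional maximum allowed score difference for a valid match.
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(controls)
    pairs = []
    # Process treated units from highest to lowest score.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: -kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if caliper is not None and abs(available[c_id] - t_score) > caliper:
            continue  # no acceptable control; this treated unit is dropped
        pairs.append((t_id, c_id))
        del available[c_id]  # each control can be used only once
    return pairs


# Made-up propensity scores for two smokers and three non-smokers.
smokers = {"s1": 0.30, "s2": 0.62}
non_smokers = {"n1": 0.28, "n2": 0.65, "n3": 0.10}
pairs = match_nearest(smokers, non_smokers)
```

Unmatched participants (here `n3`) are discarded, which is why matching typically reduces the sample size relative to regression on the full cohort.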

Down sampling

To demonstrate the importance of our large sample size for generating reliable effects, we ran 1,000 down-sampled linear regression models of the MindCrowd cohort between the ages of 18 and 85 years, testing the smoking × sex interaction effect on paired-associates learning (PAL) at each indicated total sample size. Total sample sizes ranged from 268 to 13,400, and each group (defined by smoking status and sex) was sampled at equal size at each age from 18 to 85.
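
The balanced down-sampling described above can be sketched as a stratified draw in which every age × sex × smoking-status cell contributes the same number of participants. The record layout, the function name `downsample`, and the choice to raise an error when a cell is too small are assumptions for illustration; the toy cohort below is synthetic.

```python
import random
from collections import defaultdict


def downsample(participants, n_per_cell, rng):
    """Draw n_per_cell participants from every (age, sex, smoker) cell.

    participants: list of dicts with keys "age", "sex", "smoker".
    rng: a random.Random instance, so that draws are reproducible.
    """
    cells = defaultdict(list)
    for p in participants:
        cells[(p["age"], p["sex"], p["smoker"])].append(p)
    sample = []
    for key in sorted(cells):
        members = cells[key]
        if len(members) < n_per_cell:
            # ASSUMPTION: too-small cells are treated as an error here.
            raise ValueError(f"cell {key} has only {len(members)} members")
        sample.extend(rng.sample(members, n_per_cell))  # without replacement
    return sample


# Synthetic toy cohort: 3 ages x 2 sexes x 2 smoking statuses, 5 people each.
cohort = [
    {"age": age, "sex": sex, "smoker": smoker}
    for age in (18, 19, 20)
    for sex in ("F", "M")
    for smoker in (True, False)
    for _ in range(5)
]
subset = downsample(cohort, n_per_cell=2, rng=random.Random(0))
```

Repeating the draw with different seeds and refitting the model on each subset would give a distribution of interaction estimates for a given total sample size.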

Artificial error introduction

To investigate the potential influence of false-report error on the smoking effect, we used a Monte Carlo simulation to determine the effect of introducing artificial error (in addition to any real self-report error already in the data). The additional error was introduced by randomly re-assigning the self-reported smoking status (smoker or non-smoker) of various percentages of the cohort (stepwise from 1 to 10% of individuals) and re-analyzing the effect of smoking using our complete statistical model. This process was performed 10,000 times for each error percentage, and the resulting influence on the p value is reported using boxplots.
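
One step of the error-introduction procedure can be sketched as follows. Here "re-assigning" is interpreted as flipping the smoking status of a randomly chosen fraction of participants; this interpretation, the function name `add_label_error`, and the toy data are assumptions, and the subsequent refit of the full statistical model is omitted.

```python
import random


def add_label_error(smoker_flags, error_rate, rng):
    """Return a copy of smoker_flags with a random fraction of entries flipped.

    smoker_flags: list of bools (True = self-reported smoker).
    error_rate: fraction of participants whose status is flipped (e.g. 0.05).
    ASSUMPTION: "re-assigning" means switching smoker <-> non-smoker.
    """
    flags = list(smoker_flags)
    n_flip = round(error_rate * len(flags))
    for i in rng.sample(range(len(flags)), n_flip):
        flags[i] = not flags[i]
    return flags


# Toy cohort: 20 smokers and 180 non-smokers, with 5% artificial error.
original = [True] * 20 + [False] * 180
noisy = add_label_error(original, error_rate=0.05, rng=random.Random(1))
n_changed = sum(a != b for a, b in zip(original, noisy))
```

In a full simulation, this step would be repeated many times at each error percentage, with the model refit on each noisy dataset and the resulting p values collected.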

Permutations

We examined the smoking effect on PAL using permutation testing, an approach for estimating the probability of a false-positive finding if the null hypothesis were true. To create the permuted datasets, we randomly reassigned the smoking status of each participant before analyzing the complete model in the whole cohort, in women only, and in men only. We also ran permutation tests on the sex × smoking interaction on PAL. This process was performed one million times per model.
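
The logic of a permutation test can be illustrated with a simplified sketch. For brevity, the test statistic here is the difference in mean PAL score between smokers and non-smokers rather than the coefficient from the complete covariate-adjusted model used in the study. The function name, the toy scores, and the add-one correction to the p value are assumptions for illustration.

```python
import random
from statistics import mean


def permutation_p_value(scores, smoker_flags, n_permutations, rng):
    """Two-sided permutation p value for a difference in group means.

    scores: list of PAL totals; smoker_flags: parallel list of bools.
    Smoking labels are shuffled, so the number of smokers stays fixed.
    """
    def mean_diff(flags):
        smokers = [s for s, f in zip(scores, flags) if f]
        others = [s for s, f in zip(scores, flags) if not f]
        return mean(smokers) - mean(others)

    observed = abs(mean_diff(smoker_flags))
    shuffled = list(smoker_flags)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(mean_diff(shuffled)) >= observed:
            hits += 1
    # Add-one correction (a common convention) avoids a p value of exactly 0.
    return (hits + 1) / (n_permutations + 1)


# Toy data with a large, artificial group difference.
scores = [20, 21, 19, 22, 20, 18, 30, 31, 29, 32, 30, 28]
flags = [True] * 6 + [False] * 6
p = permutation_p_value(scores, flags, n_permutations=2000,
                        rng=random.Random(42))
```

A small p value indicates that a difference as large as the observed one rarely arises when smoking labels are unrelated to the scores.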