Main

SARS-CoV-2 was first identified in December 2019 in Wuhan, China4, and the first infection was detected in the United States on 20 January 2020 (ref. 5). The virus is both highly transmissible and virulent. Estimates of the basic reproductive number, R0, range from 1.4 to 6.49 (ref. 6), with an estimated overall case fatality rate of 1.4% (ref. 7), which varies widely across age classes. As of 24 April 2020, >2,626,000 confirmed cases of SARS-CoV-2 and >181,000 deaths had been recorded globally. Currently, the cumulative reported incidence of COVID-19 in the United States is the highest in the world1.

As the COVID-19 epidemic expands within the United States, a central focus of public health efforts will be limiting fatalities. A key driver of this outcome will be keeping the case burden of patients with COVID-19 within the treatment capacity of the healthcare system. If the medical system is overwhelmed, the standard of care for all individuals seeking medical care could be reduced, thereby exacerbating negative health outcomes8. Patients critically ill with COVID-19 might fare particularly poorly. High mortality rates within this group will probably be further compounded by shortages of intensive care facilities and/or access to mechanical ventilation equipment9. Patients without COVID-19 and who require care for other conditions will also be affected by the health system’s inability to meet their needs.

Effective allocation of limited medical resources, such as healthcare workers, protective equipment and ventilators, is required to reduce the likelihood of the healthcare system being overwhelmed. However, to achieve this, information on the distribution of the burden of disease and how that burden aligns with healthcare system capacity is required.

Several factors probably contribute to the heterogeneous distribution of COVID-19 burden across the United States. The first of these is demography. The incidence of COVID-19 consistently rises with increasing age2,3 (noting that incidence observed from diagnostic testing does not necessarily reflect total infections). This age-dependent pattern of infection seems to be largely driven by differences in susceptibility and symptomatic infection rates between age classes rather than by differences in transmission potential3,10. Rates of hospitalization and intensive care unit admission are also higher in individuals aged >60 years than in younger age classes11. Thus, variation in age structure between counties could lead to differences in the per capita burden of disease between regions. Access to healthcare could also affect the distribution of COVID-19 burden. Many rural areas of the United States might have insufficient or no resources to provide acute or critical care. Residents of such areas could therefore be at increased risk for insufficient treatment. Finally, limited healthcare system capacity in rural areas could lead to an unexpected influx of cases to hospitals in more densely populated regions.

The temporal distribution of COVID-19 spread could also contribute to heterogeneity in disease burden across the United States. The magnitude and timing of the epidemic peak, for example, determine the minimum healthcare system capacity needed to provide adequate care. However, obtaining accurate predictions of the epidemic peak is often challenging in emerging outbreaks due to limited and often unreliable data on incidence, as well as to the challenges associated with modeling the effects of rapidly deployed and changing mitigation efforts. County-level variability in testing standards and efforts12,13, nonpharmaceutical interventions (NPIs) such as social distancing14 and outbreak onset15, and a lack of serological data also limit efforts to accurately model epidemic trajectories beyond several weeks. By contrast, projections of cumulative disease burden are less hindered by these challenges as these are not aimed at describing an epidemic time course. Although such projections miss the nuance of the intensity and timing of outbreaks, their estimates of the spatial footprint of disease burden contain core information relevant to informing resource distribution. Comparing the expected cumulative number of critical and severe infections against healthcare resources in each county in the United States allows for the identification of regions that may experience particularly high disease burden. Furthermore, analysis of simulations of multiple transmission scenarios (for example, different contact patterns) allows for possible identification of those regions with consistently high disease burdens without needing to forecast an exact epidemic trajectory.

Here, we project the cumulative case burden (case numbers) and cumulative per capita burden (cases per person) of severe and critical COVID-19 cases in each county within the United States by combining demographic data and age-specific risk factors under the assumption that 20% of the population becomes infected. We calculate the cumulative healthcare system burden, expressed as a case/bed ratio, that each county could experience as its own residents (and those from nearby counties with limited or nonexistent medical resources) seek care. We repeat this analysis for a range of transmission scenarios, map the expected burden of COVID-19 for each scenario and identify those regions consistently expected to experience the highest cumulative burden of disease. A summary of the main findings, limitations and policy implications of this study is given in Table 1.

Table 1 Policy summary

We developed a modified, age-stratified susceptible–exposed–infected–recovered (SEIR) epidemic model (based on the model of Davies et al.3) to project the number of COVID-19 cases for all counties (and county equivalents such as independent cities) in the United States. In this model, susceptible individuals (S) become infected in a density-dependent fashion and enter the exposed (E) class, before eventually becoming either asymptomatically infected (IA) or mildly symptomatic (but not yet clinically presenting) (IP). Following published estimates3, we assume that relative susceptibility to infection and the fraction of individuals who become mildly symptomatic rather than asymptomatically infected are higher in older age classes than in younger age classes. Individuals in the IP class eventually become fully symptomatic (IC). Asymptomatic and symptomatic individuals recover with immunity to classes RA and RS, respectively. All individuals in the infected classes (IA, IP, IC) are infectious; however, our model assumes that the relative infectiousness of asymptomatic individuals is scaled by factor bA, and the relative infectiousness of fully symptomatic individuals is scaled by factor bC to account for the effects of case isolation and quarantine. Mixing between individuals of different age classes is determined by a parameter θ. For θ = 1, mixing patterns reflect empirically measured rates for the United Kingdom16. For θ = 0, mixing patterns are homogeneous; for 0 < θ < 1, mixing patterns are intermediate. This model aims to specifically project the age distribution of cases over a wide variety of transmission scenarios, and is not intended to produce epidemiological forecasts. As such, we include epidemiological details that could result in differences in disease burden between age classes, such as age-specific mixing patterns and rates of symptom presentation. 
However, we do not vary the components of our model linked to interventions (for example, transmission rate, mixing patterns) over time or by location.

We investigated a scenario in which 20% of the population in each county becomes infected. A 20% cumulative infection rate represents a pessimistic scenario over the next few months, but perhaps this will be an optimistic scenario beyond that time frame17. A 20% cumulative infection rate is independent of R0 and is not equivalent to 20% of the herd immunity threshold. We intentionally ignored spatial variation in the progression of the epidemic, to simplify comparisons of disease burden between regions.

As we aim to provide general estimates of relative distribution of disease burden rather than make precise predictions of case load over time, we sought to identify patterns of disease burden that are robust to different assumptions about the dynamics of epidemiological spread. Accordingly, we varied our assumptions about the overall transmissibility of COVID-19, age structure of contact patterns and the contributions of fully symptomatic individuals to transmission. For each set of assumptions, we simulated our model for each county in the United States using demographic data from the 2018 American Community Survey18. We then extracted the number of individuals in each age class who had become symptomatically infected by the time the cumulative population infection rate had reached 20%. We present detailed results for the most optimistic scenario and most pessimistic scenario. In the optimistic scenario (transmission, R0 = 2, relative infectivity of fully symptomatic individuals, bC = 0.1 and mixing structure, θ = 1; see Methods) transmission is slow, fully symptomatic individuals are effectively quarantined and mixing patterns exhibit a strong age structure, potentially decreasing transmission from asymptomatically infected (and thus nonquarantined) individuals in less vulnerable age classes (such as children) to individuals in more vulnerable age classes (such as the elderly). By contrast, the pessimistic scenario (R0 = 6, θ = 0, bC = 1) is characterized by high transmission, well-mixed contact patterns and ineffective quarantine. Results for 25 alternate combinations of R0, θ and bC are summarized in Extended Data Figs. 1–4.

To evaluate the sensitivity of our results to the effects of crowding on transmission and epidemic size, we also investigated alternative scenarios in which R0 increases as a linear function of urban population, from R0 = 2 in counties with 0% of residents living in urban areas, to \(R_{0_{\mathrm{max}}}\) in counties with 100% of residents living in urban areas. In an optimistic scenario we set \(R_{0_{\mathrm{max}}}\) to 3 (other parameters: θ = 1, bC = 0.1) and, in a pessimistic scenario, we set \(R_{0_{\mathrm{max}}}\) to 5 (other parameters: θ = 0, bC = 1). Disease burden in each county was calculated when the cumulative number of infections reached 20% of the herd immunity threshold multiplied by the population size, rather than 20% of the population size (see Methods). The relationship between crowding and R0 has not been definitively established and, as such, these results should be interpreted cautiously.

Using our projections of cumulative symptomatic infections, we further estimated the number of severe cases (that is, requiring hospitalization) and critical cases (that is, requiring intensive care) using published rates of these outcomes for various age classes11. In all transmission scenarios, the areas with high relative burdens of hospitalizations and intensive care unit (ICU) admissions generally had large populations (Figs. 1a,d,h, 2a,d and 3a,d). However, we observed the opposite pattern for the per capita burden of hospitalizations and ICU admissions, which were distributed heterogeneously and were higher in rural areas than in major population centers (Fig. 1c,f,j). Due to the positive correlation between age and disease severity, areas with the highest per capita burden were those with the highest percentages of individuals >60 years of age (Figs. 1b,e,i, 2b,e and 3b,e). Although more elderly age classes were disproportionately affected in the pessimistic transmission scenario (Figs. 2g and 3g), the sets of counties with very high projected burdens of per capita hospitalizations and ICU admissions remained similar across different transmission scenarios. Indeed, of the 315 counties at or above the 90% quantile of per capita hospitalization in the optimistic transmission scenario (Fig. 1, legend), 308 counties were also at or above this quantile in the pessimistic scenario. The median percentage of people residing in rural areas among these 308 counties was 100%, which is significantly greater than the median of all counties (57.54%, Mann–Whitney U = 696,348, n1 = 308, n2 = 3,142, two-sided P < \(2.2 \times 10^{-16}\)). Of the 315 counties at or above the 90% quantile of per capita ICU admissions in the optimistic transmission scenario, 313 were also at or above this quantile in the pessimistic scenario. Again, the median percentage of people residing in rural areas among these 313 counties was 100%, significantly greater than the median of all counties (Mann–Whitney U = 725,670, n1 = 313, n2 = 3,142, two-sided P < \(2.2 \times 10^{-16}\)).

Fig. 1: Population characteristics of the United States and their relationships with disease burden.

a, Population of each county. b, Fraction of individuals within each county >60 years of age. c, Fraction of the population of each county classified as living in a rural area according to the 2010 US Census23. d–k, The relationship between population characteristics (x axes) and metrics of disease burden (y axes) for the optimistic transmission scenario (blue) and pessimistic scenario (red). d, Total population vs. projected cumulative hospitalizations. e, Fraction of population over 60 vs. projected cumulative hospitalizations per capita. f, Fraction of population residing in rural area vs. projected cumulative hospitalizations per capita. g, Fraction of population residing in rural area vs. projected cumulative hospitalizations per hospital bed. h, Total population vs. projected cumulative ICU admissions. i, Fraction of population over 60 vs. projected cumulative ICU admissions per capita. j, Fraction of population residing in rural area vs. projected ICU admissions per capita. k, Fraction of population residing in rural area vs. projected ICU admissions per ICU bed.

Fig. 2: Projected cumulative burden of hospitalizations in the United States.

a–c, Optimistic scenario; d–f, pessimistic scenario. a,d, Relative number of hospitalizations in each county. b,e, Number of projected hospitalizations per capita in each county. a,b,d,e, Cases not yet allocated to healthcare systems. c,f, Cumulative number of hospitalizations per hospital bed after allocation of cases to healthcare systems. g, Cumulative fraction of each age class hospitalized in each transmission scenario. Each of the 315 lines for each transmission scenario represents a different county. h,i, Counties estimated to be in the 90% quantile of hospitalizations per capita (h) and hospitalizations per hospital bed (i) (after case allocation). Colors in h,i indicate whether these counties were estimated to be in the 90% quantile in the optimistic scenario, the pessimistic scenario, both or neither. A high-resolution version of this figure is provided in Supplementary Information.

Fig. 3: Projected cumulative burden of ICU admissions in the United States.

a–c, Optimistic scenario; d–f, pessimistic scenario. a,d, Relative number of ICU admissions in each county. b,e, Number of projected ICU admissions per capita in each county. a,b,d,e, Cases not yet allocated to healthcare systems. c,f, Cumulative number of ICU admissions per ICU bed after cases have been allocated to healthcare systems. g, Cumulative fraction of each age class requiring ICU admission in each transmission scenario. Each of the 315 lines for each transmission scenario represents a different county. h,i, Counties estimated to be in the 90% quantile of ICU admissions per capita (h) and ICU admissions per ICU bed (i) (after case allocation). Colors in h,i indicate whether these counties were estimated to be in the 90% quantile in the optimistic scenario, the pessimistic scenario, both or neither. A high-resolution version of this figure is provided in Supplementary Information.

Next, we evaluated how projected case burdens aligned with healthcare system capacity. We obtained data on the number of hospital beds and ICU beds in each county from the American Hospital Association 2018 annual survey19. We distributed cases to healthcare systems within and outside of their county of origin based on an allocation algorithm (see Methods). This algorithm distributes severe and critical cases based on relative distance and the relative capacity of healthcare systems to provide care (quantified as the number of hospital beds and ICU beds, respectively). The majority of cases originating from within a county with substantial medical resources stay within that county. Most severe and critical cases originating from within a county with few hospitals or ICU beds are allocated to nearby counties with greater care capacity. All severe or critical cases originating in a county that lacks the capacity to provide appropriate care entirely are distributed to nearby counties.

The maps of relative hospitalizations per bed (Fig. 2c,f) and relative ICU admissions per bed (Fig. 3c,f) indicate those counties expected to experience a higher burden of disease relative to medical resources. The burden of cases relative to hospital and ICU beds was generally highest away from urban centers in counties with substantial rural populations (Extended Data Fig. 5d,h). Several regions have a high concentration of counties with a high burden, including much of the western United States, the northern Midwest, Florida and northern New England. These patterns are robust to assumptions about transmission rates and age-specific mixing patterns. The optimistic and pessimistic transmission scenarios each identified 248 counties as being at or above the 90% quantile of cumulative hospitalizations per hospital bed; 247 counties were identified in both transmission scenarios. The median percentage of people residing in rural areas among these 247 counties (38.97%) is lower than the median for all counties with hospital beds (51.82%, Mann–Whitney U = 246,652, n1 = 247, n2 = 2,478, two-sided P = \(4.64 \times 10^{-7}\)). Nevertheless, these data indicate that the healthcare system burden is not concentrated in urban centers. In the case of ICU admissions per bed, all of the 136 counties identified as being at or above the 90% quantile were the same for both transmission scenarios. These 136 counties (median percentage of residents living in rural areas = 31.11%) were not identified as being more rural than all counties with ICU beds (median percentage of residents living in rural areas = 36.21%, Mann–Whitney U = 85,746, n1 = 136, n2 = 1,353, two-sided P = 0.19) but, again, a pattern emerges of healthcare system burden not being concentrated in urban areas.

For analyses where R0 varied as a function of the percentage of population residing in urban areas, the per capita and per hospital and ICU bed burdens of disease were not generally higher in rural areas, but rather were distributed heterogeneously across urban and rural areas (Extended Data Fig. 6). Counties at or above the 90% quantile for various metrics of disease burden were less rural than comparable counties, but were not heavily concentrated in the urban end of the urban–rural distribution (Extended Data Figs. 9–10). Otherwise, results from these analyses (Extended Data Figs. 6–10) largely agree with those presented above, indicating that our finding that disease burden is not expected to be concentrated only in urban areas is robust to assumptions about the effects of crowding on transmission patterns and epidemic size.

Even with unprecedented efforts to rapidly develop a vaccine20, a pharmaceutical intervention against COVID-19 is unlikely to be available in the near future. SARS-CoV-2 transmission is expected to continue over the coming months and will probably affect every locality in the United States. We aimed to identify counties that consistently emerge as being likely to experience a large burden of disease on their population and healthcare systems (across a range of assumptions about transmission patterns). We identified several regions in need of additional support, including much of the western portion of the country, the northern Midwest, Florida and northern New England. At a fine geographical scale, our results suggest that considerable rural–urban inequities exist, with the per capita burden of disease being higher away from major population centers.

Before even considering the increased case burden that these more rural places are projected to experience relative to the rest of the country, it is evident that hospitals—and, to a greater extent, hospitals with the capacity to provide intensive care—are unevenly distributed. Many regions have limited, or no, facilities equipped to provide the type of acute or critical care required to treat COVID-19 (ref. 19). Case fatality rates in these regions could rise above the national average if people are unable to access care. Bolstering the capacity of rural health systems, ensuring equitable access to care and implementing public health measures such as testing and contact tracing in both urban and rural areas should be central goals of COVID-19 management strategies in the United States. While the healthcare systems of major population centers were not identified as weak spots in our analysis, they do service a much larger number of people. Given the consequences of their potential failure, they should remain a priority for response efforts.

Our findings are robust to different assumptions about transmission patterns. However, it is imperative that they be interpreted in the context of our methodology. We were deliberately conservative in not considering the impact of potential therapeutics and vaccines. Our results only underscore the urgency of developing these interventions. Likewise, we did not consider the impact of other NPIs such as social distancing. Our findings point to the importance of implementing these measures in urban and rural regions. We specifically did not attempt to predict the epidemic peak timing or magnitude. Given the time-invariant scenario we model (that is, 20% of the population acquires infection), it is likely that our projections will not precisely match future observed patterns of disease burden in the short term, as many regions are still in the early phases of their epidemics, or in the long term, as the timing, extent and efficacy of interventions will vary among regions. However, our results provide an approximation of the expected patterns of burden rooted in basic features of demography and health system capacity. Notably, we did not consider how other factors linked with an increased risk of severe disease, such as comorbidities21 (for example, hypertension, pulmonary disease), or decreased access to medical care, such as noninsurance rate and socioeconomic status22, might exacerbate disease burden in certain regions. Incorporating such factors into mathematical models and their forecasts is an essential area of future research, and could reveal additional ‘hotspots’ of disease burden that were not identified in our analyses, which considered the role of demography alone. Future work should also seek to identify if and where disease burden is disproportionately high in certain racial or minority groups. 
Finally, we urge public health officials using our results to carefully consider location-specific details and nuances not explicitly included in our analyses when planning their response, and to focus on patterns of relative burdens rather than projections for individual counties.

In conclusion, we have identified areas in the United States expected to be particularly heavily affected by COVID-19. Our findings suggest that ensuring equitable allocation of medical care and public health resources to communities away from major population centers will be crucial as the country attempts to mitigate the consequences of the ongoing COVID-19 epidemic.

Methods

Data

We obtained counts of the number of individuals in 10-year age bins for all counties in the United States (we include non-county federally incorporated places in the set of all counties for the purposes of our analyses) from the 2018 American Community Survey, available from the United States Census Bureau18. We define the set of age categories as G= {0–9, 10–19,…,70–79, 80+}. We obtained data on hospital location and bed number from the American Hospital Association 2018 annual survey19. We used the calculated total of all beds for each hospital to represent the number of hospital beds, and the number of adult medical/surgical intensive care beds to represent the number of ICU beds. We aggregated hospital and ICU bed data by county in accordance with American Hospital Association data use policy. We obtained the numbers of individuals in each county living in rural and urban areas from the 2010 US census23.

Mechanistic models

We developed an age-stratified mechanistic epidemiological model based on that of Davies et al.3 that follows a SEIR framework. This model assumes no births or deaths. The subscript i denotes the index of the age strata. The parameter ri denotes the rate of symptomatic infection for age class Gi. The parameter ui denotes the relative susceptibility of age class Gi. We set values for ri and ui according to the means of the consensus estimates from Davies et al.3:

$$r = \left\{ 0.40,\;0.25,\;0.37,\;0.42,\;0.51,\;0.59,\;0.72,\;0.76,\;0.76 \right\}$$
$$u = \left\{ 0.33,\;0.37,\;0.69,\;0.81,\;0.74,\;0.80,\;0.89,\;0.77,\;0.77 \right\}$$

The infected class is decomposed into asymptomatic (IA), symptomatic, pre-clinical (IP) and symptomatic, clinical (IC) classes to reflect relevant aspects of SARS-CoV-2 epidemiology, namely that not all infected individuals show symptoms and that individuals are frequently quarantined upon presenting symptoms. We also decomposed the recovered class into separate compartments for those recovered from symptomatic infection, RS, and those recovered from asymptomatic infection, RA, to simplify calculations of total symptomatic and asymptomatic cases. This model framework allows us to impose assumptions about the infectivity of asymptomatic and fully symptomatic individuals (bA and bC, respectively) relative to the infected class probably responsible for the bulk of transmission (IP).

$$\frac{\mathrm{d}S_i}{\mathrm{d}t} = - S_i\;u_i\;\beta \sum_{j = 1}^{9} C_{i,j}\frac{I_{\mathrm{P}_j} + b_{\mathrm{C}}I_{\mathrm{C}_j} + b_{\mathrm{A}}I_{\mathrm{A}_j}}{N_j}$$
$$\frac{\mathrm{d}E_i}{\mathrm{d}t} = S_i\;u_i\;\beta \sum_{j = 1}^{9} C_{i,j}\frac{I_{\mathrm{P}_j} + b_{\mathrm{C}}I_{\mathrm{C}_j} + b_{\mathrm{A}}I_{\mathrm{A}_j}}{N_j} - \delta_{E}E_i$$
$$\frac{\mathrm{d}I_{\mathrm{P}_i}}{\mathrm{d}t} = r_i\delta_{E}E_i - \delta_{\mathrm{P}}I_{\mathrm{P}_i}$$
$$\frac{\mathrm{d}I_{\mathrm{C}_i}}{\mathrm{d}t} = \delta_{\mathrm{P}}I_{\mathrm{P}_i} - \delta_{\mathrm{C}}I_{\mathrm{C}_i}$$
$$\frac{\mathrm{d}I_{\mathrm{A}_i}}{\mathrm{d}t} = \left( 1 - r_i \right)\delta_{E}E_i - \delta_{\mathrm{A}}I_{\mathrm{A}_i}$$
$$\frac{\mathrm{d}R_{\mathrm{S}_i}}{\mathrm{d}t} = \delta_{\mathrm{C}}I_{\mathrm{C}_i}$$
$$\frac{\mathrm{d}R_{\mathrm{A}_i}}{\mathrm{d}t} = \delta_{\mathrm{A}}I_{\mathrm{A}_i}$$

Here, C is the contact matrix whose entries Ci,j correspond to the mean number of contacts between individuals in the ith and jth age classes of G, δ parameters determine the mean amount of time (t) that individuals spend in each class and β is the transmission parameter.

We used this model to simulate a wide range of plausible epidemiological scenarios. Specifically, we considered values for bC in {0.1, 0.5, 1}, values for R0 in {2, 4, 6} and values for the mixing parameter θ in {0, 0.5, 1}. In the sections below, we describe how we constructed the contact matrix C. We set the values of the following model parameters according to published estimates3: \(b_{\mathrm{A}} = 0.5,\ \delta_{E} = \frac{1}{3},\ \delta_{\mathrm{P}} = \frac{1}{2.1},\ \delta_{\mathrm{C}} = \frac{1}{2.9},\ \delta_{\mathrm{A}} = \frac{1}{5}\). After constructing C and fixing these parameters, we used numerical methods combined with the next-generation matrix approach24 to calculate the value for β that corresponds to the value of R0 we wished to assume for each scenario.
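As a rough illustration of this calibration step, the sketch below builds a next-generation matrix for the model equations above and rescales β to hit a target R0. The matrix entries, the power-iteration eigenvalue routine and all function names are assumptions made for this sketch; the text does not specify which numerical method was used.

```python
# Fixed parameters quoted in the text (rates per day).
B_A, D_P, D_C, D_A = 0.5, 1 / 2.1, 1 / 2.9, 1 / 5

def spectral_radius(M, iters=500):
    """Dominant eigenvalue of a non-negative matrix by power iteration."""
    n = len(M)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(abs(x) for x in w)
        if lam == 0.0:
            return 0.0
        v = [x / lam for x in w]
    return lam

def next_generation_matrix(beta, b_c, u, r, C, N):
    """K[i][j]: expected infections in class i caused by one case in class j,
    in a fully susceptible population (S_i = N_i)."""
    n = len(N)
    # Infectiousness-weighted time that a class-j case spends infectious.
    T = [r[j] * (1 / D_P + b_c / D_C) + (1 - r[j]) * B_A / D_A for j in range(n)]
    return [[N[i] * u[i] * beta * C[i][j] * T[j] / N[j] for j in range(n)]
            for i in range(n)]

def beta_for_r0(target_r0, b_c, u, r, C, N):
    """K is linear in beta, so one evaluation at beta = 1 is enough."""
    return target_r0 / spectral_radius(next_generation_matrix(1.0, b_c, u, r, C, N))
```

Because the next-generation matrix scales linearly with β, no root-finding is needed in this sketch: the spectral radius at β = 1 gives the conversion factor directly.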

Rescaling the contact matrix

We used the ‘socialmixr’25 R package to retrieve the UK contact matrix from the POLYMOD study16, with contacts binned according to the following age categories: {0–9,10–19,…,60–69, 70+}. We term this matrix A. No finer resolution was available for contacts involving individuals over the age of 70. However, to account for differences between individuals in the age classes 70–79 and 80+ in terms of relevant COVID-19 parameters, we synthesized a new matrix, B, that includes contacts for individuals in the age classes 70–79 and 80+:

$$B_{i,70 - 79} = A_{i,70 + }\frac{{N_{70 - 79}}}{{N_{70 + }}}$$
$$B_{i,80 + } = A_{i,70 + }\frac{{N_{80 + }}}{{N_{70 + }}}$$
$$B_{70 - 79,j} = A_{70 + ,j}$$
$$B_{80 + ,j} = A_{70 + ,j}$$

where Nx is the number of individuals in the entire United States in age class x.
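A minimal sketch of this expansion, assuming A is stored as an 8×8 nested list whose last row and column are the 70+ class. The function name is hypothetical, and applying both rules to the four entries where the new rows and columns intersect is an assumption, since the text does not spell out that corner case.

```python
def split_oldest_class(A, n_70_79, n_80_plus):
    """Expand an 8x8 matrix whose last class is 70+ into a 9x9 matrix."""
    share_70s = n_70_79 / (n_70_79 + n_80_plus)
    share_80s = n_80_plus / (n_70_79 + n_80_plus)
    B = []
    # Rows for 70-79 and 80+ both copy the 70+ row of A.
    for row in A + [A[-1]]:
        # Contacts with the 70+ class are divided between the two new
        # columns in proportion to population size.
        B.append(row[:-1] + [row[-1] * share_70s, row[-1] * share_80s])
    return B
```

Under this construction every row of B sums to the same total as the row of A it was derived from.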

Next, we constructed the contact matrix used in our model C by rescaling B to reflect our assumptions about mixing patterns:

$$C_{i,j} = \frac{\left( 1 - \theta \right)\sum_{k = 1}^{9} B_{i,k}}{9} + \theta \;B_{i,j}$$

The quantity θ sets how strongly mixing follows the empirical age structure. When θ = 1, contact patterns are identical to the POLYMOD contact patterns. When θ = 0, contact rates are homogeneous across age classes. Values of θ between 0 and 1 correspond to mixing patterns intermediate between the POLYMOD and homogeneous scenarios. This rescaling procedure preserves the total number of contacts experienced by each age class while changing the identity of those contacts.
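The rescaling can be sketched as follows, with B held as a nested list; the function name is hypothetical, and the number of age classes is taken from the matrix rather than fixed at nine.

```python
def rescale_contacts(B, theta):
    """Blend each row of B with its own mean.

    theta = 1 returns B unchanged; theta = 0 gives every age class the
    same contact rate with all classes (the row mean).
    """
    n = len(B)
    C = []
    for row in B:
        row_mean = sum(row) / n
        C.append([(1 - theta) * row_mean + theta * b for b in row])
    return C
```

For example, `rescale_contacts([[4, 2], [1, 5]], 0.5)` moves each entry halfway towards its row mean of 3, and the row totals stay at 6.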

Model simulation

For each scenario in each county, we used the following conditions to initiate the model:

$${{S}}_i = N_i - 4$$
$${{E}}_i = 1$$
$${{I}}_{{\mathrm{P}}_i} = 1$$
$${{I}}_{\mathrm{C}_i} = 1$$
$${{I}}_{\mathrm{A}_i} = 1$$
$${{R}}_{{\mathrm{S}}_i} = 0$$
$${{R}}_{\mathrm{A}_i} = 0$$

The number of individuals within each age class for the county of interest is Ni.

We then simulated the model in R using the ‘ode’ function in the ‘deSolve’ package26 with the ‘lsoda’ integrator and a step size of 0.25. We truncated the simulation when \(\mathop {\sum }\limits_{i = 1}^9 \frac{{{{I}}_{{\mathrm{P}}_i} + {{I}}_{\mathrm{C}_i} + {{I}}_{\mathrm{A}_i} + {{R}}_{{\mathrm{S}}_i} + {{R}}_{\mathrm{A}_i}}}{{{{S}}_i + {{E}}_i + {{I}}_{{\mathrm{P}}_i} + {{I}}_{\mathrm{C}_i} + {{I}}_{\mathrm{A}_i} + {{R}}_{{\mathrm{S}}_i} + {{R}}_{\mathrm{A}_i}}} = 0.2\)

and then extracted the number of individuals in each age-stratified compartment.
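The simulation loop can be sketched in a few lines. This is not the authors' R code: it swaps the ‘lsoda’ integrator for a fixed-step fourth-order Runge–Kutta scheme, and it reads the stopping rule as the pooled fraction of the county population in the I and R compartments. Both choices, and all function names, are assumptions made for illustration.

```python
B_A, D_E, D_P, D_C, D_A = 0.5, 1 / 3, 1 / 2.1, 1 / 2.9, 1 / 5  # values quoted in the text

def rhs(y, beta, b_c, u, r, C, N):
    """Model derivatives; y is a flat list ordered S, E, IP, IC, IA, RS, RA."""
    n = len(N)
    S, E, IP, IC, IA = (y[k * n:(k + 1) * n] for k in range(5))
    pressure = [(IP[j] + b_c * IC[j] + B_A * IA[j]) / N[j] for j in range(n)]
    dS, dE, dIP, dIC, dIA, dRS, dRA = ([0.0] * n for _ in range(7))
    for i in range(n):
        new_inf = S[i] * u[i] * beta * sum(C[i][j] * pressure[j] for j in range(n))
        dS[i] = -new_inf
        dE[i] = new_inf - D_E * E[i]
        dIP[i] = r[i] * D_E * E[i] - D_P * IP[i]
        dIC[i] = D_P * IP[i] - D_C * IC[i]
        dIA[i] = (1 - r[i]) * D_E * E[i] - D_A * IA[i]
        dRS[i] = D_C * IC[i]
        dRA[i] = D_A * IA[i]
    return dS + dE + dIP + dIC + dIA + dRS + dRA

def rk4_step(y, h, *args):
    """One classical Runge-Kutta step of size h."""
    k1 = rhs(y, *args)
    k2 = rhs([a + 0.5 * h * b for a, b in zip(y, k1)], *args)
    k3 = rhs([a + 0.5 * h * b for a, b in zip(y, k2)], *args)
    k4 = rhs([a + h * b for a, b in zip(y, k3)], *args)
    return [a + h / 6 * (p + 2 * q + 2 * s + t)
            for a, p, q, s, t in zip(y, k1, k2, k3, k4)]

def simulate_to_threshold(N, beta, b_c, u, r, C, threshold=0.2, h=0.25, max_days=3650):
    """Integrate until the pooled I + R fraction reaches the threshold."""
    n = len(N)
    # Initial conditions from the text: one individual in each of E, IP, IC, IA.
    y = [Ni - 4.0 for Ni in N] + [1.0] * (4 * n) + [0.0] * (2 * n)
    total = float(sum(N))
    t = 0.0
    while t < max_days:
        if sum(y[2 * n:]) / total >= threshold:  # IP + IC + IA + RS + RA
            break
        y = rk4_step(y, h, beta, b_c, u, r, C, N)
        t += h
    return t, y
```

The returned state vector can then be sliced by compartment and age class to obtain the counts used in the next step.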

Case estimation

We calculated the total number of symptomatic infections in each age class by the time that the cumulative infection rate reached 20% as \({{I}}_{{\mathrm{P}}_i} + {{I}}_{\mathrm{C}_i} + {{R}}_{{\mathrm{S}}_i}\) at the end of the simulation. We then calculated the number of hospitalizations in each age class by multiplying the number of symptomatic infections in each age class by age-stratified estimates11 of hospitalization rates for symptomatic cases: {0.001, 0.003, 0.012, 0.032, 0.049, 0.102, 0.166, 0.243, 0.273}.

We then calculated the number of ICU admissions in each age class by multiplying the number of hospitalizations by age-stratified estimates11 of the rate of ICU admissions for patients given hospitalization: {0.05, 0.05, 0.05, 0.05, 0.063, 0.122, 0.274, 0.432, 0.709}.
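The two multiplications can be sketched as below. The function name is hypothetical, and the ICU vector is written with nine entries (one per age class), which assumes that the value of 0.05 applies to each of the four youngest classes.

```python
# Hospitalization rate among symptomatic cases, by age class (0-9, ..., 80+).
HOSP_RATE = [0.001, 0.003, 0.012, 0.032, 0.049, 0.102, 0.166, 0.243, 0.273]
# ICU admission rate among hospitalized cases; nine entries are assumed here.
ICU_RATE = [0.05, 0.05, 0.05, 0.05, 0.063, 0.122, 0.274, 0.432, 0.709]

def severe_and_critical(ip, ic, rs):
    """Return (hospitalizations, ICU admissions) per age class.

    ip, ic, rs are the end-of-simulation counts in the IP, IC and RS
    compartments, one entry per age class.
    """
    symptomatic = [a + b + c for a, b, c in zip(ip, ic, rs)]
    hosp = [s * h for s, h in zip(symptomatic, HOSP_RATE)]
    icu = [h * q for h, q in zip(hosp, ICU_RATE)]
    return hosp, icu
```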

Case distribution

We distributed cases originating in a given county to the healthcare systems of that county and other counties using the following algorithm.

  • Let the county of origin be denoted as c0 and the potential destination counties as c0,…,cN

  • Let the distances between the center of population of the county of c0 and each potential destination county ci be d0,i

    We obtained the latitude and longitude of the center of population for each county from publicly available data from the 2010 US census, and calculated pairwise distances between counties using the R package ‘geosphere’27.

  • We next removed all destination counties with d0,i > 400 km.

  • We calculated a distance weight, yi, for each remaining potential destination county as \(y_i = \frac{1}{20}\mathrm{e}^{-\frac{d_{0,i}}{20}}\)

  • We calculated a bed weight, zi, for each county as the number of total hospital beds in ci. For projections involving ICU admissions, we used the number of ICU beds rather than the number of hospital beds.

  • We then calculated a composite weight, wi, for each county as \(w_i = \frac{y_i}{\sum_{j = 0}^{N} y_j}\frac{z_i}{\sum_{j = 0}^{N} z_j}\)

  • Lastly, cases originating in c0 were then distributed to counties c0,…,cN proportional to \(\frac{w_0}{\sum_{i = 0}^{N} w_i}, \ldots, \frac{w_N}{\sum_{i = 0}^{N} w_i}\)
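The steps above can be sketched as a single function. It assumes that the distance weight decays with distance (a negative exponent with a 20 km scale) and that sums run over all candidate counties; the function name, argument names and the handling of an origin with no reachable beds are hypothetical.

```python
import math

def allocate_cases(cases, dist_km, beds, max_km=400.0, scale_km=20.0):
    """Split `cases` from an origin county across candidate counties.

    dist_km[i] and beds[i] describe candidate i; index 0 is the origin
    itself (distance 0). Use ICU beds when allocating critical cases.
    """
    # Distance weight: exponential decay, zero beyond the 400 km cutoff.
    y = [math.exp(-d / scale_km) / scale_km if d <= max_km else 0.0
         for d in dist_km]
    # Bed weight: bed count, ignoring counties beyond the cutoff.
    z = [float(b) if d <= max_km else 0.0 for b, d in zip(beds, dist_km)]
    sum_y, sum_z = sum(y), sum(z)
    if sum_y == 0.0 or sum_z == 0.0:
        return [0.0] * len(beds)  # hypothetical fallback: no reachable beds
    w = [(yi / sum_y) * (zi / sum_z) for yi, zi in zip(y, z)]
    sum_w = sum(w)
    return [cases * wi / sum_w for wi in w]
```

Because the two normalizing sums cancel in the final step, each county's share is proportional to the product of its distance weight and its bed count, so a county with no beds receives no cases.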

‘Alternate optimistic’ and ‘alternate pessimistic’ scenarios

For the two scenarios in which we varied R0 between counties according to the share of the population residing in urban areas, the value of R0 for each county was calculated as:

$$R_0 = 2 + \left( R_{0_{\mathrm{max}}} - 2 \right) \times \mathrm{proportion}\;\mathrm{of}\;\mathrm{population}\;\mathrm{residing}\;\mathrm{in}\;\mathrm{urban}\;\mathrm{area}$$

Instead of truncating our simulations at a 20% cumulative infection rate, we truncated our simulations when the following condition was met, indicating that the cumulative infection rate was equal to 20% of the herd immunity threshold:

\(\sum_{i = 1}^{9} \frac{I_{\mathrm{P}_i} + I_{\mathrm{C}_i} + I_{\mathrm{A}_i} + R_{\mathrm{S}_i} + R_{\mathrm{A}_i}}{S_i + E_i + I_{\mathrm{P}_i} + I_{\mathrm{C}_i} + I_{\mathrm{A}_i} + R_{\mathrm{S}_i} + R_{\mathrm{A}_i}} = 0.2 \times \left( 1 - \frac{1}{R_0} \right)\)
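A short sketch of these two formulas, taking the urban share as a proportion between 0 and 1; the function names are hypothetical.

```python
def county_r0(urban_fraction, r0_max, r0_min=2.0):
    """Linear interpolation between r0_min (fully rural) and r0_max (fully urban)."""
    return r0_min + (r0_max - r0_min) * urban_fraction

def stopping_fraction(r0, share=0.2):
    """Cumulative infected fraction equal to `share` of the herd immunity threshold."""
    return share * (1.0 - 1.0 / r0)
```

For a county that is half urban with a maximum R0 of 5, `county_r0(0.5, 5)` gives 3.5, so its simulation would stop once about 14% of residents had been infected.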

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.