Behind the highly politicized disagreements over COVID-19 control measures lies a widely shared desire to return economic and social life to sustainable levels as soon and for as long as possible, while preserving health-care systems and minimizing severe illness and death. The main arguments are about the extent to which these goals are mutually reinforcing, and whether there is a trade-off between greater viral transmission and increased social and economic activity. The difficulty in identifying control measures that are both effective and minimally disruptive motivates the search for new approaches to modelling transmission. Given the limited data available from epidemiological studies on how interventions can curb infection, such models can provide an initial framework for evaluating hypothetical control measures and help to guide policy decisions.

Writing in Nature, Chang et al.1 present an innovative method that combines simple infectious-disease models with human-mobility data obtained from mobile-phone records. This data-rich model has enabled them to generate and, to some extent, test hypotheses on where the virus is transmitted, how racial and socio-economic disparities in COVID-19 infections arise, and how effective different types of control measure might be.

Chang and colleagues analysed a data set of mobile-phone records for 98 million people in the United States, providing anonymized location information from 1 March to 1 May 2020 and comprising more than 5 billion time points. The authors looked at broad patterns of human interaction at non-residential locations of interest, for example in venues such as shops (Fig. 1), restaurants and places of worship, and used these data as parameters in their model to predict the numbers of new cases detected in entire cities each day. The authors also ‘fitted’ the model (estimated a few additional unknown parameters) to the data indicating the number of cases. After fitting, the model estimates not only the total rate of infection, but also infection rates in different individual venues or types of venue. This method complements more-conventional epidemiological and surveillance approaches that try to estimate the risks individually for different sorts of activity2,3.

Figure 1 | A grocery store in Chicago, Illinois. Image taken in April 2020. Credit: Kamil Krzaczynski/AFP/Getty

In principle, studies of outbreak clusters or contact-tracing data offer a way to identify the types of activity and location at which viral transmission does or doesn’t happen as a result of interventions, and the shared features of such findings could be used to infer which activities could be continued and which are key targets for control. For example, case studies of clusters of infections at indoor choir practices, on ships and in nursing homes demonstrate the high risks of poorly ventilated, crowded spaces. By contrast, other case studies, such as those in which test results indicated a lack of transmission from infected hair-salon workers to clients while all were wearing face coverings4, have provided some support for the reopening of such establishments, given adequate mask policies.

These case-study approaches are essential but have substantial limitations. They often have small sample sizes5, and frequently test only some of the people who have come into contact with a person confirmed to have a SARS-CoV-2 infection, making it difficult to identify the factors that determine whether or not an outbreak occurs. Furthermore, in many places, clusters and infections identified by contact tracing account for only a modest proportion of all transmissions, so inferences drawn from such examples might be of limited use for generalization. Randomized experiments to evaluate interventions (in which the introduction of an intervention is staggered over time in different populations)6 are an alternative, but they are hard to do7, and such studies have been rare for COVID-19.

Given the limited data available from these epidemiological and experimental studies, mathematical modelling presents an enticing approach for evaluating interventions. However, without data from epidemiological studies, modelling studies can, at best, assume only a possible range of effectiveness for particular interventions. A strength of Chang and colleagues’ approach is that the estimates of effects of interventions are based on the modelled rate of transmission in a particular place. This is, in turn, based on mobile-phone data on people’s locations, which provides information such as the number of people who come into close contact with each other at a particular venue, and is combined with strong but transparent assumptions about the factors underlying transmission at different types of venue.

For a given venue, the data used by Chang and colleagues provide detailed estimates of how many people visit per hour, how long they stay on average, and which neighbourhoods they are visiting from. The transmission model is designed to group people in each neighbourhood by their disease status — whether they are susceptible (not previously infected), exposed, infected or removed (having either recovered or died), corresponding to what is described as an SEIR model — and the model allows people to move sequentially through these compartments.
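The compartment structure described above can be illustrated with a minimal, deterministic sketch for a single neighbourhood. It is not the authors' implementation: the function names, the daily time step and the parameter values (`beta`, `sigma`, `gamma`) are assumptions chosen only for illustration.

```python
def seir_step(s, e, i, r, beta, sigma, gamma):
    """Advance a deterministic SEIR model by one time step.

    beta, sigma and gamma are hypothetical per-step rates for
    S -> E, E -> I and I -> R respectively.
    """
    n = s + e + i + r
    new_exposed = beta * s * i / n   # susceptible people who become exposed
    new_infectious = sigma * e       # exposed people who become infectious
    new_removed = gamma * i          # infectious people who recover or die
    return (
        s - new_exposed,
        e + new_exposed - new_infectious,
        i + new_infectious - new_removed,
        r + new_removed,
    )


def simulate(s, e, i, r, beta, sigma, gamma, steps):
    """Return the list of (S, E, I, R) states, including the initial one."""
    history = [(s, e, i, r)]
    for _ in range(steps):
        s, e, i, r = seir_step(s, e, i, r, beta, sigma, gamma)
        history.append((s, e, i, r))
    return history


# Illustrative run: 10,000 people, 10 of them initially infectious.
history = simulate(9990.0, 0.0, 10.0, 0.0,
                   beta=0.5, sigma=0.25, gamma=0.2, steps=100)
```

People only ever move forward through the compartments, so the total population stays constant and the susceptible count never increases. In this sketch `beta` is a single constant, which is the simplification that venue-level mobility data are meant to replace.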

The key assumption made by Chang et al. is that the rate at which people in the population are likely to become infected (which in this context refers to their moving from the ‘susceptible’ to the ‘exposed’ compartment) depends on which venues they visit and how that changes over time. In this model, venues at which people stay for longer and that are more densely occupied carry a higher risk than do settings in which people stay for less time and that are less packed. This underlying model is exceptionally simple, but the texture of population behavioural changes over time is provided by the detailed mobility data.
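One hedged way to express this assumption in code is to let the hourly infection rate at a venue scale with the fraction of the hour that visitors stay and with the number of infectious visitors per unit of floor area. The functional form, the scaling constant `psi` and all names and numbers below are illustrative assumptions, not the formula used by Chang et al.

```python
def venue_infection_rate(psi, dwell_fraction, infectious_visitors, area_sq_m):
    """Hypothetical hourly infection rate for one susceptible visitor.

    psi                 -- assumed scaling constant (would be fitted to case data)
    dwell_fraction      -- average share of the hour a visitor spends inside (0 to 1)
    infectious_visitors -- number of infectious people present during the hour
    area_sq_m           -- floor area of the venue in square metres
    """
    if area_sq_m <= 0:
        raise ValueError("area_sq_m must be positive")
    density = infectious_visitors / area_sq_m
    return psi * dwell_fraction * density


def expected_new_exposures(susceptible_visitors, rate):
    """Expected number of visitors moving from 'susceptible' to 'exposed'."""
    return susceptible_visitors * min(rate, 1.0)  # a probability cannot exceed 1


# Two made-up venues with the same number of infectious visitors.
crowded_long_stay = venue_infection_rate(10.0, 0.75, 4, 100.0)
sparse_short_stay = venue_infection_rate(10.0, 0.25, 4, 1000.0)
```

With these invented inputs, the small venue with long visits has the higher rate, which is the qualitative behaviour the text describes. Hourly visit counts and dwell times taken from mobility data would replace the fixed numbers used here.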

One of the strengths of this data-rich modelling approach is that the number of other unknown parameters that need to be fitted to the model is low, reducing the risk that the model is ‘overfitted’ in a way that forces the model’s structure to account for key features of the data. After fitting parameters to publicly available data for the daily number of infections, Chang et al. find that their model can better predict epidemic trajectories observed for multiple cities, when compared with the results of models that lack such detailed mobility data. This indicates that the addition of the mobility data improved the accuracy of their model. Although the goal of this mechanistic modelling was not to provide a forecast8, confirming that the model has reasonable predictive power is often a necessary first step in trying to draw any conclusions about mechanisms that might underlie the observed patterns.

In addition to trying alternative fitting approaches, the authors analysed how sensitive particular outcomes were to the parameters of the model, finding that model outputs are consistent over a range of plausible parameter values. These analyses give confidence that, although simple, the assumptions underlying this model generate predictions that match the data better than do equally plausible alternatives, which increases the persuasiveness of the hypotheses generated about interventions. An important extension of this work will be to check whether the model continues to match the data for case numbers in more recent months, including the case surges observed during the summer in some cities.

Chang and colleagues’ model predicts that infections in venues such as restaurants, gyms and religious establishments have a disproportionately large role in driving up infection rates, corroborating findings from epidemiological studies5,9. Locations missing from Chang and colleagues’ analysis demonstrate some of the limitations of working with such mobility data. Children, elderly people and those in prison are under-represented in the data sets, so it was not possible to make inferences about the role of schools, nursing homes and prisons in community transmission of viral infection. Models that incorporate both mobility data and typical epidemiological data sources, such as social-contact surveys, might help to address this gap10.

Nevertheless, the fine-level detail in the mobility data for the venues analysed enabled the authors to evaluate nuanced versions of reopening strategies, going beyond simply modelling an overall reduction of activity across a city trying to cope with the pandemic. Chang et al. found that capping the maximum occupancy of venues (a strategy that implicitly reduces the number of person-hours spent in risky, high-occupancy settings) is predicted to result in a decreased number of new infections compared with a strategy of less-targeted, overall activity reduction. The authors’ approach can be extended to evaluate other types of reopening strategy. For example, time-limited visits to places such as gyms and museums could be modelled by decreasing the average visit length.
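The intuition behind the occupancy-cap result can be sketched with a toy comparison for a single venue. The sketch assumes that each visitor's risk in a given hour is proportional to the number of people present, so that the hourly risk summed over visitors scales with the square of occupancy. That proxy, the function names and the visit counts are all hypothetical and are not taken from the authors' model.

```python
def risk_proxy(hourly_visits):
    """Assumed crowding proxy: sum of squared hourly occupancy."""
    return sum(v * v for v in hourly_visits)


def cap_occupancy(hourly_visits, cap):
    """Clip each hour's visits at a maximum occupancy."""
    return [min(v, cap) for v in hourly_visits]


def uniform_reduction(hourly_visits, keep_fraction):
    """Scale every hour's visits by the same fraction."""
    return [v * keep_fraction for v in hourly_visits]


def compare_strategies(hourly_visits, cap):
    """Compare a cap with a uniform cut that keeps the same total visits."""
    capped = cap_occupancy(hourly_visits, cap)
    keep_fraction = sum(capped) / sum(hourly_visits)
    uniform = uniform_reduction(hourly_visits, keep_fraction)
    return {
        "keep_fraction": keep_fraction,
        "baseline_risk": risk_proxy(hourly_visits),
        "capped_risk": risk_proxy(capped),
        "uniform_risk": risk_proxy(uniform),
    }


# Made-up visit counts for five consecutive hours, with one busy peak.
result = compare_strategies([10, 20, 40, 100, 30], cap=50)
```

In this example both strategies retain the same total number of visits, but the cap removes visits only from the busiest hour, so the crowding proxy falls further than it does under the uniform cut. A shorter average visit length could be explored in the same spirit by scaling a dwell-time term rather than the visit counts.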

Chang and colleagues’ work also deepens our understanding of possible causes of the observed disparities in COVID-19 cases by income level, which are incompletely understood. By combining their mobility model with demographic census data, the authors identified two possible main contributors. The first is that lower-income neighbourhoods, which tend to have higher numbers of front-line workers, had less overall reduction in mobility during lockdowns than did higher-income neighbourhoods, a conclusion shared by other studies11,12. The second possible contributor is that, across many types of setting, the venues visited by people from lower-income neighbourhoods tend to be more crowded than are the venues visited by those from higher-income areas. This type of observation is possible only because of the high level of detail on venue size and occupancy in the data analysed by Chang and colleagues.

The authors present clear hypotheses, whose plausibility is enhanced by their fit in the model to the observed infection data, the model’s parsimony (which reduces concerns about overfitting), and their ability to provide explanations for observations that were not part of the data, such as disparities associated with income. Further model testing is needed, but given the challenges in gathering and interpreting other relevant data types, these findings could have a valuable role in guiding policy decisions on how to reopen society safely and minimize the harm caused by movement restrictions. Chang and colleagues’ work underscores the value of integrating mobility data into epidemiological surveillance systems13, an increasingly popular approach that should become routine as we rebuild such surveillance systems to incorporate the lessons of this pandemic.