Main

COVID-19 is a severe acute respiratory infection (SARI) that emerged in early December 2019 in Wuhan, China1. The outbreak was declared a public health emergency of international concern by the World Health Organization on 30 January 2020. COVID-19 is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), an enveloped, single-stranded positive-sense RNA virus that belongs to the Betacoronavirus genus and Coronaviridae family2. SARS-CoV-2 is closely related genetically to bat-derived SARS-like coronaviruses3. Human-to-human transmission occurs primarily via respiratory droplets and direct contact, similar to human influenza viruses, SARS-CoV and Middle East respiratory syndrome coronavirus4. The most commonly reported clinical symptoms are fever, dry cough, fatigue, dyspnoea, anosmia, ageusia, or some combination of these1,4,5. As of 16 June 2020, more than 7.9 million cases have been confirmed worldwide, resulting in 434,796 deaths6.

Brazil declared COVID-19 a national public health emergency on 3 February 20207. After the development of a national emergency plan and the early establishment of molecular diagnostic facilities across Brazil’s network of public health laboratories, the country reported its first confirmed COVID-19 case on 25 February 2020, in a traveller returning to São Paulo from northern Italy8. São Paulo is the largest city in South America and no other Brazilian city receives a greater proportion of international flights9. Currently, Brazil has one of the fastest-growing COVID-19 epidemics in the world, now accounting for 1,864,681 cases and 72,100 deaths, comprising over 55% of the total number of reported cases in Latin America and the Caribbean (as of 14 July 2020)6. About 21% of Latin American and Caribbean populations are estimated to be at risk of severe COVID-19 illness10. The region has been experiencing large outbreaks, with growing epidemics in Brazil, Peru, Mexico, Chile, Colombia, Panama and possibly Venezuela and Nicaragua, amid growing concerns about testing capacity for COVID-19 (refs. 11,12,13,14). Preparedness for laboratory surveillance of SARS-CoV-2 in Latin America is centred around a network of national reference influenza surveillance laboratories that is facing several challenges, including a shortage of reagents and equipment15.

Conscious of the challenges associated with surveillance since the beginning of the epidemic in Brazil, here we focus on two main objectives. First, we contextualize the Brazilian SARS-CoV-2 epidemic by comparing local transmission dynamics with those observed in other selected countries. Second, we use geospatial data related to confirmed COVID-19 cases and SARI cases with unknown aetiology to evaluate the relationship between socioeconomic factors and COVID-19 distribution.

Results

Contextualizing COVID-19 data reporting systems in Brazil

On 22 January 2020, more than 1 month before the first case in Brazil, the Brazilian Ministry of Health implemented the REDCap platform to report prospective suspected, probable and confirmed COVID-19 cases (see Methods for case definitions), as part of an early response to the pandemic16. By 27 March 2020, the REDCap system had been discontinued (Fig. 1). Since then, mild COVID-19 cases have been reported on e-SUS Vigilância Epidemiológica (e-SUS-VE), a new national COVID-19 reporting system, and hospitalized COVID-19 cases have been recorded on the pre-existing Sistema de Informação de Vigilância Epidemiológica da Gripe (SIVEP-Gripe) system. The SIVEP-Gripe system has been in use since 2009 (having been implemented in response to the 2009 influenza H1N1 pandemic) and has since centralized the reporting of respiratory viruses and SARI for the Brazilian Ministry of Health (Fig. 1). Both e-SUS-VE and SIVEP-Gripe include suspected and confirmed COVID-19 cases as reported by public health and private services (primary and emergency care). The two reporting systems are linked through the Portal do COVID-19 website (https://covid.saude.gov.br/), which publishes daily aggregated counts from both platforms.

Fig. 1: Timeline of national COVID-19 reporting systems in Brazil.

The REDCap system operated between late January and 25 March 2020. Aggregated numbers from e-SUS-VE and SIVEP-Gripe data for mild and hospitalized COVID-19 cases, respectively, are updated on a daily basis on the Portal do COVID-19 website (https://covid.saude.gov.br/).

SARS-CoV-2 reporting in Brazil

We analysed a total of 514,200 SARS-CoV-2 cases from the Portal do COVID-19 website (SIVEP-Gripe and e-SUS-VE databases combined) that were confirmed by molecular diagnostic and clinical epidemiological criteria by 31 May 2020 (see Methods). Cases were reported in 75.3% (4,196 of 5,570) of municipalities across all five administrative regions of Brazil and included 206,555 (40.2%) recovered patients and 29,314 (5.7%) fatal COVID-19 cases (Fig. 2a). We further analysed a total of 1,468 confirmed cases from the REDCap system, including 342 imported cases with associated travel history information. After excluding individuals who travelled to multiple countries before entering Brazil (n = 56) and who had an unknown country of origin (n = 16), the most frequent self-reported countries of infection for cases acquired abroad until 19 March 2020 were the United States (28.6%; n = 76), Italy (24.4%; n = 65), the United Kingdom (10.5%; n = 28) and Spain (8.3%; n = 22) (Extended Data Fig. 1). The first case (SPBR1) was reported on 25 February 2020 in the municipality of São Paulo, the fourth most populous urban area worldwide. Following the first reports of COVID-19 in Brazil’s largest population centres, SARS-CoV-2 subsequently spread to municipalities with smaller population sizes (Fig. 2b). Until 31 May 2020, most confirmed cases and deaths were reported in the states of São Paulo (109,698 cases and 7,615 deaths), Rio de Janeiro (53,388 cases and 5,344 deaths), Ceará (48,489 cases and 3,010 deaths) and Amazonas (41,378 cases and 2,052 deaths), which together account for 49.2% of all cases and 61.5% of deaths in Brazil (Fig. 2c).
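The combined shares for the four states follow directly from the counts quoted in this paragraph. As a quick consistency check, the short Python sketch below recomputes them; the variable and function names are ours and purely illustrative.

```python
# Counts reported up to 31 May 2020, copied from the text above.
NATIONAL_CASES = 514_200
NATIONAL_DEATHS = 29_314

# state -> (confirmed cases, deaths)
STATES = {
    "São Paulo": (109_698, 7_615),
    "Rio de Janeiro": (53_388, 5_344),
    "Ceará": (48_489, 3_010),
    "Amazonas": (41_378, 2_052),
}


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


cases_top4 = sum(cases for cases, _ in STATES.values())
deaths_top4 = sum(deaths for _, deaths in STATES.values())

share_cases = percent(cases_top4, NATIONAL_CASES)      # share of national cases
share_deaths = percent(deaths_top4, NATIONAL_DEATHS)   # share of national deaths

print(share_cases, share_deaths)
```

Both values agree with the percentages reported in the text (49.2% of cases and 61.5% of deaths).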

Fig. 2: COVID-19 epidemiology in Brazil.

a, Numbers of COVID-19 cases (blue solid line) and deaths (blue dashed line) reported to the Ministry of Health (Portal do COVID-19 website), along with numbers of COVID-19 confirmed cases (salmon solid line) and cases of SARI with unknown aetiology (salmon dashed line) reported to the SIVEP-Gripe database. b, First COVID-19 cases by date and Brazilian municipal population size based on the Ministry of Health data, from 28 March 2020. Each circle represents the first confirmed COVID-19 case in the municipality (n = 4,196 Brazilian municipalities). SPBR1 is the first detected SARS-CoV-2 infection in Brazil8. c, Map coloured according to the number of confirmed COVID-19 cases per state reported to the Ministry of Health (Portal do COVID-19 website). Circle sizes are proportional to the number of reported COVID-19 deaths in each federal unit. AC, Acre; AL, Alagoas; AM, Amazonas; AP, Amapá; BA, Bahia; CE, Ceará; DF, Distrito Federal; ES, Espírito Santo; GO, Goiás; MA, Maranhão; MG, Minas Gerais; MS, Mato Grosso do Sul; MT, Mato Grosso; PA, Pará; PB, Paraíba; PE, Pernambuco; PI, Piauí; PR, Paraná; RJ, Rio de Janeiro; RN, Rio Grande do Norte; RO, Rondônia; RR, Roraima; RS, Rio Grande do Sul; SC, Santa Catarina; SE, Sergipe; SP, São Paulo; TO, Tocantins.

Basic reproduction number of SARS-CoV-2 in Brazil and comparison countries

To estimate the basic reproduction number (R0) of SARS-CoV-2 in Brazil, daily confirmed cases in São Paulo, Rio de Janeiro, Ceará and Amazonas states were compiled from Ministry of Health data (for specification of the time windows used in the analyses, see Extended Data Fig. 2). For comparison, we compiled time series of confirmed cases in several European countries from the Johns Hopkins Coronavirus Resource Center (https://coronavirus.jhu.edu/; see also Extended Data Fig. 3). We found that São Paulo, Rio de Janeiro and Amazonas were characterized by similar R0 values of 2.9 (95% Bayesian credible interval (BCI) = 2.2–5.1), 2.9 (95% BCI = 2.2–4.9) and 2.6 (95% BCI = 2.0–4.5), respectively. However, for Ceará, the estimated R0 was considerably lower at 1.9 (95% BCI = 1.5–3.0) (Fig. 3 and Extended Data Fig. 1). This finding could be a result of the small window between the first reported cases and the early implementation of non-pharmaceutical interventions (NPIs) in this state (Supplementary Table 1 and Extended Data Fig. 2). On a national scale, the estimated R0 for Brazil was slightly higher than that of the Brazilian states considered in this study, with a median of 3.1 (95% BCI = 2.4–5.5), and also slightly higher than R0 values estimated for other severely affected countries: Spain (2.6; 95% BCI = 2.0–4.6); France (2.5; 95% BCI = 1.9–4.4); the United Kingdom (2.6; 95% BCI = 2.0–5.1); and Italy (2.5; 95% BCI = 2.0–4.4) (Fig. 3). While the incidence curves for European countries have consistently flattened and declined since the implementation of NPIs (suggesting that the reproduction number has fallen below 1), Brazil’s daily incidence curve has continued to increase (Fig. 2a and Extended Data Fig. 4).

Fig. 3: Estimated R0 values for four Brazilian states and selected countries.

Left: R0 values for the Amazonas, Ceará, Rio de Janeiro and São Paulo states. Right: R0 for Brazil, France, Italy, Spain and the United Kingdom. Violin plots show posterior samples for the basic reproduction number; the box plots show the median and the first and third quartiles. The whiskers extend to the most extreme value less than 1.5 times the interquartile range beyond the quartile. The daily numbers of infections used in each analysis can be found in Extended Data Figs. 3 and 4. Daily numbers of infections and prior distributions can be found in Extended Data Figs. 5 and 6.

SARIs mostly reflect COVID-19 cases

In the early phase of the COVID-19 epidemic in Brazil, we analysed the results for other respiratory pathogens tested in Brazil as part of a differential diagnosis by the Central Public Health Laboratories and National Influenza Centres (Brazilian Ministry of Health), obtained from a REDCap platform17 designed for COVID-19. The respiratory viruses most frequently identified between 7 January 2020 and 27 March 2020, in patients with a suspected but negative diagnosis of COVID-19, were influenza A virus (347 (14.3%) of 2,429 tested cases), influenza B virus (251 (10.3%) of 2,429) and human rhinovirus (136 (5.6%) of 2,429). We found co-detection of SARS-CoV-2 with six other respiratory viruses, the most frequent of which were influenza A (11 (0.5%) of 2,429) and human rhinovirus (6 (0.2%) of 2,429) (Extended Data Fig. 7).

The SIVEP-Gripe system started reporting hospitalized COVID-19 cases in early March 2020 (epidemiological week 10) (Fig. 4). In this system, the number of tested cases is unavailable. We found that the peak of influenza confirmed cases (n = 447) occurred at epidemiological week 12 (15–21 March 2020). During the same week 12, we detected an 8.5-fold increase in total cases attributed to SARS-CoV-2 (n = 3,789) and a 9.9-fold increase in total cases reported as SARI with unknown aetiology (n = 4,424) (Fig. 4). From 2 January to 31 May 2020, a total of 2,136 influenza cases and 272 cases caused by other respiratory pathogens, including human respiratory syncytial virus, human rhinovirus, adenovirus and metapneumovirus, were reported in the SIVEP-Gripe database. The low observed incidence of influenza and other respiratory viruses may have been influenced by limited testing for these viruses during this period. Although NPIs may have an impact in reducing influenza virus transmission, this does not necessarily reflect a lower co-circulation of other respiratory viruses18.

Fig. 4: Reports of COVID-19 and SARI with unknown aetiology and influenza.

The red and orange lines indicate cases reported in 2020 (solid red, COVID-19; solid orange, influenza; dashed red, SARI with unknown aetiology). The blue lines indicate cases reported in 2016 for influenza (solid blue line) and SARI with unknown aetiology (dashed blue line). Grey lines indicate influenza (solid line) and SARI cases with unknown aetiology (dashed line) for 2017, 2018 and 2019 combined.

Socioeconomic differences are associated with COVID-19 diagnosis

Until 31 May 2020, a total of 73,648 COVID-19 confirmed cases and 168,001 SARI cases with unknown aetiology were reported in the SIVEP-Gripe system. We hypothesized that the 2.3-fold increase of SARI cases with unknown aetiology was associated with differential access to healthcare due to socioeconomic factors.

We focused on the Metropolitan Region of São Paulo (MRSP), which has a population of 23 million inhabitants across six sub-regions (Central, West, North, East, Southeast and Southwest) and 39 municipalities (Fig. 5a). To test this hypothesis, we obtained per-capita income at the census tract level (typically 150–300 households) in the MRSP, based on the residential address of each case. We then linked this information to each patient’s final diagnosis outcome: confirmed case of COVID-19 or SARI with unknown aetiology. While the income distribution of SARI cases with unknown aetiology was similar to that of all residents of the MRSP over the whole period (Fig. 5b), we observed that the income distribution of individuals with COVID-19 confirmed by laboratory and clinical criteria was initially higher than that of all MRSP residents and decreased over time towards similar levels by epidemiological week 21 (Fig. 5b). Importantly, we found that the log odds of one or more confirmed COVID-19 cases per census tract increased with per-capita income in epidemiological weeks 12 and 22 (likelihood ratio test P value < 0.001; Fig. 5b and Supplementary Table 2). This provides statistical evidence of an association between confirmed COVID-19 diagnosis and per-capita income, suggesting a socioeconomic difference in access to COVID-19 diagnosis in the MRSP. For reference, we also provide a map of per-capita income (Fig. 5a) and population density in each census tract (Extended Data Fig. 8).
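The association test described above can be illustrated with a small logistic regression. The sketch below is only an assumption-laden illustration, not the analysis code behind Supplementary Table 2: it uses synthetic data, a single standardized income covariate (`income_z`, a name we introduce here), a hand-written Newton-Raphson fit, and a likelihood ratio test against an intercept-only model with one degree of freedom.

```python
import math
import random


def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson.

    Returns (b0, b1, log_likelihood).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p           # gradient with respect to b0
            g1 += (yi - p) * xi    # gradient with respect to b1
            h00 += w               # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    loglik = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        loglik += math.log(p) if yi else math.log(1.0 - p)
    return b0, b1, loglik


def null_loglik(y):
    """Log-likelihood of the intercept-only model."""
    n, k = len(y), sum(y)
    p = k / n
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def lrt_one_df(ll_full, ll_null):
    """Likelihood ratio statistic and its chi-square (1 d.f.) P value."""
    stat = 2.0 * (ll_full - ll_null)
    return stat, math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


# Synthetic census tracts (NOT study data): standardized income and an
# indicator of one or more confirmed cases in the tract.
random.seed(1)
income_z = [random.gauss(0.0, 1.0) for _ in range(2000)]
has_case = [
    1 if random.random() < 1.0 / (1.0 + math.exp(-(-1.0 + 0.5 * z))) else 0
    for z in income_z
]

intercept, slope, ll_full = fit_logistic(income_z, has_case)
lr_stat, p_value = lrt_one_df(ll_full, null_loglik(has_case))
print(f"slope = {slope:.2f}, LR statistic = {lr_stat:.1f}, P = {p_value:.2g}")
```

A positive slope means that the log odds of a tract having at least one confirmed case rise with income, which is the direction of the association reported above.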

Fig. 5: COVID-19 diagnosis and socioeconomic factors in the MRSP.

a, Spatial distribution of income per capita of MRSP based on the census tract of residence. BRL, Brazilian reais. NA, not applicable. b, Distribution of household per-capita income based on the census tract of residence for COVID-19 cases and SARI cases with unknown aetiology. The distribution of average per-capita income for MRSP as a whole, weighted by population size, is shown on the left. Per-capita income distributions are presented in box plots, where the horizontal line inside the box represents the median per-capita income level, and the box edges show the first and third quartiles of per-capita income. The whiskers represent the per-capita income range. Epi, epidemiological. c, Posterior mean relative risk of COVID-19 confirmed diagnosis (top) and SARI cases with unknown aetiology (bottom) for epidemiological week 12 (before implementation of NPI in São Paulo state) and weeks 16 and 21 (after implementation of NPI in São Paulo state) (see Methods for details).

We conducted a geospatial analysis to understand the distribution of relative risk of observing a COVID-19 case or SARI case with unknown aetiology in the MRSP, using a Bayesian method and adjusted for spatial and non-spatial effects as defined by the Besag–York–Mollié model19 (Fig. 5). Our estimates show an increase in the relative risk of COVID-19 diagnosis in higher-income census tracts between epidemiological weeks 12 and 21, especially in the central region of the MRSP (Fig. 5a,c). We observed a similar trend in the relative risk of SARI cases with unknown aetiology among residents of the central region. However, there was also an increased probability of SARI cases with unknown aetiology in the southwest, west, north and south sub-regions, where income per capita is typically lower. Overall, the relative risk of SARI cases with unknown aetiology is more spatially widespread in the MRSP than that of confirmed COVID-19 cases (Fig. 5c).
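For readers unfamiliar with the Besag–York–Mollié model, its standard textbook form is sketched below. This is the generic formulation, stated here as an assumption; the exact likelihood, expected counts and priors used for Fig. 5c are those given in the Methods. Here y_i is the number of cases in census tract i, E_i the expected number of cases, θ_i the relative risk, u_i the spatially structured effect, v_i the unstructured effect, j ∼ i denotes tracts j adjacent to i, and n_i is the number of neighbours of tract i.

```latex
\begin{align}
  y_i \mid \theta_i &\sim \mathrm{Poisson}(E_i\,\theta_i), \\
  \log \theta_i &= \alpha + u_i + v_i, \\
  u_i \mid u_{j \neq i} &\sim \mathcal{N}\!\left(\frac{1}{n_i}\sum_{j \sim i} u_j,\ \frac{\sigma_u^2}{n_i}\right), \\
  v_i &\sim \mathcal{N}\!\left(0,\ \sigma_v^2\right).
\end{align}
```

The spatial term u_i pulls each tract's estimate towards the average of its neighbours, whereas v_i absorbs tract-specific variation; this is what "adjusted for spatial and non-spatial effects" refers to.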

The relative risk of SARI cases with unknown aetiology compared with confirmed COVID-19 cases in the central region of the MRSP decreased through time, probably as a response to several NPIs implemented throughout the state of São Paulo (see Supplementary Table 1). By week 16 (1 month after the start of the NPIs in São Paulo), we detected an increased risk particularly of SARI cases with unknown aetiology outside the central region of the MRSP, especially in the southwest region. SARI cases with unknown aetiology risk were also high in the east region. By week 21, the risk remained high throughout the central region and the risk of SARI cases with unknown aetiology decreased in the east region, possibly as a result of interventions targeting the reduction of SARS-CoV-2 transmission.

Demographics and characteristics of COVID-19 hospitalized and fatal cases in Brazil

Analysis of the age–sex structure of 67,180 confirmed COVID-19 cases reported on the SIVEP-Gripe system revealed a high proportion (44,027 (65.5%) of 67,180) of confirmed COVID-19 infections in middle- or older-aged individuals (≥50 years of age) and a lower proportion (1,454 (2.2%) of 67,180) in younger age groups (≤20 years of age) (Fig. 6a). The median age was 59 years (interquartile range = 44–72). The majority (38,654 (57.5%) of 67,180) were male. Similarly, 59% (14,498 of 24,519) of COVID-19 deaths were in men, and 85% (20,916 of 24,519) were in people aged ≥50 years. A total of 2.95% (1,983 of 67,180) cases were reported as nosocomial transmission, defined as a COVID-19 case acquired after hospitalization. Overall, 116 newborns (≤1 month old), 381 infants (≥1–12 months old), 518 children (≥1–12 years old) and 258 adolescents (≥12–17 years of age) were diagnosed with COVID-19. In addition, 740 patients were pregnant (61 in the first trimester, 172 in the second trimester, 447 in the third trimester and 60 with missing gestational age).

Fig. 6: Age–sex structure and clinical features of confirmed COVID-19 cases reported in the SIVEP-Gripe system.

a, Numbers of patients with ongoing COVID-19, or who have recovered or died from the disease, by age and sex. Ongoing cases were those still active on the SIVEP-Gripe database and without a recorded clinical outcome (death or recovered). b, Symptoms, signs and comorbidities of hospitalized individuals with confirmed COVID-19. c, Comorbidities among confirmed COVID-19 cases according to age and outcome (n = 15,720 confirmed COVID-19 cases with complete comorbidity and outcome (death or recovery) information; n = 19,409 confirmed COVID-19 cases with complete information on comorbidities and ICU admission). Horizontal axes show the proportion of patients in each age/outcome stratified for each of the comorbidities recorded.

By 31 May 2020, 91% (67,042 of 73,649) of patients with COVID-19 reported in the SIVEP-Gripe system had been hospitalized. Overall, 30.3% (22,332 of 73,649) were admitted to an intensive care unit (ICU). The median length of ICU stay for patients with COVID-19 was 5 d (interquartile range = 2–10 d; range = 0–65 d), based on the ICU admission and discharge dates of 8,240 confirmed cases. The symptoms most commonly reported by patients with COVID-19 were cough (56,681 (85.2%) of 66,514 without missing data), fever (51,312 (79.6%) of 65,310) and dyspnoea (51,312 (76.6%) of 65,310) (Fig. 6b). These three symptoms comprise part of the case definition of SARI in Brazil. In addition, 68% (40,806 of 60,400) of individuals with COVID-19 were hypoxic (O2 saturation < 95%), reflecting the overall severity of cases reported on SIVEP-Gripe (as shown in Fig. 1). The most prevalent comorbidities were cardiovascular disease (23,085 (66.5%) of 34,693 without missing data) and diabetes (17,271 (54.5%) of 31,672) (Fig. 6b). Among the patients with COVID-19, older age groups tended to have a higher proportion of comorbidities than younger age groups across outcomes (Fig. 6c). The proportions of the general Brazilian population with cardiovascular disease and diabetes are 4.2 and 6.2%, respectively20. A total of 83.7% (17,921 of 21,414 with complete comorbidity information) of individuals with confirmed COVID-19 had at least one comorbidity (see Supplementary Table 2 for information on data completeness).

Discussion

While the COVID-19 epidemic in Brazil continues to grow, details of its transmission potential and clinical and epidemiological characteristics remain poorly understood. We estimate a higher median transmission potential (R0) of SARS-CoV-2 of 3.1 (2.4–5.5) in Brazil compared with Italy, the United Kingdom, France and Spain, which have point estimates of R0 ranging from 2.5 to 2.6; however, the credible intervals overlap substantially. We have also observed rapid spread of COVID-19 through the country, with more populated and better-connected municipalities being affected earlier, and less populated municipalities being affected at a later stage of the epidemic. In the São Paulo metropolitan region, we found a higher risk of diagnosed COVID-19 cases in census tracts with higher per-capita income, both during the early phase of the COVID-19 epidemic and as the weeks progressed. This contrasts with the wider spread of SARI cases among sub-regions with lower per-capita income. Our results provide new insights into the Brazilian COVID-19 epidemic and highlight the high transmission potential of SARS-CoV-2 in the country, the role of its large urban centres, the lack of lockdown, and the challenges in reporting and non-equitable access to testing and diagnostics as factors potentially contributing to the rapid and sustained spread of the epidemic in Brazil.

Recent estimates of R0 at the beginning of the COVID-19 epidemic in Brazil have suggested that an infected individual would infect on average three or four others21. The credible intervals of our estimates broadly overlap with these observations and are lower than previously published estimates for Brazil22. As a comparison, the reproduction number in Peru has been estimated at around 2.3 (2.0–2.5)23. Since the start of the epidemic in Brazil, several types of NPI have been adopted with varied success by the country’s 27 federal units and 5,570 municipalities. Virus transmission seems to have dropped substantially in most affected states21 and also in the city of São Paulo24. However, the estimated reproduction number remains above 1 (refs. 21,24). Thus, only mitigation (and not suppression) of the epidemic has been achieved so far, which has been linked to substantial excess deaths due to the poorer healthcare available25,26. Closer surveillance of viral transmission at local scales and an assessment of the impact of the different control measures on COVID-19 transmission will help to determine an optimal mitigation strategy to minimize infections and reduce healthcare demand in Brazil. Moreover, continued monitoring of the genetic diversity of the virus lineages circulating in Brazil24 will be important, as recent data suggest that virus diversity may play a role in virus transmissibility27,28.

We found that 65.5% of reports in the SIVEP-Gripe system, which includes most severe COVID-19 cases, are from patients ≥50 years of age. This observation is remarkably similar to current estimates for Latin America10, where 65% of the individuals ≥50 years of age have been estimated to be at high risk of severe COVID-19, defined as individuals with at least one condition who would require hospitalization if infected. Moreover, we found that 57 and 59% of the severe COVID-19 cases and deaths (respectively) reported in SIVEP-Gripe were male, and that the most frequent comorbidities were cardiovascular disease and diabetes. Overall, 84% of SIVEP-Gripe reports had at least one underlying condition. Of these, 21% (n = 9,471/45,480) were included in the working age bracket (16–65 years of age). Moreover, only 2.6% (n = 1,892/73,673) of the COVID-19 confirmed cases reported in the SIVEP-Gripe system included occupation information. Information on socioeconomic determinants, as well as occupation and race/ethnicity, is critical29 as it allows the prioritization of control efforts; for example, towards healthcare workers and patients attending hospitals30 or work settings31.

Our data uncover a socioeconomic bias in testing and diagnostics in current surveillance guidelines and suggest that reported confirmed case counts may substantially underestimate the number of cases in the general population, particularly in regions of lower socioeconomic status. Socioeconomic differences are associated with access to healthcare32 and should be taken into account when designing targeted interventions. We found that the proportion of SARI cases with unknown aetiology versus confirmed COVID-19 cases has increased across the entire country (as of 15 June 2020, the number of reported SARI cases with unknown aetiology was nearly twofold greater than the number of confirmed COVID-19 cases). Based on clinical and epidemiological grounds, it is likely that many SARI cases with unknown aetiology are caused by SARS-CoV-2. To rigorously establish the contribution of non-SARS-CoV-2 infections to the SARI cases, we would need additional denominator data to understand the level of testing for these viruses (that is, the negative test results). Our findings with regard to socioeconomic bias are likely to apply to other states and regions of Brazil and highlight the importance of scaling up surveillance and laboratory capacity within Latin America. Indeed, the largest Brazilian serosurvey conducted to date suggests that undetected cases may be seven times higher than reported cases33.

We further show that SARI cases with unknown aetiology are associated with lower socioeconomic status in the MRSP. The socioeconomic disparities observed here were particularly evident at the beginning of the outbreak (Fig. 5b). This can be explained in part by: (1) the high proportion of early cases in returning travellers with higher income and better access to private laboratories for diagnostics; and (2) the more limited access to freely available diagnostic screening. For example, between 25 February and 18 March 2020, two-thirds (586 (66.9%) of 876) of diagnostic tests were performed in private medical laboratories where costs varied typically between 300 and 690 Brazilian Reais (for context, the current minimum monthly salary is 1,045 Brazilian Reais). Thus, the true burden of the epidemic in lower-income neighbourhoods is probably underestimated. In New York City, for example, poorer neighbourhoods have been found to have a higher disease burden, which is driven in part by the movement of essential workers using public transport during the pandemic34. Data-driven analyses are urgently needed to help tackle health inequities during the ongoing epidemic in Brazil. Strategies to evaluate and control transmission should consider differential access to COVID-19 diagnosis for lower-income populations, changes in reporting systems and delays in reporting, which are key to accurately determining rates of epidemic growth35. Innovative infectious disease surveillance approaches such as those obtained from aggregated mobility data, when used properly, could help support public health actions across the COVID-19 epidemic36,37,38,39.

Epidemics of COVID-19 and influenza seem to have occurred simultaneously in Brazil (Fig. 4 and Extended Data Fig. 7) and symptoms overlap between the two infections. We detected co-circulation of eight other respiratory viruses, the most common of which were influenza A and B and human rhinovirus. We also detected multiple co-detections of SARS-CoV-2 with other respiratory viruses, such as influenza A and B and human metapneumovirus, which have also been reported elsewhere40,41. Although co-infections with other respiratory viruses have been reported in other countries40,42,43, no difference in clinical disease severity between cases with and without viral co-infection has been observed thus far44. The co-circulation of other respiratory pathogens highlights the need to scale up laboratory and molecular screening of SARS-CoV-2 and other respiratory viruses in public laboratories across Brazil15. Continued molecular and genomic surveillance will be important to determine patterns of virus transmission and to guide public health measures in forthcoming phases of the epidemic24,45,46,47.

There are several limitations to this study. First, detailed individual-level data were only available for the REDCap and SIVEP-Gripe systems, in which many cases had incomplete documentation, particularly regarding comorbidities. Second, our socioeconomic analysis was based partially on ecological inference, using the per-capita income in the census tract of residence (rather than the actual income of the patients), and assuming the same denominator for each census tract (~300 households). We emphasize that our spatial analysis is prone to methodological constraints caused by ecological fallacy and the modifiable areal unit problem. These constraints are inherent to any spatial analysis of aggregated data. Despite the above-mentioned limitation, census tracts correspond to small areas of analysis, of no more than 300 households but often fewer than that. Social science literature on Brazil not only highlights the country’s socioeconomic inequality but also how it is spatially pronounced. For this reason, census tracts remain a useful tool with which to infer per-capita income in the absence of individual-level data. In addition, our databases were predominantly composed of hospitalized patients with COVID-19, and we were unable to evaluate the rate of hospitalization among the different socioeconomic statuses. In the future, robust modelling of the relationships between socioeconomic factors and disease severity will require a data collection system with detailed information on symptoms/signs and comorbidities both in severe and non-severe cases. Finally, our retrospective study focused predominantly on symptomatic patients who presented or were referred to health services for testing. Therefore, we are unable (and do not attempt) to describe the full spectrum of disease, nor can we describe the full epidemiological picture of this epidemic.

In conclusion, we have provided a comprehensive assessment of COVID-19 reporting and transmission in Brazil. Our findings provide important context for diagnostic screening and healthcare planning, and for future precision studies focusing on the impacts of non-pharmaceutical and pharmaceutical interventions, and the effects of social health determinants on COVID-19 transmission.

Methods

Ethical approval and case definitions

This retrospective national study was supported by the Brazilian Ministry of Health and ethical approval was provided by the national ethical review board (Comissão Nacional de Ética em Pesquisa; protocol number CAAE 30127020.0.0000.0068).

A patient presenting with an acute respiratory syndrome (fever and at least one sign/symptom of respiratory illness) and: (1) a history of travel to a location with community transmission of COVID-19; or (2) contact with a confirmed or probable COVID-19 case in the 14 d preceding symptom onset; or (3) absence of an alternative diagnosis that completely explained the clinical presentation6 was considered to have suspected COVID-19.

Initially, a traveller was suspected to have COVID-19 only when arriving from China, although the definition of suspected cases associated with travel later included Japan, Singapore, South Korea, North Korea, Thailand, Vietnam and Cambodia (21 February 2020), then also Italy, Germany, Australia, the United Arab Emirates, the Philippines, France, Iran and Malaysia (25 February 2020), then also the United States, Canada, Switzerland, the United Kingdom and four additional countries (3 March 2020). From 9 March 2020 onwards, the Ministry of Health decided to start testing all hospitalized patients with severe respiratory symptoms, regardless of their travel history.

Contact with a confirmed or probable COVID-19 case was defined as face-to-face or direct contact with someone known to have COVID-19, or direct contact in a healthcare setting. Moreover, patients reporting travel to an affected country in the preceding 14 d were considered imported cases. Cases not meeting this criterion were considered to be due to local transmission.
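The case definitions above can be read as a small decision rule. The following Python sketch is illustrative only: the `Patient` record and its field names are hypothetical, a single travel flag is used for both the suspected-case criterion and the imported-case rule, and the date-dependent list of affected countries is not encoded.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical field names that mirror the criteria in the text.
    fever: bool
    respiratory_symptom: bool     # at least one respiratory sign/symptom
    travel_to_affected_area: bool  # simplification: one flag for both travel rules
    contact_with_case_14d: bool   # contact with a confirmed/probable case
    alternative_diagnosis: bool   # another diagnosis fully explains the illness


def is_suspected(p: Patient) -> bool:
    """Acute respiratory syndrome plus at least one of criteria (1)-(3)."""
    acute_respiratory_syndrome = p.fever and p.respiratory_symptom
    any_criterion = (
        p.travel_to_affected_area
        or p.contact_with_case_14d
        or not p.alternative_diagnosis
    )
    return acute_respiratory_syndrome and any_criterion


def transmission_origin(p: Patient) -> str:
    """Imported if travel to an affected country was reported, otherwise local."""
    return "imported" if p.travel_to_affected_area else "local"
```

Under this reading, a febrile patient with cough and no alternative diagnosis is suspected even without travel or contact history, and is attributed to local transmission.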

Suspected COVID-19 cases were confirmed by laboratory testing (that is, molecular diagnostics with real-time quantitative PCR), or by clinical epidemiological criteria. The latter classification was used when laboratory testing was inconclusive or unavailable, as recommended by the Brazilian Ministry of Health guidelines dated 6 April 2020 (ref. 48), and by the World Health Organization interim guidance dated 25 March 2020 (ref. 49).

Individual-level reporting of COVID-19 and SARI cases with unknown aetiology from Brazil

To investigate individual-level diagnostic and demographic data, self-reported travel history, place of residence and likely place of infection, differential diagnoses for other respiratory pathogens, as well as clinical details, including comorbidities, we collected three epidemiological data sources: (1) n = 67,344 suspected and n = 1,468 confirmed cases reported to the REDCap database from 25 February to 25 March 2020; (2) n = 73,637 confirmed SIVEP-Gripe cases from 1 March to 31 May 2020 (available at http://shiny.hmg.saude.gov.br/dataset); and (3) n = 514,200 confirmed cases from aggregated data released daily at the Portal do COVID-19 (Brazilian Health Ministry) from 25 February to 31 May 2020 (available at https://covid.saude.gov.br). The SIVEP-Gripe system reports cases of SARI, which can be defined as an acute respiratory infection with onset, within the past 10 d, of fever (≥38 °C) and cough, and typically requires hospitalization (see also Fig. 1a).

Basic reproduction number estimation

We estimated the basic reproduction number (R0) for SARS-CoV-2 using time series of confirmed COVID-19 cases at the national and state (São Paulo, Rio de Janeiro, Ceará and Amazonas) level (Extended Data Fig. 1). To avoid the impact of NPIs on R0 estimates, only data points up to 14 d after the implementation of the strictest interventions were used. As lockdown was not imposed in Brazil, the strictest measure was considered to be the closure of non-essential commerce. For European countries, the date of lockdown was used as the NPI date. NPI dates for Brazilian states were collected from state decrees. For Brazil as a whole, the NPI date for São Paulo state was used, as by that point most states in Brazil had already closed non-essential commerce. For the European countries, lockdown dates were collected from https://www.covid19healthsystem.org/mainpage.aspx.
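As an illustration of the truncation rule, the sketch below keeps only the observations dated up to 14 d after the NPI date. The function name is hypothetical, and treating the cutoff day as inclusive is an assumption; the text does not specify it.

```python
from datetime import date, timedelta


def truncate_series(series, npi_date, window_days=14):
    """Keep (day, count) pairs dated no later than npi_date + window_days.

    `series` is a list of (datetime.date, int) pairs. The inclusive
    cutoff is an assumption made for this sketch.
    """
    cutoff = npi_date + timedelta(days=window_days)
    return [(day, count) for day, count in series if day <= cutoff]


# Made-up dates and counts, for illustration only.
example = [(date(2020, 3, 1), 5), (date(2020, 3, 15), 9), (date(2020, 3, 16), 12)]
kept = truncate_series(example, npi_date=date(2020, 3, 1))
```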

To test the estimation routine and provide international context, this analysis was replicated on equivalent time series from Italy, Spain, France and the United Kingdom. Aggregated epidemiological data from the United States and China were not included due to possible heterogeneity within each country. Daily counts of confirmed cases were modelled with a negative binomial distribution with a mean equal to a fixed portion, ρ, of the total daily number of cases in an exponential model of incidence. The functional form of the incidence model is \(\rho R_{0} \gamma i_{0}{\rm{e}}^{(R_{0} - 1)\gamma t}\), where ρ is the probability of an infection being counted in the time series, R0 is the basic reproduction number, γ is the rate at which individuals cease to be infectious, and i0 is the proportion of the population that was infectious at the start of the observations. We assume that the observed number of cases on day n was drawn from a negative binomial distribution with mean μ(n) and variance \(\sigma^{2} = \mu + \mu^{2}/k\), with fixed size parameter k (dispersion parameter). Since the probability of being observed and the initial condition appear in the likelihood only as the product ρi0, there is an identifiability problem preventing the estimation of ρ and i0 individually, and consequently we consider only their product, denoted ξ. Although in this model it is theoretically possible to estimate both R0 and γ, in practice this is difficult, so we use an informative prior to constrain γ to a priori plausible values. The factor of ρR0γ accounts for the partial observation of the incidence. In this analysis, the delay between infection and reporting was not accounted for.

Because ρ and i0 enter the model only through the single parameter ξ, the prevalence cannot be estimated without additional information to inform either i0 or ρ. The analysis was carried out in a Bayesian framework with an uninformative prior distribution on R0 and an informative prior on the removal rate. All other parameters had weakly informative prior distributions (see Supplementary Information). The informative prior ensures that an individual is infectious for an average of 5–14 d (ref. 50) (Supplementary Information and Figs. 5 and 6). Standard diagnostics were used to check whether the Markov chain Monte Carlo samples were satisfactory. Full details of the model used, the estimation process and convergence of Markov chain Monte Carlo chains can be found in the Supplementary Information.
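The observation model described in this section can be sketched as a log-likelihood function. This is a minimal illustration and not the authors' implementation (which is detailed in the Supplementary Information). The function names are hypothetical, the `population` scaling that converts proportions into counts is an assumption, and the priors and Markov chain Monte Carlo sampling are omitted.

```python
import math


def expected_observed_incidence(t, r0, gamma, xi):
    """Mean observed incidence xi * R0 * gamma * exp((R0 - 1) * gamma * t),
    where xi is the product of rho and i0."""
    return xi * r0 * gamma * math.exp((r0 - 1.0) * gamma * t)


def nb_log_pmf(y, mu, k):
    """Log-probability of count y under a negative binomial distribution with
    mean mu > 0 and size k > 0, so that the variance is mu + mu**2 / k."""
    return (
        math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
        + k * math.log(k / (k + mu))
        + y * math.log(mu / (k + mu))
    )


def log_likelihood(counts, r0, gamma, xi, k, population=1.0):
    """Sum of daily log-probabilities; counts[t] is the count on day t.

    `population` rescales the proportion-based mean to counts (an
    assumption of this sketch).
    """
    total = 0.0
    for t, y in enumerate(counts):
        mu = population * expected_observed_incidence(t, r0, gamma, xi)
        total += nb_log_pmf(y, mu, k)
    return total
```

Because the function takes only ξ as an argument, any two pairs (ρ, i0) with the same product give the same likelihood, which is the identifiability issue noted in the text.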

Geospatial analysis of COVID-19 cases and socioeconomic status

The average household per-capita income for the MRSP was retrieved at the census tract level from the 2010 census (https://censo2010.ibge.gov.br/). We geocoded 24,063 COVID-19 cases and 32,914 SARI cases with unknown aetiology from MRSP, which were reported until 28 May 2020. The geocoding was based on self-reported residential addresses or postal codes using the Galileo algorithm51 and coordinates were confirmed using Google API.

To elucidate the distribution of COVID-19 cases and SARI cases with unknown aetiology, we mapped the mean relative risk of COVID-19 and SARI with unknown aetiology at the census tract level for MRSP for three epidemiological weeks (12, 16 and 21) (Extended Data Fig. 9). Because the observation process acts as a confounder, we cannot, without additional assumptions (for example, covariates), disentangle an increase in prevalence from an increase in case ascertainment. The cumulative number of cases in each tract was modelled as a Poisson random variable with a mean specified by the expected number of cases under a null model adjusted by tract-specific risk due to spatial and non-spatial effects: the Besag–York–Mollié model19. Estimates of the risk of COVID-19 diagnosis or SARI cases with unknown aetiology were obtained using approximate Bayesian methods (integrated nested Laplace approximation). A complete specification of the model and the computational methodology can be found in the Supplementary Information.
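For orientation, the Besag–York–Mollié model is commonly written in the following form. The notation is introduced here for illustration and may differ from the parameterization used in the Supplementary Information: y_i is the cumulative case count in tract i, E_i is the expected count under the null model, θ_i is the relative risk, u_i is the spatially structured effect, v_i is the unstructured effect, and m_i is the number of tracts neighbouring tract i.

```latex
\begin{aligned}
y_i \mid \theta_i &\sim \mathrm{Poisson}(E_i\,\theta_i), \\
\log \theta_i &= \alpha + u_i + v_i, \\
u_i \mid u_{j \neq i} &\sim \mathcal{N}\!\left(\frac{1}{m_i}\sum_{j \sim i} u_j,\ \frac{\sigma_u^{2}}{m_i}\right), \\
v_i &\sim \mathcal{N}\!\left(0,\ \sigma_v^{2}\right).
\end{aligned}
```

Here j ∼ i denotes tracts adjacent to tract i. A value θ_i > 1 indicates more cases than expected under the null model.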

The association between final diagnostic category (COVID-19 or SARI with unknown aetiology) and socioeconomic status in the subset of cases in the MRSP with geocoded residential information was evaluated using logistic regression models. We focused on the cases in epidemiological weeks 12, 16 and 22. Within each of those weeks, if a census tract reported any COVID-19 case or SARI case with unknown aetiology, we calculated the proportion of those cases that were COVID-19. Since most census tracts reported only one case each week, the proportion of COVID-19 cases for each census tract was mostly either 0 or 1 in a given week. For this reason, we defined two categories: (1) the census tract only reported SARI of unknown aetiology (that is, no COVID-19 cases); or (2) the census tract reported at least one COVID-19 case during the week. We used these two categories as the binary response, and applied logistic regression models to investigate whether income per capita was associated with this response. The analyses were adjusted for the logarithm of the population sizes and the longitude and latitude coordinates of the census tracts. The analyses were performed individually for each of epidemiological weeks 12, 16 and 22. Further analysis details can be found in the Supplementary Information.
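The construction of the binary response can be sketched as follows. The function name, the tuple layout and the diagnosis labels are hypothetical, and the logistic regression itself (income per capita as the predictor, with log population and tract coordinates as adjustment terms) is not shown.

```python
def tract_binary_response(cases, week):
    """Return {tract_id: 0 or 1} for tracts reporting at least one case in `week`.

    1 means the tract reported at least one COVID-19 case that week;
    0 means it reported only SARI of unknown aetiology.
    `cases` is an iterable of (tract_id, epi_week, diagnosis) tuples, where
    diagnosis is "covid" or "sari_unknown" (hypothetical labels).
    """
    response = {}
    for tract_id, epi_week, diagnosis in cases:
        if epi_week != week:
            continue
        has_covid = diagnosis == "covid"
        # Once a tract has any COVID-19 case, its response stays at 1.
        response[tract_id] = int(response.get(tract_id, 0) or has_covid)
    return response
```

Tracts with no reported case in the chosen week do not appear in the result, which matches the text's restriction to tracts that reported at least one case.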

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.