Main

The scales at which ecosystems are observed play a critical role in shaping our understanding of their structure and function1,2,3. Ecological patterns emerge from temporal and spatial domains that may be coarser or finer than the processes that shape them, which means that investigation across multiple scales is essential for understanding ecological phenomena1,4. This awareness has grown rapidly since the 1980s5, accelerated by the need to understand how changes in the global climate, ocean and land systems are affecting everything from individual populations6 to entire biomes7, while technological advances in areas such as remote sensing and genetics are making it ever-easier to quantify ecological features across a broad and increasing range of scales2,5.

Given the growing awareness of scale, expanding data-gathering capabilities and the fact that the most comprehensive (and arguably best-known) meta-analyses8,9 of ecological research scales were published nearly 30 years ago (but see refs 4,10 for more recent reviews), it is both timely and important to assess the scales of contemporary ecological investigation. To address this need, we quantified the spatial and temporal domains of empirical observations that were reported within recently (2004–2014) published ecological studies. We define domain as the distribution of observations within the spectrum of one or more scale dimensions (note: this definition differs from the ‘domain of scale’3, which is ‘a portion of the scale spectrum within which process–pattern relationships are consistent regardless of scale’), and empirical observations as ecological observations collected under uncontrolled or non-manipulated conditions. Empirical observations are critical for developing and testing the models that explain why ecological patterns vary in time and space1,8; therefore, the spatio-temporal domains of observations provide an important indicator of the field’s progress towards achieving a holistic, predictive understanding of ecosystems1,2.

Our study focused on two dimensions of spatial scale (that is, resolution (grain) and extent) and two of temporal scale (that is, interval and duration) (Table 1). We analysed the observational domains within each of these four dimensions and between pairs of these dimensions. We also assessed two additional dimensions—actual extent (the summed area of spatial replicates) and actual duration (the summed observational time of temporal replicates)—which we used to evaluate how much the actual scales of observation (that is, how much space and time are covered by the measurement) differ from the scales they ostensibly represent. These differences may impact how effectively observations characterize ecological phenomena. For one, an increasing gap between actual and ostensible observational scales implies greater interpolation or extrapolation of observed measurements, raising the odds of over-leveraging data. Furthermore, since natural systems are frequently complex, nonlinear and non-random11,12,13, a larger gap increases the likelihood of data challenges such as censoring (sensu14) as phenomena may resolve themselves in the space or time between replicates.

Table 1 Scale dimensions of ecological observations assessed in this meta-analysis

Results

We reviewed 348 papers randomly selected from 42,918 published between 2004 and 2014 in the top 30 ecology-themed journals. We extracted scale data from 378 observations of ‘natural’ (that is, non-experimentally manipulated) ecological features reported within 133 of the reviewed papers (plus an additional 62 cited as the source of observations). Most sampled observations were collected using conventional field methods (80%), followed by automated in situ sensing techniques (12.4%), remote sensing (6.9%) and palaeo-reconstruction (<0.8%).

Observational domains within individual dimensions

In terms of resolution, spatial replicates for most (67%) observations were ≤1 m2, 24% were 1 m2 up to 1 ha, and 9% were >1 ha (Fig. 1a). These distributions primarily reflect those of field observations, the dominant observational methodology. Automated sensing and palaeo-reconstruction observations had resolutions that were generally finer (85% or more ≤0.1 m2) than field observations (47% ≤0.1 m2), while most remote observations were much coarser (70% >100 m2; Supplementary Fig. 1).

Fig. 1: Observational domains within individual dimensions.

a–d, Histograms of the resolution (a), extent (b), interval (c) and duration (d) of observations collected from the surveyed ecological studies. Bars represent the average percentages for each bin realized after 1,000 perturbed resamples, while the grey error bars indicate 95% confidence intervals. The bar widths in c and d indicate differences in scale between the x axis labels. The grey vertical line in d indicates that the majority (>95%) of observations of ≤1 day duration were temporally unreplicated. kyr, thousand years.

The extent of 19% of the observations was ≤10 ha, 23% covered 10–1,000 ha, 11% covered 1,000–10,000 ha, 19% covered 10,000–100,000 ha, 12% covered 100,000–1,000,000 ha and 15% covered >1,000,000 ha (Fig. 1b). As with resolution, the extent covered by automated sensing methods tended to be smaller (52% ≤100 ha) than those of field observations (31% ≤100 ha), while 96% of remote and all palaeo-reconstruction observations covered areas >10,000 ha.

In the temporal dimensions, 37% of observations were not repeated (Fig. 1c), 17% were repeated at short intervals (sub-second to daily), 20% were repeated at daily to monthly intervals, 18% were repeated at monthly to yearly intervals, 6% were repeated at yearly to decadal intervals and 2% were repeated at decadal or greater intervals. Among temporally replicated observations (Supplementary Fig. 1), automated sensing had the finest intervals (61% ≤1 day and 100% ≤1 year), followed by remote observation (37% ≤1 day and 78% ≤1 year), field methods (17% ≤1 day and 86% ≤1 year) and palaeo-reconstructions (21% ≤1 decade).

The duration was ≤1 day for 31% of the sampled observations (due to lack of temporal replication), while 10% covered 1 day to 1 month, 23% covered 1 month to 1 year, 27% covered 1–10 years and 9% covered >1 decade (Fig. 1d). Palaeo-reconstructions naturally had the longest duration (67% >1 decade), while only 40% of field, automated and remote observations had durations exceeding 1 year.

Observational domains within two dimensions

Contrasting resolution with interval revealed that most temporally replicated observations had resolutions of 10 cm2 to 1 m2 and were revisited at daily to yearly intervals (Fig. 2a). A less dense, oblong concentration of observations bounded on the upper left by monthly to yearly observations at 100 m2 resolution and on the lower right by near-daily to monthly observations with 1–10 ha resolution is also evident. The four observational methods had substantially different domains, as indicated by the locations of their median values (see Supplementary Fig. 2): the median domain of field observations had 0.1–1 m2 resolution and a monthly interval, whereas remote observations had a coarser median resolution (1,000 m2) but finer median interval (1 day). Palaeo-reconstructions and automated sensing were both finely resolved (median between 10 cm2 and 0.01 m2), but automated approaches had an hourly to daily median interval compared with a multi-decadal interval for palaeo-reconstructions.

Fig. 2: Observational domains within two dimensions.

a–d, Kernel density estimates of observational densities within the domains defined by: resolution and interval (of temporally replicated observations) (a), duration and interval (of temporally replicated observations) (b), resolution and extent (c), and duration and extent (d). Density estimates were applied to the log-transformed values of each observational dimension, and density estimates were rescaled to represent percentages. The letters in the plots denote the median values of different observational methods (a, automated sensing; f, field observations; p, palaeo-observations; r, remote sensing). The grey shaded areas represent physically impossible domains (intervals greater than duration and resolutions greater than extent). Density values below the lower third percentile fall within the darkest portion of the colour gradient. The grey vertical line in d indicates that the majority (>95%) of observations of ≤1 day duration were temporally unreplicated.

Comparing the interval and duration of temporally replicated observations showed that most observations had daily to decadal intervals and durations of one month to one decade (Fig. 2b). Interval appears to increase with duration; observations lasting one month to one year tend to have daily to monthly intervals, while those lasting one year to one decade tend to have yearly to decadal intervals. This tendency is reflected in the domain medians of the primary observational methods: automated sensing had the finest median interval (hour–day) and shortest duration (month–year), followed by remote sensing (~1 day and 1 year, respectively), field observations (1 month and ~1 year, respectively) and finally palaeo-reconstructions (1 decade and millennium, respectively).

Contrasting the two spatial dimensions shows a primary concentration of observations of 10 cm2 to nearly 100 m2 resolution with extents ranging between 1,000 and 1,000,000 ha (Fig. 2c). Another prominent concentration consists of higher-resolution (1 cm2 to 1 m2), smaller-extent (10–1,000 ha) observations, beneath which lies a third, fainter concentration of 1–1,000 cm2 resolution and 1,000 m2 to <10 ha extent. These three concentrations suggest that observational extent increases with resolution, which is further evident in the median domain values (and kernel densities; Supplementary Fig. 2) of automated (0.01 m2 resolution, 100 ha extent), field (0.1–1 m2 resolution, 1,000–10,000 ha extent) and remote (1,000 m2 resolution, 1–10 million ha extent) observations. Palaeo-reconstructions were the exception, having very fine median resolution (0.01 m2) but large extent (1 million ha)—a possible artefact of small sample size.

There are two primary observational domains within the contrast between duration and extent. The first consists of observations lasting 1 month to 1 decade with extents of 10–1,000 ha, while the second is defined by observations of 1 year to several decades that cover 10,000–1,000,000 ha (Fig. 2d). Three other notable but lesser concentrations are also evident, including small-area observations (0.1–1 ha) covering 1 month to 1 decade, and short-duration, temporally unreplicated observations (≤1 day) of either 1–100 ha or 10,000–1,000,000 ha. The median observation from automated sensing (1 year duration, 100 ha extent) lies near the centre of the first major concentration, while the median extents of field (1,000–10,000 ha) and remote (1–10 million ha) observations bound the second major concentration at its upper and lower extents, with the median duration of both observational types falling between 1 month and 1 year.

Differences between actual and ostensible scales

Observational extent was on average 5.6 orders of magnitude larger than actual extent (Fig. 3a). This difference increased with extent, reaching a maximum of 8.3 between 100 million and 1 billion ha of extent, then falling to 3 orders of magnitude between 1 and 10 billion ha (these extents comprised <2% of observations, which were primarily collected with remote sensing). Remote observations had the smallest mean difference magnitude (1.9), compared with ≥5.7 for the other three methods (Supplementary Fig. 3).

Fig. 3: The magnitudes of the difference between actual and ostensible scales.

a,b, Difference between extent and actual extent (the summed area of spatial replicates) (a), and duration and actual duration (the summed sampling duration across temporal replicates) (b). Difference values are expressed in terms of how many orders of magnitude larger (longer) extent (duration) is than actual extent (actual duration), and are summarized (as box plots, with the circle in the box representing the mean and the line the median) in bins representing the increasing scales of extent (duration). The percentages of observations falling within each bin are indicated by the colour of the interquartile and the numerical value above the upper whisker.

The difference magnitudes between observational duration and actual duration were somewhat smaller, averaging 3.4 and ranging from ~2 for the shortest durations (hour–day) to >4 for observations lasting 1 decade to 1 century (Fig. 3b). As with extent, the difference fell substantially for the longest durations (century to millennia), as these domains were covered by palaeo-reconstructions (Supplementary Fig. 3), which show little difference between actual and ostensible duration because coring techniques capture continuous temporal records. The mean difference magnitudes for the other three observational methods ranged from just over 3 (field and automated sensing) to nearly 6 (remote observations).

Potential biases and uncertainties in quantifying scales

Our results were potentially influenced by several methodological issues. First, most studies did not precisely report observational scales, thus we had to estimate, rather than simply record, scale values for most observations (we estimated 63, 60, 69, 36, 64 and 83% of resolution, extent, actual extent, interval, duration and actual duration values, respectively). Estimation errors may therefore have biased our findings. We attempted to quantify and account for this error by assessing between-observer variability and incorporating this uncertainty into our resampling methodology (Supplementary Results). The resulting confidence intervals (Fig. 1) suggest that estimation errors did not unduly influence our findings.

Our scale-estimation protocols may also have introduced bias, particularly our protocol for estimating resolution (the smallest areal unit of complete measurement). We selected this definition for the sake of consistency, but some papers reported resolution as a larger area in which sub-samples were taken. For these, our estimates were finer than what the studies’ authors considered to be the resolution. Our results would also be somewhat different if we had included observations from experiments. For example, average resolution and duration would probably be finer8,9. Additionally, the token one second (Supplementary Methods) we used to represent the duration of remotely sensed temporal replicates (which are effectively instantaneous) caused us to underestimate the differences between their durations and actual durations (Supplementary Fig. 3). However, the relatively small number of remote observations suggests that the impact of this bias on our overall findings was negligible.

It is also possible that our findings misrepresent observational domains because of sampling error. Although we randomized our sample to ensure representativeness, we reviewed just 0.8% of the papers published during our study period. Our sample may therefore under- or over-represent observational coverage in certain domains, particularly for specific methods. This possibility is greatest for palaeo-reconstructions, where the small sample size probably resulted in an overestimate of typical observational extent (for example, Fig. 2b,c; however, the interval and duration values are probably more representative).

Finally, our omission of papers published after 2014 could also have biased our findings. Although our sample size was too small to assign statistical significance, we found a possible positive trend in the use of remote observations and a corresponding decline in field observations over the course of our study period. If these trends were not spurious, they suggest that including studies from 2015–2017 would result in a somewhat larger relative sample of remote observations, which could slightly increase the mean observational extent (see Supplementary Results).

Discussion

Our results suggest that modern ecology’s observational domains are fairly narrow and that ecologists still primarily rely on conventional field-based observational techniques. In the spatial dimensions, most observations have resolutions ≤1 m2 and extents ≤10,000 ha (Fig. 1a,b). In the temporal dimensions, most observations are either unreplicated or relatively infrequent (>1 month interval; Fig. 1c), and have relatively short durations (≤1 year; Fig. 1d).

Contrasting observational dimensions reveals that larger extents are associated with larger spatial replicates (Fig. 2c), while longer durations are associated with longer intervals (Fig. 2b). The latter association reflects a cost-imposed tradeoff between sampling frequency and temporal duration that is characteristic of field observations, but also appears to affect the other three methods, as evidenced by their relative domain locations. A similar tradeoff is illustrated by the inverse relationship between resolution and interval (Fig. 2a), which primarily relates to field observations, where larger spatial replicates demand greater effort, reducing sampling frequency9. Less obvious is the opposite tradeoff that affects remote observation (Supplementary Fig. 2), where finer resolution (necessary for detail) typically necessitates longer intervals15.

As a result of these tradeoffs, there are several notable observational gaps, specifically within the domains defined by high-frequency (daily to sub-daily intervals) observations with high to moderate resolutions (>1 m2 to 100 ha; Fig. 2a) and decadal or longer durations (Fig. 2b). Another gap is evident in the high-to-moderate-resolution, large-extent (1 million to 10 billion ha) domain (Fig. 2c).

Have these domains changed since the seminal papers on scale first appeared in the late 1980s?1,3,8 A comprehensive answer would require a similar analysis focused on earlier literature, but the data provided by three previous studies provide partial insight. The first dataset consists of duration values that ref. 8 extracted from 623 studies published in Ecology between 1977 and 1987. The mean duration of the most comparable subset of those values (n = 419) was 3.6 years, versus 3.3 years in our sample (or 5.1 years when excluding temporally unreplicated observations). The second dataset is found in ref. 9, which assessed the resolutions of 97 community ecology experiments published in Ecology between 1980 and 1986. The average of those (12,657 m2) was substantially smaller than the mean of our sample (1,479,465 m2), but comparing the eightieth percentile value (197 m2) in ref. 9 with ours (115 m2) shows that most contemporary observations are finer-grained than most 1980s-era experiments. Ref. 10 provides the third dataset, which compares the extent and interval of 25 studies published in 2003–2004 in Ecology. The mean interval was 178 days, compared with 684 days in our sample, but the eightieth percentile value in our study was 169 days compared with 329 days in theirs. Extent in our sample was substantially larger according to multiple summary statistics, including the mean (368,403 ha versus 114,965,072 ha in our study), median (9 ha versus 5,051 ha) and ninetieth percentile (136,000 ha versus 46,424,808 ha; this value is smaller than the mean, which is skewed by a small number of very-large-extent observations).

Although limited due to methodological differences (for example, a focus on experiments versus unmanipulated systems), these comparisons suggest that the duration and resolution of ecological observations have changed little in the past 30 years, but observational frequency and extent have both increased. A weak positive trend in our data also suggests that the mean extent of ecological observations is steadily increasing (Supplementary Fig. 5), which probably corresponds to increasing use of remote sensing (Supplementary Fig. 4).

Despite this apparent increase in observational extent, there remains a large gulf between the areas that ecologists actually observe and the areas their observations are intended to represent (Fig. 3a). A substantial discrepancy also exists between the amount of time spent observing phenomena and the time spans those observations theoretically represent (Fig. 3b). These differences between the actual and ostensible scales of observation have implications for ecological understanding, as the unobserved portions of space and time may contain important patterns and processes that are not captured by replicates, due to phenomenon-dependent factors such as autocorrelation and representativeness of the sampling scheme16,17,18,19,20. Brief, infrequent snapshots, or fine-grained, spatially sparse replicates, may be sufficient to characterize many phenomena (for example, annual changes in tree cover are well-represented by low-frequency satellite imaging21), but may be inadequate for more dynamic phenomena. For example, wildfire extent and duration can be mapped by daily return satellites22,23, but the instantaneous nature of the imaging means that they cannot be used to observe fire behaviour24. To capture such behaviour, long periods of continuous observation may be more important than frequent repeats for understanding the dynamics.

It is therefore important to examine whether the scales of the phenomena being observed are adequately captured by the design of replicates. Our methods suggest one possible procedure for assessing the scale representativeness of observations, which is to (1) calculate the autocorrelation (spatial or temporal) within the observations (for example, using a semi-variogram), (2) find the threshold distance (or time) below which a suitably strong correlation (for example, r = 0.7) will exist between neighbouring sampled values, (3) add that distance (or time) to the sample resolution (or duration) and (4) recalculate actual extent (or duration) using the adjusted resolution (or sampling duration). The difference between this autocorrelation-adjusted actual extent (or actual duration) and extent (or duration) may provide a useful additional measure of how well the replicates represent the intended scale of observation. Although increasing spatial or temporal coverage may not always be the goal of a study, if the gap between actual and ostensible values remains large, alternative sampling methods may be used to close it. For example, remote sensing provides wall-to-wall spatial coverage of a study area, erasing the difference between actual extent and extent. Furthermore, the interval of high-resolution imaging (higher resolution is preferred in images as it allows individual features to be better discerned25,26) is now approaching daily to sub-daily scales27,28, allowing improved representation of spatial and temporal dynamics. For phenomena that cannot be measured from space—either because they are not visible or because they require continuous observation—new approaches for collecting in situ or near-surface observations (for example, low-cost wireless sensors10,29,30, citizen observers31 and autonomous vehicles32) can be used to increase the spatial and temporal coverage of observations.
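The four-step procedure above can be sketched for the temporal case as follows. This is a minimal illustration under stated assumptions, not a tested protocol: it swaps the semi-variogram for a lag-1 Pearson correlation and assumes that correlation decays exponentially with time lag, so that the threshold time can be interpolated between sampling intervals. The function names (lag1_correlation, threshold_time, adjusted_actual_duration) and the example values are hypothetical.

```python
import math


def lag1_correlation(series):
    """Pearson correlation between each value and its successor (step 1)."""
    x, y = series[:-1], series[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def threshold_time(r1, interval, r_min=0.7):
    """Time lag below which correlation stays >= r_min (step 2).

    Assumption: correlation decays as r(h) = r1 ** (h / interval).
    """
    if r1 <= 0:
        return 0.0
    if r1 >= 1:
        return math.inf
    return interval * math.log(r_min) / math.log(r1)


def adjusted_actual_duration(duration, sampling_duration, n_replicates, threshold):
    """Add the threshold to the sampling duration and recompute actual
    duration (steps 3 and 4), capped at the ostensible duration."""
    return min(duration, (sampling_duration + threshold) * n_replicates)


# Hypothetical record: 10 replicates of 1 h each, spread over 1,000 h,
# with a 9 h correlation threshold.
adjusted = adjusted_actual_duration(1000.0, 1.0, 10, 9.0)
gap = math.log10(1000.0 / adjusted)  # remaining orders of magnitude
```

The spatial case would follow the same pattern, although how a threshold distance should be added to an areal resolution would need to be specified first.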

The aforementioned insights regarding modern observational domains must be tempered by the uncertainty within our own scale estimates, as detailed above. However, most of this uncertainty is attributable to unclear reporting of scale values in the majority of papers we reviewed (a problem also noted in geography studies33). This tendency towards vague documentation offers one final insight: despite decades of accumulated knowledge regarding its importance1,2,3,34, scale appears to remain a low priority throughout much of the ecological discipline. Beyond contributing to the broader problem of scientific reproducibility35, inattentiveness to scale increases the risk that observations inadequately represent the phenomenon of interest, thereby limiting the generalizability of any derived ecological knowledge3,33,34. To mitigate this problem, we recommend that ecological journals require authors to quantify and clearly report the values of resolution, extent, interval and duration. Fortunately, some journals already appear to be implementing such policies. For example, Global Ecology and Biogeography now requires information on the spatial, temporal and taxonomic scale of studies to be in the abstract (a policy adopted in early 2016).

Looking forwards

Our study suggests that the concept of scale has yet to fully permeate the discipline of ecology. Evidence for this assertion lies in the continued narrowness of ecology’s observational scale domains and the poor documentation of scale dimensions in the literature. However, the increasing extent of ecological observations, enabled by remote sensing and presumably motivated by many ecologists’ appreciation of scale-related issues, suggests that ecology’s scale domains are gradually changing. In the coming years, accelerating gains in technology and analytical methods will give researchers unprecedented capabilities to peer into, and thus close, the prominent holes in observational domains. A renewed, discipline-wide focus on scale’s importance, including the adoption of stricter scale-reporting standards by journals, will help to spur ecologists to address these gaps, while fostering the improved transferability of knowledge within the discipline.

Methods

Paper selection and review

We used the 2012 Web of Science impact factors to select the 30 highest-ranked ecology-themed journals that published studies with an observational component, excluding journals devoted to reviews, meta-analyses, or laboratory, cellular or experimental studies. To select a representative sample of recent ecology studies, we downloaded the metadata for all papers published in the selected journals (Supplementary Table 1) between 2004 and 2014. Our study involved 6 different observers (those reviewing the papers to extract the observational scales), each of whom was given a randomly selected batch of 500 titles. A separate set of 20 papers was also randomly selected and given to all observers to review independently. This was to (1) calibrate the interpretations and extraction of scale-related information between observers and (2) estimate between-observer variance.

Each observer first reviewed the papers in the calibration set and then commenced reviewing papers in their individual random draws, beginning at the top of the list and then proceeding until at least 20 eligible papers describing ecological observations were reviewed. In cases where the reviewed papers used observations that were described in another publication, we reviewed those source papers to extract the observational dimensions. We excluded papers that were opinion or perspectives pieces (unless they presented or used existing observational data), or theoretical studies based on generated data. We also did not collect scale information from papers (or the relevant parts of papers) describing experimental manipulations because experiments tend to be of limited extent, duration and resolution due to their higher logistical costs8,9. Including data from experiments would therefore probably have biased our findings towards finer scales, while minimizing the impact that new observing methods (for example, satellite imaging and wireless sensing) may have had in expanding the scales of ecological investigation10,36,37. A bibliography of the reviewed papers appears in the Supplementary Information.

Estimating observational scales

We recorded six primary dimensions of ecological observations—three related to space and three related to time. The space-related dimensions were resolution, extent and actual extent. Here, extent was primarily defined as the area falling within a perimeter defined by the outermost spatial replicates, while actual extent was the summed area of all spatial replicates (that is, N × resolution, where N is the number of spatial replicates, which we also recorded), or the area that ecologists observe in practice. In assessing spatial scales, our analysis only considered the Cartesian plane; we did not calculate the z (or depth) dimension, although this dimension is of greater importance for certain sub-disciplines of ecology (for example, depth profiles in marine ecology). In some cases (primarily palaeoecological studies), values extracted from the z dimension provided temporal information that was used to calculate both the interval and the duration of the observation.

For time dimensions, we extracted information related to the observational interval, duration and actual duration. Duration was defined as the time between the first and last temporal replicate, whereas actual duration quantifies the amount of time spent observing a particular location, which we calculated by multiplying the sampling duration (the time spent collecting a single temporal replicate) by the number of temporal replicates.
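The two 'actual' dimensions defined above reduce to simple products, which can be sketched as follows. The function names and the example survey are hypothetical and serve only to illustrate the arithmetic; units are whatever the caller supplies, provided they are consistent.

```python
def actual_extent(resolution, n_spatial_replicates):
    """Summed area of all spatial replicates (N x resolution)."""
    return resolution * n_spatial_replicates


def actual_duration(sampling_duration, n_temporal_replicates):
    """Total time spent observing: sampling duration x number of temporal replicates."""
    return sampling_duration * n_temporal_replicates


# Hypothetical survey: 50 quadrats of 1 m2, each visited 12 times for 30 minutes.
print(actual_extent(1.0, 50))        # area in m2
print(actual_duration(30 * 60, 12))  # time in seconds
```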

A full definition of all dimensions and how they were recorded is contained within a list of frequently asked questions (see Supplementary Methods), which was provided to each observer for initial study and reference, and adapted as necessary during the course of the study to ensure methodological consistency.

To account for potential differences in scales related to methodology, we classified each observation according to the following broad categories: field methods (manual in situ data collection), automated (in situ) sensing, remote sensing/other geographic data (hereafter remote observations) and palaeo-reconstruction approaches. We also recorded whether any scale value for an observation was unclear or missing in the study that reported it.

Calibration and consistency

Most studies did not explicitly report values for all the assessed scales, and thus interpretation and judgement had to be applied to develop reasonable estimates for their values. The frequently asked questions (Supplementary Methods) provided the protocol we followed, and were initially developed following consultation between observers before reviewing commenced. We conducted an iterative process of calibration to ensure consistency and reliability of the estimates. First, we used the calibration set to calculate between-observer variability with respect to paper selection/rejection and the estimation of scales. Based on this, the lead author reviewed individual records in each observer’s calibration set, flagged values where the estimation procedure departed from the protocol and returned these to observers for re-estimation without providing an estimate of the actual value. Instead, the relevant section of the protocol was highlighted, and further explanation and clarifying discussion were undertaken as needed. The protocol language was adjusted for clarity during this process, and new items were added to cover circumstances that had not been addressed by the initial version. The variability measures were recalculated after each iteration.

To ensure consistency within the main analysis, the lead author also reviewed each observer’s results from their individual draw of papers and flagged values that appeared to deviate from the protocol for re-review by the observer. Revised values were re-inspected, and in some cases a secondary review of particular papers was undertaken to cross-check the estimated scales.

Scale-estimation uncertainty

Two major and related sources of uncertainty affected our estimation of observational scales: (1) unclear documentation of observational scales in the reviewed studies; and (2) variation between observers in estimating observational scales (largely in cases where scales were not explicitly reported). To account for these uncertainties, we first quantified the between-observer variability in scale estimates (expressed as the coefficient of variation), which was constructed from each observer’s final reported calibration set results. We then used the coefficients of variation for each dimension as the basis for randomly perturbing, over the course of 1,000 iterations, the scale values for each of the sampled observations. For each observational dimension at each iteration, we perturbed its observer-estimated scale value by: (1) randomly selecting (from a uniform distribution) a percentage value p that fell between 100 + y and 100 − y (where y was the dimension-specific coefficient of variation, expressed as a percentage) and (2) multiplying the scale value by the corresponding proportion (p / 100). The perturbation occasionally resulted in physically impossible values (for example, interval or actual duration longer than duration, or actual extent larger than extent). In these cases, we capped the perturbed value in the smaller of the two dimensions (that is, resolution or interval) so that it equalled the corresponding value in the larger (that is, extent or duration). We used the resulting set of perturbed observations to quantify uncertainty within our scale estimates.
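The perturbation and capping steps can be sketched as below. This is an illustrative reading of the description, not the analysis code: the function names, the fixed random seed and the coefficients of variation (25% and 20%) are assumptions chosen for the example.

```python
import random


def perturb(value, cv_percent, rng):
    """Scale value by p/100, with p drawn uniformly from [100 - y, 100 + y]."""
    p = rng.uniform(100 - cv_percent, 100 + cv_percent)
    return value * p / 100


def perturb_pair(smaller, larger, cv_smaller, cv_larger, rng):
    """Perturb a nested pair (for example, interval and duration), then cap the
    smaller dimension so that it never exceeds the larger one."""
    s = perturb(smaller, cv_smaller, rng)
    l = perturb(larger, cv_larger, rng)
    return min(s, l), l


rng = random.Random(42)  # fixed seed so the sketch is repeatable
# Hypothetical observation: 300-day interval, 365-day duration, CVs of 25% and 20%.
ensemble = [perturb_pair(300.0, 365.0, 25.0, 20.0, rng) for _ in range(1000)]
```

Each element of ensemble is one perturbed (interval, duration) pair; repeating this for every observation and dimension would yield a 1,000-member ensemble of the kind described above.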

In addition to the scale-estimation coefficient of variation, we also examined how well observers agreed regarding paper inclusion/exclusion, and how many extractable observations there were per included paper (see Supplementary Results).

Analyses

To characterize the scale domains of observations, we first log-transformed (base-10) the scale values within the 1,000-member perturbed ensemble to account for the large range in values. To examine the distributions of observational scales within individual dimensions (Fig. 1), we constructed relative frequency histograms for each of the 1,000 transformed ensemble members for each dimension and then plotted the bin means across all members, as well as the 2.5th and 97.5th percentile values for each bin. This produced a histogram of observational scales within each dimension that accounted for scale-estimation uncertainty.
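
As a rough illustration of this ensemble histogram, the following Python sketch computes, for each bin, the mean relative frequency across members together with lower and upper bounds (taken here as the 2.5th and 97.5th percentiles). The bin edges, the toy scale values and the ±25% perturbation are assumptions made for the example, not values from the study.

```python
import math
import random

def relative_histogram(log_values, edges):
    """Relative frequency of values falling in each [edges[i], edges[i + 1]) bin."""
    counts = [0] * (len(edges) - 1)
    for v in log_values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)

def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in 0-100) of an already sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def ensemble_histogram(members, edges):
    """Per-bin (mean, 2.5th percentile, 97.5th percentile) across ensemble members."""
    hists = [relative_histogram([math.log10(v) for v in m], edges) for m in members]
    summary = []
    for b in range(len(edges) - 1):
        col = sorted(h[b] for h in hists)
        summary.append((sum(col) / len(col), percentile(col, 2.5), percentile(col, 97.5)))
    return summary

rng = random.Random(1)
base = [0.5, 3.0, 9.0, 40.0, 90.0, 500.0, 6000.0]  # hypothetical scale values
members = [[v * rng.uniform(0.75, 1.25) for v in base] for _ in range(1000)]
edges = [-1, 0, 1, 2, 3, 4]  # assumed bin edges in log10 units
summary = ensemble_histogram(members, edges)
```

Values close to a bin edge (here 9 and 90) move between adjacent bins from one member to the next, which is what widens the percentile range for those bins.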

To evaluate the distributions of observations within two scale dimensions (Fig. 2), we used the splancs package38 of R39 to calculate a kernel density estimate of the log-transformed values across all ensemble members, using a bandwidth of 1 on a 0.1 resolution image to provide a smoothed result that more effectively highlighted domains in which ecological observations were concentrated. We tested a range of bandwidths on kernel density estimates of sampling interval versus plot resolution to assess how sensitive our results were to the bandwidth value (see Supplementary Results). For comparisons involving interval, we removed temporally unreplicated observations because these lacked interval values.
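
The idea of a two-dimensional kernel density estimate on a regular grid can be sketched in Python as follows. This does not reproduce the splancs implementation or its edge handling. A quartic kernel is assumed, and the function name, grid ranges and example points are hypothetical. The bandwidth of 1 and cell size of 0.1 follow the values stated above.

```python
import math

def quartic_kde_2d(points, h, x_range, y_range, cell):
    """Quartic-kernel intensity on a regular grid.

    points: (x, y) pairs, assumed to be already log10-transformed.
    Returns grid[row][col], with rows along y and columns along x;
    each point contributes a total mass of 1 when integrated over the plane.
    """
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))
    grid = [[0.0] * nx for _ in range(ny)]
    for r in range(ny):
        gy = y_range[0] + (r + 0.5) * cell  # cell-centre y coordinate
        for c in range(nx):
            gx = x_range[0] + (c + 0.5) * cell  # cell-centre x coordinate
            total = 0.0
            for px, py in points:
                u2 = ((gx - px) ** 2 + (gy - py) ** 2) / h ** 2
                if u2 < 1.0:  # the quartic kernel is zero beyond one bandwidth
                    total += (3.0 / math.pi) * (1.0 - u2) ** 2
            grid[r][c] = total / h ** 2
    return grid

# Hypothetical log10(resolution), log10(extent) pairs.
points = [(0.0, 1.0), (0.3, 1.2), (2.0, 3.0)]
grid = quartic_kde_2d(points, h=1.0, x_range=(-2.0, 5.0), y_range=(-1.0, 6.0), cell=0.1)
mass = sum(sum(row) for row in grid) * 0.1 ** 2  # approximately len(points)
```

A larger bandwidth spreads each observation over more cells and merges neighbouring clusters, which is why sensitivity to the bandwidth value is worth checking.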

To compare actual extent with extent and actual duration with duration (Fig. 3), we calculated the magnitude of difference (decade) between each pair as:

$${\mathrm{decade}} = {\log}_{10}x - {\log}_{10}y = {\log}_{10}\frac{x}{y}$$

where x is either extent or duration and y is actual extent or actual duration, respectively. We then evaluated how the magnitudes of difference varied with increasing values of extent/duration, using box plots to summarize decades within the same bins used to summarize the frequency distributions of the extent and duration of observations (Fig. 1b,d). Decades were calculated for each pair for all bootstrap replicates. We plotted the box plots against their corresponding bin means to evaluate how these differences varied with scale (Fig. 3).
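
A short worked example may help here. The Python sketch below computes the decade for a few hypothetical (extent, actual extent) pairs and summarizes them by bin. The values are invented for illustration, and the use of whole log10 units as bins and of the median as the summary are assumptions.

```python
import math
from statistics import median

def decade(x, y):
    """Orders of magnitude by which x (extent or duration) exceeds
    y (actual extent or actual duration)."""
    return math.log10(x) - math.log10(y)

# Hypothetical (extent, actual extent) pairs, in hectares.
pairs = [(10_000.0, 100.0), (50_000.0, 5_000.0), (200.0, 200.0), (900.0, 9.0)]

by_bin = {}
for extent, actual in pairs:
    bin_index = math.floor(math.log10(extent))  # assumed bins: whole log10 units
    by_bin.setdefault(bin_index, []).append(decade(extent, actual))

medians = {b: median(d) for b, d in sorted(by_bin.items())}
```

An extent of 10,000 ha with an actual extent of 100 ha gives a decade of 2, meaning that the sampled area is two orders of magnitude smaller than the area it is taken to represent. A decade of 0 means that the two are equal.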

Trends in methods and scale

To evaluate the potential impact that excluding studies from 2015–2017 would have on our findings, we analysed the trends in (1) ecological observing methods and (2) typical scales of ecological observations over the 10-year period. To undertake the former assessment, we calculated the percentage of observations made using remote sensing, general field methods and automated in situ methods, and fitted a linear regression between these percentages and the publication year, weighting the regression by the total number of observations in each year. For the second analysis, we applied the same regression approach to the four primary dimensions (resolution, extent, interval and duration) to assess whether there were any trends in observational scales.
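
The weighted regression can be sketched in Python using the closed-form weighted least-squares solution for a straight line. The actual regressions are in the R vignette referenced below. The function name, the yearly percentages and the observation counts in this example are hypothetical.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least-squares line y = a + b * x; returns (a, b).

    Each squared residual is multiplied by its weight, so years with
    more observations have more influence on the fitted trend.
    """
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

years = list(range(2004, 2015))
# Hypothetical percentage of observations made by remote sensing in each year.
pct_remote = [10.0 + 0.5 * (yr - 2004) for yr in years]
# Hypothetical total number of observations per year, used as weights.
n_obs = [20, 25, 18, 30, 22, 27, 35, 31, 29, 40, 38]

intercept, slope = weighted_linear_fit(years, pct_remote, n_obs)
```

Because the toy percentages lie exactly on a line, the fitted slope is 0.5 percentage points per year whatever the weights. With real data, the weights reduce the influence of years with few observations.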

The regressions and resulting code for trend extrapolations can be found in the ‘additional analyses’ vignette in the accompanying R package/code repository (available at https://github.com/agroimpacts/ecoscales).

Extracting and analysing data from earlier meta-analyses

To compare the results of our analysis with the observational scales of earlier ecological studies, we used graph capture software (https://automeris.io/WebPlotDigitizer/) to extract the data values from figure 6.1 of ref. 8, figure 1 of ref. 9 and figure 2 of ref. 10. To maintain as much comparability as possible with our inclusion criteria, we excluded experimental studies in the data from ref. 8, as well as the values of any studies exceeding 100 years’ duration (no upper time bound was provided for these), leaving duration values for 419 (out of 623) studies. Since ref. 8 presented duration values as a histogram, we calculated the mean duration across all studies as the weighted (by number of observations per bin) mean of bin centre-point values (that is, the weighted mean of the bin means). We also excluded 4 (of 29) observation values from the data in ref. 10 on observational extent and frequency, which, in contrast with the other 25, were not randomly selected. Ref. 10 also used irregular scales for both x (frequency) and y (extent) axes; therefore, we had to visually estimate the scale values for each data point after graphical extraction, and converted their extent values (in km) to hectares and their frequency values to intervals. Ref. 9 presented resolution as plot diameters (m), which we squared to make them comparable to our resolution metric.
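
Two of these calculations, the weighted mean of histogram bin centres and the squaring of plot diameters, are simple enough to sketch in Python. The function names and all numbers below are hypothetical and are not values extracted from refs 8 or 9.

```python
def weighted_bin_mean(centres, counts):
    """Mean of a histogram: bin centre-point values weighted by the number
    of studies in each bin."""
    return sum(c * n for c, n in zip(centres, counts)) / sum(counts)

def diameter_to_resolution(diameter_m):
    """Square a plot diameter (m) to obtain an area-like resolution (m^2)."""
    return diameter_m ** 2

# Hypothetical duration histogram: bin centres (years) and study counts.
centres = [0.5, 3.0, 7.5]
counts = [10, 5, 5]
mean_duration = weighted_bin_mean(centres, counts)  # (5 + 15 + 37.5) / 20

# Hypothetical plot diameter of 10 m.
resolution_m2 = diameter_to_resolution(10.0)
```

The weighted mean treats every study in a bin as if its duration were equal to the bin centre, so it is an approximation whose accuracy depends on the bin widths.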

Calculations of scale values from these studies can be found in the ‘additional analyses’ vignette in the accompanying R package/code repository (available at https://github.com/agroimpacts/ecoscales).

Reporting Summary

Further information on experimental design is available in the Nature Research Reporting Summary linked to this article.

Code availability

The code supporting this manuscript is available online at https://github.com/agroimpacts/ecoscales.

Data availability

The data supporting this manuscript are available online at https://github.com/agroimpacts/ecoscales.