  • Brief Communication

Cell type prioritization in single-cell data

This article has been updated

Abstract

We present Augur, a method to prioritize the cell types most responsive to biological perturbations in single-cell data. Augur employs a machine-learning framework to quantify the separability of perturbed and unperturbed cells within a high-dimensional space. We validate our method on single-cell RNA sequencing, chromatin accessibility and imaging transcriptomics datasets, and show that Augur outperforms existing methods based on differential gene expression. Augur identified the neural circuits restoring locomotion in mice following spinal cord neurostimulation.

Fig. 1: Augur correctly prioritizes cell types in synthetic and experimental single-cell datasets.
Fig. 2: Augur identifies neuron subtypes that enable walking after paralysis.

Data availability

Raw sequencing data and count matrices have been deposited to the Gene Expression Omnibus (GSE142245).

Code availability

Augur is available from GitHub (https://github.com/neurorestore/Augur) and as Supplementary Software 1.

Change history

  • 23 July 2020

    In the PDF version of this article originally published, Extended Data Figs. 1–8 were replaced by Extended Data Figs. 3–10, respectively.

References

  1. Tang, F. et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nat. Methods 6, 377–382 (2009).

  2. Cao, J. et al. Comprehensive single-cell transcriptional profiling of a multicellular organism. Science 357, 661–667 (2017).

  3. Mathys, H. et al. Single-cell transcriptomic analysis of Alzheimer’s disease. Nature 570, 332–337 (2019).

  4. Kang, H. M. et al. Multiplexed droplet single-cell RNA-sequencing using natural genetic variation. Nat. Biotechnol. 36, 89–94 (2018).

  5. Soneson, C. & Robinson, M. D. Bias, robustness and scalability in single-cell differential expression analysis. Nat. Methods 15, 255–261 (2018).

  6. Rossi, M. A. et al. Obesity remodels activity and transcriptional state of a lateral hypothalamic brake on feeding. Science 364, 1271–1274 (2019).

  7. Hrvatin, S. et al. Single-cell analysis of experience-dependent transcriptomic states in the mouse visual cortex. Nat. Neurosci. 21, 120–129 (2018).

  8. Avey, D. et al. Single-cell RNA-Seq uncovers a robust transcriptional response to morphine by glia. Cell Rep. 24, 3619–3629 (2018).

  9. Chen, R., Wu, X., Jiang, L. & Zhang, Y. Single-cell RNA-Seq reveals hypothalamic cell diversity. Cell Rep. 18, 3227–3241 (2017).

  10. Grubman, A. et al. A single-cell atlas of entorhinal cortex from individuals with Alzheimer’s disease reveals cell-type-specific gene expression regulation. Nat. Neurosci. 22, 2087–2097 (2019).

  11. Hagai, T. et al. Gene expression variability across cells and species shapes innate immunity. Nature 563, 197–202 (2018).

  12. Mostafavi, S. et al. Parsing the interferon transcriptional network and its disease associations. Cell 164, 564–578 (2016).

  13. Wang, X. et al. Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science 361, 6400 (2018).

  14. Lareau, C. A. et al. Droplet-based combinatorial indexing for massive-scale single-cell chromatin accessibility. Nat. Biotechnol. 37, 916–924 (2019).

  15. Reyes, M. et al. Multiplexed enrichment and genomic profiling of peripheral blood cells reveal subset-specific immune signatures. Sci. Adv. 5, eaau9223 (2019).

  16. Moffitt, J. R. et al. Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region. Science 362, 6416 (2018).

  17. Gunner, G. et al. Sensory lesioning induces microglial synapse elimination via ADAM10 and fractalkine signaling. Nat. Neurosci. 22, 1075–1088 (2019).

  18. La Manno, G. et al. RNA velocity of single cells. Nature 560, 494–498 (2018).

  19. Erhard, F. et al. scSLAM-seq reveals core features of transcription dynamics in single cells. Nature 571, 419–423 (2019).

  20. Courtine, G. et al. Transformation of nonfunctional spinal circuits into functional states after the loss of brain input. Nat. Neurosci. 12, 1333–1342 (2009).

  21. Wagner, F. B. et al. Targeted neurotechnology restores walking in humans with spinal cord injury. Nature 563, 65–71 (2018).

  22. Formento, E. et al. Electrical spinal cord stimulation must preserve proprioception to enable locomotion in humans with spinal cord injury. Nat. Neurosci. 21, 1728–1741 (2018).

  23. Crone, S. A. et al. Genetic ablation of V2a ipsilateral interneurons disrupts left–right locomotor coordination in mammalian spinal cord. Neuron 60, 70–83 (2008).

  24. Zhang, J. et al. V1 and V2b interneurons secure the alternating flexor–extensor motor activity mice require for limbed locomotion. Neuron 82, 138–150 (2014).

  25. Crowell, H. L. et al. On the discovery of population-specific state transitions from multi-sample multi-condition single-cell RNA sequencing data. Preprint at bioRxiv https://doi.org/10.1101/713412 (2019).

  26. Yip, S. H., Sham, P. C. & Wang, J. Evaluation of tools for highly variable gene discovery from single-cell RNA-seq data. Brief. Bioinform. 20, 1583–1589 (2019).

  27. Brennecke, P. et al. Accounting for technical noise in single-cell RNA-seq experiments. Nat. Methods 10, 1093–1095 (2013).

  28. Grün, D., Kester, L. & van Oudenaarden, A. Validation of noise models for single-cell transcriptomics. Nat. Methods 11, 637–640 (2014).

  29. Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902 (2019).

  30. Cao, J. et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature 566, 496–502 (2019).

  31. Amezquita, R. A. et al. Orchestrating single-cell analysis with Bioconductor. Nat. Methods 17, 137–145 (2020).

  32. Lin, L. I. A concordance correlation coefficient to evaluate reproducibility. Biometrics 45, 255–268 (1989).

  33. Phipson, B. & Smyth, G. K. Permutation P-values should never be zero: calculating exact P-values when permutations are randomly drawn. Stat. Appl. Genet. Mol. Biol. 9, https://doi.org/10.2202/1544-6115.1585 (2010).

  34. Zappia, L., Phipson, B. & Oshlack, A. Splatter: simulation of single-cell RNA sequencing data. Genome Biol. 18, 174 (2017).

  35. McDavid, A. et al. Data exploration, quality control and testing in single-cell qPCR-based gene expression experiments. Bioinformatics 29, 461–467 (2013).

  36. Ntranos, V., Yi, L., Melsted, P. & Pachter, L. A discriminative learning approach to differential expression analysis for single-cell RNA-seq. Nat. Methods 16, 163–166 (2019).

  37. Finak, G. et al. MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biol. 16, 278 (2015).

  38. Ziegenhain, C. et al. Comparative analysis of single-cell RNA sequencing methods. Mol. Cell 65, 631–643 (2017).

  39. Haghverdi, L., Lun, A. T. L., Morgan, M. D. & Marioni, J. C. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat. Biotechnol. 36, 421–427 (2018).

  40. Petukhov, V. et al. dropEst: pipeline for accurate estimation of molecular counts in droplet-based single-cell RNA-seq experiments. Genome Biol. 19, 78 (2018).

  41. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).

  42. Heimberg, G., Bhatnagar, R., El-Samad, H. & Thomson, M. Low dimensionality in gene expression data enables the accurate extraction of transcriptional programs from shallow sequencing. Cell Syst. 2, 239–250 (2016).

  43. Griffiths, J. A., Richard, A. C., Bach, K., Lun, A. T. L. & Marioni, J. C. Detection and removal of barcode swapping in single-cell RNA-seq data. Nat. Commun. 9, 2667 (2018).

  44. Scheff, S. W., Rabchevsky, A. G., Fugaccia, I., Main, J. A. & Lumpp, J. E. Experimental modeling of spinal cord injury: characterization of a force-defined injury device. J. Neurotrauma 20, 179–193 (2003).

  45. Squair, J. W. et al. Integrated systems analysis reveals conserved gene networks underlying response to spinal cord injury. Elife 7, e39188 (2018).

  46. Sathyamurthy, A. et al. Massively parallel single nucleus transcriptional profiling defines spinal cord neurons and their activity during behavior. Cell Rep. 22, 2216–2225 (2018).

  47. Hafemeister, C. & Satija, R. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biol. https://doi.org/10.1186/s13059-019-1874-1 (2019).

  48. Zappia, L. & Oshlack, A. Clustering trees: a visualization for evaluating clusterings at multiple resolutions. Gigascience 7, giy083 (2018).

  49. Rosenberg, A. B. et al. Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding. Science 360, 176–182 (2018).

  50. Häring, M. et al. Neuronal atlas of the dorsal horn defines its architecture and links sensory input to transcriptional cell types. Nat. Neurosci. 21, 869–880 (2018).

  51. Zeisel, A. et al. Molecular architecture of the mouse nervous system. Cell 174, 999–1014 (2018).

  52. Grimm, D. et al. In vitro and in vivo gene therapy vector evolution via multispecies interbreeding and retargeting of adeno-associated viruses. J. Virol. 82, 5887–5911 (2008).

  53. Anderson, M. A. et al. Astrocyte scar formation aids central nervous system axon regeneration. Nature 532, 195–200 (2016).

  54. Asboth, L. et al. Cortico-reticulo-spinal circuit reorganization enables functional recovery after severe spinal cord contusion. Nat. Neurosci. 21, 576–588 (2018).

  55. Wang, F. et al. RNAscope: a novel in situ RNA analysis platform for formalin-fixed, paraffin-embedded tissues. J. Mol. Diagn. 14, 22–29 (2012).

  56. Zappia, L., Phipson, B. & Oshlack, A. Splatter: simulation of single-cell RNA sequencing data. Genome Biol. 18, 174 (2017).

  57. Wu, Y. E., Pan, L., Zuo, Y., Li, X. & Hong, W. Detecting activated cell populations using single-cell RNA-seq. Neuron 96, 313–329.e6 (2017).

  58. Macosko, E. Z. et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202–1214 (2015).

  59. van den Brink, S. C. et al. Single-cell sequencing reveals dissociation-induced gene expression in tissue subpopulations. Nat. Methods 14, 935–936 (2017).

Acknowledgements

We thank D. Arneson, D. Avey, R. Mitra, A. Haber, O. Yilmaz, G. Chew, J. Polo, L. Adlung, I. Amit, D. Kim, D. Anderson, M. Basiri, R. Wirka, T. Quertermous and F. Zhang for providing data and/or cell type annotations. This work was supported by a Consolidator Grant from the European Research Council (ERC-2015-CoG HOW2WALKAGAIN 682999) (to G.C.), the Swiss National Science Foundation (to G.C.; subsidy 310030_185214 and 310030_192558), Genome Canada and Genome British Columbia (to L.J.F.; project 214PRO) and Wings for Life (to M.A.S.). This work was also supported in part by the Intramural Research Program of the NIH, NINDS (to K.J.E.M. and A.J.L.). This work was enabled in part by the support provided by WestGrid and Compute Canada (to A.A.P. and L.J.F.), and through computational resources and services provided by Advanced Research Computing at the University of British Columbia (to L.J.F.). M.A.S. is supported by the Canadian Institutes of Health Research (CIHR) (Vanier Canada Graduate Scholarship, Michael Smith Foreign Study Supplement), an Izaak Walton Killam Memorial Pre-Doctoral Fellowship, a UBC Four Year Fellowship, a Vancouver Coastal Health–CIHR–UBC MD/PhD Studentship, a Brain Canada Hubert van Tol fellowship and a BCRegMed Collaborative Research Travel Grant. J.W.S. is supported by a CIHR Banting postdoctoral fellowship and a Marie Skłodowska-Curie individual fellowship (No. 842578). M.A.A. and M.M. are supported by a SNF Ambizione fellowship (PZ00P3_185728). M.A.A. is supported by Wings for Life and the Morton Cure Paralysis Fund.

Author information

Contributions

M.A.S. and J.W.S. contributed equally to this work. C.K., M.A.A. and M.G. contributed equally to this work and are co-second authors. M.A.S. and J.W.S. designed and implemented Augur, and performed all computational analyses. M.A.S., J.W.S. and M.G. processed published datasets. J.W.S., C.K., M.A.A., T.H.H. and M.M. performed experimental validation work, including viral tract tracing and RNAscope. C.K., K.J.E.M. and A.J.L. performed nucleus extraction and single-nucleus RNA-seq. M.G. and Q.B. analyzed experimental validation data. A.A.P., L.J.F., G.L.M. and G.C. supervised the work. M.A.S., J.W.S. and G.C. wrote the manuscript; all authors contributed to its editing.

Corresponding authors

Correspondence to Michael A. Skinnider, Jordan W. Squair or Grégoire Courtine.

Ethics declarations

Competing interests

G.C. is a founder and shareholder of GTXmedical, a company with no direct relationships with the present work. M.A.S., J.W.S. and G.C. are named as co-inventors on a patent application related to this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Augur overcomes confounding factors to cell type prioritization in simulated cell populations.

a-b, Area under the receiver operating characteristic curve (AUC) of a random forest classifier trained in three-fold cross-validation to distinguish two simulated populations of cells56, with the total number of cells increasing from n = 100 to n = 1,000 and the proportion of differentially expressed genes between the two populations varying from 0% to 100%, a, or the location parameter of the differential expression factor log-normal distribution varying from 0.1 to 1.0, b. c-d, As in a-b, but with the naive random forest classifier replaced with the subsampling procedure employed by Augur. e-f, Relationship between Augur AUC and the proportion of differentially expressed genes, e, or the location parameter of the differential expression factor log-normal distribution, f, in distinguishing two simulated populations (n = 200 cells total). The mean and standard deviation of n = 10 independent simulations are shown. Inset, two-sided Pearson correlation. g, Cell type prioritizations (AUC or number of differentially expressed genes) for a naive random forest classifier, Augur, and an exemplary single-cell differential expression test5, the Wilcoxon rank-sum test, for two simulated populations of cells with 50% of genes differentially expressed and a log-normal location parameter of 0.5, with the total number of cells increasing from n = 100 to n = 1,000 cells. Like a naive random forest strategy, the number of differentially expressed genes detected by the Wilcoxon rank-sum test scales linearly with the number of cells. The mean and standard deviation of n = 10 independent simulations are shown. Dotted lines show linear regression; shaded areas show 95% confidence intervals. 
h-i, Number of differentially expressed genes detected by six tests for single-cell differential gene expression between two simulated populations of cells, with the total number of cells increasing from 100 to 1,000 and the proportion of differentially expressed genes between the two populations varying from 0% to 100%, h, or the location parameter of the differential expression factor log-normal distribution varying from 0.1 to 1.0, i. j, Relationship between number of differentially expressed genes detected by five tests for single-cell differential gene expression and the proportion of differentially expressed genes simulated between the two populations, for simulated populations of between 100 and 1,000 cells (see also Fig. 1e). All single-cell differential expression tests detect a larger number of differentially expressed genes in a large population of cells with modest transcriptional perturbation (20% of genes differentially expressed) than in a smaller population of cells with more profound perturbation (70% of genes differentially expressed).
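
The contrast between a naive classifier (panels a-b) and a subsampling procedure (panels c-d) can be sketched in a few lines. This is a minimal illustration, not Augur's implementation: a nearest-centroid scorer stands in for the random forest, and the subsample size (20 cells per condition), the number of repeats (50) and the simulated Gaussian "expression" values are all assumptions chosen for the example. Only the three-fold cross-validation and the AUC readout come from the caption.

```python
import random
from statistics import mean

def auc(scores, labels):
    """AUC as the probability that a perturbed cell (label 1) outscores
    an unperturbed cell (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def centroid(rows):
    """Per-gene mean across a list of cells."""
    return [mean(col) for col in zip(*rows)]

def cv_auc(cells, labels, folds=3, rng=random):
    """Stratified k-fold cross-validated AUC of a nearest-centroid scorer
    (a stand-in for the random forest; an assumption of this sketch)."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)
    fold_aucs = []
    for f in range(folds):
        test = pos[f::folds] + neg[f::folds]
        held = set(test)
        c1 = centroid([cells[i] for i in pos if i not in held])
        c0 = centroid([cells[i] for i in neg if i not in held])
        # Higher score = closer to the perturbed centroid than to the control one.
        scores = [sum((x - a) ** 2 - (x - b) ** 2
                      for x, a, b in zip(cells[i], c0, c1)) for i in test]
        fold_aucs.append(auc(scores, [labels[i] for i in test]))
    return mean(fold_aucs)

def subsampled_auc(cells, labels, per_condition=20, repeats=50, seed=0):
    """Mean cross-validated AUC over repeated fixed-size subsamples, so the
    score does not depend on how many cells of this type were sequenced."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    results = []
    for _ in range(repeats):
        pick = rng.sample(pos, per_condition) + rng.sample(neg, per_condition)
        results.append(cv_auc([cells[i] for i in pick],
                              [labels[i] for i in pick], rng=rng))
    return mean(results)

def simulate_cells(shift, n=100, genes=10, seed=1):
    """Synthetic data for the example only: Gaussian values per gene, with
    the perturbed condition shifted by `shift` in every gene."""
    rng = random.Random(seed)
    cells, labels = [], []
    for y in (0, 1):
        for _ in range(n):
            cells.append([rng.gauss(shift * y, 1.0) for _ in range(genes)])
            labels.append(y)
    return cells, labels

if __name__ == "__main__":
    for shift in (0.0, 0.5, 3.0):
        cells, labels = simulate_cells(shift)
        print(shift, round(subsampled_auc(cells, labels), 3))
```

Because every subsample has the same size, a cell type with 1,000 cells and one with 100 cells are scored on equal footing, which is the property the caption attributes to the subsampling procedure.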

Extended Data Fig. 2 Augur overcomes confounding factors to cell type prioritization in a compendium of published single-cell RNA-seq datasets.

a, Overview of n = 22 published scRNA-seq datasets comparing two or more experimental conditions, used to verify the relationship between cell type prioritizations from a random forest classifier, Augur, or single-cell differential expression tests. Left, heatmap indicating the species of origin, the sequencing protocol, and whether cells or nuclei were sequenced. Right, properties of each dataset, including the total number of cell types identified in the original studies; the total number of cells sequenced; the number of cells per type (red bars indicate mean); and the mean number of reads for cells of each type. b, Pearson correlations between the AUC of each cell type, and the number of cells of that type sequenced, across 22 datasets for Augur, bottom, and a naive random forest classifier without subsampling, top, as shown in Fig. 2c. c, Pearson correlations between the number of differentially expressed genes per cell type, at 5% FDR, and the number of cells of that type sequenced, across 22 datasets for six statistical tests for single-cell differential expression. d, Number of cells in the top-ranked cell type across 22 datasets for Augur, bottom, and a naive random forest classifier without subsampling, top. e, Number of cells in the top-ranked cell type across 22 datasets for six statistical tests for single-cell differential expression. f, Jaccard index between the top-ranked 1 to 5 cell types across 22 datasets, comparing Augur and six statistical tests for single-cell differential expression. g, Cell type prioritizations in the Grubman et al.10 dataset by Augur and a representative test for single-cell differential expression, the Wilcoxon rank-sum test (“DE”). h, Relationship between AUC and number of differentially expressed genes per cell type, at 5% FDR, in the Grubman et al.10 dataset. Dotted line shows linear regression. i, Relationship between AUC and number of cells sequenced in the Grubman et al.10 dataset. 
Augur cell type prioritizations are uncorrelated with the number of cells per type. Dotted line shows linear regression; inset shows two-sided Pearson correlation. j, Relationship between number of differentially expressed genes and number of cells sequenced in the Grubman et al.10 dataset. Cell type prioritizations based on the number of differentially expressed genes are strongly correlated with the number of cells per type. Dotted line shows linear regression; inset shows two-sided Pearson correlation.
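
The Jaccard index used in panel f to compare the top-ranked cell types of two methods can be written as a short sketch. The cell type names and scores below are hypothetical placeholders, not values from any of the 22 datasets, and returning 1.0 for two empty sets is a convention chosen for this example.

```python
def top_k(scores, k):
    """Names of the k highest-scoring cell types in a {name: score} mapping."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| between two collections of names."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty sets are treated as identical (a convention of this sketch).
    return len(a & b) / len(union) if union else 1.0

# Hypothetical prioritizations: an AUC per cell type from a classifier-based
# method, and a count of differentially expressed genes per cell type.
auc_scores = {"typeA": 0.91, "typeB": 0.74, "typeC": 0.62, "typeD": 0.55}
de_counts = {"typeA": 120, "typeB": 15, "typeC": 480, "typeD": 950}

for k in range(1, 4):
    print(k, jaccard(top_k(auc_scores, k), top_k(de_counts, k)))
```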

Extended Data Fig. 3 Augur overcomes confounding factors to cell type prioritization in simulated tissues and across single-cell modalities.

a, Number of cells within each of eight cell types in a simulated tissue with increasingly unequal cell type proportions, as quantified by the Gini coefficient. b, Cell type prioritization in simulated scRNA-seq data from a tissue with 5,000 cells distributed in eight cell types, with 10-80% of genes DE in response to perturbation, and increasingly unequal numbers of cells per type (as quantified by the Gini coefficient). The correlation to simulation ground truth (proportion of DE genes) is shown for five tests for single-cell differential gene expression. The mean and standard deviation of n = 10 independent simulations are shown. Dashed line shows mean Gini coefficient of cell type frequencies across 22 published scRNA-seq datasets. **, p < 0.01; ***, p < 0.001, two-sided paired t-test. c, Inequality of cell type proportions in published scRNA-seq data. Top, Gini coefficient of cell type proportions across 22 published scRNA-seq datasets. Horizontal line and shaded area show the mean and standard deviation of the Gini coefficient across all datasets. Bottom, number of cells of each type across 22 published scRNA-seq datasets. d, Comparison of cell type prioritization in independent scRNA-seq and single cell imaging transcriptomics (STARmap) studies of the mouse visual cortex after light exposure. Left, Augur cell type prioritization in the STARmap dataset13. Bottom, Augur cell type prioritization in the scRNA-seq dataset7. Center, correspondence between cell types defined in the scRNA-seq and STARmap datasets, quantified as the Spearman correlation coefficient between average profiles for each cell type across 139 genes present in both datasets.
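
For readers unfamiliar with the Gini coefficient as a measure of unequal cell type proportions, a minimal sketch follows. It uses the standard rank-based formula for a finite sample; whether the authors used this estimator or a bias-corrected variant is not stated in the caption, so treat the formula choice as an assumption. The cell counts are illustrative only.

```python
def gini(counts):
    """Gini coefficient of non-negative counts: 0 when all cell types are
    equally abundant, approaching 1 when one type dominates."""
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one non-zero count")
    # Rank-weighted sum with 1-indexed ranks over the ascending counts.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

# Illustrative tissues of 5,000 cells in eight cell types.
balanced = [625] * 8
skewed = [3000, 1000, 500, 250, 125, 75, 30, 20]
print(gini(balanced), gini(skewed))
```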

Extended Data Fig. 4 Differential cell type prioritization in single-cell RNA-seq data.

a, Schematic overview of the permutation-based test for differential prioritization with Augur. First, cell type prioritization is performed within each of two conditions separately, yielding condition-specific AUCs for each cell type. Next, sample labels are randomly permuted within each cell type, and cell type prioritization is performed on shuffled data, yielding a null distribution of AUCs for each cell type and condition. AUCs for matching cell types are compared across conditions to calculate a ‘∆AUC score’ for each cell type, and a null distribution of ∆AUC scores is calculated using the permuted data. Permutation p-values can then be calculated for each cell type, enabling the identification of statistically significant differences in cell type prioritization between conditions, as well as the condition in which the cell type is more transcriptionally separable. b, Neuron subtypes with statistically significant differences in AUC between female and male mice during parenting, in a single-cell imaging transcriptomics experiment employing multiplexed error robust fluorescence in situ hybridization (MERFISH)16 (n = 79 subtypes). Eleven subtypes have significantly higher AUCs in female parents, whereas two have significantly higher AUCs in male parents. c, Relationship between differential prioritization ∆AUC for parenting between male and female mice, and AUC for sex in naive mice. Several neuronal subtypes preferentially activated during parenting in female mice are also transcriptionally distinct in naive mice, such as the I-32 cluster, which is enriched for aromatase expression, and expresses multiple sex steroid hormone receptors16. d, Neuron subtypes with statistically significant differences in AUC in response to whisker lesioning in Cx3cr1+/− as compared to Cx3cr1−/− mice, in a single-cell RNA-seq experiment17 (n = 28 subtypes). 
Four subtypes have significantly higher AUCs in homozygous mice, whereas one subtype has a significantly higher AUC in heterozygous mice.
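
The final step of the permutation test in panel a, turning an observed ∆AUC and a null distribution of ∆AUC scores into a p-value, can be sketched as follows. Several details are assumptions of this example rather than statements from the caption: the test is taken to be two-sided, permuted AUCs from the two conditions are paired in order to form null ∆AUC values, and one is added to the numerator and denominator so that the p-value is never zero (the correction described by Phipson and Smyth). The AUC values are hypothetical.

```python
def delta_auc_pvalue(auc_a, auc_b, null_a, null_b):
    """Return (delta, p) for one cell type, where delta = auc_b - auc_a.

    auc_a, auc_b   : observed AUCs in conditions A and B
    null_a, null_b : AUCs from label-permuted data in each condition,
                     paired in order (an assumption of this sketch)
    """
    observed = auc_b - auc_a
    null = [b - a for a, b in zip(null_a, null_b)]
    # Two-sided: count permutations at least as extreme as the observation.
    extreme = sum(abs(d) >= abs(observed) for d in null)
    # Add-one correction keeps the permutation p-value strictly positive.
    p = (extreme + 1) / (len(null) + 1)
    return observed, p

# Hypothetical values for a single cell type.
delta, p = delta_auc_pvalue(
    auc_a=0.60, auc_b=0.90,
    null_a=[0.50, 0.52, 0.48, 0.51],
    null_b=[0.50, 0.49, 0.53, 0.50],
)
print(delta, p)
```

The sign of the returned delta indicates the condition in which the cell type is more separable: positive for condition B, negative for condition A.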

Extended Data Fig. 5 Cell type prioritization from transcriptional dynamics in acute experimental perturbations.

a, Left, schematic overview of the scSLAM-seq19 workflow. Cells are exposed to the nucleoside analogue 4-thiouridine (4sU), which is incorporated during transcription and converted to a cytosine analogue by iodoacetamide prior to RNA sequencing. This labeling permits in silico deconvolution of RNA molecules transcribed before and after 4sU exposure (‘old’ and ‘new’, respectively), and calculation of the ratio of new to total RNA (NTR), an experimental analogue to the computationally determined ‘RNA velocity’18,19. Right, AUCs for mouse fibroblasts exposed to lytic mouse cytomegalovirus (CMV) at 2 h post-infection, calculated by applying Augur to either total RNA or the NTR. The greater separability for the NTR reflects additional information specifically captured by the temporal dynamics of RNA expression in the context of this acute perturbation19. b-e, Cell type prioritization based on exonic reads, total RNA, or RNA velocity for cells of the mouse visual cortex after exposure to light for 1 h, b-c, or 4 h, d-e, in the Hrvatin et al.7 dataset. The AUC is significantly higher for RNA velocity than for either exonic reads (1 h, n = 34 cell types, 4 h, n = 35 cell types; two-sided paired t-tests: b, 1 h, p = 6.9 × 10^-7; d, 4 h, p = 8.2 × 10^-7) or total RNA (c, 1 h, p = 2.8 × 10^-7; e, 4 h, p = 3.0 × 10^-6), reflecting additional information specifically captured by acute transcriptional dynamics. f-g, Cell type prioritization based on exonic reads, total RNA, or RNA velocity in an Act-seq57 dataset, which minimizes transcriptional changes induced by single-cell dissociation. Cell types of the medial amygdala in mice subjected to 45 min of immobilization stress and control mice were profiled by Drop-seq58 after treatment with the transcription inhibitor actinomycin D. 
The AUC is higher for RNA velocity than for either exonic reads (f, p = 0.026, n = 6 cell types) or total RNA (g, p = 0.053), reflecting the additional information specifically captured by acute transcriptional dynamics, and indicating this is not an artefact related to the transcriptional perturbations known to be induced by conventional dissociation procedures59. h-i, Cell type prioritization based on exonic reads, total RNA, or RNA velocity in a chronic perturbation. Cell types of the lateral hypothalamic area were profiled by Drop-seq58 in mice after 9-16 weeks of maintenance on either high-fat diet or control diet6. No significant difference in AUCs was observed for RNA velocity compared to either exonic reads (h, p = 0.22, n = 13 cell types) or total RNA (i, p = 0.98), consistent with the time scale of the experimental perturbation.

Extended Data Fig. 6 Subclustering of single-neuron transcriptomes identifies 38 neuron subtypes in the mouse lumbar spinal cord.

See also Extended Data Fig. 7a. a, Dot plot showing expression of one marker gene per cell type for the 38 neuron subtypes of the mouse lumbar spinal cord. b, Neuron subtype detection across experimental conditions (n = 6,035 neurons). TESS, targeted electrical epidural stimulation of the lumbar spinal cord. c, Proportion of neurons of each subtype detected in each experimental condition. d, Neuron subtype detection across experimental replicates (n = 3 mice per condition). e, Proportion of neurons of each subtype detected in each experimental replicate.

Extended Data Fig. 7 Robustness of Augur cell type prioritizations for mouse lumbar spinal cord neurons.

a, Clustering tree48 of mouse spinal cord neurons over seven clustering resolutions, revealing the hierarchical relationships between spinal cord neuron subtypes. Node color reflects AUCs for cell type prioritization in targeted electrical epidural stimulation. b, AUCs for each of 36 neuron subtypes represented by at least 20 cells in both control and TESS-treated mice. c-e, Robustness of cell type prioritization for neuron subtypes of the mouse lumbar spinal cord. c, Impact of systematically withholding cells from each of six replicates (n = 3 per group) on cell type prioritization. Left, cell type prioritization with all six replicates, as in Fig. 2f. Grey tiles indicate neuron subtypes that were not represented by at least 20 cells in each condition after removal of cells from an experimental replicate. d, Impact of varying Augur parameters, including the number of subsamples and the size of each subsample; random forest-specific hyperparameters (number of trees, minimum split size, number of features sampled per split); and the choice of classifier (random forest, RF; L1-penalized logistic regression, LR) on cell type prioritization. Grey tiles indicate sample sizes larger than the number of cells of that type in the dataset. e, Impact of varying RNA velocity parameters, including exonic and intronic expression filters, the number of cells in the k-nearest neighbors pooling, and the extreme quantiles used to fit γ coefficients, on cell type prioritization.

Extended Data Fig. 8 Absence of colocalization of canonical marker genes for cell types not prioritized by Augur and Fos by RNAscope in situ hybridization.

Schematic indicates imaging location for each marker within the spinal cord. Bottom, proportion of cells expressing Fos from cell types prioritized by Augur (n = 3 cell types) or not prioritized by Augur (n = 6 cell types). Cell types prioritized by Augur are significantly more likely to express Fos after walking with TESS, compared to controls (p = 0.01, two-sided Fisher’s exact test), whereas cell types not prioritized by Augur do not display a statistically significant difference (p = 0.74). Error bars show standard deviation of the sample proportion.
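
The two statistics named in this caption, a two-sided Fisher's exact test on a 2×2 table and the standard deviation of a sample proportion, can be sketched with the standard library alone. The two-sided p-value here sums the probabilities of all tables no more likely than the observed one, which is one common definition and an assumption about the variant used. The counts in the usage lines are hypothetical, not the Fos counts from the experiment.

```python
from math import comb, sqrt

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]],
    e.g. rows = TESS / control, columns = Fos-positive / Fos-negative."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over tables at most as probable as the observed one; the small
    # tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))

def proportion_sd(successes, n):
    """Standard deviation of a sample proportion, sqrt(p * (1 - p) / n)."""
    p = successes / n
    return sqrt(p * (1 - p) / n)

# Hypothetical counts: 12 of 40 cells Fos-positive with TESS, 3 of 40 in controls.
print(fisher_exact_two_sided(12, 28, 3, 37))
print(proportion_sd(12, 40), proportion_sd(3, 40))
```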

Extended Data Fig. 9 Impact of mean gene expression level on cell type prioritization.

Cell type prioritizations were performed using both Augur and a representative single-cell differential expression method, the Wilcoxon rank-sum test, using the entire transcriptome (left column) or genes divided into five quintiles based on mean expression (right columns). Insets show two-sided Pearson correlations throughout. a, Relationship between Augur cell type prioritizations (AUC) and the proportion of differentially expressed genes between two simulated populations of cells (n = 200 cells total), as shown in Extended Data Fig. 1e. The mean and standard deviation of n = 10 independent simulations are shown. b, As in a, but with Augur applied to each quintile of gene expression separately. The AUC remains strongly correlated with the ground-truth perturbation intensity, regardless of mean expression levels (r ≥ 0.92). c, Relationship between Augur cell type prioritizations (AUC) and the location parameter of the differential expression factor log-normal distribution between two simulated populations of cells (n = 200 cells total), as shown in Extended Data Fig. 1f. The mean and standard deviation of n = 10 independent simulations are shown. d, As in c, but with Augur applied to each quintile of gene expression separately. The AUC remains strongly correlated with the ground-truth perturbation intensity, regardless of mean expression levels (r ≥ 0.95). e-f, As in a-b, but showing the number of differentially expressed genes detected by a Wilcoxon rank-sum test at 5% FDR, either across the entire transcriptome, e, or within each expression quintile, f. No differentially expressed genes are detected at 5% FDR outside of the top expression quintile. g-h, As in c-d, but showing the number of differentially expressed genes detected by a Wilcoxon rank-sum test at 5% FDR, either across the entire transcriptome, g, or within each expression quintile, h. No differentially expressed genes are detected at 5% FDR outside of the top expression quintile. 
i, Cell type prioritization in simulated scRNA-seq data from a tissue with 5,000 cells, distributed in eight cell types, with increasingly unequal numbers of cells per type, as quantified by the Gini coefficient and shown in Fig. 1f. The correlation to simulation ground truth (proportion of DE genes) is shown for Augur and a representative test for single-cell DE (Wilcoxon rank-sum test). The mean and standard deviation of n = 10 independent simulations are shown. j, As in i, but with both Augur and the Wilcoxon rank-sum test applied to each quintile of gene expression separately. k, Pearson correlation between Augur cell type prioritizations (AUC) and simulation ground truth (proportion of DE genes) in simulated scRNA-seq data from a tissue with eight cell types, subjected to perturbations of varying intensity, as quantified by the location parameter of the differential expression factor log-normal distribution. The mean of n = 10 independent simulations is shown for each perturbation intensity. l, As in k, but with Augur applied to each quintile of gene expression separately. Augur incorporates information from lowly expressed genes even in subtle perturbations. m, Number of differentially expressed genes detected by a Wilcoxon rank-sum test at 5% FDR for each cell type in the Kang et al. dataset4, within each expression quintile, confirming that the simulations in a-l reflect trends in real data.
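
The quintile-based analysis described in this legend can be illustrated with a minimal Python sketch. It assumes that per-gene P values (for example, from a Wilcoxon rank-sum test) are already available, that genes are binned into quintiles by the rank of their mean expression, and that the 5% FDR threshold is applied with the Benjamini-Hochberg procedure separately within each quintile. The legend does not specify the FDR procedure or whether it was applied within or across quintiles, so both are assumptions. All function names and input values are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def quintile_labels(mean_expr):
    """Assign each gene to a quintile 1..5 by rank of mean expression (5 = highest)."""
    n = len(mean_expr)
    order = sorted(range(n), key=lambda g: mean_expr[g])
    labels = [0] * n
    for rank, gene in enumerate(order):
        labels[gene] = rank * 5 // n + 1
    return labels


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P values (assumed FDR procedure)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def de_genes_per_quintile(mean_expr, pvals, fdr=0.05):
    """Count genes passing the FDR threshold within each expression quintile."""
    labels = quintile_labels(mean_expr)
    counts = {}
    for q in range(1, 6):
        genes = [g for g, lab in enumerate(labels) if lab == q]
        adj = bh_adjust([pvals[g] for g in genes])
        counts[q] = sum(a <= fdr for a in adj)
    return counts


# Arbitrary toy inputs for ten hypothetical genes (not data from the study).
mean_expr = [0.1, 0.2, 0.5, 0.8, 1.0, 2.0, 3.0, 5.0, 8.0, 13.0]
pvals = [0.6, 0.4, 0.3, 0.9, 0.2, 0.5, 0.04, 0.7, 0.001, 0.02]
counts = de_genes_per_quintile(mean_expr, pvals)
print(counts)
```

Applying the correction within each quintile keeps the multiple-testing burden proportional to the number of genes in that bin; a correction across the whole transcriptome followed by binning would be an equally plausible reading of the legend.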

Extended Data Fig. 10 Impact of batch effects on cell type prioritization.

Two populations of cells (n = 200 cells total) were simulated, with each condition sequenced in two batches, and varying degrees of perturbation-dependent differential expression and/or technical batch effects were introduced according to five different batch effect scenarios. For each of the five scenarios, the following panels are shown from left to right: i, Principal component analysis (PCA) of a representative simulation. ii, Correlation between AUC and magnitude of simulated batch effect with 0% of genes differentially expressed in response to perturbation, reflecting the introduction of a spurious difference between conditions where none exists (inset, two-sided Pearson correlation). iii, Correlation between AUC and magnitude of simulated batch effect when the random forest classifier is tasked with predicting batch rather than condition (AUCbatch), confirming the batch effect introduces the expected separability. iv, Correlation between proportion of genes differentially expressed in response to perturbation and AUC for simulated populations of cells with no batch effect, and batch effects of three different magnitudes. v, Cell type prioritizations in simulated populations of cells with varying perturbation intensity (% DE genes) and batch effect magnitudes. vi, As in i, but after computational batch effect correction by alignment of mutual nearest neighbors39. vii, As in v, but after computational batch effect correction by alignment of mutual nearest neighbors. a, Impact of batch effects on cell type prioritization when technical batch is unconfounded with either condition or differential expression. b, Impact of batch effects on cell type prioritization when batch #1 is twice as large as batch #2. c, Impact of batch effects on cell type prioritization when perturbation-dependent differential expression is stronger in one of the two batches. 
d, Impact of batch effects on cell type prioritization when technical batch is mildly confounded with condition (simulated cells are overrepresented in batch 1 by a factor of 20%). e, Impact of batch effects on cell type prioritization when technical batch is moderately confounded with condition (simulated cells are overrepresented in batch 1 by a factor of 50%). f, Impact of batch effects on cell type prioritization when technical batch is severely confounded with condition (simulated cells are overrepresented in batch 1 by a factor of 80%).
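
The way a batch effect that is confounded with condition can produce a spurious AUC above 0.5, even with 0% of genes differentially expressed, can be shown with a toy Python sketch. This is not the procedure used in the study: Augur trains a random forest classifier, whereas the sketch scores cells on a single hypothetical gene and computes the AUC directly from pairwise comparisons. It also assumes one particular reading of "overrepresented in batch 1 by a factor of 20%", namely that 60% of perturbed cells and 40% of control cells fall in batch 1. All names and parameter values are illustrative assumptions.

```python
def batch_assignment(n_per_condition, confounding):
    """Return batch labels (1 or 2) for perturbed and control cells.

    confounding = 0 gives a balanced 50/50 split; confounding = 0.2 places
    60% of perturbed cells (and 40% of control cells) in batch 1.
    This interpretation of the confounding factor is an assumption.
    """
    n_b1 = round(n_per_condition * 0.5 * (1 + confounding))
    perturbed = [1] * n_b1 + [2] * (n_per_condition - n_b1)
    control = [2] * n_b1 + [1] * (n_per_condition - n_b1)
    return perturbed, control


def auc(pos_scores, neg_scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


def spurious_auc(confounding, batch_shift=1.0, n_per_condition=100):
    """AUC for condition when the only signal is a batch shift (no DE genes)."""
    perturbed, control = batch_assignment(n_per_condition, confounding)

    def expression(batches):
        # One hypothetical gene: cells in batch 1 are shifted, batch 2 is not.
        return [batch_shift if b == 1 else 0.0 for b in batches]

    return auc(expression(perturbed), expression(control))


# n = 200 cells total (100 per condition), as in the simulations described above.
aucs = {c: spurious_auc(c) for c in (0.0, 0.2, 0.5, 0.8)}
print(aucs)
```

In this noise-free toy setting the AUC equals the fraction of perturbed cells in batch 1, so it rises from 0.5 with no confounding towards 1 as confounding becomes more severe, and it stays at 0.5 when the batch shift is zero.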

Supplementary information

Supplementary Information

Supplementary Notes 1 and 2 and Figs. 1–11.

Reporting Summary

Supplementary Software 1

Augur R package.

About this article

Cite this article

Skinnider, M.A., Squair, J.W., Kathe, C. et al. Cell type prioritization in single-cell data. Nat Biotechnol 39, 30–34 (2021). https://doi.org/10.1038/s41587-020-0605-1

  • DOI: https://doi.org/10.1038/s41587-020-0605-1
