Main

Disease occurs as a result of aberrations in cells and cellular ecosystems within tissues — driven by genetic variations as well as environmental impacts, from nutrients to pathogens. To understand pathogenesis and discover and deliver new treatments, we need to understand cells, their internal circuits, and their interactions in health and disease. Although this has been appreciated for many decades, technical challenges have limited our ability to simultaneously probe human disease at a large scale and at high molecular and cellular resolution.

Breakthroughs in single-cell and spatial genomics in the past decade have opened the way to single-cell and tissue atlases in health and disease (Table 1), and are poised to impact every aspect of medicine (Fig. 1). These impacts include understanding the cell types and programs in which disease genes act, deciphering mechanisms of disease initiation and progression at the cellular and multicellular levels, defining new signatures for disease monitoring and diagnosis, and discovering and developing new molecular, gene and cell therapies and tracking their impact in patients.

Table 1 A selection of key experimental methods for construction of cell atlases at different levels of biological organization
Fig. 1: Potential medical impacts of the Human Cell Atlas and remaining challenges.

Left, important insights that have been drawn from cell atlases on disease mechanisms, diagnosis and treatment. Right, key remaining technical and fundamental barriers for medical impact, including diversity, data availability and understanding disease progression.

As disease is only fully understood in reference to health, and vice versa, achieving this vision will require comprehensive reference maps of all human cells as a basis for both understanding human health and diagnosing, monitoring and treating disease. Mapping human cells poses major logistical and technical challenges, which are being met by the international Human Cell Atlas (HCA) initiative1. When the HCA was being planned, the initial members of the HCA community laid out our plans and goals in a white paper2, stating an ambition to accelerate biomedical research, drug discovery and development, and medical practice by fostering both curiosity-driven research and its clinical applications.

Less than a decade since the emergence of single-cell profiling methods, and 5 years since the launch of the HCA, the field has made enormous strides in delivering findings that are relevant to human health, with rapid development and application of new methods to tackle medical questions (Table 1 and Fig. 2). In particular, our community, like many others, was galvanized by the global challenge of the COVID-19 pandemic to contribute early information about the cells that are most susceptible to infection3,4,5, and later to characterize the impact of SARS-CoV-2 infection on tissues throughout the body6,7 (Box 1). Here, we explore the key ways in which cell atlases are accelerating biomedicine and their future potential.

Fig. 2: Single-cell atlases have been collected for a broad range of organs and disease tissues.

Shown are the key organs and systems for which healthy tissue has been profiled by the Biological Networks of the Human Cell Atlas initiative (bold), and for which corresponding studies collected atlases of disease tissue from the same organ from people with common complex diseases (blue), tumors (orange), rare diseases (green), infectious diseases (yellow), or other conditions (black).

Understanding disease biology: from genes to cells, programs and tissues

From disease-associated genes to cells of action

Genetic variants — both common and rare — contribute to the risk of developing disease, and human genetic studies have identified more than 100,000 variants associated with different human traits, especially the risk of developing different diseases. However, to understand the role of these variants in disease, we must understand the cells in which they are expressed and act. In rare diseases, the relevant cell type may be unknown, or even undiscovered. In common complex diseases, the candidate loci from genome-wide association studies (GWAS) and phenome-wide association studies are often in non-coding regions that are difficult to connect to the affected protein-coding gene, cell of action or function. Moreover, even when common and rare diseases have similar clinical phenotypes, these could be the results of variants in different genes, thus making it more challenging to identify common mechanisms at the pathway or cellular level.

Cell atlases provide a way to tackle each of these challenges (Fig. 1). In rare Mendelian genetic disorders, healthy tissue atlases have led to the discovery of novel cell types, including rare ones, that uniquely express key disease genes, and have even corrected long-held assumptions. For example, the pulmonary ionocyte — a novel, rare cell type discovered in cell atlases of the trachea — is the main cell type expressing CFTR8,9, the causal gene in cystic fibrosis. In particular, studies in the Human Developmental Cell Atlas (HDCA) can shed light on Mendelian disorders that manifest at birth, such as the cellular origins of different Hirschsprung’s disease variants in the developing10 versus adult11 enteric nervous system, or the impact of trisomy 21 on bone marrow hematopoietic stem cells and their niche12.

In common complex diseases, similar analyses have related disease genes in associated loci to specific cell subsets across many inflammatory13,14,15,16, autoimmune17,18,19, neurodegenerative20,21,22,23, respiratory8,24, fibrotic25,26 and other27,28 diseases, using both healthy and disease atlases of the relevant tissue, and revealing novel unexpected associations. For example, integrating the extensive GWAS literature for ulcerative colitis (UC) with single-cell atlas data enabled the identification of key cell types expressing genes associated with UC by GWAS, including epithelial M-like cells, which are exceedingly rare in the healthy colon but are significantly expanded in the inflamed, diseased colon29. Because most risk variants are in non-coding regions30, integration of GWAS summary statistics, single-cell profiles and chromatin data8,9, as well as joint profiling of chromatin and RNA in single cells31, can further facilitate the discovery of such associations32. One such analysis showed not only that a specific gene program is induced in colonic M cells in UC, accounting for overall disease risk heritability, but also that common variants in the FERMT1 locus (a gene implicated in a rare form of inflammatory bowel disease (IBD)33) contribute substantially to this association34. Moreover, because common disease genes are often pleiotropic, broader cross-tissue atlases can help to better decipher their impact throughout the body35,36,37,38. Finally, atlases also allow us to move from the level of individual risk genes to the modules and programs in which they participate, thus helping to decipher gene function, nominate causal processes, and relate diseases with similar morbidities at the level of programs, even when the underlying genes are distinct29. This is illustrated in monogenic and polygenic IBD39, in which programs involving M cells are enriched in both forms of the disease39.
Single-cell atlases can also reveal cellular subtypes that are shared across tissues or are unique in particular locations or disease contexts, such as recent surveys of mouse40 and human41 fibroblasts.
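The idea of relating disease-associated genes to candidate cells of action can be illustrated with a minimal sketch. It assumes a simple expression-specificity score (the fraction of a gene's total mean expression found in each cell type), which is one common heuristic and not necessarily the approach used in the studies cited above. All gene names, cell-type names and values are hypothetical toy inputs.

```python
def specificity(mean_expr):
    """Convert {gene: {cell_type: mean expression}} into per-gene fractions.

    Each gene's values are divided by that gene's total across cell types,
    so a gene expressed in only one cell type scores 1.0 there.
    """
    out = {}
    for gene, by_type in mean_expr.items():
        total = sum(by_type.values())
        out[gene] = {ct: (v / total if total > 0 else 0.0)
                     for ct, v in by_type.items()}
    return out


def rank_cell_types(mean_expr, disease_genes):
    """Rank cell types by the average specificity of the disease genes."""
    spec = specificity(mean_expr)
    genes = [g for g in disease_genes if g in spec]
    if not genes:
        return []
    cell_types = sorted({ct for g in genes for ct in spec[g]})
    scores = {ct: sum(spec[g].get(ct, 0.0) for g in genes) / len(genes)
              for ct in cell_types}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy atlas: mean expression per gene and cell type.
TOY_ATLAS = {
    "gene_A": {"rare_type": 8.0, "common_type_1": 1.0, "common_type_2": 1.0},
    "gene_B": {"rare_type": 6.0, "common_type_1": 2.0, "common_type_2": 2.0},
    "gene_C": {"rare_type": 1.0, "common_type_1": 1.0, "common_type_2": 2.0},
}

# Hypothetical disease-associated gene list (for example, nominated from GWAS loci).
ranking = rank_cell_types(TOY_ATLAS, ["gene_A", "gene_B"])
print(ranking)
```

In this toy case the rare cell type ranks first because both hypothetical disease genes are concentrated in it. Real analyses would additionally need to handle linkage between variants and genes, non-coding variants, and statistical significance, none of which is modeled here.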

Remodeling of cellular composition and multicellular architecture in disease tissue

Both cell-intrinsic and cell-extrinsic changes have key roles in pathogenesis and can be targeted by therapies, but changes in the cell’s internal programs and shifts in cellular composition are often confounded in bulk profiling. The cellular — and increasingly spatial — resolution provided by atlases distinguishes these contributions and allows more accurate and sensitive comparison between health and disease, as shown in studies in IBD, asthma, pulmonary fibrosis, rheumatoid arthritis, diabetic kidney disease, cardiomyopathy, Alzheimer’s disease and many other common diseases24,29,40,41,42,43,44,45,46,47,48,49,50,51.
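A small numerical sketch may help show why these two kinds of change are confounded in bulk data. It assumes the simplest possible model, in which the bulk signal for a gene is the proportion-weighted sum of its mean expression in each cell type. The cell types and numbers are hypothetical.

```python
def bulk_expression(proportions, per_type_expr):
    """Bulk signal for one gene: sum over cell types of proportion x mean expression."""
    return sum(proportions[ct] * per_type_expr[ct] for ct in proportions)


# Hypothetical healthy tissue: type_B is rare but expresses the gene highly.
healthy = bulk_expression(
    {"type_A": 0.9, "type_B": 0.1},
    {"type_A": 1.0, "type_B": 11.0},
)

# Scenario 1: composition shifts (type_B doubles in abundance), programs unchanged.
composition_shift = bulk_expression(
    {"type_A": 0.8, "type_B": 0.2},
    {"type_A": 1.0, "type_B": 11.0},
)

# Scenario 2: composition unchanged, but type_B upregulates the gene.
intrinsic_change = bulk_expression(
    {"type_A": 0.9, "type_B": 0.1},
    {"type_A": 1.0, "type_B": 21.0},
)

print(healthy, composition_shift, intrinsic_change)
```

Under these assumed values, both disease scenarios raise the bulk signal from about 2 to about 3, so a bulk measurement alone cannot tell them apart, whereas per-cell-type proportions and expression values can.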

Both compositional and cell-intrinsic expression changes can be coordinated across multiple cell types, resulting in shifts in multicellular communities in disease. For example, comparing cellular composition in the ileum of patients with Crohn’s disease with the healthy reference atlas identified a unique multicellular community of immune and stromal cells, which was predictive of a lack of response to anti-TNF therapy42. Comparison with healthy references also helps decipher the mechanisms driving these coordinated communities, and the gene programs within their constituent cells. For instance, compared with healthy tissue, atopic dermatitis and psoriasis skin lesions are characterized by the expansion of particular classes of macrophages and vascular endothelial cells that interact via the chemokine CXCL8 (in macrophages) and its receptor ACKR1 (in endothelial cells)52. This interaction, which is suggested to promote lymphocyte recruitment, represents the re-emergence of a prenatal cellular program in disease tissue52. Finally, computational methods53,54,55 can now recover multicellular gene programs, in which cell-intrinsic programs are coordinated between multiple different cell types across samples or physical niches. Examples include a multicellular program across five cell types implicating several disease risk genes for UC54, and the coordination of neurotransmission, cell adhesion, and development gene expression across cell types in the cortex in epilepsy53.

Mapping malignant and microenvironment cells in tumors

Our understanding of human cancer biology is also being transformed by single-cell and spatial genomic atlases. Analysis of solid tumors in comparison with healthy references helps to chart their biological complexity — combining genetic and epigenetic variation within the malignant compartment with the diversity of cells in the tumor microenvironment, including immune56,57,58,59,60,61,62,63,64,65, stroma57,66 and even neural67 cells, and their spatial organization68. This has helped identify relevant disease mechanisms69,70 and opportunities for therapeutic interventions58, as well as resistance mechanisms71, including cell communities that may predict response to therapies such as checkpoint inhibitors64,65 or chemoradiation72, and the cell of origin in both adult and pediatric tumors73,74,75 (determined in reference to healthy adult, developmental and pediatric atlases). As a brief illustrative example, in the specific context of interactions between malignant and immune cells in melanoma, studies have characterized the immune compartment, malignant cells, or both at different disease grades and with different treatment histories, describing dysfunctional versus stem-like T cell states associated with tumor resistance or reactivity76,77, recovering malignant cell programs impacting T cell excluded phenotypes58,78, and generalizing some of these findings to other tumor types59,63.

Diagnosis and treatment: single-cell insights to new clinical approaches

Towards a future of high-resolution cell and tissue diagnostics

Knowledge of all cell types in the body and their roles in disease should transform the future of common diagnostic tools, from single-cell assays such as complete blood count (CBC) and white blood cell count to histopathology. The healthy reference atlas, disease atlases, and underlying lab and computational methods should allow for the development of new assays with higher resolution and broader molecular scope, as well as improved interpretation of results from individual patients (Fig. 1).

For the CBC — currently a census of a limited number of blood cell components that is used in a variety of diagnostic settings — we envision a future ‘CBC 2.0,’ a high-resolution portrait of the molecular profiles of nucleated blood cells, deployed in every disease. The rich and growing human reference now spans thousands of individuals and tens of millions of cells, with atlases of peripheral blood mononuclear cells from multiple diseases (such as melanoma79, rheumatoid arthritis80 and lupus81) and of immune cells in multiple tissues. Such a reference could form the basis for new diagnostic assays and for better interpretations, connecting the cell’s profile in the periphery to those in healthy and disease tissue79. Excitingly, single-cell profiling of the blood immune cell landscape is beginning to inform our understanding of therapeutic responses and prognosis, including pioneering studies that have identified the blood correlates of the anti-PD1 response in tumors79,82. For histopathology, a workhorse of medicine, we envision conventional H&E staining being elevated to ‘H&E 2.0,’ in which single-cell and spatial profiling data are overlaid on standard tissue stains to unify genomic and histological analysis — either by direct lab assays or even by machine-learning algorithms trained on spatial data to predict molecular profiles from H&E stains83. As the use of spatial profiling (for genomics, epigenomics, transcriptomics and proteomics) in healthy84,85,86 and disease64,72,84,87,88 tissue62,70,81,84,85 has grown, algorithms have been able to deconvolve low-resolution methods to single-cell resolution89, project the spatial expression of genes that were not measured directly89,90,91,92,93, and recover repeatable spatio-molecular features in tissue70,94. Given sufficient data, algorithms can also map molecular profiles and histology to each other, with the aim of predicting expression from histology95, forming the basis of an H&E 2.0 approach.
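To make the notion of deconvolving a low-resolution spatial measurement more concrete, the following is a minimal sketch and not the algorithm of any cited method. It assumes that a spot's expression is a non-negative mixture of reference cell-type signatures taken from an atlas, and estimates the mixture weights with multiplicative updates for non-negative least squares. The function name, signatures and spot values are hypothetical.

```python
def deconvolve(spot, signatures, n_iter=200, eps=1e-12):
    """Estimate cell-type proportions in one spot.

    spot: list of expression values, one per gene.
    signatures: {cell_type: list of reference expression values, same gene order}.
    Uses multiplicative updates w <- w * (S^T y) / (S^T S w), which keep the
    weights non-negative, then rescales the weights to sum to 1.
    """
    types = list(signatures)
    n_genes = len(spot)
    w = {ct: 1.0 for ct in types}
    for _ in range(n_iter):
        # Current reconstruction of the spot from the weighted signatures.
        fitted = [sum(w[ct] * signatures[ct][g] for ct in types)
                  for g in range(n_genes)]
        new_w = {}
        for ct in types:
            num = sum(signatures[ct][g] * spot[g] for g in range(n_genes))
            den = sum(signatures[ct][g] * fitted[g] for g in range(n_genes))
            new_w[ct] = w[ct] * num / (den + eps)
        w = new_w
    total = sum(w.values())
    return {ct: (w[ct] / total if total > 0 else 0.0) for ct in types}


# Hypothetical reference signatures over three genes.
SIGNATURES = {
    "type_1": [10.0, 1.0, 0.0],
    "type_2": [0.0, 1.0, 10.0],
}

# Hypothetical spot constructed as 0.3 * type_1 + 0.7 * type_2.
SPOT = [3.0, 1.0, 7.0]

estimate = deconvolve(SPOT, SIGNATURES)
print(estimate)
```

Because the toy spot is an exact mixture, the estimated proportions should approach 0.3 and 0.7. Real spatial data are noisy and sparse, and published tools typically rely on probabilistic models of count data, so this sketch only conveys the underlying mixture idea.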

Early studies are beginning to show the potential impact of such future assays, and how atlases provide the necessary tools to understand why therapeutics work — or don’t work — in patients at the cell and tissue levels, predicting potential on-target toxicities, efficacy and mechanisms underlying intrinsic and acquired resistance. First, a healthy reference is invaluable in predicting the risk of on-target toxicities for both molecular and cellular therapies, on the basis of the cell types in which the therapeutic target is expressed. For example, a recent study has suggested that expression of CD19 by mural cells, vascular smooth muscle cells, and pericytes in the blood–brain barrier might explain neurotoxicity of CD19-targeting chimeric antigen receptor T cells96. Cross-species reference atlases for key models in safety assessment, such as rat and macaque97, would be invaluable. For response and resistance in cancer, profiling malignant and immune cells in tumors, draining lymph nodes, or the periphery can help monitor response and provide insights into resistance, as shown, for example, in response to anti-PD-L1 therapy82,98,99 or chemotherapy100. Although access to patient tissue may be more limiting in some cases, these approaches are as important in other diseases, such as IBD29,42,51, rheumatoid arthritis16,47, psoriasis101, atopic dermatitis52, and scleroderma25.

High-resolution and massively parallel methods for drug discovery

For molecular drug discovery, reference atlases and single-cell and spatial genomics open the way to high-resolution phenotypic screens for desired cell states by coupling the rich, complex and interpretable phenotypes of molecular profiles, which can be related to cells in patients, to the scale required in screening102,103 (Fig. 1). Perturb-Seq screens — pooled genetic screens with single-cell genomics readouts — have characterized the impact on single-cell profiles of perturbations in large numbers of genes102,104,105,106,107, non-coding variants associated with common complex disease108, and coding variants in cancer109 and developmental disorders110,111, and can be performed in cell culture or co-culture, in organoids, or in animal models. Focused small-molecule screens with scRNA-seq readouts have also been conducted112,113. Moreover, machine-learning algorithms can increasingly be trained on such data to yield models that predict the impact of additional perturbations in one or more genes in the same cellular context or of the same perturbations in new biological contexts114,115,116.
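As a rough illustration of how a pooled screen with a single-cell readout is summarized, the sketch below groups cells by the perturbation they received and reports the shift in mean expression relative to control cells. This is a deliberately simplified estimate. Actual Perturb-Seq analyses involve guide assignment, normalization and regression or other statistical models. The guide labels, gene names and values are hypothetical.

```python
from collections import defaultdict


def mean_profile(profiles):
    """Average a list of {gene: value} dicts; genes absent from a cell count as 0."""
    sums = defaultdict(float)
    for profile in profiles:
        for gene, value in profile.items():
            sums[gene] += value
    return {gene: total / len(profiles) for gene, total in sums.items()}


def perturbation_effects(cells, control_label="non-targeting"):
    """cells: list of (guide_label, {gene: expression}) pairs.

    Returns {guide: {gene: mean in perturbed cells - mean in control cells}}.
    """
    groups = defaultdict(list)
    for guide, profile in cells:
        groups[guide].append(profile)
    if control_label not in groups:
        raise ValueError("no control cells found")
    control = mean_profile(groups[control_label])
    effects = {}
    for guide, profiles in groups.items():
        if guide == control_label:
            continue
        perturbed = mean_profile(profiles)
        genes = set(control) | set(perturbed)
        effects[guide] = {g: perturbed.get(g, 0.0) - control.get(g, 0.0)
                          for g in genes}
    return effects


# Hypothetical toy screen: two control cells and two cells carrying guide_1.
TOY_CELLS = [
    ("non-targeting", {"gene_X": 4.0, "gene_Y": 2.0}),
    ("non-targeting", {"gene_X": 6.0, "gene_Y": 2.0}),
    ("guide_1", {"gene_X": 1.0, "gene_Y": 2.0}),
    ("guide_1", {"gene_X": 1.0, "gene_Y": 4.0}),
]

effects = perturbation_effects(TOY_CELLS)
print(effects)
```

Effect profiles of this kind, computed per perturbation, are one possible form of training data for the predictive models mentioned above.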

For regenerative medicine and cell therapy, single-cell atlases enhance our power to recover regenerative mechanisms in human tissue as therapeutic targets, develop better organoid models for drug discovery, and define better engineered cell therapies117. In each case, the comparison to reference atlases first helps define the desired target state, then helps screen for cells or organoids that achieve that state, and finally can help monitor the impact and state of the cellular therapy in the human patient. For example, when generating faithful human-derived models for regenerative medicine, healthy and disease reference atlases help compare model and human tissue, identify missing cellular components, and predict molecular mechanisms to improve the model117, as has been shown for Parkinson’s disease therapy118, brain organoid models where autism-associated gene variants were introduced111,119,120, gut enteroid cultures121,122, thymic T cells38, and organoid models of the endometrium84 or intestines123. Moreover, for in vivo tissue reprogramming, reference atlases help infer differentiation mechanisms and assess whether a therapy has the desired effect, for example to characterize the regenerative capacity of overexpressing proneural transcription factors in Müller glia124 or to map networks underpinning retinal regeneration125. Finally, for engineered cell therapy, Perturb-Seq methods help screen for perturbations that will yield therapeutically desirable cell states126,127, and single-cell profiling helps characterize the resulting cell therapy before it is administered to patients and after administration in both common diseases121 and T cell therapy in cancer128,129,130.

Challenges for cell atlases in medicine

To realize the transformative potential of cell atlases in medicine, substantial challenges need to be overcome: technical, practical and fundamental (Fig. 1). First and foremost, we must ensure that cell atlases benefit all of humanity, by assembling healthy and disease atlases that reflect human diversity, from ancestry to geography, as well as involving diverse scientists from across the globe who are experts in these approaches. This has been a core aim of the Human Cell Atlas since its inception, and has been overseen by a dedicated equity working group131,132. For effective deployment in real-world settings, lab methods need to be sufficiently cost-effective and robust to empower screening and enable adoption, including in under-resourced areas. Connections between the lab and the clinic also need to be further enhanced, including building more biobank resources with rich metadata, large-scale profiling of samples from clinically annotated and diverse cohorts, and better experimental methods to tap into banked samples, especially formalin-fixed paraffin-embedded tissues, which are still incompatible with many single-cell methods133,134. Among the key computational challenges are the need for open data that reflect human diversity for training computational models, while appropriately safeguarding patient privacy; methods to decode cellular dynamics from static snapshots; algorithms and platforms for efficient querying for genes, cell states and cell types of interest; and fast iterations between lab and computation to design faithful human-derived organoids and cells for screens and therapies.

Other challenges are more fundamental. First, while analysis of expression profiles yields suggestive associations, demonstrating the causative disease role of a gene, program or cell state requires direct interventions. Using single-cell and spatial genomics with genetic screens or in human genetic cohorts and clinical trials, along with causal inference, should help advance us from correlation to causation. Moreover, although cell atlases shed light on many changes as disease unfolds, they often focus on disease onset, rather than prognosis and progression. Longitudinal studies can address this challenge, but require long-term investment. More broadly, cell atlases on their own are an important tool in our arsenal, but not a silver bullet. We draw an analogy to the impact of the Human Genome Project, which did not ‘solve’ disease on its own, but instead laid critical groundwork for many areas of biomedicine135.

Conclusion

As single-cell and spatial atlases continue to advance, they are transforming our understanding of different diseases at the cellular and tissue level, and are beginning to inform the development of diagnostics, drug discovery and novel treatment avenues. This has been impactful for new diseases like COVID-19, for long-standing ones such as cancer, and for rare and common complex diseases alike. Much of this progress has been driven by the rise of experimental technologies (Table 1) and computational algorithms that are applicable in studies at all stages of biomedicine, from understanding mechanisms to diagnosing and treating disease. As technologies for sequencing, cell manipulation and spatial profiling rapidly grow in scale and resolution (and drop in cost)136,137, they enable the collection of diverse reference atlases across gender, age, ancestry and demographics that are needed for clinical work. They also enable the sort of large-scale sampling within and across human patients that is required to understand and monitor disease, as well as screening experiments that are crucial to drug discovery. Together, these will help deliver the Human Cell Atlas mission: to form a reference map as a basis for understanding human health as well as diagnosing, monitoring, and treating disease.