Genomic analyses implicate noncoding de novo variants in congenital heart disease

Abstract

A genetic etiology is identified for one-third of patients with congenital heart disease (CHD), with 8% of cases attributable to coding de novo variants (DNVs). To assess the contribution of noncoding DNVs to CHD, we compared genome sequences from 749 CHD probands and their parents with those from 1,611 unaffected trios. Neural network prediction of noncoding DNV transcriptional impact identified a burden of DNVs in individuals with CHD (n = 2,238 DNVs) compared to controls (n = 4,177; P = 8.7 × 10⁻⁴). Independent analyses of enhancers showed an excess of DNVs in associated genes (27 genes versus 3.7 expected, P = 1 × 10⁻⁵). We observed significant overlap between these transcription-based approaches (odds ratio (OR) = 2.5, 95% confidence interval (CI) 1.1–5.0, P = 5.4 × 10⁻³). CHD DNVs altered transcription levels in 5 of 31 enhancers assayed. Finally, we observed a DNV burden in RNA-binding-protein regulatory sites (OR = 1.13, 95% CI 1.1–1.2, P = 8.8 × 10⁻⁵). Our findings demonstrate an enrichment of potentially disruptive regulatory noncoding DNVs in a fraction of CHD at least as high as that observed for damaging coding DNVs.

Fig. 1: Analysis schematic.
Fig. 2: Enrichment of noncoding de novo variants with functionally relevant HeartENN scores.
Fig. 3: Genes with multiple de novo variants in prioritized human fetal heart enhancers.
Fig. 4: Massively parallel reporter assays for selected de novo variants.
Fig. 5: Enrichment of variants in RNA-binding-protein category annotations.

Data availability

Whole-genome sequencing data are deposited in the database of Genotypes and Phenotypes (dbGaP) under accession numbers phs001194.v2.p2 and phs001138.v2.p2.

Code availability

Documentation, links, and availability of source code and select supplementary data are detailed at https://github.com/frichter/wgs_chd_analysis. The DNV identification pipeline is available at https://github.com/ShenLab/igv-classifier and https://github.com/frichter/dnv_pipeline. The HeartENN algorithmic framework is available at https://github.com/FunctionLab/selene/archive/0.4.8.tar.gz. HeartENN model weights and scripts for burden tests are available at https://github.com/frichter/wgs_chd_analysis. All source code is distributed under the Massachusetts Institute of Technology license.

References

  1. van der Linde, D. et al. Birth prevalence of congenital heart disease worldwide. J. Am. Coll. Cardiol. 58, 2241–2247 (2011).

  2. Pediatric Cardiac Genomics Consortium et al. The Congenital Heart Disease Genetic Network Study: rationale, design, and early results. Circ. Res. 112, 698–706 (2013).

  3. Zaidi, S. et al. De novo mutations in histone-modifying genes in congenital heart disease. Nature 498, 220–223 (2013).

  4. Homsy, J. et al. De novo mutations in congenital heart disease with neurodevelopmental and other congenital anomalies. Science 350, 1262–1266 (2015).

  5. Jin, S. C. et al. Contribution of rare inherited and de novo variants in 2,871 congenital heart disease probands. Nat. Genet. 49, 1593–1601 (2017).

  6. Fischbach, G. D. & Lord, C. The Simons Simplex Collection: a resource for identification of autism genetic risk factors. Neuron 68, 192–195 (2010).

  7. Garrison, E. & Marth, G. Haplotype-based variant detection from short-read sequencing. Preprint at https://arxiv.org/abs/1207.3907v2 (2012).

  8. Richter, F. et al. Whole genome de novo variant identification with FreeBayes and neural network approaches. Preprint at bioRxiv https://doi.org/10.1101/2020.03.24.994160 (2020).

  9. Zhou, J. et al. Whole-genome deep-learning analysis identifies contribution of noncoding mutations to autism risk. Nat. Genet. 51, 973–980 (2019).

  10. An, J.-Y. et al. Genome-wide de novo risk score implicates promoter variation in autism spectrum disorder. Science 362, eaat6576 (2018).

  11. Jónsson, H. et al. Parental influence on human germline de novo mutations in 1,548 trios from Iceland. Nature 549, 519–522 (2017).

  12. Goldmann, J. M. et al. Parent-of-origin-specific signatures of de novo mutations. Nat. Genet. 48, 935–939 (2016).

  13. Seiden, A. H. et al. Elucidation of de novo small insertion/deletion biology with parent-of-origin phasing. Hum. Mutat. 41, 800–806 (2020).

  14. Kent, W. J. et al. The human genome browser at UCSC. Genome Res. 12, 996–1006 (2002).

  15. Bernstein, B. E. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).

  16. Mei, S. et al. Cistrome Data Browser: a data portal for ChIP–Seq and chromatin accessibility data in human and mouse. Nucleic Acids Res. 45, D658–D662 (2017).

  17. He, A. et al. Dynamic GATA4 enhancers shape the chromatin landscape central to heart development and disease. Nat. Commun. 5, 4907 (2014).

  18. Sayed, D., Yang, Z., He, M., Pfleger, J. M. & Abdellatif, M. Acute targeting of general transcription factor IIB restricts cardiac hypertrophy via selective inhibition of gene transcription. Circ. Heart Fail. 8, 138–148 (2015).

  19. Stefanovic, S. et al. GATA-dependent regulatory switches establish atrioventricular canal specificity during heart development. Nat. Commun. 5, 3680 (2014).

  20. Sayed, D., He, M., Yang, Z., Lin, L. & Abdellatif, M. Transcriptional regulation patterns revealed by high resolution chromatin immunoprecipitation during cardiac hypertrophy. J. Biol. Chem. 288, 2546–2558 (2013).

  21. Zhang, L. et al. KLF15 establishes the landscape of diurnal expression in the heart. Cell Rep. 13, 2368–2375 (2015).

  22. Anand, P. et al. BET bromodomains mediate transcriptional pause release in heart failure. Cell 154, 569–582 (2013).

  23. Attanasio, C. et al. Tissue-specific SMARCA4 binding at active and repressed regulatory elements during embryogenesis. Genome Res. 24, 920–929 (2014).

  24. Sakabe, N. J. et al. Dual transcriptional activator and repressor roles of TBX20 regulate adult cardiac structure and function. Hum. Mol. Genet. 21, 2194–2204 (2012).

  25. Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).

  26. May, D. et al. Large-scale discovery of enhancers from human heart tissue. Nat. Genet. 44, 89–93 (2012).

  27. Dickel, D. E. et al. Genome-wide compendium and functional assessment of in vivo heart enhancers. Nat. Commun. 7, 12923 (2016).

  28. Nord, A. S. et al. Rapid and pervasive changes in genome-wide enhancer usage during mammalian development. Cell 155, 1521–1531 (2013).

  29. Blow, M. J. et al. ChIP–Seq identification of weakly conserved heart enhancers. Nat. Genet. 42, 806–810 (2010).

  30. Yue, F. et al. A comparative encyclopedia of DNA elements in the mouse genome. Nature 515, 355–364 (2014).

  31. Shen, Y. et al. A map of the cis-regulatory sequences in the mouse genome. Nature 488, 116–120 (2012).

  32. van den Boogaard, M. et al. Genetic variation in T-box binding element functionally affects SCN5A/SCN10A enhancer. J. Clin. Invest. 122, 2519–2530 (2012).

  33. Zhou, J. & Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning–based sequence model. Nat. Methods 12, 931–934 (2015).

  34. Huang, Y.-F., Gulko, B. & Siepel, A. Fast, scalable prediction of deleterious noncoding variants from functional and population genomic data. Nat. Genet. 49, 618–624 (2017).

  35. Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46, 310–315 (2014).

  36. Rentzsch, P., Witten, D., Cooper, G. M., Shendure, J. & Kircher, M. CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res. 47, D886–D894 (2019).

  37. Davydov, E. V. et al. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Comput. Biol. 6, e1001025 (2010).

  38. Ritchie, G. R. S., Dunham, I., Zeggini, E. & Flicek, P. Functional annotation of noncoding sequence variants. Nat. Methods 11, 294–296 (2014).

  39. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).

  40. Melnikov, A., Zhang, X., Rogov, P., Wang, L. & Mikkelsen, T. S. Massively parallel reporter assays in cultured mammalian cells. J. Vis. Exp. https://doi.org/10.3791/51719 (2014).

  41. Werling, D. M. et al. An analytical framework for whole-genome sequence association studies and its implications for autism spectrum disorder. Nat. Genet. 50, 727–736 (2018).

  42. Turner, T. N. et al. Genomic patterns of de novo mutation in simplex autism. Cell 171, 710–722.e12 (2017).

  43. Yuen, R. K. C. et al. Whole genome sequencing resource identifies 18 new candidate genes for autism spectrum disorder. Nat. Neurosci. 20, 602–611 (2017).

  44. Hamdan, F. F. et al. High rate of recurrent de novo mutations in developmental and epileptic encephalopathies. Am. J. Hum. Genet. 101, 664–685 (2017).

  45. Peacock, J. D., Lu, Y., Koch, M., Kadler, K. E. & Lincoln, J. Temporal and spatial expression of collagens during murine atrioventricular heart valve development and maintenance. Dev. Dyn. 237, 3051–3058 (2008).

  46. Kurosaka, S. et al. Arginylation regulates myofibrils to maintain heart function and prevent dilated cardiomyopathy. J. Mol. Cell. Cardiol. 53, 333–341 (2012).

  47. Kleffmann, W. et al. 5q31 microdeletions: definition of a critical region and analysis of LRRTM2, a candidate gene for intellectual disability. Mol. Syndromol. 3, 68–75 (2012).

  48. Mehta, G. et al. MITF interacts with the SWI/SNF subunit, BRG1, to promote GATA4 expression in cardiac hypertrophy. J. Mol. Cell. Cardiol. 88, 101–110 (2015).

  49. Tshori, S. et al. Transcription factor MITF regulates cardiac growth and hypertrophy. J. Clin. Invest. 116, 2673–2681 (2006).

  50. Nicholson, T. B. et al. A hypomorphic lsd1 allele results in heart development defects in mice. PLoS One 8, e60913 (2013).

  51. Hamidi, T. et al. Identification of Rpl29 as a major substrate of the lysine methyltransferase Set7/9. J. Biol. Chem. 293, 12770–12780 (2018).

  52. Siggs, O. M. et al. Mutation of Fnip1 is associated with B-cell deficiency, cardiomyopathy, and elevated AMPK activity. Proc. Natl Acad. Sci. USA 113, E3706–E3715 (2016).

  53. Chen, C.-Y. et al. Accumulation of the inner nuclear envelope protein Sun1 is pathogenic in progeric and dystrophic laminopathies. Cell 149, 565–577 (2012).

  54. Meinke, P. et al. Muscular dystrophy-associated SUN1 and SUN2 variants disrupt nuclear-cytoskeletal connections and myonuclear organization. PLoS Genet. 10, e1004605 (2014).

  55. Röseler, S. et al. Lethal phenotype of mice carrying a Sept11 null mutation. Biol. Chem. 392, 779–781 (2011).

  56. Guo, A. et al. E–C coupling structural protein junctophilin-2 encodes a stress-adaptive transcription regulator. Science 362, eaan3303 (2018).

  57. Yamagishi, H. et al. A history and interaction of outflow progenitor cells implicated in “Takao Syndrome.” In Etiology and Morphogenesis of Congenital Heart Disease: From Gene Function and Cellular Interaction to Morphology (eds. Nakanishi, T. et al.) 201–209 (Springer, 2016).

  58. Masuda, T. & Taniguchi, M. Congenital diseases and semaphorin signaling: overview to date of the evidence linking them. Congenit. Anom. (Kyoto) 55, 26–30 (2015).

  59. Pierpont, M. E. et al. Genetic basis for congenital heart disease: revisited: a scientific statement from the American Heart Association. Circulation 138, e653–e711 (2018).

  60. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

  61. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).

  62. DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).

  63. Van der Auwera, G. et al. From FastQ data to high‐confidence variant calls: the genome analysis toolkit best practices pipeline. Curr. Protoc. Bioinformatics 43, 11.10.1–11.10.33 (2013).

  64. Kim, B.-Y., Park, J. H., Jo, H.-Y., Koo, S. K. & Park, M.-H. Optimized detection of insertions/deletions (INDELs) in whole-exome sequencing data. PLoS One 12, e0182272 (2017).

  65. Bailey, J. A., Yavor, A. M., Massa, H. F., Trask, B. J. & Eichler, E. E. Segmental duplications: organization and impact within the current human genome project assembly. Genome Res. 11, 1005–1017 (2001).

  66. Derrien, T. et al. Fast computation and applications of genome mappability. PLoS One 7, e30377 (2012).

  67. Li, H. Toward better understanding of artifacts in variant calling from high-coverage samples. Bioinformatics 30, 2843–2851 (2014).

  68. Ostrander, B. E. P. et al. Whole-genome analysis for effective clinical diagnosis and gene discovery in early infantile epileptic encephalopathy. NPJ Genom. Med. 3, 22 (2018).

  69. Blake, J. A. et al. Mouse Genome Database (MGD)-2017: community knowledge resource for the laboratory mouse. Nucleic Acids Res. 45, D723–D729 (2017).

  70. Chen, K. M., Cofer, E. M., Zhou, J. & Troyanskaya, O. G. Selene: a PyTorch-based deep learning library for sequence data. Nat. Methods 16, 315–318 (2019).

  71. Price, A. L. et al. Pooled association tests for rare variants in exon-resequencing studies. Am. J. Hum. Genet. 86, 832–838 (2010).

  72. Lian, X. et al. Directed cardiomyocyte differentiation from human pluripotent stem cells by modulating Wnt/β-catenin signaling under fully defined conditions. Nat. Protoc. 8, 162–175 (2013).

  73. Buenrostro, J. D., Wu, B., Chang, H. Y. & Greenleaf, W. J. ATAC–seq: a method for assaying chromatin accessibility genome-wide. Curr. Protoc. Mol. Biol. 109, 21.29.1–21.29.9 (2015).

  74. Corces, M. R. et al. An improved ATAC–seq protocol reduces background and enables interrogation of frozen tissues. Nat. Methods 14, 959–962 (2017).

  75. Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).

  76. Yu, G., Wang, L.-G. & He, Q.-Y. ChIPseeker: an R/Bioconductor package for ChIP peak annotation, comparison and visualization. Bioinformatics 31, 2382–2383 (2015).

  77. Spurrell, C. H. et al. Genome-wide fetalization of enhancer architecture in heart disease. Preprint at bioRxiv https://doi.org/10.1101/591362 (2019).

  78. Sharma, A., Toepfer, C. N., Schmid, M., Garfinkel, A. C. & Seidman, C. E. Differentiation and contractile analysis of GFP-sarcomere reporter hiPSC-cardiomyocytes. Curr. Protoc. Hum. Genet. 96, 21.12.1–21.12.12 (2018).

  79. Shah, A., Qian, Y., Weyn-Vanhentenryck, S. M. & Zhang, C. CLIP Tool Kit (CTK): a flexible and robust pipeline to analyze CLIP sequencing data. Bioinformatics 33, 566–567 (2017).

  80. Feng, H. et al. Modeling RNA-binding protein specificity in vivo by precisely registering protein-RNA crosslink sites. Mol. Cell 74, 1189–1204.e6 (2019).

Acknowledgements

We are enormously grateful to the patients and families who participated in this research. We thank the following for patient recruitment: A. Julian, M. MacNeal, Y. Mendez, T. Mendiz-Ramdeen and C. Mintz (Icahn School of Medicine at Mount Sinai); N. Cross (Yale School of Medicine); J. Ellashek and N. Tran (Children’s Hospital of Los Angeles); B. McDonough, J. Geva and M. Borensztein (Harvard Medical School); K. Flack, L. Panesar and N. Taylor (University College London); E. Taillie (University of Rochester School of Medicine and Dentistry); S. Edman, J. Garbarini, J. Tusi and S. Woyciechowski (Children’s Hospital of Philadelphia); D. Awad, C. Breton, K. Celia, C. Duarte, D. Etwaru, N. Fishman, E. Griffin, M. Kaspakoval, J. Kline, R. Korsin, A. Lanz, E. Marquez, D. Queen, A. Rodriguez, J. Rose, J. K. Sond, D. Warburton, A. Wilpers and R. Yee (Columbia Medical School); D. Gruber (Cohen Children’s Medical Center, Northwell Health). These data were generated by the PCGC, under the auspices of the Bench to Bassinet Program (https://benchtobassinet.com) of the NHLBI. The results analyzed and published here are based in part on data generated by Gabriella Miller Kids First Pediatric Research Program projects phs001138.v1.p2/phs001194.v1.p2, and were accessed from the Kids First Data Resource Portal (https://kidsfirstdrc.org/) and/or dbGaP (www.ncbi.nlm.nih.gov/gap). This manuscript was prepared in collaboration with investigators of the PCGC and has been reviewed and/or approved by the PCGC. PCGC investigators are listed at https://benchtobassinet.com/?page_id=119. This work was supported in part through the computational resources and staff expertise provided by Scientific Computing at the Icahn School of Medicine at Mount Sinai. We are grateful to all of the families at the participating Simons Simplex Collection (SSC) sites, as well as the principal investigators (A. Beaudet, R. Bernier, J. Constantino, E. Cook, E. Fombonne, D. Geschwind, R. Goin-Kochel, E. Hanson, D. 
Grice, A. Klin, D. Ledbetter, C. Lord, C. Martin, D. Martin, R. Maxim, J. Miles, O. Ousley, K. Pelphrey, B. Peterson, J. Piggot, C. Saulnier, M. State, W. Stone, J. Sutcliffe, C. Walsh, Z. Warren and E. Wijsman). We appreciate the access obtained to phenotypic and/or genetic data on SFARI Base. Approved researchers can obtain the SSC population dataset described in this study (https://www.sfari.org/resource/simons-simplex-collection) by applying at https://base.sfari.org. This work was supported by the Mount Sinai Medical Scientist Training Program (5T32GM007280 to F.R.), National Institute of Dental and Craniofacial Research Interdisciplinary Training in Systems and Developmental Biology and Birth Defects (T32HD075735 to F.R.), Harvard Medical School Epigenetic and Gene Dynamics Award (S.U.M. and C.E.S.), American Heart Association Post-Doctoral Fellowship (S.U.M.), and Howard Hughes Medical Institute (C.E.S.). Research conducted at the E.O. Lawrence Berkeley National Laboratory was supported by National Institutes of Health (NIH) grants (UM1HL098166 and R24HL123879) and performed under Department of Energy Contract DE-AC02-05CH11231, University of California. O.T. is a CIFAR fellow and this work was partially supported by NIH grant R01GM071966. The PCGC program is funded by the NHLBI, NIH, US Department of Health and Human Services through grants UM1HL128711, UM1HL098162, UM1HL098147, UM1HL098123, UM1HL128761 and U01HL131003. The PCGC Kids First study includes data sequenced by the Broad Institute (U24 HD090743-01).

Author information

Contributions

F.R., S.U.M., S.W.K., A.K., L.K.W., K.M.C., J.R.K., O.G.T., D.E.D., Y.S., J.G.S., C.E.S. and B.D.G. conceived and designed the experiments/analyses. J.R.K., J.W.N., A.G., E.G., M.B., R.K., G.A.P., D.B., W.K.C., D.S., M.T.-F., J.G.S., C.E.S. and B.D.G. contributed to cohort ascertainment, phenotypic characterization and recruitment. F.R., S.U.M., A.K., H.Q., N.P., S.R.D., M.P., J.H., J.M.G., K.B.M., M.V., A.F., G.M., W.K.C., Y.S., J.G.S., C.E.S. and B.D.G. contributed to whole-genome sequencing production, validation and analysis. F.R., S.U.M., A.K., K.M.C., H.Q., E.E.S., O.G.T., Y.S., J.G.S., C.E.S. and B.D.G. contributed to statistical analyses. F.R., K.M.C., J.Z., O.G.T. and B.D.G. developed the HeartENN model. S.U.M., S.W.K., L.K.W., D.E.D., J.G.S. and C.E.S. generated and analyzed fetal heart and iPSC data. F.R., S.U.M., S.W.K., A.K., L.K.W., K.M.C., Y.S., J.G.S., C.E.S. and B.D.G. wrote and reviewed the manuscript. All authors read and approved the manuscript.

Corresponding author

Correspondence to Bruce D. Gelb.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Other pipelines identified 94% of DNVs in control trios.

Overlap of the DNVs called in this study with those identified in 1,470 control trios by two other pipelines (refs. 9,10). Of note, a third analysis of these trios (ref. 42) did not include de novo calls. For consistency with the other pipelines, only SNVs were included, and variants in low-complexity regions (LCRs), blacklisted regions, segmental duplications, and repeats were excluded. Together, 94% of de novo SNVs were called by at least one other pipeline.
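The replication fraction in this caption can be illustrated with a minimal Python sketch. It assumes each DNV is keyed by a hypothetical (trio ID, chromosome, position, reference allele, alternate allele) tuple and that calls in LCRs, blacklists, segmental duplications, and repeats were already removed; the published comparison may have matched calls differently.

```python
from typing import Iterable, Set, Tuple

# Hypothetical DNV key: (trio ID, chromosome, position, ref allele, alt allele).
Variant = Tuple[str, str, int, str, str]


def is_snv(v: Variant) -> bool:
    """True for single-nucleotide variants (one base in both ref and alt)."""
    return len(v[3]) == 1 and len(v[4]) == 1


def fraction_replicated(ours: Set[Variant], others: Iterable[Set[Variant]]) -> float:
    """Fraction of our de novo SNVs that at least one other pipeline also called."""
    snvs = {v for v in ours if is_snv(v)}
    if not snvs:
        return 0.0
    called_elsewhere: Set[Variant] = set()
    for calls in others:
        called_elsewhere |= calls
    return len(snvs & called_elsewhere) / len(snvs)


if __name__ == "__main__":
    ours = {
        ("trio1", "chr1", 100, "A", "G"),
        ("trio1", "chr2", 200, "C", "T"),
        ("trio2", "chr3", 300, "G", "A"),
        ("trio2", "chr5", 500, "C", "G"),   # not called by any other pipeline
        ("trio2", "chr4", 400, "T", "TA"),  # indel, excluded before comparison
    }
    pipeline_a = {("trio1", "chr1", 100, "A", "G")}
    pipeline_b = {("trio1", "chr2", 200, "C", "T"), ("trio2", "chr3", 300, "G", "A")}
    print(fraction_replicated(ours, [pipeline_a, pipeline_b]))  # 0.75
```

Taking the union of the other pipelines' calls before intersecting matches the caption's "called by at least one other pipeline" criterion.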

Extended Data Fig. 2 Correlation between parental age at proband birth and DNVs/trio.

Multiple linear regression ($\mathrm{DNVs} = \beta_{\mathrm{paternal\_age}}\,x_{\mathrm{paternal\_age}} + \beta_{\mathrm{maternal\_age}}\,x_{\mathrm{maternal\_age}} + \beta_{\mathrm{intercept}} + \varepsilon$) was fitted on 763 CHD and 1,611 unaffected individuals to estimate the associations of paternal and maternal age with the number of DNVs per trio, separately for SNVs, indels, and both combined. Regression coefficients and P values are shown, uncorrected for multiple hypothesis testing. Comparisons of sequencing metrics between sequencing centers, colored by cases (n = 763) and controls (n = 1,611), showed moderate bias in DNV counts; the total number of DNVs is therefore used as the background statistical parameter throughout the manuscript. Box plots show medians and interquartile ranges.
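The parental-age model above can be sketched as an ordinary least-squares fit. The software used for the published fit is not stated here, so this is an illustration only: it solves the normal equations directly with the standard library and returns coefficients but not the P values, which would need standard errors from a statistics package. The function names are hypothetical.

```python
from typing import Dict, List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_parental_age(paternal_age: Sequence[float], maternal_age: Sequence[float],
                     dnvs_per_trio: Sequence[float]) -> Dict[str, float]:
    """Least-squares fit of DNVs = b_pat * paternal + b_mat * maternal + b_intercept + error."""
    design = [[p, m, 1.0] for p, m in zip(paternal_age, maternal_age)]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(design, dnvs_per_trio)) for i in range(3)]
    b_pat, b_mat, b_int = solve_linear(xtx, xty)
    return {"paternal_age": b_pat, "maternal_age": b_mat, "intercept": b_int}


if __name__ == "__main__":
    # Toy, noise-free data generated with known coefficients (not study data).
    pat = [25, 30, 35, 40, 45]
    mat = [24, 27, 33, 35, 41]
    y = [1.5 * p + 0.4 * m + 10 for p, m in zip(pat, mat)]
    print(fit_parental_age(pat, mat, y))
```

Fitting the same model once per outcome (SNV count, indel count, combined count) reproduces the three analyses the caption describes.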

Extended Data Fig. 3 De novo variant (DNV) burden in CHD versus unaffected individuals.

The number of DNVs in 184 noncoding annotations (points), genome-wide and within 10 kb of TSSs for 6 gene sets (facets), was counted in CHD (n = 749) and Simons unaffected (n = 1,611) individuals. The P value threshold (1.5 × 10⁻⁴, horizontal blue line) is 0.05 divided by the product of the number of effective annotations (n = 47) and the number of gene sets (n = 7). The P value (y-axis) was calculated with a two-sided Fisher’s exact test; the odds ratio (x-axis) compares $\mathrm{DNVs}_{\mathrm{annotation,CHD}}/\mathrm{DNVs}_{\mathrm{total,CHD}}$ with $\mathrm{DNVs}_{\mathrm{annotation,unaffected}}/\mathrm{DNVs}_{\mathrm{total,unaffected}}$. No annotation surpassed the P value threshold. CHD, congenital heart disease; HHE, high heart expression; TSS, transcription start site.
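The per-annotation burden test in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: it computes a two-sided Fisher's exact test from the hypergeometric distribution using only the standard library, and returns the proportion ratio defined in the caption (fraction of case DNVs in the annotation over fraction of control DNVs in it) rather than the cross-product odds ratio (a·d)/(b·c). The function names are hypothetical; the threshold arithmetic follows the caption.

```python
import math
from typing import Tuple

# Threshold from the caption: 0.05 / (47 effective annotations x 7 gene sets).
BONFERRONI_THRESHOLD = 0.05 / (47 * 7)  # about 1.52e-4


def _log_table_prob(a: int, row1: int, col1: int, n: int) -> float:
    """Log hypergeometric probability of a 2x2 table with top-left cell a and fixed margins."""
    def log_comb(n_: int, k: int) -> float:
        return math.lgamma(n_ + 1) - math.lgamma(k + 1) - math.lgamma(n_ - k + 1)
    return log_comb(row1, a) + log_comb(n - row1, col1 - a) - log_comb(n, col1)


def fisher_burden(cases_in: int, cases_out: int,
                  controls_in: int, controls_out: int) -> Tuple[float, float]:
    """Two-sided Fisher's exact test for DNVs inside vs outside one annotation.

    Returns (ratio, p); ratio is the case fraction in the annotation divided by the
    control fraction, as in the caption. Assumes both groups have at least one DNV.
    """
    a, b, c, d = cases_in, cases_out, controls_in, controls_out
    row1, col1, n = a + b, a + c, a + b + c + d
    log_obs = _log_table_prob(a, row1, col1, n)
    p = 0.0
    for x in range(max(0, col1 - (n - row1)), min(row1, col1) + 1):
        log_px = _log_table_prob(x, row1, col1, n)
        if log_px <= log_obs + 1e-7:  # tables no more probable than the observed one
            p += math.exp(log_px)
    control_fraction = c / (c + d)
    ratio = (a / row1) / control_fraction if control_fraction > 0 else math.inf
    return ratio, min(1.0, p)


if __name__ == "__main__":
    ratio, p = fisher_burden(8, 2, 1, 5)  # toy counts, not study data
    print(ratio, p, p < BONFERRONI_THRESHOLD)
```

Running `fisher_burden` once per annotation and gene set, and flagging results with `p < BONFERRONI_THRESHOLD`, mirrors the structure of the figure; the small tolerance on the log scale keeps floating-point ties with the observed table in the two-sided sum.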

Extended Data Fig. 4 HeartENN performance was comparable to DeepSEA.

HeartENN mean ROC AUC = 0.93 and mean AUPRC = 0.34. ROC AUC, area under the receiver operating characteristic curve; AUPRC, area under the precision–recall curve.
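The two reported metrics can be sketched in Python under stated assumptions: each HeartENN output feature is assumed to be evaluated separately on held-out labeled sequences (the evaluation design is not described in this caption), and the means are taken across features. Average precision is used here as one common AUPRC estimator; trapezoidal integration of the precision–recall curve gives slightly different values. Function names are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC: probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUPRC estimated as average precision: mean precision at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


def mean_over_features(
        per_feature: Sequence[Tuple[Sequence[int], Sequence[float]]]) -> Tuple[float, float]:
    """Average ROC AUC and AUPRC over features, each given as (labels, scores)."""
    aucs = [roc_auc(y, s) for y, s in per_feature]
    aps = [average_precision(y, s) for y, s in per_feature]
    return mean(aucs), mean(aps)
```

The pairwise AUC loop is quadratic in the number of examples, which is fine for illustration; a rank-based formula scales better. Both functions assume each feature has at least one positive and one negative example.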

Extended Data Fig. 5 Determining an absolute functional difference score range.

a, Comparison of DeepSEA absolute functional difference scores for HGMD disease mutations (blue, n = 1,564) and polymorphisms (gray, n = 642) at varying functional cut-offs shows similar distributions and a functionally impactful range of ≥0.1 (arrow) for disease mutations. No statistical significance testing was performed. b, The similarity of the null distributions of HGMD polymorphism scores from DeepSEA (gray, downsampled to 184 features) and HeartENN (heart) suggested that the DeepSEA functional score range was also applicable to HeartENN (gray and red, n = 642 each). Scores of 0 are plotted at the far left (as 10⁻⁴).
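The cut-off comparison in panel a can be sketched as follows. For illustration only, this assumes a variant's absolute functional difference score is the largest absolute change in predicted probability between the alternate and reference alleles across chromatin features; the exact aggregation is not given in this caption, so `functional_difference` is hypothetical. The toy scores are not HGMD data.

```python
from typing import Dict, Sequence


def functional_difference(ref_probs: Sequence[float], alt_probs: Sequence[float]) -> float:
    """Hypothetical variant score: largest |P(alt) - P(ref)| across chromatin features."""
    return max(abs(alt - ref) for ref, alt in zip(ref_probs, alt_probs))


def fraction_at_or_above(scores: Sequence[float], cutoffs: Sequence[float]) -> Dict[float, float]:
    """Fraction of variants whose score is >= each cut-off."""
    return {c: sum(s >= c for s in scores) / len(scores) for c in cutoffs}


def log_axis_value(score: float, floor: float = 1e-4) -> float:
    """Zero cannot be drawn on a log axis, so zero scores are placed at the floor (10^-4)."""
    return score if score > 0 else floor


if __name__ == "__main__":
    disease = [0.0, 0.02, 0.15, 0.4]       # toy scores
    polymorphism = [0.0, 0.01, 0.03, 0.12]  # toy scores
    cutoffs = [0.01, 0.05, 0.1]
    print(fraction_at_or_above(disease, cutoffs))
    print(fraction_at_or_above(polymorphism, cutoffs))
```

`log_axis_value` shows one way to implement the caption's convention of drawing zero scores at 10⁻⁴.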

Extended Data Fig. 6 Support for HeartENN ≥ 0.1 functional ranking.

For all DNVs (n = 170,171), the overlap between HeartENN ≥0.1 (n = 6,415) and other noncoding scores was assessed with a two-sided Fisher’s exact test (left panel). Case–control burden for these other noncoding scores (right panel), assessed with a two-sided Fisher’s exact test (cases n = 56,164 and controls n = 114,065 DNVs), was statistically significant for CADD ≥15 (Bonferroni-corrected P = 0.019). For both panels, unadjusted P values are tabulated, and red indicates a Benjamini–Hochberg false discovery rate (FDR) < 0.05.
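The two multiple-testing corrections named in this caption can be sketched with the standard library. This is a generic illustration of the Bonferroni and Benjamini–Hochberg procedures, not the authors' code; the P values below are toy values.

```python
from typing import List, Sequence


def bonferroni(pvalues: Sequence[float]) -> List[float]:
    """Multiply each P value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values (FDR), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest P first
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    pvals = [0.01, 0.04, 0.03, 0.2]  # toy values
    fdr = benjamini_hochberg(pvals)
    print([q < 0.05 for q in fdr])  # [True, False, False, False]
```

The running minimum enforces that adjusted values never decrease with rank, which is why 0.03 and 0.04 share the same adjusted value in the example.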

Extended Data Fig. 7 Relationship between the length of sequence inserted into the pMPRA1 plasmid and transcript reads per plasmid copy in MPRAs.

The length of the sequences inserted into the pMPRA1 plasmid (x-axis) ranged from 300 to 1,600 bp. After transfection of four libraries (color coded as per key) into induced pluripotent stem cell-derived cardiomyocytes (iPSC–CMs), the resulting ratios of transcript reads (mRNA) to plasmid copies (DNA) are plotted on the y-axis, showing no systematic relationship between insert length and transcriptional level.
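A quick check for the length dependence described here can be sketched in Python. Two assumptions are made for illustration: RNA and DNA counts are each scaled by their library totals before taking the ratio, and Pearson correlation is used as the measure of a linear trend; neither choice is stated in the caption, and the toy counts are not MPRA data.

```python
import math
from statistics import mean
from typing import Sequence


def rna_dna_ratio(rna_reads: int, dna_reads: int,
                  rna_library_total: int, dna_library_total: int) -> float:
    """Transcript reads per plasmid copy, with each count scaled by its library size (assumed)."""
    return (rna_reads * dna_library_total) / (dna_reads * rna_library_total)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    insert_bp = [300, 650, 900, 1200, 1600]  # toy insert lengths
    counts = [(120, 100), (90, 110), (130, 95), (100, 105), (115, 120)]  # toy (RNA, DNA)
    ratios = [rna_dna_ratio(r, d, 10**6, 10**6) for r, d in counts]
    # A value near 0 would be consistent with no linear length effect.
    print(round(pearson_r(insert_bp, ratios), 3))
```

A rank correlation would be a reasonable alternative if the ratios are skewed.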

Extended Data Fig. 8 DNVs with a trend towards decreased expression by MPRA assay.

Box plots for two DNVs for which two MPRA replicates differed significantly but statistical significance across all replicates was not attained. Box plots show the median fold change (FC), first and third quartiles (lower and upper hinges), and range of values (whiskers and outlying points). Statistical significance was assessed with two-sided t-tests and Benjamini–Hochberg-adjusted P values. Each box plot represents at least 3 independent experiments with 4 technical replicates each.
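The per-DNV comparison in this caption can be sketched as a two-sided t-test on hypothetical per-replicate RNA/DNA ratios for the alternate (DNV) and reference alleles. The caption does not say whether Welch's or Student's form was used, or whether ratios were log-transformed, so Welch's unequal-variance test on raw ratios is an assumption. The t-distribution P value comes from the regularized incomplete beta function so the code needs only the standard library; in practice `scipy.stats.ttest_ind(alt, ref, equal_var=False)` performs the same test. The resulting P values would then be Benjamini–Hochberg adjusted.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def _betacf(a: float, b: float, x: float) -> float:
    """Continued fraction for the regularized incomplete beta (modified Lentz method)."""
    tiny, eps = 1e-300, 1e-14
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 300):
        for aa in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),           # even step
                   -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):  # odd step
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def regularized_beta(x: float, a: float, b: float) -> float:
    """I_x(a, b), the regularized incomplete beta function."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def welch_t_test(alt: Sequence[float], ref: Sequence[float]) -> Tuple[float, float, float]:
    """Two-sided Welch t-test; returns (t statistic, degrees of freedom, P value)."""
    va, vr = variance(alt) / len(alt), variance(ref) / len(ref)
    t = (mean(alt) - mean(ref)) / math.sqrt(va + vr)
    df = (va + vr) ** 2 / (va ** 2 / (len(alt) - 1) + vr ** 2 / (len(ref) - 1))
    p = regularized_beta(df / (df + t * t), df / 2.0, 0.5)  # P(|T| >= |t|)
    return t, df, p


def fold_change(alt: Sequence[float], ref: Sequence[float]) -> float:
    """Ratio of mean ALT to mean REF transcriptional activity (RNA/DNA ratios)."""
    return mean(alt) / mean(ref)


if __name__ == "__main__":
    alt = [0.62, 0.70, 0.58, 0.66]  # toy RNA/DNA ratios, ALT allele
    ref = [1.00, 0.95, 1.08, 1.02]  # toy RNA/DNA ratios, REF allele
    t, df, p = welch_t_test(alt, ref)
    print(f"FC = {fold_change(alt, ref):.2f}, t = {t:.2f}, df = {df:.1f}, P = {p:.2g}")
```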

Extended Data Fig. 9 Fraction of DNVs in each of the canonical variant classes.

The fraction was calculated separately within CHD and unaffected subjects for each of the three methods (including overlaps); the total number of variants in each group is shown in the right table.

Extended Data Fig. 10 DNV enrichment in phenotype subgroups.

a, Enrichment of DNVs with predicted functional impacts (score ≥0.1) for HeartENN (left) and DeepSEA (right) within phenotype subgroups. b, Enrichment of de novo SNVs with H3K36me3 marks implicated in RNA-binding protein disruption in different subgroups for the most significant (left) and highest effect size (right) hits. In both a and b, a two-sided Fisher’s exact test (unadjusted P values and 95% CIs shown) compared the fraction of DNVs in each subgroup (HeartENN ≥ 0.1, DeepSEA ≥ 0.1, etc.) with that in the same control cohort. For HeartENN, there were n = 4,177 control DNVs with HeartENN ≥ 0.1 and n = 109,888 control DNVs with HeartENN < 0.1. NDD, neurodevelopmental disorder; ECA, extracardiac anomaly.
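A subgroup-versus-control odds ratio with a 95% CI can be sketched as below. Note the swapped-in technique: this uses the Woolf (log-scale normal) approximation for the interval, whereas Fisher's exact test software such as R's `fisher.test` reports a conditional maximum-likelihood estimate with an exact interval, so values from this sketch need not match the figure. The control counts in the usage example come from the caption; the subgroup counts are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Tuple


def odds_ratio_woolf_ci(a: int, b: int, c: int, d: int,
                        level: float = 0.95) -> Tuple[float, float, float]:
    """Sample odds ratio (a*d)/(b*c) with a Woolf (log-scale normal) confidence interval.

    a, b: subgroup DNVs with / without the feature (e.g. HeartENN >= 0.1);
    c, d: control DNVs with / without the feature. All cells must be > 0.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


if __name__ == "__main__":
    controls_hi, controls_lo = 4177, 109888  # control counts from the caption
    subgroup_hi, subgroup_lo = 300, 7000     # hypothetical subgroup counts
    print(odds_ratio_woolf_ci(subgroup_hi, subgroup_lo, controls_hi, controls_lo))
```

Because every subgroup is compared with the same control cohort, only the first two arguments change from one subgroup to the next.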

Supplementary information

Supplementary Information

Supplementary Note and Fig. 1

Reporting Summary

Supplementary Table

Supplementary Tables 1–16

About this article

Cite this article

Richter, F., Morton, S.U., Kim, S.W. et al. Genomic analyses implicate noncoding de novo variants in congenital heart disease. Nat Genet 52, 769–777 (2020). https://doi.org/10.1038/s41588-020-0652-z

  • DOI: https://doi.org/10.1038/s41588-020-0652-z
