Development of a clinical polygenic risk score assay and reporting workflow

Hao, Limin; Kraft, Peter; Berriz, Gabriel F.; Hynes, Elizabeth D.; Koch, Christopher; Korategere V Kumar, Prathik; Parpattedar, Shruti S.; Steeves, Marcie; Yu, Wanfeng; Antwi, Ashley A.; Brunette, Charles A.; Danowski, Morgan; Gala, Manish K.; Green, Robert C.; Jones, Natalie E.; Lewis, Anna C. F.; Lubitz, Steven A.; Natarajan, Pradeep; Vassy, Jason L.; Lebo, Matthew S.

doi:10.1038/s41591-022-01767-6

Download PDF

Article
Open access
Published: 18 April 2022

Development of a clinical polygenic risk score assay and reporting workflow

Nature Medicine volume 28, pages 1006–1013 (2022)Cite this article

29k Accesses
54 Citations
230 Altmetric
Metrics details

Subjects

Abstract

Implementation of polygenic risk scores (PRS) may improve disease prevention and management but poses several challenges: the construction of clinically valid assays, interpretation for individual patients, and the development of clinical workflows and resources to support their use in patient care. For the ongoing Veterans Affairs Genomic Medicine at Veterans Affairs (GenoVA) Study we developed a clinical genotype array-based assay for six published PRS. We used data from 36,423 Mass General Brigham Biobank participants and adjustment for population structure to replicate known PRS–disease associations and published PRS thresholds for a disease odds ratio (OR) of 2 (ranging from 1.75 (95% CI: 1.57–1.95) for type 2 diabetes to 2.38 (95% CI: 2.07–2.73) for breast cancer). After confirming the high performance and robustness of the pipeline for use as a clinical assay for individual patients, we analyzed the first 227 prospective samples from the GenoVA Study and found that the frequency of PRS corresponding to published OR > 2 ranged from 13/227 (5.7%) for colorectal cancer to 23/150 (15.3%) for prostate cancer. In addition to the PRS laboratory report, we developed physician- and patient-oriented informational materials to support decision-making about PRS results. Our work illustrates the generalizable development of a clinical PRS assay for multiple conditions and the technical, reporting and clinical workflow challenges for implementing PRS information in the clinic.

Improving reporting standards for polygenic scores in risk prediction studies

Article 10 March 2021

Hannah Wand, Samuel A. Lambert, … Genevieve L. Wojcik

Selection, optimization and validation of ten chronic disease polygenic risk scores for clinical implementation in diverse US populations

Article Open access 19 February 2024

Niall J. Lennon, Leah C. Kottyan, … Eimear E. Kenny

An apparent quandary: adoption of polygenics and gene panels for personalised breast cancer risk stratification

Article Open access 18 September 2023

Jerry S. Lanchbury & Holly J. Pederson

Main

Genome-wide association studies (GWAS) have identified thousands of genomic variants significantly associated with a range of common complex human diseases^1,2. Given that the risk conferred by an individual common variant is usually insignificantly small, investigators have aggregated risk alleles across the genome into genetic risk scores to provide a single measure of genetic association for a given trait due to known common variant effects. Although the earliest genetic scores consisted only of variants meeting genome-wide significance^3,4,5, recent computational and methodological advances have leveraged the summary statistics of all available variants from increasingly larger GWAS to calculate polygenic risk scores (PRS)^6,7,8,9. For some diseases, a PRS in the upper tail of the distribution approximates risks equivalent to those conferred by established clinical risk factors and by genetic variants associated with monogenic disease^7,10. Although PRS are typically derived from weights from cross-sectional GWAS of prevalent disease cases and controls, further work has demonstrated their potential to estimate the risk of incident disease^11,12,13,14.

Suitable clinical implementation of PRS is now an area of active research across many disease areas^15,16,17. The translation of PRS from discovery to the clinic can be conceptualized as having at least three necessary phases (Fig. 1): the first phase relates to epidemiology and statistical genetics, in which PRS are developed and validated in large cohorts and improved with advances in statistical methods; the second phase involves the laboratory, in which laboratory geneticists must develop an analytically and clinically valid pipeline for calculating, interpreting and reporting PRS results for an individual patient; and the third phase involves patient care, in which a treating physician makes medical decisions after putting a patient’s PRS results into the larger clinical context, which involves non-genetic risk factors, comorbidities and patient preferences. The first phase has seen significant methodological advances¹⁸ but challenges for the second and third phases remain.

**Fig. 1: Translation of PRS from discovery to the clinic, including a clinical PRS laboratory pipeline for prospectively collected samples.**

A key assumption underlying the laboratory phase is that a laboratory can develop and implement a valid clinical assay and interpretation pipeline to report PRS results for an individual patient. The development of a clinical assay from a published PRS is not trivial, and significant barriers to the process persist. First, uncertainty exists about whether commonly used, cost-effective genotyping arrays and clinical imputation pipelines can calculate a PRS for an individual with the analytic validity expected of a clinical assay, as opposed to one that is adequate for research. Second, laboratories must implement methods to account for the reduced validity of most PRS in patients of non-European and admixed ancestry^19,20. This limitation applies both to the calculation of the PRS itself for an individual patient and to its clinical interpretation, given that published effect sizes are from populations of primarily European ancestry¹⁹. Third, laboratories must make several decisions about the content and format of a clinical PRS report, including decisions about where the laboratory’s role as an interpretative service ends and where the role of the treating physician in patient care begins. In the patient care phase, there remain unanswered questions about the information and support that physicians need when contextualizing the PRS results of an individual patient to make clinical decisions, and how those decisions affect patient outcomes.

In the Genomic Medicine at Veterans Affairs (GenoVA) Study (ClinicalTrials.gov identifier: NCT04331535) we have developed processes to advance the laboratory and patient care phases of the clinical translation of PRS. The GenoVA Study is a clinical trial in which patients and their primary care physicians receive a clinical PRS laboratory report on five diseases commonly screened for and initially managed in primary care: coronary artery disease (CAD), type 2 diabetes mellitus (T2D), atrial fibrillation (AFib), colorectal cancer (CRCa), and either prostate cancer (PrCa) in male patients or breast cancer (BrCa) in female patients. Because the objectives of the GenoVA Study are to observe how PRS impact existing disease screening and diagnosis paradigms and enable increased detection of undiagnosed prevalent or newly incident disease, eligible patients have no known diagnoses of the target diseases and are aged 50–70 years, an age range during which much guideline-recommended screening and diagnosis of new disease occurs. Here, we describe the processes created in the GenoVA Study to develop and validate a genotype array-based clinical assay and report for six PRS and to support their effective translation into clinical care by the treating physicians.

Results

Replication of published PRS

Sample characteristics

To demonstrate the accuracy of a prospective PRS pipeline, we first wanted to ensure that we could implement published PRS effectively. We used data from 36,423 Mass General Brigham Biobank (MGBB) participants to replicate the performance of PRS for the six target diseases (Supplementary Table 1). The mean (s.d.) age of MGBB participants was 58.8 (17.1) years (range, 9–106 years), 19,719 (54.1%) were female, and 5,706 (15.7%) were of reported race other than white (white, n = 30,716 (84.3%); Black, n = 1,807 (5.0%); Asian, n = 786 (2.2%); and other or unknown race, n = 3,113 (8.5%) as determined from electronic health record data). Case counts ranged from 392 for CRCa to 3,554 for CAD. Figure 2a shows the counts of participants with one or multiple target diseases. The most common disease co-occurrences were the combinations of CAD and T2D (n = 641) and CAD and AFib (n = 495).

**Fig. 2: Frequency of disease and high-risk PRS results by race in the MGBB.**

Unadjusted and adjusted PRS distributions

We identified PRS from large GWAS for the six target diseases, for which the summary statistics (base files with alleles and weights) were publicly available from the Polygenic Score Catalog²¹ (AFib, CAD, T2D, BrCa) or the Cancer PRSWeb (CRCa, PrCa)²² as of 26 December 2019. Supplementary Table 2 lists the number of single-nucleotide polymorphisms (SNPs) in the base file for each of the six published PRS, ranging from 81 SNPs for CRCa²³ to 6,917,436 for T2D⁷, and the subsets of these available as directly genotyped or imputed data from each of three arrays used for MGBB participants, demonstrating minimal loss of information compared with the original published PRS. As shown in Fig. 3, when using the published weights to calculate standardized PRS (PRS_std-raw, see Methods) we observed marked variation in the distribution of each PRS by reported race in the MGBB, most notably in AFib, CAD, and T2D. For example, only 1.7% of white MGBB participants (516/30,716) but almost all of the Black MGBB participants (88.9%, 1,606/1,807) had PRS_std-raw above the published threshold associated with an odds ratio (OR) = 2 for T2D in the 2018 study by Khera et al.⁷ (Supplementary Tables 3 and 4). The use of residualized, population structure-adjusted, standardized PRS (PRS_std-adj, see Methods) minimized this variation (Fig. 3), such that, for example, 8.6% of white MGBB participants (2,651/30,716) and 4.2% of Black MGBB participants (75/1,807) had a T2D PRS_std-adj above the published OR > 2 threshold. The distributions of PRS_std-adj were well aligned when examined by genotyping batch, decile of age, and sex (Extended Data Figs. 1–3).

**Fig. 3: PRS distributions by reported race before and after adjustment for population structure.**

Replication of PRS–disease association

As shown in Fig. 4, quantile of PRS_std-adj was highly correlated with log(odds) of disease across the six phenotypes in the MGBB, with correlation coefficients ranging from 0.68 for CRCa to 0.95 for T2D. Extended Data Figs. 4–7 show the correlation of PRS_std-adj quantile and log(odds) of disease in the reported racial groups separately. Our analyses also replicated the published PRS thresholds corresponding to OR > 2. As shown in Table 1, at the published PRS_std-adj thresholds we observed OR ranging from 1.75 (95% CI: 1.57–1.95) for T2D to 2.38 (95% CI: 2.07–2.73) for BrCa in MGBB participants overall. Except for T2D, the 95% confidence interval of the replicated OR for all diseases either included or, in the case of BrCa and AFib, exceeded a point estimate of 2. Results were consistent in analyses restricted to white participants but were variable in other groups, largely because of the small number of disease cases in certain racial subgroups. In 22 of 24 analyses stratified by reported race, subjects with PRS_std-adj above the published OR > 2 thresholds had higher odds of disease than those below these thresholds. In the MGBB overall, the prevalence of a high-risk PRS_std-adj ranged from 5.4% for CRCa to 13.2% for PrCa (in men). Figure 2b illustrates the number of participants with PRS_std-adj above the published OR > 2 threshold for one or more of the target diseases. Of note, similar to the disease co-occurrences observed in MGBB participants, the most common co-occurrences of high-risk PRS_std-adj were the combinations of CAD and T2D (n = 333) and CAD and AFib (n = 211).

**Fig. 4: Correlation between adjusted PRS and odds of disease.**

Table 1 Prevalence and disease associations of high-risk PRS for six diseases in MGBB overall and by reported race

Full size table

Prospective PRS assay

Sensitivity and specificity of array and imputation

The replication results above supported the development of a genotype array-based clinical assay for PRS and secondary findings from the American College of Medical Genetics and Genomics v2.0 list (ACMG SF v2.0)²⁴. To determine the performance of the arrays used in the prospective assay and of the imputation pipeline, we used three reference Genome In A Bottle (GIAB) samples (NA12878, NA24385 and NA24631, Supplementary Table 5)²⁵. Sensitivity and positive predictive value (PPV) for single-nucleotide variants (SNV) were > 99.7% on average, with lower performance for indels (sensitivity, 96.3%; PPV, 97.8%). Of note, although sensitivity in the ACMG SF v2.0 regions was high (96.2%), PPV was low (63.6%) due to these regions having an excess of poorly performing rare variants^26,27.

As expected, sensitivity and PPV decreased for imputed data, especially for indels (SNV sensitivity, 98.0%; SNV PPV, 97.5%; indel sensitivity, 92.8%; indel PPV, 90.7%) (Supplementary Table 5). NA12878 was not evaluated for imputation accuracy because it is present in the imputation reference dataset from the 1000 Genomes Project and has artificially high imputation accuracy. To further evaluate imputation accuracy, we compared genome sequencing data to array data for 22 diverse samples. Analytical performance was lower in this dataset than in the GIAB high-confidence data (~3% reduction in performance for sensitivity and PPV, Supplementary Table 6).

Performance of prospective PRS assay

For the GIAB samples, PRS_std-adj was robust across different array versions and consistent with results from whole genome sequence (WGS) data; all three GIAB samples were below the high-risk threshold (OR > 2) for all diseases in all methods (Supplementary Table 7). In evaluating the 22 samples with WGS and prospective array data, PRS_std-adj scores were similarly concordant, particularly for AFib, CAD and T2D (Extended Data Fig. 8). Additionally, 108/110 high-risk status classifications were concordant in this dataset (98.2% agreement; Matthews correlation coefficient, 0.84; P < 0.001), with the two discordant values (one in CAD and one in CRCa) being very close to the high-risk threshold (Supplementary Table 8). Finally, we compared nine individuals with high-risk PRS for 10 diseases identified in the MGBB genotyping data to their PRS risk status using the prospective assay (one individual at high risk for AFib, one individual at high risk for BrCa, three individuals at high risk for CAD, three individuals at high risk for CRCa, one individual at high risk for PrCa and one individual at high risk for T2D). All PRS categories were consistent across the two different arrays used for MGBB genotyping and for the clinical assay (Supplementary Table 9).

Clinical PRS report

We then developed a PRS laboratory report consistent in format and content with other clinical genetic test reports (Supplementary File 1)^28,29,30. That is, it includes a description of the test performed and a prominently displayed summary of important findings and their interpretations. Subsequent sections of the report give more detail about the results, including, for each disease, general population prevalence and a brief summary of the GWAS from which the PRS was derived. Sections on methodology and literature references are at the end of the report. The report also reflects several choices made during its development. A graphic highlights in red the disease(s) for which the patient has increased polygenic disease risk, as defined by a PRS corresponding to a published OR > 2 for disease, mirroring both a common threshold from Mendelian genetics³¹ and the effect sizes for disease risk factors already considered in current clinical care^{32,33,34,35,36}. Any PRS not categorized as high risk is described as conferring average risk. Monogenic disease variants and PRS results are reported separately, without comment on any possible interaction between a monogenic result and a relevant PRS (for example, an average-risk BrCa PRS and a pathogenic variant in BRCA1 associated with hereditary breast and ovarian cancer). Illustrating the boundary where the role of the clinical laboratory ends (phase 2 of Fig. 1) and the role of the treating physician begins (phase 3), the laboratory report does not include information about absolute disease risk or the role of other, non-genetic factors in disease risk, and it is not directive in its recommendations for clinical management of high-risk results.

Clinical processes and supportive materials

In recognition that physicians and patients require additional guidance in contextualizing high-risk PRS results, the GenoVA Study has developed processes and materials to support the clinical use of PRS. A genetic counselor contacts each patient with a high-risk PRS result or monogenic disease variant to discuss the result’s health significance and offer guidance for a conversation to have with their physician. All patients and their primary care physicians receive a copy of the laboratory report, and each patient with at least one high-risk PRS result is additionally given patient-oriented educational materials about the relevant disease(s) (Supplementary File 2). The patient’s primary care physician also receives a copy of physician-oriented educational materials to support their decision-making about PRS (Supplementary File 3). Given the current state of the evidence, the physician materials note that professional guidelines do not recommend specific changes to general screening or prevention recommendations based on PRS results, but these materials are updated over the course of the study as evidence accrues to support distinct recommendations.

Results from the first 227 prospective samples

As of 21 October 2021, 227 GenoVA trial participants have been assayed using the prospective PRS pipeline from two primary sample types (130 blood, 97 saliva). Of these, 108 participants (48%) self-report as white race and non-Hispanic/Latinx ethnicity, and 78 (34%) identify as women. In this preliminary sample of trial enrollees, the proportions of participants whose PRS are above the study threshold for high risk are consistent with those observed in the MGBB, ranging from 5.7% for CRCa to 15.3% for PrCa (Table 2). Two actionable ACMG SF v2.0 variants have been identified and confirmed in the first 227 enrollees (BRCA1:NM_007294 c.2748delT (p.Asn916LysfsX84), likely pathogenic; BRCA2:NM_000059 c.3545_3546delTT (p.Phe1182X), pathogenic). The reporting of these results to trial participants and their physicians is underway. The study will determine whether PRS implementation affects clinical management and enables the detection of undiagnosed prevalent cases and incident cases during the observation period.

Table 2 Summary of PRS results from the first six batches of clinical samples in the GenoVA Study

Full size table

Discussion

Bridging two significant gaps between PRS development and clinical implementation, we developed a clinical genotyping array-based assay for six PRS and a process to report the results to patients and primary care physicians. The PRS were robust across multiple genotyping arrays and imputation pipelines. The distributions of unadjusted PRS varied by reported race in a large biobank, impeding clinical validation, but adjustment for population structure enabled the replication of published PRS–disease associations. These results supported the development of a population structure-adjusted pipeline for PRS calculation and reporting for individual patients, now implemented in a clinical trial of PRS testing along with patient and physician educational materials and genetic counseling support.

The development and implementation of our PRS assay and report illustrate key choices that laboratories must make in what we term phase 2 of the PRS implementation pathway. First, for each target disease, we had to choose the specific PRS to implement among multiple publicly available options (that is, PRS developed and validated by others in phase 1)^21,22. Considerations include the performance of the PRS in both the published discovery and replication cohorts in addition to the population that the laboratory is interested in targeting. Guidelines are emerging on what defines high-quality PRS reporting³⁷, and this improved transparency should help laboratories to select appropriate PRS from the many available. Second, we chose to use a genotype array-based approach instead of genome sequencing. Like genotyping, low-coverage genome sequencing technology is also relatively low cost³⁸. We chose the Illumina GDA because its widespread use in the All of Us Research Program³⁹, eMERGE Consortium¹⁵ and other projects optimizes the likelihood that it will be a well-supported genotyping platform for future improvements, and enhances the generalizability of our methods to other institutions looking to implement clinical PRS testing. Third, although published methods can adjust for population structure in large cohorts of people^40,41, these methods are not immediately applicable for correcting a PRS for a prospectively genotyped individual patient, whose sample is at best part of a small clinically analyzed batch with insufficient data for robust population structure adjustment. Correction thus requires additional decisions about how to adjust for population structure and which reference to use. We chose to impute data against 1000 Genomes Project phase 3 data and to project each new individual patient sample onto the principal components from the MGBB. Other laboratories may choose to impute against the larger TOPMed (Trans-Omics for Precision Medicine) population⁴², although issues of genome build discrepancy and regulatory prohibition against sending patient data to external research servers are limitations. Fourth, once a platform is selected, a clinical laboratory must determine the benchmarks that define an analytically valid PRS assay. We chose to verify the PRS performance in our laboratory to determine the appropriate parameters for our assay; calculate the analytical performance of the genotyping array and imputation pipeline using both well-characterized reference samples and individual level genome data; and calculate the robustness and performance of the PRS using genome data and multiple array platforms from both reference and individual samples. This multi-step approach helped ensure the accuracy of the data going into the PRS as well as the final performance of the PRS itself.

We also made numerous choices in how to report PRS results and interpretations to patients and physicians. We decided to report a dichotomous PRS interpretation (that is, high risk versus average risk) instead of a continuous result (for example, percentile rank, relative risk or absolute risk). We have previously described the trade-offs of these approaches, including the need for actionability thresholds; transparency about the limitations of PRS, particularly in underrepresented populations; and the absence of validated predictions models that incorporate both PRS and other clinical risk factors⁴³. For the GenoVA Study we favored a dichotomous result to indicate a possible clinical action threshold to the treating physician. We chose OR > 2 to define high polygenic risk, consistent with effect sizes of traditional risk factors considered for the target diseases^{32,33,34,35,36}. Another laboratory may use the methods we describe to produce measures of continuous risk or of categorical risk at different thresholds thought to be clinically meaningful, which will probably vary among the diseases for which they choose to implement PRS. Estimating absolute disease risk (for example, with the BOADICEA model for breast cancer⁴⁴ or the Pooled Cohort Equations for atherosclerotic cardiovascular disease⁴⁵) may be considered the gold standard for risk stratification, but validated absolute risk models are not available for most diseases and require patient information (for example, mammographic breast density and blood pressure) that is often unavailable to the interpreting laboratory. Drawing on other examples from primary care, we chose not to include directive clinical recommendations on the PRS laboratory report itself, instead assigning such activities to phase 3 of the PRS implementation pathway, supported by informational materials and genetic counseling support. We note that, for example, although a laboratory reports the results of a patient’s low-density lipoprotein cholesterol and reference range for the assay, it is the treating physician who contextualizes that result with the patient’s other characteristics to decide whether to offer cholesterol-lowering therapy.

The question of how to support physician management of PRS results without under- or overselling the potential benefits of PRS is controversial, given the lack of prospective data showing that the clinical use of PRS improves patient outcomes. In this early era of PRS implementation, the most prudent course of action is probably to develop educational and consultant resources, such as those used in the GenoVA Study, to present transparently the evidence for and limitations of PRS interpretations without being overly prescriptive in their recommendations. Given the participant age range and choice of diseases in the GenoVA Study, we anticipate that most physician actions will fall within already clinically acceptable practices (for example, more frequent hemoglobin A1c screening for T2D or favoring colonoscopy screening over fecal immunochemical testing for CRCa screening). Stronger evidence of benefit will be needed to justify actions that deviate more significantly from accepted practice, such as screening starting at much younger ages or requiring more invasive or expensive procedures. As they do in all areas of medicine, physicians will need to use available evidence and clinical judgment to make the best decisions with their patients. The GenoVA Study is collecting data on what physicians do with PRS results and their preferences for how they can be supported in this decision-making.

Although other laboratories are developing PRS assays in both clinical and research settings and have reported the aggregate performance of these PRS in a population, including biobanks or customers of direct-to-consumer companies^{15,38,46,47,48}, none has described the development and validation of a clinical, population structure-adjusted assay for prospectively tested individuals. Although the eMERGE consortium and other studies are actively developing trans-ancestry PRS for a number of common diseases^15,49, we report, here, a single clinical assay for population structure-adjusted PRS for multiple diseases. And while other laboratories may make different decisions about the number of disease PRS they choose to implement, whether and how to compare the performance of multiple available PRS for each disease, and the format of the clinical PRS report, our work provides a framework for how a laboratory can clinically validate and implement a prospective PRS suitable for an individual patient.

Much has been written about the reduced validity of most PRS in populations of non-European ancestry, due to their use of non-causal loci and effect sizes from GWAS in predominantly European discovery cohorts^19,20,50,51. As we await larger datasets from more diverse populations and the methodological advances that will improve the performance of trans-ancestry PRS^10,15,49, a clinical laboratory looking to develop a PRS assay for a given disease has the following options: (1) postpone implementation, as done by some commercial laboratories;^52,53 (2) implement separate ancestry-specific published PRS only in those ancestral groups from which they were derived and validated; or (3) implement a single PRS that aims for applicability across ancestry groups and report transparently any applicable limitations in the underlying evidence and its interpretation for specific individuals or ancestral groups. Because the second option requires the assignment of an individual patient to a specific ancestry group, either before or during PRS analysis, and, problematically, risks the inequitable provision of PRS to some populations but not to others, we chose the third option for the GenoVA Study and implemented a single method of adjustment for population structure. After doing so, we observed that the chosen PRS threshold corresponding to OR > 2 generally identified subjects at higher risk of disease across reported race in the MGBB replication cohort. The magnitude and precision of this effect did vary by reported race, probably due to two factors: small numbers of MGBB cases for certain diseases in certain racial groups; and real differences in the ability of these PRS to correlate with disease risk in non-European ancestry groups, as has been observed even in well-developed trans-ancestry PRS^10,54. Methodological advances that leverage local ancestry or GWAS summary statistics from multiple diverse populations will improve the performance of PRS across ancestry groups^55,56. In the meantime, we have developed a clinically validated PRS assay, the application of which in diverse ancestry groups is defensible but the results of which, nonetheless, have limitations. These limitations are clearly presented in a clinical laboratory report (phase 2), which can then be contextualized by the physician for each individual (phase 3). Applying population-level data to individual patient care represents both the science and art of medical practice, particularly when the individual patient is not well represented in the available data^57,58.

In conclusion, data from increasingly larger and more diverse populations, coupled with computational advances, are propelling PRS into consideration for clinical implementation. We have shown that laboratory assay development and PRS reporting to patients and physicians are feasible (but non-trivial) next phases in PRS implementation. As the performance of PRS continues to improve, particularly for individuals of underrepresented ancestry groups, the implementation processes we describe can serve as generalizable models for laboratories and health systems looking to realize the potential of PRS for improved patient health.

Methods

Selection of PRS for implementation

We identified large GWAS for the six target diseases for which the summary statistics (base files with alleles and weights) were freely available from the Polygenic Score (PGS) Catalog²¹ (AFib, CAD, T2D, BrCa) or the Cancer PRSWeb (CRCa, PrCa)²² as of 26 December 2019. For the three cardiometabolic diseases (AFib, CAD and T2D) we chose the PRS derived from the UK Biobank in Khera et al. 2018 (ref. ⁷): for AFib, the PGS Catalog Publication (PGP) ID is PGP000006 and the PGS ID is PGS000016; for CAD, the PGP ID is PGP000006 and the PGS ID is PGS000013; and for T2D the PGP ID is PGP000006 and the PGS ID is PGS000014. For the three cancers we chose PRS derived from the largest published GWAS at the time: for BrCa we used Michailidou et al. 2017 and Mavaddat et al. 2019 (PGS ID = PGS000007, PGP ID = PGP000002) (ref. ^59,60); for CRCa we used Huyghe et al. 2019 (PRSWEB_PHECODE153_CRC-Huyghe_PT_MGI_20191112, PRS tuning parameter: 3.98107170553497e-07) (ref.²³); and for PrCa we used Schumacher et al. 2018 (PRSWEB_PHECODE185_Pca- PRACTICAL_LASSOSUM_MGI_20191112, PRS tuning parameter: s0.5_Lambda0.00695192796177561) (ref. ⁶¹).

Replication of published PRS

Population and sample

Given that the GenoVA Study is enrolling participants from eastern Massachusetts, USA, we used data from the Mass General Brigham (formerly Partners Healthcare) Biobank (MGBB)⁶² to evaluate the performance of the selected PRS in a similar population and workflow for our study and assay. MGBB participants were not included in the published derivation and validation studies for the PRS used. In brief, MGBB was launched in 2010 with the initial goal of collecting DNA, plasma, and serum samples from 75,000 patients from Brigham and Women’s Hospital, Massachusetts General Hospital, and other MGBB-affiliated healthcare facilities, and obtaining patient consent for the linkage between biospecimen data, medical record data and survey data. We use the terms ‘race’ and ‘ethnicity’ to refer to social constructs often used in healthcare operations and biomedical research to evaluate and address disparities between populations. Racial categories of participants in the MGBB (for example, white or Asian) are derived from electronic health record (EHR) data. For the present analysis we collapsed reported race in MGBB into four categories: Asian, Black, white, and other/unknown. Race and ethnicity of GenoVA Study participants were collected through EHR data and self-report and categorized using the five racial categories (American Indian or Alaska Native, Asian, Black or African American, Native Hawaiian or Other Pacific Islander, and white) and two ethnic categories (Hispanic/Latinx and Not Hispanic/Latinx) required by US federal data collection standards. We use the term ‘ancestry’ to describe the genetic construct describing inheritance of variants from global ancestral populations.

Disease phenotyping

We used validated computed phenotypes from MGBB to define case and control status for each of the six diseases (Supplementary Table 1). Validated MGB phenotypes are available for CAD (PPV = 95%), AFib (PPV = 94%), T2D (PPV = 95%), and colorectal (PPV = 100%), breast (PPV = 95%) and prostate cancer (PPV = 100%)^63,64,65. For each disease, ‘caseness’ was defined as prevalent disease on 16 December 2019. For subgroup analyses, participant age was determined on 16 December 2019 or at death, if earlier. Only women and men were assigned case or control status for breast and prostate cancer, respectively.

Genotyping and imputation

We used genotype data from the 36,423 MGBB participants with available genotyping data as of 16 December 2019. Genotyping was performed using standard processing described previously on one of three Illumina Infinium genotyping arrays: (1) a pre-release version developed by the Multi-Ethnic Genotyping Array Consortium (Multi-Ethnic Genotyping Array (MEGA), n = 4,924); (2) an expanded version of this pre-commercial array (Expanded Multi-Ethnic Genotyping Array (MEGAEX), n = 5,345); and (3) the final commercial version (Multi-Ethnic Global (MEG), n = 26,157). The MEGA, MEGAEX and MEG arrays consisted of 1.39, 1.74 and 1.78 million probes, respectively⁶⁶. For MEGA and MEGAEX data, only probes found in the commercial version of the array (MEG) were used in the present analysis. Quality control for the genotyping requires samples to have at least a 99% call rate and concordant sex between the EHR and what is computed from the array data. We used existing MGBB imputed data generated by batching sets of ~5,000 participants and imputing against the 1000 Genomes Project phase 3 data using the Michigan Imputation Server⁶⁷ (https://imputationserver.sph.umich.edu/index.html#!), with ShapeIT (v2.r790) (ref. ⁶⁸) used for phasing and Minimac3 used for imputation with default settings. Sets of imputed variants were compared with the base files for each PRS to ensure sufficient representation of probes (Supplementary Table 2) (ref. ⁶⁷).

Calculation of PRS and adjustment for population structure

Unadjusted raw PRS (PRS_raw) for each disease were calculated using PLINK (v.2.0a) by taking the product of the count of risk alleles and the risk allele weight at each locus in the PRS and then summing across available risk loci. The loci included in each PRS, the risk alleles and the corresponding weights were downloaded from the PGS Catalog or Cancer PRSWeb. A population structure-adjusted PRS was calculated for each disease, using a previously described approach⁴⁰ implementing principal components analysis to compute adjusted residualized PRS for each disease. Principal components were calculated using all genotyped MGBB participants and a set of 16,385 of 16,443 previously reported ancestry-informative SNPs⁶⁹. For each disease we then fit a linear model for PRS_raw as a function of the first four principal components in controls for that disease (PRS_raw ~ PC1 + PC2 + PC3 + PC4) in R (v.4.0.3). We then applied this model to calculate a predicted PRS (PRS_pred) for each disease in all cases and controls. Residualized, population structure-adjusted PRS (PRS_adj) were then computed for each individual for each disease as the difference between the raw and the predicted PRS (PRS_raw − PRS_pred). For PRS_raw, values were standardized (PRS_std-raw) using the mean and standard deviation in the MGBB of the PRS_raw values (Supplementary Table 3). Similarly, PRS_std-adj was computed using the mean and standard deviation in the MGBB of the PRS_adj values (Supplementary Table 3). The distributions of PRS_std-raw and PRS_std-adj by genotype array, sex, age deciles and reported race were compared among all subjects using the density function in R (v.4.0.3).

PRS–disease association

The association of PRS_std-adj with the odds of disease was replicated in MGBB participants using the six disease phenotypes described above. For each PRS and disease, odds of disease (n_cases/n_controls) were calculated for each of 50 PRS quantiles. For race-stratified analyses, PRS deciles were used if too few cases were available for analysis across 50 quantiles. To visualize the PRS–disease associations, we plotted the log(odds) of disease against the mean PRS_std-adj in each quantile. Correlation was measured with Pearson correlation coefficients using RStudio (v.1.1.383) with R (v.4.0.3).

PRS threshold for high risk

We set a predicted polygenic OR > 2 to identify individuals at high polygenic risk for each disease, mirroring both a common threshold from Mendelian genetics³¹ and the effect sizes for disease risk factors already considered in current clinical care^{32,33,34,35,36}. To operationalize this OR > 2 threshold, we compared standardized PRS Z scores for each individual to a disease-specific cut off 𝛕, based on previously published estimates of the change in odds of disease per standard deviation change in the PRS (Supplementary Table 3). Specifically, 𝛕 = ln(2)/ln(OR_s.d.), where 2 is the target OR threshold defining high risk and OR_s.d. is the estimated multiplicative change in odds per standard deviation change in the PRS. Assuming that the published OR_s.d. accurately captures the relationship between PRS and disease, the odds of disease for individuals with standardized PRS Z score = 𝛕 are twofold that of individuals with a median PRS Z score. These standardized PRS thresholds were used to assign individual patients to risk categories as described below (PRS calculation for clinical assay for individual samples).

Clinical PRS assay for individual samples

Based on the results of the above methods, we developed and validated a genotype array-based clinical assay for PRS, in addition to secondary findings from the ACMG v2.0 list (ACMG SF v2.0, Fig. 1)²⁴. We include additional variants identified by the ACMG or other organizations as important secondary findings as updated recommendations accrue⁷⁰.

Validation samples

Replicates of each of three reference samples from GIAB²⁵ maintained by the National Institute of Standards and Technology were included in the validation assay: NA12878 × 9, NA24631 × 6 and NA24385 × 6. Analytical performance (sensitivity and PPV for presence or absence of variant sites) was determined in the benchmarking regions (v3.3.2). In addition, we included 22 samples with polymerase chain reaction-free genome sequencing data (described below) and 9 samples with high-risk PRS for one of the six diseases as determined by the MGBB data, including one individual with high-risk PRS for two diseases. To test the sensitivity of the secondary finding analysis, we genotyped 20 samples with previously identified pathogenic or likely pathogenic variants in the ACMG SF v2.0 list.

Genotyping and imputation

Validation samples were genotyped according to manufacturer-standard workflows on either a pre-commercial release of the Illumina Global Diversity Array (GDA-PC) or the final commercial release of the Global Diversity Array (GDA). The Illumina-specific files containing called genotypes in AA/AB/BB format (GTC files) generated by genotype array were converted to variant call format (VCF) using a modified version of the gtc2vcf script from Illumina. All samples required an overall call rate of greater than 98.5%. Imputation was performed using updated software, with EAGLE v2.4.1 (ref. ⁷¹) for phasing and Minimac4 (ref. ⁶⁷) for imputation using the 1000 Genomes Project phase 3 dataset. Importantly, monomorphic sites were not removed during the imputation process due to the small batch sizes used in the prospective assay.

PRS calculation for clinical assay for individual samples

PRS_raw was calculated for each sample as described above. To determine PRS_adj, unadjusted PRS (PRS_raw) were first calculated for each individual sample as described for the overall MGBB cohort. For each individual, the eigenvariable, eigenvalue and frequency output from the MGBB principal components analysis were used to project each new individual sample onto the MGBB principal components, using the following command in PLINK v.2.0a:⁷²

plink2 —pfile individual_data —read-freq ref_pcs.acount —score ref_pcs.eigenvec.allele 2 5 header-read no-mean-imputation variance-standardize —score-col-nums 6-15 —out new_projection

The resulting projected principal components were then scaled to match the MGBB principal components by taking the square root of the eigenvalue and then multiplying by 2. The scaled principal components (PCs) were fitted into the linear model for each disease developed in the MGBB data to obtain PRS_pred:

BrCa: PRSpred = 17.609341*PC1 − 4.146935*PC2 + 5.335144*PC3 + 3.833931*PC4 − 0.421679

CRCa: PRSpred = −13.659121*PC1 + 6.411109*PC2 − 2.483703*PC3 − 6.869127*PC4 + 6.131384

PrCa: PRSpred = 23.441147*PC1 + 13.724771*PC2 − 9.528270*PC3 + 4.118756*PC4 + 11.506243

AFib: PRSpred = 9.6269881*PC1 − 3.2878238*PC2 − 6.6519006*PC3 − 3.0149108*PC4 + 32.4067610

CAD: PRSpred = −6.1974327*PC1 − 3.6757094*PC2 − 1.3488677*PC3 − 1.3490566*PC4 + 18.0582457

T2D: PRSpred = 26.4700782*PC1 − 7.4283370*PC2 + 9.3782116*PC3 + 1.6994457*PC4 + 55.6998719

PRS_adj was then calculated as the difference between PRS_raw and PRS_pred. Standardized, adjusted PRS values (PRS_std-adj) were calculated using the mean and standard deviation of PRS_adj in MGBB and compared against the PRS threshold corresponding to OR > 2 as determined from the original publications (Supplementary Table 3). Any PRS_std-adj result above the PRS threshold corresponding to OR > 2 was categorized as high polygenic risk.

Genome sequencing

We selected 22 diverse samples that had previously undergone clinical whole genome sequencing to determine the robustness of PRS across different platforms. Genome sequencing was performed at the Clinical Research Sequencing Platform of the Broad Institute using polymerase chain reaction-free library construction and sequencing on an Illumina NovaSeq with two 150 bp paired-end reads with ≥95% of bases covered at ≥20-fold. Reads were aligned to GRCh37 using the Burrows–Wheeler Aligner (BWA v.0.7.15)⁷³ and variant calls were made using HaplotypeCaller from the Genomic Analysis Tool Kit (GATK v.4.0.3.0)^74,75. PRS_raw, PRS_std-raw, PRS_adj and PRS_std-adj were calculated as above for the other prospective samples. As stated above, these 22 samples were also analyzed on the GDA-PC array to compare PRS between genome sequencing and array. The difference between the sequence-based and array-based PRS were visualized, and dichotomous risk classifications were formally compared using the Matthews correlation coefficient⁷⁶.

Identification of actionable variants associated with monogenic disease

Variants from the original genotyping VCF were annotated and filtered to the 59 genes suggested for screening of secondary findings as recommended by the ACMG (ACMG SF v2.0)²⁴ to find: (1) variants previously identified as disease causing by the MGB Laboratory for Molecular Medicine; (2) variants classified as pathogenic or likely pathogenic in ClinVar with a minor allele frequency (MAF) < 0.1%; (3) variants classified as a disease-causing mutation in the Human Gene Mutation Database with a MAF < 0.03%; and (4) loss-of-function variants (nonsense, frameshift, canonical splice-site, and initiating methionine variants) with a MAF < 0.1% in genes in which that is a disease mechanism. Clinical variant classification was carried out in accordance with the criteria set by the guidelines by the ACMG and the Association of Molecular Pathology⁷⁷, with disease-specific modifications as recommended by the Clinical Genome Resource Expert Panels⁷⁸.

Prospectively enrolled trial participants

The assay described above is now in use in the ongoing GenoVA Study randomized trial of clinical PRS (ClinicalTrials.gov identifier: NCT04331535), in which eligible participants are patients of the VA Boston Healthcare System, aged 50–70 years, without known diagnoses of the six target diseases. Enrollees provide a clinical blood or saliva sample for analysis at the Laboratory for Molecular Medicine.

Ethics declaration

Analyses of the genomic and MGBB samples and data have been reviewed and approved by the Mass General Brigham institutional review board (2019P001933). Analyses for the prospective pipeline, including the use of prior clinical samples, were conducted under the Mass General Brigham institutional review board (2004P001056); all individuals with clinical testing, including those with genome sequencing data, gave consent for clinical testing, and all individual data were de-identified. The GenoVA Study is approved by the VA Boston Healthcare System (no. 3241) and Harvard Medical School institutional review board (IRB19-0594), and all enrollees provided written informed consent.

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The majority of the MGBB genotyped samples are deposited in dbGAP as part of the eMERGE consortium, phase 3 (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001584.v2.p2). Additional MGBB data were accessed under institutional review board protocol for this current study and are not publicly available due to restrictions on the data. Data from the GenoVA Study trial will be made publicly available after study completion. The 1000 Genomes Project phase 3 dataset used in this study was v5a and was downloaded from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/.

Code availability

The code used to adjust the PRS for population structure is available for download here: https://github.com/MGB-Personalized-Medicine/PRS-adjustment.

References

Shendure, J., Findlay, G. M. & Snyder, M. W. Genomic medicine: progress, pitfalls, and promise. Cell 177, 45–57 (2019).
Article CAS PubMed PubMed Central Google Scholar
GWAS Catalog (National Human Genome Research Institute); https://www.ebi.ac.uk/gwas/
Meigs, J. B. et al. Genotype score in addition to common risk factors for prediction of type 2 diabetes. N. Engl. J. Med. 359, 2208–2219 (2008).
Article CAS PubMed PubMed Central Google Scholar
Ripatti, S. et al. A multilocus genetic risk score for coronary heart disease: case-control and prospective cohort analyses. Lancet 376, 1393–1400 (2010).
Article PubMed PubMed Central Google Scholar
Zheng, S. L. et al. Cumulative association of five genetic variants with prostate cancer. N. Engl. J. Med. 358, 910–919 (2008).
Article CAS PubMed Google Scholar
Boyle, E. A., Li, Y. I. & Pritchard, J. K. An expanded view of complex traits: from polygenic to omnigenic. Cell 169, 1177–1186 (2017).
Article CAS PubMed PubMed Central Google Scholar
Khera, A. V. et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet. 50, 1219–1224 (2018).
Article CAS PubMed PubMed Central Google Scholar
Vilhjálmsson, B. J. et al. Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am. J. Hum. Genet. 97, 576–592 (2015).
Article PubMed PubMed Central CAS Google Scholar
Inouye, M. et al. Genomic risk prediction of coronary artery disease in 480,000 adults: implications for primary prevention. J. Am. Coll. Cardiol. 72, 1883–1893 (2018).
Article PubMed PubMed Central Google Scholar
Conti, D. V. et al. Trans-ancestry genome-wide association meta-analysis of prostate cancer identifies new susceptibility loci and informs genetic risk prediction. Nat. Genet. 53, 65–75 (2021).
Article CAS PubMed PubMed Central Google Scholar
Klarin, D. et al. Genome-wide association analysis of venous thromboembolism identifies new risk loci and genetic overlap with arterial vascular disease. Nat. Genet. 51, 1574–1579 (2019).
Article CAS PubMed PubMed Central Google Scholar
Mosley, J. D. et al. Predictive accuracy of a polygenic risk score compared with a clinical risk score for incident coronary heart disease. JAMA 323, 627–635 (2020).
Article CAS PubMed PubMed Central Google Scholar
Vassy, J. L. et al. Polygenic type 2 diabetes prediction at the limit of common variant detection. Diabetes 63, 2172–2182 (2014).
Article CAS PubMed PubMed Central Google Scholar
Seibert, T. M. et al. Polygenic hazard score to guide screening for aggressive prostate cancer: development and validation in large scale cohorts. BMJ 360, j5757 (2018).
Article PubMed PubMed Central Google Scholar
National Human Genome Research Institute (NHGRI). Electronic Medical Records and Genomics (eMERGE) Network https://www.genome.gov/Funded-Programs-Projects/Electronic-Medical-Records-and-Genomics-Network-eMERGE (2020).
Shieh, Y. et al. Breast cancer screening in the precision medicine era: risk-based screening in a population-based trial. J. Natl Cancer Inst. 109, https://doi.org/10.1093/jnci/djw290 (2017).
Brockman, D. G. et al. Design and user experience testing of a polygenic score report: a qualitative study of prospective users. BMC Med. Genomics 14, 238 (2021).
Article PubMed PubMed Central Google Scholar
Choi, S. W., Mak, T. S.-H. & O’Reilly, P. F. Tutorial: a guide to performing polygenic risk score analyses. Nat. Protoc. 15, 2759–2772 (2020).
Article CAS PubMed PubMed Central Google Scholar
Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019).
Article CAS PubMed PubMed Central Google Scholar
Lewis, C. M. & Vassos, E. Polygenic risk scores: from research tools to clinical instruments. Genome Med. 12, 44 (2020).
Article PubMed PubMed Central Google Scholar
Lambert, S. A. et al. The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nat. Genet. 53, 420–425 (2021).
Article CAS PubMed Google Scholar
Fritsche, L. G. et al. Cancer PRSweb: an online repository with polygenic risk scores for major cancer traits and their evaluation in two independent biobanks. Am. J. Hum. Genet. 107, 815–836 (2020).
Article CAS PubMed PubMed Central Google Scholar
Huyghe, J. R. et al. Discovery of common and rare genetic risk variants for colorectal cancer. Nat. Genet. 51, 76–87 (2019).
Article CAS PubMed Google Scholar
Kalia, S. S. et al. Recommendations for reporting of secondary findings in clinical exome and genome sequencing, 2016 update (ACMG SF v2.0): a policy statement of the American College of Medical Genetics and Genomics. Genet. Med. 19, 249–255 (2017).
Article PubMed Google Scholar
Zook, J. M. et al. Extensive sequencing of seven human genomes to characterize benchmark reference materials. Sci. Data 3, 160025 (2016).
Article CAS PubMed PubMed Central Google Scholar
Bowling, K. M. et al. Identifying rare, medically relevant variation via population-based genomic screening in Alabama: opportunities and pitfalls. Genet. Med. 23, 280–288 (2021).
Article CAS PubMed Google Scholar
Weedon, M. N. & Wright, C. F. et al. Use of SNP chips to detect rare pathogenic variants: retrospective, population based diagnostic evaluation. BMJ 372, n214 (2021).
Google Scholar
Scheuner, M. T., Edelen, M. O., Hilborne, L. H. & Lubin, I. M. Effective communication of molecular genetic test results to primary care providers. Genet. Med. 15, 444–449 (2013).
Article PubMed Google Scholar
McLaughlin, H. M. et al. A systematic approach to the reporting of medically relevant findings from whole genome sequencing. BMC Med. Genet. 15, 134 (2014).
Article PubMed PubMed Central CAS Google Scholar
Farmer, G. D., Gray, H., Chandratillake, G., Raymond, F. L. & Freeman, A. L. J. Recommendations for designing genetic test reports to be understood by patients and non-specialists. Eur. J. Hum. Genet. 28, 885–895 (2020).
Article PubMed PubMed Central Google Scholar
Senol-Cosar, O. et al. Considerations for clinical curation, classification, and reporting of low-penetrance and low effect size variants associated with disease risk. Genet. Med. 21, 2765–2773 (2019).
Article PubMed Google Scholar
Goff, D. C. et al. 2013 ACC/AHA guideline on the assessment of cardiovascular risk: a report of the American College of Cardiology/American Heart Association Task Force on practice guidelines. Circulation 129(Suppl. 2), S49–S73 (2014).
PubMed Google Scholar
American Diabetes Association. 2. Classification and diagnosis of diabetes: standards of medical care in diabetes – 2021. Diabetes Care 44(Suppl. 1), S15–S33 (2021).
Article Google Scholar
Grossman, D. C. et al. Screening for prostate cancer: US Preventive Services Task Force recommendation statement. JAMA 319, 1901–1913 (2018).
Article PubMed Google Scholar
Siu, A. L., US Preventive Services Task Force Screening for breast cancer: US Preventive Services Task Force recommendation statement. Ann. Intern. Med. 164, 279–296 (2016).
Article PubMed Google Scholar
Davidson, K. W. et al., US Preventive Services Task Force Screening for colorectal cancer: US Preventive Services Task Force recommendation statement. JAMA 325, 1965–1977 (2021).
Article PubMed Google Scholar
Wand, H. et al. Improving reporting standards for polygenic scores in risk prediction studies. Nature 591, 211–219 (2021).
Article CAS PubMed PubMed Central Google Scholar
Homburger, J. R. et al. Low coverage whole genome sequencing enables accurate assessment of common variants and calculation of genome-wide polygenic scores. Genome Med. 11, 74 (2019).
Article CAS PubMed PubMed Central Google Scholar
Denny, J. C. et al. The ‘All of Us’ Research Program. N. Engl. J. Med. 381, 668–676 (2019).
Article PubMed Google Scholar
Khera, A. V. et al. Whole-genome sequencing to characterize monogenic and polygenic contributions in patients hospitalized with early-onset myocardial infarction. Circulation 139, 1593–1602 (2019).
Article CAS PubMed PubMed Central Google Scholar
Dikilitas, O. et al. Predictive utility of polygenic risk scores for coronary heart disease in three major racial and ethnic groups. Am. J. Hum. Genet. 106, 707–716 (2020).
Article CAS PubMed PubMed Central Google Scholar
Kowalski, M. H. et al. Use of >100,000 NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium whole genome sequences improves imputation quality and detection of rare variant associations in admixed African and Hispanic/Latino populations. PLoS Genet. 15, e1008500 (2019).
Article PubMed PubMed Central CAS Google Scholar
Lewis, A. C. F., Green, R. C. & Vassy, J. L. Polygenic risk scores in the clinic: translating risk into action. HGG Adv. 2, 100047 (2021).
PubMed PubMed Central Google Scholar
Lee, A. et al. BOADICEA: a comprehensive breast cancer risk prediction model incorporating genetic and nongenetic risk factors. Genet. Med. 21, 1708–1718 (2019).
Article PubMed PubMed Central Google Scholar
Stone, N. J. et al. 2013 ACC/AHA guideline on the treatment of blood cholesterol to reduce atherosclerotic cardiovascular risk in adults: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. Circulation 129(Suppl. 2), S1–S45 (2014).
PubMed Google Scholar
Hughes, E. et al. Development and validation of a clinical polygenic risk score to predict breast cancer risk. JCO Precis. Oncol. 4, 585–592 (2020).
Article Google Scholar
Our Health + Ancestry DNA Service - 23andMe (23andMe); https://www.23andme.com/dna-health-ancestry/
Chen, S.-F. et al. Genotype imputation and variability in polygenic risk score estimation. Genome Med. 12, 100 (2020).
Article PubMed PubMed Central Google Scholar
National Human Genome Research Institute. Polygenic RIsk MEthods in Diverse populations (PRIMED) Consortium https://www.genome.gov/Funded-Programs-Projects/PRIMED-Consortium
Manolio, T. A. Using the data we have: improving diversity in genomic research. Am. J. Hum. Genet. 105, 233–236 (2019).
Article CAS PubMed PubMed Central Google Scholar
Lewis, A. C. F. & Green, R. C. Polygenic risk scores in the clinic: new perspectives needed on familiar ethical issues. Genome Med. 13, 14 (2021).
Article PubMed PubMed Central Google Scholar
Ray, T. Myriad Genetics recalibrates breast cancer PRS for all ancestries in anticipation of broader launch. Genomeweb https://www.genomeweb.com/molecular-diagnostics/myriad-genetics-recalibrates-breast-cancer-prs-all-ancestries-anticipation (2021).
Ambry Product Team. Important discontinuation notice: AmbryScore: polygenic risk scores (PRS) https://info.ambrygenetics.com/take-a-brief-survey-for-entry-into-amazon-gift-card-drawing
Ge, T. et al. Validation of a trans-ancestry polygenic risk score for type 2 diabetes in diverse populations. Preprint at medRxiv https://doi.org/10.1101/2021.09.11.21263413 (2021).
Marnetto, D. et al. Ancestry deconvolution and partial polygenic score can improve susceptibility predictions in recently admixed individuals. Nat. Commun. 11, 1628 (2020).
Article PubMed PubMed Central CAS Google Scholar
Ruan, Y. et al. Improving polygenic prediction in ancestrally diverse populations. Preprint at medRxiv https://doi.org/10.1101/2020.12.27.20248738 (2021).
Armstrong, K. A. & Metlay, J. P. Annals clinical decision making: translating population evidence to individual patients. Ann. Intern. Med. 172, 610–616 (2020).
Article PubMed Google Scholar
Sniderman, A. D., LaChapelle, K. J., Rachon, N. A. & Furberg, C. D. The necessity for clinical reasoning in the era of evidence-based medicine. Mayo Clin. Proc. 88, 1108–1114 (2013).
Article PubMed Google Scholar
Michailidou, K. et al. Association analysis identifies 65 new breast cancer risk loci. Nature 551, 92–94 (2017).
Article PubMed PubMed Central CAS Google Scholar
Mavaddat, N. et al. Polygenic risk scores for prediction of breast cancer and breast cancer subtypes. Am. J. Hum. Genet. 104, 21–34 (2019).
Article CAS PubMed Google Scholar
Schumacher, F. R. et al. Association analyses of more than 140,000 men identify 63 new prostate cancer susceptibility loci. Nat. Genet. 50, 928–936 (2018).
Article CAS PubMed PubMed Central Google Scholar
Karlson, E. W., Boutin, N. T., Hoffnagle, A. G. & Allen, N. L. Building the Partners Healthcare Biobank at Partners Personalized Medicine: informed consent, return of research results, recruitment lessons and operational considerations. J. Pers. Med. 6, 2 (2016).
Article PubMed Central Google Scholar
Yu, S. et al. Toward high-throughput phenotyping: unbiased automated feature extraction and selection from knowledge sources. J. Am. Med. Inform. Assoc. 22, 993–1000 (2015).
Article PubMed PubMed Central Google Scholar
Yu, S. et al. Enabling phenotypic big data with PheNorm. J. Am. Med. Inform. Assoc. 25, 54–60 (2018).
Article PubMed Google Scholar
Gainer, V. S. et al. The Biobank portal for Partners Personalized Medicine: a query tool for working with consented Biobank samples, genotypes, and phenotypes using i2b2. J. Pers. Med. 6, 11 (2016).
Article PubMed Central Google Scholar
Blau, A., Brown, A., Mahanta, L. & Amr, S. S. The translational genomics core at Partners Personalized Medicine: facilitating the transition of research towards personalized medicine. J. Pers. Med. 6, 10 (2016).
Article PubMed Central Google Scholar
Das, S. et al. Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284–1287 (2016).
Article CAS PubMed PubMed Central Google Scholar
Delaneau, O., Zagury, J.-F. & Marchini, J. Improved whole-chromosome phasing for disease and population genetic studies. Nat. Methods 10, 5–6 (2013).
Article CAS PubMed Google Scholar
Libiger, O. & Schork, N. J. A method for inferring an individual’s genetic ancestry and degree of admixture associated with six major continental populations. Front. Genet. 3, 322 (2013).
Article PubMed PubMed Central Google Scholar
Miller, D. T. et al. ACMG SF v3.0 list for reporting of secondary findings in clinical exome and genome sequencing: a policy statement of the American College of Medical Genetics and Genomics (ACMG). Genet. Med. 23, 1381–1390 (2021).
Article PubMed Google Scholar
Loh, P.-R. et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet. 48, 1443–1448 (2016).
Article CAS PubMed PubMed Central Google Scholar
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4, 7 (2015).
Article PubMed PubMed Central CAS Google Scholar
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Article CAS PubMed PubMed Central Google Scholar
Poplin, R. et al. Scaling accurate genetic variant discovery to tens of thousands of samples. Preprint at bioRxiv https://doi.org/10.1101/201178 (2018).
Van der Auwera, G. A. & O’Connor, B. D. Genomics in the Cloud: Using Docker, GATK, and WDL in Terra 1st edn (O’Reilly Media, 2020).
Shi, L. et al. The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models. Nat. Biotechnol. 28, 827–838 (2010).
Article CAS PubMed Google Scholar
Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–424 (2015).
Article PubMed PubMed Central Google Scholar
Rivera‐Muñoz, E. A. et al. ClinGen Variant Curation Expert Panel experiences and standardized processes for disease and gene‐level specification of the ACMG/AMP guidelines for sequence variant interpretation. Hum. Mutat. 39, 1614–1622 (2018).
Article PubMed PubMed Central Google Scholar

Download references

Acknowledgements

This work was supported by NIH National Human Genome Research Institute grant R35HG010706 (J.L.V.). S.A.L. is supported by NIH grants R01HL139731, R01HL157635 and American Heart Association 18SFRN34250007. P.N. is supported by grants from the National Heart, Lung and Blood Institute (R01HL142711, R01HL148050, R01HL151283, R01HL127564, R01HL148565, R01HL135242, R01HL151152), National Institute of Diabetes and Digestive and Kidney Diseases (R01DK125782), Fondation Leducq (TNE-18CVD04) and Massachusetts General Hospital (Fireman Chair). M.S.L. is supported by NIH grants U01HG008685, R01HG010372, R01HL143295, OT2OD02750, U41HG006834 and U01TR003201, all unrelated to the present work.

Author information

Posthumous authorship: Wanfeng Yu
These authors contributed equally: Jason L. Vassy, Matthew S. Lebo.

Authors and Affiliations

Laboratory for Molecular Medicine, Mass General Brigham Personalized Medicine, Cambridge, MA, USA
Limin Hao, Gabriel F. Berriz, Elizabeth D. Hynes, Christopher Koch, Prathik Korategere V Kumar, Shruti S. Parpattedar, Marcie Steeves, Wanfeng Yu & Matthew S. Lebo
Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
Peter Kraft
Medical Genetics, Massachusetts General Hospital, Boston, MA, USA
Marcie Steeves
Veterans Affairs Boston Healthcare System, Boston, MA, USA
Ashley A. Antwi, Charles A. Brunette, Morgan Danowski, Natalie E. Jones & Jason L. Vassy
Division of Gastroenterology, Massachusetts General Hospital, Boston, MA, USA
Manish K. Gala
Harvard Medical School, Boston, MA, USA
Manish K. Gala, Robert C. Green, Pradeep Natarajan, Jason L. Vassy & Matthew S. Lebo
Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
Robert C. Green & Natalie E. Jones
Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
Robert C. Green & Jason L. Vassy
Precision Population Health, Ariadne Labs, Boston, MA, USA
Robert C. Green & Jason L. Vassy
E J Safra Center for Ethics, Harvard University, Cambridge, MA, USA
Anna C. F. Lewis
Cardiovascular Disease Initiative, Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
Steven A. Lubitz & Pradeep Natarajan
Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
Steven A. Lubitz & Pradeep Natarajan
Demoulas Center for Cardiac Arrhythmias, Massachusetts General Hospital, Boston, MA, USA
Steven A. Lubitz
Department of Pathology, Brigham and Women’s Hospital, Boston, MA, USA
Matthew S. Lebo

Authors

Limin Hao
View author publications
You can also search for this author in PubMed Google Scholar
Peter Kraft
View author publications
You can also search for this author in PubMed Google Scholar
Gabriel F. Berriz
View author publications
You can also search for this author in PubMed Google Scholar
Elizabeth D. Hynes
View author publications
You can also search for this author in PubMed Google Scholar
Christopher Koch
View author publications
You can also search for this author in PubMed Google Scholar
Prathik Korategere V Kumar
View author publications
You can also search for this author in PubMed Google Scholar
Shruti S. Parpattedar
View author publications
You can also search for this author in PubMed Google Scholar
Marcie Steeves
View author publications
You can also search for this author in PubMed Google Scholar
Wanfeng Yu
View author publications
You can also search for this author in PubMed Google Scholar
Ashley A. Antwi
View author publications
You can also search for this author in PubMed Google Scholar
Charles A. Brunette
View author publications
You can also search for this author in PubMed Google Scholar
Morgan Danowski
View author publications
You can also search for this author in PubMed Google Scholar
Manish K. Gala
View author publications
You can also search for this author in PubMed Google Scholar
Robert C. Green
View author publications
You can also search for this author in PubMed Google Scholar
Natalie E. Jones
View author publications
You can also search for this author in PubMed Google Scholar
Anna C. F. Lewis
View author publications
You can also search for this author in PubMed Google Scholar
Steven A. Lubitz
View author publications
You can also search for this author in PubMed Google Scholar
Pradeep Natarajan
View author publications
You can also search for this author in PubMed Google Scholar
Jason L. Vassy
View author publications
You can also search for this author in PubMed Google Scholar
Matthew S. Lebo
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

L.H., P.K., C.A.B., M.K.G., R.C.G., A.C.F.L., S.A.L., P.N., J.L.V. and M.S.L. conceived the study and contributed to its design. L.H., G.F.B., E.D.H., C.K., P.K.V.K., S.S.P., M.S., W.Y. and C.A.B. analyzed the data from the MGBB and GenoVA samples. A.A.A., C.A.B., M.D., N.E.J. and J.L.V. collected GenoVA data. L.H., J.L.V. and M.S.L. drafted the manuscript, and all authors reviewed the scientific content of the manuscript prior to submission.

Corresponding author

Correspondence to Jason L. Vassy.

Ethics declarations

Competing interests

A.C.F.L. owns stock in Fabric Genomics. S.A.L. receives sponsored research support from Bristol Myers Squibb, Pfizer, Boehringer Ingelheim, Fitbit, Medtronic, Premier and IBM, and has consulted for Bristol Myers Squibb, Pfizer, Blackstone Life Sciences and Invitae. P.N. reports investigator-initiated grants from Amgen, Apple, Boston Scientific, AstraZeneca and Novartis, personal fees from Apple, Genentech, AstraZeneca, Blackstone Life Science, Foresite Labs and Novartis, spousal employment at Vertex, and being co-founder of TenSixteen Bio, all unrelated to the present work. R.C.G. has received compensation for advising the following companies: AIA, Allelica, Embryome, GenomeWeb, Genomic Life, Grail, Humanity, Meenta, OptumLabs, Plumcare, Verily, VinBigData; and is co-founder of Genome Medical, Inc. C.K. now works at Novartis Institutes for BioMedical Research. A.A.A., C.A.B., M.D. and J.L.V. are employees of the US Department of Veterans Affairs (DVA); the views expressed in this paper do not represent those of the DVA or US government. All other authors have no competing interests.

Peer review

Peer review information

Nature Medicine thanks Krista Fischer and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Anna Maria Ranzoni was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Distribution of standardized, adjusted PRS by release batch for six diseases in MGBB.

Standardized, adjusted PRS (PRS_std-adj) plotted by eight batches of three versions of Illumina genotyping arrays (MEG, MEGA, MEGAEX) used to analyze data from up to 36,423 MGBB participants. Abbreviations: AFib, atrial fibrillation; BrCa, breast cancer; CAD, coronary artery disease; CRCa, colorectal cancer; MEG, Multi-Ethnic Global; MEGA, Multi-Ethnic Genotyping Array; MEGAEX, Expanded Multi-Ethnic Genotyping Array; MGBB, Mass General Brigham Biobank; PrCa, prostate cancer; PRS, polygenic risk score; T2D, type 2 diabetes.

Extended Data Fig. 2 Distribution of standardized, adjusted PRS by age for six diseases in MGBB.

Standardized, adjusted PRS (PRS_std-adj) plotted by decade of age among up to 36,423 MGBB participants. Abbreviations: AFib, atrial fibrillation; BrCa, breast cancer; CAD, coronary artery disease; CRCa, colorectal cancer; MGBB, Mass General Brigham Biobank; PrCa, prostate cancer; PRS, polygenic risk score; T2D, type 2 diabetes.

Extended Data Fig. 3 Distribution of adjusted PRS by sex for four diseases in MGBB.

Standardized, adjusted PRS plotted by sex among 16,704 male and 19,719 female MGBB participants. Abbreviations: AFib, atrial fibrillation; BrCa, breast cancer; CAD, coronary artery disease; CRCa, colorectal cancer; MGBB, Mass General Brigham Biobank; PrCa, prostate cancer; PRS, polygenic risk score; T2D, type 2 diabetes.

Extended Data Fig. 4 Correlation between standardized, adjusted PRS and odds of disease in reported white MGBB participants.

Plots show log(odds) of each of six diseases versus quantile (n = 50) of standardized population structure-adjusted PRS (PRS_std-adj) among up to 30,716 MGBB participants of reported white race. The correlation coefficient, r, is shown in each panel. Abbreviations: AFib, atrial fibrillation; BrCa, breast cancer; CAD, coronary artery disease; CRCa, colorectal cancer; MGBB, Mass General Brigham Biobank; OR, odds ratio; PrCa, prostate cancer; PRS, polygenic risk score; T2D, type 2 diabetes.

Extended Data Fig. 5 Correlation between standardized, adjusted PRS and odds of disease in reported Black MGBB participants.

Plots show log(odds) of each of six diseases versus quantile (n = 10) of standardized population structure-adjusted PRS (PRS_std-adj) among up to 1,807 MGBB participants of reported Black race. Results not reported for CRCa due to 0 CRCa cases in at least one quantile. The correlation coefficient, r, is shown in each panel. Abbreviations: AFib, atrial fibrillation; BrCa, breast cancer; CAD, coronary artery disease; CRCa, colorectal cancer; MGBB, Mass General Brigham Biobank; OR, odds ratio; PrCa, prostate cancer; PRS, polygenic risk score; T2D, type 2 diabetes.

Extended Data Fig. 6 Correlation between standardized, adjusted PRS and odds of disease in reported Asian MGBB participants.

Plots show log(odds) of each of six diseases versus quantile (n = 10) of standardized population structure-adjusted PRS (PRS_std-adj) among up to 786 MGBB participants of reported Asian race. Results not reported for CRCa, PrCa, or AFib due to 0 cases in at least one quantile. The correlation coefficient, r, is shown in each panel. Abbreviations: AFib, atrial fibrillation; BrCa, breast cancer; CAD, coronary artery disease; CRCa, colorectal cancer; MGBB, Mass General Brigham Biobank; OR, odds ratio; PrCa, prostate cancer; PRS, polygenic risk score; T2D, type 2 diabetes.

Extended Data Fig. 7 Correlation between standardized, adjusted PRS and odds of disease in MGBB participants of unknown or other reported race.

Plots show log(odds) of each of six diseases versus quantile (n = 50 for T2D, n = 10 for all other disease) of standardized population structure-adjusted PRS (PRS_std-adj) among up to 3,113 MGBB participants of unknown or other reported race. Results not reported for CRCa due to 0 cases in at least one quantile (n = 10). The correlation coefficient, r, is shown in each panel. Abbreviations: AFib, atrial fibrillation; BrCa, breast cancer; CAD, coronary artery disease; CRCa, colorectal cancer; MGBB, Mass General Brigham Biobank; OR, odds ratio; PrCa, prostate cancer; PRS, polygenic risk score; T2D, type 2 diabetes.

Extended Data Fig. 8 Difference in standardized, adjusted PRS between WGS and imputed genotyping arrays for 22 individual samples.

The PRS_std-adj of 22 samples obtained from WGS and from imputed genotyping arrays are subtracted, and the distribution of the difference of the scores is plotted for each disease. Abbreviations: AFib, atrial fibrillation; BrCa, breast cancer; CAD, coronary artery disease; CRCa, colorectal cancer; IMPU, imputed genotype data; MGBB, Mass General Brigham Biobank; PrCa, prostate cancer; PRS, polygenic risk score; T2D, type 2 diabetes; WGS, whole genome sequencing.

Supplementary information

Supplementary Information

Supplementary Tables 1–9 and Files 1–3.

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Hao, L., Kraft, P., Berriz, G.F. et al. Development of a clinical polygenic risk score assay and reporting workflow. Nat Med 28, 1006–1013 (2022). https://doi.org/10.1038/s41591-022-01767-6

Download citation

Received: 22 July 2021
Accepted: 02 March 2022
Published: 18 April 2022
Issue Date: May 2022
DOI: https://doi.org/10.1038/s41591-022-01767-6

This article is cited by

Recent advances in polygenic scores: translation, equitability, methods and FAIR tools
- Ruidong Xiang
- Martin Kelemen
- Samuel A. Lambert
Genome Medicine (2024)
Whole genome sequencing in clinical practice
- Frederik Otzen Bagger
- Line Borgwardt
- Finn Cilius Nielsen
BMC Medical Genomics (2024)
Integration of polygenic and gut metagenomic risk prediction for common diseases
- Yang Liu
- Scott C. Ritchie
- Michael Inouye
Nature Aging (2024)
Polygenic risk scores, radiation treatment exposures and subsequent cancer risk in childhood cancer survivors
- Todd M. Gibson
- Danielle M. Karyadi
- Lindsay M. Morton
Nature Medicine (2024)
Assessing the efficacy of target adaptive sampling long-read sequencing through hereditary cancer patient genomes
- Wataru Nakamura
- Makoto Hirata
- Yuichi Shiraishi
npj Genomic Medicine (2024)

Subjects

Abstract

Similar content being viewed by others

Main

Results

Replication of published PRS

Sample characteristics

Unadjusted and adjusted PRS distributions

Replication of PRS–disease association

Prospective PRS assay

Sensitivity and specificity of array and imputation

Performance of prospective PRS assay

Clinical PRS report

Clinical processes and supportive materials

Results from the first 227 prospective samples

Discussion

Methods

Selection of PRS for implementation

Replication of published PRS

Population and sample

Disease phenotyping

Genotyping and imputation

Calculation of PRS and adjustment for population structure

PRS–disease association

PRS threshold for high risk

Clinical PRS assay for individual samples

Validation samples

Genotyping and imputation

PRS calculation for clinical assay for individual samples

Genome sequencing

Identification of actionable variants associated with monogenic disease

Prospectively enrolled trial participants

Ethics declaration

Reporting Summary

Data availability

Code availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Peer review

Peer review information

Additional information

Extended data

Supplementary information

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Search

Quick links