Genetically adjusted PSA levels for prostate cancer screening

Kachuri, Linda; Hoffmann, Thomas J.; Jiang, Yu; Berndt, Sonja I.; Shelley, John P.; Schaffer, Kerry R.; Machiela, Mitchell J.; Freedman, Neal D.; Huang, Wen-Yi; Li, Shengchao A.; Easterlin, Ryder; Goodman, Phyllis J.; Till, Cathee; Thompson, Ian; Lilja, Hans; Van Den Eeden, Stephen K.; Chanock, Stephen J.; Haiman, Christopher A.; Conti, David V.; Klein, Robert J.; Mosley, Jonathan D.; Graff, Rebecca E.; Witte, John S.

doi:10.1038/s41591-023-02277-9

Article
Open access
Published: 01 June 2023

Nature Medicine volume 29, pages 1412–1423 (2023)

Abstract

Prostate-specific antigen (PSA) screening for prostate cancer remains controversial because it increases overdiagnosis and overtreatment of clinically insignificant tumors. Accounting for genetic determinants of constitutive, non-cancer-related PSA variation has potential to improve screening utility. In this study, we discovered 128 genome-wide significant associations (P < 5 × 10⁻⁸) in a multi-ancestry meta-analysis of 95,768 men and developed a PSA polygenic score (PGS_PSA) that explains 9.61% of constitutive PSA variation. We found that, in men of European ancestry, using PGS-adjusted PSA would avoid up to 31% of negative prostate biopsies but also result in 12% fewer biopsies in patients with prostate cancer, mostly with Gleason score <7 tumors. Genetically adjusted PSA was more predictive of aggressive prostate cancer (odds ratio (OR) = 3.44, P = 6.2 × 10⁻¹⁴, area under the curve (AUC) = 0.755) than unadjusted PSA (OR = 3.31, P = 1.1 × 10⁻¹², AUC = 0.738) in 106 cases and 23,667 controls. Compared to a prostate cancer PGS alone (AUC = 0.712), including genetically adjusted PSA improved detection of aggressive disease (AUC = 0.786, P = 7.2 × 10⁻⁴). Our findings highlight the potential utility of incorporating PGS for personalized biomarkers in prostate cancer screening.

Main

Prostate-specific antigen (PSA) is an enzyme produced by the prostate gland that degrades gel-forming seminal proteins to release motile sperm and is encoded by the KLK3 (kallikrein 3) gene^1,2,3. As prostate epithelial tissue becomes disrupted by a tumor, greater PSA concentrations are released into circulation^2,3. PSA levels can also rise due to prostatic inflammation, infection, benign prostatic hyperplasia, older age and increased prostate volume^3,4,5. Increased body mass index is associated with lower PSA levels, but the underlying mechanisms remain unclear^6,7. Low PSA levels, thus, do not rule out prostate cancer, and PSA elevation is not sufficient for a conclusive diagnosis⁸. Although PSA testing reduces deaths from prostate cancer⁹, between 20% and 60% of cancers detected using PSA testing are estimated to be overdiagnoses^10,11,12. In addition, the long-term risk of lethal prostate cancer remains low, especially in men with PSA below the age-specific median^13,14. As a result, clinical guidelines in the United States and globally advise against population-level PSA screening and promote a shared decision-making model^15,16.

One avenue for refining PSA screening is to account for variability in PSA due to genetic factors. PSA is highly heritable, with 40 independent loci identified in the largest previous genome-wide association study (GWAS)^17,18. The goal of genetically correcting PSA levels is to increase the relative variation in PSA attributable to prostate cancer, thereby improving their predictive value for disease detection. The first study to genetically correct PSA using just four variants reclassified 3% of participants to warranting biopsy and 3% to avoiding biopsy¹⁹. Incorporating additional genetic predictors has the potential to personalize PSA testing, reduce overdiagnosis-related morbidity and improve detection of lethal disease. To maximize the utility of this approach, it is critical to distinguish genetic variants that influence constitutive PSA levels from those affecting prostate tumor development. PSA and prostate cancer share many genetic loci^17,19,20,21,22, but the extent to which this overlap reflects screening bias remains unclear, as GWASs of prostate cancer may capture signals for disease susceptibility and incidental detection due to benign PSA elevation.

Our study explores the genetic architecture of PSA levels in men without prostate cancer, with a view toward assessing whether genetic adjustment of PSA improves clinical decision-making related to prostate cancer diagnosis. It also provides a novel framework for the clinical translation of polygenic scores (PGSs) for non-causal cancer biomarkers.

Results

The study design of the Precision PSA study is illustrated in Fig. 1. Using data from five studies (Methods), we conducted genome-wide analyses of PSA levels ≤10 ng ml⁻¹ in cis-gender men never diagnosed with prostate cancer. GWAS results were meta-analyzed within ancestry groups and then combined across populations for a total sample size of 95,768 individuals.

**Fig. 1: Overview of the Precision PSA study design.**

Genetic architecture of PSA variation

The heritability (h²) of PSA levels was investigated using several methods to assess sensitivity to underlying modeling assumptions (Methods). Across 26,491 men of European ancestry in the UK Biobank (UKB) with linked clinical records, the median PSA value was 2.35 ng ml⁻¹ (Supplementary Fig. 1). Using individual-level data for variants with minor allele frequency (MAF) ≥ 0.01 and imputation INFO > 0.80, PSA heritability was h² = 0.41 (95% confidence interval (CI): 0.36–0.46) based on GCTA²³ and h² = 0.30 (95% CI: 0.26–0.33) based on LDAK²⁴ (Supplementary Table 1 and Extended Data Fig. 1). Applying LDAK to GWAS summary statistics generated from the same individuals produced similar estimates (h² = 0.35, 95% CI: 0.28–0.43), whereas other methods^25,26 were biased downward. In the European ancestry GWAS meta-analysis (n_EUR = 85,824), LDAK estimated h² = 0.30 (95% CI: 0.29–0.31). Sample sizes for other ancestries were too small for reliable heritability estimates.

The multi-ancestry meta-analysis of 95,768 men from five studies identified 128 independent index variants (P < 5.0 × 10⁻⁸, linkage disequilibrium (LD) r² < 0.01 within ±10-Mb windows) across 90 chromosomal cytoband regions (Fig. 2). The strongest associations were in known PSA loci^17,19,21,22, such as KLK3 (rs17632542, P = 3.2 × 10⁻⁶³⁸), 10q26.12 (rs10886902, P = 8.2 × 10⁻¹¹⁸), MSMB (rs10993994, P = 7.3 × 10⁻⁸⁷), NKX3-1 (rs1160267, P = 6.3 × 10⁻⁸³), CLPTM1L (rs401681, P = 7.0 × 10⁻⁵⁴) and HNF1B (rs10908278, P = 2.1 × 10⁻⁴⁶). Eighty-two index variants were independent of previously detected associations in the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort¹⁷; they mapped to 56 cytobands where PSA signals have not previously been reported. Associations initially detected in the UKB (Extended Data Fig. 1b) strengthened in the meta-analysis: TEX11 in Xq13.1 (rs62608084, P = 1.7 × 10⁻²⁴); THADA in 2p21 (rs11899863, P = 1.7 × 10⁻¹³); OTX1 in 2p15 (rs58235267, P = 4.9 × 10⁻¹³); SALL3 in 18q23 (rs71279357, P = 1.8 × 10⁻¹²); and ST6GAL1 in 3q27.3 (rs12629450, P = 2.6 × 10⁻¹⁰). Additional novel findings included CDK5RAP1 (rs291671, P = 1.2 × 10⁻¹⁸), LDAH (rs10193919, P = 1.5 × 10⁻¹⁵), ABCC4 (rs61965887, P = 3.7 × 10⁻¹⁴), INKA2 (rs2076591, P = 2.6 × 10⁻¹³), SUDS3 (rs1045542, P = 1.2 × 10⁻¹³), FAF1 (rs12569177, P = 3.2 × 10⁻¹³), JARID2 (rs926309, P = 1.6 × 10⁻¹²), GPC3 (rs4829762, P = 5.9 × 10⁻¹²), EDA (rs2520386, P = 4.2 × 10⁻¹¹) and ODF3 (rs7103852, P = 1.2 × 10⁻⁹) (Supplementary Tables 2 and 3).

**Fig. 2: Multi-ancestry GWAS of PSA levels.**

Of the 128 index variants, 96 reached genome-wide significance in the European ancestry meta-analysis, as did three in the East Asian ancestry meta-analysis (KLK3: rs2735837 and rs374546878; MSMB: rs10993994; n_EAS = 3,337), two in the Hispanic/Latino meta-analysis (KLK3: rs17632542 and rs2735837; n_HIS/LAT = 3,098) and one in the African ancestry meta-analysis (FGFR2: rs10749415; n_AFR = 3,509) (Supplementary Table 4). Effect sizes from the European ancestry GWAS were modestly correlated with estimates from other ancestries (Spearman’s ρ_HIS/LAT = 0.48, P = 1.1 × 10⁻⁸; ρ_AFR = 0.27, P = 2.0 × 10⁻³; ρ_EAS = 0.16, P = 0.068) (Supplementary Fig. 2). However, cross-population comparisons of correlations should be interpreted with caution as they are confounded by higher sampling error in groups with smaller sample sizes.

There was heterogeneity (Cochran’s Q P_Q < 0.05) across ancestry-specific fixed-effects meta-analyses for 12 of 128 index variants, four of which had effects in different directions: rs58235267 (OTX1), rs1054713 (KLK1), rs10250340 (EIF4HP1) and rs7020681 (SLC35D2) (Supplementary Table 5). An alternative meta-analysis approach, MR-MEGA²⁷, which partitions effect size heterogeneity into components correlated with ancestry and residual variation, identified one additional signal in 5q15 (rs291812, P_MR-MEGA = 1.0 × 10⁻⁸) that was driven by the East Asian ancestry results (P_EAS = 1.2 × 10⁻⁶) (Supplementary Table 6).
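As an illustration of the heterogeneity check described above, the following is a minimal sketch of an inverse-variance weighted fixed-effects meta-analysis with Cochran's Q. The function names, the four-group layout and all effect sizes are hypothetical and are not taken from the study; the closed-form P value applies only when exactly four groups contribute (3 degrees of freedom).

```python
import math

def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted pooled effect, its SE, Cochran's Q and df."""
    weights = [1.0 / (se * se) for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Q sums the weighted squared deviations of each group from the pooled effect.
    q_stat = sum(w * (b - pooled_beta) ** 2 for w, b in zip(weights, betas))
    return pooled_beta, pooled_se, q_stat, len(betas) - 1

def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

def opposite_directions(betas):
    """True when at least one group has a negative and one a positive effect."""
    return min(betas) < 0 < max(betas)

# Hypothetical per-ancestry effects for one variant (four groups, so df = 3).
betas = [0.050, -0.020, 0.010, 0.030]
ses = [0.005, 0.020, 0.025, 0.020]
beta, se, q, df = fixed_effects_meta(betas, ses)
p_q = chi2_sf_3df(q)  # valid only because df == 3 here
```

A variant would be flagged as heterogeneous when `p_q < 0.05`, and `opposite_directions` picks out the subset whose ancestry-specific effects disagree in sign.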

Predicted functional consequences of the 128 index variants were explored using CADD²⁸. Scores >13 (corresponding to the 5% most deleterious substitutions genome-wide) were observed for 16 of the 128 index variants detected in the original fixed effects meta-analysis, including ten new signals: rs10193919 (LDAH); rs7732515 in 5q14.3; rs11899863 (THADA); rs58235267 (OTX1); rs926309 (JARID2); rs4829762 (GPC3) and rs13268, a missense variant in FBLN1; rs78378222 in TP53 and rs3760230 in SMG6; and rs712329 in SLC25A21 (Supplementary Table 7). Sixty-one variants had significant (false discovery rate (FDR) < 0.05) effects on gene expression, including 15 prostate tissue expression quantitative trait loci (eQTLs) for 17 eGenes, 55 blood eQTLs for 185 eGenes and nine eQTLs with effects in both tissues. Notable eGenes included RUVBL1, a chromatin-remodeling factor that modulates pro-inflammatory NF-κB signaling and transcription of Myc and β-catenin²⁹; ODF3, which maintains elastic structures in the sperm tail³⁰; and LDAH, which promotes cholesterol mobilization in macrophages³¹. Several PSA-associated variants were eQTLs for genes involved in immune response (IFITM2, IFITM3 and HS1BP3).
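The correspondence between the CADD cutoff and the 5% figure follows from the PHRED-like scaling of CADD scores, in which a variant's scaled score reflects its rank among all possible substitutions:

$$
\text{CADD}_{\text{scaled}} = -10 \log_{10}\!\left(\frac{\text{rank}}{\text{total}}\right), \qquad \text{CADD}_{\text{scaled}} > 13 \iff \frac{\text{rank}}{\text{total}} < 10^{-1.3} \approx 0.050
$$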

Impact of PSA-related selection bias on prostate cancer GWAS

Because prostate cancer detection often hinges on PSA elevation, genetic factors resulting in higher constitutive PSA levels may appear to increase prostate cancer risk because of more frequent screening. Of the 128 lead PSA variants, 52 (41%) were associated with prostate cancer at the Bonferroni-corrected threshold (P < 0.05/128) in the PRACTICAL consortium’s European ancestry GWAS³² (Supplementary Table 8). Using the method by Dudbridge et al.³³, we investigated whether index event bias could partly explain these shared signals^33,34 (Methods, Fig. 3 and Supplementary Table 9). Applying the estimated bias correction factor (b = 1.144) decreased the number of variants associated with prostate cancer from 52 to 34 (Extended Data Fig. 2). When we corrected 209 European ancestry prostate cancer risk variants (P < 5.0 × 10⁻⁸, LD r² < 0.01) for screening bias, 93 (45%) remained genome-wide significant. Notably, rs76765083 (KLK3) remained genome-wide significant but reversed direction. Sensitivity analyses using SlopeHunter³⁵ resulted in 150 (72%) variants with P < 5 × 10⁻⁸ (Supplementary Table 10).
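The per-variant correction can be sketched as follows, assuming it takes the subtraction form used in index event bias methods: the prostate cancer effect minus the bias factor b times the PSA effect. The function name and the example effect sizes are hypothetical, and the standard error shown here ignores uncertainty in b itself, which a full implementation would need to propagate.

```python
import math

BIAS_FACTOR = 1.144            # estimated correction factor reported in the text
BONFERRONI_ALPHA = 0.05 / 128  # threshold used for the 128 PSA index variants

def correct_index_event_bias(beta_pca, se_pca, beta_psa, se_psa, b=BIAS_FACTOR):
    """Return bias-corrected log odds ratio, its approximate SE and two-sided P."""
    beta_adj = beta_pca - b * beta_psa
    # Simplification (assumption): treats b as fixed and the two estimates as independent.
    se_adj = math.sqrt(se_pca ** 2 + (b ** 2) * se_psa ** 2)
    z = beta_adj / se_adj
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta_adj, se_adj, p_value

# Hypothetical variant whose cancer association is partly driven by PSA elevation.
beta_adj, se_adj, p_adj = correct_index_event_bias(0.10, 0.01, 0.05, 0.005)

# Hypothetical variant with a strong PSA effect: the corrected estimate changes sign.
beta_rev, _, _ = correct_index_event_bias(0.02, 0.01, 0.05, 0.005)
```

The second call shows how a variant with a large PSA effect relative to its prostate cancer effect can reverse direction after correction, the pattern reported for rs76765083 in KLK3.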

**Fig. 3: Influence of PSA-related index event bias on prostate cancer GWAS.**

Development and validation of PGS_PSA

We considered two approaches for constructing a PGS for PSA: clumping genome-wide significant associations from the multi-ancestry meta-analysis (PGS₁₂₈) and a genome-wide score generated using the Bayesian PRS-CSx algorithm (PGS_CSx) (ref. ³⁶) (Methods). Each score was validated in the Prostate Cancer Prevention Trial (PCPT) and the Selenium and Vitamin E Cancer Prevention Trial (SELECT), which were excluded from the discovery GWAS. Most of the men in both cohorts were of European ancestry, although SELECT offered larger sample sizes for other ancestry groups (Extended Data Fig. 3). PGS_CSx was ultimately selected, as it was more predictive of baseline PSA than PGS₁₂₈ in multi-ancestry analyses and most ancestry subgroups (Supplementary Table 11).
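Both scores share the same final scoring step: a weighted sum of allele dosages. The sketch below shows only that step, with hypothetical dosages and weights. It does not reproduce clumping or the PRS-CSx estimation of posterior weights, and standardizing within the sample is an assumption made here so that effects can be read per s.d. increase.

```python
import statistics

def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0 to 2) for one individual."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))

def standardize(scores):
    """Rescale scores to mean 0 and sample s.d. 1."""
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    return [(s - mean) / sd for s in scores]

# Hypothetical per-variant weights (effects on log PSA) and three individuals.
weights = [0.10, -0.20, 0.05]
cohort = [[0, 1, 2], [2, 0, 1], [1, 1, 1]]
raw_scores = [polygenic_score(person, weights) for person in cohort]
z_scores = standardize(raw_scores)
```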

In the PCPT, PGS_CSx accounted for 8.13% of variation in baseline PSA levels (β per s.d. increase = 0.186, P = 3.3 × 10⁻¹¹²) in the pooled multi-ancestry sample of 5,883 men (Fig. 4a–c and Supplementary Table 11). PGS_CSx was associated with PSA across age groups, although effects attenuated in participants aged ≥70 years (Extended Data Fig. 4). PGS_CSx was validated in 5,725 participants of European ancestry (EUR ≥ 0.80) (PGS_CSx: β = 0.194, P = 1.7 × 10⁻¹¹⁵), but neither PGS₁₂₈ nor PGS_CSx reached nominal significance in the admixed European and African ancestry (0.20 < AFR/EUR < 0.80, n = 103) or East Asian ancestry (EAS ≥ 0.80, n = 55) populations.

**Fig. 4: Validation of the PGS_PSA in two cancer prevention trials.**

In the SELECT, PGS_CSx was associated with baseline PSA levels in the pooled sample of 25,917 men (β = 0.258, P = 1.3 × 10⁻⁶¹⁹) and among men of European ancestry (n = 22,253, β_PGS = 0.283, P = 5.5 × 10⁻⁶¹⁰), accounting for 9.61% and 10.94% of variation, respectively (Fig. 4b–d and Supplementary Table 11). PGS_CSx was also validated in the East Asian (n = 257, β = 0.258, P = 5.9 × 10⁻⁷) and admixed EAS/EUR (n = 321, β = 0.315, P = 5.2 × 10⁻¹²) ancestry groups. In men with admixed AFR/EUR ancestry (n = 1,763), PGS_CSx explained 4.22% of PSA variation (β = 0.157, P = 4.8 × 10⁻¹⁹). PGS₁₂₈ was more predictive than PGS_CSx (β = 0.163, P = 8.2 × 10⁻¹¹ versus β = 0.098, P = 8.0 × 10⁻⁶) in men of African ancestry (AFR ≥ 0.80, n = 1,173) and the pooled AFR and admixed (0.20 < EUR/AFR < 0.80) group (n = 2,936).

We also examined associations with temporal trends in PSA: velocity, calculated using log(PSA) values at two timepoints, and doubling time in months (Methods and Supplementary Table 12). In men with a PSA increase (SELECT pooled sample: n = 14,908), PGS_CSx was associated with less rapid velocity (PGS_CSx: β = −4.06 × 10⁻⁴, P = 3.7 × 10⁻⁵) and longer doubling time (PGS_CSx: β = 10.41, P = 1.9 × 10⁻⁸). In men with a PSA decrease between the first and last timepoint (SELECT pooled sample: n = 6,970), PGS_CSx was only suggestively associated with slowing PSA decline (β = 5.02 × 10⁻⁴, P = 0.068). The same pattern was observed in the PCPT, with higher PGS_CSx values conferring less rapid changes in PSA.
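For readers unfamiliar with these kinetics measures, the sketch below uses the conventional two-point definitions: velocity as the slope of log(PSA) over time, and doubling time as ln(2) divided by that slope. The study's exact definitions are given in its Methods and may differ in detail; the function names and example values are hypothetical.

```python
import math

def psa_velocity(psa_first, psa_last, months_between):
    """Change in natural-log PSA per month between two measurements."""
    return (math.log(psa_last) - math.log(psa_first)) / months_between

def psa_doubling_time(psa_first, psa_last, months_between):
    """Months needed for PSA to double at the observed log-linear rate.

    Negative values indicate a halving time (PSA is falling);
    an unchanged PSA gives an infinite doubling time.
    """
    slope = psa_velocity(psa_first, psa_last, months_between)
    if slope == 0:
        return math.inf
    return math.log(2.0) / slope

# Hypothetical participant: PSA rises from 1.0 to 2.0 ng/ml over 24 months.
velocity = psa_velocity(1.0, 2.0, 24)
doubling = psa_doubling_time(1.0, 2.0, 24)
```

Because doubling time is the reciprocal of velocity up to a constant, a negative association with velocity and a positive association with doubling time describe the same slower rise.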

PGS_CSx, referred to as PGS_PSA from here onward, was used to genetically adjust baseline or earliest pre-randomization PSA values (PSA^G) for each individual, relative to the population mean (Methods and equations 1 and 2). PSA^G and unadjusted PSA were strongly correlated in the PCPT (Pearson’s r = 0.841, 95% CI: 0.833–0.848) and the SELECT (r = 0.854, 95% CI: 0.851–0.857). The number of participants with PSA^G > 4 ng ml⁻¹, a commonly used threshold for diagnostic testing, increased from 0 to 24 in the PCPT and from 5 to 413 in the SELECT (Fig. 4e,f), reflecting the preferential trial selection of men with low PSA^8,37.
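The adjustment equations themselves appear in the Methods and are not reproduced in this section. One plausible form, assuming the PGS is expressed on the log(PSA) scale, divides measured PSA by a personal factor equal to the exponentiated difference between the individual's score and the reference population mean. The sketch below implements that assumed form; the function name and values are hypothetical.

```python
import math

def genetically_adjusted_psa(psa, pgs, reference_mean_pgs):
    """Divide measured PSA by an individual genetic correction factor.

    Assumed form: factor = exp(pgs) / exp(reference_mean_pgs), so a man whose
    score exceeds the reference mean has his PSA adjusted downward.
    """
    factor = math.exp(pgs - reference_mean_pgs)
    return psa / factor

# Hypothetical men with the same measured PSA but different genetic predisposition.
high_pgs = genetically_adjusted_psa(4.5, pgs=0.30, reference_mean_pgs=0.0)
average = genetically_adjusted_psa(4.5, pgs=0.0, reference_mean_pgs=0.0)
low_pgs = genetically_adjusted_psa(4.5, pgs=-0.30, reference_mean_pgs=0.0)
```

Under this form, a man with an average score keeps his measured value, which is why the choice of reference population directly sets the size of the correction.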

Impact of PSA-related bias on PGS associations

In men of European ancestry in the UKB excluded from the PSA GWAS, there was a strong positive relationship between the 269-variant prostate cancer PGS (PGS₂₆₉)³² and PGS_PSA in cases (n = 11,568, β = 0.190, P = 2.3 × 10⁻⁹⁶) and controls (n = 152,884, β = 0.236, P < 10⁻⁷⁰⁰) (Extended Data Fig. 5 and Supplementary Table 13). Re-fitting PGS₂₆₉ using weights corrected for index event bias (PGS₂₆₉^adj) substantially attenuated associations in cases (β_adj = 0.029, P = 2.7 × 10⁻³) and controls (β_adj = 0.052, P = 2.2 × 10⁻⁸⁹).

To further characterize the impact of this bias, we examined PGS₂₆₉ associations with prostate cancer status in 3,673 cases and 2,363 biopsy-confirmed, European ancestry controls from GERA. PGS₂₆₉^adj had a larger magnitude of association with prostate cancer (odds ratio (OR) for top decile = 3.63, 95% CI: 3.01–4.37) than PGS₂₆₉ (OR = 2.71, 95% CI: 2.28–3.21) and higher area under the curve (AUC: 0.685 versus 0.677, P = 3.91 × 10⁻³) (Supplementary Table 14). The impact of bias correction was most pronounced for Gleason ≥7 tumors (PGS₂₆₉^adj AUC = 0.692 versus PGS₂₆₉ AUC = 0.678, P = 1.91 × 10⁻³), although these AUC estimates are inflated due to overlap with the GWAS used to develop PGS₂₆₉ (ref. ³²). In case-only analyses, PGS_PSA and PGS₂₆₉ were inversely associated with Gleason score, illustrating how screening bias decreases the likelihood of identifying high-grade disease (Supplementary Table 15). Compared to Gleason ≤6 tumors, an s.d. increase in PGS_PSA was inversely associated with Gleason 7 disease (OR = 0.79, 95% CI: 0.76–0.83) and Gleason ≥8 disease (OR = 0.71, 95% CI: 0.64–0.81). Patients in the top decile of PGS₂₆₉ were approximately 30% less likely to have Gleason ≥8 tumors (OR = 0.72, 95% CI: 0.54–0.96) than Gleason ≤6 tumors, but this association was attenuated after bias correction (PGS₂₆₉^adj: OR = 0.94, 95% CI: 0.75–1.17).

Impact of genetic adjustment of PSA on biopsy eligibility

Among GERA participants who underwent prostate biopsy, we examined how adjustment using PGS_PSA reclassified individuals for biopsy recommendation at age-specific thresholds used by Kaiser Permanente: 40–49 years old = 2.5 ng ml⁻¹; 50–59 years old = 3.5 ng ml⁻¹; 60–69 years old = 4.5 ng ml⁻¹; and 70–79 years old = 6.5 ng ml⁻¹ (Methods). For men of European ancestry, mean PSA levels in men with a negative biopsy (n = 2,363, 7.2 ng ml⁻¹) were higher than in men without prostate cancer who did not have a biopsy (n = 24,811, 1.5 ng ml⁻¹) (Supplementary Table 16). Relative to all controls, where standardized $\overline{\mathrm{PGS}}_{\mathrm{PSA}}$ = 0, biopsied men were enriched for PSA-increasing alleles (cases: $\overline{\mathrm{PGS}}_{\mathrm{PSA}}$ = 0.278; controls: $\overline{\mathrm{PGS}}_{\mathrm{PSA}}$ = 0.934). After genetic adjustment, 31.7% of biopsy-negative men were reclassified below the PSA level for recommending biopsy, and 2.5% became biopsy eligible, resulting in a net reclassification of 29.3% (27.5% to 31.21%) (Fig. 5a). Among 3,673 cases, PSA^G values below the biopsy referral threshold were more prevalent than upward adjustment, resulting in a net reclassification of −8.6% (−9.48% to −7.67%) (Fig. 5a). Of the patients who became ineligible, most had Gleason <7 tumors (n = 300, 72%; Supplementary Table 16). In men of African ancestry, there were few changes in biopsy eligibility among patients (n = 392), with 3.1% reclassified upward and 4.6% downward (Fig. 5b and Supplementary Table 16). Of 108 biopsy-negative controls, 75 (69.4%) were reclassified below the referral threshold based on PSA^G, reflecting high enrichment for predisposition to PSA elevation ($\overline{\mathrm{PGS}}_{\mathrm{PSA}}$ = 1.710). The overall net reclassification was positive, suggesting that PSA^G has some clinical utility in both populations.
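The reclassification tally can be sketched as below, using the age-specific thresholds quoted above. Treating a value at or above the cutoff as biopsy eligible is an assumption, as are the function names and the example records. Net reclassification is computed with the usual sign convention: downward minus upward in controls, and upward minus downward in cases.

```python
# Age-specific biopsy referral thresholds (ng/ml) quoted in the text.
AGE_THRESHOLDS = [(40, 49, 2.5), (50, 59, 3.5), (60, 69, 4.5), (70, 79, 6.5)]

def biopsy_threshold(age):
    """Return the PSA cutoff for a given age in years."""
    for low, high, cutoff in AGE_THRESHOLDS:
        if low <= age <= high:
            return cutoff
    raise ValueError("age outside the 40-79 range covered by the thresholds")

def reclassification(records):
    """Fractions moved up (newly eligible) and down (no longer eligible).

    records: list of (age, measured_psa, adjusted_psa) tuples.
    """
    up = down = 0
    for age, psa, psa_adjusted in records:
        cutoff = biopsy_threshold(age)
        before = psa >= cutoff           # assumption: eligible at or above cutoff
        after = psa_adjusted >= cutoff
        if after and not before:
            up += 1
        elif before and not after:
            down += 1
    return up / len(records), down / len(records)

# Hypothetical biopsy-negative controls: (age, PSA, genetically adjusted PSA).
controls = [(55, 4.0, 3.0), (55, 4.0, 4.2), (65, 4.0, 4.6), (72, 7.0, 6.0)]
up_frac, down_frac = reclassification(controls)
net_controls = down_frac - up_frac  # positive means fewer negative biopsies
```

For cases, the same function applies and the net value is `up_frac - down_frac`, so a negative number indicates cancers that would no longer be referred for biopsy.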

**Fig. 5: Genetically adjusted PSA influences biopsy eligibility.**

PSA genetic adjustment improves prostate cancer detection

The utility of PSA^G, alone and in combination with PGS₂₆₉, was first assessed in the PCPT, where end-of-study biopsies were performed in all participants, effectively eliminating potential misclassification of prostate cancer status. Among 335 cases and 5,548 controls, PGS_PSA was not associated with prostate cancer incidence (pooled: OR per s.d. = 1.01, P = 0.83), confirming that it captures genetic determinants of non-cancer PSA variation. The magnitude of association for genetically adjusted baseline PSA^G with prostate cancer (OR per unit increase in log(PSA ng ml⁻¹) = 1.90, 95% CI: 1.56–2.31) was slightly larger than for PSA (OR = 1.88, 95% CI: 1.55–2.29) in the European ancestry group (Supplementary Table 17). The magnitude of association with prostate cancer was larger for PGS₂₆₉^adj (pooled and European: OR per s.d. = 1.57, 95% CI: 1.40–1.76) than for PGS₂₆₉ without bias correction (pooled: OR = 1.52, 95% CI: 1.36–1.70; European: OR = 1.53, 95% CI: 1.36–1.72) (Supplementary Table 17). The model with PGS₂₆₉^adj and PSA^G achieved the best classification in the pooled (AUC = 0.686) and European ancestry (AUC = 0.688) populations and outperformed PGS₂₆₉^adj alone (pooled: AUC = 0.656, P_AUC = 7.5 × 10⁻⁴; European: AUC = 0.658, P_AUC = 1.4 × 10⁻³).
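As a reminder of what the AUC values measure, the sketch below computes the rank-based AUC: the probability that a randomly chosen case has a higher model score than a randomly chosen control, counting ties as one half. The scores are hypothetical, and the statistical test behind P_AUC for comparing two models is not reproduced here.

```python
def auc(case_scores, control_scores):
    """Rank-based AUC: fraction of case-control pairs ordered correctly."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5  # ties count as half a correctly ordered pair
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical predicted risks from one model for three cases and four controls.
cases = [0.30, 0.22, 0.15]
controls = [0.10, 0.18, 0.05, 0.25]
model_auc = auc(cases, controls)
```

This pairwise form scales with the product of the group sizes, so it suits illustration rather than cohorts of tens of thousands, where a sort-based calculation would be preferable.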

The benefit of genetically adjusting PSA was most evident for detection of aggressive prostate cancer, defined as Gleason ≥7, PSA ≥ 10 ng ml⁻¹, T3–T4 stage and/or distant or nodal metastases. In the PCPT, PSA^G conferred an approximately threefold risk increase (pooled: OR = 2.87, 95% CI: 1.98–4.65, AUC = 0.706; European: OR = 2.99, 95% CI: 1.95–4.59, AUC = 0.711) compared to PGS₂₆₉^adj (pooled: OR = 1.55, 95% CI: 1.23–1.95, AUC = 0.651; European: OR = 1.55, 95% CI: 1.22–1.96, AUC = 0.657) (Fig. 6a and Supplementary Table 18). The model with PSA^G and PGS₂₆₉^adj achieved AUC = 0.726 (European: AUC = 0.734) for aggressive tumors but had lower discrimination for non-aggressive disease (pooled and European: AUC = 0.681) (Supplementary Table 19). Among patients with prostate cancer, PSA^G (pooled: OR = 2.06, 95% CI: 1.23–3.45) and baseline PSA (pooled: OR = 1.81, 95% CI: 1.12–3.10) were associated with higher likelihood of aggressive compared to non-aggressive tumors, whereas PGS₂₆₉ (pooled: OR = 0.91, P = 0.54) and PGS₂₆₉^adj (OR = 0.97, P = 0.85) were not (Supplementary Table 20).

**Fig. 6: Genetic associations with aggressive prostate cancer.**

In the SELECT, associations with risk of prostate cancer overall (Supplementary Table 21), aggressive disease (Fig. 6b and Supplementary Table 22) and non-aggressive disease (Supplementary Table 23) in the pooled and European ancestry analyses were similar to the PCPT. In men of East Asian ancestry, associations for PSA^G (OR = 2.15, 95% CI: 0.82–5.62) were attenuated compared to PSA (OR = 2.60, 95% CI: 1.03–6.54). This was also observed in men of African ancestry, although the effect size for PSA^G derived using PGS₁₂₈ (OR = 3.37, 95% CI: 2.38–4.78) was larger than for PSA^G based on PGS_CSx (OR = 2.68, 95% CI: 1.94–3.69), consistent with the larger proportion of variation in PSA explained by PGS₁₂₈ than PGS_CSx in this population. Models for prostate cancer including PSA^G were calibrated in the pooled and European ancestry individuals, whereas, in the African ancestry subgroup, PSA^G inaccurately estimated risk in upper deciles (Supplementary Figs. 3–6).

The largest improvement in discrimination from PSA^G (OR = 3.81, 95% CI: 2.62–5.54, AUC = 0.777) relative to PSA (OR = 3.40, 95% CI: 2.34–4.93, AUC = 0.742, P_AUC = 0.026) and to PGS₂₆₉ (OR = 1.76, 95% CI: 1.41–2.21, AUC = 0.726, P_AUC = 0.057) was for aggressive tumors in men of European ancestry (106 cases, 23,667 controls). In the pooled African ancestry population (18 cases, 2,733 controls), PSA^G based on PGS₁₂₈ (OR = 2.96, 95% CI: 1.43–6.12), but not PGS_CSx (OR = 2.48, 95% CI: 1.24–4.97), was more predictive than unadjusted PSA (OR = 2.82, 95% CI: 1.33–5.99) (Supplementary Table 22). The best model for aggressive disease included PSA^G and PGS₂₆₉^adj for pooled (AUC = 0.788, 95% CI: 0.744–0.831) and European ancestry (AUC = 0.804, 95% CI: 0.757–0.851) populations, but, for African ancestry individuals, unadjusted PSA and PGS₂₆₉ without bias correction achieved the highest AUC of 0.828 (95% CI: 0.739–0.916). PSA^G was better calibrated than PSA in pooled and European ancestry groups but not in African ancestry participants (Supplementary Figs. 7 and 8).

Discussion

Serum PSA is the most widely used biomarker for prostate cancer detection, although concerns with specificity and, to a lesser degree, sensitivity have limited adoption of PSA testing for population-level screening. Leveraging PGS to personalize diagnostic biomarkers, such as PSA, provides a new avenue for translating GWAS discoveries into clinical practice. This concept, termed ‘de-Mendelization’, is essentially Mendelian randomization in reverse—subtracting the genetically predicted component of trait variance instead of using it to estimate causal effects. De-Mendelization of non-causal predictive biomarkers can maximize disease-related signal and improve disease detection^38,39. Although previous work on PSA genetics¹⁹ and other biomarkers^38,40 has alluded to the potential of genetic adjustment to produce clinically meaningful shifts in the PSA distribution, the value of this approach for reducing overdiagnosis and detecting aggressive disease has not been previously shown.

Risk-stratified, personalized screening for prostate cancer will require parallel efforts to elucidate the genetic architecture of prostate cancer susceptibility and PSA variation in individuals without disease. Our GWAS advances these efforts by discovering 82 novel PSA-associated variants. The strongest novel signals map to genes involved in reproductive processes, potentially reflecting non-cancer function of PSA in liquefying seminal fluid. TEX11 on Xq13.1, for example, is preferentially expressed in male germ cells and early spermatocytes. TEX11 mutations cause meiotic arrest and azoospermia, and this gene regulates homologous chromosome synapsis and double-strand DNA break repair⁴¹. ODF3 encodes a component of sperm flagella fibers and has been linked to regulation of platelet count and volume⁴². Other novel loci contained genes involved in embryonic development, epigenetic regulation and chromatin organization, including DNMT3A, OTX1, CHD3, JARID2, HMGA1, HMGA2 and SUDS3. DNMT3A is a methyltransferase that regulates imprinting and X-chromosome inactivation and has been studied extensively in the context of height⁴³, clonal hematopoiesis and hematologic cancers⁴⁴. CHD3 is involved in chromatin remodeling during development and suppresses herpes simplex virus infection⁴⁵. Multiple PSA-associated variants were in genes related to infection and immunity, including HLA-A; ST6GAL1, involved in IgG N-glycosylation⁴⁶; KLRG1, which regulates natural killer (NK) cell function and IFN-γ production⁴⁷; and FUT2, which affects ABO precursor H antigen presentation and confers susceptibility to viral and bacterial infections⁴⁸.

Although our GWAS was restricted to men without prostate cancer, several cancer susceptibility genes were among the PSA-associated loci, including a pan-cancer risk variant in TP53 (rs78378222) (ref. ⁴⁹) and signals in TP63, GPC3 and THADA. Although we cannot rule out undiagnosed prostate cancer in our participants, its prevalence is unlikely to be high enough to produce appreciable bias. Pervasive pleiotropy and omnigenic architecture⁵⁰ may explain the diverse functions of PSA loci implicated in inflammation, epigenetic regulation and growth factor signaling. Even established tumor suppressor genes, such as TP53, GPC3 and THADA, have pleiotropic effects on obesity via dysregulation of cell growth and metabolism^51,52,53. Furthermore, distinct p63 isoforms regulate epithelial and craniofacial development as well as apoptosis of male germ cells and spermatogenesis^54,55. Mutations in GPC3 cause Simpson–Golabi–Behmel syndrome, which is characterized by visceral and skeletal abnormalities and excess risk of embryonic tumors⁵⁶.

Distinguishing variants that influence prostate cancer detection via PSA screening from genetic signals for prostate carcinogenesis has implications for deciphering biological mechanisms and developing risk prediction models. Prostate cancer detection depends on PSA testing, whereas PSA screening is influenced by genetic factors affecting constitutive PSA levels. The bias arising from this complex relationship may be substantial. Our findings suggest that bias-corrected effect sizes more accurately capture the contribution of GWAS-identified variants to prostate cancer risk, without conflating it with detection. Correction for PSA-related bias and subsequent improvement in PGS₂₆₉ performance for detecting aggressive disease is an extension of de-Mendelization. Adjusting risk allele weights may be a more effective strategy than filtering out variants based on associations with PSA. Generally, the improvements in PSA^G and PGS₂₆₉ are proportional to the extent of their de-noising of signals for PSA elevation unrelated to prostate cancer. The impact of bias correction was most pronounced in populations selected for high PSA, such as men who underwent prostate biopsy in GERA, but it was also observed in the PCPT and the SELECT, which enrolled men with low PSA.

Our investigation of index event bias has several limitations. The Dudbridge method assumes that direct genetic effects on PSA levels and prostate cancer susceptibility are uncorrelated, and violations of this assumption over-attribute shared genetic signals to selection bias³³. Although SlopeHunter relaxes this assumption³⁵, analyses of PGS₂₆₉ suggest that it under-corrects selection bias. SlopeHunter relies on clustering to distinguish PSA-specific from pleiotropic variants³⁵, with small or poorly separated clusters resulting in unstable bias estimates. Disentangling genetic associations between PSA and prostate cancer with greater certainty will require experiments such as CRISPR screens and massively parallel reporter assays.

Another limitation is that the reported magnitude of biopsy reclassification may be specific to GERA and Kaiser Permanente clinical guidelines and biased because GERA controls comprised 30% of the PSA discovery GWAS. Because it was unlikely for men with low PSA to be biopsied, and most patients with prostate cancer already had PSA values at or above the biopsy referral cutoff, there were limited opportunities to increase biopsy eligibility in this population. Despite these limitations, our findings indicate that genetically adjusted PSA may reduce overdiagnosis and overtreatment, albeit accompanied by some undesirable loss of sensitivity. Although reclassifying cases to not receive biopsy is concerning, most such reclassifications occurred among patients with non-aggressive disease, a group susceptible to overdiagnosis⁵⁷.

Our PGS-based approach updates the first application of PSA genetic correction by Gudmundsson et al.¹⁹ while retaining straightforward calculation of the genetic correction factor. Increasing the specificity of an established, clinically useful biomarker is efficient and would have low adoption barriers. However, analytic choices, such as selecting an optimal PGS algorithm and reference population for obtaining mean PGS_PSA, are not trivial. The choice of reference population affects the magnitude of correction and clinical decisions based on absolute PSA values. Furthermore, any new biomarker would require validation in real-world settings to identify populations who would benefit most and characterize barriers to implementation, such as physician familiarity with PGS and patient education about genetic testing. Genetically adjusted PSA should also be evaluated in conjunction with other procedures used for prostate cancer detection, such as targeted magnetic resonance imaging, and explored as a criterion for refining selection of participants into screening trials.

Our study highlights the importance and challenge of developing a PGS that adequately performs across the spectrum of ancestry. Compared to PGS₁₂₈, PGS_CSx did not improve performance in men of African ancestry. This may reflect the ‘meta’ estimation procedure, which does not require a separate dataset for hyperparameter tuning but is less accurate³⁶. GWAS efforts in larger and more diverse cohorts are underway and will expand the catalog of PSA-associated variants and increase their utility. Genetic adjustment using a PGS_PSA that does not explain a sufficiently high proportion of trait variation risks decreasing the accuracy of PSA screening.

Future research should assess whether genetically adjusted PSA levels improve prediction of prostate cancer mortality and investigate PSA-related biomarkers, such as the ratio of free to total PSA and pro-PSA (a precursor PSA isoform), which may have higher specificity for prostate cancer detection^58,59. Although PGS_PSA was associated with PSA doubling time and velocity, these metrics assess change between two timepoints and may not capture PSA trajectories that are meaningful for disease detection⁶⁰. Clinical guidelines for PSA kinetics are also lacking in the context of prostate cancer screening. Regardless, we think that genetic adjustment may improve the accuracy of any heritable PSA biomarker and may be a valuable addition to multi-omic biomarkers.

In summary, by detecting genetic variants associated with non-prostate cancer PSA variation, we developed a PGS_PSA that captures the contribution of common genetic variants to a man’s inherent PSA level. We showed that a straightforward calculation of genetically adjusted, personalized PSA levels using PGS_PSA provides clinically meaningful improvements in prostate cancer diagnostic characteristics. Moreover, genetic determinants of PSA provide an avenue for mitigating selection bias due to PSA screening in prostate cancer GWASs and improving disease prediction. These results illustrate a proof of concept for incorporating genetic factors into PSA screening for prostate cancer and expanding this approach to other diagnostic biomarkers.

Methods

Informed consent was obtained from all study participants. The UKB received ethics approval from the Research Ethics Committee (reference: 11/NW/0382) in accordance with the UKB Ethics and Governance Framework. The research was conducted with approved access to UKB data under application number 14105. We used previously published PSA GWAS results from the GERA cohort by Hoffmann et al.¹⁷. The original study was approved by the Kaiser Permanente Northern California institutional review board and the University of California, San Francisco Human Research Protection Program Committee on Human Research. The Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial was approved by the institutional review board at each participating center and the National Cancer Institute. The informed consent document signed by PLCO study participants allows use of these data by investigators for discovery and hypothesis generation in the investigation of the genetic contributions to cancer and other adult diseases. Our study includes publicly posted genomic summary results from the PLCO Atlas⁶¹. No institutional review board review is required for PLCO summary data use. The Vanderbilt University Medical Center institutional review board approved the BioVU study. The Malmö Diet and Cancer Study (MDCS) was approved by the local ethics committee.

Study populations and phenotyping

Genome-wide association analyses of PSA levels were conducted using germline genetic data derived from DNA extracted from non-prostatic tissues (for example, blood and buccal swabs). Analyses were restricted to cis-gender men, defined as individuals of biological male sex and self-reported male gender identity who had never been diagnosed with prostate cancer. Men with a history of surgical resections of the prostate were excluded in studies for which this information was available. To reduce potential for reverse causation, analyses were limited to PSA values ≤10 ng ml⁻¹, which corresponds to low-risk prostate cancer based on the D’Amico prostate cancer risk classification system⁶², and PSA > 0.01 ng ml⁻¹, to ensure that individuals had a functional prostate not impacted by surgery or radiation.

The UKB is a population-based prospective cohort of over 500,000 individuals aged 40–69 years at enrollment in 2006–2010 with genetic and phenotypic data⁶³. Health-related outcomes were ascertained via individual record linkage to national cancer and mortality registries and hospital inpatient encounters. PSA values were abstracted from primary care records for a subset of participants with genetic data. Field code mappings used to identify PSA values included any serum PSA measure except for free PSA or ratio of free to total PSA (Supplementary Table 25).

The Kaiser Permanente GERA cohort used in this analysis was previously described in Hoffmann et al.¹⁷. In brief, prostate cancer status was ascertained from the Kaiser Permanente Northern California Cancer Registry, the Kaiser Permanente Southern California Cancer Registry or through review of clinical electronic health records. PSA levels were abstracted from Kaiser Permanente electronic health records from 1981 through 2015.

The PLCO Cancer Screening Trial is a completed randomized trial that enrolled approximately 155,000 participants between November 1993 and July 2001. The PLCO Cancer Screening Trial was designed to determine the effects of screening on cancer-related mortality and secondary endpoints in men and women aged 55–74 years⁶⁴. Men randomized to the screening arm of the trial underwent annual screening with PSA for 6 years and digital rectal exam (DRE) for 4 years⁶⁴. These analyses were limited to men with a baseline PSA measurement who were randomized to the screening arm of the trial (n = 29,524). Men taking finasteride at the time of PSA measurement were excluded from analysis.

The Vanderbilt University Medical Center BioVU resource is a synthetic derivative biobank linked to de-identified electronic health records⁶⁵. Analyses were based on PSA levels that were measured as part of routine clinical care.

The MDCS is a population-based prospective cohort study that recruited men and women aged between 44 years and 74 years of age who were living in Malmö, Sweden between 1991 and 1996 to investigate the impact of diet on cancer risk and mortality⁶⁶. These analyses included men from the MDCS who were not diagnosed with prostate cancer as of December 2014 and had available genotyping and baseline PSA measurements⁶⁶.

The PCPT is a completed phase 3 randomized, double-blind, placebo-controlled trial of finasteride for prostate cancer prevention that began in 1993 (ref. ⁸). The PCPT randomly assigned 18,880 men aged 55 years or older who had a normal DRE and PSA level ≤3 ng ml⁻¹ to either finasteride or placebo. For men with multiple pre-randomization PSA values, the earliest value was selected. Cases included all histologically confirmed prostate cancers detected during the 7-year treatment period and tumors that were detected by the end-of-study prostate biopsy. Our analyses included the subset of PCPT participants that was genotyped on the Illumina Infinium Global Screening Array 24 v2.0.

The SELECT is a completed phase 3 randomized, placebo-controlled trial of selenium (200 µg per day from l-selenomethionine) and/or vitamin E (400 IU per day of all rac-α-tocopheryl acetate) supplementation for prostate cancer prevention³⁷. Between 2001 and 2004, 34,888 eligible participants were randomized. The minimum enrollment age was 50 years for African American men and 55 years for all other men³⁷. Additional eligibility requirements included no prior prostate cancer diagnosis, ≤4 ng ml⁻¹ of PSA in serum and a DRE not suspicious for cancer. For men who had multiple pre-randomization PSA values, the earliest value was selected. Our analyses included a subset of SELECT participants genotyped on the Illumina Infinium Global Screening Array 24 v2.0.

Quality control and genome-wide association analyses

Standard genotyping and quality control (QC) procedures were implemented in each participating study. Before meta-analysis, we applied variant-level QC filters that excluded variants with low imputation quality (INFO < 0.30), MAF < 0.005 or deviations from Hardy–Weinberg equilibrium (P_HWE < 1 × 10⁻⁵). Sample-level filtering was performed to remove samples with discordant genetic sex and self-reported gender or call rate < 0.97. One sample from each pair of first-degree relatives was also excluded. GWAS phenotypes and adjustment covariates are reported in Supplementary Table 26. Genome-wide association analyses used linear regression with log(PSA) as the outcome and age and genetic ancestry principal components (PCs) as the minimum set of covariates.
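As an illustration only, the variant-level filters described above can be sketched in a few lines of Python. The `Variant` record, its field names and the example variants are hypothetical and are not taken from any study pipeline; only the three thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    variant_id: str  # hypothetical identifier
    info: float      # imputation quality (INFO)
    maf: float       # minor allele frequency
    p_hwe: float     # Hardy-Weinberg equilibrium test P value


# Thresholds stated in the text: variants below any of these are removed.
INFO_MIN = 0.30
MAF_MIN = 0.005
P_HWE_MIN = 1e-5


def passes_variant_qc(v: Variant) -> bool:
    """Return True if the variant survives all three variant-level filters."""
    return v.info >= INFO_MIN and v.maf >= MAF_MIN and v.p_hwe >= P_HWE_MIN


# Made-up variants, each failing at most one filter.
variants = [
    Variant("var_ok", info=0.95, maf=0.12, p_hwe=0.40),
    Variant("var_low_info", info=0.25, maf=0.12, p_hwe=0.40),
    Variant("var_rare", info=0.95, maf=0.001, p_hwe=0.40),
    Variant("var_hwe", info=0.95, maf=0.12, p_hwe=1e-8),
]
kept = [v.variant_id for v in variants if passes_variant_qc(v)]
print(kept)
```

In this sketch a variant sitting exactly on a threshold is retained, because the text defines exclusion with strict inequalities.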

UKB

Genotyping and imputation for the UKB cohort were previously described⁶³. In brief, participants were genotyped on the UKB Affymetrix Axiom array (89%) or the UK BiLEVE array (11%) with imputation performed using the Haplotype Reference Consortium (HRC) and the merged UK10K and 1000 Genomes phase 3 reference panels. Genetic ancestry PCs were computed using fastPCA based on a set of 407,219 unrelated samples and 147,604 genetic markers⁶³. Association analyses in the UKB were restricted to individuals of European ancestry based on self-report (‘White’) and after excluding samples with either of the first two genetic ancestry PCs outside of 5 s.d. of the population mean, as previously described⁴⁹. We removed samples with discordant self-reported and genetic sex as well as one sample from each pair of first-degree relatives identified using KING⁶⁷. Using a subset of genotyped autosomal variants with MAF ≥ 0.01 and call rate ≥ 97%, we excluded samples with heterozygosity >5 s.d. from the mean. For participants with multiple PSA measurements, the median PSA value was used. Sensitivity analyses were conducted comparing this approach to a GWAS of individual-specific random effects derived from fitting a linear mixed model to repeated log(PSA) values.

GERA

Genotyping, imputation and QC of the GERA cohort were previously described^17,68,69. In brief, all men were genotyped for over 650,000 single-nucleotide polymorphisms (SNPs) on four race/ethnicity-specific Affymetrix Axiom arrays that were optimized for individuals who self-identified as non-Hispanic white, Latino, East Asian and African American, respectively^68,69. Genotype QC procedures and imputation for the original GERA cohort were performed on an array-wise basis, as previously described^17,70. Pre-phasing was done by SHAPEIT version 2.5 (ref. ⁷¹) and imputation with IMPUTE2 version 2.3.1 (ref. ⁷²) using the 1000 Genomes phase 3 release with 2,504 samples. The top ten genetic ancestry PCs from EIGENSTRAT version 4.2 were included in the linear model as ancestry covariates⁷³. Analyses were conducted according to self-identified race/ethnicity groups. Residuals were computed from linear mixed models that were fit to repeated log(PSA) measures. This approach was nearly identical to a long-term average, except that it used the median instead of the mean to handle any potential outlier PSA level values.

PLCO Atlas

Our study used GWAS summary statistics from the PLCO Atlas Project, a resource for multi-trait GWAS. Genotyping, QC and imputation procedures for this resource are described by Machiela et al.⁶¹. The Atlas Project combined genotyping data previously generated by high-density arrays for 25,831 participants (OncoArray, Omni2.5M and OmniExpress) with a new round of genotyping using the Illumina Global Screening Array (GSA). For participants genotyped on multiple genotyping arrays (n = 1,192), data from only one array were retained, with the following prioritization: GSA > OncoArray > Omni2.5M > OmniExpress. Extensive QC filtering was performed for subsequent imputation and association analyses. Iterative 80% and 95% sample-level and variant-level call rate filters were applied to remove poorly genotyped samples and variants. Samples with > 20% estimated contamination based on VerifyIDintensity⁷⁴ were also removed. Samples with discordant self-reported gender and genetically inferred sex were identified based on X-chromosome method-of-moments F coefficient from PLINK, using 0.5 as the threshold (F coefficients are close to 1.0 for males and 0.0 for females). Heterozygosity outliers were detected using absolute values from PLINK method-of-moments F coefficients > 0.2.
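The array prioritization rule for participants genotyped more than once can be written as a small lookup. The sketch below is a hypothetical illustration in Python, not code from the Atlas Project; the function name `select_array` is an assumption, and only the priority order is taken from the text.

```python
# Priority order stated in the text: GSA > OncoArray > Omni2.5M > OmniExpress.
ARRAY_PRIORITY = ["GSA", "OncoArray", "Omni2.5M", "OmniExpress"]
RANK = {name: position for position, name in enumerate(ARRAY_PRIORITY)}


def select_array(arrays):
    """Return the highest-priority array among those available for a participant.

    Raises ValueError for an empty list and KeyError for an unknown array name.
    """
    if not arrays:
        raise ValueError("participant has no genotyping array")
    return min(arrays, key=lambda name: RANK[name])


# Hypothetical participant genotyped on two arrays.
print(select_array(["OmniExpress", "OncoArray"]))
```

Encoding the order as a single list keeps the rule in one place, so a change in prioritization would require editing only `ARRAY_PRIORITY`.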

Genetic ancestry was determined using GRAF⁷⁵ on a set of 10,000 pre-selected fingerprinting variants. Participants were assigned to nine ancestral groups: ‘African’, ‘African American’, ‘East Asian’, ‘European’, ‘Hispanic1’, ‘Hispanic2’, ‘Other’, ‘Other Asian’ and ‘South Asian’. Hispanic1 included individuals of Dominican or Puerto Rican ancestry, whereas Hispanic2 included individuals of Mexican or Latin American ancestry. For parsimony, we merged ‘African’ and ‘African American’ into an ‘African American (Combined)’ group and ‘East Asian’ and ‘Other Asian’ into an ‘East Asian (Combined)’ group. Imputation was performed using the TOPMed 5b reference panel, which is accessible via the TOPMed Imputation Server hosted on the Michigan Imputation Server. Before imputation, variants with MAF ≤ 0.01, missingness ≥ 0.05 and Hardy–Weinberg deviations (P_HWE ≤1 × 10⁻⁶) were removed. Genotyped data were aligned to reference datasets using a community-recommended script (HRC-1000G-check-bim.pl from https://www.well.ox.ac.uk/~wrayner/tools/) that was modified to support the TOPMed 5b reference panel using a pre-existing test imputation with 1000 Genomes subjects. Pre-phasing using phased reference data from TOPMed release 5b was conducted using Eagle 2.4 (ref. ⁷⁶). Imputation was conducted against the same reference panel using minimac4. GWAS was based on the first PSA value for each PLCO participant.

BioVU

Participants were identified using Vanderbilt University Medical Center’s BioVU resource, a DNA biobank comprising ~270,000 individuals and linked to a de-identified electronic health record⁶⁵. All participants (n = 8,074) were genotyped on Illumina’s Expanded Multi-Ethnic Genotyping Array (MEGA^EX) platform. Genetic ancestries were assigned by running principal component analysis using SNPRelate⁷⁷ on a set of pruned SNPs (Rsq < 0.5, MAF ≥ 0.1). Participants were classified as European ancestry if their first two PCs were within 4 s.d. of the median for the participants reporting ‘White’ as their race. Participants were classified as African ancestry if their first two PCs were within 4 s.d. of the median for participants reporting their race as ‘Black’. All QC procedures were performed using PLINK version 1.90. We removed one randomly selected sample out of each pair of related individuals (pi-hat ≥ 0.2) identified using identity-by-descent. We excluded participants with SNP missingness > 3% or heterozygosity >5 s.d. from the mean. Before imputation, data were pre-processed using the HRC-1000G-check-bim.pl (from http://www.well.ox.ac.uk/~wrayner/tools/) and pre-phased using Eagle version 2.4 (ref. ⁷⁶). Genetic data were imputed on the Michigan Imputation Server using 1000 Genomes phase 3 version 5 as the reference panel. For men with multiple PSA measurements, the median PSA was used.

MDCS

Data from multiple batches of genotyping of 4,069 MDCS participants using different Illumina Omni arrays were merged. For variants that appeared more than once under different names on the same Illumina array, those with the higher genotyping rate were retained. Indels, ambiguous palindromic (for example, A/T or C/G alleles) and multi-allelic variants were removed. Only SNPs that we could unambiguously map to the 1000 Genomes phase 1 dataset were kept. Individuals with > 10% missingness were removed. Next, SNPs with a missingness rate > 10% or deviation from Hardy–Weinberg equilibrium (P_HWE < 0.001) were removed. At this stage, the PCs of ancestry were computed. Individuals for whom the inferred sex based on X-chromosome heterozygosity was not male, or for whom there were more than two genetic mismatches with 40 SNPs that we had previously genotyped in these samples with targeted genotyping⁶⁶, were excluded.

To assess genetic ancestry, MDCS data were combined with data from HapMap phase 3 for variants present in all genotyping batches. These SNPs were further filtered to have < 0.01% missingness and LD pruned (--indep-pairwise 50 5 0.05). SMARTPCA in EIGENSOFT (https://github.com/chrchang/eigensoft) was run on the resulting 18,299 SNPs to generate the top ten genetic ancestry PCs. Analyses were restricted to individuals of European ancestry based on clustering with HapMap reference populations and exclusion of outliers with a z-score on PC1 and PC2 > 5. Imputation was performed using the TOPMed 5b reference panel, which is accessible via the TOPMed Imputation Server hosted on the Michigan Imputation Server. Before imputation, the input file was aligned to the build37 reference genome on the basis of chromosome, position and alleles. A total of 847,133 SNPs that passed pre-imputation QC were uploaded to the imputation server. From the resulting imputed files, analyses were restricted to individuals without a prostate cancer diagnosis by 31 December 2014, with individual missingness < 3% and a z-score < 5.0 for heterozygosity. Log(PSA) values were analyzed using robust linear regression with Tukey biweights. GWAS was performed using linear regression on the residuals extracted from the fitted models.

PCPT and SELECT

Participants from PCPT and SELECT were genotyped on the Illumina Infinium Global Screening Array 24 v2.0 and underwent the same QC and imputation procedures. Genotype calling and QC were performed at the Center for Inherited Disease Research at Johns Hopkins. After removal of samples that failed to produce valid output during initial processing and clustering, the completion rate was 0.9951 and 0.9959 in PCPT and SELECT, respectively. A two-stage filter by completion rate threshold of 0.8 for samples and 0.8 for variants, followed by 0.95 for samples and 0.95 for variants, was performed. Samples with discordant self-reported gender and genetically inferred sex were identified based on X-chromosome method-of-moments F coefficient from PLINK, using 0.5 as the threshold (F coefficients are close to 1.0 for males and 0.0 for females). Identity-by-descent for all subject pairs was determined using PLINK, with close (first and second degree) relatives identified based on a threshold of 0.20. One randomly selected sample from each pair of relatives was retained.

Ancestry was estimated using a set of LD-pruned markers and running SNPWEIGHTS⁷⁸ with the reference panel provided containing the following populations: European, West African and East Asian, with a threshold of 0.8 used for imputed ancestry designation. Participants were assigned to a single ancestry group if the ancestry score was ≥0.80 for just one group. Participants were assigned to an admixed cluster if their ancestry score was > 0.20 and <0.80 for only one group (for example, ADMIXED_AFR where AFR = 0.75, EUR = 0.17, EAS = 0.08). Intermediate ancestry clusters included individuals with ancestry scores matching those criteria in multiple groups: 0.20 < AFR_EUR < 0.80 (for example, AFR = 0.65, EUR = 0.33) and 0.20 < EAS_EUR < 0.80 (for example, EUR = 0.55, EAS = 0.43). Autosomal heterozygosity was assessed using the method-of-moments F coefficient calculated within each ancestry cluster. Heterozygosity outliers were identified and excluded using a threshold of 0.10. Principal component analysis was performed with SMARTPCA in EIGENSOFT (https://github.com/chrchang/eigensoft) on a set of LD-pruned markers after splitting by ancestry cluster, to resolve more detailed population substructure. Genetic ancestry PCs were not computed for small clusters (n < 50) or individuals who failed other QC filters. For validation of PGS_PSA in PCPT and SELECT, we combined ADMIXED_AFR and AFR_EUR and treated this as a single group with admixed AFR and EUR ancestry proportions (AFR/EUR). ADMIXED_EAS and EAS_EUR were also combined into a single cluster with admixed EAS and EUR ancestry (EAS/EUR).
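One way to read the cluster assignment rules above is sketched below in Python. This is an interpretation under stated assumptions, not the authors' code: the function `assign_cluster`, the `UNASSIGNED` fallback for score patterns the text does not cover, and the example scores are all hypothetical. Only the 0.20 and 0.80 cutoffs, the cluster labels and the merging used for validation come from the text.

```python
LOW, HIGH = 0.20, 0.80  # cutoffs stated in the text


def assign_cluster(scores):
    """Map ancestry scores for EUR, AFR and EAS to a cluster label."""
    single = [group for group, s in scores.items() if s >= HIGH]
    if len(single) == 1:
        return single[0]  # predominantly one ancestry
    intermediate = sorted(g for g, s in scores.items() if LOW < s < HIGH)
    if len(intermediate) == 1:
        return "ADMIXED_" + intermediate[0]
    if intermediate == ["AFR", "EUR"]:
        return "AFR_EUR"
    if intermediate == ["EAS", "EUR"]:
        return "EAS_EUR"
    return "UNASSIGNED"  # assumption: pattern not described in the text


# Merging applied for PGS validation, as described in the text.
VALIDATION_GROUP = {
    "ADMIXED_AFR": "AFR/EUR",
    "AFR_EUR": "AFR/EUR",
    "ADMIXED_EAS": "EAS/EUR",
    "EAS_EUR": "EAS/EUR",
}

# Hypothetical participants.
examples = [
    {"EUR": 0.95, "AFR": 0.03, "EAS": 0.02},
    {"AFR": 0.75, "EUR": 0.17, "EAS": 0.08},
    {"AFR": 0.65, "EUR": 0.33, "EAS": 0.02},
    {"EUR": 0.55, "EAS": 0.43, "AFR": 0.02},
]
for scores in examples:
    cluster = assign_cluster(scores)
    print(cluster, "->", VALIDATION_GROUP.get(cluster, cluster))
```

Clusters that are not merged for validation, such as EUR, pass through `VALIDATION_GROUP.get` unchanged.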

To prepare genotype data for imputation with the TOPMed 5b reference panel, variants with MAF < 0.001, call rate < 98% or evidence of deviation from Hardy–Weinberg equilibrium (P_HWE < 10⁻⁶) were removed. After these QC steps, a total of 474,046 variants remained for PCPT, and 491,015 variants were retained for SELECT. Before submitting the data to the TOPMed Imputation Server, files were pre-processed using the check-bim.pl script (http://www.well.ox.ac.uk/~wrayner/tools/). Next, chromosomal positions were lifted over from GRCh37/hg19 to GRCh38 and aligned against the TOPMed reference SNP list based on chromosome, position and alleles to ensure that reference and alternate alleles were correct in the resulting VCF files.

Heritability of PSA levels attributed to common variants

Heritability of PSA levels was estimated using individual-level data and GWAS summary statistics. UKB participants with available PSA and genetic data were analyzed using LDAK version 5.1 (ref. ²⁴) and GCTA version 1.93 (ref. ²³), following the approach previously implemented in the GERA cohort¹⁷. Genetic relationship matrices were filtered to ensure that no pairwise relationships with kinship estimates >0.05 remained. Heritability was estimated using common (MAF ≥ 0.01) LD-pruned (r² < 0.80) variants with imputation INFO > 0.80. We implemented the LDAK-Thin model using the recommended genetic relatedness matrix (GRM) settings (INFO > 0.95, LD r² < 0.98 within 100 kb) and the same parameters as GCTA for comparison (LD r² < 0.80, INFO > 0.80). For both methods, sensitivity analyses were conducted using more stringent GRM settings (kinship = 0.025, genotyped variants).

Summary statistics from GWAS results based on the same set of UKB participants (n = 26,491) and from a European ancestry GWAS meta-analysis (n = 85,824) were analyzed using LDAK, LD score regression (LDSR)²⁵ and an extension of LDSR using a high-definition likelihood (HDL) approach²⁶. For LDSR, we used the default panel comprising variants available in HapMap3 with weights computed in 1000 Genomes version 3 EUR individuals and in-house LD scores computed in UKB European ancestry participants⁴⁹. The baseline linkage disequilibrium (BLD)-LDAK model was fit using pre-computed tagging files calculated in UKB GBR (white British) individuals for HapMap3 variants from the LDSR default panel. HDL analyses were conducted using the UKB-derived panel restricted to high-quality imputed HapMap3 variants²⁶. All GWAS summary statistics had sufficient overlap with the reference panels, not exceeding the 1% missingness threshold for HDL and the 5% missingness threshold for LDAK and LDSR.

Genome-wide meta-analysis

Each ancestral population was analyzed separately, and GWAS summary statistics were combined via meta-analysis (Fig. 1). We first used METAL⁷⁹ to conduct an inverse-variance-weighted fixed-effects meta-analysis in each ancestry group and then meta-analyzed the ancestry-stratified results. Multi-ancestry meta-analysis results were processed using clumping to identify independent association signals by grouping variants based on LD within specific windows. Clumps were formed around index variants with the lowest genome-wide significant (P < 5 × 10⁻⁸) meta-analysis P value. All other variants with LD r² > 0.01 within a ±10-Mb window were considered non-independent and assigned to that lead variant. Since over 90% of the meta-analysis consisted of individuals of European ancestry, clumping was performed using 1000 Genomes phase 3 EUR and UKB reference panels, which yielded concordant results. We confirmed that LD among the resulting lead variants did not exceed r² = 0.05 using a merged 1000 Genomes ALL reference panel.
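The inverse-variance-weighted fixed-effects calculation used in both stages can be illustrated with a minimal Python sketch. This is not METAL itself; the function name `ivw_meta` and the per-study effect sizes are hypothetical, and the sketch handles a single variant only.

```python
import math


def ivw_meta(betas, ses):
    """Inverse-variance-weighted fixed-effects estimate and its standard error."""
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)


# Hypothetical per-study effects on log(PSA) for one variant, grouped by ancestry.
studies = {
    "EUR": ([0.10, 0.12, 0.09], [0.02, 0.03, 0.025]),
    "AFR": ([0.07, 0.11], [0.05, 0.06]),
}

# Stage 1: combine studies within each ancestry group.
per_ancestry = {anc: ivw_meta(b, s) for anc, (b, s) in studies.items()}

# Stage 2: combine the ancestry-stratified results.
stage2_betas = [estimate[0] for estimate in per_ancestry.values()]
stage2_ses = [estimate[1] for estimate in per_ancestry.values()]
beta_multi, se_multi = ivw_meta(stage2_betas, stage2_ses)
print(f"multi-ancestry beta = {beta_multi:.4f}, SE = {se_multi:.4f}")
```

Under a pure fixed-effects model, the two-stage estimate is arithmetically identical to pooling all studies in one step; the stratified intermediate results are useful mainly for ancestry-specific reporting and heterogeneity checks.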

We first examined heterogeneity in the multi-ancestry fixed-effects meta-analysis results using Cochran’s Q statistic. To assess heterogeneity specifically due to ancestry, we applied MR-MEGA²⁷, a meta-regression approach for aggregating GWAS results across diverse populations. Summary statistics from each GWAS were meta-analyzed using MR-MEGA without combining by ancestry first. The MR-MEGA analysis was performed across four axes of genetic variation derived from pairwise allele frequency differences, based on the recommendation for separating major global ancestry groups. Index variants from the MR-MEGA analysis were selected using the same clumping parameters as described above (LD r² < 0.01 within a ±10-Mb window), based on the merged 1000 Genomes ALL reference panel. For each variant, we report two heterogeneity P values: one that is correlated with ancestry and accounted for in the meta-regression (P_Het-Anc) and the residual heterogeneity that is not due to population genetic differences (P_Het-Res).

PGS_PSA development and validation

We implemented two strategies for generating a genetic score for PSA levels. In the first approach, we selected 128 variants that were genome-wide significant (P < 5 × 10⁻⁸) in the multi-ancestry meta-analysis and were independent (LD r² < 0.01 within a ±10-Mb window) in 1000 Genomes EUR and (LD r² < 0.05) 1000 Genomes ALL populations (PGS₁₂₈). Each variant in PGS₁₂₈ was weighted by the meta-analysis effect size estimated using METAL. As an alternative strategy to clumping and thresholding, we fit a genome-wide score using the PRS-CSx algorithm³⁶, which takes GWAS summary statistics from each ancestry group as inputs and estimates posterior SNP effect sizes under coupled continuous shrinkage priors across populations (PGS_CSx). Analyses were conducted using pre-computed population-specific LD reference panels from the UKB, which included 1,287,078 HapMap3 variants that are available in both the UKB and 1000 Genomes phase 3.
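For a clumping-and-thresholding score such as PGS₁₂₈, the score for one individual is a weighted sum of effect-allele dosages. The Python sketch below illustrates that arithmetic only; the variant identifiers, weights and dosages are made up, and the function `polygenic_score` is a hypothetical helper rather than part of any published pipeline.

```python
def polygenic_score(dosages, weights):
    """Sum of weight x effect-allele dosage over all variants in the score.

    dosages: variant id -> effect-allele dosage (0 to 2) for one individual
    weights: variant id -> per-allele effect on log(PSA) from the meta-analysis
    """
    missing = sorted(set(weights) - set(dosages))
    if missing:
        raise KeyError(f"dosages missing for variants: {missing}")
    return sum(weights[v] * dosages[v] for v in weights)


# Hypothetical three-variant score (the real PGS uses 128 variants).
weights = {"var_a": 0.10, "var_b": -0.05, "var_c": 0.02}
dosages = {"var_a": 2.0, "var_b": 1.0, "var_c": 0.4}  # imputed dosages may be fractional
print(round(polygenic_score(dosages, weights), 3))
```

Raising an error on missing variants, rather than silently skipping them, is a design choice of this sketch; it avoids comparing scores built from different variant sets.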

We calculated a single trans-ancestry PGS that can be applied to all participants in the target cohort, rather than optimizing a PGS within each ancestry group. This approach is more robust to differences in genetic ancestry assignments across studies and does not require separate testing and validation datasets for parameter tuning in each ancestry group³⁶. To facilitate this type of analysis, PRS-CSx provides a –meta option that integrates population-specific posterior SNP effects using an inverse-variance-weighted meta-analysis in the Gibbs sampler³⁶. The global shrinkage parameter was set to φ = 0.0001. PRS-CSx was run on the intersection of variants that were in the LD reference panel and had high imputation quality (INFO > 0.90), resulting in 1,058,163 variants in PCPT and 1,071,268 variants in SELECT. Because PRS-CSx considers only autosomes, chrX variants that were included in PGS₁₂₈ were added to PGS_CSx separately, when output files from each chromosome produced by the PLINK --score command were concatenated.

The predictive performance of PGS_CSx and PGS₁₂₈ was evaluated in two independent cancer prevention trials that were not included in the meta-analysis: PCPT and SELECT. Analyses were conducted in the pooled sample for each cohort, which included individuals of all ancestries who passed QC filters (Supplementary Note). Ancestry-stratified analyses were conducted for clusters with n > 50 with available genetic ancestry PCs. Ancestry scores were computed with SNPWEIGHTS⁷⁸. Individuals with ancestry scores ≥0.80 for a single group were assigned to clusters for predominantly European (EUR), West African (AFR) and East Asian (EAS) ancestry. Admixed individuals with intermediate ancestry scores for at least one group were assigned to separate clusters: 0.20 < EUR/AFR < 0.80 or 0.20 < EUR/EAS < 0.80. Pooled analyses were adjusted for ten within-cluster PCs and global ancestry proportions (AFR and EAS).

Index event bias analysis

Index event bias occurs when individuals are selected based on the occurrence of an event or specific criterion. This is analogous to the direct dependence of one phenotype on another, as in the commonly used example of cancer survival³⁴. Due to unmeasured confounding, this dependence can induce correlations between previously independent risk factors among those selected^33,34. Genetic effects on prostate cancer can be viewed as conditional on PSA levels, because elevated PSA typically triggers diagnostic investigation. Genetic factors resulting in higher constitutive PSA levels may also increase the likelihood of prostate cancer detection due to more frequent testing (Fig. 4). This selection mechanism could bias prostate cancer GWAS associations by capturing both direct genetic effects on disease risk and selection-induced PSA signals. In the GWAS setting, methods using summary statistics have been developed to estimate and correct for this bias^33,35. Although typically derived assuming a binary selection trait, these methods are still applicable to selection or adjustment based on quantitative phenotypes³³. In this study, we conceptualized PSA variation as the selection trait and prostate cancer incidence as the outcome trait (Fig. 4).

We applied the method described in Dudbridge et al.³³, which tests for index event bias and estimates the corresponding correction factor (b) by regressing genetic effects on the selection trait (PSA) against their effects on the subsequent trait (prostate cancer), with inverse variance weights: w = 1/(SE_PrCa)². Summary statistics for prostate cancer were obtained from the most recent prostate cancer GWAS from the PRACTICAL consortium³². Sensitivity analyses were performed using SlopeHunter³⁵, an extension of the Dudbridge approach that allows for direct genetic effects on the index trait and subsequent trait to be correlated. For both methods, analyses were conducted using relevant summary statistics and 127,906 variants pruned at the recommended threshold³³ (LD r² < 0.10 in 250-kb windows) with MAF ≥ 0.05 in the 1000 Genomes EUR reference panel. After merging the pruned 1000 Genomes variants with each set of summary statistics, variants with large effects (|β| > 0.20) on either log(PSA) or prostate cancer were excluded. The resulting estimate (b), adjusted for regression dilution using the SIMEX algorithm, was used as a correction factor to recover unbiased genetic effects for each variant: $\beta_{PrCa}^\prime = \beta_{PrCa} - b \times \beta_{PSA}$, where $\beta_{PSA}$ is the per-allele effect on log(PSA), and $\beta_{PrCa}$ is the log(OR) for prostate cancer.
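The core arithmetic of this correction can be sketched in Python under explicit simplifying assumptions. The sketch takes the slope with prostate cancer effects as the response (the direction implied by the correction formula), fits an intercept, and omits the SIMEX adjustment for regression dilution entirely, so it is not a substitute for the published method or its software. All function names and numbers are hypothetical.

```python
def weighted_slope(x, y, w):
    """Weighted least-squares slope of y on x, with an intercept."""
    sw = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx


def correct_effects(beta_prca, beta_psa, b):
    """Apply beta'_PrCa = beta_PrCa - b * beta_PSA to each variant."""
    return [bp - b * bs for bp, bs in zip(beta_prca, beta_psa)]


# Hypothetical pruned variants: effects on log(PSA), and prostate cancer
# log(OR)s constructed here to depend linearly on the PSA effects.
beta_psa = [0.02, -0.01, 0.05, 0.00]
se_prca = [0.010, 0.020, 0.015, 0.010]
beta_prca = [0.5 * bs + 0.01 for bs in beta_psa]

weights = [1.0 / se ** 2 for se in se_prca]  # w = 1 / SE_PrCa^2
b = weighted_slope(beta_psa, beta_prca, weights)
corrected = correct_effects(beta_prca, beta_psa, b)
print(round(b, 3), [round(c, 3) for c in corrected])
```

Because the toy prostate cancer effects are an exact linear function of the PSA effects, the recovered slope equals the constructed value and the corrected effects collapse to the constant offset; real summary statistics are noisy, which is why a dilution adjustment such as SIMEX is needed in practice.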

The impact of the bias correction was assessed in three ways. First, genome-wide significant prostate cancer index variants were selected from the European ancestry PRACTICAL GWAS meta-analysis (85,554 cases and 91,972 controls) using clumping (LD r² < 0.01 within 10 Mb) (ref. ³²). We tabulated the number of variants that remained associated at P < 5 × 10⁻⁸ after bias correction. Second, we fit genetic scores for PSA and prostate cancer in men of European ancestry in the UKB who were not included in the PSA or prostate cancer GWAS (11,568 prostate cancer cases and 152,884 controls). We compared the correlation between the PGS for PSA (PGS_PSA), comprising 128 lead variants, and the 269-variant prostate cancer risk score fit with original risk allele weights (PGS₂₆₉) and with weights corrected for index event bias (PGS₂₆₉^adj). To allow adjustment for genetic ancestry PCs and genotyping array, associations between the two scores were estimated using linear regression models. Third, we examined associations for each genetic score (PGS₂₆₉, PGS₂₆₉^adj, PGS₂₆₉^adj-S) with prostate cancer in a subset of GERA participants who underwent a biopsy. Because GERA controls were included in the PSA GWAS meta-analysis, AUC estimates and corresponding bootstrapped 95% CIs were obtained using tenfold cross-validation. We also examined PGS associations with Gleason score, a marker of disease aggressiveness, which was not available in the UKB. Multinomial logistic regression models with Gleason score ≤6 (reference), 7 and ≥8 as the outcome were fit for each score in 4,584 cases from the GERA cohort.

Application of genetically adjusted PSA for biopsy referral and prostate cancer detection

Genetically corrected PSA values were calculated for individual i as follows^17,19:

$$PSA_i^G = \frac{{PSA_i}}{{a_i}}$$

(1)

where a_i is a personalized adjustment factor derived from PGS_PSA. Because genetic effects were estimated for log(PSA), a_i for correcting PSA in ng ml⁻¹ was derived as:

$$a_i = \frac{\exp \left( PGS_i \right)}{\exp \left( \overline{PGS} \right)} \tag{2}$$

$\overline {PGS}$ can be estimated in controls without prostate cancer or obtained from an external control population¹⁷,¹⁹. It follows that a_i > 1 when an individual's genetic profile confers a larger multiplicative increase in PSA than the sample average, resulting in a lower genetically adjusted PSA compared to the observed value ($PSA_i^G < PSA_i$).
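The two formulas above can be sketched in a few lines. This is a minimal illustration that assumes the score is expressed on the log(PSA) scale (a sum of per-allele effects, not a standardized score). The function names and numerical values are hypothetical.

```python
import math


def adjustment_factor(pgs, mean_pgs):
    """a_i = exp(PGS_i) / exp(mean PGS), with PGS on the log(PSA) scale."""
    return math.exp(pgs) / math.exp(mean_pgs)


def genetically_adjusted_psa(psa, pgs, mean_pgs):
    """PSA_i^G = PSA_i / a_i, in the same units as the observed PSA (ng/ml)."""
    return psa / adjustment_factor(pgs, mean_pgs)


# Hypothetical individuals, all with an observed PSA of 4.0 ng/ml.
mean_pgs = 0.0
psa_average = genetically_adjusted_psa(4.0, 0.0, mean_pgs)   # a_i = 1
psa_high_pgs = genetically_adjusted_psa(4.0, 0.5, mean_pgs)  # a_i > 1
psa_low_pgs = genetically_adjusted_psa(4.0, -0.5, mean_pgs)  # a_i < 1
```

An individual whose score equals the reference mean keeps the observed value, a higher score lowers the adjusted PSA and a lower score raises it.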

We evaluated the potential utility of PGS_PSA in two clinical contexts. First, we quantified the impact of using $PSA_i^G$ on biopsy referrals by examining reclassification at age-specific PSA thresholds used in the Kaiser Permanente health system. Analyses were conducted in GERA participants with information on biopsy date and outcome, comprising prostate cancer cases not included in the PSA GWAS and controls that were part of the PSA GWAS. To use the same normalization factor for both cases and controls while mitigating bias due to control overlap with the PSA discovery GWAS, a_i for GERA participants was calculated by substituting $\overline {PGS}$ from out-of-sample UKB controls (n = 152,884). Upward classification resulting in biopsy eligibility occurred when $PSA_i^G > PSA_i \cap PSA_i^G > ref$, where ref is the biopsy referral threshold. Downward classification resulting in biopsy ineligibility was defined as: $PSA_i^G < PSA_i \cap PSA_i^G < ref$. Net reclassification (NR) was summarized separately for cases and controls:

$$NR_{\mathrm{case}} = P\left( \mathrm{up} \mid \mathrm{case} \right) - P\left( \mathrm{down} \mid \mathrm{case} \right)$$

$$NR_{\mathrm{control}} = P\left( \mathrm{down} \mid \mathrm{control} \right) - P\left( \mathrm{up} \mid \mathrm{control} \right)$$

This is equivalent to tabulating the proportion of individuals in each biopsy eligibility category:

$$NR_{\mathrm{case}} = \left( \frac{n_{\mathrm{eligible}}}{n_{\mathrm{case}}} \right) - \left( \frac{n_{\mathrm{ineligible}}}{n_{\mathrm{case}}} \right)$$

$$NR_{\mathrm{control}} = \left( \frac{n_{\mathrm{ineligible}}}{n_{\mathrm{control}}} \right) - \left( \frac{n_{\mathrm{eligible}}}{n_{\mathrm{control}}} \right)$$

For each NR proportion, 95% CIs were obtained using the normal approximation:

$$NR \pm 1.96 \times \sqrt{\frac{\left| NR \right| \times \left( 1 - \left| NR \right| \right)}{n}}$$
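The net reclassification summary and its confidence interval can be sketched as follows. One assumption goes beyond the definitions above: a person is counted as reclassified only when the observed PSA lies on the other side of the referral threshold (that is, the adjustment changes biopsy eligibility). The function name and toy values are hypothetical.

```python
import math


def net_reclassification(psa, psa_adj, ref, group):
    """Return (NR, lower, upper) for one group, 'case' or 'control'.

    psa, psa_adj and ref are equal-length lists: observed PSA, genetically
    adjusted PSA and the (possibly age-specific) referral threshold.
    Assumption: 'up' means newly above the threshold after adjustment and
    'down' means newly below it.
    """
    n = len(psa)
    up = sum(p <= r < g for p, g, r in zip(psa, psa_adj, ref))
    down = sum(g < r < p for p, g, r in zip(psa, psa_adj, ref))
    if group == "case":
        nr = (up - down) / n
    elif group == "control":
        nr = (down - up) / n
    else:
        raise ValueError("group must be 'case' or 'control'")
    # Normal-approximation 95% CI based on |NR|, as in the formula above.
    half_width = 1.96 * math.sqrt(abs(nr) * (1 - abs(nr)) / n)
    return nr, nr - half_width, nr + half_width


# Toy data for four men without cancer (hypothetical values, ng/ml).
observed = [4.5, 5.0, 3.0, 2.0]
adjusted = [3.5, 3.8, 3.2, 1.8]
threshold = [4.0, 4.0, 4.0, 4.0]

nr_control, ci_low, ci_high = net_reclassification(
    observed, adjusted, threshold, "control"
)
```

Here two of the four controls move below the threshold and none move above it, so the net reclassification among controls is positive, which corresponds to avoided biopsies.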

Next, we assessed the performance of risk prediction models for prostate cancer overall, aggressive prostate cancer and non-aggressive prostate cancer in the PCPT and the SELECT.

Because both studies were excluded from the PSA GWAS meta-analysis, a_i and $PSA_i^G$ for the PCPT and the SELECT were calculated using $\overline {PGS}$ observed in each respective study. Consistent with the PGS_PSA validation analysis, pooled analyses included individuals of all ancestries who passed QC filters. To facilitate ancestry-stratified analyses in SELECT, especially for aggressive disease, we combined AFR and AFR/EUR clusters into a single group (AFR pooled) and similarly pooled EAS and EAS/EUR (EAS pooled). Aggressive prostate cancer was defined as Gleason score ≥7, PSA ≥ 10 ng ml⁻¹, T3–T4 stage and/or distant or nodal metastases. We compared AUC estimates for logistic regression models using the following predictors, alone and in combination: baseline PSA, genetically adjusted baseline PSA (PSA^G), PGS_PSA, prostate cancer risk score with original weights (PGS₂₆₉) (ref. ³²) and weights corrected for index event bias (PGS₂₆₉^adj).
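For intuition about the AUC comparison, the sketch below computes a rank-based AUC: the probability that a randomly chosen case has a higher predictor value than a randomly chosen control, with ties counted as one half. For a logistic model with a single predictor and a positive coefficient this matches the AUC of the fitted probabilities. Models that combine several predictors would need the fitted values instead. The function name and toy values are hypothetical.

```python
def rank_auc(case_values, control_values):
    """Proportion of case-control pairs in which the case ranks higher.

    Ties contribute 0.5. Uses a simple pairwise comparison, which is
    adequate for illustration but quadratic in the sample size.
    """
    if not case_values or not control_values:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Toy PSA values in ng/ml (hypothetical).
cases = [6.0, 4.0, 8.0]
controls = [3.0, 5.0, 2.0, 4.5]
auc_psa = rank_auc(cases, controls)
```

The same function could be applied to observed and genetically adjusted PSA in turn to compare how well each separates cases from controls.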

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

UK Biobank data are publicly available by request from https://www.ukbiobank.ac.uk. To maintain individuals’ privacy, data on the GERA cohort are available by application to the Kaiser Permanente Research Bank (https://researchbank.kaiserpermanente.org/). All PLCO genotype data are available in the database of Genotypes and Phenotypes (dbGAP) under accession number phs001286.v2.p2 (https://identifiers.org/dbgap:phs001286.v2.p2). Companion phenotype data can be requested through the NCI Cancer Data Access System (https://cdas.cancer.gov/plco/). GWAS summary statistics are available from the PLCO Atlas GWAS Explorer website (https://exploregwas.cancer.gov/plco-atlas/) and can also be accessed through its API (https://exploregwas.cancer.gov/plco-atlas/#/api-access). Genome-wide summary statistics for the PSA multi-ancestry meta-analysis and ancestry-stratified summary statistics for the development of the genome-wide PSA polygenic score are available from https://doi.org/10.5281/zenodo.7460134. Scoring files for fitting PSA polygenic scores are available from the PGS Catalog: http://www.pgscatalog.org/score/PGS003378/ and http://www.pgscatalog.org/score/PGS003379/.

Code availability

Genome-wide association analyses were conducted using PLINK version 2.0a3LM (https://www.cog-genomics.org/plink/2.0/). Fixed-effects inverse-variance-weighted meta-analysis was performed with METAL using SCHEME STDERR (https://genome.sph.umich.edu/wiki/METAL_Documentation). Weights for the genome-wide polygenic score for PSA were estimated using PRS-CSx (https://github.com/getian107/PRScsx). Scripts for fitting polygenic scores, performing the index event bias analysis and calculating genetically adjusted PSA values are available at https://github.com/lkachuri/precision_PSA.

References

Lilja, H. A kallikrein-like serine protease in prostatic fluid cleaves the predominant seminal vesicle protein. J. Clin. Invest. 76, 1899–1903 (1985).
Balk, S. P., Ko, Y. J. & Bubley, G. J. Biology of prostate-specific antigen. J. Clin. Oncol. 21, 383–391 (2003).
Lilja, H., Ulmert, D. & Vickers, A. J. Prostate-specific antigen and prostate cancer: prediction, detection and monitoring. Nat. Rev. Cancer 8, 268–278 (2008).
Pinsky, P. F. et al. Prostate volume and prostate-specific antigen levels in men enrolled in a large screening trial. Urology 68, 352–356 (2006).
Lee, S. E. et al. Relationship of prostate-specific antigen and prostate volume in Korean men with biopsy-proven benign prostatic hyperplasia. Urology 71, 395–398 (2008).
Grubb, R. L. 3rd et al. Serum prostate-specific antigen hemodilution among obese men undergoing screening in the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial. Cancer Epidemiol. Biomark. Prev. 18, 748–751 (2009).
Harrison, S. et al. Systematic review and meta-analysis of the associations between body mass index, prostate cancer, advanced prostate cancer, and prostate-specific antigen. Cancer Causes Control 31, 431–449 (2020).
Thompson, I. M. et al. Assessing prostate cancer risk: results from the Prostate Cancer Prevention Trial. J. Natl Cancer Inst. 98, 529–534 (2006).
Schroder, F. H. et al. Screening and prostate-cancer mortality in a randomized European study. N. Engl. J. Med. 360, 1320–1328 (2009).
Telesca, D., Etzioni, R. & Gulati, R. Estimating lead time and overdiagnosis associated with PSA screening from prostate cancer incidence trends. Biometrics 64, 10–19 (2008).
Welch, H. G. & Black, W. C. Overdiagnosis in cancer. J. Natl Cancer Inst. 102, 605–613 (2010).
Vickers, A. J. et al. Empirical estimates of prostate cancer overdiagnosis by age and prostate-specific antigen. BMC Med. 12, 26 (2014).
Vickers, A. J. et al. Strategy for detection of prostate cancer based on relation between prostate specific antigen at age 40–55 and long term risk of metastasis: case–control study. BMJ 346, f2023 (2013).
Kovac, E. et al. Association of baseline prostate-specific antigen level with long-term diagnosis of clinically significant prostate cancer among patients aged 55 to 60 years: a secondary analysis of a cohort in the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial. JAMA Netw. Open 3, e1919284 (2020).
Tikkinen, K. A. O. et al. Prostate cancer screening with prostate-specific antigen (PSA) test: a clinical practice guideline. BMJ 362, k3581 (2018).
US Preventive Services Task Force et al. Screening for prostate cancer: US Preventive Services Task Force recommendation statement. JAMA 319, 1901–1913 (2018).
Hoffmann, T. J. et al. Genome-wide association study of prostate-specific antigen levels identifies novel loci independent of prostate cancer. Nat. Commun. 8, 14248 (2017).
Bansal, A. et al. Heritability of prostate-specific antigen and relationship with zonal prostate volumes in aging twins. J. Clin. Endocrinol. Metab. 85, 1272–1276 (2000).
Gudmundsson, J. et al. Genetic correction of PSA values using sequence variants associated with PSA levels. Sci. Transl. Med. 2, 62ra92 (2010).
Benafif, S., Kote-Jarai, Z., Eeles, R. A. & Consortium, P. A review of prostate cancer genome-wide association studies (GWAS). Cancer Epidemiol. Biomark. Prev. 27, 845–857 (2018).
Wiklund, F. et al. Association of reported prostate cancer risk alleles with PSA levels among men without a diagnosis of prostate cancer. Prostate 69, 419–427 (2009).
Kim, S., Shin, C. & Jee, S. H. Genetic variants at 1q32.1, 10q11.2 and 19q13.41 are associated with prostate-specific antigen for prostate cancer screening in two Korean population-based cohort studies. Gene 556, 199–205 (2015).
Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
Speed, D., Holmes, J. & Balding, D. J. Evaluating and improving heritability models using summary statistics. Nat. Genet. 52, 458–462 (2020).
Bulik-Sullivan, B. K. et al. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
Ning, Z., Pawitan, Y. & Shen, X. High-definition likelihood inference of genetic correlations across human complex traits. Nat. Genet. 52, 859–864 (2020).
Magi, R. et al. Trans-ethnic meta-regression of genome-wide association studies accounting for ancestry increases power for discovery and improves fine-mapping resolution. Hum. Mol. Genet. 26, 3639–3650 (2017).
Rentzsch, P., Schubach, M., Shendure, J. & Kircher, M. CADD-Splice—improving genome-wide variant effect prediction using deep learning-derived splice scores. Genome Med. 13, 31 (2021).
Mao, Y. Q. & Houry, W. A. The role of pontin and reptin in cellular physiology and cancer etiology. Front. Mol. Biosci. 4, 58 (2017).
Egydio de Carvalho, C. et al. Molecular cloning and characterization of a complementary DNA encoding sperm tail protein SHIPPO 1. Biol. Reprod. 66, 785–795 (2002).
Currall, B. B. et al. Loss of LDAH associated with prostate cancer and hearing loss. Hum. Mol. Genet. 27, 4194–4203 (2018).
Conti, D. V. et al. Trans-ancestry genome-wide association meta-analysis of prostate cancer identifies new susceptibility loci and informs genetic risk prediction. Nat. Genet. 53, 65–75 (2021).
Dudbridge, F. et al. Adjustment for index event bias in genome-wide association studies of subsequent events. Nat. Commun. 10, 1561 (2019).
Paternoster, L., Tilling, K. & Davey Smith, G. Genetic epidemiology and Mendelian randomization for informing disease therapeutics: conceptual and methodological challenges. PLoS Genet. 13, e1006944 (2017).
Mahmoud, O., Dudbridge, F., Davey Smith, G., Munafo, M. & Tilling, K. A robust method for collider bias correction in conditional genome-wide association studies. Nat. Commun. 13, 619 (2022).
Ruan, Y. et al. Improving polygenic prediction in ancestrally diverse populations. Nat. Genet. 54, 573–580 (2022).
Lippman, S. M. et al. Effect of selenium and vitamin E on risk of prostate cancer and other cancers: the Selenium and Vitamin E Cancer Prevention Trial (SELECT). JAMA 301, 39–51 (2009).
Kjaergaard, A. D., Bojesen, S. E., Nordestgaard, B. G., Johansen, J. S. & Smith, G. D. Biomarker de-Mendelization: principles, potentials and limitations of a strategy to improve biomarker prediction by reducing the component of variance explained by genotype. Preprint at bioRxiv https://doi.org/10.1101/428276 (2018).
Holmes, M. V. & Davey Smith, G. Can Mendelian randomization shift into reverse gear? Clin. Chem. 65, 363–366 (2019).
Enroth, S., Johansson, A., Enroth, S. B. & Gyllensten, U. Strong effects of genetic and lifestyle factors on biomarker variation and use of personalized cutoffs. Nat. Commun. 5, 4684 (2014).
Yatsenko, A. N. et al. X-linked TEX11 mutations, meiotic arrest, and azoospermia in infertile men. N. Engl. J. Med. 372, 2097–2107 (2015).
Astle, W. J. et al. The allelic landscape of human blood cell trait variation and links to common complex disease. Cell 167, 1415–1429 (2016).
Marouli, E. et al. Rare and low-frequency coding variants alter human adult height. Nature 542, 186–190 (2017).
Bick, A. G. et al. Inherited causes of clonal haematopoiesis in 97,691 whole genomes. Nature 586, 763–768 (2020).
Arbuckle, J. H. & Kristie, T. M. Epigenetic repression of herpes simplex virus infection by the nucleosome remodeler CHD3. mBio 5, e01027-13 (2014).
Shen, X. et al. Multivariate discovery and replication of five novel loci associated with immunoglobulin G N-glycosylation. Nat. Commun. 8, 447 (2017).
Wang, J. M. et al. KLRG1 negatively regulates natural killer cell functions through the Akt pathway in individuals with chronic hepatitis C virus infection. J. Virol. 87, 11626–11636 (2013).
Kachuri, L. et al. The landscape of host genetic factors involved in immune response to common viral infections. Genome Med. 12, 93 (2020).
Rashkin, S. R. et al. Pan-cancer study detects genetic risk variants and shared genetic basis in two large cohorts. Nat. Commun. 11, 4423 (2020).
Boyle, E. A., Li, Y. I. & Pritchard, J. K. An expanded view of complex traits: from polygenic to omnigenic. Cell 169, 1177–1186 (2017).
Di Giovannantonio, M. et al. Heritable genetic variants in key cancer genes link cancer risk with anthropometric traits. J. Med. Genet. 58, 392–399 (2021).
Filmus, J. & Capurro, M. The role of glypican-3 in the regulation of body size and cancer. Cell Cycle 7, 2787–2790 (2008).
Moraru, A. et al. THADA regulates the organismal balance between energy storage and heat production. Dev. Cell 41, 72–81 e76 (2017).
Vanbokhoven, H., Melino, G., Candi, E. & Declercq, W. p63, a story of mice and men. J. Invest. Dermatol. 131, 1196–1207 (2011).
Wang, H. et al. Transcriptional regulation of P63 on the apoptosis of male germ cells and three stages of spermatogenesis in mice. Cell Death Dis. 9, 76 (2018).
Neri, G., Gurrieri, F., Zanni, G. & Lin, A. Clinical and molecular aspects of the Simpson–Golabi–Behmel syndrome. Am. J. Med. Genet. 79, 279–283 (1998).
Gulati, R., Inoue, L. Y., Gore, J. L., Katcher, J. & Etzioni, R. Individualized estimates of overdiagnosis in screen-detected prostate cancer. J. Natl Cancer Inst. 106, djt367 (2014).
Catalona, W. J. et al. Use of the percentage of free prostate-specific antigen to enhance differentiation of prostate cancer from benign prostatic disease: a prospective multicenter clinical trial. JAMA 279, 1542–1547 (1998).
Loeb, S. et al. The prostate health index selectively identifies clinically significant prostate cancer. J. Urol. 193, 1163–1169 (2015).
Vickers, A. J. & Brewster, S. F. PSA velocity and doubling time in diagnosis and prognosis of prostate cancer. Br. J. Med Surg. Urol. 5, 162–168 (2012).
Machiela, M. J. et al. GWAS Explorer: an open-source tool to explore, visualize, and access GWAS summary statistics in the PLCO Atlas. Sci. Data 10, 25 (2023).
D’Amico, A. V. Risk-based management of prostate cancer. N. Engl. J. Med. 365, 169–171 (2011).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Andriole, G. L. et al. Mortality results from a randomized prostate-cancer screening trial. N. Engl. J. Med. 360, 1310–1319 (2009).
Roden, D. M. et al. Development of a large-scale de-identified DNA biobank to enable personalized medicine. Clin. Pharmacol. Ther. 84, 362–369 (2008).
Klein, R. J. et al. Evaluation of multiple risk-associated single nucleotide polymorphisms versus prostate-specific antigen at baseline to predict prostate cancer in unscreened men. Eur. Urol. 61, 471–477 (2012).
Manichaikul, A. et al. Robust relationship inference in genome-wide association studies. Bioinformatics 26, 2867–2873 (2010).
Hoffmann, T. J. et al. Design and coverage of high throughput genotyping arrays optimized for individuals of East Asian, African American, and Latino race/ethnicity using imputation and a novel hybrid SNP selection algorithm. Genomics 98, 422–430 (2011).
Hoffmann, T. J. et al. Next generation genome-wide association tool: design and coverage of a high-throughput European-optimized SNP array. Genomics 98, 79–89 (2011).
Kvale, M. N. et al. Genotyping informatics and quality control for 100,000 subjects in the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort. Genetics 200, 1051–1060 (2015).
Delaneau, O., Marchini, J. & Zagury, J. F. A linear complexity phasing method for thousands of genomes. Nat. Methods 9, 179–181 (2011).
Howie, B., Fuchsberger, C., Stephens, M., Marchini, J. & Abecasis, G. R. Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat. Genet. 44, 955–959 (2012).
Banda, Y. et al. Characterizing race/ethnicity and genetic ancestry for 100,000 subjects in the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort. Genetics 200, 1285–1295 (2015).
Jun, G. et al. Detecting and estimating contamination of human DNA samples in sequencing and array-based genotype data. Am. J. Hum. Genet. 91, 839–848 (2012).
Jin, Y., Schaffer, A. A., Sherry, S. T. & Feolo, M. Quickly identifying identical and closely related subjects in large databases using genotype data. PLoS ONE 12, e0179106 (2017).
Loh, P. R. et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet. 48, 1443–1448 (2016).
Zheng, X. et al. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics 28, 3326–3328 (2012).
Chen, C. Y. et al. Improved ancestry inference using weights from external reference panels. Bioinformatics 29, 1399–1406 (2013).
Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).

Acknowledgements

The Precision PSA study is supported by funding from the National Institutes of Health (NIH) National Cancer Institute (NCI) under awards R01CA241410 (J.S.W.), U01CA261339 (J.S.W. and D.V.C.) and R00CA246076 (L.K.), and the Young Investigator Award from the Prostate Cancer Foundation (R.E.G.). Contributing studies were supported by research grants from the NIH National Institute of General Medical Sciences (NIGMS) under award R01GM130791 (J.D.M.); NIH/NCI Cancer Center Support Grant to Memorial Sloan Kettering Cancer Center (P30CA008748); MSKCC Specialized Programs of Research Excellence in Prostate Cancer (P50CA92629, H.L.); the Swedish Cancer Society (Cancerfonden 20 1354 PjF, H.L.); and the General Hospital in Malmö Foundation for Combating Cancer. This work was supported, in part, through the computational resources and staff expertise provided by Scientific Computing at the Icahn School of Medicine at Mount Sinai. Research reported in this paper was supported by the Office of Research Infrastructure of the NIH under award S10OD026880 and NIH/NCI funding (R01CA175491 and R01CA244948, R.J.K.). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

Author information

These authors jointly supervised this work: Rebecca E. Graff, John S. Witte.

Authors and Affiliations

Department of Epidemiology & Biostatistics, University of California, San Francisco, San Francisco, CA, USA
Linda Kachuri, Thomas J. Hoffmann, Rebecca E. Graff & John S. Witte
Department of Epidemiology & Population Health, Stanford University School of Medicine, Stanford, CA, USA
Linda Kachuri, Yu Jiang & John S. Witte
Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA, USA
Linda Kachuri & John S. Witte
Institute of Human Genetics, University of California, San Francisco, San Francisco, CA, USA
Thomas J. Hoffmann
Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD, USA
Sonja I. Berndt, Mitchell J. Machiela, Neal D. Freedman, Wen-Yi Huang, Shengchao A. Li & Stephen J. Chanock
Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
John P. Shelley & Jonathan D. Mosley
Vanderbilt-Ingram Cancer Center, Nashville, TN, USA
Kerry R. Schaffer
Biological and Medical Informatics, University of California, San Francisco, San Francisco, CA, USA
Ryder Easterlin
Fred Hutchinson Cancer Research Center, Seattle, WA, USA
Phyllis J. Goodman
SWOG Statistics and Data Management Center, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
Cathee Till
CHRISTUS Santa Rosa Medical Center Hospital, San Antonio, TX, USA
Ian Thompson
Departments of Laboratory Medicine, Surgery and Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Hans Lilja
Department of Translational Medicine, Lund University, Skåne University Hospital, Malmö, Sweden
Hans Lilja
Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA
Stephen K. Van Den Eeden
Center for Genetic Epidemiology, Department of Population and Preventive Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
Christopher A. Haiman & David V. Conti
Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
Christopher A. Haiman & David V. Conti
Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Robert J. Klein
Department of Internal Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
Jonathan D. Mosley
Departments of Biomedical Data Science and Genetics, Stanford University, Stanford, CA, USA
John S. Witte

Contributions

Concept and design: L.K., T.J.H., R.J.K., J.D.M., R.E.G. and J.S.W. Acquisition, analysis or interpretation of data: L.K., T.J.H., Y.J., S.I.B., J.P.S., K.S., M.J.M., N.D.F., W.-Y.H., S.A.L., R.E., P.J.G., C.T., I.T., H.L., S.K.V.D.E., S.J.C., C.A.H., D.V.C., R.J.K., J.D.M., R.E.G. and J.S.W. Drafting of the manuscript: L.K., T.J.H., R.E.G. and J.S.W. Critical revision of the manuscript for important intellectual content: L.K., T.J.H., Y.J., S.I.B., J.P.S., K.S., M.J.M., N.D.F., W.-Y.H., S.A.L., R.E., P.J.G., C.T., I.T., H.L., S.K.V.D.E., S.J.C., C.A.H., D.V.C., R.J.K., J.D.M., R.E.G. and J.S.W.

Corresponding authors

Correspondence to Rebecca E. Graff or John S. Witte.

Ethics declarations

Competing interests

J.S.W. is a non-employee cofounder of Avail Bio. H.L. is named on a patent for intact PSA assays and a patent for a statistical method to detect prostate cancer that is licensed to and commercialized by OPKO Health. H.L. receives royalties from sales of the test and has stock in OPKO Health. All other authors have no competing interests.

Peer review

Peer review information

Nature Medicine thanks Jason Vassy, Jian Yang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary handling editor: Anna Maria Ranzoni, in collaboration with the Nature Medicine team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Heritability (h²) of PSA levels and GWAS results in men of European ancestry without prostate cancer.

a, Crossbars show h² estimates, annotated below, and corresponding 95% confidence intervals across statistical methods. In the UK Biobank (UKB), heritability was estimated using GCTA and Linkage Disequilibrium Adjusted Kinships (LDAK)-Thin models from a genetic relatedness matrix (GRM) of common (MAF ≥ 0.01) LD-pruned (r² < 0.80) variants with imputation quality INFO > 0.80. These estimates were compared to analyses of GWAS summary statistics from the UK Biobank and the EUR meta-analysis using the baseline linkage disequilibrium LDAK model and a high-definition likelihood (HDL) method by Ning et al.²⁶. b, UKB GWAS results where known PSA loci are labeled with the corresponding cytoband region and new regions are labeled with the nearest gene. Highlighted peaks include variants in LD (r² ≥ 0.01) with the lead novel variant. Two-sided p-values are derived from linear regression models.

Extended Data Fig. 2 Impact of correction for PSA-related selection bias on genetic associations with prostate cancer.

Associations with prostate cancer for 128 PSA-associated index variants were obtained from the PRACTICAL GWAS by Conti et al.³². PSA index variants were selected from the multi-ancestry GWAS meta-analysis using clumping and thresholding (P < 5 × 10⁻⁸, linkage disequilibrium r² < 0.01). a, GWAS effect sizes for prostate cancer (β_PrCa) are aligned to the PSA-increasing allele. Bias-adjusted effect sizes (β_adj) are denoted by triangles. b, Two-sided GWAS p-values for prostate cancer (P_PrCa) were derived from an inverse-variance-weighted fixed-effects meta-analysis. Two-sided bias-adjusted p-values (P_adj), denoted by triangles, were calculated from a chi-squared test statistic based on β_adj and corresponding standard errors. Genome-wide significance threshold (P < 5 × 10⁻⁸) is indicated by the dotted line.

Extended Data Fig. 3 Ancestry composition of validation cohorts.

Admixture plots visualizing genetic ancestry proportions for participants within population clusters in a, Prostate Cancer Prevention Trial (PCPT) and b, Selenium and Vitamin E Cancer Prevention Trial (SELECT). Both cohorts were excluded from the PSA GWAS used for polygenic score development. For each individual, the proportion of African (AFR), European (EUR), and East Asian (EAS) genetic ancestry is shown. Single-ancestry clusters include individuals with ancestry scores ≥0.80 in one ancestry group. Admixed ancestry clusters AFR/EUR and EAS/EUR include individuals with ancestry proportions >0.20 and <0.80. For analyses of prostate cancer risk in SELECT, AFR and AFR/EUR and EAS and EAS/EUR were combined into pooled African ancestry (n = 2,936) and pooled East Asian ancestry (n = 578), respectively.

Extended Data Fig. 4 Age-stratified PGS_PSA associations.

Performance of the genome-wide PGS_PSA developed using the PRS-CSx algorithm was evaluated in the two cancer prevention trials: a, Prostate Cancer Prevention Trial (PCPT) and b, Selenium and Vitamin E Cancer Prevention Trial (SELECT). Crossbars visualize the effect estimates (β) and corresponding 95% confidence intervals per standard deviation (SD) increase in the standardized PGS_PSA. Associations between PGS_PSA and baseline log(PSA) were estimated in the pooled sample and stratified by age group. All p-values are two-sided and derived from linear regression models adjusted for age at baseline, top 10 population-specific genetic ancestry principal components, and proportions of African and East Asian genetic ancestry.
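To make the "per SD increase" scale concrete, the sketch below standardizes a score and fits a simple linear regression of log(PSA) on it, returning the slope with an approximate 95% confidence interval. It is a deliberately simplified illustration. It omits the covariates listed in the caption (age, principal components, ancestry proportions), it uses a normal approximation (1.96 × SE) for the interval, and its data are made up.

```python
import math
import statistics


def per_sd_effect(pgs, log_psa):
    """Slope of log(PSA) per 1-SD increase in the score, with a 95% CI.

    Unadjusted simple regression; covariate adjustment is omitted here.
    """
    n = len(pgs)
    if n != len(log_psa) or n < 3:
        raise ValueError("need two equal-length samples with n >= 3")

    # Standardize the score so the slope is expressed per SD.
    mean_pgs = statistics.fmean(pgs)
    sd_pgs = statistics.stdev(pgs)
    z = [(x - mean_pgs) / sd_pgs for x in pgs]

    mean_y = statistics.fmean(log_psa)
    sxx = sum(v * v for v in z)  # z has mean 0 after standardization
    sxy = sum(v * (y - mean_y) for v, y in zip(z, log_psa))
    beta = sxy / sxx

    # Residual variance and standard error of the slope.
    residuals = [y - mean_y - beta * v for v, y in zip(z, log_psa)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)
    se = math.sqrt(sigma2 / sxx)
    return beta, (beta - 1.96 * se, beta + 1.96 * se)


# Made-up toy data, for illustration only.
toy_pgs = [0.1, -0.5, 1.2, 0.3, -1.1, 0.8]
toy_log_psa = [0.2, -0.1, 0.9, 0.4, -0.6, 0.3]
beta, (ci_low, ci_high) = per_sd_effect(toy_pgs, toy_log_psa)
print(f"beta per SD = {beta:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```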

Extended Data Fig. 5 Impact of index event bias on polygenic score (PGS) associations.

The association between the PGS for PSA (PGS_PSA) and the PGS for prostate cancer (PGS₂₆₉) fit using original weights, as reported in Conti et al.³², is compared to the association with PGS₂₆₉ fit using weights that have been adjusted for index event bias (PGS₂₆₉^adj) using the Dudbridge et al.³³ method. Linear regression lines with shaded 95% confidence intervals visualizing the PGS associations in a, prostate cancer cases and b, men not diagnosed with prostate cancer (controls) are overlaid on individual data points summarized as hexbins. Analyses were restricted to male UK Biobank participants of European ancestry who were excluded from the GWAS of PSA levels.

Supplementary information

Supplementary Information

Supplementary Figs. 1–8 and Tables 1–6 and 8–26 (Supplementary Table 7 is uploaded as an Excel workbook).

Reporting Summary

Supplementary Table

Supplementary Table 7: Predicted functional features and annotations for 128 genome-wide significant (P < 5 × 10⁻⁸) index variants identified in the multi-ancestry meta-analysis of PSA levels in 95,768 men without prostate cancer.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Kachuri, L., Hoffmann, T.J., Jiang, Y. et al. Genetically adjusted PSA levels for prostate cancer screening. Nat Med 29, 1412–1423 (2023). https://doi.org/10.1038/s41591-023-02277-9

Received: 14 March 2022
Accepted: 27 February 2023
Published: 01 June 2023
Issue Date: June 2023
DOI: https://doi.org/10.1038/s41591-023-02277-9
