Main

Sodium–glucose cotransporter 2 (SGLT2) inhibitors have been shown to benefit patients with heart failure (HF), leading to significant reductions in the composite outcome of worsening HF (often leading to hospitalization) or death from cardiovascular (CV) causes1,2,3,4,5. We planned a prospective, patient-level pooled meta-analysis of the Dapagliflozin and Prevention of Adverse Outcomes in Heart Failure (DAPA-HF) and Dapagliflozin Evaluation to Improve the LIVEs of Patients With Preserved Ejection Fraction Heart Failure (DELIVER) trials to provide additional data on the efficacy and safety of dapagliflozin as a treatment for patients with HF1,2. The individual trials were powered for their primary composite endpoints6,7, and the purpose of the pooled analysis was to evaluate the key components of these endpoints, as well as important secondary efficacy outcomes that required more statistical power than the individual trials provided. In particular, we pre-specified an examination of the effect of dapagliflozin on mortality and on the composite of death from CV causes, myocardial infarction (MI) or stroke (MACE). We also pre-specified that these outcomes would be examined in a limited number of patient subgroups to assess the consistency of the effects of dapagliflozin. Whether the effect of treatment varies with one of these subgroup variables, left ventricular ejection fraction (LVEF), has become a key clinical question since the pooled analysis was originally conceived8. Treatments for HF that work through neurohumoral pathways have their greatest benefit in patients with a reduced LVEF, that is, ≤40%, and analyses of trials testing such treatments have demonstrated attenuated benefit in patients with an ejection fraction >55–60%9,10,11. This pattern is considered biologically plausible because patients with lower ejection fractions exhibit greater neurohumoral activation than patients with higher ejection fractions9,10,11.
SGLT2 inhibitors are not thought to act through neurohumoral pathways, and no gradient in their effect related to ejection fraction was anticipated. However, a pooled analysis of the EMPagliflozin outcomE tRial in patients with chrOnic heaRt failure (EMPEROR) trials unexpectedly suggested a similar pattern of attenuated benefit in patients with a normal ejection fraction3,4,12. If correct, this finding has major implications for the treatment of patients with HF, a large proportion of whom have a normal ejection fraction, as well as for our understanding of the pathophysiology of this syndrome and of how SGLT2 inhibitors exert their benefits in HF. For this reason, before DELIVER2 was unblinded, we prepared an updated statistical analysis plan that pre-specified additional analyses of the effects of dapagliflozin across the full range of LVEF at baseline (Supplementary information).

Results

Patient-level pooled meta-analysis of DAPA-HF and DELIVER

Of the 11,007 participants included in this analysis, 4,744 had an LVEF ≤ 40% and 6,263 an ejection fraction >40%, with 5,503 randomized to placebo and 5,504 randomized to dapagliflozin. The distributions of LVEFs in the overall population are shown in Extended Data Fig. 1. The mean LVEF was 44% (s.d. 14%) and the median 44% (interquartile range (IQR) 34–55%). The median follow-up was 22 months (IQR 17–30 months).

Baseline characteristics

Compared with participants with a lower ejection fraction, those with a higher ejection fraction were older and more likely to be women (Table 1). Blood pressure was 11 mmHg higher and body mass index (BMI) was 2 kg m−2 higher in those with an ejection fraction >60% than in those with an ejection fraction ≤30%. Histories of hypertension and atrial fibrillation were more common, and a history of MI less common, in patients with higher ejection fractions. The proportion of patients in New York Heart Association (NYHA) class III/IV was lower among those with a higher ejection fraction, but patient-reported health status, measured by the Kansas City Cardiomyopathy Questionnaire Total Symptom Score (KCCQ-TSS), was worse in participants with higher ejection fractions. Both N-terminal pro-brain natriuretic peptide (NT-proBNP) and estimated glomerular filtration rate (eGFR) were lower in patients with a higher ejection fraction, as was the use of angiotensin-converting enzyme (ACE) inhibitors, angiotensin receptor blockers (ARBs), sacubitril/valsartan, β-blockers, mineralocorticoid receptor antagonists (MRAs) and intracardiac devices.

Table 1 Baseline characteristics of the pooled DAPA-HF and DELIVER cohort by ejection fraction category

Effect of dapagliflozin on outcomes according to ejection fraction

The rate of each pre-specified outcome was lower in the dapagliflozin group (Fig. 1). In the overall population, dapagliflozin reduced the risk of death from CV causes (HR 0.86 (95% CI 0.76–0.97); P = 0.01). There was no evidence of effect modification by ejection fraction, whether it was examined as a categorical or a continuous variable (P for interaction = 0.63 and 0.94, respectively; Table 2 and Fig. 2).

Fig. 1: Effect of dapagliflozin on key clinical outcomes in pooled DAPA-HF and DELIVER dataset.

a–f, Incidence of: death from CV causes (a); death from all causes (b); the total number of hospital admissions for HF (c); time to first hospital admission for HF (d); death from CV causes, MI or stroke (e); and death from CV causes or hospital admission for HF (f), according to randomized therapy. Participants randomized to dapagliflozin are shown in blue and those randomized to placebo in red. All panels are Kaplan–Meier curves with an HR and 95% CI estimated from Cox’s model with two-sided P values, except for the total number of hospital admissions for HF, which was plotted using the Ghosh and Lin method accounting for death from CV causes (the RR is estimated from the joint frailty model with a two-sided P value). No adjustment for multiple comparisons was made. NNT indicates the number of patients who need to be treated over the median duration of follow-up to prevent one event (of the type in each panel). An NNT could not be calculated for the total number of hospital admissions for HF because this was an episode-based rather than a patient-based analysis (that is, patients may have had more than one hospital admission). ARRs and NNTs are shown with a 95% CI.

Table 2 Clinical outcomes according to ejection fraction category and randomized therapy
Fig. 2: Effect of dapagliflozin on clinical outcomes across the range of ejection fraction.

a–f, Effect of dapagliflozin on: death from CV causes (a); death from all causes (b); the total number of hospital admissions for HF (c); time to first hospital admission for HF (d); death from CV causes, MI or stroke (e); and death from CV causes or hospital admission for HF (f), according to baseline LVEF. The horizontal blue line shows the continuous HR across the range of LVEF and the shaded area around this line represents the 95% CI from Cox’s model. The overall effect of treatment in the pooled population is shown in each panel as an HR (95% CI) with the two-sided P value from Cox’s model for Wald’s test of interaction between treatment and LVEF. No adjustment for multiple comparisons was made. ᵃRestricted cubic spline and interaction P value derived from LWYY model for total HF hospitalization.

In sensitivity analyses, the results were unchanged when undetermined deaths were excluded from the definition of death from CV causes or when the definition of death from CV causes used in each trial was applied (Extended Data Fig. 2). For death from CV causes, the absolute risk reduction (ARR) was 1.5% (95% CI 0.4–2.6%) and the number needed to treat (NNT) over the median follow-up was 68 (95% CI 39–281).
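
The NNT and its confidence interval follow from taking reciprocals of the ARR and its confidence limits. The following is a minimal Python sketch of that arithmetic, assuming a beneficial ARR whose 95% CI lies wholly above zero; the function name `nnt_from_arr` is illustrative and not taken from the trial analysis code. Because the inputs are the rounded values reported above, the output (about 66.7, 95% CI 38.5–250) does not exactly reproduce the reported 68 (39–281), which were presumably derived from unrounded ARRs.

```python
def nnt_from_arr(arr: float, lower: float, upper: float) -> tuple[float, float, float]:
    """Return (NNT, NNT lower bound, NNT upper bound) from an ARR and its CI.

    Taking reciprocals reverses the order of the bounds: the upper ARR bound
    gives the lower NNT bound. This sketch only handles a CI wholly above zero;
    if the ARR CI includes zero, the NNT CI is discontinuous.
    """
    if not 0 < lower <= arr <= upper:
        raise ValueError("expected 0 < lower <= arr <= upper")
    return 1 / arr, 1 / upper, 1 / lower


# Rounded values reported for death from CV causes: ARR 1.5% (95% CI 0.4-2.6%)
nnt, nnt_lo, nnt_hi = nnt_from_arr(0.015, 0.004, 0.026)
print(f"NNT {nnt:.1f} (95% CI {nnt_lo:.1f}-{nnt_hi:.1f})")
```

The same check explains why an ARR lower limit that rounds to 0.0%, as for MACE, is paired with a very large upper NNT limit.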

The risk of death from any cause was also reduced (HR 0.90 (95% CI 0.82–0.99); P = 0.03) with no evidence of an interaction between ejection fraction and treatment, whether ejection fraction was analyzed by category (P for interaction = 0.79) or as a continuous variable (P for interaction = 0.58). The ARR was 1.5% (95% CI 0.2–2.8%) and the NNT over the median follow-up was 67 (95% CI 36–603).

Dapagliflozin reduced the risk of total (that is, first and subsequent) hospital admissions for HF (RR 0.71 (95% CI 0.65–0.78); P < 0.001), and there was no evidence of an interaction between ejection fraction and treatment, whether ejection fraction was analyzed by category (P for interaction = 0.62) or as a continuous variable (P for interaction = 0.84). The pre-specified supportive analysis of time to first hospital admission for HF showed a consistent benefit of dapagliflozin (HR 0.74 (95% CI 0.66–0.82); P < 0.001). For this time-to-first-event outcome, the ARR was 3.2% (95% CI 2.0–4.4%) and the NNT over the median follow-up was 32 (95% CI 23–51).

Applying the overall relative risk reduction to the placebo group event rate gave an NNT (95% CI) to prevent a death from CV causes in patients with reduced, mildly reduced and preserved ejection fractions of 61 (37–246), 59 (35–237) and 76 (46–309), respectively. The corresponding NNTs for a first hospitalization for HF were 28 (21–41), 30 (24–45) and 29 (23–43) and, for death from any cause, 72 (39–764), 56 (31–593) and 64 (35–684), respectively.
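
Because one relative effect is applied to different placebo-group risks, the NNT is smallest where the placebo risk is highest. The following is a minimal sketch, assuming that the overall HR is used as the risk ratio over the follow-up horizon, so that ARR = placebo risk × (1 − HR); the function name and the placebo risks in the example are hypothetical and are not the values in Table 2.

```python
def nnt_applying_relative_effect(placebo_risk: float, hr: float,
                                 hr_lower: float, hr_upper: float) -> tuple[float, float, float]:
    """NNT (with CI) when an overall relative effect is applied to a placebo risk.

    Assumes the HR approximates the risk ratio, so ARR = placebo_risk * (1 - HR).
    The lower HR bound gives the largest ARR and therefore the smallest NNT.
    """
    if not (0 < placebo_risk <= 1 and 0 < hr_lower <= hr <= hr_upper < 1):
        raise ValueError("sketch assumes a beneficial effect whose CI excludes 1")
    nnt = 1 / (placebo_risk * (1 - hr))
    nnt_lo = 1 / (placebo_risk * (1 - hr_lower))
    nnt_hi = 1 / (placebo_risk * (1 - hr_upper))
    return nnt, nnt_lo, nnt_hi


# Overall HR for death from CV causes: 0.86 (95% CI 0.76-0.97).
# Hypothetical placebo-group risks over follow-up for three illustrative subgroups:
for label, risk in [("higher risk", 0.12), ("intermediate risk", 0.10), ("lower risk", 0.08)]:
    nnt, lo, hi = nnt_applying_relative_effect(risk, 0.86, 0.76, 0.97)
    print(f"{label}: NNT {nnt:.0f} (95% CI {lo:.0f}-{hi:.0f})")
```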

Compared with placebo, dapagliflozin also reduced the incidence of the MACE composite of death from CV causes, MI or stroke, although this effect was of borderline statistical significance (HR 0.90 (95% CI 0.81–1.00); P = 0.045). Again, there was no interaction between ejection fraction and the effect of treatment, whether ejection fraction was analyzed categorically (P for interaction = 0.72) or as a continuous measure (P for interaction = 0.93). The ARR was 1.3% (95% CI 0.0–2.6%) and the NNT over the median follow-up was 76 (95% CI 39–2187).

To address the possible attenuation of treatment benefit at higher ejection fractions reported in the EMPEROR trials12, we examined the effect of dapagliflozin on the primary composite endpoint used in those trials, that is, time to the first occurrence of hospital admission for worsening HF or death from CV causes. Dapagliflozin reduced the risk of this outcome by 22% (HR 0.78 (95% CI 0.72–0.86); P < 0.001) (Table 2 and Fig. 2). The benefit appeared consistent across ejection fraction categories (P for interaction = 0.82; Table 2). With ejection fraction modeled as a continuous variable (P for interaction = 0.71), the restricted cubic spline showed that the HR was below unity across the full range of ejection fraction; the upper bound of its 95% CI crossed unity only at the extremes of the range (at around 9% and 70%), probably because few patients had either a very low or a very high ejection fraction. In sensitivity analyses, the results were unchanged if undetermined deaths were excluded from the definition of death from CV causes or if the definition from the individual trials was used (Extended Data Fig. 2).

Effect of dapagliflozin in the pre-specified subgroups

The effect of dapagliflozin on CV death was consistent across the pre-specified subgroups except for NYHA class, where the benefit appeared smaller in patients in a worse functional class (Fig. 3). To determine whether this interaction was likely to be real or to reflect the play of chance, we examined, in a post-hoc subgroup analysis, the interaction between KCCQ-TSS and the effect of dapagliflozin on death from CV causes, and found that it was not significant (Fig. 3). In a further post-hoc analysis, with NT-proBNP modeled as a continuous measure using a restricted cubic spline, we found no evidence of a difference in the effect of dapagliflozin by baseline NT-proBNP level for any of the outcomes (Fig. 4).

Fig. 3: Effect of randomized treatment on CV death according to the pre-specified subgroups.

Estimates are HRs with error bars representing 95% CIs from Cox’s model and a two-sided P value for interaction from Wald’s test of Cox’s model. No adjustment for multiple comparisons was made. ᵃNot a pre-specified subgroup.

Fig. 4: Effect of dapagliflozin on clinical outcomes across the range of NT-proBNP.

a–f, Effect of dapagliflozin on: death from CV causes (a); death from all causes (b); the total number of hospital admissions for HF (c); time to first hospital admission for HF (d); death from CV causes, MI or stroke (e); and death from CV causes or hospital admission for HF (f), according to baseline NT-proBNP level. The horizontal blue line shows the continuous HR across the range of NT-proBNP levels at baseline and the shaded area around this line represents the 95% CI from Cox’s model. The overall effect of treatment in the pooled population is shown in each panel as an HR (95% CI) with the two-sided P value for Wald’s test of interaction between treatment and NT-proBNP level from Cox’s model. No adjustment for multiple comparisons was made. ᵃRestricted cubic spline and interaction P value derived from LWYY model for total HF hospitalization.

Discussion

In a patient-level pooled meta-analysis of 11,007 participants with HF in DAPA-HF and DELIVER1,2, dapagliflozin 10 mg once daily, compared with placebo, reduced the risk of each of the pre-specified endpoints, that is, death from CV causes (by 14%), death from any cause (by 10%), total (first and repeat) hospital admissions for HF (by 29%) and the composite of death from CV causes, MI or stroke (by 10%), with no evidence of heterogeneity of benefit across the range of ejection fractions.

The original reason for planning a pooled analysis of DAPA-HF and DELIVER was to provide a more statistically robust estimate of the effect of dapagliflozin on outcomes that the individual trials had limited power to examine. Of particular interest were death from CV causes and death from any cause, as neither trial was powered to show a modest, but potentially clinically important, benefit of dapagliflozin on these endpoints. There was a significant benefit of dapagliflozin on death from CV causes in DAPA-HF (HR 0.82 (95% CI 0.69–0.98)), but the present analysis provides a more reliable and precise estimate of the effect of treatment (HR 0.86 (95% CI 0.76–0.97)). In the pooled analysis of DAPA-HF and DELIVER, the number of patients with HF who needed to be treated (NNT) for a median of 22 months to prevent one death from CV causes was 68 (95% CI 39–281). The conclusion for death from any cause was similar: a modest benefit that was statistically significant. The reduction in MACE was of borderline statistical significance. By contrast, the beneficial effect on hospital admissions for HF was substantial, as in the individual trials of SGLT2 inhibitors in HF, and our pooled analysis shows that this large effect of dapagliflozin on a key outcome in patients with HF is generally consistent irrespective of ejection fraction phenotype. Although there was a nominally significant interaction between NYHA class and the effect of dapagliflozin on death from CV causes, NYHA class and KCCQ-TSS were dissociated across the spectrum of baseline LVEF, and the effect of dapagliflozin was consistent across the range of KCCQ-TSS scores included.

The second and potentially more important reason to conduct the pooled analysis of DAPA-HF and DELIVER was to address the surprising findings of a pooled analysis of the EMPEROR trials, which appeared to show that the size of the reduction in risk of hospital admission for worsening HF with empagliflozin declined as LVEF increased, with an apparent loss of effect in patients with an ejection fraction in the region of 60–65%12. Although this attenuation of benefit with increasing ejection fraction has been shown repeatedly with treatments acting on neurohumoral pathways9,10,11, it was not expected with SGLT2 inhibitors. We did not find any attenuation of the effect of dapagliflozin with increasing ejection fraction for any of the outcomes of interest, including the EMPEROR primary endpoint of first hospitalization for HF or death from CV causes, and tests of interaction between ejection fraction and the effect of treatment were consistently nonsignificant. We also found no interaction according to baseline NT-proBNP level as a measure of neurohumoral activation, although the minimum NT-proBNP inclusion threshold in DELIVER was 300 pg ml−1 and some patients with HF with preserved ejection fraction (HFpEF) have levels below this13.

The seemingly contrary findings of the pooled EMPEROR trials12 and the present analysis are not explained by the distribution of ejection fraction, which was similar in each. The pooled analysis of the dapagliflozin trials included 1,289 more patients than the equivalent analysis of the empagliflozin trials. We therefore think that the findings of the present analysis are probably more reliable and that those of the EMPEROR analysis may have been spurious, given that they were unexpected, were observed in a post-hoc analysis and left it uncertain whether there was a significant ejection fraction-by-treatment interaction. However, we cannot conclude that this is definitely the case, and our findings cannot necessarily be generalized to other SGLT2 inhibitors. In addition, in a randomized trial testing the effect of dapagliflozin on symptoms and functional capacity in patients with HFpEF, there was no heterogeneity of treatment effect according to ejection fraction14.

Our findings have clinical implications. Currently, except for diuretics, treatment for HF depends on knowledge of ejection fraction, the measurement of which may not be immediately available, especially where there are limited healthcare resources or geographical or other barriers to obtaining specialist care. The consistency of benefit of SGLT2 inhibitors across the range of ejection fraction, the rapidity with which benefit is obtained15,16, the lack of requirement for titration of dose and the excellent safety profile suggest that this treatment could be initiated while waiting for ejection fraction to be measured. A modeling exercise suggested that first-line treatment with an SGLT2 inhibitor maximizes the benefit of evidence-based treatments in patients with reduced ejection fraction17. Moreover, no other treatment for patients with mildly reduced or preserved ejection fraction has the same strength of evidence as SGLT2 inhibitors18.

Our study has several limitations. LVEF was reported by investigators and was not measured in a core laboratory. As is common, there was digit preference in the reported ejection fraction measurements. However, we minimized this effect by examining all outcomes with ejection fraction modeled as a continuous variable and by using categories based on mid-point ranges rather than round numbers. In addition, DELIVER had a minimum NT-proBNP inclusion threshold of 300 pg ml−1, and some patients with HFpEF are known to have an NT-proBNP level below this value. Consequently, we cannot be sure that our findings generalize to these patients.

Our analysis demonstrates that, in patients with HF, dapagliflozin led to significant reductions in the risk of death from CV causes and any cause, as well as MACE, irrespective of LVEF. There was a larger reduction in total hospital admissions for HF than in death, which was also consistent across the range of ejection fractions. Most patients with HF, regardless of ejection fraction, are likely to benefit from treatment with an SGLT2 inhibitor, although the ARR is somewhat smaller in patients with higher compared with lower ejection fractions. This analysis supports a recommendation that treatment with dapagliflozin can be initiated in patients with a clinical diagnosis of HF and no contraindications, even if a measurement of ejection fraction is awaited.

Methods

Patient-level pooled meta-analysis of DAPA-HF and DELIVER

The design and results of the DAPA-HF (clinicaltrials.gov identifier NCT03036124) and DELIVER (clinicaltrials.gov identifier NCT03619213) trials have been published1,2,6,7. Each enrolled patient had a diagnosis of HF, functional limitation and elevated natriuretic peptides. The principal difference between the two trials was that patients with an ejection fraction ≤40% were randomized in DAPA-HF and those with an ejection fraction >40% in DELIVER. In both trials, patients were randomized to dapagliflozin at a dose of 10 mg once daily, or a matching placebo, in addition to standard care. The ethics committees of the participating institutions approved the protocols and all patients gave written informed consent.

Trial patients

Patients in NYHA functional classes II–IV, with an LVEF ≤ 40% and an elevated NT-proBNP level, were eligible for DAPA-HF. Participants were also required to be receiving guideline-recommended treatments for HF with reduced ejection fraction. The main exclusion criteria were a history of type 1 diabetes mellitus, symptomatic hypotension or a systolic blood pressure <95 mmHg, and an eGFR <30 ml per min per 1.73 m².

Patients in NYHA functional classes II–IV, with an LVEF > 40% and an elevated NT-proBNP level, were eligible for DELIVER. Participants were also required to have evidence of structural heart disease (defined as either left atrial enlargement or left ventricular hypertrophy). All patients in DELIVER had to be receiving at least intermittent diuretic therapy, but no specific background therapy was mandated during the trial. The key exclusion criteria were similar to those in DAPA-HF, although the eGFR threshold was lower in DELIVER (25 ml per min per 1.73 m²).

In both trials, patients with and without type 2 diabetes were randomized, and randomization was stratified by type 2 diabetes status.

Outcomes

Both trials were event driven and had the same primary endpoint: the time to the first occurrence of a composite of worsening HF or death from CV causes. Worsening HF was defined as an unplanned hospital admission for HF or an urgent visit for worsening HF resulting in the administration of an intravenous diuretic.

In the original ‘regulatory’ statistical analysis plan for the meta-analysis (dated 2 August 2019), a pre-specified hierarchy of endpoints was provided with control of alpha (see Statistical analysis below). The endpoints were: death from CV causes; death from any cause; total (that is, first and repeat) hospital admissions for HF (with an additional supportive analysis of time to the first occurrence of hospital admission for HF, outside alpha control); and the composite of death from CV causes, MI or stroke (MACE). Because of the possible attenuation of the benefit of SGLT2 inhibition at higher ejection fractions reported in the EMPEROR trials12 (as described in the introduction), we also examined the composite outcome used in those trials, that is, time to the first occurrence of hospital admission for worsening HF or death from CV causes.

The original statistical analysis plan stated that the consistency of the effect of dapagliflozin on CV death would be examined in a limited number of subgroups defined by age (≤65, >65 years), sex (male, female), race (white, black or African, Asian, other), NYHA class at enrollment (II, III/IV), LVEF at enrollment (≤40, >40%), diagnosis of type 2 diabetes mellitus at baseline (yes, no) and eGFR at baseline (<60 or ≥60 ml per min per 1.73 m²). As described below, additional ejection fraction subgroups were included in an updated statistical analysis plan.

In DAPA-HF, the definition of CV death included any death not judged to have a non-CV cause; that is, it included deaths in which the cause could not be determined. By contrast, in DELIVER, deaths in which the cause could not be determined were excluded from the definition of death from CV causes. In the pre-specified statistical analysis plan for the pooled analysis, the definition of death from CV causes included deaths of undetermined cause. However, we also conducted a sensitivity analysis using the definitions originally employed in the individual trials.
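
The definitions of death from CV causes used in the pooled analysis and its sensitivity analyses can be written as a small classification rule. The following is a minimal sketch that only encodes the rules stated above; the `DeathCause` enum, its labels and the function names are hypothetical.

```python
from enum import Enum


class DeathCause(Enum):
    """Adjudicated cause of death (hypothetical labels for illustration)."""
    CARDIOVASCULAR = "CV"
    NON_CARDIOVASCULAR = "non-CV"
    UNDETERMINED = "undetermined"


def is_cv_death(cause: DeathCause, count_undetermined: bool) -> bool:
    """A CV death always counts; an undetermined death counts only if requested."""
    if cause is DeathCause.CARDIOVASCULAR:
        return True
    if cause is DeathCause.UNDETERMINED:
        return count_undetermined
    return False


def is_cv_death_pooled_primary(cause: DeathCause) -> bool:
    # Pre-specified pooled definition: undetermined deaths count as CV deaths.
    return is_cv_death(cause, count_undetermined=True)


def is_cv_death_trial_definition(cause: DeathCause, trial: str) -> bool:
    # Sensitivity analysis: DAPA-HF counted undetermined deaths as CV; DELIVER did not.
    if trial not in ("DAPA-HF", "DELIVER"):
        raise ValueError(f"unknown trial: {trial}")
    return is_cv_death(cause, count_undetermined=(trial == "DAPA-HF"))


for trial in ("DAPA-HF", "DELIVER"):
    print(trial, is_cv_death_trial_definition(DeathCause.UNDETERMINED, trial))
```

The other sensitivity analysis, which excluded all undetermined deaths, corresponds to `is_cv_death(cause, count_undetermined=False)`.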

MI and stroke were adjudicated in DAPA-HF but not in DELIVER, where serious adverse event reports were used to ascertain these outcomes.

The ‘academic’ statistical analysis plan, dated 30 March 2022, stated that LVEF subgroups in addition to those described in the DELIVER statistical analysis plan (that is, ≤49%, 50–59% and ≥60%) would be considered, to limit the effect of digit preference, and that the effects of treatment would be examined using LVEF as a continuous measure.

Statistical analysis

Before pooling DELIVER and DAPA-HF, between-trial heterogeneity was tested as pre-specified using Q and I² statistics. There was little evidence of heterogeneity for the effect of treatment on the primary outcome, that is, death from CV causes (Q = 0.47, P = 0.50 and I² = 0%).
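
Cochran’s Q is the inverse-variance-weighted sum of squared deviations of each trial’s log HR from the pooled (fixed-effect) log HR, and I² = max(0, (Q − df)/Q), where df is the number of trials minus one. The following is a minimal sketch, assuming each trial contributes an HR with a 95% CI that is symmetric on the log scale; the function names and inputs are hypothetical and are not the trial results. The P value uses the closed form for one degree of freedom, P = erfc(√(Q/2)); with more than two trials a chi-square survival function, such as `scipy.stats.chi2.sf(q, df)`, would be needed.

```python
import math

Z_975 = 1.959963984540054  # standard normal quantile for a two-sided 95% CI


def log_hr_and_se(hr: float, lower: float, upper: float) -> tuple[float, float]:
    """Recover log(HR) and its standard error from an HR and its 95% CI."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * Z_975)


def cochran_q_i2(log_hrs: list[float], ses: list[float]) -> tuple[float, float, float]:
    """Return (pooled log HR, Q, I^2) using fixed-effect inverse-variance weights."""
    weights = [1 / se**2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, log_hrs)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_hrs))
    df = len(log_hrs) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, i2


def q_pvalue_one_df(q: float) -> float:
    """P value for Q with 1 degree of freedom (two trials): P(chi2_1 > q)."""
    return math.erfc(math.sqrt(q / 2))


# Hypothetical HRs (95% CI) for two trials, for illustration only
trials = [log_hr_and_se(0.80, 0.67, 0.96), log_hr_and_se(0.90, 0.76, 1.07)]
pooled, q, i2 = cochran_q_i2([y for y, _ in trials], [se for _, se in trials])
print(f"pooled HR {math.exp(pooled):.2f}, Q = {q:.2f}, "
      f"P = {q_pvalue_one_df(q):.2f}, I^2 = {100 * i2:.0f}%")
```

With two trials, I² is 0% whenever Q ≤ 1, which is why a Q well below 1 is reported alongside I² = 0%.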

The estimand was formulated as follows: treatment with dapagliflozin would reduce the risk of death from CV causes; death from any cause; total (that is, first and repeat) hospital admissions for HF; and the composite of death from CV causes, MI or stroke (MACE) in adults with HF, irrespective of exposure, treatment discontinuation or concomitant treatment. To control the family-wise error rate at the 5% alpha level, a fixed-sequence procedure was used: testing continued down the hierarchy only if the null hypothesis for the preceding endpoint was rejected at the 5% alpha level.
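
A fixed-sequence procedure tests endpoints in their pre-specified order, each at the full 5% level, and stops at the first endpoint whose null hypothesis is not rejected; endpoints after that point are not formally tested. The following is a minimal sketch; the function name `fixed_sequence` is hypothetical. The example uses the P values reported in the Results for the four hierarchical endpoints, with “P < 0.001” entered as 0.001 as a stand-in (any value below 0.05 gives the same decision).

```python
def fixed_sequence(ordered: list[tuple[str, float]], alpha: float = 0.05) -> dict[str, str]:
    """Apply a fixed-sequence (hierarchical) test to endpoints in pre-specified order.

    Each endpoint is tested at the full alpha only while every earlier endpoint
    has been rejected; once one is not rejected, later endpoints are not tested.
    """
    decisions: dict[str, str] = {}
    gate_open = True
    for endpoint, p in ordered:
        if not gate_open:
            decisions[endpoint] = "not tested"
        elif p < alpha:
            decisions[endpoint] = "rejected"
        else:
            decisions[endpoint] = "not rejected"
            gate_open = False  # the hierarchy stops here
    return decisions


hierarchy = [
    ("death from CV causes", 0.01),
    ("death from any cause", 0.03),
    ("total HF hospital admissions", 0.001),  # reported as P < 0.001
    ("CV death, MI or stroke (MACE)", 0.045),
]
for endpoint, decision in fixed_sequence(hierarchy).items():
    print(f"{endpoint}: {decision}")
```

If, hypothetically, death from any cause had had P = 0.08, the remaining two endpoints would be reported as not tested, however small their P values.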

Baseline characteristics were summarized as means (s.d.), medians (IQR) or percentages and described across groups according to ejection fraction. Ejection fraction was normally distributed but showed digit preference; to account for this, sextiles were used to describe the distribution of baseline characteristics. Cochran–Armitage and Cuzick’s tests were used to examine trends across ejection fraction quantiles. Rates were calculated as the total number of events divided by the person-years of follow-up and expressed per 100 person-years. Cox’s models included randomized therapy and were stratified by diabetes status at enrollment and by trial (DAPA-HF or DELIVER); stratifying by trial accounts for clustering within trials by allowing the trial populations to have different baseline risks19. The effect of therapy according to ejection fraction was tested in Cox’s models by entering an interaction term between randomized therapy and ejection fraction, modeled as a continuous variable using a restricted cubic spline. The number of knots was selected by comparing the Akaike information criterion (AIC) across splines with different numbers of knots; the spline with the lowest AIC, which had three knots (at ejection fractions of 6%, 45% and 84%), was used. All models used the full range of ejection fraction values. The interaction was represented graphically by plotting the HR for dapagliflozin versus placebo across the range of ejection fraction. Total HF hospitalizations were analyzed using a joint frailty model with CV death treated as a competing risk20. The model included the treatment term and was adjusted for previous hospital admission for HF, diabetes status at enrollment and trial (DAPA-HF or DELIVER).
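
A restricted cubic spline is cubic between knots and constrained to be linear beyond the outer knots. The following is a minimal sketch of Harrell’s parameterization of the spline basis, using the three knots given above, and of how a treatment HR at a given ejection fraction would be computed from treatment-by-spline interaction coefficients. The function names and the coefficients in the example are hypothetical, and fitting the stratified Cox model itself is outside the scope of this sketch.

```python
import math


def rcs_basis(x: float, knots: list[float]) -> list[float]:
    """Restricted cubic spline basis [x, C_1(x), ..., C_{k-2}(x)] (Harrell's form).

    Nonlinear terms are divided by (t_k - t_1)**2 so they are on a scale
    comparable to x; the basis is linear below t_1 and above t_k.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("a restricted cubic spline needs at least 3 knots")

    def cube_plus(u: float) -> float:
        return max(u, 0.0) ** 3

    scale = (t[-1] - t[0]) ** 2
    denom = t[-1] - t[-2]
    basis = [x]
    for j in range(k - 2):
        c = (cube_plus(x - t[j])
             - cube_plus(x - t[-2]) * (t[-1] - t[j]) / denom
             + cube_plus(x - t[-1]) * (t[-2] - t[j]) / denom)
        basis.append(c / scale)
    return basis


def treatment_hr(ef: float, knots: list[float], beta_treat: float,
                 gamma: list[float]) -> float:
    """HR for treatment vs control at a given EF under a treatment-by-spline interaction.

    log HR(ef) = beta_treat + sum(gamma_i * basis_i(ef)); the EF main-effect
    terms cancel when treatment and control are compared at the same EF.
    """
    return math.exp(beta_treat + sum(g * b for g, b in zip(gamma, rcs_basis(ef, knots))))


KNOTS = [6.0, 45.0, 84.0]  # knot positions reported above (ejection fraction, %)
# Hypothetical coefficients, for illustration only
for ef in (25, 45, 65):
    print(ef, round(treatment_hr(ef, KNOTS, beta_treat=-0.2, gamma=[0.001, -0.002]), 3))
```

A flat HR curve, as reported for every outcome here, corresponds to interaction coefficients close to zero, in which case the HR at every ejection fraction equals exp(beta_treat).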
Nonparametric estimates of the marginal mean cumulative number of hospital admissions for HF over time were calculated allowing for death as a terminal event and were plotted according to the approach of Ghosh and Lin21. To examine the interaction between ejection fraction and the effect of dapagliflozin on CV death and total hospital admissions for HF, a spline term for ejection fraction, as outlined above, was entered into the Lin–Wei–Yang–Ying (LWYY) model, a semiparametric proportional rates model that extends the proportional hazards model to recurrent events22. The RR across the continuous range of ejection fraction was then plotted.
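
The Ghosh and Lin estimator of the mean cumulative number of recurrent events in the presence of a terminal event sums, over event times u, the Kaplan–Meier probability of being alive just before u multiplied by the number of recurrent events at u divided by the number of patients still under follow-up at u. The following is a minimal sketch of the point estimate only (no confidence bands), assuming a simple per-patient record; the `Patient` class, its fields and the example data are hypothetical. It is neither the joint frailty model nor the LWYY model, which require dedicated software.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical per-patient record for recurrent-event data."""
    event_times: list[float]  # times of recurrent events (e.g., HF admissions)
    end_time: float           # time of death or censoring
    died: bool                # True if follow-up ended with death (terminal event)


def ghosh_lin_mean(patients: list[Patient]) -> list[tuple[float, float]]:
    """Return [(time, estimated mean cumulative number of events), ...].

    mu(t) = sum over u <= t of S(u-) * dN(u) / Y(u), where S is the Kaplan-Meier
    survival estimate for death, dN(u) the number of recurrent events at u and
    Y(u) the number of patients still under follow-up at u.
    """
    times = sorted({u for p in patients for u in p.event_times}
                   | {p.end_time for p in patients if p.died})
    surv = 1.0  # S(u-): KM survival just before the current time
    mean = 0.0
    curve = []
    for u in times:
        at_risk = sum(1 for p in patients if p.end_time >= u)
        if at_risk == 0:
            break
        events = sum(p.event_times.count(u) for p in patients)
        deaths = sum(1 for p in patients if p.died and p.end_time == u)
        mean += surv * events / at_risk   # increment uses survival just before u
        surv *= 1 - deaths / at_risk      # then update KM for deaths at u
        curve.append((u, mean))
    return curve


cohort = [
    Patient(event_times=[1.0], end_time=2.0, died=True),
    Patient(event_times=[], end_time=4.0, died=False),
    Patient(event_times=[3.0], end_time=4.0, died=False),
]
print(ghosh_lin_mean(cohort))
```

In this toy cohort the estimate ends at 2/3: the admission at time 3 is weighted by the two-thirds probability of surviving past the death at time 2. Without deaths, the estimator reduces to the total number of events per patient at risk.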

All analyses were conducted using Stata v.17.0 and SAS v.9.4. There were no missing data for the variables used in the models, and missing follow-up data were handled by censoring at the time of the last assessment for potential endpoints. Few patients in either trial had incomplete follow-up. A two-sided P < 0.05 was considered statistically significant.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.