Elsevier

International Journal of Cardiology

Volume 386, 1 September 2023, Pages 149-156

Using machine learning to predict cardiovascular risk using self-reported questionnaires: Findings from the 45 and Up Study

https://doi.org/10.1016/j.ijcard.2023.05.030
Under a Creative Commons license
open access

Highlights

  • Risk prediction models were developed for CVD mortality and IHD hospitalisation.

  • These models were developed using machine learning methods.

  • These models used exclusively self-reported variables for risk prediction.

  • These models may allow for cost-effective CVD risk screening and monitoring.

Abstract

Background

Machine learning has been shown to outperform traditional statistical methods for risk prediction model development. We aimed to develop machine learning-based risk prediction models for cardiovascular mortality and hospitalisation for ischemic heart disease (IHD) using self-reported questionnaire data.

Methods

The 45 and Up Study was a retrospective population-based study in New South Wales, Australia (2005–2009). Self-reported healthcare survey data on 187,268 participants without a history of cardiovascular disease was linked to hospitalisation and mortality data. We compared different machine learning algorithms, including traditional classification methods (support vector machine (SVM), neural network, random forest and logistic regression) and survival methods (fast survival SVM, Cox regression and random survival forest).
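The distinction between classification and survival methods comes down to the target each one learns from. The sketch below is illustrative only and is not the study's code: the function names, the single end-of-follow-up date, and the example dates are assumptions. It shows how one linked record could yield a binary label for a classifier and a (time, event) pair for a survival model.

```python
from datetime import date


def survival_target(baseline, event_date, end_of_follow_up):
    """Return (years_of_follow_up, event_observed) for one participant.

    event_date is None when no event was found in the linked records.
    Hypothetical helper: assumes one shared censoring date and ignores
    competing risks.
    """
    if event_date is not None and event_date <= end_of_follow_up:
        return (event_date - baseline).days / 365.25, True
    # No event seen during follow-up: the participant is right-censored.
    return (end_of_follow_up - baseline).days / 365.25, False


def classification_label(baseline, event_date, end_of_follow_up):
    """Binary label for a classifier: did the event occur during follow-up?"""
    return survival_target(baseline, event_date, end_of_follow_up)[1]


# Made-up dates for illustration only.
years, observed = survival_target(date(2006, 1, 1), date(2010, 1, 1), date(2016, 1, 1))
print(years, observed)
```

A classifier sees only the label, so two participants who had the event after 1 year and after 10 years look identical to it, whereas a survival model also uses the follow-up time and the censoring status.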

Results

A total of 3687 participants experienced cardiovascular mortality and 12,841 participants had IHD-related hospitalisation over a median follow-up of 10.4 years and 11.6 years respectively. The best model for cardiovascular mortality was a Cox survival regression with L1 penalty at a re-sampled case/non-case ratio of 0.3, achieved by under-sampling the non-cases. This model had Uno's and Harrell's concordance indexes of 0.898 and 0.900 respectively. The best model for IHD hospitalisation was a Cox survival regression with L1 penalty at a re-sampled case/non-case ratio of 1.0, with Uno's and Harrell's concordance indexes of 0.711 and 0.718 respectively.
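Two ideas in these results, under-sampling non-cases to a target case/non-case ratio and Harrell's concordance index, can be illustrated with a minimal sketch. It is not the study's implementation: the function names, the random seed, and the toy data are assumptions. Pairs with tied times are simply skipped, and Uno's index, which additionally weights pairs by the inverse probability of censoring, is not shown.

```python
import random


def undersample_non_cases(events, ratio, seed=0):
    """Keep all cases and a random subset of non-cases so that
    cases / non-cases is approximately `ratio`. Returns sorted row indices.

    events: list of bools (True = case). Hypothetical helper for illustration.
    """
    cases = [i for i, e in enumerate(events) if e]
    non_cases = [i for i, e in enumerate(events) if not e]
    # Non-cases needed to reach the target ratio, capped at what is available.
    n_keep = min(len(non_cases), round(len(cases) / ratio))
    kept = random.Random(seed).sample(non_cases, n_keep)
    return sorted(cases + kept)


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data (simplified).

    A pair is comparable when the subject with the shorter time had the
    event. It is concordant when that subject also has the higher risk
    score; tied scores count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up): 3 cases and 20 non-cases, target ratio 0.3.
toy_events = [True] * 3 + [False] * 20
kept_rows = undersample_non_cases(toy_events, ratio=0.3)
print(len(kept_rows))  # 3 cases + 10 non-cases

times = [2.0, 4.0, 6.0, 8.0, 10.0]
events = [True, False, True, False, False]
scores = [0.9, 0.2, 0.3, 0.5, 0.1]
print(harrell_c_index(times, events, scores))
```

In the toy example, 5 of the 6 comparable pairs are concordant, so the index is 5/6. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of event times by predicted risk.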

Conclusion

Machine learning-based risk prediction models developed using self-reported questionnaire data had good prediction performance. These models have the potential to be used as initial screening tests to identify high-risk individuals before they undergo costly investigations.

Keywords

Cardiovascular disease
Classification
Machine learning
Risk prediction
Survey

Abbreviations

ADPC: Admitted Patient Data Collection
ARIA: Accessibility/Remoteness Index of Australia
ATC: Anatomical Therapeutic Chemical
AUPRC: area under the precision-recall curve
AUROC: area under the receiver operating characteristic curve
CHeReL: Centre for Health Record Linkage
COD URF: Cause of Death Unit Record File
CVD: cardiovascular disease
ICD: International Classification of Diseases
IHD: ischemic heart disease
NSW: New South Wales
RBDM: Register of Births Deaths and Marriages
PBS: Pharmaceutical Benefits Scheme
SHAP: SHapley Additive exPlanations
SLA: Statistical Local Area
SEIFA IRSD: Socio-Economic Index for Areas Index of Relative Social Disadvantage
SVM: support vector machine
