Research in context
Evidence before this study
On July 2, 2022, we searched PubMed without language or date restrictions for studies reporting on the development and validation of machine learning-based models for coronary artery disease, including atherosclerosis, death, and myocardial infarction. We used the following terms and related terms: (“machine learning”, “artificial intelligence”, or “random forest”) and (“coronary artery disease”, “atherosclerosis”, “plaque”, or “myocardial infarction”). We identified several machine learning models published in the past decade that predict coronary artery disease. However, these studies used machine learning models as a classification tool to simply predict the case-control status of coronary artery disease (binary framework of disease) and none used models to capture coronary artery disease on a spectrum of disease probabilities (quantitative framework of disease). Many of the studies were based on a limited set of features or predetermined risk factors. Hence, assessments of the clinical utility of coronary artery disease-predictive machine learning models are scarce. Therefore, we investigated probabilities generated by a machine learning model as an in-silico marker for coronary artery disease. We assessed its clinical utility for quantifying atherosclerotic plaque burden, survival, and risk of myocardial infarction on a continuum in a longitudinal multi-ethnic cohort, and we identified individuals with underdiagnosed coronary artery disease as an example of its intervenability. Our multimodal model analyses millions of diverse clinical datapoints of diagnoses, laboratory test results, medications, and vitals contained in the electronic health records (EHRs) of participants.
Added value of this study
To our knowledge, this study is the first to construct a quantitative marker for coronary artery disease risk, severity, and prognosis from a machine learning model trained on clinical data from EHRs. Individuals with common diseases occupy a spectrum of disease that represents an individual's combination of risk factors and pathogenic processes; quantitative differences in coronary stenosis, for example, result in gradations of risk of death. Quantification of where an individual falls on the disease spectrum is needed for clinical screening and management. We developed and externally tested a coronary artery disease-predictive machine learning model using 95 935 EHRs in the multi-ethnic BioMe Biobank and UK Biobank, and from it generated an in-silico score for coronary artery disease (ISCAD). We found that coronary stenosis from angiography data increased quantitatively with ascending ISCAD, including risk of obstructive coronary artery disease, multivessel coronary artery disease, and stenosis of each major coronary artery, such as the left main and proximal left anterior descending arteries. All-cause death increased stepwise with ascending ISCAD, and sequelae, such as recurrent myocardial infarction, rose in gradations with ISCAD. ISCAD showed greater associations with these coronary artery disease outcomes than did conventional risk scores (pooled cohort equations and polygenic risk scores). We identified participants with high ISCAD who had no coronary artery disease diagnosis and found that almost 50% of them had clinical evidence of underdiagnosed coronary artery disease on manual chart review.
Implications of all the available evidence
Our study reconceptualises coronary artery disease (including atherosclerosis, death, and sequelae) as a spectrum of disease that is quantifiable with artificial intelligence trained on clinical data. This in-silico marker derived from machine learning captured coronary artery disease pathophysiology and clinical outcomes on a continuum. The model is holistic in drawing on a wide array of clinical information from population-based biobanks, inclusive in representing diverse populations, and faithful in preserving the complexity of disease. The implementation of machine learning-based quantitative markers for coronary artery disease might help to define the disease state and clinical outcomes in patients, while optimising the detection of disease and reducing underdiagnosis.