
Triage-driven diagnosis of Barrett’s esophagus for early detection of esophageal adenocarcinoma using deep learning

Abstract

Deep learning methods have been shown to achieve excellent performance on diagnostic tasks, but how to optimally combine them with expert knowledge and existing clinical decision pathways is still an open challenge. This question is particularly important for the early detection of cancer, where high-volume workflows may benefit from (semi-)automated analysis. Here we present a deep learning framework to analyze samples of the Cytosponge-TFF3 test, a minimally invasive alternative to endoscopy, for detecting Barrett’s esophagus, which is the main precursor of esophageal adenocarcinoma. We trained and independently validated the framework on data from two clinical trials, analyzing a combined total of 4,662 pathology slides from 2,331 patients. Our approach exploits decision patterns of gastrointestinal pathologists to define eight triage classes of varying priority for manual expert review. By substituting manual review with automated review in low-priority classes, we can reduce pathologist workload by 57% while matching the diagnostic performance of experienced pathologists.
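To make the triage idea concrete, the following minimal sketch simulates the workload/performance trade-off of substituting automated review for manual review, class by class. It is not the authors' pipeline: the triage-class assignments, call accuracies and patient data are synthetic placeholders chosen only to illustrate the mechanism.

```python
# Illustrative sketch of triage-driven substitution (all data synthetic).
import numpy as np

rng = np.random.default_rng(0)
n = 1000
triage_class = rng.integers(1, 9, n)   # eight triage classes, 1 = lowest priority (assumed)
truth = rng.integers(0, 2, n)          # ground-truth Barrett's status (synthetic)
model_call = np.where(rng.random(n) < 0.90, truth, 1 - truth)    # synthetic automated calls
expert_call = np.where(rng.random(n) < 0.97, truth, 1 - truth)   # synthetic expert calls

def sensitivity(y_true, y_pred):
    return np.sum((y_true == 1) & (y_pred == 1)) / np.sum(y_true == 1)

# Substitute automated review for manual review in classes 1..k, lowest priority first
for k in range(9):
    automated = triage_class <= k
    final = np.where(automated, model_call, expert_call)
    print(f"classes 1..{k} automated | manual workload {1 - automated.mean():.0%} "
          f"| sensitivity {sensitivity(truth, final):.3f}")
```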


Fig. 1: Cytosponge procedure, triage scheme and data summary.
Fig. 2: Tile- and patient-level classification of Cytosponge-TFF3 samples.
Fig. 3: Application of quality control and diagnostic confidence class scheme to the internal validation cohort.
Fig. 4: Triage-driven approach with incremental triage class substitution scheme on internal validation set.
Fig. 5: Triage model applied to the external validation cohort and simulation of cohort variation.


Data availability

The dataset is governed by data usage policies specified by the data controller (University of Cambridge, Cancer Research UK). We are committed to complying with Cancer Research UK’s Data Sharing and Preservation Policy. Whole-slide images used in this study will be available for non-commercial research purposes upon approval by a Data Access Committee according to institutional requirements. Applications for data access should be directed to rcf29@cam.ac.uk. Data derived from the raw images are freely available at a public repository: https://github.com/markowetzlab/cytosponge-triage. The code and included data enable replication of the results and figures in this manuscript.

Code availability

The source code of this work is freely available at a public repository: https://github.com/markowetzlab/cytosponge-triage.


Acknowledgements

This research was supported by Cancer Research UK (F.M.: C14303/A17197), the Medical Research Council (R.C.F.: RG84369) and Cambridge University Hospitals NHS Foundation Trust. BEST2 was funded by Cancer Research UK (12088 and 16893). M.G. acknowledges support from an Enrichment Fellowship from the Alan Turing Institute. M.C.O. acknowledges support from a Borysiewicz Fellowship from the University of Cambridge and a Junior Research Fellowship from Trinity College, Cambridge. F.M. is a Royal Society Wolfson Research Merit Award holder. We thank M. Schneider, R. Drews, P. Martinez-Gonzalez and T. Whitmarsh for valuable input on this work. The authors thank the NIHR Cambridge Biomedical Research Centre (BRC-1215-20014) and the Experimental Cancer Medicine Centre for their support and for providing the infrastructure for the research procedures in Cambridge. The views expressed are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care. In addition, we thank the Human Research Tissue Bank at Addenbrooke's Hospital, which is supported by the UK National Institute for Health Research Cambridge Biomedical Research Centre. Finally, we thank the BEST2 trial team, the Histopathology core facility at the Cancer Research UK Cambridge Institute and Pathognomics Ltd. for their support.

Author information


Contributions

M.G. conceived and led the analysis. M.C.O. and A.B. contributed to the analysis. M.G. and A.B. wrote the code for analysis. M.O. and R.C.F. were involved in the collection and labeling of the data. R.C.F. conceived the study. R.C.F. and F.M. directed the project. M.G. and F.M. wrote the manuscript with the assistance and feedback of all other co-authors.

Corresponding authors

Correspondence to Rebecca C. Fitzgerald or Florian Markowetz.

Ethics declarations

Competing interests

The Cytosponge device technology and the associated TFF3 biomarker are licensed to Covidien GI solutions (now owned by Medtronic) by the Medical Research Council. M.G., M.C.O. and F.M. are named inventors on a patent pertaining to technology applied in this work. R.C.F. and M.O. are named inventors on patents pertaining to the Cytosponge and associated technology. M.G., M.O. and R.C.F. are shareholders of Cyted Ltd., a company working on early detection technology.

Additional information

Peer review information Nature Medicine thanks Marnix Jansen, Nasir Rajpoot and Pratik Shah for their contribution to the peer review of this work. Javier Carmona was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Differential increase of training partition size for ResNet-18.

Training subset refers to the relative proportion of the training partition used in the model training phase. Development subset refers to the relative proportion of the training partition used in the model development phase. The peak development weighted recall (a) and precision (b) correspond to the best-performing cohort for each training run. The size of the development set was fixed at 15 patients. For each patient, an average of 3,500 tiles was used. For both H&E and TFF3, no substantial increase in performance metrics was observed beyond a training subset size of 50 patients. Individual Cytosponge H&E sections are already highly heterogeneous, which limits the value gained by increasing the size of the training dataset. We opted to retain all annotated data in the training set to maximize the chances of capturing the whole spectrum of data variability, and thereby the robustness of the model. The H&E model benefited more from an increased number of patients than the TFF3 model, a difference associated with the greater complexity of detecting diverse tissue morphologies on H&E versus brown goblet cells on TFF3. In TFF3 slides, regions were extensively annotated by pathologists, and this ground truth served as the comparator for the recall reported in both panels.

Extended Data Fig. 2 Comparison of pathologist landmarks with saliency maps extracted from VGG-16 architectures.

Additional examples of saliency maps for hematoxylin & eosin stain (squamous cells and columnar epithelium) and trefoil factor 3 (positive goblet cells). Landmarks selected by an experienced pathologist are shown as overlays with red borders on pathology tile images. For all classes, there was visual agreement between the areas highlighted by the pathologist and the saliency map activations.
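The mechanics of extracting such a map can be sketched in PyTorch as follows. This is a plain gradient-saliency variant, not the authors' code: the VGG-16 weights here are untrained, and `tile.png` is a hypothetical input file standing in for a Cytosponge tile.

```python
# Minimal gradient-saliency sketch for a VGG-16 tile classifier (illustrative only).
import torch
from torchvision import models, transforms
from PIL import Image

model = models.vgg16(weights=None)      # in practice: the trained tile classifier
model.eval()

preprocess = transforms.Compose([transforms.Resize((224, 224)),
                                 transforms.ToTensor()])
tile = preprocess(Image.open("tile.png").convert("RGB")).unsqueeze(0)
tile.requires_grad_(True)

logits = model(tile)
target = logits.argmax(dim=1).item()    # e.g. the 'positive goblet cells' class
logits[0, target].backward()

# Per-pixel saliency: maximum absolute gradient over the three colour channels
saliency = tile.grad.abs().max(dim=1).values.squeeze(0)   # shape: 224 x 224
```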

Extended Data Fig. 3 Determination of probability thresholds used to obtain tile counts.

Both plots show the AUC-ROC for individual probability thresholds (after softmax) that are used to decide whether a tile falls into the relevant class. a, AUC-ROC for quality control (QC) ground truth determined by the pathologist compared with the number of tiles containing columnar epithelium at individual probability thresholds. b, AUC-ROC for diagnosis ground truth determined by endoscopy (with confirmed IM on pathology) compared with the number of tiles containing positive goblet cells at individual probability thresholds.
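The threshold sweep itself is straightforward to sketch: for each softmax threshold, count each patient's tiles exceeding it and score the counts against the ground-truth labels. In the snippet below, the labels and tile probabilities are randomly generated placeholders, so the printed AUC values are meaningless; only the procedure mirrors the caption.

```python
# Sketch of the per-threshold AUC-ROC computation (synthetic placeholder data).
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 100)                 # e.g. endoscopy-confirmed IM (synthetic)
# per patient: one softmax probability for the relevant class per tile
tile_probs = [rng.random(rng.integers(500, 5000)) for _ in labels]

for threshold in np.round(np.arange(0.50, 1.00, 0.05), 2):
    counts = [np.sum(p > threshold) for p in tile_probs]
    print(f"threshold {threshold:.2f}: AUC-ROC {roc_auc_score(labels, counts):.3f}")
```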

Extended Data Fig. 4 Performance of all deep learning architectures on the calibration cohort.

a, ROC analysis of the number of tiles containing columnar epithelium on H&E compared with pathologist ground truth from Cytosponge. b, ROC analysis of the number of tiles containing positive goblet cells on TFF3 compared with pathologist ground truth from Cytosponge. c, ROC analysis of the number of tiles containing positive goblet cells on TFF3 compared with endoscopy (with confirmed IM) ground truth. A weak dependency of AUC on architecture complexity can be observed.

Extended Data Fig. 5 Performance of all deep learning architectures on the internal validation cohort.

a, ROC analysis of the number of tiles containing columnar epithelium on H&E compared with pathologist ground truth from Cytosponge. b, ROC analysis of the number of tiles containing positive goblet cells on TFF3 compared with pathologist ground truth from Cytosponge. c, ROC analysis of the number of tiles containing positive goblet cells on TFF3 compared with endoscopy (with confirmed IM) ground truth. As in the calibration cohort, a weak dependency of AUC on architecture complexity can be observed.

Extended Data Fig. 6 Application of quality control and diagnostic confidence class scheme to calibration cohort.

The lines indicate operating points chosen by three different expert observers. a, Quality ground truth by pathologist from Cytosponge (top) compared with the number of columnar epithelium (CE) tiles on H&E detected by VGG-16 (bottom). For the first operating point, E#2 and E#3 agreed, whereas E#1 selected a higher cut-off; majority voting resulted in the lower cut-off being chosen. For the second operating point, all three observers (E#1, E#2 and E#3) agreed on the same threshold; the line drawn by E#1 effectively resulted in the same operating point as E#2 and E#3. b, Diagnosis ground truth by pathologist from Cytosponge (top) and endoscopy (with confirmed IM on biopsy) ground truth (middle) compared with the number of TFF3-positive tiles detected by ResNet-18 (bottom). For both the first and second operating points, E#1, E#2 and E#3 agreed; the line drawn by E#3 for the second operating point effectively resulted in the same operating point as E#1 and E#2.
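The majority-vote resolution of observer cut-offs is simple enough to sketch directly; the cut-off values below are invented for illustration and are not those chosen in the study.

```python
# Sketch of majority voting over observers' operating points (invented values).
from collections import Counter

cutoffs = {"E#1": 20, "E#2": 10, "E#3": 10}   # tiles with columnar epithelium (hypothetical)
value, n_votes = Counter(cutoffs.values()).most_common(1)[0]
print(f"chosen operating point: {value} tiles ({n_votes} of {len(cutoffs)} votes)")
```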

Extended Data Fig. 7 Performance of semi-automated, triage-driven model on external validation cohort.

a, Cumulative substitution scheme starting with fully manual review, followed by substitution with automated review of class 1, then classes 1 and 2, and so on. b, Cumulative substitution scheme starting with fully manual review, followed by substitution with automated review of class 8, then classes 8 and 7, and so on.


About this article

Cite this article

Gehrung, M., Crispin-Ortuzar, M., Berman, A.G. et al. Triage-driven diagnosis of Barrett’s esophagus for early detection of esophageal adenocarcinoma using deep learning. Nat Med 27, 833–841 (2021). https://doi.org/10.1038/s41591-021-01287-9
