Abstract
Although deep learning algorithms show increasing promise for disease diagnosis, their use with rapid diagnostic tests performed in the field has not been extensively tested. Here we use deep learning to classify images of rapid human immunodeficiency virus (HIV) tests acquired in rural South Africa. Using newly developed image capture protocols with the Samsung SM-P585 tablet, 60 fieldworkers routinely collected images of HIV lateral flow tests. From a library of 11,374 images, deep learning algorithms were trained to classify tests as positive or negative. A pilot field study of the algorithms deployed as a mobile application demonstrated high levels of sensitivity (97.8%) and specificity (100%) compared with traditional visual interpretation by humans—experienced nurses and newly trained community health worker staff—and reduced the number of false positives and false negatives. Our findings lay the foundations for a new paradigm of deep learning–enabled diagnostics in low- and middle-income countries, termed REASSURED diagnostics1, an acronym for real-time connectivity, ease of specimen collection, affordable, sensitive, specific, user-friendly, rapid, equipment-free and deliverable. Such diagnostics have the potential to provide a platform for workforce training, quality assurance, decision support and mobile connectivity to inform disease control strategies, strengthen healthcare system efficiency and improve patient outcomes and outbreak management in emerging infections.
Main
Rapid diagnostic tests (RDTs) save lives by informing case management, treatment, screening, disease control and elimination programs1. Lateral flow tests are among the most common RDTs, and hundreds of millions of these tests are performed worldwide each year. They have the potential to support near-person testing and decentralized management of a range of clinically important diseases (including malaria, HIV, syphilis, tuberculosis, influenza and noncommunicable diseases2), making it convenient for the end user and more affordable for health systems3. However, RDTs also present some issues, namely: errors in performing the test and interpreting the result4,5, quality control and lack of electronic data capture records of the test and results within health systems and surveillance. Many of these would be overcome with the real-time connectivity associated with REASSURED—the new criterion for an ideal test to reflect the importance of digital connectivity, coined by Peeling and coworkers1. Real-time connectivity involves the use of mobile-phone-connected RDTs. To date there have been few peer-reviewed studies or evaluations of the effectiveness of connected lateral flow tests at scale in populations in low- and middle-income countries.
Recent studies comparing the human interpretation of an HIV RDT to various gold standards, such as immunoblot6,7,8,9, enzyme immunoassay7,9,10,11, standardized test panels12 or different HIV RDTs13,14,15, have highlighted the common issue of subjective interpretation of the test result, which can lead to incorrect diagnosis. User error (especially in the case of weak reactive lines) and inadequate supervision of testers were identified as prime factors for misinterpretation16. In a study of users with differing levels of experience interpreting results of HIV RDTs by looking at pictures of tests17, the accuracy of interpretation varied between 80 and 97%. This highlights the importance of experience in reading the test, as well as the subjectivity involved in reading a weak test line. Evidence also suggests that some fieldworkers struggle to interpret RDTs because of color blindness or short-sightedness18. Another study used photographs of HIV RDTs to quantify the subtle difference in tests with faint lines declared as true positive (TP) or false positive (FP) by a panel of human users19. While these were small-scale studies (n = 148 and 8, respectively), both highlighted the potential for photographs to improve quality control and decision making.
Deep learning algorithms, harnessing advances in large datasets and processing power, have recently shown the ability to exceed human performance in a plethora of visual tasks, including cell-based diagnostics20, interpretation of dermatologic21, ophthalmologic22 and radiographic images23, playing strategic games24 and in clinical medicine when used alongside appropriate guidelines25,26. While some emerging studies are looking at the application of deep learning to the interpretation of RDTs27,28, little is known about the ability of machine learning models to analyze field-acquired diagnostic test data, with concerns about the potential lack of uniformity of images (for example, focus and tilt), harsh environmental factors such as lighting (for example, brightness and shadowing), and the variety of test types. In addition, there is a general lack of large real-world datasets available to successfully train deep learning classifiers, particularly from low- and middle-income countries. Recent advances in consumer electronic devices and deep learning have the potential to improve RDT quality assurance, staff training and connectivity, eventually supporting self-testing such as for HIV, which has been shown to be cost effective29, to appeal to young people30 and to help reduce anxiety31.
Mobile health (mHealth) approaches, which marry RDTs with widely available mobile phones, take advantage of inbuilt sensors (for example, cameras) found in the phones, battery life, processing power, screens to display results and connectivity to send results to health databases. A recent field study has shown high levels of acceptability for a device sending HIV RDT results to online databases in real time32. An array of approaches have been piloted at small scales (n ≤ 283) and have shown good performance. However, most require a physical attachment such as a dongle (92–100% sensitivity, 97–100% specificity)33, a cradle34 or a portable reader (97–98% sensitivity)35, which increases cost and complexity, and these are typically reliant on simple image analysis software.
We explore the potential of deep learning algorithms to classify field-based RDT images as either positive or negative, focusing on HIV as an exemplar and piloting at scale in population ‘test beds’ in KwaZulu-Natal, typical of semi-rural settings in sub-Saharan Africa. Figure 1 shows the concept of our deep learning–enabled REASSURED diagnostic system to capture and interpret RDT results. Our approach first involved building a large image library of field-acquired test images as a training dataset, optimizing algorithms for high sensitivity and specificity and then deploying our classifier in a pilot study to assess its performance compared to traditional visual interpretation with a range of end users having varying levels of training.
Our standard image collection protocol (Fig. 2a) and library are described in Methods. In brief, 11,374 photographs of HIV RDTs were captured by >60 fieldworkers using Samsung tablets (SM-P585, 8-megapixel camera, f/1.9 with autofocus capability). Embedding of routine image collection into staff workflows was acceptable and feasible, and participant consent rate was 96%. We optimized our mHealth system for the two different HIV RDTs used in the study as part of routine household population surveillance. At first glance these RDTs appear similar, but they have different features and numbers of test lines. To reduce the number of variables, we cropped the images around the region of interest (ROI) (Fig. 2b). Figure 2c shows a snapshot of the very diverse real-world field conditions where the images were captured (indoors, outdoors, in the shade and in direct sunlight).
Each image was labeled (Methods) according to the test result. Figure 3a details the number of images used to train classifiers to automatically read the result of HIV RDT images. The training process is described in Methods. To test the reproducibility of the process, we performed a tenfold cross-validation. As can be seen in Fig. 3b, the average sensitivity (95.9 ± 5.1% for type A, 98.7 ± 1.7% for type B) and specificity (99.0 ± 0.6% for type A, 99.8 ± 0.2% for type B) achieved across the ten folds were high and consistent for both types of HIV RDT. We therefore used all available data to train a final classifier for each type of test, which were then used in our field study. We investigated different common classification methods in use for clinical diagnostics (support vector machine36 (SVM) and convolutional neural networks (CNNs)), including three different CNN architectures (ResNet50 (ref. 37), MobileNetV2 (refs. 38,39) and MobileNetV3 (ref. 40)), and found MobileNetV2 the most appropriate for our task, as can be seen in Fig. 3c.
We then conducted a field pilot study in rural South Africa to assess the performance of our mHealth system compared to visual interpretation, with a range of end users having varying levels of training (Methods). Five participants (two nurses and three newly trained community health workers) were each asked to give their interpretation of 40 HIV RDTs and to acquire a photograph of the RDT via the application. The plastic trays used to collect the image library were not used in this pilot study. All five participants (100%) were able to use our mHealth system without training, demonstrating its feasibility and acceptability. The photographs were then evaluated by an expert RDT interpreter, followed by our deep learning algorithms on a secure server. The results were not fed back to the study participants, to avoid confirmation bias. The performance results can be seen in Fig. 4.
When comparing the traditional visual interpretation of RDTs we observed varied levels of agreement between participants (61–100%) as can be seen in Fig. 4a. As expected, agreement between nurses (N1 and N2: 100 and 94.4% agreement for test types A and B, respectively) was greater than that between newly trained community health workers (C1, C2 and C3: 80–90 and 61.1–94.4% for test types A and B, respectively). Test type B showed the lower level of agreement. The low level of agreement between participants, and variability due to the type of HIV RDT, were of concern and highlighted the need for a more objective and consistent method to interpret HIV RDTs in the field. The confusion matrices in Fig. 4b demonstrate that our mHealth system reduced the number of errors in reading RDTs. The number of FP results from our mHealth system was found to be lower than that for traditional visual interpretation (0 compared to 11—the largest variation being observed for community health workers, 10), which translates as an improvement in specificity from 89 to 100% and an improvement in positive predictive value from 88.7 to 100%. Similarly, the number of false-negative (FN) results was just two in our mHealth system compared to four for traditional visual interpretation, which translates as an improvement in sensitivity from 95.6 to 97.8% and an improvement in negative predictive value from 95.7 to 98%. We plotted the ratio of our mHealth system performance to participant performance, for both sensitivity and specificity (Fig. 4c). All participants had a sensitivity index ≥1 for test type A; four out of five participants (N1, N2, C1 and C2) also had the same index for test type B, demonstrating that our mHealth system was more effective than those participants at reading positive test results. Our system was also more reliable at reading negative tests, because all participants had a specificity index ≥1 for both types of HIV RDT.
We acknowledge the following limitations of our study. First, our pilot study involved a relatively small number of participants (five) although we note this is comparable to other similar pilot studies reported in the field. In future, larger evaluation studies and clinical trials will be needed to assess the performance of the system, involving participants with a broader range of demographics including age, gender and different levels of digital literacy, as well as more expert readers. In addition, future studies would benefit from the inclusion of an invalid test classifier and different mobile phone types with varying camera specifications. Although images were analyzed on a secure server, future analysis could be on-device and thus overcome the need to upload images. We are also currently investigating an image segmentation approach using deep learning for the next iteration of the smartphone application.
To conclude, we have demonstrated the potential of deep learning for accurate classification of RDT images, with an overall performance of 98.9% accuracy, notably higher than that of traditional visual interpretation by study participants (92.1%), which is itself comparable to previous reports of 80–97% accuracy17. Given that >100 million HIV tests are performed annually, even a small improvement in quality assurance could impact the lives of millions of people by reducing the risk of FP and FN. We believe our real-world image library is the first of its kind at this scale and we demonstrate that deep learning models can be deployed with mobile devices in the field, without the need for cradles, dongles or other attachments. It lays the foundation for deep learning–enabled REASSURED diagnostics, demonstrating that RDTs linked to a mobile device could standardize the capture and interpretation of test results for decision makers, reducing interpretation and transcription errors and workforce training. Our findings are based on HIV testing decision support for fieldworkers, nurses and community health workers, but in future could be applicable to decision support for self-testing. We focused on HIV as an exemplar, but the capacity of the classifier for adaptation to two different test types suggests that it is amenable to a large range of RDTs spanning both communicable and noncommunicable diseases. This platform could be utilized for workforce training, quality assurance, decision support and mobile connectivity to inform disease control strategies, strengthen healthcare system efficiency and improve patient outcomes and outbreak management. The ideal connected system would link connected RDTs to laboratory systems, whereby remote monitoring of RDT functionality and utilization could also allow health programs to optimize testing deployment and supply management to deliver sustainable development goals and ensure that no one is left behind. The real-time alerting capability of connected RDTs could also support public health outbreak management by mapping ‘hotspots’ for epidemics, including COVID-19, to protect populations.
Methods
Ethics
Ethical approval for the demographic surveillance study was granted by the Biomedical Research Ethics Committee of the University of KwaZulu-Natal, South Africa (no. BE435/17). Separate informed consent was required for the main household survey, the HIV sero-survey, the HIV point of care test and photographs of the HIV test.
Ethical approval for the collection of human blood samples used in the pilot study was granted by the Biomedical Research Ethics Committee of the University of KwaZulu-Natal, South Africa (no. BFCJ 11/18).
Recruitment of participants to the Africa Health Research Institute Population Implementation Platform for the image library
Eligible participants were all individuals aged 15 years and older and resident within the geographic boundaries of the Africa Health Research Institute (AHRI) population intervention program surveillance area (see ref. 41 for the cohort profile). Individuals who had died or outmigrated before the surveillance visit were no longer eligible. There were three contact attempts by the fieldworker team and a further three contact attempts by a tracking team before an individual was considered uncontactable. All individuals in the study gave informed consent. Specifically, all contacted eligible individuals who gave informed consent for this study were offered a rapid HIV test if they were not currently being administered antiretroviral therapy. For children under the age of 18 years, written consent for rapid HIV testing was obtained from the parent or guardian and assent from the participant.
HIV RDT image library collection
The original RDT image library was collected in rural South Africa by a team of 60 fieldworkers between 2017 and 2019. AHRI fieldworkers survey a population of 170,000 people in rural KwaZulu-Natal. Participants were visited at their home; those giving informed consent were tested for HIV using a combination of two HIV RDTs and, following further consent, a photograph of their two HIV RDTs was captured by the fieldworker on a tablet at the time of interpretation. Both HIV RDTs were used as part of routine demographic surveillance in AHRI. The test type continued to change during this study following recommendations by the South African government, exemplifying the need for robust systems in reading multiple test formats.
While the two HIV RDTs used in this study have their own instructions for use (see manufacturer’s instructions), they all generally follow the same principle of collecting a drop of blood from the participant’s fingertip, delivering that drop of blood to the sample pad and using a drop of chase buffer to facilitate sample flow through the length of the paper strip. The result (a combination of one or two lines appearing on the paper strip) is then read out after a period of 10–40 min, depending on the type of HIV RDT used.
For minimal disturbance of workflow, a plastic tray designed to hold both HIV RDTs was given to each fieldworker (Fig. 2a). This ensured that fieldworkers were required to capture only one image per participant. The tasks of separating the two HIV RDTs and isolating the ROI used to train the classifier were conducted further down the line as part of data preprocessing.
A standard operating procedure (SOP) on how to capture the image was cocreated and optimized with the team of fieldworkers; a copy of the SOP can be found in Extended Data Fig. 1. The SOP was designed to minimize the impact of environmental factors, as well as to ensure a standard means of capturing images. All fieldworkers attended a 2-day initial training program during which the objectives of data collection and design of the plastic tray were clearly explained, and each fieldworker was personally trained and given feedback on how to capture valid photographs. A training protocol was also established to ensure that newly enrolled fieldworkers who did not attend the initial training session could also be trained to capture images for the project. Finally, picture quality assessment sessions were conducted to give the fieldworkers team feedback, and to ensure that most images were of sufficient quality for use in training the classifier.
All images were captured using Samsung tablets (SM-P585, 8-megapixel camera, f/1.9 with autofocus capability) using the native Android camera application and stored on the device until the end of the day, when they were transferred to a secure database at AHRI. Our mHealth system allows the saving of only one picture per test and per participant to the tablet and uploading to the AHRI database. After anonymization (including stripping of geocoordinates from the image EXIF data), batches of 2,000–3,000 images were securely transferred to University College London team members on a quarterly basis and stored securely in a ‘data-safe haven’ managed by the university.
Levels of both feasibility (93%) and acceptability (98%) of the system used to capture HIV RDT images were high, according to a survey taken by fieldworkers involved in the study.
For the purposes of this study, an initial batch of 11,374 images was used. Because very few invalid results were obtained from the field, it was decided, for the purposes of this proof-of-concept study, to focus on training the classifier to distinguish between positive and negative results. To optimize this task, the ROI around each HIV RDT was isolated and used to train the classifier.
Image labeling
All preprocessed images were labeled by a group of three RDT experts (99.2% agreement with fieldworkers’ labeling). Labeling is the process of sorting images into categories, which are then used to train the classifier. The categories chosen here correspond to the possibilities for the HIV RDT result—that is, positive and negative. We recognize that a third outcome, ‘invalid’, is also possible and needs to be considered when using the system to provide a confident diagnosis. However, the absence of invalid test results in our library of images collected by fieldworkers did not allow us to train the classifier on this third category in the present study. We therefore focused training on the two main categories (positive and negative), and are exploring other ways to incorporate the invalid outcome in our mHealth system. This could mean either using data augmentation techniques on the low numbers of invalid test results images, or adding a preprocessing step to detect the presence of a control line on the image before deciding to feed it (or not, in the case where the control line is absent) to the classifier.
Training library
The labeled images were divided into two subcategories corresponding to the HIV RDT type. The two types of test in our library are:
- Type A: ABON HIV 1/2/O Tri-Line Human Immunodeficiency Virus Rapid Test Device (whole blood/serum/plasma) (ABON Biopharm (Hangzhou) Co., Ltd)
- Type B: ADVANCED QUALITY ONE STEP Anti-HIV (1&2) Test (InTec PRODUCTS, INC.).
While two tests were administered per patient, in this study we treat each test individually since the tests are from different manufacturers and therefore could respond differently to the same blood sample. The collection system design also guaranteed that there was never more than one image of a given test per participant.
Image normalization
Before being used for training, each image was resized to the dimensions of the input layer and then standardized. Standardization of the data was performed using equation (1) below, where \(x_{\mathrm{s}}\) is the standardized pixel value, \(x_{\mathrm{o}}\) the original pixel value and \(\mu\) and \(\sigma\) are the mean and s.d. of all pixels in the image, respectively.
\(x_{\mathrm{s}} = \frac{x_{\mathrm{o}} - \mu}{\sigma}\) (1)
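As an illustration of the standardization step (subtract the per-image mean, divide by the per-image s.d.), a minimal NumPy sketch follows. The function name `standardize_image`, the `eps` safeguard and the 224 × 224 × 3 example shape are assumptions made for this sketch and are not taken from the study code; resizing to the input layer is assumed to have been done beforehand.

```python
import numpy as np


def standardize_image(image: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Per-image standardization: (pixel - mean) / s.d. over all pixels."""
    pixels = image.astype(np.float32)
    mu = pixels.mean()      # mean of all pixels in this image
    sigma = pixels.std()    # s.d. of all pixels in this image
    # eps avoids division by zero for a perfectly flat image
    # (a safeguard added for this sketch, not described in the source).
    return (pixels - mu) / max(float(sigma), eps)


# Example on a synthetic ROI (shape is an assumption, not from the source).
rng = np.random.default_rng(0)
roi = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
standardized = standardize_image(roi)
```

After this step every image has approximately zero mean and unit s.d., which removes part of the brightness and contrast variation between photographs.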
Cross-validation
Each dataset (one for each type of HIV RDT) was randomly divided into ten equal folds. Using a leave-one-fold-out scheme, ten classifiers were trained, each using nine folds as the training set (further randomly divided into 80% training and 20% validation). To account for imbalanced datasets (roughly 13:1 negative:positive ratio), we forced every batch during training to contain 50% positive images and 50% negative images using random sampling. Each model was then optimized by creating a receiver operating characteristic curve using the validation set. This yielded an optimal threshold, which was used to evaluate the model performance on the testing set (the remaining tenth fold). The deployment models were obtained by retraining using all the available data, for each type of HIV RDT. All training and evaluation were conducted using the scikit-learn and TensorFlow libraries in Python.
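The three mechanical parts of this procedure (fold splitting, class-balanced batches and threshold selection on the validation set) can be sketched as follows. This is a NumPy-only illustration, not the study code: the function names are hypothetical, sampling with replacement is an assumption, and Youden's J statistic is an assumed criterion because the text does not state how the optimal threshold was chosen from the receiver operating characteristic curve.

```python
import numpy as np


def fold_splits(n_samples, n_folds, rng, val_fraction=0.2):
    """Yield (train, val, test) index arrays; each fold is the test set once."""
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for k in range(n_folds):
        test = folds[k]
        rest = np.concatenate([f for i, f in enumerate(folds) if i != k])
        rest = rng.permutation(rest)
        n_val = int(round(val_fraction * len(rest)))  # 20% of the nine folds
        yield rest[n_val:], rest[:n_val], test


def balanced_batch_indices(labels, batch_size, rng):
    """Draw one batch with half positive and half negative samples."""
    labels = np.asarray(labels)
    pos = np.flatnonzero(labels == 1)
    neg = np.flatnonzero(labels == 0)
    half = batch_size // 2
    # Sampling with replacement is an assumption of this sketch.
    batch = np.concatenate([
        rng.choice(pos, size=half, replace=True),
        rng.choice(neg, size=batch_size - half, replace=True),
    ])
    rng.shuffle(batch)
    return batch


def pick_threshold(y_true, scores):
    """Scan candidate thresholds and keep the one maximising Youden's J.

    Both classes must be present in y_true.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    best_t, best_j = 0.5, -1.0
    for t in np.unique(scores):
        pred = scores >= t
        tp = np.sum(pred & (y_true == 1))
        fn = np.sum(~pred & (y_true == 1))
        tn = np.sum(~pred & (y_true == 0))
        fp = np.sum(pred & (y_true == 0))
        j = tp / (tp + fn) + tn / (tn + fp) - 1.0  # sensitivity + specificity - 1
        if j > best_j:
            best_t, best_j = float(t), float(j)
    return best_t, best_j


# Toy data with roughly the 13:1 negative:positive ratio mentioned above.
rng = np.random.default_rng(0)
labels = np.array([0] * 1300 + [1] * 100)
train_idx, val_idx, test_idx = next(fold_splits(len(labels), 10, rng))
batch = balanced_batch_indices(labels, batch_size=32, rng=rng)
threshold, youden_j = pick_threshold(
    [0, 0, 0, 0, 1, 1], [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
)
```

In practice the scores passed to `pick_threshold` would be the classifier outputs on the validation split, and the chosen threshold would then be applied unchanged to the held-out fold.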
Comparison with established classification methods
The SVM was trained using preprocessed features extracted using the histogram of oriented gradients, with principal component analysis used to filter out less important features. The three CNNs (ResNet50, MobileNetV2 and MobileNetV3) were pretrained using the ImageNet dataset then retrained using our dataset. For all four methods, training and evaluation were conducted using the scikit-learn and TensorFlow libraries in Python.
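For the SVM baseline, a possible scikit-learn arrangement is a pipeline in which principal component analysis precedes the classifier. The sketch below is illustrative only: random vectors stand in for histogram-of-oriented-gradients descriptors, and the retained variance (95%), the RBF kernel and the class weighting are assumptions, since the text does not report these settings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic stand-ins for HOG descriptors (assumption: 64 features per image).
rng = np.random.default_rng(0)
n_per_class, n_features = 200, 64
negatives = rng.normal(0.0, 1.0, size=(n_per_class, n_features))
positives = rng.normal(0.0, 1.0, size=(n_per_class, n_features))
positives[:, :8] += 2.0  # make the two classes separable in a few dimensions
X = np.vstack([negatives, positives])
y = np.array([0] * n_per_class + [1] * n_per_class)

# Simple hold-out split for the illustration.
order = rng.permutation(len(y))
train_idx, test_idx = order[:300], order[300:]

# PCA keeps the components explaining 95% of the variance (assumed value),
# then an RBF-kernel SVM (assumed kernel) classifies the reduced features.
svm_baseline = make_pipeline(
    PCA(n_components=0.95, svd_solver="full"),
    SVC(kernel="rbf", class_weight="balanced"),
)
svm_baseline.fit(X[train_idx], y[train_idx])
accuracy = svm_baseline.score(X[test_idx], y[test_idx])
```

Wrapping both steps in one pipeline ensures that the principal components are fitted on the training split only, which avoids leaking information from the evaluation data.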
Android application
We developed a smartphone/tablet Android application designed for end users to capture a picture of their HIV RDT at the time of reading of the test result. Together with end users, we optimized the design to maximize the simplicity of the process and make our mHealth system accessible to end users with a broad range of digital literacy. All that is required from the end user is to roughly align a semitransparent template of the HIV RDT with their HIV RDT and press a button to capture an image. Cropping around the ROI is then performed automatically in the background (using the pixel coordinates of the template overlay), as is the process of sending the ROI to our classifier and receiving our mHealth system result. For the purpose of this pilot study, participants were not made aware of our mHealth system’s interpretation of the test results, to avoid biasing their own interpretation. Screenshots of the application can be found in Extended Data Fig. 2.
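The automatic cropping step can be pictured as a simple array slice driven by the overlay coordinates. The sketch below is written in Python rather than in the Android code of the application; `TemplateBox`, `crop_roi`, the box coordinates and the 3264 × 2448 image size are all hypothetical values chosen for illustration.

```python
from typing import NamedTuple

import numpy as np


class TemplateBox(NamedTuple):
    """Pixel coordinates of the on-screen template overlay (hypothetical)."""
    left: int
    top: int
    width: int
    height: int


def crop_roi(image: np.ndarray, box: TemplateBox) -> np.ndarray:
    """Return the part of the image under the overlay, clamped to the image."""
    img_h, img_w = image.shape[:2]
    x0 = max(0, box.left)
    y0 = max(0, box.top)
    x1 = min(img_w, box.left + box.width)
    y1 = min(img_h, box.top + box.height)
    if x1 <= x0 or y1 <= y0:
        raise ValueError("template box lies outside the image")
    return image[y0:y1, x0:x1]


# Example: an 8-megapixel frame (assumed 3264 x 2448) and an upright test strip.
photo = np.zeros((2448, 3264, 3), dtype=np.uint8)
roi = crop_roi(photo, TemplateBox(left=1400, top=600, width=400, height=1200))
```

Because the user only aligns the test roughly with the template, a real implementation would probably leave some margin around the strip; the clamping here simply keeps the slice inside the frame.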
Field pilot study protocol
The Android application was deployed in a field pilot study in KwaZulu-Natal, South Africa. Five participants were randomly selected from the staff at AHRI—two experienced nurses and three community health workers. Forty HIV RDTs (20 type A, 20 type B) were performed following the manufacturer’s guidelines using discarded, anonymized human blood samples (ten positive, ten negative according to enzyme-linked immunosorbent assay). For each of the 40 HIV RDTs, every participant was asked to record their visual interpretation of the test result, then to use our mHealth system on a tablet to capture a photograph of the HIV RDT. The system consisted of our Android application (described above) installed on a single Samsung SM-P585 tablet, identical to those used by fieldworkers for data collection. Participants were not shown the automated interpretation of the test result provided by our mHealth system, to avoid confirmation bias. The field pilot study took place at the AHRI rural site in the heart of the community (Mtubatuba, KwaZulu-Natal) under lighting conditions identical to those under which the mHealth system is intended to be used. A short (10-min) demonstration on how to use the smartphone application was given to all participants, who were then left on their own to proceed with the task of reading the HIV RDTs and capturing images.
Field pilot study data analysis
The data analysis consisted of the comparison of three datasets:
1. Traditional visual interpretation by study participants
2. Independent expert interpretation of the images captured by study participants
3. Automated machine learning interpretation by our classifier.
Traditional visual interpretation was recorded on the tablet by each study participant immediately after being shown the HIV RDTs. Only two of the 40 HIV RDTs (corresponding to ten images out of 200) had to be discarded from the analysis, because one participant took a photograph of the wrong HIV RDTs and it was therefore not possible to compare interpretation results across all five participants.
An independent RDT expert subsequently visually interpreted all 190 HIV RDT images; this expert had substantial experience conducting performance evaluations of lateral flow rapid tests for ocular and genital Chlamydia trachomatis in the Philippines, the Gambia and Senegal. Visual interpretation was performed 1–5 h after sample addition. The independent expert certified that none of the HIV RDT results had changed during this time frame.
The automated machine learning interpretation by our classifiers was processed on our secured server. The results were compared to traditional visual interpretation (shown in the confusion matrices in Fig. 4) while the independent expert then analyzed the results using the performance indicators described below.
Performance indicators
The four indicators of performance investigated were sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV). For each image, the classifier produces an outcome that belongs to one of the four categories TP, true negative (TN), FP or FN. Whether the outcome is true or false depends on comparison with the gold standard chosen.
Sensitivity is the ability of the classifier to correctly detect a positive result, measured as the ratio \(\frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}\), while specificity is the ratio \(\frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}}\) and reflects the ability of the classifier to correctly detect a negative result. PPV is the ratio \(\frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}\) and NPV is the ratio \(\frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FN}}\). These indicate the proportions of positive and negative results, as determined by a diagnostic test, that are true positives and true negatives, respectively.
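These four ratios, together with overall accuracy, can be computed from the confusion-matrix counts as sketched below. The counts in the example are not stated in the text: they are back-calculated here from the reported percentages and error counts, under the assumption that the 190 analyzed images comprised 90 positive and 100 negative tests, so they should be read as an illustrative consistency check rather than as reported data. The function name `diagnostic_metrics` is likewise hypothetical.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute the performance indicators from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # TP / (TP + FN)
        "specificity": tn / (tn + fp),   # TN / (TN + FP)
        "ppv": tp / (tp + fp),           # TP / (TP + FP)
        "npv": tn / (tn + fn),           # TN / (TN + FN)
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Assumed counts (90 positive and 100 negative images; see lead-in).
# mHealth system: 0 false positives, 2 false negatives.
system = diagnostic_metrics(tp=88, fp=0, tn=100, fn=2)
# Traditional visual interpretation: 11 false positives, 4 false negatives.
visual = diagnostic_metrics(tp=86, fp=11, tn=89, fn=4)

for name, result in (("system", system), ("visual", visual)):
    summary = ", ".join(f"{k} {100 * v:.1f}%" for k, v in result.items())
    print(f"{name}: {summary}")
```

Under these assumed counts the computed values agree, to one decimal place, with the sensitivity, specificity, PPV, NPV and accuracy figures quoted for the pilot study.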
Reporting Summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Data availability
The datasets generated during and/or analyzed during the current study are available from the AHRI data repository https://doi.org/10.23664/AHRI.M-AFRICA.2019.V1.
Code availability
Custom code used in this study is available at the public repository https://xip.uclb.com/product/classify_ai.
References
Land, K. J., Boeras, D. I., Chen, X.-S., Ramsay, A. R. & Peeling, R. W. REASSURED diagnostics to inform disease control strategies, strengthen health systems and improve patient outcomes. Nat. Microbiol. 4, 46–54 (2019).
Second WHO Model List of Essential In Vitro Diagnostics (WHO, 2019).
Peeling, R. W. Diagnostics in a digital age: an opportunity to strengthen health systems and improve health outcomes. Int. Health 7, 384–389 (2015).
Ghani, A. C., Burgess, D. H., Reynolds, A. & Rousseau, C. Expanding the role of diagnostic and prognostic tools for infectious diseases in resource-poor settings. Nature 528, S50–S52 (2015).
Figueroa, C. et al. Reliability of HIV rapid diagnostic tests for self-testing compared with testing by health-care workers: a systematic review and meta-analysis. Lancet HIV 5, e277–e290 (2018).
Klarkowski, D. B. et al. The evaluation of a rapid in situ HIV confirmation test in a programme with a high failure rate of the WHO HIV two-test diagnostic algorithm. PLoS ONE 4, e4351 (2009).
Gray, R. H. et al. Limitations of rapid HIV-1 tests during screening for trials in Uganda: diagnostic test accuracy study. Brit. Med. J. 335, 188 (2007).
Martin, E. G., Salaru, G., Paul, S. M. & Cadoff, E. M. Use of a rapid HIV testing algorithm to improve linkage to care. J. Clin. Virol. 52, S11–S15 (2011).
Cham, F. et al. The World Health Organization African region external quality assessment scheme for anti-HIV serology. Afr. J. Lab. Med. 1, 39 (2012).
Galiwango, R. M. et al. Evaluation of current rapid HIV test algorithms in Rakai, Uganda. J. Virol. Methods 192, 25–27 (2013).
Louis, F. J. et al. Evaluation of an external quality assessment program for HIV testing in Haiti, 2006–2011. Am. J. Clin. Pathol. 140, 867–871 (2013).
Peck, R. B. et al. What should the ideal HIV self-test look like? A usability study of test prototypes in unsupervised HIV self-testing in Kenya, Malawi, and South Africa. AIDS Behav. 18, 422–432 (2014).
Baveewo, S. et al. Potential for false positive HIV test results with the serial rapid HIV testing algorithm. BMC Res. Notes 5, 154 (2012).
Crucitti, T., Taylor, D., Beelaert, G., Fransen, K. & Van Damme, L. Performance of a rapid and simple HIV testing algorithm in a multicenter phase III microbicide clinical trial. Clin. Vaccine Immunol. 18, 1480–1485 (2011).
Tegbaru, B. et al. Assessment of the implementation of HIV-rapid test kits at different levels of health institutions in Ethiopia. Ethiop. Med. J. 45, 293–299 (2007).
Johnson, C. C. et al. To err is human, to correct is public health: a systematic review examining poor quality testing and misdiagnosis of HIV status. J. Int. AIDS Soc. 20, 21755 (2017).
Learmonth, K. M. et al. Assessing proficiency of interpretation of rapid human immunodeficiency virus assays in nonlaboratory settings: ensuring quality of testing. J. Clin. Microbiol. 46, 1692–1697 (2008).
García, P. J. et al. Rapid syphilis tests as catalysts for health systems strengthening: a case study from Peru. PLoS ONE 8, e66905 (2013).
Sacks, R., Omodele-Lucien, A., Whitbread, N., Muir, D. & Smith, A. Rapid HIV testing using Determine™ HIV 1/2 antibody tests: is there a difference between the visual appearance of true- and false-positive tests? Int. J. STD AIDS 23, 644–646 (2012).
Doan, M. & Carpenter, A. E. Leveraging machine vision in cell-based diagnostics to do more with less. Nat. Mater. 18, 414–418 (2019).
Esteva, A. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 115–118 (2017).
De Fauw, J. et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat. Med. 24, 1342–1350 (2018).
Xu, Y. et al. Deep learning predicts lung cancer treatment response from serial medical imaging. Clin. Cancer Res. 25, 3266–3275 (2019).
Silver, D. et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362, 1140–1144 (2018).
Ascent of machine learning in medicine. Nat. Mater. 18, 407 (2019).
Ching, T. et al. Opportunities and obstacles for deep learning in biology and medicine. J. R. Soc. Interface 15, 20170387 (2018).
Zeng, N., Wang, Z., Zhang, H., Liu, W. & Alsaadi, F. E. Deep belief networks for quantitative analysis of a gold immunochromatographic strip. Cogn. Comput. 8, 684–692 (2016).
Carrio, A., Sampedro, C., Sanchez-Lopez, J. L., Pimienta, M. & Campoy, P. Automated low-cost smartphone-based lateral flow saliva test reader for drugs-of-abuse detection. Sensors (Basel) 15, 29569–29593 (2015).
Neuman, M. et al. The effectiveness and cost-effectiveness of community-based lay distribution of HIV self-tests in increasing uptake of HIV testing among adults in rural Malawi and rural and peri-urban Zambia: protocol for STAR (self-testing for Africa) cluster randomized evaluations. BMC Public Health 18, 1234 (2018).
Aicken, C. R. H. et al. Young people’s perceptions of smartphone-enabled self-testing and online care for sexually transmitted infections: qualitative interview study. BMC Public Health 16, 974 (2016).
Witzel, T. C., Weatherburn, P., Rodger, A. J., Bourne, A. H. & Burns, F. M. Risk, reassurance and routine: a qualitative study of narrative understandings of the potential for HIV self-testing among men who have sex with men in England. BMC Public Health 17, 491 (2017).
Nsabimana, A. P. et al. Bringing real-time geospatial precision to HIV surveillance through smartphones: feasibility study. JMIR Public Health Surveill. 4, e11203 (2018).
Laksanasopin, T. et al. A smartphone dongle for diagnosis of infectious diseases at the point of care. Sci. Transl. Med. 7, 273re1 (2015).
Mudanyali, O. et al. Integrated rapid-diagnostic-test reader platform on a cellphone. Lab Chip 12, 2678–2686 (2012).
Allan-Blitz, L.-T. et al. Field evaluation of a smartphone-based electronic reader of rapid dual HIV and syphilis point-of-care immunoassays. Sex. Transm. Infect. 94, 589–593 (2018).
Feng, S. et al. Immunochromatographic diagnostic test analysis using Google Glass. ACS Nano 8, 3069–3079 (2014).
Guan, Q. et al. Diagnose like a radiologist: attention guided convolutional neural network for thorax disease classification. Preprint at https://arxiv.org/abs/1801.09927 (2018).
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A. & Chen, L.-C. MobileNetV2: inverted residuals and linear bottlenecks. Preprint at https://arxiv.org/abs/1801.04381 (2018).
Chaturvedi, S. S., Gupta, K. & Prasad, P. S. Skin lesion analyser: an efficient seven-way multi-class skin cancer classification using MobileNet. In International Conference on Advanced Machine Learning Technologies and Applications 165–176 (Springer, 2021).
Howard, A. et al. Searching for MobileNetV3. Preprint at https://arxiv.org/abs/1905.02244 (2019).
Gareta, D. et al. Cohort profile update: Africa Centre Demographic Information System (ACDIS) and population-based HIV survey. Int. J. Epidemiol. 50, 33 (2021).
Acknowledgements
We thank the community of the uMkhanyakude district and the study participants, as well as the AHRI team of fieldworkers and their supervisors. We thank A. Koza, Z. Thabethe, T. Madini, N. Okesola and S. Msane for their help with the pilot study; D. Gareta and J. Dreyer for IT support; V. Lampos and I. J. Cox for useful discussions; and E. Manning and J. McHugh for their help with editing and project management. This research was funded by the m-Africa Medical Research Council GCRF Global Infections Foundation Award (no. MR/P024378/1, to C.H., D.P., K.H., M.S., R.A.M. and V.T.), and is part of the EDCTP2 program supported by the European Union. It was also funded by the i-sense Engineering and Physical Sciences Research Council Interdisciplinary Research Collaboration (EPSRC IRC) in Early Warning Sensing Systems for Infectious Disease (no. EP/K031953/1, to R.A.M., V.T., D.P., S.M., S.G., N.A. and M.S.) and the i-sense: EPSRC IRC in Agile Early Warning Sensing Systems for Infectious Diseases and Antimicrobial Resistance (no. EP/R00529X/1, to R.A.M., V.T., D.P., S.G., N.A. and S.M.), and supported by the National Institute for Health Research University College London Hospitals Biomedical Research Centre (R.A.M. and S.M.). We thank the m-Africa and i-sense investigators and advisory boards. The AHRI is supported by core funding from the Wellcome Trust (core grant no. 082384/Z/07/Z, to T.S., D.P. and K.H.).
Author information
Contributions
V.T. and R.A.M. wrote the manuscript with input from coauthors. V.T., C.H., T. Mngomezulu, N.D. and T. Mhlongo collected field data. V.T. and S.M. developed the machine learning models with contributions from V.C., K.S., S.G. and R.A.M. V.T., N.A. and J.B. were involved in manual data preprocessing. K.H. oversaw data collection and management. T.S. and M.S. provided access to anonymized blood samples used in the pilot study. R.A.M., V.T., M.S., K.H. and D.P. conceived the overall project, designed the study and secured funding. R.A.M. was the principal investigator with overall responsibility for the i-sense EPSRC IRC and m-Africa programs, and was supervisor of the research associates (V.T., S.M. and N.A.) and students (V.C., K.S. and J.B.) involved in this study.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature Medicine thanks Nicholas Durr and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Michael Basson was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Standard Operating Procedure for HIV RDT image collection.
Document used for training and distributed to all AHRI fieldworkers involved in data collection. Left-hand side: examples of valid and invalid photographs. Right-hand side: step-by-step guidelines for capturing pictures of HIV RDTs.
Extended Data Fig. 2 Screenshots of the Android application illustrating the capture of the HIV RDT image at the time of reading the test result.
Images were captured sequentially from left to right. The end user is asked to align the test with the overlay on the screen, then press and hold the capture button for 3 seconds, after which the image is automatically captured and processed to extract the region of interest (ROI). The 3-second press feature was implemented following consultation with end users during the optimization phase of app development.
Cite this article
Turbé, V., Herbst, C., Mngomezulu, T. et al. Deep learning of HIV field-based rapid tests. Nat Med 27, 1165–1170 (2021). https://doi.org/10.1038/s41591-021-01384-9
DOI: https://doi.org/10.1038/s41591-021-01384-9