In medicine, images (eg, clinical or dermoscopic images and imaging scans) are the most commonly used form of data for AI development. Convolutional neural networks (CNNs), a subtype of ANN, are frequently used for this purpose. These networks use a hierarchical architecture, loosely modeled on the visual cortex, that composes complex features (eg, shapes) from simpler features (eg, image intensities), which leads to more efficient data processing.10-12
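To make the idea of hierarchical feature composition concrete, the following is a minimal sketch of a CNN written in Python with PyTorch; the layer sizes, gray-scale input, and two-class output are illustrative assumptions rather than the architecture of any study cited here.

```python
# Minimal illustrative CNN (PyTorch): early convolutional layers respond to
# simple patterns (edges, intensities); deeper layers compose them into more
# complex features (shapes) before classification. All sizes are illustrative.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # low-level features
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # mid-level features
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),  # high-level features
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        x = self.features(x)                 # hierarchical feature extraction
        return self.classifier(x.flatten(1)) # class scores

# Example: a batch of 4 single-channel (gray-scale) 224x224 images.
logits = TinyCNN()(torch.randn(4, 1, 224, 224))
print(logits.shape)  # torch.Size([4, 2])
```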
In recent years, CNNs have been applied in a number of image-based medical fields, including radiology, dermatology, and pathology. Initially, studies were largely led by computer scientists trying to match clinician performance in detection of disease categories. However, there has been a shift toward more physicians getting involved, which has motivated development of large, curated (ie, expert-labeled), and standardized clinical data sets for training CNNs. Although training on quality-controlled data is a work in progress across medical disciplines, it has led to improved machine performance.11,12
Recent Advances in AI
In recent years, the number of studies covering CNNs in diagnosis has increased exponentially in several medical specialties. The goal is to improve the software enough to close the gap between experts and the machine in live clinical settings. The current literature focuses on comparing experts with the machine in simulated settings; prospective, real-world clinical trials are still lagging.9,11,13
We look at radiology to explore recent advances in AI diagnosis for 3 reasons: (1) radiology has the largest repository of digital data (using a picture archiving and communication system) among medical specialties; (2) radiology has well-defined, image-acquisition protocols in its clinical workflow14; and (3) gray-scale images are easier to standardize because they are impervious to environmental variables that are difficult to control (eg, recent sun exposure, rosacea flare, lighting, sweating). These are some of the reasons we think radiology is, and will be, ahead in training AI algorithms and integrating them into clinical practice. However, even radiology AI studies have limitations, including a lack of prospective, generalizable studies in real-world clinical settings and a lack of large, standardized, publicly available databases for training algorithms.
We narrow our discussion to studies of mammography, given the repetitive nature and binary output of this modality, which has made it one of the first targets of automation in diagnostic imaging.1,2,5,13 AI-based CAD in mammography, much like its predecessor, feature-based CAD, has shown promising results in artificial settings. Five key mammography CNN studies have reported a wide range of diagnostic accuracy (area under the curve, 69.2 to 97.8 [mean, 88.2]) compared with radiologists.15-19
In the most recent study (2019), Rodriguez-Ruiz et al15 compared an AI system with a cohort of 101 radiologists and found comparable performance. However, results in this artificial setting were not followed up with prospective analysis of the technology in a clinical setting. First-generation, feature-based CADs in mammography also showed expert-level performance in artificial settings, but the technology became extinct because those results did not generalize to the real world in prospective trials. To our knowledge, no current CNN has been tested in a live clinical setting, which remains a key limitation of radiology AI.13-19
The second limitation of radiology AI is a lack of standardization, which also applies to mammography, despite this subset having the largest and oldest publicly available data sets. In a recent review of 23 studies of AI-based algorithms in mammography (2010-2019), clinicians point to one of the biggest flaws: the use of small, nonstandardized, and skewed public databases (often enriched for malignancy) for training algorithms.13
Standardization refers to quality-control measures in acquisition, processing, and image labeling that must be met for images to be included in the training data set. At present, large stores of radiologic data that are standardized within each institution are not publicly accessible through a unified reference platform. The lack of large, standardized training data sets leads to selection bias and increases the risk for overfitting, which occurs when an algorithm incorporates background noise in the data into its prediction scheme. Overfitting has been noted in several AI-based studies in mammography,13 and it limits the generalizability of algorithm performance in the real-world setting.
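As a concrete illustration of how overfitting can be detected, the following Python sketch (using scikit-learn on synthetic, deliberately small and skewed data) trains a flexible classifier and compares its performance on the training set with its performance on held-out data; the data set, model, and split are assumptions for illustration only.

```python
# Illustrative overfitting check on synthetic data (scikit-learn): a model that
# memorizes noise scores far higher on its training set than on held-out data.
# The data, model, and split here are assumptions for illustration only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Small, noisy, imbalanced data set (loosely mimicking a skewed public database).
X, y = make_classification(n_samples=300, n_features=50, n_informative=5,
                           weights=[0.7, 0.3], flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=0)

# Deep, unconstrained trees can fit the training noise almost perfectly.
model = RandomForestClassifier(n_estimators=200, max_depth=None, random_state=0)
model.fit(X_train, y_train)

train_auc = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"train AUC = {train_auc:.3f}, held-out AUC = {test_auc:.3f}")
# A large train/held-out gap signals overfitting and poor generalizability.
```

A large gap between training and held-out performance is the practical signature of a model that has memorized noise rather than learned features that generalize to new patients and institutions.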