External Validation of Deep Learning Algorithms for Radiologic Diagnosis: A Systematic Review
2026-01-24
https://doi.org/10.1148/atlas.1769275925198
90
Overview
Schema Version
https://atlas.rsna.org/schemas/2025-11/model.json
Name
External Validation of Deep Learning Algorithms for Radiologic Diagnosis: A Systematic Review
Link
https://dx.doi.org/10.1148/ryai.210064
Indexing
Keywords: Systematic review, external validation, deep learning, radiologic diagnosis, generalizability, AUC, sensitivity, specificity
Content: IN, RS
Author(s)
Alice C. Yu
Bahram Mohajer
John Eng
Organization(s)
Russell H. Morgan Department of Radiology and Radiological Science, Johns Hopkins University School of Medicine
Version
1.0
Funding
Authors declared no funding for this work.
Ethical review
Systematic review; the authors state that it was exempt from institutional review board review.
Date
Published: 2022-05-04
Created: 2021-02-25
Model
Architecture
Various deep convolutional neural networks across the included studies; multiple architecture types are represented, with ResNet the most common.
Clinical benefit
Assesses generalizability of published DL diagnostic imaging algorithms by comparing internal vs external performance.
Clinical workflow phase
Research synthesis and evidence assessment; not a deployable model.
Input
Radiologic images across multiple modalities and body parts from included studies.
Limitations
Substantial heterogeneity across included studies (body parts, modalities, diseases, performance measures); limited methodological and clinical reporting; few studies adhered to reporting guidelines; external datasets were typically smaller than development datasets; results could not be pooled quantitatively; potential publication bias.
Output
CDEs: RDE1661, RDE1665, RDE1660
Description: Synthesis of external validation results: differences between development (internal) and external performance of DL algorithms for radiologic diagnosis.
Recommendation
Future studies should include external validation and improved reporting so that generalizability can be assessed. Interpret external performance that exceeds internal performance with caution, because it may reflect data leakage or unrepresentative datasets.
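Illustrative sketch
The internal vs external comparison described in this record can be illustrated with a short Python sketch. It is not the review's method or code. The record layout (`StudyResult`), the function names, the `tolerance` parameter, and the AUC values are all hypothetical and chosen only for illustration. The sketch computes the change in AUC from internal to external validation and labels any increase as needing a closer look, in line with the recommendation above.

```python
from dataclasses import dataclass


@dataclass
class StudyResult:
    """Reported performance for one algorithm (hypothetical record layout)."""
    study_id: str
    internal_auc: float   # AUC on the development (internal) test data
    external_auc: float   # AUC on the external validation data


def auc_change(result: StudyResult) -> float:
    """Return external minus internal AUC; negative means performance dropped."""
    return result.external_auc - result.internal_auc


def classify_change(delta: float, tolerance: float = 0.0) -> str:
    """Label the direction of change; `tolerance` is an assumed dead band."""
    if delta < -tolerance:
        return "decrease"
    if delta > tolerance:
        # Higher external performance may indicate data leakage or an
        # unrepresentative external dataset, so flag it for review.
        return "increase (review for leakage or unrepresentative data)"
    return "unchanged"


# Made-up values for illustration only; not taken from any included study.
examples = [
    StudyResult("study-A", internal_auc=0.95, external_auc=0.90),
    StudyResult("study-B", internal_auc=0.88, external_auc=0.93),
]

for r in examples:
    delta = auc_change(r)
    print(f"{r.study_id}: change in AUC = {delta:+.2f}, {classify_change(delta)}")
```

A nonzero `tolerance` would treat small differences as unchanged; which threshold is meaningful depends on the study and is left as an assumption here.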