Data-efficient Image Transformer (DeiT-B) for Radiograph Disease Classification (chest and upper extremity)
model
2026-01-24
DOI: https://doi.org/10.1148/atlas.1769273356616

Overview

Schema Version

https://atlas.rsna.org/schemas/2025-11/model.json

Name

Data-efficient Image Transformer (DeiT-B) for Radiograph Disease Classification (chest and upper extremity)

Link

https://github.com/zachmurphy1/transformer-radiographs

Indexing

Keywords: Computer-aided Diagnosis, Informatics, Neural Networks, Thorax, Skeletal-Appendicular, Convolutional Neural Network, Supervised Learning, Machine Learning, Deep Learning, Visual Transformer, DeiT, DenseNet121, MURA, Chest X-ray 14, CheXpert, PadChest, MIMIC-CXR
Content: CH, MK
RadLex: RID5350, RID5625, RID28494, RID34941, RID35057, RID5352, RID3875, RID34539, RID5335, RID35016, RID28525, RID39056, RID4867, RID5541, RID43412, RID34995, RID28536, RID10321, RID50088, RID28493, RID4673

Author(s)

Zachary R. Murphy, MD, MSE, MA
Kesavan Venkatesh
Jeremias Sulam, PhD
Paul H. Yi, MD

Organization(s)

Department of Anesthesiology, University of Michigan
Department of Biomedical Engineering, Johns Hopkins University Whiting School of Engineering
University of Maryland Medical Intelligent Imaging (UM2ii) Center, Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine

Version

1.0

Contact

Paul H. Yi (Corresponding author), email: pyi@som.umaryland.edu

Funding

Authors declared no funding for this work.

Ethical review

HIPAA-compliant retrospective study using public datasets; institutional review board approval was not required.

Date

Updated: 2022-08-31
Published: 2022-09-21
Created: 2022-01-18

References

[1] Murphy ZR, Venkatesh K, Sulam J, Yi PH. "Visual Transformers and Convolutional Neural Networks for Disease Classification on Radiographs: A Comparison of Performance, Sample Efficiency, and Hidden Stratification." Radiology: Artificial Intelligence. 2022 Nov;4(6):e220012. doi:10.1148/ryai.220012. PMID: 36523640. PMCID: PMC9745440.

Model

Architecture

Data-efficient Image Transformer (DeiT-B; Visual Transformer) pretrained on ImageNet; compared against CNNs (DenseNet121, ResNet152, EfficientNetB7).

Availability

Code and data splits available: https://github.com/zachmurphy1/transformer-radiographs

Clinical benefit

Research evaluation of automated disease/abnormality classification on chest and upper extremity radiographs; potential decision support utility.

Decision threshold

For reported accuracy, precision, recall, and F1 score, the operating threshold was selected at the ROC point where the true-positive rate equals 1 minus the false-positive rate (i.e., where sensitivity equals specificity).
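This operating-point rule can be expressed as a small search over candidate score thresholds (a minimal NumPy sketch; the original implementation may differ):

```python
import numpy as np

def tpr_eq_one_minus_fpr_threshold(labels, scores):
    """Pick the score threshold whose TPR is closest to 1 - FPR,
    i.e., the ROC point where sensitivity equals specificity."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    best_t, best_gap = None, np.inf
    for t in np.unique(scores):
        preds = scores >= t
        tpr = preds[labels == 1].mean()  # sensitivity
        fpr = preds[labels == 0].mean()  # 1 - specificity
        gap = abs(tpr - (1.0 - fpr))
        if gap < best_gap:
            best_t, best_gap = t, gap
    return best_t
```

In practice the candidate thresholds would come from the validation-set ROC curve rather than an exhaustive sweep of all scores.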

Indications for use

Classification of thoracic diseases on chest radiographs (14-label NIH ChestX-ray14 taxonomy) and detection of abnormal vs normal on upper extremity radiographs (MURA) in research settings.

Input

Projection radiography images: chest radiographs (NIH Chest X-ray 14; external: CheXpert, PadChest, MIMIC-CXR) and upper extremity radiographs (MURA).

Instructions

Models were fine-tuned from ImageNet-pretrained weights in PyTorch. Hyperparameters were selected via grid search over 128 combinations spanning optimizer, dropout, initial learning rate, weight decay, and MixUp/CutMix augmentation. The learning rate decayed to 10% of its value every 3 epochs; training stopped early if validation AUC did not improve by at least 1e-4 for 5 epochs; random horizontal flips were applied. Final models were trained on the combined training and validation sets with the optimal hyperparameters.
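The schedule and stopping rule described above can be sketched framework-agnostically (a minimal illustration; the exact bookkeeping in the original PyTorch code is an assumption):

```python
def lr_at_epoch(base_lr, epoch, step=3, gamma=0.1):
    """Step decay: the learning rate drops to 10% of its value
    every 3 epochs, as described in the training instructions."""
    return base_lr * gamma ** (epoch // step)

class EarlyStopper:
    """Stop when validation AUC has not improved by at least
    min_delta for `patience` consecutive epochs."""
    def __init__(self, patience=5, min_delta=1e-4):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.bad_epochs = 0

    def should_stop(self, val_auc):
        if val_auc > self.best + self.min_delta:
            self.best = val_auc
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In PyTorch the step decay corresponds to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)`.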

Limitations

Only two ViT and three CNN architectures were evaluated; MURA lacked external validation because no comparable public datasets exist; the hyperparameter grid search did not include multiple runs per grid point; knowledge distillation was not used; sample-efficiency experiments reused hyperparameters identified on the full datasets; susceptibility to hidden stratification requires further study.

Output

CDEs: RDE2896, RDE2722, RDE2134, RDE2149, RDE2439, RDE2893, RDE730, RDE2723, RDE2721, RDE2421, RDE2440, RDE1702.16, RDE339, RDE2724, RDE2238
Description: Multi-label classification probabilities for 14 thoracic findings on chest radiographs; binary abnormal/normal classification at study level for upper extremity radiographs (MURA).
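The two output heads can be sketched as follows (a minimal illustration; mean aggregation of per-view probabilities for the MURA study-level call is an assumption, not confirmed by this card):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def chest_probabilities(logits14):
    """Independent sigmoid per label: probabilities for the
    14 thoracic findings in the ChestX-ray14 taxonomy."""
    assert len(logits14) == 14
    return [sigmoid(z) for z in logits14]

def mura_study_prediction(view_probs, threshold=0.5):
    """Study-level abnormal/normal call from per-view abnormality
    probabilities (mean aggregation is an assumption here)."""
    mean_p = sum(view_probs) / len(view_probs)
    return mean_p, mean_p >= threshold
```

Multi-label sigmoid outputs (rather than a softmax) match the task: a chest radiograph can show several findings at once.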

Recommendation

For research and benchmarking; differences vs CNNs were small and may not be clinically significant; continued evaluation of ViTs recommended.

Reproducibility

Code and exact data splits are publicly available; models trained with specified early stopping and learning rate schedule; training performed with PyTorch on a single RTX 5000 GPU for final DeiT-B and DenseNet121 models.

Sustainability

Reported final training times: chest radiographs—DeiT-B 101.95 min (4 epochs; 25.49 min/epoch) vs DenseNet121 136.34 min (12 epochs; 11.36 min/epoch). MURA—DeiT-B 81.00 min (8 epochs; 10.13 min/epoch) vs DenseNet121 74.78 min (15 epochs; 4.99 min/epoch). Total GPU hours for hyperparameter search across primary models: 1789.
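The reported per-epoch figures are consistent with total training minutes divided by epoch count, to within rounding:

```python
def per_epoch_minutes(total_min, epochs):
    """Average minutes per epoch from the reported totals."""
    return total_min / epochs

# (total minutes, epochs, reported min/epoch) from the figures above
for total, epochs, per_epoch in [
    (101.95, 4, 25.49),   # chest radiographs, DeiT-B
    (136.34, 12, 11.36),  # chest radiographs, DenseNet121
    (81.00, 8, 10.13),    # MURA, DeiT-B
    (74.78, 15, 4.99),    # MURA, DenseNet121
]:
    assert abs(per_epoch_minutes(total, epochs) - per_epoch) < 0.006
```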

Use

Intended: Detection and diagnosis

User

Intended: Researcher