Attention-based Saliency Maps for Pneumothorax Classification using Vision Transformers
2026-01-24
https://doi.org/10.1148/atlas.1769272033492
51
Overview
Schema Version
https://atlas.rsna.org/schemas/2025-11/model.json
Name
Attention-based Saliency Maps for Pneumothorax Classification using Vision Transformers
Link
https://doi.org/10.1148/ryai.220187
Indexing
Keywords: pneumothorax, chest radiograph, vision transformer, ViT, transformer multimodal explainability, TMME, Grad-CAM, saliency map, explainability, interpretability
Content: CH
RadLex: RID34539, RID10321, RID28493, RID35261, RID35057, RID5352
SNOMED: 36118008, 8186001, 60046008, 46621007
Author(s)
Alessandro Wollek
Robert Graf
Saša Čečatka
Nicola Fink
Theresa Willem
Bastian O. Sabel
Tobias Lasser
Organization(s)
Munich Institute of Biomedical Engineering and Department of Informatics, Technical University of Munich
Department of Radiology, University Hospital LMU
Munich School of Technology in Society, Technical University of Munich
Version
1.0
Contact
A.W. (email: alessandro.wollek@tum.de)
Funding
Supported by the German Federal Ministry of Health’s Program for Digital Innovations for the Improvement of Patient-Centered Care in Health Care (grant agreement no. 2520DAT920).
Ethical review
Retrospective study using public datasets; institutional review board review and patient informed consent were not required.
Date
Updated: 2023-02-16
Created: 2022-03-01
References
[1] Wollek A, Graf R, Čečatka S, Fink N, Willem T, Sabel BO, Lasser T. "Attention-based Saliency Maps Improve Interpretability of Pneumothorax Classification". Radiology: Artificial Intelligence. 2023;5(2):e220187. doi:10.1148/ryai.220187. PMID: 37035429. PMCID: PMC10077084.
Model
Architecture
Vision Transformer (deit_base_distilled_patch16_224) fine-tuned for multi-label chest radiograph classification, with DenseNet-121 as the baseline for comparison. Saliency maps are computed with the attention-based Transformer Multimodal Explainability (TMME) method, with Grad-CAM as the comparison method.
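The TMME relevance update for self-attention layers can be sketched as below. This is a minimal illustration, not the authors' code: it assumes the per-layer attention maps and their gradients with respect to the target class score have already been extracted from the model, and it uses the self-attention rule of the TMME method as commonly described (start from the identity, then add the head-averaged positive part of gradient times attention, multiplied by the running relevance). All function names are hypothetical. For the distilled DeiT variant, the first two tokens are the class and distillation tokens, which leaves 196 patch tokens (a 14×14 grid) for a 224×224 input with 16-pixel patches.

```python
def identity(n):
    """n x n identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def head_averaged_positive(attn, grad):
    """attn, grad: [heads][tokens][tokens].

    Returns the mean over heads of max(grad * attn, 0), element-wise.
    """
    heads, n = len(attn), len(attn[0])
    return [
        [
            sum(max(grad[h][i][j] * attn[h][i][j], 0.0) for h in range(heads)) / heads
            for j in range(n)
        ]
        for i in range(n)
    ]


def tmme_relevance(attn_per_layer, grad_per_layer):
    """Accumulate relevance over layers: R <- R + A_bar @ R, starting from I."""
    n = len(attn_per_layer[0][0])
    relevance = identity(n)
    for attn, grad in zip(attn_per_layer, grad_per_layer):
        a_bar = head_averaged_positive(attn, grad)
        update = matmul(a_bar, relevance)
        relevance = [
            [relevance[i][j] + update[i][j] for j in range(n)] for i in range(n)
        ]
    return relevance


def cls_saliency(relevance, num_special_tokens=2):
    """Relevance of each patch token for the class token (row 0).

    num_special_tokens=2 is an assumption matching a distilled DeiT
    (class token plus distillation token).
    """
    return relevance[0][num_special_tokens:]
```

The returned list would then be reshaped to the patch grid and upsampled to the image size to produce the displayed saliency map.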
Clinical benefit
Assists pneumothorax detection on chest radiographs by providing classification scores and more interpretable attention-based saliency maps that can reveal model focus and potential biases.
Clinical workflow phase
Clinical decision support systems; reader aid for image interpretation and model explainability.
Decision threshold
Sensitivity and specificity reported at the threshold maximizing F1 score.
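Selecting the operating point that maximizes F1 and then reading off sensitivity and specificity can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the candidate thresholds are simply the observed scores, and a score at or above the threshold is assumed to count as a positive prediction.

```python
def confusion_counts(labels, scores, threshold):
    """Return (tp, fp, fn, tn) when predicting positive for score >= threshold."""
    tp = fp = fn = tn = 0
    for y, s in zip(labels, scores):
        predicted = s >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted and y == 0:
            fp += 1
        elif not predicted and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(labels, scores, threshold):
    tp, fp, fn, _ = confusion_counts(labels, scores, threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(labels, scores):
    """Threshold among the observed scores with the highest F1.

    Candidates are sorted ascending, so ties resolve to the lowest threshold.
    """
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f1_score(labels, scores, t))


def sensitivity_specificity(labels, scores, threshold):
    tp, fp, fn, tn = confusion_counts(labels, scores, threshold)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity
```

In practice the threshold would be chosen on a validation split and then applied unchanged to the test set; the source does not state which split was used.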
Degree of automation
Decision support; provides probabilities and saliency maps to assist radiologists; does not automate final diagnosis.
Indications for use
Exploratory research use on adult chest radiographs to classify pneumothorax (and other thoracic findings: cardiomegaly, consolidation, pleural effusion, atelectasis) and provide attention-based saliency maps; intended for radiology reading environments.
Input
Frontal chest radiographs from public datasets, preprocessed and resized to 224×224 pixels; normalization per ImageNet mean/SD; data augmentation during training.
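The resize-and-normalize step can be illustrated with a dependency-free sketch. Several details here are assumptions not stated in the source: the input is taken to be an 8-bit grayscale image given as nested lists, resizing uses nearest-neighbor sampling for brevity (real pipelines typically use bilinear or bicubic interpolation), and the single channel is replicated three times so the standard ImageNet per-channel statistics apply. Function names are hypothetical.

```python
# Standard ImageNet per-channel statistics for inputs scaled to [0, 1].
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def resize_nearest(image, size=224):
    """Nearest-neighbor resize of a 2D nested-list image to size x size."""
    height, width = len(image), len(image[0])
    return [
        [image[r * height // size][c * width // size] for c in range(size)]
        for r in range(size)
    ]


def preprocess(image, size=224):
    """Return a [3][size][size] nested list, normalized per channel.

    Assumes 8-bit grayscale input (values 0..255) replicated to 3 channels.
    """
    resized = resize_nearest(image, size)
    return [
        [[(pixel / 255.0 - mean) / std for pixel in row] for row in resized]
        for mean, std in zip(IMAGENET_MEAN, IMAGENET_STD)
    ]
```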
Instructions
The model outputs class probabilities and saliency maps. In the reader study and on quantitative metrics, attention-based TMME saliency maps were more useful and consistent than Grad-CAM.
Limitations
Bias was observed toward confounders, such as chest tubes influencing pneumothorax predictions. Training data were limited and pooled from public datasets. Resizing images to 224×224 may reduce detectability of subtle pneumothoraces. The reader study was small. Segmentation masks were largely available only for pneumothorax, which limits precise effective heat ratio (EHR) evaluation for the other classes.
Output
CDEs: RDE2439, RDE2294, RDE2927, RDE2295, RDE2296
Description: Per-image probabilities for pneumothorax, cardiomegaly, consolidation, pleural effusion, and atelectasis; accompanying saliency maps (attention-based TMME and Grad-CAM) highlighting image regions contributing to the prediction.
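Because the five findings are not mutually exclusive, a multi-label classifier typically maps each logit to a probability independently. The sketch below assumes a per-class sigmoid, which is the usual choice for multi-label output but is not confirmed by the source; the class order and the function name are likewise illustrative.

```python
import math

# Class order is an assumption for illustration.
CLASSES = (
    "pneumothorax",
    "cardiomegaly",
    "consolidation",
    "pleural effusion",
    "atelectasis",
)


def logits_to_probabilities(logits):
    """Apply an independent sigmoid to each class logit.

    Unlike softmax, the resulting probabilities need not sum to 1,
    so several findings can be flagged on the same image.
    """
    if len(logits) != len(CLASSES):
        raise ValueError(f"expected {len(CLASSES)} logits, got {len(logits)}")
    return {name: 1.0 / (1.0 + math.exp(-z)) for name, z in zip(CLASSES, logits)}
```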
Recommendation
Attention-based TMME saliency maps are recommended over Grad-CAM for interpretability in this context; saliency maps can help identify model biases (e.g., chest tube reliance).
Regulatory information
Authorization status: Not a regulated medical device; exploratory research study.
Reproducibility
Training details reported: ViT fine-tuned with SGD (lr=0.001), batch size 64, 500 epochs with early stopping; images normalized to ImageNet statistics; data augmentations (random affine, translation, scale, horizontal flip); DenseNet baseline fine-tuned with SGD (lr=0.001), batch size 32, 30 epochs with early stopping.
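The reported hyperparameters can be collected in a small configuration object for side-by-side comparison. This is a hypothetical layout, not the authors' configuration format; only values stated above are filled in, and unreported settings (for example SGD momentum or early-stopping patience) are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters as reported in the model card (field names are illustrative)."""

    architecture: str
    optimizer: str
    learning_rate: float
    batch_size: int
    max_epochs: int
    early_stopping: bool = True
    image_size: int = 224
    normalization: str = "ImageNet mean/SD"


VIT_CONFIG = TrainingConfig(
    architecture="deit_base_distilled_patch16_224",
    optimizer="SGD",
    learning_rate=0.001,
    batch_size=64,
    max_epochs=500,
)

DENSENET_CONFIG = TrainingConfig(
    architecture="DenseNet-121",
    optimizer="SGD",
    learning_rate=0.001,
    batch_size=32,
    max_epochs=30,
)
```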
Use
Intended: Decision support, Detection and diagnosis
Out-of-scope: Detection and diagnosis
Excluded: Detection and diagnosis
User
Intended: Radiologist, Researcher
Out-of-scope: Layperson
Excluded: Layperson