Attention-based Saliency Maps for Pneumothorax Classification using Vision Transformers
Model · 2026-01-24 · https://doi.org/10.1148/atlas.1769272033492

Overview

Schema Version

https://atlas.rsna.org/schemas/2025-11/model.json

Name

Attention-based Saliency Maps for Pneumothorax Classification using Vision Transformers

Link

https://doi.org/10.1148/ryai.220187

Indexing

Keywords: pneumothorax, chest radiograph, vision transformer, ViT, transformer multimodal explainability, TMME, Grad-CAM, saliency map, explainability, interpretability
Content: CH
RadLex: RID34539, RID10321, RID28493, RID35261, RID35057, RID5352
SNOMED: 36118008, 8186001, 60046008, 46621007

Author(s)

Alessandro Wollek
Robert Graf
Saša Čečatka
Nicola Fink
Theresa Willem
Bastian O. Sabel
Tobias Lasser

Organization(s)

Munich Institute of Biomedical Engineering and Department of Informatics, Technical University of Munich
Department of Radiology, University Hospital LMU
Munich School of Technology in Society, Technical University of Munich

Version

1.0

Contact

A.W. (email: alessandro.wollek@tum.de)

Funding

Supported by the German Federal Ministry of Health’s Program for Digital Innovations for the Improvement of Patient-Centered Care in Health Care (grant agreement no. 2520DAT920).

Ethical review

Retrospective study using public datasets; institutional review board review and patient informed consent were not required.

Date

Updated: 2023-02-16
Created: 2022-03-01

References

[1] Wollek A, Graf R, Čečatka S, Fink N, Willem T, Sabel BO, Lasser T. "Attention-based Saliency Maps Improve Interpretability of Pneumothorax Classification". Radiology: Artificial Intelligence. 2023;5(2):e220187. doi:10.1148/ryai.220187. PMID: 37035429. PMCID: PMC10077084.

Model

Architecture

Vision Transformer (deit_base_distilled_patch16_224) fine-tuned for multi-label chest radiograph classification, compared against a DenseNet-121 baseline. Saliency maps produced with the attention-based Transformer Multimodal Explainability (TMME) method, with Grad-CAM for comparison.
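TMME builds its saliency map by propagating a relevance matrix through the transformer's self-attention layers. The sketch below implements the self-attention update rule of Chefer et al.'s generic attention-model explainability method in NumPy, under stated assumptions: the per-layer attention maps and their gradients with respect to the target class score are already available (capturing them with framework hooks is not shown), the token order is class token, distillation token, then 14 × 14 = 196 patch tokens as in deit_base_distilled_patch16_224, and the class-token row is read out. The function name `tmme_relevance` is hypothetical; this is not the authors' code.

```python
import numpy as np

def tmme_relevance(attentions, gradients, num_prefix_tokens=2, grid=14):
    """Gradient-weighted attention relevance for a ViT-style encoder (sketch).

    attentions, gradients: lists with one array per layer, each shaped
    (heads, tokens, tokens); gradients are d(class score)/d(attention).
    Returns a (grid, grid) relevance map over the patch tokens.
    """
    num_tokens = attentions[0].shape[-1]
    # Relevance starts as the identity: each token is relevant to itself.
    relevance = np.eye(num_tokens)
    for attn, grad in zip(attentions, gradients):
        # Keep only positive gradient-weighted attention, then average over heads.
        a_bar = np.clip(grad * attn, 0.0, None).mean(axis=0)
        # Self-attention update rule: R <- R + A_bar @ R.
        relevance = relevance + a_bar @ relevance
    # Row 0 is the class token; drop the prefix tokens (class + distillation).
    patch_relevance = relevance[0, num_prefix_tokens:]
    return patch_relevance.reshape(grid, grid)
```

For a ViT without a distillation token, `num_prefix_tokens=1` would apply; whether the distillation-token row should also contribute is a design choice this card does not specify.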

Clinical benefit

Assists pneumothorax detection on chest radiographs by providing classification scores and more interpretable attention-based saliency maps that can reveal model focus and potential biases.

Clinical workflow phase

Clinical decision support systems; reader aid for image interpretation and model explainability.

Decision threshold

Sensitivity and specificity reported at the threshold maximizing F1 score.
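The card does not say how the F1-maximizing threshold was searched. A minimal sketch, assuming an exhaustive search over the observed scores for a single class, where a prediction counts as positive when its score is at or above the threshold; the function name is hypothetical. In the multi-label setting this would be run separately for each finding.

```python
import numpy as np

def f1_optimal_operating_point(y_true, y_score):
    """Pick the threshold that maximizes F1 and report sensitivity/specificity there."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    best = None
    # Candidate thresholds: every distinct score, in ascending order.
    for t in np.unique(y_score):
        pred = y_score >= t
        tp = np.sum(pred & y_true)
        fp = np.sum(pred & ~y_true)
        fn = np.sum(~pred & y_true)
        tn = np.sum(~pred & ~y_true)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        # Strict '>' keeps the lowest threshold among ties.
        if best is None or f1 > best["f1"]:
            best = {
                "threshold": float(t),
                "f1": float(f1),
                "sensitivity": float(tp / (tp + fn)) if (tp + fn) else 0.0,
                "specificity": float(tn / (tn + fp)) if (tn + fp) else 0.0,
            }
    return best
```

Because ties in F1 keep the lowest threshold, the chosen operating point leans toward sensitivity.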

Degree of automation

Decision support; provides probabilities and saliency maps to assist radiologists; does not automate final diagnosis.

Indications for use

Exploratory research use on adult chest radiographs to classify pneumothorax (and other thoracic findings: cardiomegaly, consolidation, pleural effusion, atelectasis) and provide attention-based saliency maps; intended for radiology reading environments.

Input

Frontal chest radiographs from public datasets, preprocessed and resized to 224×224 pixels; normalization per ImageNet mean/SD; data augmentation during training.
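A minimal NumPy sketch of this input pipeline, assuming an 8-bit grayscale image that is replicated to three channels to match an ImageNet-pretrained backbone. The nearest-neighbor resize is a simplification (the interpolation used in the study is not stated), and the mean and SD values are the standard ImageNet RGB statistics. The function name is hypothetical, and training-time augmentation is not shown.

```python
import numpy as np

# Standard ImageNet channel statistics (RGB order).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def preprocess_radiograph(image, size=224):
    """Resize a 2-D grayscale radiograph and normalize it with ImageNet statistics.

    Returns a float array shaped (3, size, size), channels first.
    """
    image = np.asarray(image, dtype=np.float64)
    h, w = image.shape
    # Nearest-neighbor resize: map each output pixel to a source row/column.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = image[np.ix_(rows, cols)]
    # Assumes 8-bit input; scale intensities to [0, 1].
    scaled = resized / 255.0
    # Replicate the single channel to three channels, then normalize per channel.
    rgb = np.repeat(scaled[None, :, :], 3, axis=0)
    return (rgb - IMAGENET_MEAN[:, None, None]) / IMAGENET_STD[:, None, None]
```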

Instructions

The model outputs class probabilities and saliency maps. In the reader study and in the quantitative evaluation, attention-based TMME saliency maps were more useful and more consistent than Grad-CAM.
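With 16 × 16 patches on a 224 × 224 input, a ViT saliency map is a 14 × 14 grid with one value per patch, which has to be upsampled before it can be overlaid on the radiograph. A sketch assuming block (nearest-neighbor) upsampling and min-max scaling for display; the study's exact display settings are not stated, and the function name is hypothetical.

```python
import numpy as np

def saliency_to_image(patch_map, patch_size=16):
    """Upsample a per-patch saliency grid to pixel resolution and scale it to [0, 1]."""
    patch_map = np.asarray(patch_map, dtype=float)
    # Each patch value fills a patch_size x patch_size block of pixels.
    full = np.kron(patch_map, np.ones((patch_size, patch_size)))
    lo, hi = full.min(), full.max()
    # Min-max normalization for display; a constant map becomes all zeros.
    return (full - lo) / (hi - lo) if hi > lo else np.zeros_like(full)
```

Bilinear interpolation is a common alternative that gives smoother overlays; block upsampling keeps the patch boundaries visible.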

Limitations

Bias observed toward confounders such as chest tubes, which influenced pneumothorax predictions; limited training data pooled from public datasets; downsampling images to 224×224 may reduce the detectability of subtle pneumothoraces; small user study; segmentation masks were largely available only for pneumothorax, which limits precise EHR evaluation for the other classes.

Output

CDEs: RDE2439, RDE2294, RDE2927, RDE2295, RDE2296
Description: Per-image probabilities for pneumothorax, cardiomegaly, consolidation, pleural effusion, and atelectasis; accompanying saliency maps (attention-based TMME and Grad-CAM) highlighting image regions contributing to the prediction.
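Because findings can co-occur, a multi-label model scores each class independently rather than choosing a single class. A sketch assuming the model emits one logit per finding in the order listed above and that a per-class sigmoid (rather than a softmax) converts logits to probabilities; the class order and function name are illustrative.

```python
import numpy as np

# Assumed output order, following the Description above.
FINDINGS = ("pneumothorax", "cardiomegaly", "consolidation",
            "pleural effusion", "atelectasis")

def logits_to_probabilities(logits):
    """Map one image's five logits to independent per-finding probabilities."""
    logits = np.asarray(logits, dtype=float)
    # Multi-label: one sigmoid per class, so the probabilities need not sum to 1.
    probs = 1.0 / (1.0 + np.exp(-logits))
    return dict(zip(FINDINGS, probs.round(3).tolist()))
```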

Recommendation

Attention-based TMME saliency maps are recommended over Grad-CAM for interpretability in this context; saliency maps can help identify model biases (e.g., chest tube reliance).

Regulatory information

Authorization status: Not a regulated medical device; exploratory research study.

Reproducibility

Training details reported: ViT fine-tuned with SGD (learning rate 0.001), batch size 64, up to 500 epochs with early stopping; images normalized with ImageNet statistics; data augmentation (random affine, translation, scale, horizontal flip); DenseNet-121 baseline fine-tuned with SGD (learning rate 0.001), batch size 32, up to 30 epochs with early stopping.
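The reported hyperparameters can be captured in a small configuration object, shown here alongside a generic early-stopping rule. A sketch under assumptions: the card does not report the early-stopping criterion or patience, so monitoring validation loss and the default `patience=10` are placeholders, and the class names are hypothetical. Optimizer settings beyond the learning rate (such as momentum) are also unreported and are left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingConfig:
    # Only values reported in the model card; unreported settings are omitted.
    model: str
    optimizer: str
    learning_rate: float
    batch_size: int
    max_epochs: int

VIT = TrainingConfig("deit_base_distilled_patch16_224", "SGD", 0.001, 64, 500)
DENSENET = TrainingConfig("DenseNet-121", "SGD", 0.001, 32, 30)

class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs.

    The monitored quantity and the patience value are hypothetical placeholders.
    """

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, `step(val_loss)` would be called once per epoch, ending training when it returns True or when `max_epochs` is reached.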

Use

Intended: Decision support, Detection and diagnosis
Out-of-scope: Detection and diagnosis
Excluded: Detection and diagnosis

User

Intended: Radiologist, Researcher
Out-of-scope: Layperson
Excluded: Layperson