Overview

Schema Version

https://atlas.rsna.org/schemas/2025-11/model.json

Name

Attention-based Saliency Maps for Pneumothorax Classification using Vision Transformers

Link

https://doi.org/10.1148/ryai.220187

Indexing

Keywords: pneumothorax, chest radiograph, vision transformer, ViT, transformer multimodal explainability, TMME, Grad-CAM, saliency map, explainability, interpretability

Content: CH

RadLex: RID34539, RID10321, RID28493, RID35261, RID35057, RID5352

SNOMED: 36118008, 8186001, 60046008, 46621007

Author(s)

Alessandro Wollek

Robert Graf

Saša Čečatka

Nicola Fink

Theresa Willem

Bastian O. Sabel

Tobias Lasser

Organization(s)

Munich Institute of Biomedical Engineering and Department of Informatics, Technical University of Munich

Department of Radiology, University Hospital LMU

Munich School of Technology in Society, Technical University of Munich

Version

1.0

Contact

A.W. (email: alessandro.wollek@tum.de)

Funding

Supported by the German Federal Ministry of Health’s Program for Digital Innovations for the Improvement of Patient-Centered Care in Health Care (grant agreement no. 2520DAT920).

Ethical review

Retrospective study using public datasets; institutional review board review and patient informed consent were not required.

Date

Updated: 2023-02-16

Created: 2022-03-01

References

[1] Wollek A, Graf R, Čečatka S, Fink N, Willem T, Sabel BO, Lasser T. "Attention-based Saliency Maps Improve Interpretability of Pneumothorax Classification". Radiology: Artificial Intelligence. 2023;5(2):e220187. doi:10.1148/ryai.220187. PMID: 37035429. PMCID: PMC10077084.

Model

Architecture

Vision Transformer (deit_base_distilled_patch16_224) fine-tuned for multi-label chest radiograph classification, with DenseNet-121 as the baseline for comparison. Saliency maps are generated with the attention-based Transformer Multimodal Explainability (TMME) method and, for comparison, with Grad-CAM.
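As a rough sanity check on the architecture identifier, deit_base_distilled_patch16_224 implies 16×16-pixel patches on a 224×224 input. The sketch below computes the resulting token sequence length. It assumes the usual distilled DeiT layout of one class token plus one distillation token, which this card does not state explicitly; the function name is illustrative.

```python
def vit_sequence_length(image_size: int = 224, patch_size: int = 16,
                        extra_tokens: int = 2) -> int:
    """Number of tokens passed to the transformer encoder.

    extra_tokens=2 assumes one class token and one distillation token,
    as in distilled DeiT variants (an assumption, not stated in this card).
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side + extra_tokens


print(vit_sequence_length())  # 14 * 14 patches + 2 tokens = 198
```

Each of the 196 patch tokens covers a 16×16 region, which bounds the spatial resolution of any attention-based saliency map before upsampling.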

Clinical benefit

Assists pneumothorax detection on chest radiographs by providing classification scores and more interpretable attention-based saliency maps that can reveal model focus and potential biases.

Clinical workflow phase

Clinical decision support systems; reader aid for image interpretation and model explainability.

Decision threshold

Sensitivity and specificity are reported at the threshold that maximizes the F1 score.
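A minimal sketch of how such an operating point could be chosen, assuming binary labels and per-image scores for a single finding. The function names and the toy data are hypothetical and are not taken from the study.

```python
from typing import List, Tuple


def confusion_counts(labels: List[int], scores: List[float],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def f1_optimal_threshold(labels: List[int], scores: List[float]) -> Tuple[float, float]:
    """Scan every observed score as a candidate threshold and keep the best F1."""
    best_threshold, best_f1 = 0.0, -1.0
    for t in sorted(set(scores)):
        tp, fp, _, fn = confusion_counts(labels, scores, t)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_threshold, best_f1 = t, f1
    return best_threshold, best_f1


def sensitivity_specificity(labels: List[int], scores: List[float],
                            threshold: float) -> Tuple[float, float]:
    tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Toy data, invented for illustration only.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.3, 0.35, 0.6, 0.7, 0.9]
threshold, f1 = f1_optimal_threshold(labels, scores)
print(threshold, sensitivity_specificity(labels, scores, threshold))
```

On this toy data the F1-optimal threshold is 0.35, which gives a sensitivity of 1.0 and a specificity of 2/3.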

Degree of automation

Decision support; provides probabilities and saliency maps to assist radiologists; does not automate final diagnosis.

Indications for use

Exploratory research use on adult chest radiographs to classify pneumothorax (and other thoracic findings: cardiomegaly, consolidation, pleural effusion, atelectasis) and provide attention-based saliency maps; intended for radiology reading environments.

Input

Frontal chest radiographs from public datasets, resized to 224×224 pixels and normalized with the ImageNet mean and SD; data augmentation is applied during training.
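The normalization step can be sketched in plain Python as follows. The channel statistics are the commonly used ImageNet mean and SD values. The sketch assumes 8-bit grayscale input replicated across three channels, which this card does not specify, and it omits resizing; the function names are illustrative.

```python
from typing import List, Sequence, Tuple

# Widely used ImageNet channel statistics (RGB order, inputs scaled to [0, 1]).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(gray_value: int) -> Tuple[float, ...]:
    """Map one 8-bit grayscale value to three ImageNet-normalized channel values."""
    if not 0 <= gray_value <= 255:
        raise ValueError("expected an 8-bit pixel value")
    scaled = gray_value / 255.0
    return tuple((scaled - m) / s for m, s in zip(IMAGENET_MEAN, IMAGENET_STD))


def normalize_image(gray_rows: Sequence[Sequence[int]]) -> List[List[List[float]]]:
    """Normalize a 2-D grayscale image; the result is indexed [channel][row][col]."""
    return [
        [[normalize_pixel(v)[c] for v in row] for row in gray_rows]
        for c in range(3)
    ]


tiny = [[0, 128], [255, 64]]  # stand-in for an already resized radiograph
normalized = normalize_image(tiny)
print(len(normalized), len(normalized[0]), len(normalized[0][0]))  # 3 2 2
```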

Instructions

The model outputs class probabilities and saliency maps. In the reader study and on quantitative metrics, attention-based TMME saliency maps were more useful and consistent than Grad-CAM maps.

Limitations

Bias toward confounders was observed, such as chest tubes influencing pneumothorax predictions; training data were limited to pooled public datasets; resizing images to 224×224 may reduce the detectability of subtle pneumothoraces; the user study was small; segmentation masks were largely available only for pneumothorax, which limits precise effective heat ratio (EHR) evaluation for other classes.
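To illustrate why segmentation masks are needed for the EHR evaluation (read here as effective heat ratio), the sketch below computes a simplified, single-threshold score: the fraction of above-threshold saliency pixels that fall inside the ground-truth mask. This is an assumption-laden simplification. The published metric may be defined differently, for example by aggregating over several thresholds, and the function name and toy arrays are hypothetical.

```python
from typing import Sequence


def inside_mask_ratio(saliency: Sequence[Sequence[float]],
                      mask: Sequence[Sequence[int]],
                      threshold: float = 0.5) -> float:
    """Fraction of 'hot' saliency pixels (>= threshold) lying inside the mask.

    Simplified illustration only; not the exact metric used in the study.
    """
    hot = inside = 0
    for sal_row, mask_row in zip(saliency, mask):
        for s, m in zip(sal_row, mask_row):
            if s >= threshold:
                hot += 1
                if m:
                    inside += 1
    return inside / hot if hot else 0.0


# Toy 3x3 saliency map and mask, invented for illustration.
saliency = [[0.9, 0.8, 0.1],
            [0.2, 0.7, 0.6],
            [0.0, 0.1, 0.3]]
mask = [[1, 1, 0],
        [0, 1, 0],
        [0, 0, 0]]
print(inside_mask_ratio(saliency, mask))  # 3 of 4 hot pixels are inside: 0.75
```

Without a mask for a given finding, the denominator can still be computed but the numerator cannot, which is why classes lacking segmentations cannot be scored this way.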

Output

CDEs: RDE2439, RDE2294, RDE2927, RDE2295, RDE2296

Description: Per-image probabilities for pneumothorax, cardiomegaly, consolidation, pleural effusion, and atelectasis; accompanying saliency maps (attention-based TMME and Grad-CAM) highlighting image regions contributing to the prediction.
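A minimal sketch of how per-image probabilities for the five findings could be derived from raw model outputs. It assumes a multi-label head with an independent sigmoid per finding, which is typical for multi-label classification but is not confirmed by this card; the class order and function names are hypothetical.

```python
import math
from typing import Dict, Sequence

# Hypothetical output order; the actual order depends on the trained model.
FINDINGS = ("pneumothorax", "cardiomegaly", "consolidation",
            "pleural effusion", "atelectasis")


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def logits_to_probabilities(logits: Sequence[float]) -> Dict[str, float]:
    """Map one logit per finding to an independent probability."""
    if len(logits) != len(FINDINGS):
        raise ValueError(f"expected {len(FINDINGS)} logits, got {len(logits)}")
    return {name: sigmoid(value) for name, value in zip(FINDINGS, logits)}


probabilities = logits_to_probabilities([2.0, 0.0, -1.5, 0.5, -3.0])
for name, p in probabilities.items():
    print(f"{name}: {p:.3f}")
```

Because each finding is scored independently, the probabilities need not sum to 1, and several findings can be positive for the same image.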

Recommendation

Attention-based TMME saliency maps are recommended over Grad-CAM for interpretability in this context; saliency maps can help identify model biases (e.g., chest tube reliance).

Regulatory information

Authorization status: Not a regulated medical device; exploratory research study.

Reproducibility

Training details reported: ViT fine-tuned with SGD (lr=0.001), batch size 64, 500 epochs with early stopping; images normalized to ImageNet statistics; data augmentations (random affine, translation, scale, horizontal flip); DenseNet baseline fine-tuned with SGD (lr=0.001), batch size 32, 30 epochs with early stopping.
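The reported hyperparameters can be captured in a small configuration sketch. The early-stopping helper below assumes the monitored quantity is a validation loss and uses a hypothetical patience value, since neither is given in this card; the class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TrainConfig:
    optimizer: str
    learning_rate: float
    batch_size: int
    max_epochs: int


# Values as reported in this card.
VIT_CONFIG = TrainConfig(optimizer="sgd", learning_rate=0.001,
                         batch_size=64, max_epochs=500)
DENSENET_CONFIG = TrainConfig(optimizer="sgd", learning_rate=0.001,
                              batch_size=32, max_epochs=30)


class EarlyStopping:
    """Signal a stop once the loss has not improved for `patience` epochs in a row."""

    def __init__(self, patience: int) -> None:
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_run(val_losses: Sequence[float], config: TrainConfig, patience: int) -> int:
    """Return how many epochs would run for a given validation-loss trace."""
    stopper = EarlyStopping(patience)
    capped = val_losses[:config.max_epochs]
    for epoch, loss in enumerate(capped, start=1):
        if stopper.step(loss):
            return epoch
    return len(capped)


# Toy loss trace and patience=3, both invented for illustration.
print(epochs_run([0.9, 0.7, 0.6, 0.65, 0.61, 0.62, 0.5], VIT_CONFIG, patience=3))  # 6
```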

Use

Intended: Decision support

Out-of-scope: Detection and diagnosis

Excluded: Detection and diagnosis

User

Intended: Radiologist, Researcher

Out-of-scope: Layperson

Excluded: Layperson