Prediction–Saliency Correlation (PSC) for evaluating saliency methods in radiology AI
2025-11-30
https://doi.org/10.1148/atlas.1764531889856
Overview
Schema Version
https://atlas.rsna.org/schemas/2025-11/model.json
Name
Prediction–Saliency Correlation (PSC) for evaluating saliency methods in radiology AI
Link
https://dx.doi.org/10.1148/ryai.220221
Indexing
Keywords: saliency maps, explainability, trustworthiness, prediction-saliency correlation (PSC), CheXpert, DenseNet-121, ResNet-152, brain MRI, adversarial perturbation, SSIM, AUC
Content: CH, MR, NR, RS
RadLex: RID10312
Author(s)
Jiajin Zhang
Hanqing Chao
Giridhar Dasegowda
Ge Wang
Mannudeep K. Kalra
Pingkun Yan
Organization(s)
Rensselaer Polytechnic Institute, Department of Biomedical Engineering, Center for Biotechnology and Interdisciplinary Studies
Massachusetts General Hospital, Department of Radiology, Harvard Medical School
Version
1.0
License
Text: © 2023 by the Radiological Society of North America, Inc.
Funding
Supported by National Science Foundation (2046708) and National Institutes of Health (R01EB032716).
Ethical review
Retrospective study using fully de-identified public datasets; exempt from institutional review board approval and HIPAA compliant.
Date
Published: 2023-11-08
References
[1] Zhang J, Chao H, Dasegowda G, Wang G, Kalra MK, Yan P. Revisiting the Trustworthiness of Saliency Methods in Radiology AI. Radiology: Artificial Intelligence. 2024;6(1):e220221. Published online 2023 Nov 8. doi:10.1148/ryai.220221. PMID: 38166328. PMCID: PMC10831523.
Model
Architecture
Evaluation framework using standard CNN classifiers (DenseNet-121, ResNet-152, ResNet-50) and seven saliency methods (vanilla backpropagation, vanilla backpropagation × image, Grad-CAM, guided Grad-CAM, integrated gradients, SmoothGrad, XRAI). Also assesses a commercial black-box prototype.
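For illustration, a minimal sketch of the simplest listed method, vanilla backpropagation, assuming a PyTorch image classifier; the model, its weights, and the input tensor are placeholders rather than the study's actual code.

```python
import torch
import torchvision.models as models

# Untrained stand-in classifier; a real evaluation would load trained weights.
model = models.densenet121(weights=None).eval()

def vanilla_saliency(model: torch.nn.Module, image: torch.Tensor, target_class: int) -> torch.Tensor:
    """Gradient of the target-class logit with respect to the input pixels."""
    image = image.clone().requires_grad_(True)   # leaf tensor, so .grad is populated
    logits = model(image.unsqueeze(0))           # shape (1, num_classes)
    logits[0, target_class].backward()
    # Collapse color channels by taking the max absolute gradient per pixel.
    return image.grad.abs().max(dim=0).values

x = torch.rand(3, 224, 224)                                 # placeholder radiograph tensor
saliency_map = vanilla_saliency(model, x, target_class=0)   # shape (224, 224)
```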
Availability
Data used: CheXpert (https://stanfordmlgroup.github.io/competitions/chexpert) and a brain tumor MRI dataset (https://www.kaggle.com/datasets/masoudnickparvar/brain-tumor-mri-dataset).
Clinical benefit
Provides a quantitative method (PSC) to assess trustworthiness of saliency explanations in medical AI, potentially informing safer interpretation and deployment of AI outputs.
Clinical workflow phase
Research and validation; methodology to support evaluation of AI explainability prior to clinical deployment.
Degree of automation
Analytical/evaluation tool; does not automate clinical decisions; quantifies agreement between model prediction changes and saliency map changes.
Indications for use
Quantitative evaluation of the sensitivity and robustness of saliency-based explanations for AI models using medical images (e.g., chest radiographs and brain MRI) in research settings.
Input
Medical images (frontal chest radiographs from CheXpert; brain MRI images from a public dataset); model predictions and saliency maps from various saliency methods or a commercial prototype.
Instructions
Apply the defined perturbation strategies to create prediction-changing adversarial images (for the sensitivity analysis) or saliency-changing adversarial images (for the robustness analysis); compute the PSC as the Pearson correlation between prediction changes and saliency-map changes, each quantified via Jensen–Shannon divergence; report the accompanying AUC and SSIM measurements as in the study.
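A minimal sketch of the PSC computation as described above, assuming that both the class-probability vectors and the normalized saliency maps are compared as distributions via Jensen–Shannon divergence; all array and function names are illustrative, not the authors' implementation.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import pearsonr

def as_distribution(saliency_map: np.ndarray) -> np.ndarray:
    """Flatten a saliency map and normalize it into a probability distribution."""
    s = np.abs(saliency_map).ravel()
    return s / s.sum()

def psc(pred_pairs, saliency_pairs) -> float:
    """PSC over a set of (original, perturbed) image pairs.

    pred_pairs: (p_orig, p_pert) class-probability vectors per image pair.
    saliency_pairs: (s_orig, s_pert) saliency maps per image pair.
    """
    # scipy's jensenshannon returns the JS *distance* (the square root of the
    # divergence), so squaring recovers the divergence itself.
    pred_change = [jensenshannon(p, q) ** 2 for p, q in pred_pairs]
    sal_change = [jensenshannon(as_distribution(a), as_distribution(b)) ** 2
                  for a, b in saliency_pairs]
    r, _ = pearsonr(pred_change, sal_change)
    return r  # in [-1, +1]; higher means the explanation tracks the prediction
```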
Limitations
Study focused on attribution-based saliency methods; other explainability techniques (e.g., counterfactual explanations) were not evaluated; generalization beyond the evaluated datasets and models is not established; the commercial prototype was evaluated without access to its internal architecture; details such as image file formats and resolutions are not specified.
Output
CDEs: RDE28, RDE17
Description: PSC coefficient (−1 to +1) quantifying correlation between prediction changes and saliency map changes; accompanying AUC and SSIM measurements for sensitivity and robustness analyses.
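The accompanying measurements can be reproduced with standard library routines; a sketch with placeholder inputs follows (skimage's structural_similarity and scikit-learn's roc_auc_score are real functions, but the data shown are illustrative).

```python
import numpy as np
from skimage.metrics import structural_similarity
from sklearn.metrics import roc_auc_score

# SSIM between original and perturbed saliency maps (robustness analysis).
sal_orig = np.random.rand(224, 224)   # placeholder saliency maps
sal_pert = np.random.rand(224, 224)
ssim_value = structural_similarity(sal_orig, sal_pert, data_range=1.0)

# AUC of classifier outputs on perturbed images (sensitivity analysis).
y_true = np.array([0, 1, 1, 0, 1])             # placeholder ground-truth labels
y_score = np.array([0.2, 0.8, 0.6, 0.3, 0.9])  # placeholder predicted probabilities
auc_value = roc_auc_score(y_true, y_score)
```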
Recommendation
Use PSC to validate saliency methods before relying on them to interpret medical AI outputs; exercise caution, as commonly used saliency maps showed low sensitivity and robustness under subtle perturbations.
Regulatory information
Comment: Research method; no regulatory authorization applicable.
Reproducibility
Model-agnostic and generalizable across the tested CNN architectures and both datasets; a human reader study showed that the perturbed images are difficult for experts to detect.
Use
Intended: Detection and diagnosis
Out-of-scope: Decision support
Excluded: Decision support
User
Intended: Researcher
Out-of-scope: Patient
Excluded: Layperson