Prediction–Saliency Correlation (PSC) for evaluating saliency methods in radiology AI
Type: model
Record date: 2025-11-30
DOI: https://doi.org/10.1148/atlas.1764531889856

Overview

Schema Version

https://atlas.rsna.org/schemas/2025-11/model.json

Name

Prediction–Saliency Correlation (PSC) for evaluating saliency methods in radiology AI

Link

https://dx.doi.org/10.1148/ryai.220221

Indexing

Keywords: saliency maps, explainability, trustworthiness, prediction-saliency correlation (PSC), CheXpert, DenseNet-121, ResNet-152, brain MRI, adversarial perturbation, SSIM, AUC
Content: CH, MR, NR, RS
RadLex: RID10312

Author(s)

Jiajin Zhang
Hanqing Chao
Giridhar Dasegowda
Ge Wang
Mannudeep K. Kalra
Pingkun Yan

Organization(s)

Rensselaer Polytechnic Institute, Department of Biomedical Engineering, Center for Biotechnology and Interdisciplinary Studies
Massachusetts General Hospital, Department of Radiology, Harvard Medical School

Version

1.0

License

Text: © 2023 by the Radiological Society of North America, Inc.

Funding

Supported by National Science Foundation (2046708) and National Institutes of Health (R01EB032716).

Ethical review

Retrospective study using fully de-identified public datasets; exempt from IRB approval and HIPAA compliant.

Date

Published: 2023-11-08

References

[1] Zhang J, Chao H, Dasegowda G, Wang G, Kalra MK, Yan P. "Revisiting the Trustworthiness of Saliency Methods in Radiology AI". Radiology: Artificial Intelligence. 2024;6(1):e220221. Published online 2023 Nov 8. doi:10.1148/ryai.220221. PMID: 38166328. PMCID: PMC10831523.

Model

Architecture

Evaluation framework using standard CNN classifiers (DenseNet-121, ResNet-152, ResNet-50) and seven saliency methods (vanilla backpropagation, vanilla backpropagation × image, Grad-CAM, Guided Grad-CAM, Integrated Gradients, SmoothGrad, XRAI). Also includes assessment of a commercial black-box prototype.

Availability

Data used: CheXpert (https://stanfordmlgroup.github.io/competitions/chexpert) and a brain tumor MRI dataset (https://www.kaggle.com/datasets/masoudnickparvar/brain-tumor-mri-dataset).

Clinical benefit

Provides a quantitative method (PSC) to assess trustworthiness of saliency explanations in medical AI, potentially informing safer interpretation and deployment of AI outputs.

Clinical workflow phase

Research and validation; methodology to support evaluation of AI explainability prior to clinical deployment.

Degree of automation

Analytical/evaluation tool; does not automate clinical decisions; quantifies agreement between model prediction changes and saliency map changes.

Indications for use

Quantitative evaluation of the sensitivity and robustness of saliency-based explanations for AI models using medical images (e.g., chest radiographs and brain MRI) in research settings.

Input

Medical images (frontal chest radiographs from CheXpert; brain MRI images from a public dataset); model predictions and saliency maps from various saliency methods or a commercial prototype.

Instructions

Apply the defined perturbation strategies to create prediction-changing (sensitivity test) or saliency-changing (robustness test) adversarial images; compute PSC as the Pearson correlation between prediction changes and saliency-map changes, each change measured with Jensen–Shannon divergence; report AUC and SSIM as complementary metrics, as in the reference publication.
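The PSC computation described above can be sketched numerically. This is a minimal illustration, not the authors' released code: the helper names (`js_divergence`, `psc`) are assumptions, and the functions operate on already-computed prediction vectors and saliency maps for each original/perturbed image pair, since the model and saliency-method calls are outside the scope of this sketch.

```python
import numpy as np


def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions.

    Inputs are normalized internally, so raw prediction scores or
    unnormalized saliency maps can be passed directly.
    """
    p = np.asarray(p, dtype=float).ravel() + eps
    q = np.asarray(q, dtype=float).ravel() + eps
    p /= p.sum()
    q /= q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))  # KL divergence
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def psc(pred_pairs, sal_pairs):
    """Prediction-Saliency Correlation over a set of perturbations.

    pred_pairs: list of (original, perturbed) prediction distributions.
    sal_pairs:  list of (original, perturbed) saliency maps.
    Returns the Pearson correlation between the per-pair prediction
    changes and saliency-map changes, each measured with JS divergence.
    """
    dp = np.array([js_divergence(a, b) for a, b in pred_pairs])
    ds = np.array([js_divergence(a, b) for a, b in sal_pairs])
    return np.corrcoef(dp, ds)[0, 1]
```

A PSC near +1 indicates that the saliency map shifts in step with the prediction (the desired behavior); values near 0 indicate explanations that are insensitive to prediction changes.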

Limitations

Study focused on attribution-based saliency methods; other explainability techniques (e.g., counterfactuals) not evaluated; generalization beyond evaluated datasets and models not established; commercial prototype evaluated without access to internal architecture; details such as image file formats/resolutions not specified.

Output

CDEs: RDE28, RDE17
Description: PSC coefficient (−1 to +1) quantifying correlation between prediction changes and saliency map changes; accompanying AUC and SSIM measurements for sensitivity and robustness analyses.

Recommendation

Use PSC to validate saliency methods before relying on them for interpreting medical AI outputs; exercise caution as commonly used saliency maps showed low sensitivity and robustness under subtle perturbations.

Regulatory information

Comment: Research method; no regulatory authorization applicable.

Reproducibility

Model-agnostic and generalizable across the tested CNN architectures and two datasets; a human reader study showed that the perturbed images are difficult for experts to detect.

Use

Intended: Detection and diagnosis
Out-of-scope: Decision support
Excluded: Decision support

User

Intended: Researcher
Out-of-scope: Patient
Excluded: Layperson