Hurdles to Artificial Intelligence Deployment: Noise in Schemas and “Gold” Labels
2026-01-24
https://doi.org/10.1148/atlas.1769271856794
Overview
Schema Version
https://atlas.rsna.org/schemas/2025-11/model.json
Name
Hurdles to Artificial Intelligence Deployment: Noise in Schemas and “Gold” Labels
Link
https://dx.doi.org/10.1148/ryai.220056
Indexing
Keywords: Radiology AI, Dataset creation, Noise in datasets, Schema noise, Label noise, Chest radiograph, CheXpert, ChestX-ray14
Content: CH, RS, IN
RadLex: RID5557, RID43255, RID5352
SNOMED: 36118008, 233604007
Author(s)
Mohamed Abdalla
Benjamin Fine
Organization(s)
Institute for Better Health, Trillium Health Partners
University of Toronto
Centre for Information Technology, Department of Computer Science, University of Toronto
Department of Medical Imaging, University of Toronto
Version
1.0
License
Text: © 2023 by the Radiological Society of North America, Inc.
Contact
Mohamed Abdalla: msa@cs.toronto.edu
Funding
Acknowledgments note support for M.A. from a Vanier Scholarship (Government of Canada) and the Vector Institute; B.F. and the AI Deployment and Evaluation Laboratory at Trillium Health Partners are supported by TD Bank, Canada’s Supercluster, and the Trillium Health Partners Foundation.
Ethical review
No human subjects research was performed; the study was exempt from institutional review board review (as stated in the Label Noise Demonstration section).
Date
Updated: 2023-03-01
Published: 2023-01-11
Created: 2022-03-24
Model
Clinical benefit
Not a deployable model; the study characterizes schema and label noise that affect the evaluation and deployment of chest radiograph AI classifiers.
Clinical workflow phase
Evaluation/validation considerations prior to deployment; guidance for dataset design and external testing.
Input
Chest radiograph annotations (CheXpert test set; eight annotators; 14 classes; 500 images); comparison of class schemas across CheXpert, ChestX-ray14, and one proprietary classifier.
Limitations
Limited to the CheXpert test set; does not analyze the underlying images; annotator uncertainty levels unavailable; image-only context (no clinical history or prior examinations); focuses on pairwise agreement summaries, though Fleiss kappa is also reported.
Output
CDEs: RDE1401, RDE1402
Description: Study outputs include quantified schema overlaps between datasets/classifiers and agreement metrics (percent agreement, Cohen’s kappa, Fleiss kappa) across simulated gold label sets.
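The agreement metrics named above (percent agreement, Cohen kappa, Fleiss kappa) follow their standard definitions. A minimal pure-Python sketch of those definitions, not code from the study, might look like this (function names and data shapes are illustrative assumptions):

```python
from collections import Counter

def percent_agreement(a, b):
    """Fraction of items on which two annotators assign the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Chance-corrected agreement between two annotators' label lists."""
    n = len(a)
    p_o = percent_agreement(a, b)                      # observed agreement
    ca, cb = Counter(a), Counter(b)
    # expected agreement under independent labeling with each rater's marginals
    p_e = sum((ca[l] / n) * (cb[l] / n) for l in set(a) | set(b))
    return (p_o - p_e) / (1 - p_e)

def fleiss_kappa(ratings):
    """Fleiss kappa; `ratings` is a list of per-item label lists, one label
    per rater, with the same number of raters m for every item."""
    n, m = len(ratings), len(ratings[0])
    totals = Counter()
    p_items = []
    for row in ratings:
        c = Counter(row)
        totals.update(c)
        # per-item agreement: proportion of agreeing rater pairs
        p_items.append((sum(v * v for v in c.values()) - m) / (m * (m - 1)))
    p_bar = sum(p_items) / n
    p_e = sum((totals[l] / (n * m)) ** 2 for l in totals)
    return (p_bar - p_e) / (1 - p_e)
```

For example, `cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0])` discounts the 75% raw agreement by the 50% expected by chance, yielding 0.5.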
Recommendation
Report schema justification and label noise metrics; consider ontology-anchored schemas; prefer soft labels where appropriate; standardize use cases (e.g., ACR DSI) to reduce noise.
Reproducibility
Analyses are based on the publicly described CheXpert test annotations and combinatorial re-sampling of annotator panels; supplemental tables and figures provide details.
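The combinatorial re-sampling of annotator panels can be sketched as enumerating annotator subsets and taking a majority vote per image to form each simulated "gold" label set. This is an assumed reconstruction of the general idea, not the study's exact procedure; the tie-breaking rule and data layout here are hypothetical:

```python
from collections import Counter
from itertools import combinations

def majority_label(labels):
    """Most common label in a panel; ties broken by smallest label (assumed rule)."""
    counts = Counter(labels)
    top = max(counts.values())
    return min(l for l, c in counts.items() if c == top)

def gold_sets_from_panels(annotations, panel_size):
    """annotations: dict image_id -> list of labels, one per annotator.
    Yields (panel, gold_labels) for every annotator panel of the given size,
    where gold_labels maps each image to the panel's majority-vote label."""
    n_annotators = len(next(iter(annotations.values())))
    for panel in combinations(range(n_annotators), panel_size):
        gold = {img: majority_label([labels[i] for i in panel])
                for img, labels in annotations.items()}
        yield panel, gold
```

Agreement metrics computed between gold sets from different panels then quantify how much the "gold" standard itself shifts with the choice of annotators.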