RSNA Pulmonary Embolism Detection (RSNA-PED) Dataset
RSNA-PED
2025-11-21
https://doi.org/10.1148/atlas.1763165061264
114
Overview
Schema Version
https://atlas.rsna.org/schemas/2025-11/dataset.json
Name
RSNA Pulmonary Embolism Detection (RSNA-PED) Dataset
Link
https://mira.rsna.org/dataset/2
Indexing
Keywords: Pulmonary Embolism, CT Dataset, Expert Annotations, Thoracic Radiologists, Machine Learning Models, CT Pulmonary Angiography, Medical Imaging, Publicly Available Dataset, Kaggle Competition
Content: CH, CT
RadLex: RID4834
Author(s)
Errol Colak
Felipe C. Kitamura
Stephen B. Hobbs
Carol C. Wu
Matthew P. Lungren
Luciano M. Prevedello
Jayashree Kalpathy-Cramer
Robyn L. Ball
George Shih
Anouk Stein
Safwan S. Halabi
Emre Altinmakas
Meng Law
Parveen Kumar
Karam A. Manzalawi
Dennis Charles Nelson Rubio
Jacob W. Sechrist
Pauline Germaine
Eva Castro Lopez
Tomas Amerio
Pushpender Gupta
Manoj Jain
Fernando U. Kay
Cheng Ting Lin
Saugata Sen
Jonathan Wesley Revels
Carola C. Brussaard
John Mongan
Organization(s)
Radiological Society of North America
Society of Thoracic Radiology
Stanford University
Unity Health Toronto
Federal University of São Paulo
Alfred Health
Koç University
MD.ai
License
Text: RSNA MIRA DATASET RESEARCH USE AGREEMENT
URL: https://docs.google.com/document/d/1r8_0yW-5XqxSqhFzFq2fV6L4NxIQ6drF0sBjXXJevXU/edit?tab=t.0#heading=h.1iah9825ct0r
Contact
Errol Colak (Errol.Colak@unityhealth.to)
Ethical review
Each contributing site was responsible for obtaining institutional approval and adhering to local legal regulations and best practices.
Comments
This multinational dataset is, to our knowledge, the largest publicly available pulmonary embolism (PE) CT dataset that includes expert annotations from a large group of subspecialist thoracic radiologists (RSNA-STR Annotators and Dataset Curation Contributors). Data were collected from institutions in five different countries. A subset of the dataset was used for the 2020 RSNA AI Challenge hosted by Kaggle.
Date
Published: 2021-01-04
Dataset
Motivation
The intent of this dataset is to spur research and innovation in machine learning that will ultimately lead to improvements in the quality, efficiency, and availability of patient care worldwide.
Sampling
For the Kaggle challenge subset, a randomly selected 30% of negative examinations were excluded from both the training and test sets.
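The exclusion step described above can be illustrated with a minimal sketch. The record fields (`exam_id`, `pe_present`), the fixed seed, and the rounding rule are assumptions made for illustration; the source does not describe the actual procedure or tooling. The function would be applied separately to each split.

```python
import random

def exclude_negatives(exams, fraction=0.30, seed=0):
    """Drop a random `fraction` of negative exams; keep every positive exam.

    `exams` is a list of dicts with hypothetical keys `exam_id` and
    `pe_present`. The seed only makes this illustration reproducible.
    """
    rng = random.Random(seed)
    negatives = [e for e in exams if not e["pe_present"]]
    n_drop = round(fraction * len(negatives))
    dropped_ids = {e["exam_id"] for e in rng.sample(negatives, n_drop)}
    return [e for e in exams if e["exam_id"] not in dropped_ids]

# Hypothetical toy split: 10 negative exams (ids 0-9) and 5 positive (ids 10-14).
exams = [{"exam_id": i, "pe_present": i >= 10} for i in range(15)]
kept = exclude_negatives(exams)
n_positive = sum(e["pe_present"] for e in kept)
n_negative = len(kept) - n_positive
```

In this toy split, 3 of the 10 negative examinations are removed and all 5 positive examinations are retained, so the exclusion raises the proportion of positive examinations in the subset.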
Partitioning scheme
The training portion was annotated by a single individual, while the test portion was triple read with consensus and adjudication by experienced cardiothoracic radiologists.
Relationships between instances
Only one series per study and one study per patient from a site were included in the final dataset.
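Users who merge this dataset with other sources may want to verify that the same constraint still holds. The following sketch checks it on a table with one row per series; the column names (`site`, `patient_id`, `study_uid`, `series_uid`) are hypothetical and do not reflect the dataset's actual file layout.

```python
from collections import Counter

def find_violations(records):
    """Return (patients with more than one study, studies with more than one series).

    `records` holds one dict per series, using hypothetical keys
    `site`, `patient_id`, `study_uid`, and `series_uid`.
    """
    series_per_study = Counter()
    studies_per_patient = Counter()
    seen_studies = set()
    for r in records:
        series_per_study[r["study_uid"]] += 1
        study_key = (r["site"], r["patient_id"], r["study_uid"])
        if study_key not in seen_studies:
            # Count each study once per (site, patient) pair.
            seen_studies.add(study_key)
            studies_per_patient[(r["site"], r["patient_id"])] += 1
    multi_study_patients = sorted(p for p, n in studies_per_patient.items() if n > 1)
    multi_series_studies = sorted(s for s, n in series_per_study.items() if n > 1)
    return multi_study_patients, multi_series_studies

# Hypothetical records: patient p2 at site A has two studies,
# and study s4 at site B has two series.
records = [
    {"site": "A", "patient_id": "p1", "study_uid": "s1", "series_uid": "x1"},
    {"site": "A", "patient_id": "p2", "study_uid": "s2", "series_uid": "x2"},
    {"site": "A", "patient_id": "p2", "study_uid": "s3", "series_uid": "x3"},
    {"site": "B", "patient_id": "p1", "study_uid": "s4", "series_uid": "x4"},
    {"site": "B", "patient_id": "p1", "study_uid": "s4", "series_uid": "x5"},
]
multi_study_patients, multi_series_studies = find_violations(records)
```

Patients are keyed by site and identifier together because the constraint is stated per site; the same identifier appearing at two sites is not treated as a violation here.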
Noise
The labels 'Flow Artifact', 'QA-motion', and 'QA-contrast' were used to flag studies with image quality issues.
External data
A subset of the dataset was made available for the RSNA/STR Pulmonary Embolism Detection Challenge on Kaggle.
Re-identification
De-identification removed “private” DICOM metadata tags and ensured that no “burned-in” (pixel-level) patient identifiers were present.
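As a rough illustration of the metadata part of this step: in the DICOM standard, private data elements are those with an odd group number. The sketch below filters them out of a header represented as a plain dictionary keyed by `(group, element)`. The dictionary representation and the function name are assumptions for illustration only; the source does not state which tools were used. In practice a DICOM library would typically handle this (for example, pydicom provides `Dataset.remove_private_tags()`). Burned-in pixel identifiers require a separate image-level review that this sketch does not cover.

```python
def strip_private_tags(header):
    """Return a copy of `header` without private (odd-group) elements.

    `header` is a hypothetical stand-in for a DICOM header: a dict that
    maps (group, element) tuples to values.
    """
    return {tag: value for tag, value in header.items() if tag[0] % 2 == 0}

header = {
    (0x0008, 0x0060): "CT",             # Modality, standard element
    (0x0010, 0x0010): "ANONYMIZED",     # PatientName, standard element
    (0x0029, 0x1010): b"vendor blob",   # private element (odd group 0x0029)
}
clean = strip_private_tags(header)
```

Only the two standard elements remain in `clean`; the original `header` is left unmodified.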
Sensitive data
Patient identifiers were removed during de-identification.