RSNA Pulmonary Embolism Detection (RSNA-PED) Dataset
RSNA-PED
dataset
2025-11-21
https://doi.org/10.1148/atlas.1763165061264
114

Overview

Schema Version

https://atlas.rsna.org/schemas/2025-11/dataset.json

Name

RSNA Pulmonary Embolism Detection (RSNA-PED) Dataset

Link

https://mira.rsna.org/dataset/2

Indexing

Keywords: Pulmonary Embolism, CT Dataset, Expert Annotations, Thoracic Radiologists, Machine Learning Models, CT Pulmonary Angiography, Medical Imaging, Publicly Available Dataset, Kaggle Competition
Content: CH, CT
RadLex: RID4834

Author(s)

Errol Colak
Felipe C. Kitamura
Stephen B. Hobbs
Carol C. Wu
Matthew P. Lungren
Luciano M. Prevedello
Jayashree Kalpathy-Cramer
Robyn L. Ball
George Shih
Anouk Stein
Safwan S. Halabi
Emre Altinmakas
Meng Law
Parveen Kumar
Karam A. Manzalawi
Dennis Charles Nelson Rubio
Jacob W. Sechrist
Pauline Germaine
Eva Castro Lopez
Tomas Amerio
Pushpender Gupta
Manoj Jain
Fernando U. Kay
Cheng Ting Lin
Saugata Sen
Jonathan Wesley Revels
Carola C. Brussaard
John Mongan

Organization(s)

Radiological Society of North America
Society of Thoracic Radiology
Stanford University
Unity Health Toronto
Federal University of São Paulo
Alfred Health
Koç University
MD.ai

License

Text: RSNA MIRA DATASET RESEARCH USE AGREEMENT
URL: https://docs.google.com/document/d/1r8_0yW-5XqxSqhFzFq2fV6L4NxIQ6drF0sBjXXJevXU/edit?tab=t.0#heading=h.1iah9825ct0r

Contact

Errol Colak (Errol.Colak@unityhealth.to)

Ethical review

Each contributing site was responsible for obtaining institutional approval and adhering to local legal regulations and best practices.

Comments

This multinational dataset is, to our knowledge, the largest publicly available pulmonary embolism (PE) CT dataset that includes expert annotations from a large group of subspecialist thoracic radiologists (RSNA-STR Annotators and Dataset Curation Contributors). Data were collected from institutions in five different countries. A subset of the dataset was used for the 2020 RSNA AI Challenge hosted by Kaggle.

Date

Published: 2021-01-04

Dataset

Motivation

The intent of this dataset is to spur research and innovation in machine learning that will ultimately lead to improvements in the quality, efficiency, and availability of patient care worldwide.

Sampling

For the Kaggle challenge subset, 30% of negative examinations in both training and test sets were randomly chosen and excluded.
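The exclusion step described above can be illustrated with a minimal Python sketch. It is not the curators' actual procedure, which this record does not describe beyond random selection. The field names `study_id` and `pe_present`, the function name `exclude_negatives`, and the fixed seed are all hypothetical, and the sketch assumes it is run separately on each of the training and test sets.

```python
import random

def exclude_negatives(exams, fraction=0.30, seed=0):
    """Randomly drop `fraction` of the negative exams and keep every positive exam.

    `exams` is a list of dicts with hypothetical keys "study_id" and "pe_present".
    """
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    negatives = [e for e in exams if not e["pe_present"]]
    n_drop = round(fraction * len(negatives))
    dropped_ids = {e["study_id"] for e in rng.sample(negatives, n_drop)}
    return [e for e in exams if e["study_id"] not in dropped_ids]

# Toy input: 40 exams, of which 10 are positive and 30 are negative.
exams = [{"study_id": f"s{i}", "pe_present": i % 4 == 0} for i in range(40)]
kept = exclude_negatives(exams)
```

Because only negative examinations are removed, the proportion of positive examinations in the resulting subset is higher than in the full collection.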

Partitioning scheme

Each examination in the training portion was annotated by a single annotator, while the test portion was triple read, with consensus and adjudication by experienced cardiothoracic radiologists.

Relationships between instances

Only one series per study and one study per patient from a site were included in the final dataset.
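A minimal Python sketch of this constraint follows. The record does not say which series or study was retained when several were available, so keeping the first record encountered is an assumption made purely for illustration. The keys `site`, `patient_id`, `study_id`, and `series_id` and the function name `one_series_per_patient` are hypothetical.

```python
def one_series_per_patient(records):
    """Keep one series record per (site, patient) pair.

    Each record describes a single series within a study, so retaining one
    record per (site, patient) yields one study per patient from a site and
    one series per study. Keeping the first record seen is an assumption.
    """
    kept = {}
    for rec in records:
        key = (rec["site"], rec["patient_id"])
        kept.setdefault(key, rec)  # later records for the same key are ignored
    return list(kept.values())

# Toy input: patient p1 at site A has two studies, one of them with two series.
records = [
    {"site": "A", "patient_id": "p1", "study_id": "st1", "series_id": "se1"},
    {"site": "A", "patient_id": "p1", "study_id": "st1", "series_id": "se2"},
    {"site": "A", "patient_id": "p1", "study_id": "st2", "series_id": "se3"},
    {"site": "B", "patient_id": "p1", "study_id": "st3", "series_id": "se4"},
]
unique = one_series_per_patient(records)
```

The key includes the site because the constraint is stated per site; patient identifiers from different sites are treated as unrelated.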

Noise

Labels for 'Flow Artifact', 'QA-motion', and 'QA-contrast' were used to distinguish studies with image quality issues.

External data

A subset of the dataset was made available for the RSNA/STR Pulmonary Embolism Detection Challenge on Kaggle.

Re-identification

De-identification processes removed “private” DICOM metadata tags and ensured that no “burned-in” or pixel-level patient identifiers were present.

Sensitive data

Patient identifiers were removed during de-identification.