Overview

Schema Version

https://atlas.rsna.org/schemas/2025-11/dataset.json

Name

RSNA Pulmonary Embolism Detection (RSNA-PED) Dataset

Link

https://mira.rsna.org/dataset/2

Indexing

Keywords: Pulmonary Embolism, CT Dataset, Expert Annotations, Thoracic Radiologists, Machine Learning Models, CT Pulmonary Angiography, Medical Imaging, Publicly Available Dataset, Kaggle Competition

Content: CH, CT

RadLex: RID4834

Author(s)

Errol Colak

Felipe C. Kitamura

Stephen B. Hobbs

Carol C. Wu

Matthew P. Lungren

Luciano M. Prevedello

Jayashree Kalpathy-Cramer

Robyn L. Ball

George Shih

Anouk Stein

Safwan S. Halabi

Emre Altinmakas

Meng Law

Parveen Kumar

Karam A. Manzalawi

Dennis Charles Nelson Rubio

Jacob W. Sechrist

Pauline Germaine

Eva Castro Lopez

Tomas Amerio

Pushpender Gupta

Manoj Jain

Fernando U. Kay

Cheng Ting Lin

Saugata Sen

Jonathan Wesley Revels

Carola C. Brussaard

John Mongan

Organization(s)

Radiological Society of North America

Society of Thoracic Radiology

Stanford University

Unity Health Toronto

Federal University of São Paulo

Alfred Health

Koç University

MD.ai

License

Text: RSNA MIRA DATASET RESEARCH USE AGREEMENT

URL: https://docs.google.com/document/d/1r8_0yW-5XqxSqhFzFq2fV6L4NxIQ6drF0sBjXXJevXU/edit?tab=t.0#heading=h.1iah9825ct0r

Contact

Errol Colak (Errol.Colak@unityhealth.to)

Ethical review

Each contributing site was responsible for obtaining institutional approval and adhering to local legal regulations and best practices.

Comments

This multinational dataset is, to our knowledge, the largest publicly available pulmonary embolism (PE) CT dataset that includes expert annotations from a large group of subspecialist thoracic radiologists (RSNA-STR Annotators and Dataset Curation Contributors). Data were collected from institutions in five different countries. A subset of the dataset was used for the 2020 RSNA AI Challenge hosted by Kaggle.

Date

Published: 2021-01-04

Dataset

Motivation

The intent of this dataset is to spur research and innovation in machine learning that will ultimately lead to improvements in the quality, efficiency, and availability of patient care worldwide.

Sampling

For the Kaggle challenge subset, 30% of negative examinations in both training and test sets were randomly chosen and excluded.
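
The exclusion step described above can be sketched as follows. This is a minimal illustration, not the curators' actual procedure: the record fields (`id`, `positive`), the function name, and the fixed seed are all assumptions, and the function would be applied separately to the training and test sets.

```python
import random

def drop_negative_fraction(exams, fraction=0.30, seed=0):
    """Randomly exclude `fraction` of the negative exams; keep every positive."""
    rng = random.Random(seed)  # seeded so the illustrative draw is repeatable
    negatives = [e for e in exams if not e["positive"]]
    n_drop = round(len(negatives) * fraction)
    dropped_ids = {e["id"] for e in rng.sample(negatives, n_drop)}
    return [e for e in exams if e["id"] not in dropped_ids]

# Hypothetical toy data: 20 positive and 100 negative examinations.
exams = [{"id": i, "positive": i < 20} for i in range(120)]
kept = drop_negative_fraction(exams)
```

Because only negative examinations are removed, the proportion of positive examinations in the retained subset is higher than in the source pool.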

Partitioning scheme

Each examination in the training portion was annotated by a single reader, while each examination in the test portion was read by three readers, with consensus and adjudication by experienced cardiothoracic radiologists.
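
One way a triple read might be reduced to a single label is sketched below. The decision rule is an assumption made for illustration (unanimous reads are accepted, any disagreement is flagged for adjudication); the source does not specify the rule that was actually used, and the function name and label strings are hypothetical.

```python
from collections import Counter

def triple_read(reads):
    """Return (provisional_label, needs_adjudication) for three independent reads."""
    if len(reads) != 3:
        raise ValueError("expected exactly three reads")
    label, votes = Counter(reads).most_common(1)[0]
    # Hypothetical rule: accept unanimous reads, flag everything else.
    return label, votes < 3

unanimous = triple_read(["PE", "PE", "PE"])
split = triple_read(["PE", "no PE", "PE"])
```

In this sketch a flagged examination keeps its majority label only as a provisional value until an adjudicator confirms or overrides it.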

Relationships between instances

Only one series per study and one study per patient from a site were included in the final dataset.
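
The constraint above amounts to keeping at most one series for each (site, patient) pair. A minimal sketch follows; the field names are hypothetical, and keeping the first record encountered is an assumption, since the source does not say how the retained series was chosen.

```python
def one_series_per_patient(records):
    """Keep the first series seen for each (site, patient) pair."""
    seen = set()
    kept = []
    for rec in records:
        key = (rec["site"], rec["patient_id"])
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept

# Hypothetical records: patient P1 at site A has two series.
records = [
    {"site": "A", "patient_id": "P1", "series": "S1"},
    {"site": "A", "patient_id": "P1", "series": "S2"},
    {"site": "B", "patient_id": "P1", "series": "S3"},
]
unique = one_series_per_patient(records)
```

The key includes the site because the stated constraint is per site, so the same patient identifier appearing at two sites is treated as two distinct patients here.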

Noise

Labels for 'Flow Artifact', 'QA-motion', and 'QA-contrast' were used to distinguish studies with image quality issues.

External data

A subset of the dataset was made available for the RSNA/STR Pulmonary Embolism Detection Challenge on Kaggle.

Re-identification

De-identification removed “private” DICOM metadata tags and verified that no “burned-in” (pixel-level) patient identifiers were present.

Sensitive data

Patient identifiers were removed during de-identification.