Overview

Schema Version

https://atlas.rsna.org/schemas/2025-11/dataset.json

Name

RSNA Screening Mammography Breast Cancer Detection (RSNA-SMBC) Dataset

Link

https://mira.rsna.org/dataset/3

Indexing

Keywords: Screening Mammography, Breast Cancer Detection, Artificial Intelligence, BI-RADS, Breast Density, Pathological Outcomes, Digital Mammography, Multi-institutional Dataset

Content: BR, OI

RadLex: RID28749, RID39055, RID10357

Author(s)

Katherine P. Andriole

Robyn Ball

Yan Chen

Helen Frazer

Tatiana Kelil

Felipe Kitamura

Jackson Kwok

Matthew Lungren

Ritse Mann

John Mongan

Linda Moy

George Partridge

Hari Trivedi

Xin Wang

Luyan Yao

Tianyu Zhang

Organization(s)

Radiological Society of North America (RSNA)

BreastScreen Victoria

Emory University | Health Innovation and Translational Informatics (HITI) Lab

Version

1.0

License

Text: RSNA MIRA DATASET RESEARCH USE AGREEMENT

URL: https://docs.google.com/document/d/1r8_0yW-5XqxSqhFzFq2fV6L4NxIQ6drF0sBjXXJevXU/edit?tab=t.0

Contact

informatics@rsna.org

Ethical review

All data were de-identified in accordance with HIPAA guidelines, under institutional review board (IRB) approval.

Comments

This dataset was curated for the RSNA Screening Mammography Breast Cancer Detection Challenge. It is a large, multi-institutional dataset of high-resolution digital mammograms with comprehensive metadata and rigorous follow-up data, drawn from two diverse screening populations.

Date

Created: 2022-11-28

Dataset

Motivation

To accelerate the development and benchmarking of AI algorithms for breast cancer detection and to foster collaboration within the medical imaging research community.

Sampling

The dataset was intentionally enriched to a cancer prevalence of 4%, approximately five times the prevalence in the screening population.
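Because of this enrichment, model scores calibrated on this dataset will overstate the probability of cancer in an unenriched screening population. Taking the figures above at face value, the implied population prevalence is about 4% / 5 = 0.8%. The sketch below is a minimal illustration, not part of the dataset or its tooling: it assumes a model's outputs are calibrated to the enriched 4% prevalence and rescales them to the 0.8% prior with the standard Bayes prior-shift (odds-ratio) adjustment. The function name and default values are hypothetical.

```python
def adjust_for_prevalence(p: float, train_prev: float = 0.04, target_prev: float = 0.008) -> float:
    """Rescale a probability calibrated at train_prev to target_prev.

    Hypothetical helper: applies the Bayes prior-shift adjustment,
    new_odds = odds * (target prior odds / training prior odds).
    The defaults are the 4% enriched prevalence and the implied ~0.8%
    population prevalence (4% / 5); they are illustrative assumptions.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if p in (0.0, 1.0):
        return p  # certain predictions are unaffected by the prior
    odds = p / (1.0 - p)
    # Ratio of prior odds between the target population and the enriched data
    ratio = (target_prev / (1.0 - target_prev)) / (train_prev / (1.0 - train_prev))
    new_odds = odds * ratio
    return new_odds / (1.0 + new_odds)
```

For example, a score equal to the enriched prior (0.04) maps back to 0.008, and a score of 0.5 on the enriched scale corresponds to roughly 0.16 at 0.8% prevalence. The adjustment is only valid if the enrichment changed the class balance without otherwise altering the case mix.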

Partitioning scheme

The dataset was partitioned into a training set, a public test set, and a private test set for the challenge.

Missing information

The dataset excludes interval cancers (cancers diagnosed between scheduled screening examinations). It also contains no human-provided annotations that localize or delineate individual lesions within the mammograms.

Relationships between instances

Only one exam per patient is included in the dataset. Each exam contains four images (views), identified by the laterality and view columns of the metadata.
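Since each patient contributes a single exam, the metadata can be regrouped so that each patient maps to that exam's images keyed by (laterality, view), for example to assemble multi-view inputs. The sketch below assumes the metadata is read as rows of dicts (e.g. via csv.DictReader); laterality and view are the columns named above, while patient_id and image_id are assumed column names. Image IDs are stored in lists so that any repeated view is kept rather than silently overwritten.

```python
from collections import defaultdict


def group_views_by_patient(rows):
    """Map each patient to {(laterality, view): [image_id, ...]}.

    `rows` is an iterable of dicts, one per image. The keys "patient_id"
    and "image_id" are assumed column names; "laterality" and "view"
    are the columns named in the dataset description.
    """
    exams = defaultdict(lambda: defaultdict(list))
    for row in rows:
        key = (row["laterality"], row["view"])
        exams[row["patient_id"]][key].append(row["image_id"])
    # Convert the nested defaultdicts to plain dicts for the caller
    return {pid: dict(views) for pid, views in exams.items()}
```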

Noise

Exams with missing images, images with invalid or missing DICOM metadata, errant images, and exams whose BI-RADS assessments were inconsistent with pathology outcomes were removed during curation.
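Two of these exclusion rules, missing images and missing metadata, can be illustrated on tabular metadata. The sketch below is hypothetical and is not the curators' actual pipeline: it assumes the four expected views are the standard screening views (left and right craniocaudal, CC, and mediolateral oblique, MLO), and the list of required fields is an illustrative assumption. The BI-RADS/pathology consistency check and the handling of errant images are omitted because the exact rules are not described here.

```python
from collections import defaultdict

# Assumption: the four standard screening views (L/R x CC/MLO)
REQUIRED_VIEWS = {(lat, view) for lat in ("L", "R") for view in ("CC", "MLO")}
# Hypothetical list of metadata fields that must be non-empty for every image
REQUIRED_FIELDS = ("patient_id", "image_id", "laterality", "view")


def filter_complete_exams(rows):
    """Return only the rows of exams with complete metadata and all four views."""
    by_patient = defaultdict(list)
    for row in rows:
        by_patient[row.get("patient_id")].append(row)

    kept = []
    for pid, exam_rows in by_patient.items():
        if not pid:
            continue  # drop rows that cannot be assigned to a patient
        if any(not row.get(field) for row in exam_rows for field in REQUIRED_FIELDS):
            continue  # drop the whole exam: some image has missing metadata
        views = {(row["laterality"], row["view"]) for row in exam_rows}
        # Require exactly the four standard (laterality, view) pairs; an exam
        # missing a view or containing an unexpected view is dropped.
        # Repeated images of the same view are not rejected by this check.
        if views != REQUIRED_VIEWS:
            continue
        kept.extend(exam_rows)
    return kept
```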

External data

Notebooks and metadata are available on GitHub: https://github.com/RSNA/AI-Challenge-Data/wiki/

Confidentiality

All data were de-identified in accordance with HIPAA guidelines.

Re-identification

PHI-containing elements such as names, dates, accession numbers, patient IDs, and addresses were de-identified or removed prior to data collection.