RSNA Mammography Breast Cancer Detection (RSNA-SMBC) Dataset
dataset 2025-11-21 https://doi.org/10.1148/atlas.1763497273917

Overview

Schema Version

https://atlas.rsna.org/schemas/2025-11/dataset.json

Name

RSNA Mammography Breast Cancer Detection (RSNA-SMBC) Dataset

Link

https://mira.rsna.org/dataset/3

Indexing

Keywords: Screening Mammography, Breast Cancer Detection, Artificial Intelligence, BI-RADS, Breast Density, Pathological Outcomes, Digital Mammography, Multi-institutional Dataset
Content: BR, OI
RadLex: RID28749, RID39055, RID10357

Author(s)

Katherine P. Andriole
Robyn Ball
Yan Chen
Helen Frazer
Tatiana Kelil
Felipe Kitamura
Jackson Kwok
Matthew Lungren
Ritse Mann
John Mongan
Linda Moy
George Partridge
Hari Trivedi
Xin Wang
Luyan Yao
Tianyu Zhang

Organization(s)

Radiological Society of North America (RSNA)
BreastScreen Victoria
Emory University | Health Innovation and Translational Informatics (HITI) Lab

Version

1.0

License

Text: RSNA MIRA DATASET RESEARCH USE AGREEMENT
URL: https://docs.google.com/document/d/1r8_0yW-5XqxSqhFzFq2fV6L4NxIQ6drF0sBjXXJevXU/edit?tab=t.0

Contact

informatics@rsna.org

Ethical review

All data were de-identified in accordance with HIPAA guidelines and collected under institutional review board (IRB) approval.

Comments

This dataset was curated for the RSNA Screening Mammography Cancer Detection Challenge. It is a large, multi-institutional dataset with high-resolution digital mammograms, comprehensive metadata, and rigorous follow-up data from two diverse screening populations.

Date

Created: 2022-11-28

Dataset

Motivation

To accelerate the development and benchmarking of AI algorithms for breast cancer detection and to foster collaboration within the medical imaging research community.

Sampling

The dataset was intentionally enriched to 4% cancer prevalence, approximately five times higher than the population prevalence.
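As a rough check of the figures above, the population prevalence they imply can be worked out directly. The sketch below assumes "five times higher" means a ratio of 5 between the enriched and population prevalence; the cohort size of 10,000 is a hypothetical number for illustration, not the size of this dataset.

```python
# Enriched prevalence stated in the dataset description.
enriched_prevalence = 0.04

# Assumption: "approximately five times higher" is read as a ratio of 5.
enrichment_factor = 5

# Population prevalence implied by the two figures above.
implied_population_prevalence = enriched_prevalence / enrichment_factor


def expected_cancer_exams(n_exams: int, prevalence: float) -> float:
    """Expected number of cancer-positive exams in a cohort of n_exams."""
    return n_exams * prevalence


# Hypothetical cohort size, used only to illustrate the difference.
hypothetical_cohort = 10_000
enriched_count = expected_cancer_exams(hypothetical_cohort, enriched_prevalence)
population_count = expected_cancer_exams(
    hypothetical_cohort, implied_population_prevalence
)

print(f"Implied population prevalence: {implied_population_prevalence:.1%}")
print(f"Expected cancers at enriched prevalence: {enriched_count:.0f}")
print(f"Expected cancers at population prevalence: {population_count:.0f}")
```

Under these assumptions the implied population prevalence is about 0.8%, so metrics that depend on prevalence (for example, positive predictive value) measured on this dataset may not transfer directly to an unenriched screening population.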

Partitioning scheme

The dataset was partitioned into a training set, a public test set, and a private test set for the challenge.

Missing information

The dataset excludes interval cancers. It does not include human-provided annotations that precisely localize or delineate specific lesions within the mammograms.

Relationships between instances

Only one exam per patient was included in the dataset. Each exam contains four images (views) as denoted by the laterality and view columns.
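The structure described above can be verified against the metadata before training. The following is a minimal sketch: the `laterality` and `view` column names come from the description, while the `patient_id` column name and the value codes (`L`/`R`, `CC`/`MLO`) are assumptions based on standard screening mammography and may differ in the actual files.

```python
from collections import defaultdict

# Assumed codes for the four standard screening views; adjust to the metadata.
EXPECTED_VIEWS = {("L", "CC"), ("L", "MLO"), ("R", "CC"), ("R", "MLO")}


def find_incomplete_exams(rows, id_column="patient_id"):
    """Return {patient id: list of (laterality, view)} for every exam that
    does not consist of exactly the four expected views.

    `rows` is an iterable of dicts, e.g. as produced by csv.DictReader.
    `id_column` is a hypothetical column name identifying the patient.
    """
    views_by_patient = defaultdict(list)
    for row in rows:
        views_by_patient[row[id_column]].append((row["laterality"], row["view"]))
    return {
        patient: views
        for patient, views in views_by_patient.items()
        if len(views) != 4 or set(views) != EXPECTED_VIEWS
    }


# Small made-up sample: p1 has all four views, p2 has only two.
sample_rows = [
    {"patient_id": "p1", "laterality": "L", "view": "CC"},
    {"patient_id": "p1", "laterality": "L", "view": "MLO"},
    {"patient_id": "p1", "laterality": "R", "view": "CC"},
    {"patient_id": "p1", "laterality": "R", "view": "MLO"},
    {"patient_id": "p2", "laterality": "L", "view": "CC"},
    {"patient_id": "p2", "laterality": "R", "view": "CC"},
]

incomplete = find_incomplete_exams(sample_rows)
print(incomplete)
```

With real metadata, the rows could be read with `csv.DictReader` from the metadata file; any patients returned by the function would merit a closer look before they are used.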

Noise

Examinations with missing images, images with invalid or missing DICOM metadata, errant images, and exams with inconsistencies between BI-RADS assessments and pathology outcomes were removed during curation.

External data

Notebooks and metadata are available on GitHub: https://github.com/RSNA/AI-Challenge-Data/wiki/

Confidentiality

All data were de-identified in accordance with HIPAA guidelines.

Re-identification

Elements containing protected health information (PHI), such as names, dates, accession numbers, patient IDs, and addresses, were de-identified or removed prior to data collection.