RSNA Mammography Breast Cancer Detection (RSNA-SMBC) Dataset
2025-11-21
https://doi.org/10.1148/atlas.1763497273917
Overview
Schema Version
https://atlas.rsna.org/schemas/2025-11/dataset.json
Name
RSNA Mammography Breast Cancer Detection (RSNA-SMBC) Dataset
Link
https://mira.rsna.org/dataset/3
Indexing
Keywords: Screening Mammography, Breast Cancer Detection, Artificial Intelligence, BI-RADS, Breast Density, Pathological Outcomes, Digital Mammography, Multi-institutional Dataset
Content: BR, OI
RadLex: RID28749, RID39055, RID10357
Author(s)
Katherine P. Andriole
Robyn Ball
Yan Chen
Helen Frazer
Tatiana Kelil
Felipe Kitamura
Jackson Kwok
Matthew Lungren
Ritse Mann
John Mongan
Linda Moy
George Partridge
Hari Trivedi
Xin Wang
Luyan Yao
Tianyu Zhang
Organization(s)
Radiological Society of North America (RSNA)
BreastScreen Victoria
Emory University | Health Innovation and Translational Informatics (HITI) Lab
Version
1.0
License
Text: RSNA MIRA DATASET RESEARCH USE AGREEMENT
URL: https://docs.google.com/document/d/1r8_0yW-5XqxSqhFzFq2fV6L4NxIQ6drF0sBjXXJevXU/edit?tab=t.0
Contact
informatics@rsna.org
Ethical review
All data were de-identified in accordance with HIPAA guidelines and under institutional review board (IRB) approval.
Comments
This dataset was curated for the RSNA Screening Mammography Cancer Detection Challenge. It is a large, multi-institutional dataset with high-resolution digital mammograms, comprehensive metadata, and rigorous follow-up data from two diverse screening populations.
Date
Created: 2022-11-28
Dataset
Motivation
To accelerate the development and benchmarking of AI algorithms for breast cancer detection and to foster collaboration within the medical imaging research community.
Sampling
The dataset was intentionally enriched to a cancer prevalence of 4%, approximately five times the population prevalence.
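Enrichment matters when interpreting model metrics, because positive predictive value depends on prevalence. The sketch below works through the arithmetic: the 4% figure and the factor of five come from the statement above, while the sensitivity and specificity values are hypothetical placeholders chosen only for illustration, not results from this dataset.

```python
def implied_population_prevalence(enriched_prevalence: float, enrichment_factor: float) -> float:
    """Back out the approximate population prevalence from the enriched one."""
    return enriched_prevalence / enrichment_factor


def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Bayes' rule: P(cancer | positive result) at a given prevalence."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


ENRICHED_PREVALENCE = 0.04  # from the dataset description
ENRICHMENT_FACTOR = 5.0     # "approximately five times", so this is approximate

population_prevalence = implied_population_prevalence(ENRICHED_PREVALENCE, ENRICHMENT_FACTOR)

# Hypothetical operating point (assumption, not a reported result).
SENSITIVITY = 0.90
SPECIFICITY = 0.90

ppv_enriched = positive_predictive_value(SENSITIVITY, SPECIFICITY, ENRICHED_PREVALENCE)
ppv_population = positive_predictive_value(SENSITIVITY, SPECIFICITY, population_prevalence)

print(f"Implied population prevalence: {population_prevalence:.1%}")
print(f"PPV at enriched prevalence:    {ppv_enriched:.3f}")
print(f"PPV at population prevalence:  {ppv_population:.3f}")
```

Under these assumptions the implied population prevalence is roughly 0.8%, and the same classifier shows a lower positive predictive value at that prevalence than on the enriched dataset. Metrics measured on the enriched data should therefore be recalibrated before being read as screening-population performance.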
Partitioning scheme
The dataset was partitioned into a training set, a public test set, and a private test set for the challenge.
Missing information
The dataset excludes interval cancers. It does not include human-provided annotations that precisely localize or delineate specific lesions within the mammograms.
Relationships between instances
Only one exam per patient was included in the dataset. Each exam contains four images (views) as denoted by the laterality and view columns.
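The four-views-per-exam structure can be checked with a short script. This is a minimal sketch, not the curators' tooling: the laterality and view column names come from the description above, but the patient_id column name, the L/R and CC/MLO value encodings, and the inline sample rows are assumptions made for illustration. Because only one exam per patient is included, grouping by patient is equivalent to grouping by exam.

```python
import csv
import io
from collections import defaultdict

# Assumed encoding of the four standard screening views (hypothetical values).
EXPECTED_VIEWS = {("L", "CC"), ("L", "MLO"), ("R", "CC"), ("R", "MLO")}


def find_incomplete_exams(rows, id_column="patient_id"):
    """Return {patient id: views} for exams that lack exactly the four expected views."""
    views_by_patient = defaultdict(list)
    for row in rows:
        views_by_patient[row[id_column]].append((row["laterality"], row["view"]))
    return {
        patient: views
        for patient, views in views_by_patient.items()
        if len(views) != 4 or set(views) != EXPECTED_VIEWS
    }


# Made-up sample metadata: p001 is complete, p002 is missing both MLO views.
SAMPLE_CSV = """patient_id,laterality,view
p001,L,CC
p001,L,MLO
p001,R,CC
p001,R,MLO
p002,L,CC
p002,R,CC
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
incomplete = find_incomplete_exams(rows)
print(sorted(incomplete))  # prints ['p002']
```

To run the check on the real metadata, replace the inline sample with a csv.DictReader over the actual metadata file and adjust id_column and EXPECTED_VIEWS to match its column names and value encodings.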
Noise
Examinations with missing images, images with invalid or missing DICOM metadata, errant images, and exams with inconsistencies between BI-RADS assessments and pathology outcomes were removed during curation.
External data
Notebooks and metadata are available on GitHub: https://github.com/RSNA/AI-Challenge-Data/wiki/
Confidentiality
All data were de-identified in accordance with HIPAA guidelines.
Re-identification
PHI-containing elements such as names, dates, accession numbers, patient IDs, and addresses were de-identified or removed prior to data collection.