Overview

Schema Version

https://atlas.rsna.org/schemas/2025-11/dataset.json

Name

OPTIMAM mammography subset for AI breast cancer risk prediction (Ellis et al., 2024)

Link

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11294956/

Indexing

Keywords: Breast cancer risk prediction, Screening mammography, Interval cancer, Screen-detected cancer, Mammographic density, UK NHS Breast Screening Programme, OPTIMAM

Content: BR

RadLex: RID10357

SNOMED: 254837009

Author(s)

Sam Ellis

Sandra Gomes

Matthew Trumble

Mark D. Halling-Brown

Kenneth C. Young

Nouman S. Chaudhry

Peter Harris

Lucy M. Warren

Organization(s)

Royal Surrey NHS Foundation Trust

University of Surrey

Contact

Sam Ellis (corresponding author): sam.ellis2@nhs.net

Funding

Creation and maintenance of the OPTIMAM Image Database were funded by Cancer Research UK (C30682/A28396). S.E. was supported by the Million Women Study (C16077/A29186).

Ethical review

Data were collected with approval from an ethical research committee specializing in research databases organized by the NHS Health Research Authority.

Comments

Retrospective study using a curated subset of the UK OPTIMAM Mammography Image Database to train and evaluate a deep learning model predicting 3-year breast cancer risk from negative screening mammograms.

Date

Published: 2024-05-22

References

[1] Ellis S, Gomes S, Trumble M, et al. "Deep Learning for Breast Cancer Risk Prediction: Application to a Large Representative UK Screening Cohort". Radiology: Artificial Intelligence. 2024-07-01. doi:10.1148/ryai.230431. PMID: 38775671. PMCID: PMC11294956.

[2] Halling-Brown MD, Warren LM, Ward D, et al. "OPTIMAM mammography image database: a large-scale resource of mammography images and clinical data". Radiology: Artificial Intelligence. 2020-01-01. PMID: 33937853. PMCID: PMC8082293. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8082293/

Dataset

Motivation

Develop and evaluate a UK-specific AI model to predict 3-year future breast cancer risk from negative screening mammograms.

Sampling

Random sampling was applied to a pool of 5264 risk-positive and 191,488 risk-negative women. Risk-negative cases in the training split were reduced by 50% to increase cancer prevalence. Where a risk-negative woman had multiple eligible episodes, one episode was selected at random.
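The two sampling steps above can be illustrated with a minimal sketch. It is not the authors' code: the `(woman_id, episode_id)` record layout, the function names, and the fixed seed are all assumptions made for illustration, and only the Python standard library is used.

```python
import random


def select_one_episode_per_woman(episodes, seed=0):
    """Pick one eligible episode at random for each woman.

    `episodes` is an iterable of (woman_id, episode_id) pairs.
    This layout is hypothetical, not the OPTIMAM schema.
    """
    rng = random.Random(seed)
    by_woman = {}
    for woman_id, episode_id in episodes:
        by_woman.setdefault(woman_id, []).append(episode_id)
    # Sort so the result is reproducible for a given seed.
    return {w: rng.choice(sorted(eps)) for w, eps in sorted(by_woman.items())}


def downsample_negatives(negative_ids, fraction=0.5, seed=0):
    """Keep a random `fraction` of the risk-negative training cases."""
    rng = random.Random(seed)
    ids = sorted(negative_ids)
    n_keep = round(len(ids) * fraction)
    return set(rng.sample(ids, n_keep))
```

Halving the negatives while keeping every positive raises the cancer prevalence seen during training. In this sketch `downsample_negatives` would be called on the training split only, since the source states that the test set was left unmodified.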

Partitioning scheme

Stratified 60:20:20 split into training, validation, and test to preserve cancer prevalence. Validation balanced by cancer outcome at the patient level. Test set left unmodified to reflect OPTIMAM demographics.
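A stratified 60:20:20 split of this kind can be sketched as follows. This is an illustrative sketch rather than the published pipeline: the `labels` mapping (woman ID to a 0/1 risk outcome), the function name, and the seed are hypothetical, and the subsequent balancing of the validation set is not shown.

```python
import random


def stratified_split(labels, ratios=(0.6, 0.2, 0.2), seed=0):
    """Split women into train/val/test while preserving outcome prevalence.

    `labels` maps woman_id -> outcome (1 = risk-positive, 0 = risk-negative).
    Splitting is done per woman, so no woman appears in two splits.
    """
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for outcome in sorted(set(labels.values())):
        # Shuffle each outcome group separately, then cut it 60:20:20.
        ids = sorted(w for w, y in labels.items() if y == outcome)
        rng.shuffle(ids)
        n_train = round(len(ids) * ratios[0])
        n_val = round(len(ids) * ratios[1])
        splits["train"] += ids[:n_train]
        splits["val"] += ids[n_train:n_train + n_val]
        splits["test"] += ids[n_train + n_val:]  # remainder goes to test
    return splits
```

Cutting each outcome group separately is what keeps the proportion of risk-positive women approximately equal across the three partitions.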

Missing information

Only images from Hologic mammography systems were included; other manufacturers were underrepresented and their images were excluded. Examinations with confirmed cancer and images containing implants were also excluded, as were screening episodes that lacked follow-up.

Relationships between instances

One screening episode per woman; two views per breast: craniocaudal (CC) and mediolateral oblique (MLO). In the training split, images of healthy contralateral breasts of risk-positive patients were removed.
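The training-only removal of contralateral images can be expressed as a simple filter. The sketch below assumes a hypothetical per-image record with a `future_cancer_side` field naming the breast in which cancer was later diagnosed (`None` for risk-negative women). Neither the field names nor the handling of laterality come from the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MammogramImage:
    # Hypothetical record layout, for illustration only.
    woman_id: str
    laterality: str                    # "L" or "R"
    view: str                          # "CC" or "MLO"
    split: str                         # "train", "val", or "test"
    future_cancer_side: Optional[str]  # "L", "R", or None if risk-negative


def keep_image(img: MammogramImage) -> bool:
    """Return False only for healthy contralateral training images."""
    if img.split != "train" or img.future_cancer_side is None:
        return True  # validation/test images and risk-negative women are untouched
    return img.laterality == img.future_cancer_side


def filter_images(images):
    """Apply the contralateral-removal rule to a list of images."""
    return [img for img in images if keep_image(img)]
```

Under these assumptions a risk-positive woman in the training split keeps only the two views (CC and MLO) of the affected breast, while all four images are kept for every other woman.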

External data

Data derived from the OPTIMAM Mammography Image Database collected at multiple UK NHSBSP sites.