OPTIMAM Mammography Image Database: A Large-Scale Resource of Mammography Images and Clinical Data
2025-11-29
https://doi.org/10.1148/atlas.1764458089269
Overview
Schema Version
https://atlas.rsna.org/schemas/2025-11/dataset.json
Name
OPTIMAM Mammography Image Database: A Large-Scale Resource of Mammography Images and Clinical Data
Link
https://doi.org/10.1148/ryai.2020200103
Indexing
Keywords: Serial Screening Mammograms, Breast Cancer, Interval Cancers, Artificial Intelligence Algorithms, Breast Cancer Detection, Mammography, Clinical Data, Annotated Cancers, Quantitative Imaging, Breast Screening, Digital Mammography, Digital Breast Tomosynthesis, Magnetic Resonance Imaging
Content: BR, OI, BQ, PH, MR
Author(s)
Mark D. Halling-Brown
Lucy M. Warren
Dominic Ward
Emma Lewis
Alistair Mackenzie
Matthew G. Wallis
Louise S. Wilkinson
Rosalind M. Given-Wilson
Rita McAvinchey
Kenneth C. Young
Organization(s)
Royal Surrey NHS Foundation Trust
University of Surrey
Cambridge University Hospitals NHS Foundation Trust
NIHR Cambridge Biomedical Research Centre
Oxford University Hospitals NHS Foundation Trust
St George’s Healthcare NHS Trust
Jarvis Breast Screening Centre
Cancer Research UK
Version
May 2020 Snapshot
Contact
mhalling-brown@nhs.net
Funding
The creation and ongoing development of the OPTIMAM Image Database is funded by Cancer Research UK (C30682/A28396).
Ethical review
The project has approval from an ethical research committee specializing in research databases organized by the NHS Health Research Authority. A formal local agreement to contribute cases is also obtained at each site.
Comments
The OPTIMAM Mammography Image Database (OMI-DB) was created to provide a centralized, fully annotated dataset for research. It includes processed and unprocessed mammography images from UK breast screening centers, with annotated cancers and clinical details. The database includes serial screening mammograms collected over a 10-year period with data from 172,282 women as of May 2020. It has been widely used to develop and evaluate artificial intelligence algorithms for breast cancer detection.
Date
Updated: 2020-05-01
Published: 2020-10-05
Dataset
Motivation
The database was created for Cancer Research UK–funded projects OPTIMAM and OPTIMAM2 to evaluate factors affecting breast cancer detection on mammograms. The objective was to collect mammograms for women with screen-detected cancers and representative samples of normal and benign screening cases.
Sampling
Images and clinical data were collected for all women screened during 2014 and for a random selection of 25% of all women screened in 2012, 2013, and 2015 at two of the three sites.
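The year-based sampling described above can be sketched as follows. This is a minimal illustration, not the project's actual selection procedure: the record layout, the function name `select_for_collection`, and the fixed seed are assumptions, and the restriction to two of the three sites is left out for brevity.

```python
import random

# Sampling fractions by screening year, as stated in the text:
# all women screened in 2014, a random 25% for 2012, 2013, and 2015.
SAMPLING_FRACTION = {2012: 0.25, 2013: 0.25, 2014: 1.0, 2015: 0.25}


def select_for_collection(records, seed=0):
    """Return the (woman_id, screening_year) records chosen for collection.

    `records` is an iterable of (woman_id, screening_year) tuples. Both the
    layout and this helper are hypothetical, for illustration only.
    """
    rng = random.Random(seed)

    # Group women by the year in which they were screened.
    by_year = {}
    for woman_id, year in records:
        by_year.setdefault(year, []).append(woman_id)

    selected = []
    for year, ids in sorted(by_year.items()):
        # Years not listed in SAMPLING_FRACTION are not sampled at all.
        fraction = SAMPLING_FRACTION.get(year, 0.0)
        k = round(len(ids) * fraction)
        # Draw k women without replacement from this year's group.
        chosen = rng.sample(ids, k)
        selected.extend((woman_id, year) for woman_id in chosen)
    return selected
```

Sampling within each year separately keeps the 25% fraction exact per year rather than only on average across years.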
Partitioning scheme
The database has sharing protocols that allow images to be used by researchers around the world. Data and images have been shared with more than 30 academic and commercial groups for various research aims, mainly to develop machine learning and artificial intelligence techniques.
Missing information
Twenty-seven screen-detected cancers and 14 interval cancers were excluded from tables due to missing or inconsistent information.
Relationships between instances
The database includes serial screening mammograms collected over a 10-year period, with data on current, previous, and subsequent screening episodes. Information on screening history, previous occurrences of cancer, biopsy results, and surgical procedures is also collected.
External data
The National Breast Screening System (NBSS) is queried for women’s data.
Confidentiality
All potentially identifiable information is removed from the images and data at the point of collection and is inaccessible to researchers. Imaging and screening data are pseudonymized and records inserted into lookup tables.
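The source does not describe how pseudonymization is implemented. As a generic sketch of the idea of removing identifiers and recording the mapping in a lookup table, one might write something like the following. The class `PseudonymLookup`, the field names `patient_id`, `name`, and `date_of_birth`, and the `P-` prefix are all hypothetical.

```python
import secrets


class PseudonymLookup:
    """In-memory stand-in for a lookup table mapping real IDs to pseudonyms.

    A real system would keep this table in secured storage that researchers
    cannot access; this class is only an illustration.
    """

    def __init__(self):
        self._table = {}

    def pseudonym_for(self, real_id):
        # Reuse an existing pseudonym so that serial screening episodes
        # for the same woman stay linked after pseudonymization.
        if real_id not in self._table:
            self._table[real_id] = "P-" + secrets.token_hex(8)
        return self._table[real_id]


def pseudonymize(record, lookup, identifying_fields=("name", "date_of_birth")):
    """Return a copy of `record` with identifiers removed and a pseudonym added."""
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in identifying_fields and key != "patient_id"
    }
    cleaned["pseudo_id"] = lookup.pseudonym_for(record["patient_id"])
    return cleaned
```

Because the pseudonym is random rather than derived from the identifier, it cannot be reversed without access to the lookup table itself.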
Re-identification
All potentially identifiable information is removed from the images and data at the point of collection and is inaccessible to researchers. Imaging and screening data are pseudonymized and records inserted into lookup tables.