OPTIMAM Mammography Image Database: A Large-Scale Resource of Mammography Images and Clinical Data
dataset | 2025-11-29 | https://doi.org/10.1148/atlas.1764458089269

Overview

Schema Version

https://atlas.rsna.org/schemas/2025-11/dataset.json

Name

OPTIMAM Mammography Image Database: A Large-Scale Resource of Mammography Images and Clinical Data

Link

https://doi.org/10.1148/ryai.2020200103

Indexing

Keywords: Serial Screening Mammograms, Breast Cancer, Interval Cancers, Artificial Intelligence Algorithms, Breast Cancer Detection, Mammography, Clinical Data, Annotated Cancers, Quantitative Imaging, Breast Screening, Digital Mammography, Digital Breast Tomosynthesis, Magnetic Resonance Imaging
Content: BR, OI, BQ, PH, MR

Author(s)

Mark D. Halling-Brown
Lucy M. Warren
Dominic Ward
Emma Lewis
Alistair Mackenzie
Matthew G. Wallis
Louise S. Wilkinson
Rosalind M. Given-Wilson
Rita McAvinchey
Kenneth C. Young

Organization(s)

Royal Surrey NHS Foundation Trust
University of Surrey
Cambridge University Hospitals NHS Foundation Trust
NIHR Cambridge Biomedical Research Centre
Oxford University Hospitals NHS Foundation Trust
St George’s Healthcare NHS Trust
Jarvis Breast Screening Centre
Cancer Research UK

Version

May 2020 Snapshot

Contact

mhalling-brown@nhs.net

Funding

The creation and ongoing development of the OPTIMAM Image Database is funded by Cancer Research UK (C30682/A28396).

Ethical review

The project has approval from an ethical research committee specializing in research databases organized by the NHS Health Research Authority. A formal local agreement to contribute cases is also gained at each site.

Comments

The OPTIMAM Mammography Image Database (OMI-DB) was created to provide a centralized, fully annotated dataset for research. It includes processed and unprocessed mammography images from UK breast screening centers, with annotated cancers and clinical details. The database includes serial screening mammograms collected over a 10-year period with data from 172,282 women as of May 2020. It has been widely used to develop and evaluate artificial intelligence algorithms for breast cancer detection.

Date

Updated: 2020-05-01
Published: 2020-10-05

Dataset

Motivation

The database was created for Cancer Research UK–funded projects OPTIMAM and OPTIMAM2 to evaluate factors affecting breast cancer detection on mammograms. The objective was to collect mammograms for women with screen-detected cancers and representative samples of normal and benign screening cases.

Sampling

Images and clinical data were collected for all women screened during 2014, and for a random selection of 25% of all women screened in 2012, 2013, and 2015 at two of the three sites.
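
The year-based sampling rule above can be illustrated with a short Python sketch. This is not the OPTIMAM collection code: the function name, the input layout (a mapping from screening year to a list of woman identifiers), and the fixed random seed are assumptions made for illustration, and the sketch ignores the per-site restriction mentioned in the text.

```python
import random

# Sampling fractions by screening year, as stated in the Sampling field:
# everyone screened in 2014, a random 25% for 2012, 2013, and 2015.
SAMPLING_FRACTION = {2012: 0.25, 2013: 0.25, 2014: 1.0, 2015: 0.25}


def select_for_collection(women_by_year, seed=0):
    """Return {year: sorted list of selected identifiers}.

    `women_by_year` is a hypothetical input: {year: [identifier, ...]}.
    Years without a listed fraction contribute no one.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    selected = {}
    for year, ids in women_by_year.items():
        fraction = SAMPLING_FRACTION.get(year, 0.0)
        k = round(len(ids) * fraction)
        # Sample without replacement from a sorted copy for determinism.
        selected[year] = sorted(rng.sample(sorted(ids), k))
    return selected
```

For example, passing 100 identifiers for 2013 and 50 for 2014 would select 25 of the former and all 50 of the latter.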

Partitioning scheme

The database has sharing protocols that allow images to be used by researchers around the world. Data and images have been shared with more than 30 academic and commercial groups for various research aims, mainly to develop machine learning and artificial intelligence techniques.

Missing information

Twenty-seven screen-detected cancers and 14 interval cancers were excluded from tables due to missing or inconsistent information.

Relationships between instances

The database includes serial screening mammograms collected over a 10-year period, with data on current, previous, and subsequent screening episodes. Information on screening history, previous occurrences of cancer, biopsy results, and surgical procedures is also collected.

External data

The National Breast Screening System (NBSS) is queried for women’s data.

Confidentiality

All potentially identifiable information is removed from the images and data at the point of collection and is inaccessible to researchers. Imaging and screening data are pseudonymized, and the records are inserted into lookup tables.
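
As a rough illustration of pseudonymization backed by a lookup table, the Python sketch below replaces an identifier with a random token and drops directly identifying fields. The record does not describe the actual OPTIMAM mechanism, so everything here is an assumption: the class and function names, the field names (`patient_id`, `name`, `date_of_birth`), and the use of random hexadecimal tokens are all hypothetical.

```python
import secrets


class PseudonymLookup:
    """Hypothetical lookup table mapping real identifiers to pseudonyms.

    In this sketch the table would stay at the collection site and would
    never be shared with researchers.
    """

    def __init__(self):
        self._table = {}

    def pseudonym_for(self, identifier):
        # Reuse the existing pseudonym so serial records stay linked.
        if identifier not in self._table:
            self._table[identifier] = secrets.token_hex(8)
        return self._table[identifier]


def pseudonymize(record, lookup, id_field="patient_id",
                 drop_fields=("name", "date_of_birth")):
    """Return a copy of `record` without identifying fields (assumed names)."""
    out = {k: v for k, v in record.items()
           if k != id_field and k not in drop_fields}
    out["pseudo_id"] = lookup.pseudonym_for(record[id_field])
    return out
```

Because the same identifier always maps to the same pseudonym, records from different screening episodes of one woman remain linkable without exposing who she is.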

Re-identification

All potentially identifiable information is removed from the images and data at the point of collection and is inaccessible to researchers. Imaging and screening data are pseudonymized, and the records are inserted into lookup tables.