Malignant vs. Non-malignant Annotations on TCGA Breast Cancer Whole Slide Images for AI Analysis.
- Sample size (n): 1,882 (malignant annotated regions)
APA
Shahraki ZA, Jokelainen O, et al. (2026). Malignant vs. Non-malignant Annotations on TCGA Breast Cancer Whole Slide Images for AI Analysis. Scientific Data. https://doi.org/10.1038/s41597-026-07106-5
MLA
Shahraki ZA, et al. "Malignant vs. Non-malignant Annotations on TCGA Breast Cancer Whole Slide Images for AI Analysis." Scientific Data, 2026.
PMID
41922381
Abstract
Identification of malignant and non-malignant regions in breast cancer whole slide images (WSIs) is essential for understanding tumor heterogeneity and for histopathological evaluation. Annotated data allow deep learning models to learn features and analyze histological structures. In this study, we obtained 50 breast cancer WSIs from The Cancer Genome Atlas (TCGA) and had an expert pathologist manually annotate malignant (n = 1,882) and non-malignant (n = 374) regions using QuPath. A second pathologist independently reviewed the annotated regions, achieving an inter-observer agreement of 99.95%. In addition, to assess annotation quality, we trained a hybrid contrastive-supervised machine learning pipeline for patch-level malignant vs. non-malignant classification. The model achieved a high F1-score of 0.90, indicating that our annotations are comparable in quality to those in public datasets. The proposed dataset provides expert-quality annotations and a unique resource for benchmarking and developing AI models for breast cancer histopathology.
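For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how a binary F1-score and a simple percent agreement could be computed from patch-level labels. It is not the authors' pipeline: the function names, the label encoding (1 = malignant, 0 = non-malignant), and the toy labels are illustrative assumptions, and the abstract does not state whether agreement was measured per region, per patch, or per pixel, or how the F1-score was averaged.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the class `positive` (assumed here: 1 = malignant)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


def percent_agreement(labels_a, labels_b):
    """Share of items (in percent) on which two annotators give the same label."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(labels_a, labels_b) if a == b)
    return 100.0 * matches / len(labels_a)


# Toy, made-up labels for six patches (not data from the paper).
reference = [1, 1, 1, 1, 0, 0]
predicted = [1, 1, 1, 0, 0, 1]

f1 = f1_score(reference, predicted)
agreement = percent_agreement(reference, predicted)
```

In this toy example there are three true positives, one false positive, and one false negative, so precision and recall are both 0.75 and the F1-score is 0.75; the two label lists match on four of six patches. Percent agreement does not correct for chance agreement, so it is usually read alongside class balance.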