Malignant vs. Non-malignant Annotations on TCGA Breast Cancer Whole Slide Images for AI Analysis.

Scientific Data, 2026

Shahraki ZA, Jokelainen O, Valkonen M, Auvinen P, Mannermaa A, Behravan H

📝 One-line summary for patients

Identification of malignant and non-malignant regions in breast cancer whole slide images (WSIs) is essential for understanding tumor heterogeneity and histopathological evaluation.

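🔬 Key statistics (automatically extracted from the abstract; verification against the original paper is recommended)
  • Sample size (n): 1,882 (the number of annotated malignant regions reported in the abstract, not a patient count)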

Cite this paper

APA Shahraki ZA, Jokelainen O, et al. (2026). Malignant vs. Non-malignant Annotations on TCGA Breast Cancer Whole Slide Images for AI Analysis. Scientific Data. https://doi.org/10.1038/s41597-026-07106-5
MLA Shahraki ZA, et al. "Malignant vs. Non-malignant Annotations on TCGA Breast Cancer Whole Slide Images for AI Analysis." Scientific Data, 2026.
PMID 41922381

Abstract

Identification of malignant and non-malignant regions in breast cancer whole slide images (WSIs) is essential for understanding tumor heterogeneity and for histopathological evaluation. Annotated data allows deep learning models to learn features and analyze histological structures. In this study, we obtained 50 breast cancer WSIs from The Cancer Genome Atlas (TCGA) and had an expert pathologist manually annotate malignant (n = 1,882) and non-malignant (n = 374) regions using QuPath. The annotated regions were independently reviewed by a second pathologist, achieving an inter-observer agreement of 99.95%. In addition, to assess annotation quality, we trained a hybrid contrastive-supervised machine learning pipeline for patch-level malignant vs. non-malignant classification. The model achieved a high F1-score of 0.90, indicating that our annotations are comparable in quality to those presented in public datasets. The proposed dataset provides expert-quality annotations and a unique resource for benchmarking and developing AI models for breast cancer histopathology.
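
For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how a patch-level F1-score and a simple percent agreement between two annotators can be computed. The function names and the toy label lists are hypothetical and purely illustrative; they are not the authors' pipeline or data, and the abstract does not state how the 99.95% agreement was measured (for example, per region or per pixel), so the agreement function here assumes a plain item-by-item comparison.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def percent_agreement(labels_a, labels_b):
    """Share of items (in percent) on which two annotators give the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(labels_a, labels_b) if a == b)
    return 100.0 * matches / len(labels_a)


# Toy labels (assumption, not from the paper): 1 = malignant, 0 = non-malignant.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
toy_f1 = f1_score(y_true, y_pred)

annotator_a = [1, 1, 0, 0]
annotator_b = [1, 1, 0, 1]
toy_agreement = percent_agreement(annotator_a, annotator_b)
```

Because the dataset described above is imbalanced (1,882 malignant versus 374 non-malignant regions), F1 on a chosen positive class is generally more informative than raw accuracy, which is one plausible reason for reporting it.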