Key predictive factors of breast cancer based on race using machine learning models.
APA
Yin S, Nanda G, Sundararajan R (2026). Key predictive factors of breast cancer based on race using machine learning models. Annals of Epidemiology, 119, 110080. https://doi.org/10.1016/j.annepidem.2026.110080
MLA
Yin S, et al. "Key predictive factors of breast cancer based on race using machine learning models." Annals of Epidemiology, vol. 119, 2026, p. 110080.
PMID
41912057
Abstract
[PURPOSE] This study uses machine learning (ML) and explainable AI to investigate key factors influencing breast cancer risk, a major global health issue, and how those factors differ by race.
[METHODS] We used Breast Cancer Surveillance Consortium (BCSC) data, originally comprising 1.5 million unique combination records from 6.7 million mammograms collected between 2005 and 2017. Naïve Bayes, Logistic Regression, and Extreme Gradient Boosting models were applied to identify key predictors. Variable importance and SHapley Additive exPlanations (SHAP) values were used to interpret the models and identify the most predictive factors. Analyses were stratified by six racial groups.
[RESULTS] History of biopsy (50.04%) and age group (25.85%) were the strongest predictors across all models and races. Menopausal status, breast density, and age at first childbirth were also important. White women had the highest overall incidence (9.02 per 100,000), particularly at age 65+ (18.13 per 100,000), while Black women had higher rates in younger age groups (7.1 per 100,000 at age 18-29). Native American women showed higher rates in certain older age groups, whereas Asian/Pacific Islander and Other/Mixed groups had generally lower rates.
[CONCLUSIONS] ML and explainable AI applied to BCSC data identified key predictors and highlighted racial disparities among the most predictive factors for breast cancer risk.
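The interpretation step described in the methods (SHAP values summarized per racial group) can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a linear model with independent features, for which the SHAP value of feature j reduces to w_j * (x_j - mean_j), and it uses hypothetical coefficients and synthetic records rather than fitted models or BCSC data. The function names and the percent-share summary are illustrative assumptions; the abstract does not state how its percentages were derived.

```python
from collections import defaultdict

def linear_shap(weights, x, means):
    """SHAP values for a linear model with independent features:
    phi_j = w_j * (x_j - mean_j)."""
    return [w * (xi - m) for w, xi, m in zip(weights, x, means)]

def importance_shares(weights, rows):
    """Percent share of mean |SHAP| per feature over the given rows."""
    n, k = len(rows), len(weights)
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    totals = [0.0] * k
    for r in rows:
        for j, phi in enumerate(linear_shap(weights, r, means)):
            totals[j] += abs(phi)
    grand = sum(totals)
    if grand == 0:
        return [0.0] * k
    return [100.0 * t / grand for t in totals]

def stratified_shares(weights, records):
    """Group (stratum_label, features) records and summarize each stratum."""
    groups = defaultdict(list)
    for label, features in records:
        groups[label].append(features)
    return {label: importance_shares(weights, rows)
            for label, rows in groups.items()}

# Hypothetical coefficients for [biopsy_history, age_group]; not fitted values.
WEIGHTS = [2.0, 1.0]

# Synthetic records, labeled by a placeholder stratum; not BCSC data.
RECORDS = [
    ("group_a", [1, 0]),
    ("group_a", [0, 0]),
    ("group_b", [1, 1]),
    ("group_b", [0, 0]),
]

if __name__ == "__main__":
    for label, shares in stratified_shares(WEIGHTS, RECORDS).items():
        print(label, [round(s, 2) for s in shares])
```

In a closer approximation of the study, the weights would be refit within each stratum and a tree-based explainer would replace the linear shortcut for Extreme Gradient Boosting; the sketch only shows how per-feature attributions can be aggregated into comparable shares per group.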
Highly cited papers by the same first author (5)
- Tetraspanin 13 Enhances Immune Evasion in Breast Cancer by Promoting MHC-I Degradation.
- CFMF: A Clustering-Free Cell Marker Finder for Single-Cell Transcriptomic Data.
- Bone Metastasis Mediates Poor Prognosis in Early-Onset Gastric Cancer: Insights Into Immune Suppression, Coagulopathy, and Inflammation.
- Rapid visual detection of Helicobacter pylori and vacA subtypes by Dual-Target RAA-LFD assay.
- Suppression of LIF in tumor-associated macrophages contributing to the PD-1/PD-L1 blockade in hepatocellular carcinoma.