Mammogram-based AI risk assessment in patients with dense breasts undergoing supplemental molecular breast imaging.
APA
Ogunlade SB, Wang L, et al. (2026). Mammogram-based AI risk assessment in patients with dense breasts undergoing supplemental molecular breast imaging. Quantitative Imaging in Medicine and Surgery, 16(2), 123. https://doi.org/10.21037/qims-2025-1650
MLA
Ogunlade SB, et al. "Mammogram-based AI risk assessment in patients with dense breasts undergoing supplemental molecular breast imaging." Quantitative Imaging in Medicine and Surgery, vol. 16, no. 2, 2026, pp. 123.
PMID
41669457
Abstract
[BACKGROUND] Image-based artificial intelligence (AI) risk models can estimate short-term breast cancer risk directly from mammograms and may outperform traditional questionnaire-based tools. However, risk stratification remains particularly challenging in women with dense breasts who do not otherwise meet high-risk criteria. At our institutions, molecular breast imaging (MBI) is used as supplemental screening for this population. This study evaluated the performance and clinical utility of a mammography-based AI risk model (iCAD ProFound AI Risk) in predicting short-term breast cancer risk among women with dense breasts undergoing MBI.
[METHODS] This retrospective IRB-approved study included 416 non-actionable (BI-RADS category 1 or 2) screening digital breast tomosynthesis mammograms (BI-RADS C-D density) obtained from 2018 to 2023, all followed by MBI within one year. The cohort comprised 70 cancer cases (16.8%) and 346 (83.2%) non-cancer controls. Mammograms were retrospectively processed using the ProFound AI Risk model to generate 1-year risk and density scores. Tyrer-Cuzick and Gail model scores were computed for comparison. Group differences were assessed using t-tests and effect sizes, and model discrimination was evaluated with ROC analysis using area under the curve (AUC), sensitivity, specificity, and 95% confidence intervals (CIs).
[RESULTS] Across the full cohort, mean AI risk scores were higher in cancer cases than controls (0.41±0.35 vs. 0.37±0.21), although this difference was not statistically significant (P=0.239; Cohen's d=0.23). Subgroup analyses demonstrated progressively stronger discriminatory performance with increasing breast density. The greatest separation was observed in women with extremely dense breasts (category D), where the AI model achieved an AUC of 0.75 (95% CI: 0.61-0.89; P=0.049), with 69.3% sensitivity and 61.1% specificity at a threshold of 0.14. Effect size in this group was the largest (d=0.41). In contrast, traditional models showed limited and non-significant discrimination across all density categories, with AUC values ranging from 0.54 to 0.63. When stratified by cancer subtype, the AI model produced significantly higher risk scores in invasive lobular carcinoma (ILC) compared with controls (0.69±0.46 vs. 0.41±0.32; P=0.048; d=0.56). Although differences in ductal carcinoma in situ (DCIS) and invasive ductal carcinoma (IDC) were not significant, risk scores trended higher for cancer cases. A similar pattern of increasing AI-estimated risk was observed with higher tumor grade, with the strongest separation seen in grade 2 cancers (P=0.089).
[CONCLUSIONS] Although overall differences between cancer and non-cancer groups were not statistically significant, the mammography-based AI risk model demonstrated meaningful and statistically significant discrimination in women with extremely dense breasts, outperforming both Tyrer-Cuzick and Gail models. The AI model also showed better separation in ILC and in higher-grade tumors. These findings support the role of image-based AI tools in refining risk assessment in women for whom mammography is least effective and in guiding more targeted use of supplemental MBI screening.
Introduction
Breast cancer remains a leading cause of morbidity and mortality among women worldwide, with an estimated 2.3 million new cases and 685,000 deaths globally in 2020 alone (1). Early detection is central to reducing mortality, and mammography has long served as the cornerstone of population-level breast cancer screening. However, the sensitivity of mammography is significantly reduced in women with dense breast tissue, which not only obscures lesions but is also an independent risk factor for breast cancer (2-4). Dense breast tissue, classified by the Breast Imaging Reporting and Data System (BI-RADS) as heterogeneously or extremely dense, is present in approximately 43.3% of women aged 40 to 74 years undergoing screening mammography, with the proportion inversely associated with age and body mass index (BMI), and corresponding to an estimated 27.6 million women in the United States (5).
To address the limitations of mammography in women with dense breasts, supplemental imaging modalities such as breast ultrasound, magnetic resonance imaging (MRI), and molecular breast imaging (MBI) have been utilized. While these modalities can improve cancer detection rates, they also present challenges, including higher false-positive rates, increased healthcare costs, and patient anxiety (6,7). As a result, refining selection criteria for supplemental imaging remains a clinical priority. Triaging women with dense breast tissue without additional high-risk features, such as family history or genetic mutations, continues to pose a significant clinical challenge.
MBI is available at our institution as a supplemental screening option for women with dense breasts who do not otherwise meet high-risk criteria. The use of artificial intelligence (AI)-based risk models in this context could help refine patient selection for MBI, identifying those who may benefit most from supplemental screening based on intrinsic image-derived risk signatures rather than questionnaire data alone. Current risk stratification models, such as the Gail and Tyrer-Cuzick (TC) models, incorporate demographic and clinical risk factors, including breast density. However, these tools often underperform in diverse clinical populations and may not fully capture imaging-based risk indicators (8,9). Recent advances in AI offer a new approach by using image-based AI risk models that analyze mammograms directly to assess short-term breast cancer risk. These AI models have demonstrated superior performance compared with traditional questionnaire-based models, particularly in predicting interval and near-term cancers in high-risk patients (10,11).
In this context, the iCAD ProFound AI® Risk model—a deep convolutional neural network (CNN) trained on over 13,000 screening mammograms from multi-site international cohorts—provides a 1-year absolute risk estimate directly from mammographic data for high-risk patients. Importantly, none of the patient data from the three participating centers in this study was included in the AI model’s training or validation datasets; thus, the model’s performance here represents independent validation.
This study aims to determine whether this AI model can enhance risk stratification and inform more targeted supplemental screening in clinical practice. If effective, such tools could minimize the overuse of imaging, reduce patient burden, and improve early detection, thereby supporting a precision medicine approach to breast cancer screening. We present this article in accordance with the STARD reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-1650/rc).
Methods
Study design and setting
This retrospective observational study was approved by the Institutional Review Board of Mayo Clinic (No. 23-007303) and was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. It was carried out across three affiliated academic breast imaging centers of Mayo Clinic between January 2018 and December 2023. The study aimed to evaluate the performance of a mammogram-based AI risk model in predicting short-term breast cancer risk in intermediate-risk women with dense breasts undergoing supplemental screening with MBI. The requirement for informed consent was waived due to the retrospective nature of the data collection and analysis. The study was conducted in compliance with the Health Insurance Portability and Accountability Act.
Study population
A cohort of 416 women was retrospectively reviewed in this study. All patients underwent screening digital breast tomosynthesis (DBT) mammography between 2018 and 2023. Eligible participants had a screening mammogram with a benign or negative result and underwent a supplemental screening MBI study performed either on the same day or within one year of the index mammogram. All included women had heterogeneously dense or extremely dense breast tissue, as classified by the interpreting radiologist using the American College of Radiology Breast Imaging Reporting and Data System (ACR BI-RADS) qualitative density categories (12,13). The patients in the cancer group had a cancer diagnosis within 2 years following the benign or negative mammogram. Additionally, participants were classified as intermediate risk based on the TC lifetime risk model score <20%.
Exclusion criteria included a personal history of breast cancer at the time of the index screening or classification as high risk by TC score >20%. This ensured that the study population remained focused on evaluating AI model performance specifically within a non-high-risk, dense breast cohort.
From the cohort of 416 women, 70 (16.8%) were diagnosed with histologically confirmed breast cancer within 2 years of the index mammogram. The remaining 346 women (83.2%) served as cancer-free controls, having no diagnosis of breast cancer within at least 2 years of follow-up.
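The inclusion and exclusion criteria above can be read as a simple per-exam filter. The following is a minimal sketch only, not the authors' selection code: the `Exam` record and all of its field names are hypothetical, "within one year" is assumed to mean 0 to 365 days, and a TC score of exactly 20% is treated as excluded because the text does not say how that boundary was handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Exam:
    """One index screening exam. All field names are hypothetical."""
    birads_assessment: int       # BI-RADS assessment category of the index DBT
    density: str                 # radiologist-assigned BI-RADS density, "A".."D"
    days_to_mbi: Optional[int]   # days from index mammogram to MBI (None if no MBI)
    tc_lifetime_risk: float      # Tyrer-Cuzick lifetime risk, in percent
    prior_breast_cancer: bool    # personal history at the time of index screening


def is_eligible(exam: Exam) -> bool:
    """Apply the stated criteria; boundary handling is an assumption."""
    if exam.prior_breast_cancer:
        return False
    if exam.tc_lifetime_risk >= 20.0:  # assumption: exactly 20% counts as high risk
        return False
    if exam.birads_assessment not in (1, 2):  # negative or benign only
        return False
    if exam.density not in ("C", "D"):  # heterogeneously or extremely dense
        return False
    # Assumption: same day through 365 days after the index mammogram.
    return exam.days_to_mbi is not None and 0 <= exam.days_to_mbi <= 365
```

For example, `is_eligible(Exam(1, "C", 0, 12.5, False))` describes a same-day MBI after a negative mammogram in a woman with heterogeneously dense breasts and a TC lifetime risk of 12.5%, which passes every check.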
Imaging protocols
DBT
All patients underwent bilateral screening DBT using full-field digital mammography (FFDM) systems (Hologic Inc., Marlborough, MA, USA). Images were interpreted by board-certified breast radiologists using the ACR BI-RADS lexicon. Only those mammograms initially classified as BI-RADS 1 or 2 were included.
MBI
MBI was performed using dual-head gamma camera systems following intravenous administration of 300 MBq (approximately 8 mCi) of Tc-99m sestamibi. Standard two-view imaging [craniocaudal (CC) and mediolateral oblique (MLO)] of each breast was acquired with mild compression. All MBI examinations were interpreted independently by experienced breast imaging radiologists, with AI risk scores unavailable at the time of interpretation.
At our institution, MBI is offered specifically as a supplemental screening modality for women with dense breasts and no additional high-risk features. This institutional policy motivated the present investigation to determine whether image-based AI risk estimation could optimize the selection of women who would most benefit from MBI, thereby improving cost-effectiveness and reducing unnecessary exposure.
AI risk model
The mammograms from all patients in this study were retrospectively processed using a commercially available, image-based breast cancer risk assessment algorithm—the iCAD ProFound AI® Risk model (iCAD Inc., Nashua, NH, USA). This model, developed in collaboration with researchers at the Karolinska Institute (Sweden), is an individualized, image-driven AI system capable of generating short-term breast-cancer risk estimates directly from FFDM and DBT.
The ProFound AI® Risk model is built upon a multi-stage deep CNN framework. In its first stage, convolutional layers automatically extract and encode high-dimensional image features from screening mammograms, including tissue density, parenchymal texture, structural asymmetry, micro-calcifications, and architectural distortions. These low-level image features are progressively abstracted through multiple convolutional blocks, pooling, and activation layers. The resulting feature maps are then aggregated across both the CC and MLO views of each breast, allowing the model to form a comprehensive spatial context representation.
A subsequent feature-fusion layer combines the learned bilateral representations and feeds them into a set of fully connected (dense) layers, which integrate image-derived features with demographic variables such as age, race, and geographic region. The network’s final activation layer produces a continuous 1-year absolute risk probability, scaled into four interpretive categories (low, general, moderate, and high). This output is accompanied by an AI-predicted breast density score: least dense (a), dense (b), more dense (c), and most dense (d).
The ProFound AI® Risk model was trained and validated on large, multi-institutional datasets that are entirely independent of the patient population used in the present study. The FFDM network was trained on 974 biopsy-proven cancer cases and 9,376 non-cancer controls, while the DBT network used 563 cancers and 3,609 controls, collected from multiple vendors and imaging centers to ensure robustness across acquisition protocols (14). The mammograms analyzed in this study were not included in any stage of model training or validation, ensuring a true external validation design.
During training, supervised learning was performed using binary cross-entropy loss, with ground-truth cancer outcomes as labels. The CNN weights were optimized via back-propagation with stochastic gradient descent and momentum, and early stopping was used to prevent overfitting. Model evaluation on held-out test sets yielded an area under the curve (AUC) of 0.73 for FFDM and 0.80 for DBT, outperforming traditional risk-assessment tools such as TC and Gail models.
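For reference, the training objective and optimizer named above have the following textbook forms. This is the standard formulation only; the vendor's exact loss weighting, learning-rate schedule, and hyperparameters are not given in the text. The symbols are assumptions introduced here: $y_i \in \{0,1\}$ is the cancer label, $\hat{p}_i$ the predicted risk from network $f_\theta$, $\eta$ the learning rate, and $\mu$ the momentum coefficient.

```latex
% Binary cross-entropy over N training examples
\mathcal{L}(\theta) = -\frac{1}{N}\sum_{i=1}^{N}
  \Bigl[\, y_i \log \hat{p}_i + (1 - y_i)\log\bigl(1 - \hat{p}_i\bigr) \Bigr],
\qquad \hat{p}_i = f_\theta(x_i)

% Stochastic gradient descent with (classical) momentum
v_{t+1} = \mu\, v_t - \eta\, \nabla_\theta \mathcal{L}(\theta_t),
\qquad
\theta_{t+1} = \theta_t + v_{t+1}
```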
In this study, only index (baseline) screening mammograms interpreted as benign or negative (BI-RADS 1–2) were processed by the AI model. These exams were obtained prior to the subsequent cancer diagnosis, ensuring that the algorithm evaluated pre-diagnostic images rather than mammograms displaying visible tumor signs. The AI-generated risk scores, therefore, represent predicted probabilities derived solely from the initial, ostensibly normal screening studies.
Thus, ProFound AI® Risk functions as an end-to-end deep-learning pipeline that extracts latent imaging biomarkers from screening mammograms and integrates them with minimal demographic input to generate an objective 1-year breast cancer risk score. By excluding our institutional data from its training set and applying the algorithm only to baseline benign/negative images, the current study provides an independent, retrospective validation of this model’s performance in women with dense breasts.
Traditional risk models for comparison
Tyrer-Cuzick (version 8) and Gail (version 2) models were included to enable direct comparison with the AI-based risk estimates. Risk calculations were performed during routine clinical assessment at each patient’s initial clinic visit, using the validated online calculators—the IBIS Breast Cancer Risk Evaluation Tool (Tyrer-Cuzick) version 8 and the NCI Breast Cancer Risk Assessment Tool (Gail). These scores were based on patient-provided information obtained at intake, including age, age at menarche, parity, age at first childbirth, family history of breast cancer, prior biopsies, hormonal therapy use, and BI-RADS breast-density category. For this study, the previously computed TC and Gail scores were extracted directly from the electronic medical record (EMR) without recalculation and verified by two independent reviewers for accuracy and completeness. The TC model provided lifetime and 10-year risk estimates, whereas the Gail model generated 5-year and lifetime risk scores.
Statistical analysis
Descriptive statistics, including means and standard deviations, were calculated for demographic and imaging characteristics in the cancer and non-cancer groups. Independent t-tests were used to compare continuous variables, including AI risk scores, TC scores, and Gail model scores, between the two groups. A two-tailed P value less than 0.05 was considered statistically significant. Cohen’s d effect sizes were also computed to quantify the magnitude of mean-score differences, complementing the P values.
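The two-group comparison described above can be sketched in a few lines. This is an illustrative sketch using only the Python standard library, not the SPSS procedure the authors ran. It assumes the equal-variance (Student) form of the independent t-test and the pooled-standard-deviation form of Cohen's d; the text does not state which variants were used. The function names are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation (assumed variant)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def student_t(a, b):
    """Equal-variance independent-samples t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    # t equals the standardized mean difference divided by sqrt(1/na + 1/nb).
    t = cohens_d(a, b) / math.sqrt(1 / na + 1 / nb)
    return t, na + nb - 2
```

The two-tailed P value then comes from the t distribution with the returned degrees of freedom. In practice a statistics package would supply it, for example `scipy.stats.ttest_ind`, whose `equal_var` argument switches between the Student and Welch forms.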
Subgroup analyses were conducted according to breast density categories as determined by the AI model (categories A, B, C, and D) to assess whether model performance differed by density. Further subgroup analysis within the cancer group was performed based on histologic subtype, including ductal carcinoma in situ (DCIS), invasive ductal carcinoma (IDC), and invasive lobular carcinoma (ILC). Furthermore, cancer cases were analyzed by tumor grade to assess whether risk model scores varied according to tumor differentiation. Receiver operating characteristic (ROC) curve analysis was performed to evaluate the diagnostic performance of the AI model, and sensitivity and specificity were determined. All statistical analyses were conducted using IBM SPSS Statistics for Windows, Version 28.0 (IBM Corporation, Armonk, New York, USA).
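As an illustration of the ROC quantities named above, the AUC can be computed as the probability that a randomly chosen cancer case receives a higher score than a randomly chosen control, and sensitivity and specificity follow from a single cut-off. The sketch below is a standard-library approximation of that idea, not the SPSS routine used in the study. The function names are hypothetical, and treating scores at or above the threshold as positive is an assumption.

```python
def auc_mann_whitney(case_scores, control_scores):
    """AUC as P(case score > control score), counting ties as one half."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def sens_spec(case_scores, control_scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    true_pos = sum(s >= threshold for s in case_scores)
    true_neg = sum(s < threshold for s in control_scores)
    return true_pos / len(case_scores), true_neg / len(control_scores)
```

The pairwise loop is quadratic in cohort size, which is acceptable for a few hundred exams; a rank-based implementation would be preferable for larger data sets.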
Study design and setting
This retrospective observational study was approved by the Institutional Review Board of Mayo Clinic (No. 23-007303) and was conducted in accordance with the Declaration of Helsinki and its subsequent amendments across three affiliated academic breast imaging centers of Mayo Clinic between January 2018 and December 2023. The study aimed to evaluate the performance of a mammogram-based AI risk model in predicting short-term breast cancer risk in intermediate-risk women with dense breasts undergoing supplemental screening with MBI. The requirement for informed consent was waived due to the retrospective nature of the data collection and analysis. The study was conducted in compliance with the Health Insurance Portability and Accountability Act.
Study population
A cohort of 416 women was retrospectively reviewed in this study. All patients underwent screening digital breast tomosynthesis (DBT) mammography between 2018 and 2023. Eligible participants had a screening mammogram with a benign or negative result and underwent a supplemental screening MBI study performed either on the same day or within one year of the index mammogram. All included women had heterogeneously dense or extremely dense breast tissue, as classified by the interpreting radiologist using the American College of Radiology Breast Imaging Reporting and Data System (ACR BI-RADS) qualitative density categories (12,13). The patients in the cancer group had a cancer diagnosis within 2 years following the benign or negative mammogram. Additionally, participants were classified as intermediate risk based on the TC lifetime risk model score <20%.
Exclusion criteria included a personal history of breast cancer at the time of the index screening or classification as high risk by TC score >20%. This ensured that the study population remained focused on evaluating AI model performance specifically within a non-high-risk, dense breast cohort.
From the cohort of 416 women, 70 (16.8%) were diagnosed with histologically confirmed breast cancer within 2 years of the index mammogram. The remaining 346 women (83.2%) served as cancer-free controls, having no diagnosis of breast cancer within at least 2 years of follow-up.
Imaging protocols
DBT
All patients underwent bilateral screening DBT using full-field digital mammography (FFDM) systems (Hologic Inc., Marlborough, MA, USA). Images were interpreted by board-certified breast radiologists using the ACR BI-RADS lexicon. Only those mammograms initially classified as BI-RADS 1 or 2 were included.
MBI
MBI was performed using dual-head gamma camera systems following intravenous administration of 300 MBq (approximately 8 mCi) of Tc-99m sestamibi. Standard two-view imaging [craniocaudal (CC) and mediolateral oblique (MLO)] of each breast was acquired with mild compression. All MBI examinations were interpreted independently by experienced breast imaging radiologists, with AI risk scores unavailable at the time of interpretation.
At our institution, MBI is offered specifically as a supplemental screening modality for women with dense breasts and no additional high-risk features. This institutional policy motivated the present investigation to determine whether image-based AI risk estimation could optimize the selection of women who would most benefit from MBI, thereby improving cost-effectiveness and reducing unnecessary exposure.
AI risk model
The mammograms from all patients in this study were retrospectively processed using a commercially available, image-based breast cancer risk assessment algorithm—the iCAD ProFound AI® Risk model (iCAD Inc., Nashua, NH, USA). This model, developed in collaboration with researchers at the Karolinska Institute (Sweden), is an individualized, image-driven AI system capable of generating short-term breast-cancer risk estimates directly from FFDM and DBT.
The ProFound AI® Risk model is built upon a multi-stage deep CNN framework. In its first stage, convolutional layers automatically extract and encode high-dimensional image features from screening mammograms, including tissue density, parenchymal texture, structural asymmetry, micro-calcifications, and architectural distortions. These low-level image features are progressively abstracted through multiple convolutional blocks, pooling, and activation layers. The resulting feature maps are then aggregated across both the CC and MLO views of each breast, allowing the model to form a comprehensive spatial context representation.
A subsequent feature-fusion layer combines the learned bilateral representations and feeds them into a set of fully connected (dense) layers, which integrate image-derived features with demographic variables such as age, race, and geographic region. The network’s final activation layer produces a continuous 1-year absolute risk probability, scaled into four interpretive categories (low, general, moderate, and high). This output is accompanied by an AI-predicted breast density score: least dense (a), dense (b), more dense (c), and most dense (d).
The ProFound AI® Risk model was trained and validated on large, multi-institutional datasets that are entirely independent of the patient population used in the present study. The FFDM network was trained on 974 biopsy-proven cancer cases and 9,376 non-cancer controls, while the DBT network used 563 cancers and 3,609 controls, collected from multiple vendors and imaging centers to ensure robustness across acquisition protocols (14). The mammograms analyzed in this study were not included in any stage of model training or validation, ensuring a true external validation design.
During training, supervised learning was performed using binary cross-entropy loss, with ground-truth cancer outcomes as labels. The CNN weights were optimized via back-propagation with stochastic gradient descent and momentum, and early stopping was used to prevent overfitting. Model evaluation on held-out test sets yielded an area under the curve (AUC) of 0.73 for FFDM and 0.80 for DBT, outperforming traditional risk-assessment tools such as TC and Gail models.
In this study, only index (baseline) screening mammograms interpreted as benign or negative (BI-RADS 1–2) were processed by the AI model. These exams were obtained prior to the subsequent cancer diagnosis, ensuring that the algorithm evaluated pre-diagnostic images rather than mammograms displaying visible tumor signs. The AI-generated risk scores, therefore, represent predicted probabilities derived solely from the initial, ostensibly normal screening studies.
Thus, ProFound AI® Risk functions as an end-to-end deep-learning pipeline that extracts latent imaging biomarkers from screening mammograms and integrates them with minimal demographic input to generate an objective 1-year breast cancer risk score. By excluding our institutional data from its training set and applying the algorithm only to baseline benign/negative images, the current study provides an independent, retrospective validation of this model’s performance in women with dense breasts.
Traditional risk models for comparison
Tyrer-Cuzick (version 8) and Gail (version 2) models were included to enable direct comparison with the AI-based risk estimates. Risk calculations were performed during routine clinical assessment at each patient’s initial clinic visit, using the validated online calculators—the IBIS Breast Cancer Risk Evaluation Tool (Tyrer-Cuzick) version 8 and the NCI Breast Cancer Risk Assessment Tool (Gail). These scores were based on patient-provided information obtained at intake, including age, age at menarche, parity, age at first childbirth, family history of breast cancer, prior biopsies, hormonal therapy use, and BI-RADS breast-density category. For this study, the previously computed TC and Gail scores were extracted directly from the electronic medical record (EMR) without recalculation and verified by two independent reviewers for accuracy and completeness. The TC model provided lifetime and 10-year risk estimates, whereas the Gail model generated 5-year and lifetime risk scores.
Statistical analysis
Descriptive statistics, including means and standard deviations, were calculated for demographic and imaging characteristics in the cancer and non-cancer groups. Independent t-tests were used to compare continuous variables, including AI risk scores, TC scores, and Gail model scores, between the two groups. A two-tailed P value less than 0.05 was considered statistically significant. Cohen’s d effect sizes were also computed to quantify the magnitude of mean-score differences, complementing the P values.
Subgroup analyses were conducted according to breast density categories as determined by the AI model (categories A, B, C, and D) to assess whether model performance differed by density. Within the cancer group, additional subgroup analyses were performed by histologic subtype, including ductal carcinoma in situ (DCIS), invasive ductal carcinoma (IDC), and invasive lobular carcinoma (ILC), and by tumor grade to assess whether risk model scores varied according to tumor differentiation. Receiver operating characteristic (ROC) curve analysis was performed to evaluate the diagnostic performance of the AI model, and sensitivity and specificity were determined. All statistical analyses were conducted using IBM SPSS Statistics for Windows, Version 28.0 (IBM Corporation, Armonk, New York, USA).
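For readers less familiar with ROC analysis, the AUC can be read as the probability that a randomly chosen cancer case receives a higher risk score than a randomly chosen control. The sketch below computes it by direct pairwise comparison, with ties counted as one half. This is a generic illustration under that definition, not the SPSS procedure used in the study; the function name and scores are hypothetical.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the fraction of case-control pairs in which the case scores higher.

    Tied pairs contribute 0.5, matching the Mann-Whitney formulation of the AUC.
    """
    wins = 0.0
    for c in case_scores:
        for n in control_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical toy scores for illustration only (not study data).
cases = [0.9, 0.6, 0.4]
controls = [0.5, 0.4, 0.2, 0.1]

auc = roc_auc(cases, controls)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls.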
Results
Patient characteristics
A total of 416 patients were included in the study, comprising 70 (16.8%) cancer cases (sample case in Figure 1) and 346 (83.2%) non-cancer controls (sample case in Figure 2). The overall mean age was 61.12±10.41 years. The mean age in the cancer group was 61.01±10.81 years, while the non-cancer group had a similar mean age of 61.14±10.34 years. Based on AI-derived breast density, 1 patient (0.2%) was classified as category A, 124 (29.8%) as category B, 210 (50.5%) as category C, and 81 (19.5%) as category D. Among the cancer cases, histologic subtypes included 10 patients (14.3%) with DCIS, 46 (65.7%) with IDC, and 14 (20.0%) with ILC. Tumor grade, assessed using the Nottingham grading system, showed that 19 patients (27.1%) had grade 1 tumors, 32 (45.7%) had grade 2, and 19 (27.1%) had grade 3. Figures 1,2 show illustrative examples from the study cohort: Figure 1 presents a representative case from the cancer group and Figure 2 presents a representative non-cancer case. These examples demonstrate how the AI-derived risk scores vary with image appearance and parenchymal density.
Risk score across models
The average risk scores were higher in the cancer group than in the non-cancer group across all models. However, neither the conventional risk models nor the AI model demonstrated statistically significant differences between the groups. Among all models, the AI model had the lowest P value (0.239) and the largest effect size (d=0.23). While not statistically significant, the AI model showed a trend towards greater separation between the groups (Table 1).
Comparative assessments of risk model discrimination across patient and tumor characteristics
Across breast density categories B to D, the AI model demonstrated a consistent trend of higher mean risk scores in cancer patients compared to non-cancer controls. Notably, there was a visible pattern of increasing separation between cancer and non-cancer scores with higher breast density. While the difference was minimal in category B and modest in category C, the separation was most pronounced in category D, where the AI model approached statistical significance and showed a larger effect size (P=0.080; d=0.41) (Table 2). The TC and Gail models also showed some separation across density categories, particularly at higher densities, but the magnitude and consistency of the differences were smaller than those of the AI model. None of their comparisons reached statistical significance. These findings suggest that the AI model may offer improved discriminatory power, particularly in women with extremely dense breasts.
When stratified by cancer histology, the AI model consistently showed higher mean risk scores in patients with DCIS, IDC, and ILC than in controls. Although the differences were not significant in the DCIS and IDC subgroups, a significant separation was observed in the ILC group (P=0.048; d=0.56), indicating the AI model’s potential advantage in identifying risk in this subtype. The TC and Gail models also showed a trend toward higher scores in cancer cases across subtypes, but the differences were smaller, and none reached statistical significance (Table 2).
Subgroup analysis by tumor grade revealed a similar trend (Table 2). The AI model showed progressively higher mean scores with increasing tumor grade in cancer patients, while maintaining relatively stable scores in non-cancer controls. Although the comparisons did not reach statistical significance, the greatest separation occurred in grade 2 tumors (P=0.089), suggesting that the model may capture imaging features associated with biologically relevant tumor aggressiveness. Traditional models demonstrated parallel trends, with slightly higher scores in higher-grade tumors, but the differences were less pronounced and failed to significantly distinguish between cancer and control groups.
Collectively, these subgroup findings are consistent with better discrimination by the image-based AI model than by traditional risk assessment tools, particularly in high-density breast tissue and certain tumor subgroups, although most comparisons did not reach statistical significance and the traditional models also showed some separation, albeit to a lesser extent.
Diagnostic accuracy of the AI and conventional risk models by breast density
The diagnostic performance of the AI model and conventional risk models across breast-density categories is summarized in Tables 3,4. As shown in Table 3, the AI model exhibited progressively higher discriminatory performance with increasing breast density, reaching its best accuracy in category D [AUC =0.75 (95% CI: 0.61–0.89); P=0.049] with a sensitivity of 69.3% and specificity of 61.1% at a threshold of 0.14. In contrast, the Tyrer-Cuzick and Gail models (Table 4) demonstrated relatively flat AUC profiles across density categories (range ≈0.54–0.63) without a density-dependent gain in performance. This comparative pattern, consistent with the trends observed in Table 2, underscores that the AI-based model uniquely improved diagnostic accuracy in extremely dense breasts, whereas conventional risk models remained largely unchanged.
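The sensitivity and specificity reported at a given threshold follow from dichotomizing the continuous risk score. The sketch below shows the calculation, assuming that scores at or above the threshold are called positive (the source does not state whether the cutoff is inclusive). The threshold of 0.14 is taken from the text; the function name and scores are hypothetical and are not study data.

```python
def sens_spec(case_scores, control_scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive.

    The inclusive cutoff (>=) is an assumption made for this illustration.
    """
    true_pos = sum(s >= threshold for s in case_scores)
    true_neg = sum(s < threshold for s in control_scores)
    return true_pos / len(case_scores), true_neg / len(control_scores)


# Hypothetical toy scores for illustration only (not study data).
cases = [0.10, 0.15, 0.22, 0.31]
controls = [0.05, 0.09, 0.12, 0.16, 0.20]

sensitivity, specificity = sens_spec(cases, controls, threshold=0.14)
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```

Raising the threshold trades sensitivity for specificity, so a single operating point should be read together with the AUC, which summarizes performance across all thresholds.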
Discussion
Breast cancer screening strategies have evolved significantly, yet determining the optimal approach for women with mammographically dense breasts who do not meet traditional high-risk criteria remains a critical gap in clinical practice. While current guidelines provide clear recommendations for high-risk women of any density (2,15,16), there is limited guidance for intermediate-risk women with dense breast tissue. This group represents a substantial proportion of the screening population but occupies a clinical grey zone, where decisions regarding supplemental imaging are often left to individual clinician discretion or patient preference, leading to variability in care and potential underdiagnosis.
Unlike traditional models that rely on static clinical features, AI algorithms can identify complex imaging patterns that may signal an underlying risk of malignancy, even before detectable lesions emerge. This makes AI a compelling tool for personalized screening strategies, particularly for women with dense breasts. However, despite their promise, image-based AI models require robust clinical validation, particularly in women who are not identified as high risk by conventional criteria.
In our study, AI-assigned breast density provided a reproducible, objective alternative to radiologists’ subjective assessments, thereby enhancing consistency in risk stratification and reducing screening variability. Notably, AI-assigned density distribution in our cohort differed from radiologist-assigned BI-RADS categories, reflecting the well-documented divergence between visual and automated evaluations. This objectivity is critical, as visual BI-RADS density categorization is known to suffer from moderate-to-substantial inter-reader variability (17,18). By contrast, automated AI-based methods have demonstrated stronger reliability, with some outperforming human readers in consistency, while maintaining comparable cancer risk discrimination (10). Prior work has shown that radiologist classification is influenced by image contrast, reader experience, and subjective interpretation of “masking”, whereas automated methods quantify fibroglandular tissue proportion directly from image data, thereby reducing human bias (19). Recent evidence from Da Rocha et al. further reinforces this advantage, as their open-source convolutional neural network was substantially more consistent than human readers (20). These findings support the adoption of AI-based density measurement to ensure more accurate and equitable supplemental screening decisions.
The AI model also consistently showed higher mean risk scores in cancer patients than in controls across all AI-assigned breast density categories, with the greatest separation observed in category D (extremely dense breasts). Although statistical significance was not achieved in all comparisons, the progressive increase in discriminatory performance with higher density supports the utility of AI models in populations where traditional mammography performs sub-optimally (3,4). In this context, the AI model’s enhanced discrimination in women with extremely dense breasts may reflect its capacity to extract subtle imaging features that are not apparent to human readers.
The model’s superior performance extended beyond density stratification. Subgroup analyses revealed a significant difference in risk scores among patients with ILC, a subtype characterized by a diffuse growth pattern and lower detectability on conventional mammography (21-23). This finding warrants particular attention, as ILC often demonstrates subtle parenchymal distortions and architectural asymmetries that may escape visual detection but could be captured by image-based AI models through higher-order textural or contextual features. The AI model’s improved performance in this subgroup suggests that deep-learning-based algorithms may identify predictive cues associated with ILC risk that are not readily apparent to human observers. Compared with traditional models, the AI model also showed better discrimination and a progressive increase in risk scores with advancing tumor grade, suggesting it may capture imaging phenotypes associated with biologically aggressive tumors.
However, the absence of statistical significance across several other subgroups and the overall cohort likely reflects multifactorial influences. The relatively small number of cancer cases, particularly within certain subtypes, limited statistical power to detect subtle differences. In addition, the inherent biological heterogeneity of breast cancers, variations in image acquisition parameters, and the diverse demographic and clinical characteristics of the study population may have introduced variability, diluting the statistical contrast between groups. Moreover, breast cancer risk prediction is intrinsically complex, as it depends on overlapping imaging, genetic, and hormonal factors that may not all be captured by imaging-based AI models alone. Despite these constraints, the consistent directionality of findings across subgroups strengthens the biological plausibility of the observed trends and underscores the need for larger, prospective, and multi-institutional validation to confirm these early signals of performance.
Critically, the AI model’s diagnostic accuracy, as measured by ROC analysis, rose substantially with increasing breast density, achieving an AUC of 0.75 in category D, a statistically significant result (P=0.049). Sensitivity (69.3%) and specificity (61.1%) in this group were also notable, suggesting meaningful clinical utility. These findings align with prior studies. Yala et al. reported an AUC of 0.68 for image-only models (10), while Eriksson et al. observed AUCs ranging from 0.65 to 0.74 in high-density cohorts (24).
Taken together, the observed subgroup differences across breast density, cancer subtype, and tumor grade suggest that the AI model may be identifying imaging biomarkers associated with both underlying cancer risk and tumor detectability, offering an opportunity for earlier diagnosis in women who would otherwise be missed by traditional approaches. These capabilities are particularly relevant in the intermediate-risk population, where decisions to pursue supplemental imaging are often subjective and inconsistent. By integrating image-based risk assessment with objective density assessment, AI models may support more equitable, precise, and individualized screening strategies.
At our institutions, MBI is offered as a supplemental screening modality for women with dense breast tissue and no other risk factors. A previous study demonstrated that MBI improves cancer detection rates and specificity in this subgroup (25). Our retrospective study focuses on predicting which women with dense breasts, not deemed high-risk by traditional clinical models, are likely to develop breast cancer in the short term. We assess the accuracy of an AI model in stratifying future cancer risk to determine whether it can more effectively guide the use of MBI for supplemental screening and evaluate the AI model’s predictive performance compared to existing tools. If effective, it could minimize the overuse of imaging, reduce patient burden, and improve early detection.
It is important to distinguish between the fundamental principles underlying traditional and AI-based risk models, as they operate on different temporal and biological scales. Conventional models such as the Gail and TC algorithms estimate long-term susceptibility to breast cancer by aggregating epidemiologic and hormonal factors—variables that evolve over years and reflect inherent biological predisposition. In contrast, image-based AI models are designed to capture short-term, imaging-derived indicators of tumor development, detecting subtle textural, architectural, or microvascular changes that may precede radiologically visible lesions. This distinction between biological susceptibility (long-term risk) and incipient detectability (short-term risk) may explain why traditional models often fail to identify women who develop cancer within a year of screening, whereas AI models—trained directly on imaging biomarkers—demonstrate higher short-term discriminatory performance even in the absence of broad statistical significance across all risk strata. This study contributes to a growing body of literature on integrating AI into clinical workflows, particularly in scenarios where conventional tools fall short. The findings may help address the gap in personalized screening for women with dense breast tissue and no other risk factors, supporting more efficient allocation of supplemental imaging resources. Moreover, given that several states have mandated breast density notification laws without clear clinical guidance, AI could serve as a decision-support tool to inform both clinicians and patients about personalized risk and screening options (26).
This study has several important limitations. First, its retrospective design inherently introduces potential biases, including selection and information bias. Second, the mammograms were obtained from three different sites within the same institution, which may have introduced some variability in imaging protocols and quality. While this is a limitation, it also reflects real-world variability and may enhance the applicability of our findings across broader, more diverse populations and clinical settings. Third, not all patients had complete data for the TC and Gail models, which may have affected the robustness of comparative analyses. Additionally, the cancer-negative group was defined based on benign or negative mammography reports, which raises the possibility of missed or undetected cancers, particularly in women with the densest breasts. To offset this limitation, all non-cancer cases underwent at least two years of mammographic follow-up to confirm true negative status. The AI model used in this study was trained solely on mammographic images; incorporating additional clinical, sociodemographic, and genetic risk factors could potentially improve its predictive performance and is an ongoing area of AI development and research. Lastly, the number of confirmed cancer cases within the study period was relatively small, limiting statistical power in subgroup analyses. Despite these limitations, to our knowledge, this is one of very few studies to evaluate breast cancer risk scores specifically in non–high-risk patients. This novel focus adds meaningful insight to the growing effort to personalize breast cancer screening strategies beyond traditionally high-risk populations.
Future studies should therefore prioritize several specific directions. First, larger, prospective, and multi-institutional cohorts are needed to validate these findings, particularly within distinct breast-density subgroups to confirm the model’s density-dependent behavior. Second, longitudinal studies that track cancer incidence and outcomes over time could determine whether AI-based risk stratification translates into earlier detection, reduced interval cancers, or improved survival. Third, integrating AI-generated risk scores with comprehensive clinical and genomic data could yield hybrid models that better reflect the multifactorial nature of breast cancer risk. Fourth, future research should explore how AI-guided screening strategies might optimize imaging intervals or tailor the choice of supplemental modalities (such as MBI, MRI, or ultrasound) in dense-breast populations. Fifth, future work should also include systematic cross-model evaluations that test this AI model alongside other established AI models to determine relative performance across platforms and imaging vendors. Since ProFound AI is among the first commercially available models of its kind, cross-platform, vendor-neutral validation will become increasingly important as additional AI tools emerge, ensuring the reproducibility, robustness, and generalizability of AI-driven risk assessment across diverse clinical environments. Finally, building on our observation with ILC, future studies should specifically explore AI systems optimized for lobular histology, with dedicated training datasets and multi-institutional validation cohorts. Such efforts could help determine whether AI-based risk stratification can improve early detection or screening triage in women with ILC, an area where conventional imaging remains limited.
Breast cancer screening strategies have evolved significantly, yet determining the optimal approach for women with mammographically dense breasts who do not meet traditional high-risk criteria remains a critical gap in clinical practice. While current guidelines provide clear recommendations for high-risk women of any density (2,15,16), there is limited guidance for intermediate-risk women with dense breast tissue. This group represents a substantial proportion of the screening population but occupies a clinical grey zone, where decisions regarding supplemental imaging are often left to individual clinician discretion or patient preference, leading to variability in care and potential underdiagnosis.
Unlike traditional models that rely on static clinical features, AI algorithms can identify complex imaging patterns that may signal an underlying risk of malignancy, even before detectable lesions emerge. This makes AI a compelling tool for personalized screening strategies, particularly for women with dense breasts. However, despite their promise, image-based AI models require robust clinical validation, particularly in women who are not identified as high risk by conventional criteria.
In our study, AI-assigned breast density provided a reproducible, objective alternative to radiologists’ subjective assessments, thereby enhancing consistency in risk stratification and reducing screening variability. Notably, AI-assigned density distribution in our cohort differed from radiologist-assigned BI-RADS categories, reflecting the well-documented divergence between visual and automated evaluations. This objectivity is critical, as visual BI-RADS density categorization is known to suffer from moderate-to-substantial inter-reader variability (17,18). By contrast, automated AI-based methods have demonstrated stronger inter-reader reliability, with some outperforming human readers in consistency, while maintaining comparable cancer risk discrimination (10). Prior work has shown that radiologist classification is influenced by image contrast, reader experience, and subjective interpretation of “masking”, whereas automated methods quantify fibroglandular tissue proportion directly from image data, thereby reducing human bias (19). Recent evidence from Da Rocha et al. further reinforces this advantage as their open-source convolutional neural network substantially outperformed the variability seen in human readers (20). These findings support the adoption of AI-based density measurement to ensure more accurate and equitable supplemental screening decisions
The AI model also consistently showed higher mean risk scores in cancer patients than in controls across all AI-assigned breast density categories, with the greatest separation observed in category D (extremely dense breasts). Although statistical significance was not achieved in all comparisons, the progressive increase in discriminatory performance with higher density supports the utility of AI models in populations where traditional mammography performs sub-optimally (3,4). In this context, the AI model’s enhanced discrimination in women with extremely dense breasts may reflect its capacity to extract subtle imaging features that are not apparent to human readers.
The model’s superior performance extended beyond density stratification. Subgroup analyses revealed a significant difference in risk scores among patients with ILC, a subtype characterized by a diffuse growth pattern and lower detectability on conventional mammography (21-23). This finding warrants particular attention, as ILC often demonstrates subtle parenchymal distortions and architectural asymmetries that may escape visual detection but could be captured by image-based AI models through higher-order textural or contextual features. The AI model’s improved performance in this subgroup suggests that deep-learning-based algorithms may identify predictive cues associated with ILC risk that are not readily apparent to human observers. Compared with traditional models, the AI model also showed better discrimination and a progressive increase in risk scores with advancing tumor grade, suggesting it may capture imaging phenotypes associated with biologically aggressive tumors.
However, the absence of statistical significance across several other subgroups and the overall cohort likely reflects multifactorial influences. The relatively small number of cancer cases, particularly within certain subtypes, limited statistical power to detect subtle differences. In addition, the inherent biological heterogeneity of breast cancers, variations in image acquisition parameters, and the diverse demographic and clinical characteristics of the study population may have introduced variability, diluting the statistical contrast between groups. Moreover, breast cancer risk prediction is intrinsically complex, as it depends on overlapping imaging, genetic, and hormonal factors that may not all be captured by imaging-based AI models alone. Despite these constraints, the consistent directionality of findings across subgroups strengthens the biological plausibility of the observed trends and underscores the need for larger, prospective, and multi-institutional validation to confirm these early signals of performance.
Critically, the AI model’s diagnostic accuracy, as measured by ROC analysis, rose substantially with increasing breast density, achieving an AUC of 0.75 in category D, a statistically significant result (P=0.049). Sensitivity (69.3%) and specificity (61.1%) in this group were also notable, suggesting meaningful clinical utility. These findings align with prior studies. Yala et al. reported an AUC of 0.68 for image-only models (10), while Eriksson et al. observed AUCs ranging from 0.65 to 0.74 in high-density cohorts (24).
Taken together, the observed subgroup differences across breast density, cancer subtype, and tumor grade suggest that the AI model may be identifying imaging biomarkers associated with both underlying cancer risk and tumor detectability, offering an opportunity for earlier diagnosis in women who would otherwise be missed by traditional approaches. These capabilities are particularly relevant in the intermediate-risk population, where decisions to pursue supplemental imaging are often subjective and inconsistent. By integrating image-based risk assessment with objective density assessment, AI models may support more equitable, precise, and individualized screening strategies.
At our institutions, MBI is offered as a supplemental screening modality for women with dense breast tissue and no other risk factors. A previous study demonstrated that MBI improves cancer detection rates and specificity in this subgroup (25). Our retrospective study focuses on predicting which women with dense breasts, not deemed high-risk by traditional clinical models, are likely to develop breast cancer in the short term. We assess the accuracy of an AI model in stratifying future cancer risk to determine whether it can more effectively guide the use of MBI for supplemental screening and evaluate the AI model’s predictive performance compared to existing tools. If effective, it could minimize the overuse of imaging, reduce patient burden, and improve early detection.
It is important to distinguish between the fundamental principles underlying traditional and AI-based risk models, as they operate on different temporal and biological scales. Conventional models such as the Gail and TC algorithms estimate long-term susceptibility to breast cancer by aggregating epidemiologic and hormonal factors—variables that evolve over years and reflect inherent biological predisposition. In contrast, image-based AI models are designed to capture short-term, imaging-derived indicators of tumor development, detecting subtle textural, architectural, or microvascular changes that may precede radiologically visible lesions. This distinction between biological susceptibility (long-term risk) and incipient detectability (short-term risk) may explain why traditional models often fail to identify women who develop cancer within a year of screening, whereas AI models—trained directly on imaging biomarkers—demonstrate higher short-term discriminatory performance even in the absence of broad statistical significance across all risk strata. This study contributes to a growing body of literature on integrating AI into clinical workflows, particularly in scenarios where conventional tools fall short. The findings may help address the gap in personalized screening for women with dense breast tissue and no other risk factors, supporting more efficient allocation of supplemental imaging resources. Moreover, given that several states have mandated breast density notification laws without clear clinical guidance, AI could serve as a decision-support tool to inform both clinicians and patients about personalized risk and screening options (26).
This study has several important limitations. First, its retrospective design inherently introduces potential biases, including selection and information bias. Second, the mammograms were obtained from three different sites within the same institution, which may have introduced some variability in imaging protocols and quality. While this is a limitation, it also reflects real-world variability and may enhance the applicability of our findings across broader, more diverse populations and clinical settings. Third, not all patients had complete data for the TC and Gail models, which may have affected the robustness of comparative analyses. Additionally, the cancer-negative group was defined based on benign or negative mammography reports, which raises the possibility of missed or undetected cancers, particularly in women with the densest breasts. To offset this limitation, all non-cancer cases underwent at least two years of mammographic follow-up to confirm true negative status. The AI model used in this study was trained solely on mammographic images; incorporating additional clinical, sociodemographic, and genetic risk factors could potentially improve its predictive performance and is an ongoing area of AI development and research. Lastly, the number of confirmed cancer cases within the study period was relatively small, limiting statistical power in subgroup analyses. Despite these limitations, to our knowledge, this is one of very few studies to evaluate breast cancer risk scores specifically in non–high-risk patients. This novel focus adds meaningful insight to the growing effort to personalize breast cancer screening strategies beyond traditionally high-risk populations.
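To illustrate why a small number of cancer cases limits statistical power in subgroup analyses, the sketch below approximates a 95% confidence interval for an AUC using the Hanley and McNeil (1982) standard-error formula. This is not the method used in the study, whose CI procedure is not specified here. The cohort sizes (70 cases, 346 controls) come from the study; the AUC of 0.75 and the subgroup sizes (10 cases, 50 controls) are hypothetical values chosen only for illustration.

```python
import math


def hanley_mcneil_se(auc: float, n_pos: int, n_neg: int) -> float:
    """Approximate standard error of an AUC (Hanley & McNeil, 1982)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc * auc / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc * auc)
        + (n_neg - 1) * (q2 - auc * auc)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


def auc_confidence_interval(auc: float, n_pos: int, n_neg: int, z: float = 1.96):
    """Normal-approximation CI for an AUC, clipped to [0, 1]."""
    se = hanley_mcneil_se(auc, n_pos, n_neg)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


ASSUMED_AUC = 0.75  # hypothetical value, not a result reported by the study

# Full cohort sizes as reported: 70 cancer cases, 346 non-cancer controls.
full_ci = auc_confidence_interval(ASSUMED_AUC, n_pos=70, n_neg=346)

# Hypothetical small subgroup (sizes assumed for illustration only).
subgroup_ci = auc_confidence_interval(ASSUMED_AUC, n_pos=10, n_neg=50)

print(f"Full cohort 95% CI: {full_ci[0]:.2f} to {full_ci[1]:.2f}")
print(f"Subgroup 95% CI:    {subgroup_ci[0]:.2f} to {subgroup_ci[1]:.2f}")
```

Under these assumptions, the same AUC gives a much wider interval in the small subgroup than in the full cohort, which is the sense in which subgroup comparisons are underpowered. A bootstrap or DeLong-based interval would be a reasonable alternative when the underlying scores are available.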
Future studies should therefore prioritize several specific directions. First, larger, prospective, and multi-institutional cohorts are needed to validate these findings, particularly within distinct breast-density subgroups to confirm the model’s density-dependent behavior. Second, longitudinal studies that track cancer incidence and outcomes over time could determine whether AI-based risk stratification translates into earlier detection, reduced interval cancers, or improved survival. Third, integrating AI-generated risk scores with comprehensive clinical and genomic data could yield hybrid models that better reflect the multifactorial nature of breast cancer risk. Fourth, future research should explore how AI-guided screening strategies might optimize imaging intervals or tailor the choice of supplemental modalities (such as MBI, MRI, or ultrasound) in dense-breast populations. Fifth, future work should include systematic cross-model evaluations that test this AI model alongside other established AI models to determine relative performance across platforms and imaging vendors. Since ProFound AI is among the first commercially available models of its kind, cross-platform, vendor-neutral validation will become increasingly important as additional AI tools emerge, ensuring the reproducibility, robustness, and generalizability of AI-driven risk assessment across diverse clinical environments. Finally, building on our observation with ILC, future studies should specifically explore AI systems optimized for lobular histology, with dedicated training datasets and multi-institutional validation cohorts. Such efforts could help determine whether AI-based risk stratification can improve early detection or screening triage in women with ILC, an area where conventional imaging remains limited.
Conclusions
This study demonstrates that an image-based AI risk model can provide enhanced discriminatory performance for breast cancer risk assessment in specific subgroups, particularly among women with extremely dense breast tissue (category D) and those with ILC. Although statistically significant differences were not observed across all comparisons, the consistent trend of higher AI-derived risk scores among patients in the dense-breast and lobular subgroups suggests that AI-based imaging analysis may capture subtle, biologically relevant features not discernible by traditional questionnaire-based models, particularly among non-high-risk patients. These findings highlight the complementary role of AI-derived risk tools in breast cancer screening workflows, where they may help overcome the limitations of conventional models, reduce subjectivity in breast-density classification, and better identify women who could benefit from supplemental imaging or tailored surveillance strategies.
Supplementary
The article’s supplementary files as
Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please cite the original article when quoting.