Comparison of artificial intelligence (AI) services for Breast Imaging-Reporting and Data System (BI-RADS) classification on mammograms.

Vasilev Y; Mayorova A; Rumyantsev D; Semenov S; Bobrovskaya T; Pestrenin L; Erizhokov R; Vladzymyrskyy A; Omelyanskaya O; Arzamasov K

doi:10.21037/qims-2025-1658

Quantitative Imaging in Medicine and Surgery, 2026, Vol. 16(4), p. 311

PMID 41972034

Abstract

[BACKGROUND] Existing literature primarily focuses on the ability of artificial intelligence (AI) to detect malignant breast tumors, often neglecting Breast Imaging-Reporting and Data System (BI-RADS) categories or limiting analysis to categories 4 and 5. The diagnostic performance of AI for the other BI-RADS categories remains understudied. The objective of this study was to compare the diagnostic accuracy of three mammographic AI services in predicting individual BI-RADS categories and to identify opportunities for integrating AI into routine clinical practice.

[METHODS] Anonymized mammograms were obtained from the Unified Radiological Information Service of Moscow. Inclusion criteria: screening mammogram, radiology reports from both an AI service and a human radiologist, patient age 40-75 years. Exclusion criteria: mammograms without a BI-RADS category, BI-RADS categories 0 and 6. AI performance was assessed by calculating diagnostic performance metrics, with the radiologists' opinion as the ground truth, together with calibration tests.
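
To illustrate how per-category metrics of this kind can be computed, the following is a minimal sketch, not the authors' implementation. It assumes a one-vs-rest scheme, in which each BI-RADS category in turn is treated as the "positive" class and the radiologist's category serves as the ground truth. The function name `one_vs_rest_metrics` and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def one_vs_rest_metrics(truth, predicted, category):
    """Accuracy, PPV and NPV with `category` treated as the positive class.

    truth     -- categories assigned by the radiologist (ground truth)
    predicted -- categories assigned by the AI service
    Returns None for a metric whose denominator is zero.
    """
    counts = Counter(
        (t == category, p == category) for t, p in zip(truth, predicted)
    )
    tp = counts[(True, True)]    # both assign the category
    fn = counts[(True, False)]   # radiologist assigns it, AI does not
    fp = counts[(False, True)]   # AI assigns it, radiologist does not
    tn = counts[(False, False)]  # neither assigns it
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total if total else None,
        "ppv": tp / (tp + fp) if (tp + fp) else None,
        "npv": tn / (tn + fn) if (tn + fn) else None,
    }


# Hypothetical toy labels, for illustration only (not study data).
radiologist = [1, 1, 2, 2, 2, 3, 4, 5]
ai_service = [1, 2, 2, 2, 1, 3, 4, 4]

for category in range(1, 6):
    print(category, one_vs_rest_metrics(radiologist, ai_service, category))
```

A binary variant can be obtained under the same assumptions by first mapping each category to normal or abnormal and then calling the function once with the abnormal label as `category`. The abstract does not state which categories were grouped together, so any such mapping would be a further assumption.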

[RESULTS] The study sample consisted of 81,895 mammograms. Median accuracy was 76.9%, with a positive predictive value (PPV) of 11.8%. The highest negative predictive value (NPV) was observed for BI-RADS 2 (78.5-83.4%). The second highest NPVs were observed for BI-RADS 1, 3, 4, and 5 (over 84.7%). Binary classification yielded median accuracy and PPV values of 80.5% and 98.6% respectively, compared to the calibration testing (76.0% and 84.7%).

[CONCLUSIONS] Most AI service metrics were suboptimal for predicting individual BI-RADS categories, potentially due to reliance on variable radiologist conclusions and the lack of histological calibration. Binary classification demonstrated higher performance metrics, and no significant differences in NPV were observed across the AI services, which means they can be recommended for confirming the absence of pathology. Successful integration of AI into routine clinical practice requires consideration of several methods of assessing diagnostic accuracy, tailored to specific use cases.