Assessing the Safety and Clinical Appropriateness of Breast Cancer Advice From Consumer-Grade Large Language Models.
OpenAlex topics:
Global Cancer Incidence and Screening
Health Literacy and Information Accessibility
Patient-Provider Communication in Healthcare
APA
Michael Njunge, Yang Huang, et al. (2026). Assessing the Safety and Clinical Appropriateness of Breast Cancer Advice From Consumer-Grade Large Language Models. Journal of Medical Imaging and Radiation Oncology. https://doi.org/10.1111/1754-9485.70092
MLA
Michael Njunge, et al. "Assessing the Safety and Clinical Appropriateness of Breast Cancer Advice From Consumer-Grade Large Language Models." Journal of Medical Imaging and Radiation Oncology, 2026.
PMID
41937254
Abstract
[INTRODUCTION] Freely available consumer large language models (LLMs) have become a common source of health information for patients. Though convenient, their use by patients raises concerns about accuracy, safety and applicability to local clinical practice. We set out to assess the reliability and clinical appropriateness of breast cancer advice from three widely used LLMs (ChatGPT 3.5o, Gemini 2.0 and Perplexity (Standard)) when applied in a Western Australian (WA) context.
[METHOD] We developed 31 questions covering breast cancer prevention, screening, imaging and management. Each LLM was asked the same question three times. The final answers were assessed for qualitative and quantitative reliability and graded for clinical appropriateness by a blinded panel of Consultant Breast Surgeons and Radiologists.
[RESULTS] All three models performed well in terms of reliability, with ChatGPT and Perplexity providing consistent answers to all questions. ChatGPT had the highest rate of clinically appropriate answers (97%), followed by Perplexity (90%) and Gemini (87%). Inappropriate responses were more common when questions included WA-specific terminology, particularly for Perplexity and Gemini. Agreement between Surgeons was strong, while Radiologists showed variability in their ratings.
[CONCLUSION] LLMs can provide reliable and generally appropriate breast cancer advice, but performance suffers regarding WA-specific breast screening terminology. Our findings highlight how LLM performance is region-specific, and this fact is likely generalisable to other areas of medicine where there may be regional variance in practice. Overall, LLMs are useful as educational tools, but their outputs should always be interpreted considering local guidelines and with clinical oversight.
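As a rough arithmetic check on the figures reported above, the sketch below works out how many prompts the study design implies and which answer counts out of 31 would round to the reported appropriateness rates. It assumes each model's rate was computed over all 31 questions and rounded to the nearest whole percent; the abstract does not state the denominator, so this is an assumption, and all names in the code are illustrative.

```python
# Figures taken from the abstract.
N_QUESTIONS = 31   # questions developed by the authors
REPEATS = 3        # each question was asked three times per model
reported_pct = {"ChatGPT": 97, "Perplexity": 90, "Gemini": 87}


def counts_matching(pct, n):
    """Return every count k (0..n) for which k/n rounds to pct percent.

    Assumption: the reported rate is k/n rounded to the nearest whole percent.
    """
    return [k for k in range(n + 1) if round(100 * k / n) == pct]


# Total prompts issued across the three models under this design.
total_prompts = N_QUESTIONS * REPEATS * len(reported_pct)

# Answer counts (out of 31) consistent with each reported percentage.
implied_counts = {
    model: counts_matching(pct, N_QUESTIONS)
    for model, pct in reported_pct.items()
}

print(total_prompts)
print(implied_counts)
```

Under these assumptions, each reported percentage corresponds to exactly one count out of 31, so the gaps between models amount to a difference of only a few questions.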
Same keywords, free full text (based on this paper's MeSH terms and keywords):
- Early local immune activation following intra-operative radiotherapy in human breast tissue.
- Overall survival and prognostic factors in young women with breast cancer: a retrospective cohort study from Southern Thailand.
- Age at First Pregnancy, Adult Weight Gain and Postmenopausal Breast Cancer Risk: The PROCAS Study (United Kingdom).
- Advances in Targeted Therapy for Human Epidermal Growth Factor Receptor 2-Low Tumors: From Trastuzumab to Antibody-Drug Conjugates.
- Structural determinants of glycosaminoglycan oligosaccharides as LL-37 inhibitors in breast cancer.
- Artificial intelligence and breast cancer screening in Serbia: a dual-perspective qualitative study among radiologists and screening-aged women.