
Assessing the Safety and Clinical Appropriateness of Breast Cancer Advice From Consumer-Grade Large Language Models.

Journal of medical imaging and radiation oncology

Njunge M, Huang Y, Li R, Karunairajah A, Burns N, Falkner N, Porter G


Cite this paper

APA: Michael Njunge, Yang Huang, et al. (2026). Assessing the Safety and Clinical Appropriateness of Breast Cancer Advice From Consumer-Grade Large Language Models. Journal of medical imaging and radiation oncology. https://doi.org/10.1111/1754-9485.70092
MLA: Michael Njunge, et al. "Assessing the Safety and Clinical Appropriateness of Breast Cancer Advice From Consumer-Grade Large Language Models." Journal of medical imaging and radiation oncology, 2026.
PMID: 41937254

Abstract

[INTRODUCTION] Freely available consumer large language models (LLMs) have become a common source of health information for patients. Though convenient, their use by patients raises concerns about accuracy, safety and applicability to local clinical practice. We set out to assess the reliability and clinical appropriateness of breast cancer advice from three widely used LLMs (ChatGPT 3.5o, Gemini 2.0 and Perplexity (Standard)) in a Western Australian (WA) context.

[METHOD] We developed 31 questions covering breast cancer prevention, screening, imaging and management. Each LLM was asked the same question three times. The final answers were assessed for qualitative and quantitative reliability and graded for clinical appropriateness by a blinded panel of Consultant Breast Surgeons and Radiologists.
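The repeated-query reliability check described above can be sketched as follows. This is a minimal illustration only, not the study's protocol: the function names (`consistency`, `assess_model`), the stub in place of a real LLM call, and the sample question are all assumptions; the paper's actual grading was performed by a blinded clinical panel.

```python
from collections import Counter

def consistency(answers):
    """Fraction of repeated answers matching the most common one (1.0 = fully consistent)."""
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / len(answers)

def assess_model(ask, questions, repeats=3):
    """Ask each question `repeats` times and return per-question consistency scores."""
    return {q: consistency([ask(q) for _ in range(repeats)]) for q in questions}

# Demo with a deterministic stub standing in for a real LLM call:
demo = assess_model(lambda q: "From age 40 via BreastScreen WA.",
                    ["When should breast screening start?"])
print(demo)  # {'When should breast screening start?': 1.0}
```

A real harness would replace the stub with API calls to each model and pass the transcripts on for clinical grading; the consistency score only captures the quantitative reliability step.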

[RESULTS] All three models performed well in terms of reliability, with ChatGPT and Perplexity providing consistent answers to all questions. ChatGPT had the highest rate of clinically appropriate answers (97%), followed by Perplexity (90%) and Gemini (87%). Inappropriate responses were more common when questions included WA-specific terminology, particularly for Perplexity and Gemini. Agreement between Surgeons was strong, while Radiologists showed variability in their ratings.

[CONCLUSION] LLMs can provide reliable and generally appropriate breast cancer advice, but performance degrades on questions involving WA-specific breast screening terminology. Our findings show that LLM performance is region-sensitive, a limitation likely to generalise to other areas of medicine where practice varies regionally. Overall, LLMs are useful as educational tools, but their outputs should always be interpreted in light of local guidelines and with clinical oversight.
