
Performance of Retrieval-Augmented Generation Large Language Models in Guideline-Concordant Prostate-Specific Antigen Testing: Comparative Study With Junior Clinicians.

Journal of Medical Internet Research, 2025, Vol. 27, p. e78393

Tung JYM, Le Q, Yao J, Huang Y, Lim DYZ, Sng GGR, Lau RSE, Tan YG, Chen K, Tay KJ, Tan JH, Yuen JSP, Cheng CWS, Ho HSS



Cite this article

APA Tung JYM, Le Q, et al. (2025). Performance of Retrieval-Augmented Generation Large Language Models in Guideline-Concordant Prostate-Specific Antigen Testing: Comparative Study With Junior Clinicians. Journal of Medical Internet Research, 27, e78393. https://doi.org/10.2196/78393
MLA Tung JYM, et al. "Performance of Retrieval-Augmented Generation Large Language Models in Guideline-Concordant Prostate-Specific Antigen Testing: Comparative Study With Junior Clinicians." Journal of Medical Internet Research, vol. 27, 2025, p. e78393.
PMID 41259800
DOI 10.2196/78393

Abstract

[BACKGROUND] Prostate-specific antigen (PSA) testing remains the cornerstone of early prostate cancer detection. Society guidelines for prostate cancer screening via PSA testing serve to standardize patient care and are often used by trainees, junior staff, or generalist medical practitioners to guide medical decision-making. However, adherence to guidelines is a time-consuming and challenging task, and rates of inappropriate PSA testing are high. Retrieval-augmented generation (RAG) is a method to enhance the reliability of large language models (LLMs) by grounding responses in trusted external sources.

[OBJECTIVE] This study aimed to evaluate a RAG-enhanced LLM system, grounded in current European Association of Urology and American Urological Association guidelines, to assess its effectiveness in providing guideline-concordant PSA screening recommendations compared to junior clinicians.

[METHODS] A series of 44 fictional outpatient case scenarios was developed to represent a broad spectrum of clinical presentations. A RAG pipeline was developed, comprising a life expectancy estimation module based on the Charlson Comorbidity Index, followed by LLM-generated recommendations constrained to retrieved excerpts from the European Association of Urology and American Urological Association guidelines. Five junior clinicians were tasked to provide PSA testing recommendations for the same scenarios in closed-book and open-book formats. Answers were compared for accuracy in a binomial fashion. Fleiss κ was computed to assess interrater agreement among clinicians.
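
The interrater statistic named in the methods, Fleiss κ, can be sketched in a few lines. The code below is a minimal illustration of the standard formula, not the authors' analysis code. The two answer categories ("test" and "do not test") and the toy ratings are assumptions made for the example; the abstract does not list the answer categories or give per-case responses.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned case i to category j. Every case must have the same number
    of raters. Undefined (division by zero) if all ratings fall in one category."""
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each case must be rated by the same number of raters")
    n_categories = len(counts[0])

    # Share of all ratings that fall in each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_cases * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement within each case, then averaged over cases.
    p_case = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_case) / n_cases
    # Agreement expected by chance from the category shares.
    p_expected = sum(p * p for p in p_cat)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy data: 4 cases, 5 raters, columns = ["test", "do not test"].
toy_ratings = [[5, 0], [4, 1], [2, 3], [0, 5]]
toy_kappa = fleiss_kappa(toy_ratings)
print(f"Fleiss kappa on toy data: {toy_kappa:.3f}")
```

On the commonly used Landis and Koch scale, values between 0.21 and 0.40 are labeled "fair" agreement.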

[RESULTS] The RAG-LLM tool provided guideline-concordant recommendations in 95.5% (210/220) of case scenarios, compared to junior clinicians, who were correct in 62.3% (137/220) of scenarios in a closed-book format and 74.1% (163/220) of scenarios in an open-book format. The difference was statistically significant for both closed-book (P<.001) and open-book (P<.001) formats. Interrater agreement among clinicians was fair, with Fleiss κ of 0.294 and 0.321 for closed-book and open-book formats, respectively.
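
The reported percentages can be reproduced from the counts, and a rough significance check is easy to sketch. The abstract does not name the statistical test used, so the pooled two-proportion z-test below is an assumption chosen for illustration. It also treats all 220 responses per arm as independent, which is a simplification, since the same scenarios are answered repeatedly. The function name `two_proportion_z` is hypothetical.

```python
from math import erfc, sqrt


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Counts as reported in the abstract (correct answers, total responses).
rag_llm = (210, 220)
closed_book = (137, 220)
open_book = (163, 220)

for label, (x, n) in [("RAG-LLM", rag_llm),
                      ("closed-book", closed_book),
                      ("open-book", open_book)]:
    print(f"{label}: {100 * x / n:.1f}%")

for label, arm in [("closed-book", closed_book), ("open-book", open_book)]:
    z, p = two_proportion_z(*rag_llm, *arm)
    print(f"RAG-LLM vs {label}: z = {z:.2f}, P < .001: {p < 0.001}")
```

Under these assumptions both comparisons fall well below the .001 threshold, in line with the reported P values.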

[CONCLUSIONS] Use of RAG techniques allows LLMs to integrate complex guidelines into day-to-day medical decision-making. RAG-LLM tools in urology have the capability to enhance clinical decision-making by providing guideline-concordant recommendations for PSA testing, potentially improving the consistency of health care delivery, reducing cognitive load on clinicians, and reducing unnecessary investigations and costs. While this study used synthetic cases in a controlled simulation environment, it establishes a foundation for future validation in real-world clinical settings.

MeSH Terms

Humans; Prostate-Specific Antigen; Male; Prostatic Neoplasms; Practice Guidelines as Topic; Guideline Adherence; Large Language Models
