Automating the Observer OPTION-5 measure of shared decision making: Assessing validity by comparing large language models to human ratings.
- p-value: p < 0.01
- Study design: randomized controlled trial
APA
Selvaraj, S. P., Yen, R. W., et al. (2026). Automating the Observer OPTION-5 measure of shared decision making: Assessing validity by comparing large language models to human ratings. Patient Education and Counseling, 142, 109362. https://doi.org/10.1016/j.pec.2025.109362
MLA
Selvaraj, S. P., et al. "Automating the Observer OPTION-5 Measure of Shared Decision Making: Assessing Validity by Comparing Large Language Models to Human Ratings." Patient Education and Counseling, vol. 142, 2026, p. 109362.
PMID
41016196
Abstract
[OBJECTIVES] Observer-based measures of shared decision making rely on human raters, which is resource-intensive and limits routine assessment and improvement. Generative artificial intelligence could increase the speed and accuracy of observer-based evaluation while reducing the burden. This study aimed to assess the performance of large language models (LLMs) from the Gemini, GPT, and LLaMA families in evaluating the extent of shared decision making between clinicians and women considering surgery for early-stage breast cancer.
[METHODS] LLM-generated scores were compared with those of trained human raters from a randomized controlled trial using the 5-item Observer OPTION-5 measure. We analyzed 287 anonymized transcripts of breast cancer consultations. A series of prompts was tested across models, assessing correlations with human scores. We also evaluated the ability of LLMs to distinguish high- from low-scoring encounters and the impact of inter-rater agreement on performance.
[RESULTS] The Observer OPTION-5 item scores generated by GPT-4o and Gemini-1.5-Pro-002 correlated with human ratings (Pearson r ≈ 0.6, p < 0.01), representing approximately 75-80% of the correlation observed between human raters themselves (r = 0.77). Providing detailed descriptions and examples improved the models' performance. The results also confirm that the models could distinguish high- from low-scoring encounters, with an independent-samples t-test showing a large and significant separation between the two groups (t > 10, p < 0.01).
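For readers who want to see the comparison logic in concrete terms, a minimal sketch follows. It is not the authors' code: the per-transcript scores, the scoring scale, and the median split used to separate high- from low-scoring encounters are illustrative assumptions, but the statistics mirror those reported above (Pearson correlation between LLM and human ratings, and an independent-samples t-test between the two groups).

```python
# Minimal sketch (assumed data, not the study's code): compare LLM-generated
# Observer OPTION-5 totals with human-rater totals for the same transcripts.
import numpy as np
from scipy.stats import pearsonr, ttest_ind

# Hypothetical per-transcript total scores; in the study these would come from
# prompting each model with a transcript and from trained human raters.
human_scores = np.array([20, 35, 50, 65, 80, 30, 55, 70, 25, 60])
llm_scores = np.array([25, 30, 55, 60, 75, 35, 50, 65, 30, 55])

# 1) Agreement between LLM and human ratings (Pearson correlation).
r, p = pearsonr(llm_scores, human_scores)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")

# 2) Can LLM scores separate high- from low-scoring encounters, as judged
#    by humans? Here the split is a simple median threshold (assumption).
threshold = np.median(human_scores)
high = llm_scores[human_scores >= threshold]
low = llm_scores[human_scores < threshold]
t, p_t = ttest_ind(high, low)
print(f"t = {t:.2f} (p = {p_t:.3f})")
```

In practice the first step, turning each transcript into item-level scores, is where the prompt design matters most; the abstract notes that adding detailed item descriptions and examples to the prompt improved agreement with human raters.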
[CONCLUSIONS] Based on the breast cancer surgery dataset we explored, LLMs can evaluate aspects of clinician-patient dialog using existing measures, providing the basis for the development and fine-tuning of prompts. Future work should focus on generalizability, larger datasets, and improving model performance.
[PRACTICE IMPLICATIONS] The prospect of being able to automate the assessment of shared decision-making opens the door to rapid feedback as a means for reflective practice improvement.
🏷️ Related articles by keyword / MeSH (free full text), based on this paper's MeSH terms and keywords
- Scoring Physician Risk Communication in Prostate Cancer Using Large Language Models.
- Real-world Treatment Selection and Shared Decision-making in De Novo Metastatic Castration-sensitive Prostate Cancer in Japan.
- Association of Patient Comorbidities With Treatment Regret Among Patients With Localized Prostate Cancer - Results From a Population-Based Cohort.
- DualPG-DTA: A Large Language Model-Powered Graph Neural Network Framework for Enhanced Drug-Target Affinity Prediction and Discovery of Novel CDK9 Inhibitors Exhibiting In Vivo Anti-Leukemia Activity.
- Automating the segmentation, date extraction, and classification of multi-report PDFs in outside medical records using optical character recognition and generative artificial intelligence.
- Genetic underpinnings of type-2 diabetes (T2D) with colorectal cancer (CRC): In-silico discovery of common molecular signatures, pathogenetic processes and therapeutic candidates.