Promise and pitfalls of AI chatbots in complex decision-making for thyroid nodules and papillary thyroid cancer.

Grigoris Effraimidis; Athanasios Kasotas; Sofia Varsami; Eleni Sazakli; Olga Karapanou; Katerina Saltiki; Marina Michalaki

doi:10.1530/ETJ-25-0385

European Thyroid Journal, 2026, Vol. 15(2). Open access.

OpenAlex topics: Artificial Intelligence in Healthcare and Education; Clinical Reasoning and Diagnostic Skills; Machine Learning in Healthcare

Free full text: PMC13087872. Open access, CC BY license.

Cite this article

APA: Grigoris Effraimidis, Athanasios Kasotas, et al. (2026). Promise and pitfalls of AI chatbots in complex decision-making for thyroid nodules and papillary thyroid cancer. European Thyroid Journal, 15(2). https://doi.org/10.1530/ETJ-25-0385

MLA: Grigoris Effraimidis, et al. "Promise and pitfalls of AI chatbots in complex decision-making for thyroid nodules and papillary thyroid cancer." European Thyroid Journal, vol. 15, no. 2, 2026.

PMID: 41885289

Abstract

[INTRODUCTION] Artificial intelligence (AI) chatbots are increasingly used in medicine, but their reliability in scenarios with multiple management options is unclear. Indeterminate thyroid nodules and low- and low-to-intermediate-risk papillary thyroid carcinoma (PTC) represent such cases.

[METHODS] In a nationwide web-based survey, 201 members of the Hellenic Endocrine Society evaluated 12 clinical vignettes on indeterminate thyroid nodules and low- and low-to-intermediate-risk PTC. Their responses were compared with those generated by four conversational AI models (ChatGPT, Gemini, Copilot, and DeepSeek) at two time points, 11 months apart. DeepSeek was assessed only at the second time point. Chatbot outputs were assessed for agreement with endocrinologists' predominant answers, concordance with the most guideline-consistent options (American and European Thyroid Association recommendations), temporal stability, and inter-model agreement.
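The four comparison measures named above can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names, the model labels (`model_x`, `model_y`), and the toy answers are all hypothetical, and it assumes each vignette has one single-choice answer per respondent and per chatbot.

```python
from collections import Counter
from itertools import combinations


def predominant(responses):
    """Return the most frequent answer among respondents for one vignette."""
    return Counter(responses).most_common(1)[0][0]


def agreement_rate(answers, reference):
    """Fraction of vignettes where `answers` matches `reference`."""
    matches = sum(a == r for a, r in zip(answers, reference))
    return matches / len(reference)


def pairwise_agreement(model_answers):
    """Agreement rate for every pair of models (inter-model agreement)."""
    return {
        (m1, m2): agreement_rate(model_answers[m1], model_answers[m2])
        for m1, m2 in combinations(sorted(model_answers), 2)
    }


# Hypothetical toy data: 4 vignettes with answer options "A" to "C".
survey = [["A", "A", "B"], ["B", "B", "C"], ["C", "A", "C"], ["A", "B", "B"]]
guideline = ["A", "C", "C", "B"]  # assumed most guideline-consistent options
model_t1 = {"model_x": ["A", "C", "B", "B"], "model_y": ["B", "C", "C", "B"]}
model_t2 = {"model_x": ["A", "C", "C", "B"], "model_y": ["B", "B", "C", "A"]}

clinician_choice = [predominant(r) for r in survey]

for name, answers in model_t2.items():
    print(name,
          "vs clinicians:", agreement_rate(answers, clinician_choice),
          "vs guideline:", agreement_rate(answers, guideline),
          "stable since t1:", agreement_rate(model_t1[name], answers))
print("inter-model:", pairwise_agreement(model_t2))
```

Treating every measure as a simple match rate over vignettes keeps the four comparisons on the same scale; the study may have defined inter-model agreement differently (for example, across all models at once rather than pairwise).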

[RESULTS] Alignment between chatbots and endocrinologists' predominant responses was limited, reaching at most 25% across scenarios. In contrast, concordance with the most guideline-consistent options was higher, up to 83% (10/12 scenarios), depending on the model and time point. Across the 12 scenarios, ChatGPT, Gemini, and Copilot changed their responses in 4, 7, and 5 scenarios, respectively, with some updates moving closer to, and others further from, guideline-based answers. Inter-model agreement ranged from 33% to 67%, indicating substantial variability among chatbots.
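The reported counts can be turned into temporal-stability percentages with simple arithmetic. The sketch below assumes every percentage is taken over the 12 scenarios; the helper `pct` and the variable names are illustrative and do not come from the paper.

```python
N_SCENARIOS = 12

# Number of scenarios in which each chatbot changed its answer
# between the two time points (counts as reported in the abstract).
changed = {"ChatGPT": 4, "Gemini": 7, "Copilot": 5}


def pct(count, total=N_SCENARIOS):
    """Percentage of `total`, rounded to the nearest whole number."""
    return round(100 * count / total)


# Share of scenarios with an unchanged answer, per chatbot.
unchanged_pct = {model: pct(N_SCENARIOS - n) for model, n in changed.items()}
print(unchanged_pct)

# Consistency check on the reported guideline concordance (10/12 scenarios).
print(pct(10))
```

Under the same assumption, the 25% ceiling on agreement with endocrinologists would correspond to 3 of 12 scenarios.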

[CONCLUSION] AI chatbots show evolving but inconsistent performance in complex thyroid management scenarios. While guideline concordance can be relatively high, substantial variability across models, limited temporal reproducibility, and poor alignment with clinical practice highlight the need for ongoing longitudinal evaluation before safe integration into clinical decision-making.
