Performance of latest AI models, RAG, and MCP on lung cancer-related questions.

Zhao X; Yang M; Tian K; Jiang H; Guo D; Wang Y; Du J

doi:10.1177/20552076261427503


Digital health 2026 Vol. 12, p. 20552076261427503


Cite this paper

APA Zhao X, Yang M, et al. (2026). Performance of latest AI models, RAG, and MCP on lung cancer-related questions. Digital health, 12, 20552076261427503. https://doi.org/10.1177/20552076261427503

MLA Zhao X, et al. "Performance of latest AI models, RAG, and MCP on lung cancer-related questions." Digital health, vol. 12, 2026, pp. 20552076261427503.

PMID 41836629


Abstract

[BACKGROUND] Large language models (LLMs) have advanced rapidly. However, concerns remain about their reliability in clinical settings because of hallucinations and inadequate referencing.

[MATERIALS AND METHODS] We evaluated six current LLMs: GPT-4.1 (GPT), o3, Gemini-2.5-Pro-Preview-0506 (Gemini), Grok-3 (Grok), Qwen3-235B-A22B (Qwen), and Claude Sonnet 4 (Claude), as well as two technologies that extend LLM capabilities using external knowledge bases: retrieval-augmented generation (RAG) and Model Context Protocol (MCP). Each model was evaluated using 50 questions selected from a 132-question pool developed based on the Chinese Medical Association guideline for clinical diagnosis and treatment of lung cancer (2024 Edition). Three models (Qwen, GPT, and Grok) were further analyzed to assess performance changes with RAG and MCP integration. All responses were independently reviewed by two qualitative evaluators.

[RESULTS] Among the baseline models, o3 achieved the highest accuracy (50%), followed by GPT (48%) and Gemini (48%), then Grok (44%), Qwen (40%), and Claude (36%). Implementing RAG (LLM-RAG) or MCP (LLM-MCP) significantly improved accuracy, with statistically significant differences between baseline LLMs and their RAG- or MCP-enhanced counterparts. Lexical richness and semantic noise both diminished, whereas the semantic clarity and accuracy of verbs, noun-verb combinations, and content words improved.

[CONCLUSIONS] The six latest LLMs performed similarly on lung cancer-related questions. Integrating RAG or MCP significantly enhanced accuracy and produced responses with simpler sentence structure, a tighter focus on the main topics, and more accurate vocabulary.
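
The claim that the six models "performed similarly" can be sanity-checked against the reported figures. The sketch below is not the authors' analysis: it assumes each reported accuracy is the share of the 50 questions scored as correct (a binary correct/incorrect scoring is an assumption), and it uses a Wilson score interval, which is our choice of method and is not named in the abstract. The helper names `correct_count` and `wilson_interval` are illustrative.

```python
import math

N_QUESTIONS = 50  # questions per model, as stated in the methods

# Baseline accuracies as reported in the abstract.
REPORTED_ACCURACY = {
    "o3": 0.50, "GPT": 0.48, "Gemini": 0.48,
    "Grok": 0.44, "Qwen": 0.40, "Claude": 0.36,
}


def correct_count(accuracy, n=N_QUESTIONS):
    """Convert a reported accuracy into a number of correct answers.

    Assumes binary scoring, so accuracy * n is a whole number.
    """
    return round(accuracy * n)


def wilson_interval(correct, n=N_QUESTIONS, z=1.96):
    """Approximate 95% Wilson score interval for a proportion correct / n."""
    p = correct / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


if __name__ == "__main__":
    for model, acc in REPORTED_ACCURACY.items():
        k = correct_count(acc)
        low, high = wilson_interval(k)
        print(f"{model:7s} {k:2d}/{N_QUESTIONS}  95% CI {low:.2f} to {high:.2f}")
```

Under these assumptions, the interval for the lowest-scoring model (Claude) overlaps the interval for the highest-scoring one (o3), which is consistent with the conclusion that baseline performance was similar at this sample size.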
