
GPT-4 vs. radiologists: who advances mediastinal tumor classification better across report quality levels? a cohort study.

International journal of surgery (London, England), 2025, Vol. 111(12), pp. 9000-9011

Wen R, Li X, Chen K, Sun M, Zhu C, Xu P


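🔬 Key clinical statistics (automatically extracted from the abstract; verify against the original)
  • P <0.001 (GPT-4 vs. associate senior radiologists on low-quality reports; lymphoma diagnosis)
  • 95% CI 71.0-75.5 (GPT-4 overall diagnostic accuracy, 73.3%)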

Cite this paper

APA Wen R, Li X, et al. (2025). GPT-4 vs. radiologists: who advances mediastinal tumor classification better across report quality levels? a cohort study. International journal of surgery (London, England), 111(12), 9000-9011. https://doi.org/10.1097/JS9.0000000000003127
MLA Wen R, et al. "GPT-4 vs. radiologists: who advances mediastinal tumor classification better across report quality levels? a cohort study." International journal of surgery (London, England), vol. 111, no. 12, 2025, pp. 9000-9011.
PMID 40788014

Abstract

[BACKGROUND] Accurate mediastinal tumor classification is crucial for treatment planning, but diagnostic performance varies with radiologists' experience and report quality.

[PURPOSE] To evaluate the diagnostic accuracy of GPT-4 (Generative Pre-trained Transformer 4) in classifying mediastinal tumors from radiological reports of varying quality, compared with radiologists of different experience levels.

[MATERIALS AND METHODS] We conducted a retrospective study of 1494 patients with mediastinal tumors diagnosed via chest CT and pathology at five tertiary hospitals. Experienced radiologists categorized the radiological reports as low-, medium-, or high-quality using predefined criteria. Six radiologists (two residents, two attending radiologists, and two associate senior radiologists) and GPT-4 evaluated the chest CT reports. Diagnostic performance was analyzed overall, by report quality, and by tumor type using Wald χ² tests, with 95% CIs calculated by the Wilson method.
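The Methods state that 95% CIs were calculated with the Wilson method. A minimal sketch of the Wilson score interval for a proportion follows; it is not the authors' code, and the function name `wilson_ci` is hypothetical. The count of 1095 correct classifications is an assumption, back-calculated from 73.3% of 1494 reports; with it, the sketch reproduces the reported 71.0-75.5 interval.

```python
import math


def wilson_ci(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (default z gives 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    # The interval is centred on a shrunken estimate, not on p itself.
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Assumed count: 73.3% of 1494 reports is roughly 1095 correct classifications.
low, high = wilson_ci(1095, 1494)
print(f"{100 * 1095 / 1494:.1f}% (95% CI {100 * low:.1f}-{100 * high:.1f})")
# prints "73.3% (95% CI 71.0-75.5)"
```

Unlike the simple Wald interval p ± z·sqrt(p(1-p)/n), the Wilson interval never extends outside [0, 1] and behaves better when accuracy is near 0% or 100%, which matters for small subgroups such as individual tumor types.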

[RESULTS] GPT-4 achieved an overall diagnostic accuracy of 73.3% (95% CI: 71.0-75.5), comparable to that of associate senior radiologists (74.3%, 95% CI: 72.0-76.5; P >0.05). On low-quality reports, GPT-4 outperformed associate senior radiologists (60.8% vs. 51.1%, P <0.001). On high-quality reports, GPT-4 was comparable to attending radiologists (80.6% vs. 79.4%, P >0.05). Diagnostic performance varied by tumor type: GPT-4 was comparable to radiology residents for neurogenic tumors (44.9% vs. 50.3%, P >0.05), similar to associate senior radiologists for teratomas (68.1% vs. 65.9%, P >0.05), and superior in diagnosing lymphoma (75.4% vs. 60.4%, P <0.001).
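The abstract reports Wald χ² tests but does not specify the underlying model. Because GPT-4 and the radiologists read the same reports, the authors may have used a model that accounts for this pairing. The sketch below is only an illustrative unpaired Wald test for the difference between two proportions, assuming two independent groups of 1494 reports each; the function name `wald_two_proportion` is hypothetical. With one degree of freedom, the Wald χ² statistic equals z², so the two-sided P value is erfc(|z|/√2).

```python
import math


def wald_two_proportion(p1: float, n1: int, p2: float, n2: int) -> tuple[float, float]:
    """Unpooled Wald test of p1 - p2 = 0, assuming independent groups.

    Returns (chi-square statistic with 1 df, two-sided P value).
    """
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    if se == 0:
        raise ValueError("standard error is zero; test is undefined")
    z = (p1 - p2) / se
    chi2 = z**2
    # For 1 df, P(chi2_1 > z^2) equals the two-sided normal tail probability.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return chi2, p_value


# Overall accuracies: GPT-4 73.3% vs. associate senior radiologists 74.3%.
chi2, p = wald_two_proportion(0.733, 1494, 0.743, 1494)
print(f"chi2 = {chi2:.2f}, P = {p:.2f}")
# prints "chi2 = 0.39, P = 0.53"
```

Applied to the overall accuracies, this gives a P value well above 0.05, consistent with the reported P >0.05. It cannot reproduce the stratified P values, because the abstract does not give the number of reports in each quality tier or tumor type.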

[CONCLUSION] GPT-4 demonstrated diagnostic accuracy comparable to that of associate senior radiologists, performed better on low-quality reports, and outperformed them in diagnosing lymphoma. These findings underscore GPT-4's potential to enhance diagnostic performance in challenging scenarios, such as the interpretation of low-quality reports.

[SUMMARY] In this retrospective study of 1494 chest CT reports of varying quality from five tertiary hospitals, GPT-4 classified mediastinal tumors with diagnostic accuracy comparable to that of associate senior radiologists. It performed better on low-quality reports and outperformed associate senior radiologists on specific tumor types such as lymphoma, showing its potential to enhance diagnostic performance in challenging scenarios.

🟢 Full text available on PMC