GPT-4 vs. radiologists: who advances mediastinal tumor classification better across report quality levels? a cohort study.
Cohort
- p-value: P < 0.001
- 95% CI: 71.0-75.5
APA
Wen R, Li X, et al. (2025). GPT-4 vs. radiologists: who advances mediastinal tumor classification better across report quality levels? a cohort study. International journal of surgery (London, England), 111(12), 9000-9011. https://doi.org/10.1097/JS9.0000000000003127
MLA
Wen R, et al. "GPT-4 vs. radiologists: who advances mediastinal tumor classification better across report quality levels? a cohort study." International journal of surgery (London, England), vol. 111, no. 12, 2025, pp. 9000-9011.
PMID
40788014
Abstract
[BACKGROUND] Accurate mediastinal tumor classification is crucial for treatment planning, but diagnostic performance varies with radiologists' experience and report quality.
[PURPOSE] To evaluate the diagnostic accuracy of the generative pretrained transformer GPT-4 in classifying mediastinal tumors from radiological reports of varying quality, compared with radiologists of different experience levels.
[MATERIALS AND METHODS] We conducted a retrospective study of 1494 patients with mediastinal tumors diagnosed via chest CT and pathology at five tertiary hospitals. Experienced radiologists categorized the radiological reports as low, medium, or high quality according to predefined criteria. Six radiologists (two residents, two attending radiologists, and two associate senior radiologists) and GPT-4 evaluated the chest CT reports. Diagnostic performance was analyzed overall, by report quality, and by tumor type using Wald χ² tests, with 95% CIs calculated via the Wilson method.
[RESULTS] GPT-4 achieved an overall diagnostic accuracy of 73.3% (95% CI: 71.0-75.5), comparable to associate senior radiologists (74.3%, 95% CI: 72.0-76.5; P >0.05). For low-quality reports, GPT-4 outperformed associate senior radiologists (60.8% vs. 51.1%, P <0.001). For high-quality reports, GPT-4 was comparable to attending radiologists (80.6% vs. 79.4%, P >0.05). Diagnostic performance varied by tumor type: GPT-4 was comparable to radiology residents for neurogenic tumors (44.9% vs. 50.3%, P >0.05), similar to associate senior radiologists for teratomas (68.1% vs. 65.9%, P >0.05), and superior to associate senior radiologists in diagnosing lymphoma (75.4% vs. 60.4%, P <0.001).
[CONCLUSION] GPT-4 demonstrated interpretation accuracy comparable to associate senior radiologists, performing particularly well on low-quality reports and outperforming them in diagnosing lymphoma. These findings underscore GPT-4's potential to enhance diagnostic performance in challenging scenarios.
[SUMMARY] In this retrospective study of 1494 chest CT reports of varying quality from five tertiary hospitals, GPT-4 classified mediastinal tumors with diagnostic accuracy comparable to that of associate senior radiologists. It performed particularly well on low-quality reports and outperformed associate senior radiologists on specific tumor types such as lymphoma, showing potential to enhance diagnostic performance in challenging scenarios.
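The abstract reports 95% CIs calculated with the Wilson method. As a minimal sketch (not the authors' code), the Wilson score interval for a proportion of correctly classified reports can be computed as below. The function name `wilson_interval` is hypothetical, and the count of 1095 correct reports out of 1494 is an illustrative assumption: 1095/1494 rounds to the reported 73.3%, but the abstract does not state the numerator or confirm that GPT-4's overall accuracy was computed over exactly 1494 reports.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie in [0, n]")
    p_hat = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    # Centre is shifted toward 0.5 relative to p_hat, unlike the plain Wald interval.
    centre = (p_hat + z2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n * n))
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Hypothetical count: 1095 of 1494 reports classified correctly (~73.3%).
    # The abstract does not give the exact numerator.
    low, high = wilson_interval(1095, 1494)
    print(f"accuracy = {1095 / 1494:.1%}, 95% CI {low:.1%} to {high:.1%}")
    # prints "accuracy = 73.3%, 95% CI 71.0% to 75.5%"
```

With these assumed inputs, the sketch reproduces the reported 71.0-75.5 range for GPT-4's overall accuracy. The Wilson interval is preferred over the simple Wald interval for proportions because it stays within [0, 1] and behaves better near 0% or 100% accuracy.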
Highly cited papers by the same first author (2)
- Incorporating new criteria can improve the ability of CEUS LI-RADS to differentiate HCC from non-HCC malignancies.
- A novel nomogram integrated with preablation stimulated thyroglobulin and thyroglobulin/thyroid-stimulating hormone ratio to predict the therapeutic response of intermediate‑ and high‑risk differentiated thyroid cancer patients: a bi-center retrospective study.
Same keywords · Free full text (based on this paper's MeSH terms/keywords)
- A Phase I Study of Hydroxychloroquine and Suba-Itraconazole in Men with Biochemical Relapse of Prostate Cancer (HITMAN-PC): Dose Escalation Results.
- Self-management of male urinary symptoms: qualitative findings from a primary care trial.
- Clinical and Liquid Biomarkers of 20-Year Prostate Cancer Risk in Men Aged 45 to 70 Years.
- Diagnostic accuracy of Ga-PSMA PET/CT versus multiparametric MRI for preoperative pelvic invasion in the patients with prostate cancer.
- Comprehensive analysis of androgen receptor splice variant target gene expression in prostate cancer.
- Clinical Presentation and Outcomes of Patients Undergoing Surgery for Thyroid Cancer.