Large language models for toxicity extraction in oncology trials: A real-world benchmark in prostate radiotherapy.

Mastroleo F; Borras-Osorio M; Patel SP; Peterson S; Wilson R; Zhou M; Shiraishi S; Foong AYK; Routman DM; Waddle MR

doi:10.1016/j.radonc.2025.111348

Radiotherapy and oncology : journal of the European Society for Therapeutic Radiology and Oncology, 2026, Vol. 216, p. 111348



PMID: 41419026


Abstract

[BACKGROUND] Accurate toxicity assessment is critical in oncology trials, yet current reporting frameworks such as the Common Terminology Criteria for Adverse Events (CTCAE) remain labor-intensive and subject to inter-observer variability. Large language models (LLMs) offer potential to automate extraction and grading of adverse events from clinical notes and patient-reported outcomes (PROs), but their comparative performance and cost-effectiveness remain underexplored.

[METHODS] We evaluated five off-the-shelf LLMs (Gemini 2.0 Flash, Gemini 2.5 Flash, Gemini 2.5 Pro, GPT-4o, and GPT-5) using a rule-augmented few-shot prompting strategy to extract CTCAE-graded gastrointestinal and genitourinary toxicities from a prospective prostate radiotherapy trial (NCT02874014; n = 55 patients, 8968 toxicity records). Binary and grade-level accuracy, precision, recall, specificity, F1 score, Cohen's kappa, and computational costs were assessed.
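As an illustration of the metrics listed in the methods, the following is a minimal sketch of how binary and grade-level agreement between a human reference and an LLM output could be computed. It is not the authors' evaluation code: the function names (`binary_metrics`, `cohen_kappa`), the toy grade lists, and the rule that any grade above 0 counts as "toxicity present" are assumptions made for this example only.

```python
from collections import Counter


def binary_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = toxicity present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # recall = sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa for any set of categorical labels."""
    n = len(y_true)
    observed = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the two raters' marginal frequencies.
    expected = sum(true_counts[k] * pred_counts[k] for k in true_counts) / n**2
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy data: CTCAE-style grades (0 = no toxicity) per record.
human_grades = [0, 0, 1, 2, 0, 1, 0, 3]
llm_grades = [0, 0, 1, 1, 0, 0, 1, 3]

# Assumed binarization rule: any grade above 0 counts as "toxicity present".
human_binary = [int(g > 0) for g in human_grades]
llm_binary = [int(g > 0) for g in llm_grades]

binary_result = binary_metrics(human_binary, llm_binary)
binary_kappa = cohen_kappa(human_binary, llm_binary)
grade_accuracy = sum(h == m for h, m in zip(human_grades, llm_grades)) / len(human_grades)
grade_kappa = cohen_kappa(human_grades, llm_grades)
```

The kappa here is unweighted; whether the study used a weighted variant for the ordinal grades is not stated in the abstract.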

[RESULTS] All models achieved high binary accuracy (84.6-87.4 %) and moderate grade accuracy (79.1-82.3 %). GPT-4o reached the best binary (87.4 %) and grade (83.5 %) accuracy, while Gemini 2.5 Pro demonstrated the highest sensitivity (74.0 %). Specificity peaked with GPT-4o (96.0 %). Cohen's kappa values indicated moderate agreement (0.552-0.560 for binary; 0.401-0.465 for grades). Costs for the entire extraction varied substantially: Gemini 2.0 Flash delivered competitive accuracy at $0.77 total, whereas Gemini 2.5 Pro and GPT-5 exceeded $21.
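To put the reported totals in per-record terms, the short sketch below divides each total by the 8968 toxicity records given in the methods. It assumes that each quoted cost covers all 8968 records, which the abstract implies ("the entire extraction") but does not state explicitly. The $21 figure is only a lower bound, and the helper name `cost_per_record` is hypothetical.

```python
TOTAL_RECORDS = 8968  # toxicity records reported in the methods


def cost_per_record(total_usd, n_records=TOTAL_RECORDS):
    """Average cost in USD of processing one toxicity record."""
    return total_usd / n_records


# Gemini 2.0 Flash: $0.77 for the whole extraction (from the abstract).
flash_per_record = cost_per_record(0.77)

# Gemini 2.5 Pro and GPT-5: "exceeded $21", so this is a lower bound only.
premium_lower_bound_per_record = cost_per_record(21.0)

# Minimum cost ratio between the premium models and Gemini 2.0 Flash.
min_cost_ratio = 21.0 / 0.77

print(f"Gemini 2.0 Flash: about ${flash_per_record * 1000:.3f} per 1000 records")
print(f"Premium models: at least ${premium_lower_bound_per_record * 1000:.2f} per 1000 records")
print(f"Cost ratio: at least {min_cost_ratio:.1f}x")
```

Under that assumption, the cheapest model costs less than a tenth of a US cent per record, and the most expensive models cost at least roughly 27 times more.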

[CONCLUSIONS] Off-the-shelf LLMs can extract clinically relevant toxicities with performance approaching human inter-rater reliability, at variable but often negligible costs. While grade-level accuracy remains limited, LLM integration into oncology workflows is feasible, offering scalable, low-cost support for toxicity monitoring and data abstraction in clinical research.

