Automated Classification of Adverse Events After Hydrogel Perirectal Spacer Insertion for Prostate Cancer Using Large Language Models.
- Sample size (n): 1455
APA
Sohoni N, Sohoni NS, et al. (2026). Automated Classification of Adverse Events After Hydrogel Perirectal Spacer Insertion for Prostate Cancer Using Large Language Models. Urology, 209, 18-24. https://doi.org/10.1016/j.urology.2026.01.008
MLA
Sohoni N, et al.. "Automated Classification of Adverse Events After Hydrogel Perirectal Spacer Insertion for Prostate Cancer Using Large Language Models.." Urology, vol. 209, 2026, pp. 18-24.
PMID
41565161
Abstract
[OBJECTIVE] To examine the performance of large language models (LLMs) for the analysis of adverse events (AEs) associated with a perirectal hydrogel spacer (SpaceOAR) prior to prostate radiation.
[METHODS] We queried the Food and Drug Administration's (FDA's) Manufacturer and User Facility Device Experience (MAUDE) database to extract reports related to "SpaceOAR". Ninety-seven reports were initially manually abstracted to classify the problems associated with each event, subsequent modifications in radiation timing, and severity using the Common Terminology Criteria for Adverse Events (CTCAE) score. We compared the accuracy of 3 families of LLMs against human abstraction. The highest-performing LLM was then used to classify AEs based on all available MAUDE data for SpaceOAR (n = 1455) from January 2015 to December 2024.
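The abstraction step described above can be sketched as a structured-extraction prompt built per MAUDE report. The prompt wording, field names, and category list below are illustrative assumptions (the category list is inferred from the abstract's results), not the authors' actual protocol; the LLM call itself is omitted.

```python
# Hypothetical sketch of per-report prompt construction for LLM-based
# AE abstraction. CATEGORIES is inferred from the abstract and is NOT
# the paper's full taxonomy.
CATEGORIES = [
    "malpositioned gel",
    "infection/inflammation/abscess",
    "fistula",
    "rectal ulcer",
    "other",
]

def build_prompt(narrative: str) -> str:
    """Build a structured-extraction prompt for one MAUDE report narrative."""
    return (
        "You are abstracting an FDA MAUDE adverse-event report for the "
        "SpaceOAR perirectal hydrogel spacer.\n"
        f"Classify the primary problem as one of: {', '.join(CATEGORIES)}.\n"
        "Also report any modification in radiation timing and a CTCAE "
        "severity grade (1-5).\n"
        "Answer as JSON with keys: primary_problem, "
        "radiation_timing_change, ctcae_grade.\n\n"
        f"Report narrative:\n{narrative}"
    )

prompt = build_prompt("Gel noted in an unintended plane anterior to the rectal wall.")
```

In a full pipeline, this prompt would be sent to each candidate model family and the parsed JSON compared field-by-field against the human abstraction.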
[RESULTS] The ability of LLMs to correctly identify the AE outcomes was aggregated into an overall score. The highest-performing model was GPT-4o, with an overall score of 4.96 (σ = 0.00526) compared to the human reviewers, who had an overall score of 4.99 (σ = 0.216). When run on all 1455 reports, GPT-4o revealed that the most common primary problems were malpositioned gel (58.7%), infection/inflammation/abscess (10.4%), fistula (7.1%), and rectal ulcer (4.7%). ICU-level care and death were reported 0.1% and 0.3% of the time, respectively.
[CONCLUSION] These findings highlight the potential for LLMs to automate the time-consuming process of tabulating device-related AEs. Reported serious AEs associated with SpaceOAR underscore potential safety concerns, warranting dynamic ongoing surveillance and careful consideration when opting to implement hydrogel spacers.
MeSH Terms
Humans; Male; Prostatic Neoplasms; Hydrogels; Rectum; United States; Large Language Models