Automated Classification of Adverse Events After Hydrogel Perirectal Spacer Insertion for Prostate Cancer Using Large Language Models.
- Sample size (n): 1455
APA
Sohoni N, Sohoni NS, et al. (2026). Automated Classification of Adverse Events After Hydrogel Perirectal Spacer Insertion for Prostate Cancer Using Large Language Models. Urology, 209, 18-24. https://doi.org/10.1016/j.urology.2026.01.008
MLA
Sohoni N, et al.. "Automated Classification of Adverse Events After Hydrogel Perirectal Spacer Insertion for Prostate Cancer Using Large Language Models.." Urology, vol. 209, 2026, pp. 18-24.
PMID
41565161
Abstract
[OBJECTIVE] To examine the performance of large language models (LLMs) for the analysis of adverse events (AEs) associated with a perirectal hydrogel spacer (SpaceOAR) prior to prostate radiation.
[METHODS] We queried the Food and Drug Administration's (FDA's) Manufacturer and User Facility Device Experience (MAUDE) database to extract reports related to "SpaceOAR". Ninety-seven reports were initially manually abstracted to classify the problems associated with each event, subsequent modifications in radiation timing, and severity using the Common Terminology Criteria for Adverse Events (CTCAE) score. We compared the accuracy of 3 families of LLMs against human abstraction. The highest-performing LLM was then used to classify AEs based on all available MAUDE data for SpaceOAR (n = 1455) from January 2015 to December 2024.
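The abstraction step described above can be sketched as a structured-extraction prompt built per MAUDE report. The prompt wording, field names, and category list below are illustrative assumptions (the category list is inferred from the abstract's results), not the authors' actual protocol; the LLM call itself is omitted.

```python
# Hypothetical sketch of per-report prompt construction for LLM-based
# AE abstraction. CATEGORIES is inferred from the abstract and is NOT
# the paper's full taxonomy.
CATEGORIES = [
    "malpositioned gel",
    "infection/inflammation/abscess",
    "fistula",
    "rectal ulcer",
    "other",
]

def build_prompt(narrative: str) -> str:
    """Build a structured-extraction prompt for one MAUDE report narrative."""
    return (
        "You are abstracting an FDA MAUDE adverse-event report for the "
        "SpaceOAR perirectal hydrogel spacer.\n"
        f"Classify the primary problem as one of: {', '.join(CATEGORIES)}.\n"
        "Also report any modification in radiation timing and a CTCAE "
        "severity grade (1-5).\n"
        "Answer as JSON with keys: primary_problem, "
        "radiation_timing_change, ctcae_grade.\n\n"
        f"Report narrative:\n{narrative}"
    )

prompt = build_prompt("Gel noted in an unintended plane anterior to the rectal wall.")
```

In a full pipeline, this prompt would be sent to each candidate model family and the parsed JSON compared field-by-field against the human abstraction.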
[RESULTS] The ability of LLMs to correctly identify the AE outcomes was aggregated into an overall score. The highest-performing model was GPT-4o, with an overall score of 4.96 (σ = 0.00526) compared to the human reviewers, who had an overall score of 4.99 (σ = 0.216). When run on all 1455 reports, GPT-4o revealed that the most common primary problems were malpositioned gel (58.7%), infection/inflammation/abscess (10.4%), fistula (7.1%), and rectal ulcer (4.7%). ICU-level care and death were reported 0.1% and 0.3% of the time, respectively.
[CONCLUSION] These findings highlight the potential for LLMs to automate the time-consuming process of tabulating device-related AEs. Reported serious AEs associated with SpaceOAR underscore potential safety concerns, warranting dynamic ongoing surveillance and careful consideration when opting to implement hydrogel spacers.
MeSH Terms
Humans; Male; Prostatic Neoplasms; Hydrogels; Rectum; United States; Large Language Models