irAE-GPT: leveraging large language models to identify immune-related adverse events in electronic health records and clinical trial datasets.

Bejan CA; Wang M; Venkateswaran S; Bergmann EA; Hiles L; Xu Y; Chandler GS; Brondfield S; Silverstein J; Wright F; de Dios K; Kim DM; Mukherjee E; Krantz MS; Yao L; Johnson DB; Phillips EJ; Balko JM; Mohindra R; Quandt Z

doi:10.1016/j.ebiom.2026.106227




EBioMedicine · 2026 · p. 106227 · Open access (journal open-access rate 98.2%, 2022-2026)


Free full text: open-access PDF via Unpaywall (CC BY licence)



Key clinical statistics (extracted from the abstract; verify against the full text)

Sample size (n): 442 patients

Cite this article


APA Bejan CA, Wang M, et al. (2026). irAE-GPT: leveraging large language models to identify immune-related adverse events in electronic health records and clinical trial datasets. EBioMedicine, 106227. https://doi.org/10.1016/j.ebiom.2026.106227

MLA Bejan CA, et al. "irAE-GPT: leveraging large language models to identify immune-related adverse events in electronic health records and clinical trial datasets." EBioMedicine, 2026, p. 106227.

PMID 41951517

DOI 10.1016/j.ebiom.2026.106227

Abstract

[BACKGROUND] Large language models (LLMs) have emerged as transformative technologies, revolutionising natural language understanding and generation across various domains, including medicine. In this study, we investigated the capabilities, limitations, and generalisability of Generative Pre-trained Transformer (GPT) models in analysing unstructured patient notes from large healthcare datasets to identify immune-related adverse events (irAEs) associated with the use of immune checkpoint inhibitor (ICI) therapy.

[METHODS] We evaluated the performance of GPT-3.5, GPT-4, and GPT-4o models on manually annotated datasets of patients receiving ICI therapy, sampled from two electronic health record (EHR) systems and seven clinical trials. A zero-shot prompt was designed to exhaustively identify irAEs at both the patient level (main analysis) and the note level (secondary analysis). The LLM-based system followed a multi-label classification approach to identify any combination of irAEs associated with individual patients or clinical notes. System evaluation was conducted for each available irAE as well as for broader categories of irAEs classified at the organ level.
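The multi-label set-up described in the methods can be illustrated with a short sketch. This is not the authors' pipeline: it assumes each patient's irAEs are represented as a set of labels, compares annotated and predicted sets label by label to count true/false positives and negatives, and rolls labels up to organ-level categories before re-scoring. The label names, the `ORGAN_OF` mapping, and all function names are hypothetical; the study's own category assignments may differ.

```python
"""Hypothetical sketch: patient-level multi-label scoring with organ-level roll-up."""
from dataclasses import dataclass

# Hypothetical irAE -> organ-category mapping, for illustration only;
# the study's own category assignments may differ.
ORGAN_OF = {
    "pneumonitis": "pulmonary",
    "colitis": "gastrointestinal",
    "rash": "dermatologic",
    "arthritis": "musculoskeletal and rheumatologic",
}


@dataclass
class Counts:
    tp: int = 0
    fp: int = 0
    fn: int = 0
    tn: int = 0


def confusion_by_label(gold, pred, labels):
    """Per-label confusion counts.

    gold, pred: lists of label sets, one set per patient (or per note), same order.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same patients")
    counts = {label: Counts() for label in labels}
    for g, p in zip(gold, pred):
        for label in labels:
            c = counts[label]
            if label in g and label in p:
                c.tp += 1
            elif label in p:  # predicted but not annotated
                c.fp += 1
            elif label in g:  # annotated but missed
                c.fn += 1
            else:
                c.tn += 1
    return counts


def to_organ(labels):
    """Map a set of irAE labels to the set of organ-level categories it touches."""
    return {ORGAN_OF[label] for label in labels if label in ORGAN_OF}


def metrics(c):
    """Sensitivity, specificity, PPV and F1 from one label's counts (0.0 when undefined)."""
    sens = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    spec = c.tn / (c.tn + c.fp) if c.tn + c.fp else 0.0
    ppv = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    f1 = 2 * ppv * sens / (ppv + sens) if ppv + sens else 0.0
    return {"sensitivity": sens, "specificity": spec, "ppv": ppv, "f1": f1}


if __name__ == "__main__":
    # Toy data: three patients, annotated (gold) vs model-predicted irAE sets.
    gold = [{"pneumonitis"}, {"colitis", "rash"}, set()]
    pred = [{"pneumonitis", "colitis"}, {"colitis"}, set()]
    for label, c in confusion_by_label(gold, pred, sorted(ORGAN_OF)).items():
        print(label, metrics(c))
    # Organ-level evaluation reuses the same counting on rolled-up label sets.
    organs = sorted(set(ORGAN_OF.values()))
    organ_counts = confusion_by_label(
        [to_organ(g) for g in gold], [to_organ(p) for p in pred], organs
    )
    print(organ_counts["gastrointestinal"])
```

Scoring each label independently is what lets a single patient carry any combination of irAEs; the same counting function serves the note level if the sets are built per note instead of per patient.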

[FINDINGS] Our analysis included 442 patients across three institutions. The most common irAEs manually identified in the patient datasets were pneumonitis (N = 64), colitis (N = 56), rash (N = 32), and hepatitis (N = 28). The GPT models demonstrated generalisable performance in identifying irAEs across both EHRs and clinical trial reports. Overall, the models achieved relatively high sensitivity and specificity but only moderate positive predictive values (PPVs), suggesting a tendency to overpredict irAEs. GPT-4o achieved the highest F1 and micro-averaged F1 scores in both the patient-level and note-level evaluations. The highest performance was observed in the haematological (F1 = 1.0 at both ends of the reported range), gastrointestinal (F1 range = 0.81-0.85), and musculoskeletal and rheumatologic (F1 range = 0.67-1.0) irAE categories. Error analysis revealed substantial limitations of the GPT models in handling textual causation: an adverse event must not only be identified in the clinical text but also be causally linked to immune checkpoint inhibitor therapy.
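Micro-averaged F1 pools true positives, false positives, and false negatives across all labels before computing a single F1, so frequent irAEs weigh more than rare ones; a macro average instead computes F1 per label and weighs each label equally. The sketch below uses made-up counts (not the study's data) and a hypothetical label name `rare_irae` to show both averages, and how overprediction (many false positives) can keep sensitivity high while pulling PPV down.

```python
"""Hypothetical sketch: micro- vs macro-averaged F1 over irAE labels."""


def f1(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when TP, FP and FN are all zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def sensitivity(tp, fp, fn):
    """Share of annotated irAEs that the model found."""
    return tp / (tp + fn) if tp + fn else 0.0


def ppv(tp, fp, fn):
    """Share of predicted irAEs that were actually annotated."""
    return tp / (tp + fp) if tp + fp else 0.0


def micro_f1(counts):
    """Pool (tp, fp, fn) over all labels, then compute one F1."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn)


def macro_f1(counts):
    """Compute F1 per label, then take the unweighted mean."""
    if not counts:
        raise ValueError("need at least one label")
    return sum(f1(*c) for c in counts.values()) / len(counts)


# Illustrative (tp, fp, fn) per label; NOT the study's results.
EXAMPLE = {
    "pneumonitis": (50, 30, 5),  # frequent label, overpredicted: many false positives
    "rare_irae": (1, 4, 1),      # hypothetical rare label
}

if __name__ == "__main__":
    for label, c in EXAMPLE.items():
        print(label, "sens", sensitivity(*c), "ppv", ppv(*c), "f1", f1(*c))
    print("micro F1", micro_f1(EXAMPLE), "macro F1", macro_f1(EXAMPLE))
```

With these toy counts, pneumonitis has sensitivity of about 0.91 but a PPV of 0.625, and the micro-averaged F1 (about 0.72) exceeds the macro F1 (about 0.51) because the frequent label dominates the pooled counts.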

[INTERPRETATION] This study demonstrated that GPT models can automate the detection of immune-related adverse events in varied healthcare datasets, reducing the burden on physicians and other healthcare professionals by limiting the need for manual review. This capability will accelerate the generation of safety insights across large healthcare datasets and facilitate the characterisation of patient-level drivers of toxicities, thus enhancing safety monitoring and ultimately improving patient care.

[FUNDING] National Institutes of Health, Roche, National Health and Medical Research Council of Australia, Stevens-Johnson Syndrome Foundation, Angela Anderson Research Fund, Larry L Hillblom Foundation and UCSF Research Allocation Program.


Related open-access articles (matched on this article's MeSH terms and keywords)

The Barriers and Enablers to Participation in Oncology Clinical Trials for Ethnically Diverse Communities: A Qualitative Systematic Review Using Metaethnography.
Cancer Nursing, 2026. Turner L, et al. (open access)
A novel real-world data methodology for lymphoma outcome classification: the real-world Lugano study.
Journal of Comparative Effectiveness Research, 2026. Swain RS, et al. (open access via Unpaywall)
Updates on Clinical Trials and Molecular Characteristics of Locally Advanced and Oligometastatic Renal Cell Carcinoma.
International Journal of Molecular Sciences, 2026. Ogunmola TM, et al. (open access)
Phase II randomized study of first-line carboplatin and paclitaxel in combination with pembrolizumab, followed by maintenance pembrolizumab alone or with nesuparib, in mismatch-repair proficient, advanced or recurrent endometrial cancer (PENELOPE).
Journal of Gynecologic Oncology, 2026. Kim SI, et al. (open access via Unpaywall)
Quality of Life Measurement in PARP Inhibitor Trials of Epithelial Ovarian Cancer - What Do We Know?
Cancer Control: Journal of the Moffitt Cancer Center, 2026. Kumari S, et al. (open access)
Metastasis-directed SBRT for oligometastatic hormone sensitive prostate cancer (METRO): protocol for a prospective randomised phase III trial, NCT04983095.
BMC Cancer, 2026. Söderkvist K, et al. (open access via Unpaywall)
