
irAE-GPT: leveraging large language models to identify immune-related adverse events in electronic health records and clinical trial datasets.

EBioMedicine, 2026, p. 106227 (open access)

Bejan CA, Wang M, Venkateswaran S, Bergmann EA, Hiles L, Xu Y

Key clinical statistics (auto-extracted from the abstract; verify against the full text)
  • Sample size (n): 442 patients

Cite this paper

APA Bejan CA, Wang M, et al. (2026). irAE-GPT: leveraging large language models to identify immune-related adverse events in electronic health records and clinical trial datasets. EBioMedicine, 106227. https://doi.org/10.1016/j.ebiom.2026.106227
MLA Bejan CA, et al. "irAE-GPT: leveraging large language models to identify immune-related adverse events in electronic health records and clinical trial datasets." EBioMedicine, 2026, p. 106227.
PMID 41951517

Abstract

[BACKGROUND] Large language models (LLMs) have emerged as transformative technologies, revolutionising natural language understanding and generation across various domains, including medicine. In this study, we investigated the capabilities, limitations, and generalisability of Generative Pre-trained Transformer (GPT) models in analysing unstructured patient notes from large healthcare datasets to identify immune-related adverse events (irAEs) associated with the use of immune checkpoint inhibitor (ICI) therapy.

[METHODS] We evaluated the performance of GPT-3.5, GPT-4, and GPT-4o models on manually annotated datasets of patients receiving ICI therapy, sampled from two electronic health record (EHR) systems and seven clinical trials. A zero-shot prompt was designed to exhaustively identify irAEs at both the patient level (main analysis) and the note level (secondary analysis). The LLM-based system followed a multi-label classification approach to identify any combination of irAEs associated with individual patients or clinical notes. System evaluation was conducted for each available irAE as well as for broader categories of irAEs classified at the organ level.
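
The zero-shot, multi-label setup described in the methods can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the label list, the prompt wording, the semicolon-separated answer format, and the rule that a patient carries an irAE if any of their notes does. None of these are taken from the study, and the sketch makes no model call.

```python
from __future__ import annotations

from typing import Iterable

# Hypothetical label set; the study's full irAE list is not given in the abstract.
IRAE_LABELS = ["pneumonitis", "colitis", "rash", "hepatitis"]


def build_zero_shot_prompt(note_text: str, labels: Iterable[str] = IRAE_LABELS) -> str:
    """Build a zero-shot prompt (no worked examples) asking for every irAE present.

    The wording is illustrative only, not the prompt used by the authors.
    """
    label_list = ", ".join(labels)
    return (
        "You are reviewing a clinical note for a patient on immune checkpoint "
        "inhibitor therapy.\n"
        f"Candidate immune-related adverse events: {label_list}.\n"
        "List every candidate event that the note attributes to the therapy, "
        "separated by semicolons. Answer 'none' if no event applies.\n\n"
        f"Note:\n{note_text}"
    )


def parse_multilabel_response(response: str, labels: Iterable[str] = IRAE_LABELS) -> set[str]:
    """Map a free-text reply onto the closed label set (any combination of labels)."""
    allowed = {label.lower() for label in labels}
    mentioned = {part.strip().lower() for part in response.split(";")}
    # Anything outside the closed label set (including 'none') is dropped.
    return mentioned & allowed


def patient_level_labels(note_level: list[set[str]]) -> set[str]:
    """Assumed aggregation rule: union of the label sets of a patient's notes."""
    return set().union(*note_level)
```

Restricting the parsed output to a closed label set keeps the result a well-defined multi-label prediction even when the model adds free-text commentary.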

[FINDINGS] Our analysis included 442 patients across three institutions. The most common irAEs manually identified in the patient datasets included pneumonitis (N = 64), colitis (N = 56), rash (N = 32), and hepatitis (N = 28). The GPT models demonstrated generalisable abilities in identifying irAEs across EHRs and clinical trial reports. Overall, the models achieved relatively high sensitivity and specificity but only moderate positive predictive values, reflecting a potential bias towards overpredicting irAE outcomes. GPT-4o achieved the highest F1 and micro-averaged F1 scores for both patient-level and note-level evaluations. Highest performance was observed in the haematological (F1 range = 1.0-1.0), gastrointestinal (F1 range = 0.81-0.85), and musculoskeletal and rheumatologic (F1 range = 0.67-1.0) irAE categories. Error analysis uncovered substantial limitations of GPT models in handling textual causation, where adverse events should not only be accurately identified in clinical text but also causally linked to immune checkpoint inhibitors.
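
The metrics named in the findings (sensitivity, specificity, positive predictive value, F1, and micro-averaged F1) can be computed for a multi-label task roughly as follows. This is a sketch using the standard textbook definitions, with gold and predicted labels represented as one set per patient; it is not the authors' evaluation code, and the function names are hypothetical.

```python
def confusion_counts(gold, pred, label):
    """Count TP, FP, FN, TN for one label across paired gold/predicted label sets."""
    tp = fp = fn = tn = 0
    for g, p in zip(gold, pred):
        in_gold, in_pred = label in g, label in p
        if in_gold and in_pred:
            tp += 1
        elif in_pred:
            fp += 1
        elif in_gold:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a metric is undefined (zero denominator)."""
    return numerator / denominator if denominator else 0.0


def label_metrics(tp, fp, fn, tn):
    """Per-label metrics from confusion counts."""
    return {
        "sensitivity": safe_div(tp, tp + fn),   # recall on true irAE cases
        "specificity": safe_div(tn, tn + fp),   # recall on non-cases
        "ppv": safe_div(tp, tp + fp),           # precision
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
    }


def micro_f1(gold, pred, labels):
    """Micro-averaged F1: pool TP, FP, FN over all labels, then compute one F1."""
    tp = fp = fn = 0
    for label in labels:
        l_tp, l_fp, l_fn, _ = confusion_counts(gold, pred, label)
        tp, fp, fn = tp + l_tp, fp + l_fp, fn + l_fn
    return safe_div(2 * tp, 2 * tp + fp + fn)
```

A system that overpredicts produces extra false positives, which lowers PPV while leaving sensitivity untouched; this is the pattern the abstract describes as high sensitivity with only moderate PPV.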

[INTERPRETATION] This study demonstrated that GPT models can automate the detection of immune-related adverse events in varied healthcare datasets, reducing the burden on physicians and other healthcare professionals by limiting the need for manual review. This capability will accelerate the generation of safety insights across large healthcare datasets and facilitate the characterisation of patient-level drivers of toxicities, thus enhancing safety monitoring and ultimately improving patient care.

[FUNDING] National Institutes of Health, Roche, National Health and Medical Research Council of Australia, Stevens-Johnson Syndrome Foundation, Angela Anderson Research Fund, Larry L Hillblom Foundation and UCSF Research Allocation Program.
