irAE-GPT: leveraging large language models to identify immune-related adverse events in electronic health records and clinical trial datasets.
- Sample size (n): 442
APA
Bejan CA, Wang M, et al. (2026). irAE-GPT: leveraging large language models to identify immune-related adverse events in electronic health records and clinical trial datasets. EBioMedicine, 106227. https://doi.org/10.1016/j.ebiom.2026.106227
MLA
Bejan CA, et al. "irAE-GPT: leveraging large language models to identify immune-related adverse events in electronic health records and clinical trial datasets." EBioMedicine, 2026, article 106227.
PMID
41951517
Abstract
[BACKGROUND] Large language models (LLMs) have emerged as transformative technologies, revolutionising natural language understanding and generation across various domains, including medicine. In this study, we investigated the capabilities, limitations, and generalisability of Generative Pre-trained Transformer (GPT) models in analysing unstructured patient notes from large healthcare datasets to identify immune-related adverse events (irAEs) associated with the use of immune checkpoint inhibitor (ICI) therapy.
[METHODS] We evaluated the performance of GPT-3.5, GPT-4, and GPT-4o models on manually annotated datasets of patients receiving ICI therapy, sampled from two electronic health record (EHR) systems and seven clinical trials. A zero-shot prompt was designed to exhaustively identify irAEs at both the patient level (main analysis) and the note level (secondary analysis). The LLM-based system followed a multi-label classification approach to identify any combination of irAEs associated with individual patients or clinical notes. System evaluation was conducted for each available irAE as well as for broader categories of irAEs classified at the organ level.
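The multi-label, organ-level setup described above can be sketched in Python. This is a minimal illustration, assuming each patient or note is represented as a set of irAE labels (possibly empty, when no irAE is found). The `IRAE_TO_ORGAN` mapping and the label names in it are hypothetical examples, not the taxonomy used in the study.

```python
from typing import Dict, Iterable, Set

# Hypothetical mapping from individual irAE labels to organ-level categories.
# Illustrative only; not the taxonomy used in the study.
IRAE_TO_ORGAN: Dict[str, str] = {
    "pneumonitis": "pulmonary",
    "colitis": "gastrointestinal",
    "hepatitis": "hepatic",
    "rash": "dermatological",
    "arthritis": "musculoskeletal and rheumatologic",
    "thrombocytopenia": "haematological",
}


def to_organ_level(irae_labels: Iterable[str]) -> Set[str]:
    """Collapse irAE labels into organ-level categories.

    Labels missing from IRAE_TO_ORGAN are silently ignored.
    """
    return {IRAE_TO_ORGAN[label] for label in irae_labels if label in IRAE_TO_ORGAN}


# Multi-label representation: each patient maps to any combination of irAEs,
# including the empty set (no irAE identified).
patient_labels = {
    "patient_001": {"colitis", "hepatitis"},
    "patient_002": {"pneumonitis"},
    "patient_003": set(),
}

organ_labels = {pid: to_organ_level(labels) for pid, labels in patient_labels.items()}
print(sorted(organ_labels["patient_001"]))  # → ['gastrointestinal', 'hepatic']
```

Evaluating at the organ level means a prediction of "hepatitis" and an annotation of a different hepatic irAE would count as a match, which is one reason category-level scores can differ from per-irAE scores.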
[FINDINGS] Our analysis included 442 patients across three institutions. The most common irAEs manually identified in the patient datasets included pneumonitis (N = 64), colitis (N = 56), rash (N = 32), and hepatitis (N = 28). The GPT models demonstrated generalisable abilities in identifying irAEs across EHRs and clinical trial reports. Overall, the models achieved relatively high sensitivity and specificity but only moderate positive predictive values, reflecting a potential bias towards overpredicting irAE outcomes. GPT-4o achieved the highest F1 and micro-averaged F1 scores for both patient-level and note-level evaluations. The highest performance was observed in the haematological (F1 = 1.0), gastrointestinal (F1 range = 0.81-0.85), and musculoskeletal and rheumatologic (F1 range = 0.67-1.0) irAE categories. Error analysis uncovered substantial limitations of GPT models in handling textual causation: adverse events must not only be identified accurately in clinical text but also be causally linked to immune checkpoint inhibitors.
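The metrics reported above can be made concrete with a small sketch. Assuming gold and predicted labels are stored as one set of irAE labels per unit (patient or note), the code counts per-label true/false positives and negatives, derives sensitivity, specificity, positive predictive value (PPV) and F1, and computes a micro-averaged F1 by pooling counts across labels. The function names and toy data are hypothetical; this is not the study's evaluation code. The toy data also show how over-prediction leaves sensitivity high while pulling PPV down, the pattern described in the findings.

```python
from typing import Dict, Iterable, List, Set, Tuple


def confusion_counts(
    gold: List[Set[str]], pred: List[Set[str]], label: str
) -> Tuple[int, int, int, int]:
    """Count (TP, FP, FN, TN) for one irAE label across all units (patients or notes)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same units")
    tp = fp = fn = tn = 0
    for g, p in zip(gold, pred):
        in_gold, in_pred = label in g, label in p
        if in_gold and in_pred:
            tp += 1
        elif in_pred:
            fp += 1  # predicted but not annotated (over-prediction)
        elif in_gold:
            fn += 1  # annotated but missed
        else:
            tn += 1
    return tp, fp, fn, tn


def safe_div(num: float, den: float) -> float:
    # Assumption: report 0.0 when a metric is undefined (zero denominator).
    return num / den if den else 0.0


def label_metrics(gold: List[Set[str]], pred: List[Set[str]], label: str) -> Dict[str, float]:
    tp, fp, fn, tn = confusion_counts(gold, pred, label)
    sens = safe_div(tp, tp + fn)  # sensitivity (recall)
    spec = safe_div(tn, tn + fp)  # specificity
    ppv = safe_div(tp, tp + fp)   # positive predictive value (precision)
    f1 = safe_div(2 * ppv * sens, ppv + sens)
    return {"sensitivity": sens, "specificity": spec, "ppv": ppv, "f1": f1}


def micro_f1(gold: List[Set[str]], pred: List[Set[str]], labels: Iterable[str]) -> float:
    """Micro-averaged F1: pool TP, FP and FN over all labels, then compute one F1."""
    tp = fp = fn = 0
    for label in labels:
        t, f_p, f_n, _ = confusion_counts(gold, pred, label)
        tp, fp, fn = tp + t, fp + f_p, fn + f_n
    return safe_div(2 * tp, 2 * tp + fp + fn)


# Toy example (invented data): the "model" over-predicts colitis and rash.
gold = [{"colitis"}, {"pneumonitis"}, set(), set()]
pred = [{"colitis", "rash"}, {"pneumonitis"}, {"colitis"}, set()]
labels = ["colitis", "pneumonitis", "rash"]

print(label_metrics(gold, pred, "colitis"))
print(round(micro_f1(gold, pred, labels), 3))  # → 0.667
```

Micro-averaging weights each label by its frequency, so common irAEs such as pneumonitis and colitis dominate the pooled score; a macro average (the mean of per-label F1 values) would instead weight rare irAEs equally.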
[INTERPRETATION] This study demonstrated that GPT models can automate the detection of immune-related adverse events in varied healthcare datasets, reducing the burden on physicians and other healthcare professionals by limiting the need for manual review. This capability will accelerate the generation of safety insights across large healthcare datasets and facilitate the characterisation of patient-level drivers of toxicities, thus enhancing safety monitoring and ultimately improving patient care.
[FUNDING] National Institutes of Health, Roche, National Health and Medical Research Council of Australia, Stevens-Johnson Syndrome Foundation, Angela Anderson Research Fund, Larry L Hillblom Foundation and UCSF Research Allocation Program.
Related articles (free full text, matched on this paper's MeSH terms and keywords)
- The Barriers and Enablers to Participation in Oncology Clinical Trials for Ethnically Diverse Communities: A Qualitative Systematic Review Using Metaethnography.
- A novel real-world data methodology for lymphoma outcome classification: the real-world Lugano study.
- Updates on Clinical Trials and Molecular Characteristics of Locally Advanced and Oligometastatic Renal Cell Carcinoma.
- Phase II randomized study of first-line carboplatin and paclitaxel in combination with pembrolizumab, followed by maintenance pembrolizumab alone or with nesuparib, in mismatch-repair proficient, advanced or recurrent endometrial cancer (PENELOPE).
- Quality of Life Measurement in PARP Inhibitor Trials of Epithelial Ovarian Cancer - What Do We Know?
- Metastasis-directed SBRT for oligometastatic hormone sensitive prostate cancer (METRO): protocol for a prospective randomised phase III trial, NCT04983095.