Leveraging large language models to populate structured clinical case report forms from unstructured medical notes in radiation oncology.
Case report
TL;DR
Large language models can automatically extract and structure data from unstructured medical notes in an average of 16 s per note; comparison with routinely collected data also revealed inaccuracies in the routine ground truth.
OpenAlex topics ·
Topic Modeling
Artificial Intelligence in Healthcare and Education
Machine Learning in Healthcare
APA
Nachbar, M., Yi, N., et al. (2026). Leveraging large language models to populate structured clinical case report forms from unstructured medical notes in radiation oncology. Clinical and Translational Radiation Oncology, 58, 101143. https://doi.org/10.1016/j.ctro.2026.101143
MLA
Nachbar, Marcel, et al. "Leveraging large language models to populate structured clinical case report forms from unstructured medical notes in radiation oncology." Clinical and Translational Radiation Oncology, vol. 58, 2026, p. 101143.
PMID
41859030
Abstract
[BACKGROUND AND PURPOSE] Large language models (LLMs) have shown growing potential for clinical text processing, but their systematic application in radiation oncology, especially for non-English clinical documentation, remains underexplored. This study investigated whether pretrained LLMs can automatically extract, analyze, and structure radiotherapy-relevant information from routine unstructured medical notes, with the goal of supporting automated population of electronic case report forms (eCRFs).
[MATERIALS AND METHODS] This study examined prostate cancer patients treated with the MR-Linac, for whom ground truth data exist in the MOMENTUM database. A total of 100 patients were included, with 90 used for prompt development and 10 for independent testing. Medical notes were extracted, anonymized, and categorized by time points. The Llama-3.1-8b model was used, with prompts designed using chain-of-thought (CoT) logic with five in-context examples. The model output was post-processed, and extracted data was compared against ground truth.
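The prompting approach described above (chain-of-thought reasoning plus in-context examples, followed by post-processing of the model output) can be sketched roughly as below. This is a minimal illustration only: the field names, example note, and JSON output convention are hypothetical assumptions, and the paper's actual prompts, eCRF schema, and Llama-3.1-8b inference call are not reproduced here.

```python
# Sketch of a CoT extraction prompt with in-context examples and a
# post-processing step that recovers structured fields from free text.
# All field names and example content are hypothetical.
import json

IN_CONTEXT_EXAMPLES = [
    {
        "note": "Patient treated with 5 x 7.25 Gy on the MR-Linac; PSA 6.2 ng/ml.",
        "reasoning": "The note states five fractions of 7.25 Gy and a PSA value.",
        "fields": {"fractions": 5, "dose_per_fraction_gy": 7.25, "psa_ng_ml": 6.2},
    },
    # ... the study used five such examples per prompt
]

def build_prompt(note: str) -> str:
    """Assemble a CoT prompt: task instruction, worked examples, then the new note."""
    parts = ["Extract the radiotherapy fields below as JSON. Think step by step first."]
    for ex in IN_CONTEXT_EXAMPLES:
        parts.append(f"Note: {ex['note']}")
        parts.append(f"Reasoning: {ex['reasoning']}")
        parts.append(f"Answer: {json.dumps(ex['fields'])}")
    parts.append(f"Note: {note}")
    parts.append("Reasoning:")
    return "\n".join(parts)

def postprocess(model_output: str) -> dict:
    """Pull the final JSON object out of the model's free-text CoT answer."""
    start = model_output.rfind("{")
    end = model_output.rfind("}")
    if start == -1 or end < start:
        return {}
    try:
        return json.loads(model_output[start:end + 1])
    except json.JSONDecodeError:
        return {}
```

In practice the prompt string would be sent to a locally hosted Llama-3.1-8b instance, and `postprocess` would map the parsed dictionary onto the eCRF fields.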
[RESULTS] Medical notes were successfully processed, with predicted values generated in an average time of 16 s per note. The LLM achieved matching accuracies of 83.6% and 83.8% on the development and testing datasets, respectively. Analysis revealed that the model disagreed with specific values in 8.1% of development dataset cases and 8.6% of testing dataset cases. An independent manual review conducted before model evaluation showed that approximately 7.5% of routinely collected test data did not match the reviewed values, indicating inaccuracies in the routinely acquired ground truth.
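The evaluation above distinguishes between fields the model matched, fields where it disagreed with a specific ground-truth value, and fields it left unanswered. A simple per-field scorer along those lines could look like the following; the three-way outcome labels are an assumption, since the paper's exact matching rules are not given here.

```python
# Sketch of field-level scoring against ground truth: matching accuracy,
# explicit-value disagreement rate, and missing-prediction rate.
def score(predicted: dict, ground_truth: dict) -> dict:
    """Compare extracted fields to ground truth and return per-category rates."""
    match = disagree = missing = 0
    for field, true_value in ground_truth.items():
        if field not in predicted or predicted[field] is None:
            missing += 1          # model produced no value for this field
        elif predicted[field] == true_value:
            match += 1            # counted toward matching accuracy
        else:
            disagree += 1         # model committed to a different value
    total = len(ground_truth)
    return {
        "accuracy": match / total,
        "disagreement_rate": disagree / total,
        "missing_rate": missing / total,
    }
```

Note that such a scorer only flags disagreements; the study's finding that ~7.5% of the routine "ground truth" itself was inaccurate required a separate manual review of the reference data.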
[CONCLUSION] This study demonstrated the effectiveness of LLMs in structuring clinical data from non-English medical notes, with high accuracy in extracting and categorizing information. While multi-institutional validation is needed, the results indicate a significant healthcare impact through efficient data management, processing notes in 16 s and accurately populating CRFs with minimal staff involvement.