Augmenting Large Language Model With Prompt Engineering and Supervised Fine-Tuning in Non-Small Cell Lung Cancer Tumor-Node-Metastasis Staging: Framework Development and Validation.

Jin R; Ling C; Hou Y; Sun Y; Li N; Han J; Sheng J; Wang Q; Liu Y; Zheng S; Ren X; Chen C; Wang J; Li C

doi:10.2196/77988


JMIR AI 2026 Vol.5() p. e77988


Cite this article

APA Jin R, Ling C, et al. (2026). Augmenting Large Language Model With Prompt Engineering and Supervised Fine-Tuning in Non-Small Cell Lung Cancer Tumor-Node-Metastasis Staging: Framework Development and Validation. JMIR AI, 5, e77988. https://doi.org/10.2196/77988

MLA Jin R, et al. "Augmenting Large Language Model With Prompt Engineering and Supervised Fine-Tuning in Non-Small Cell Lung Cancer Tumor-Node-Metastasis Staging: Framework Development and Validation." JMIR AI, vol. 5, 2026, pp. e77988.

PMID 41984624

DOI 10.2196/77988

Abstract

[BACKGROUND] Accurate tumor-node-metastasis (TNM) staging is fundamental for treatment planning and prognosis in non-small cell lung cancer (NSCLC). However, its complexity poses significant challenges. Traditional rule-based natural language processing methods are constrained by their reliance on manually crafted rules and are susceptible to inconsistencies in clinical reporting.

[OBJECTIVE] This study aimed to develop and validate a robust, accurate, and operationally efficient artificial intelligence framework for the TNM staging of NSCLC by strategically enhancing a large language model, GLM-4-Air (general language model), through advanced prompt engineering and supervised fine-tuning (SFT).

[METHODS] We constructed a curated dataset of 492 deidentified real-world medical imaging reports, with TNM staging annotations rigorously validated by senior physicians according to the AJCC (American Joint Committee on Cancer) 8th edition guidelines. The GLM-4-Air model was systematically optimized in 2 phases: iterative prompt engineering, incorporating chain-of-thought reasoning and domain knowledge injection, for all staging tasks, followed by parameter-efficient SFT using low-rank adaptation for the reasoning-intensive primary tumor characteristics (T) and regional lymph node involvement (N) staging tasks. The final hybrid model was evaluated on a completely held-out (black-box) test set and benchmarked against GPT-4o using standard metrics, statistical tests, and a clinical impact analysis of staging errors.
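As a rough illustration of the low-rank adaptation step named above, the sketch below shows the general LoRA idea in plain Python: a frozen weight matrix W is left untouched, and only two small matrices A and B are trained, so the adapted layer computes W x + (alpha / rank) * B (A x). All sizes, values, and function names here are hypothetical and chosen for illustration; the abstract does not report the rank, scaling factor, or target layers used with GLM-4-Air.

```python
def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, rank):
    """Frozen projection plus scaled low-rank update: W x + (alpha / rank) * B (A x)."""
    base = matvec(W, x)                 # frozen pretrained path
    update = matvec(B, matvec(A, x))    # trainable low-rank path
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


def trainable_params(d_out, d_in, rank):
    """Return (full fine-tuning, LoRA) trainable parameter counts for one matrix."""
    return d_out * d_in, rank * (d_in + d_out)


# Hypothetical toy sizes and values; none of these come from the paper.
d_out, d_in, rank, alpha = 4, 3, 2, 4
W = [[0.1 * (i + j) for j in range(d_in)] for i in range(d_out)]  # frozen weights
A = [[0.5, -0.5, 0.25], [0.1, 0.2, 0.3]]   # rank x d_in, trainable
B = [[0.0] * rank for _ in range(d_out)]   # d_out x rank, zero-initialised
x = [1.0, 2.0, 3.0]

# With B initialised to zero, the adapted layer starts out identical to the base layer.
y_init = lora_forward(W, A, B, x, alpha, rank)

# Parameter counts for an illustrative 4096 x 4096 projection at rank 8.
full, lora = trainable_params(4096, 4096, 8)
print(f"full fine-tuning: {full:,} parameters; LoRA: {lora:,} parameters")
```

Because B starts at zero, training begins from the unmodified base model, and only the rank * (d_in + d_out) adapter parameters are updated. This is the sense in which the method is parameter-efficient.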

[RESULTS] The optimized hybrid GLM-4-Air model achieved higher staging accuracies than GPT-4o on the black-box test set: 92% (95% CI 0.850-0.959) for T, 86% (95% CI 0.779-0.915) for N, 92% (95% CI 0.850-0.959) for distant metastasis status (M), and 90% for overall clinical staging; by comparison, GPT-4o attained 87% (95% CI 0.790-0.922), 70% (95% CI 0.604-0.781), 78% (95% CI 0.689-0.850), and 80%, respectively. The model's robustness was further evidenced by its macro-average F1-scores of 0.914 (T), 0.815 (N), and 0.831 (M), consistently surpassing those of GPT-4o (0.836, 0.620, and 0.698). Analysis of confusion matrices confirmed the model's proficiency in identifying critical staging features while minimizing false negatives. Crucially, the clinical impact assessment showed a substantial reduction in severe category I errors, defined as misclassifications that could significantly influence subsequent clinical decisions. Our model committed 0 category I errors in M staging and fewer category I errors than GPT-4o in T and N staging. Furthermore, the framework demonstrated practical deployability, achieving efficient inference on consumer-grade hardware (eg, 4 RTX 4090 GPUs) with latencies acceptable for clinical workflows.
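The abstract does not state how the 95% CIs were computed or how many reports were in the test set. As a hedged illustration only, the sketch below computes a Wilson score interval for a binomial proportion, assuming a hypothetical test set of 100 reports (the `ASSUMED_N` constant is an assumption, not a figure from the paper).

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion (successes out of n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# ASSUMPTION: test-set size is not given in the abstract; 100 is hypothetical.
ASSUMED_N = 100

# Correct-prediction counts implied by the reported accuracies under that assumption.
for label, correct in [("T", 92), ("N", 86), ("M", 92)]:
    low, high = wilson_interval(correct, ASSUMED_N)
    print(f"{label}: {correct / ASSUMED_N:.0%} (95% CI {low:.3f}-{high:.3f})")
```

Under these assumptions the intervals round to the values reported for T, N, and M. That agreement is consistent with a Wilson interval on 100 reports but does not confirm it was the authors' method.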

[CONCLUSIONS] The proposed hybrid framework, integrating structured prompt engineering and applying SFT to reasoning-heavy tasks (T/N), enables the GLM-4-Air model to serve as a highly accurate, clinically reliable, and cost-efficient solution for automated NSCLC TNM staging. This work demonstrates the efficacy and potential of a domain-optimized smaller model compared with an off-the-shelf generalist model, holding promise for enhancing diagnostic standardization in resource-aware health care environments.
