A multi-branch ensemble learning framework for detection of non-small cell lung cancer via T-cell receptor sequencing.
[BACKGROUND] Early detection of non-small cell lung cancer (NSCLC) is paramount for patient survival, but conventional diagnostics are often invasive.
- Sensitivity 91.4%
APA
Wang W, Hu X, et al. (2026). A multi-branch ensemble learning framework for detection of non-small cell lung cancer via T-cell receptor sequencing. BMC Cancer, 26(1). https://doi.org/10.1186/s12885-026-15714-y
MLA
Wang W, et al. "A multi-branch ensemble learning framework for detection of non-small cell lung cancer via T-cell receptor sequencing." BMC Cancer, vol. 26, no. 1, 2026.
PMID
41673808
Abstract
[BACKGROUND] Early detection of non-small cell lung cancer (NSCLC) is paramount for patient survival, but conventional diagnostics are often invasive. T-cell receptor (TCR) sequencing of peripheral blood offers a non-invasive alternative by capturing the systemic immune response, but its complex data requires advanced analytical methods.
[METHODS] We propose a multi-branch ensemble learning framework to diagnose NSCLC using TCR sequencing data. It synergistically integrates three analytical branches: one quantifies repertoire-level features including diversity metrics, clonality indices, and gene usage patterns; one identifies convergent TCR clusters indicative of shared antigen recognition; and one employs a Transformer-based language model to capture sequence-level patterns in CDR3 regions. All repertoire-level features were standardized using Z-score normalization, and binary classification (NSCLC vs. Healthy) was performed through a stacking ensemble classifier.
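As a rough illustration of the repertoire-level branch, the sketch below computes two common summary features (Shannon entropy as a diversity metric, and clonality defined as 1 minus normalized entropy) from clone counts, then applies Z-score normalization per feature. The abstract does not specify which diversity or clonality definitions the authors used, so these formulas, the function names, and the toy clone counts are all assumptions made for illustration only.

```python
import math
from statistics import mean, pstdev


def shannon_entropy(counts):
    """Shannon entropy (natural log) of a vector of clone counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("repertoire has no reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def clonality(counts):
    """1 - normalized entropy: 0 = perfectly even, 1 = a single clone."""
    n_clones = sum(1 for c in counts if c > 0)
    if n_clones == 0:
        raise ValueError("repertoire has no clones")
    if n_clones == 1:
        return 1.0
    return 1.0 - shannon_entropy(counts) / math.log(n_clones)


def zscore_columns(rows):
    """Standardize each feature column to mean 0 and unit (population) SD."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical clone counts per sample (not data from the study).
toy_repertoires = {
    "sample_A": [500, 20, 10, 5, 5],        # dominated by one clone
    "sample_B": [100, 100, 100, 100, 100],  # perfectly even
    "sample_C": [300, 150, 50, 25, 10],
}

features = [[shannon_entropy(c), clonality(c)] for c in toy_repertoires.values()]
standardized = zscore_columns(features)
```

When a model is evaluated on external cohorts, the usual practice is to fit the means and standard deviations on the training data and reuse them unchanged on the external samples. The abstract does not say how the authors handled this step.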
[RESULTS] The framework was validated on 150 early-stage NSCLC patients and 162 healthy controls from 7 independent sources (n = 312). Principal component analysis confirmed that samples cluster primarily by disease status rather than study source (Silhouette score: 0.293 for disease vs. 0.244 for study), indicating biological signal dominance over potential batch effects. To evaluate multi-center generalizability, the model was tested on two independent external cohorts: DB1 (Illumina MiSeq, n = 47) and DB2 (Adaptive Biotechnologies immunoSEQ, n = 45). The model achieved an AUC of 0.982 in DB1 and 0.941 in DB2, indicating robust performance across different clinical settings. Notably, validation on 35 independent NSCLC samples from Sun Yat-sen Memorial Hospital yielded a sensitivity of 91.4%, further supporting its potential for clinical application.
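To put the reported sensitivity in context, the sketch below works through the arithmetic. A sensitivity of 91.4% on 35 samples is consistent with 32 of 35 NSCLC samples being called positive (32/35 ≈ 0.914), assuming sensitivity was computed as a simple proportion. The abstract does not state the count directly, so the value 32 is an inference. The Wilson score interval is added here only to show how much uncertainty a sample of 35 carries; it is not a figure reported by the authors.

```python
import math


def sensitivity(true_positives, total_positives):
    """Fraction of truly positive samples that the model calls positive."""
    if total_positives <= 0:
        raise ValueError("need at least one positive sample")
    return true_positives / total_positives


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 ~ 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Assumed count: 32 detected out of the 35 reported NSCLC samples.
detected, total = 32, 35
sens = sensitivity(detected, total)
low, high = wilson_interval(detected, total)

print(f"sensitivity = {sens:.1%}")
print(f"approx. 95% Wilson interval = ({low:.1%}, {high:.1%})")
```

Because this validation set contains only NSCLC samples, it can inform sensitivity but not specificity.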
[CONCLUSION] Our framework provides a powerful, accurate, and interpretable tool for non-invasive NSCLC detection. By capturing a holistic picture of the anti-tumor immune response through complementary analytical branches, this work offers a promising step toward liquid biopsy-based cancer screening. Further validation in larger, multi-center prospective cohorts is essential before clinical translation.
[SUPPLEMENTARY INFORMATION] The online version contains supplementary material available at 10.1186/s12885-026-15714-y.
Highly cited papers by the same first author (5)
- Flap Failure and Salvage in Head and Neck Reconstruction.
- Integrative Analysis Combining Machine Learning and Functional Experiments Uncovers ISG15 As a Key Determinant of Cisplatin Resistance in Gastric Cancer.
- Regenerative strategies for post-prostatectomy incontinence: stem cells, exosomes, and the path to clinical resolution.
- Management of pleural relapse after breast cancer resection in a middle-aged man: a case report.
- Predicting Radiation Pneumonitis Integrating Clinical Information, Medical Text, and 2.5D Deep Learning Features in Lung Cancer.