ColoLDB: a machine learning-based predictive model for colorectal cancer using routine laboratory parameters.

Journal of Gastrointestinal Oncology, 2026, Vol. 17(1), p. 12

Zhang X, Tong X, Mou J, Liu J, Zhang C, Han H, Deng K

Cite this article

APA Zhang X, Tong X, et al. (2026). ColoLDB: a machine learning-based predictive model for colorectal cancer using routine laboratory parameters. Journal of Gastrointestinal Oncology, 17(1), 12. https://doi.org/10.21037/jgo-2025-611
MLA Zhang X, et al. "ColoLDB: a machine learning-based predictive model for colorectal cancer using routine laboratory parameters." Journal of Gastrointestinal Oncology, vol. 17, no. 1, 2026, p. 12.
PMID 41816568

Abstract

[BACKGROUND] Colorectal cancer (CRC) is among the most common cancers worldwide and poses a serious threat to public health. Current CRC screening and diagnosis depend primarily on colonoscopy, an invasive procedure that often misses early-stage tumors, contributing to delayed diagnosis. This study aimed to develop a simpler, more accessible screening method to help clinicians identify and diagnose CRC and its precancerous lesions early.

[METHODS] Patients were identified by hospitalization number. Records with invalid ages and non-numerical laboratory results were excluded, and only the first result for each parameter per patient (i.e., the initial test value at first diagnosis) was retained. Participants were divided into a CRC group and a control group. The laboratory data collected for each participant comprised tumor markers, biochemical parameters, immunological indicators, complete blood count, coagulation tests, and routine urinalysis. Models were constructed with light gradient boosting machine (LightGBM), logistic regression (LR), random forest (RF), and extreme gradient boosting (XGBoost), and the SHapley Additive exPlanations (SHAP) algorithm was used to interpret them.
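
The record-cleaning rules described in the methods can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: the field names (`hosp_no`, `age`, `parameter`, `value`, `test_date`), the age bounds of 0 to 120, and the toy records are all assumptions made for the example, and "first result" is interpreted here as the earliest test date.

```python
from datetime import date


def is_number(text):
    """Return True if text parses as a finite float."""
    try:
        value = float(text)
    except (TypeError, ValueError):
        return False
    # Reject NaN (not equal to itself) and infinities.
    return value == value and value not in (float("inf"), float("-inf"))


def first_valid_results(records, min_age=0, max_age=120):
    """Keep the earliest numeric result per (patient, parameter).

    records: iterable of dicts with the hypothetical keys
    'hosp_no', 'age', 'parameter', 'value', 'test_date'.
    """
    first = {}
    for rec in records:
        age = rec.get("age")
        # Exclude invalid age records (the bounds are an assumption).
        if not isinstance(age, (int, float)) or not (min_age <= age <= max_age):
            continue
        # Remove non-numerical results such as "negative" or "<0.5".
        if not is_number(rec["value"]):
            continue
        key = (rec["hosp_no"], rec["parameter"])
        # Retain only the earliest test for this patient and parameter.
        if key not in first or rec["test_date"] < first[key]["test_date"]:
            first[key] = rec
    return {key: float(rec["value"]) for key, rec in first.items()}


# Made-up records, for illustration only.
records = [
    {"hosp_no": "H001", "age": 63, "parameter": "CEA", "value": "5.2", "test_date": date(2024, 3, 2)},
    {"hosp_no": "H001", "age": 63, "parameter": "CEA", "value": "7.9", "test_date": date(2024, 5, 9)},
    {"hosp_no": "H001", "age": 63, "parameter": "SG", "value": "negative", "test_date": date(2024, 3, 2)},
    {"hosp_no": "H002", "age": -1, "parameter": "CEA", "value": "2.1", "test_date": date(2024, 1, 5)},
]
cleaned = first_valid_results(records)
print(cleaned)
```

In this toy input only the first CEA value for patient H001 survives: the later CEA repeat, the non-numerical urinalysis entry, and the record with an invalid age are all dropped.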

[RESULTS] The top-ranked features of the four models were intersected, yielding eight parameters used to construct the diagnostic colorectal laboratory digital biomarker (ColoLDB) model: specific gravity (SG), carbohydrate antigen 19-9 (CA19-9), carcinoembryonic antigen (CEA), age, albumin (ALB), cytokeratin 19 fragment (CYFRA21-1), high-density lipoprotein cholesterol (HDL-C), and carbohydrate antigen 72-4 (CA72-4). In the test set, the RF model performed best in identifying CRC, achieving an area under the curve (AUC) of 0.863 (95% confidence interval: 0.792-0.922), an accuracy of 0.900, a sensitivity of 0.225, a specificity of 0.997, a positive predictive value (PPV) of 0.917, and a negative predictive value (NPV) of 0.900. When the specificity was set at 0.903, the sensitivity of the ColoLDB model reached 0.694. By comparison, a diagnostic model combining CEA and CA19-9 yielded an AUC of 0.688, a sensitivity of 0.429, and a specificity of 0.947. The RF ColoLDB model therefore showed better diagnostic performance than the combined CEA and CA19-9 model.
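
The reported accuracy, sensitivity, specificity, PPV, and NPV are linked by standard identities, so they can be cross-checked with a short calculation. The sketch below assumes that accuracy is the plain fraction of correct predictions and that all five values come from the same test set at the same decision threshold; neither assumption is stated in the abstract, and the function names are illustrative.

```python
def implied_prevalence(accuracy, sensitivity, specificity):
    """Solve accuracy = sens * p + spec * (1 - p) for the case fraction p."""
    return (specificity - accuracy) / (specificity - sensitivity)


def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) computed from Bayes' rule."""
    tp = sensitivity * prevalence              # true-positive fraction
    fp = (1 - specificity) * (1 - prevalence)  # false-positive fraction
    tn = specificity * (1 - prevalence)        # true-negative fraction
    fn = (1 - sensitivity) * prevalence        # false-negative fraction
    return tp / (tp + fp), tn / (tn + fn)


# Values reported for the RF model on the test set.
p = implied_prevalence(accuracy=0.900, sensitivity=0.225, specificity=0.997)
ppv, npv = predictive_values(0.225, 0.997, p)
print(f"implied CRC fraction: {p:.3f}, PPV: {ppv:.3f}, NPV: {npv:.3f}")
```

Under these assumptions the implied fraction of CRC cases in the test set is about 12.6%, and the recomputed PPV (about 0.915) and NPV (about 0.900) agree with the reported 0.917 and 0.900 to within rounding. The same arithmetic shows why accuracy stays near 0.900 despite a sensitivity of 0.225: with controls making up most of the test set, the 0.997 specificity dominates.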

[CONCLUSIONS] Our findings indicate that eight laboratory test indicators may be related to the risk of developing CRC. Our RF diagnostic ColoLDB model is an innovative and practical tool that effectively predicts the occurrence of CRC, enhancing the diagnostic efficiency for this disease. This method holds promise as a valuable tool for diagnosing CRC.
