Construction and Validation of Multi-Omics Predictive Models for Colorectal Cancer Using Machine-Learning Approaches.
- Sample size (n): 398
APA
Lu Z, Li X, et al. (2026). Construction and Validation of Multi-Omics Predictive Models for Colorectal Cancer Using Machine-Learning Approaches. Pharmacogenomics and Personalized Medicine, 19, 566928. https://doi.org/10.2147/PGPM.S566928
MLA
Lu Z, et al. "Construction and Validation of Multi-Omics Predictive Models for Colorectal Cancer Using Machine-Learning Approaches." Pharmacogenomics and Personalized Medicine, vol. 19, 2026, pp. 566928.
PMID
41947871
Abstract
[OBJECTIVE] To construct and externally validate a multi-omics nomogram that uses only routine clinicopathological variables to predict tumor mutational burden (TMB), microsatellite instability (MSI), NTRK/PIK3CA mutation status and overall survival (OS) in colorectal cancer (CRC).
[METHODS] TCGA data (n=398) served as the training set, and 120 consecutive CRC patients who underwent radical resection at Yuebei People's Hospital formed the prospective validation set. After z-score normalization, 21 demographic, clinical and pathological features were screened for multicollinearity (VIF<5) and for redundancy via least absolute shrinkage and selection operator (LASSO) regression with 10-fold cross-validation. Optimal hyper-parameters for each algorithm were tuned by nested 10-fold grid search. Four machine-learning algorithms, logistic regression (LR), support-vector machine (SVM), decision tree (DT) and random forest (RF), were compared by area under the receiver-operating-characteristic curve (AUC), F1 score and decision-curve analysis. The best model was externally validated and calibrated with bootstrapping.
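The two scalar metrics named above for comparing the algorithms, AUC and F1 score, can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `roc_auc` and `f1_score`, the 0.5 decision threshold, and the toy labels and scores are all hypothetical and are not study data.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 as the harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example (made-up values): 1 = marker-positive, 0 = marker-negative.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.65, 0.3, 0.2, 0.7]
# Hypothetical 0.5 threshold to turn scores into class predictions.
preds = [1 if s >= 0.5 else 0 for s in scores]

print(f"AUC = {roc_auc(labels, scores):.4f}")
print(f"F1  = {f1_score(labels, preds):.4f}")
```

AUC depends only on how the scores rank the cases, whereas F1 depends on the chosen threshold, which is one reason for reporting both when comparing classifiers.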
[RESULTS] The TMB prediction model, which included the MSI index, performed best when built with the RF method, with an area under the ROC curve (AUC) of 0.9597. The MSI status prediction model, which included three indicators of TMB, also performed best with the RF method, with an AUC of 0.8225. The NTRK and PIK3CA gene status prediction model, which included three indicators of TMB and MSI status, likewise performed best when built with the RF method.
[CONCLUSION] The prediction model constructed in this study can help clinicians quickly identify high-risk patients and provide a basis for formulating a reasonable treatment plan. Further optimization of the model and expansion of the sample size are required to verify its power in the future.