Integrating histopathology and immune marker analysis for machine learning-based colorectal cancer prognostics.
APA
Saadh MJ, Saeed TN, et al. (2026). Integrating histopathology and immune marker analysis for machine learning-based colorectal cancer prognostics. Diagnostic Pathology, 21(1). https://doi.org/10.1186/s13000-026-01754-2
MLA
Saadh MJ, et al. "Integrating histopathology and immune marker analysis for machine learning-based colorectal cancer prognostics." Diagnostic Pathology, vol. 21, no. 1, 2026.
PMID
41731487
Abstract
[OBJECTIVE] This study aims to create a machine-learning framework that integrates histopathological imaging, immune marker data, and clinical features to classify tumor types, determine histological grades, and predict survival outcomes in colorectal cancer (CRC) patients.
[MATERIALS AND METHODS] This study analyzed 650 CRC cases, divided into training (520 cases) and testing (130 cases) subsets. Preprocessing involved quality control, color normalization, and selecting tumor regions (tumor center and invasive margins) with uniform dimensions (224 × 224 pixels). Six immune markers (CD3, CD8, CD45RO, PD-1, LAG-3, Tim-3) were selected for their relevance to CRC progression and immune response. A fine-tuned EfficientNet model was used to extract high-dimensional features from the images. These imaging features were combined with clinical data. To refine the dataset, feature selection methods such as PCA, RFE, and LASSO were applied to include only the most relevant variables. Machine learning models (XGBoost, CatBoost, Random Forest) and ensemble models were developed for tumor type and grade prediction. For survival prediction, regression models such as SVR, Random Forest Regressor, and stacking regressors were used. Model performance was evaluated using metrics such as accuracy, AUC, MSE, and the C-index.
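The feature-selection and stacking steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic (generated with scikit-learn's `make_classification`), a Random Forest and a logistic regression stand in for the XGBoost/CatBoost/Random Forest base learners to avoid extra dependencies, and the number of retained features (15) and the binary target are arbitrary choices. Only the 520/130 split is taken from the abstract.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the fused imaging + clinical feature table
# (assumption: this is NOT the study's data).
X, y = make_classification(
    n_samples=650, n_features=40, n_informative=10, random_state=0
)

# Hold out 130 of the 650 cases, mirroring the 520/130 split in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=130, stratify=y, random_state=0
)

# Recursive feature elimination (RFE) driven by a linear model;
# keeping 15 features is an arbitrary, illustrative choice.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=15)

# Stacking ensemble: base learners feed out-of-fold predictions
# to a logistic-regression meta-learner.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

# Putting RFE inside the pipeline means selection is fit on training data only.
model = Pipeline([("rfe", selector), ("stack", stack)])
model.fit(X_train, y_train)

test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
test_acc = accuracy_score(y_test, model.predict(X_test))
print(f"test AUC={test_auc:.3f}, test accuracy={test_acc:.3f}")
```

Wrapping the selector and the ensemble in one pipeline is a design choice made here to keep the held-out cases from influencing which features are kept; the abstract does not say how the study handled this.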
[RESULTS] The evaluation highlighted the value of feature selection and ensemble learning in CRC classification and survival prediction. For tumor type classification, XGBoost with RFE achieved a testing AUC of 86.15% and accuracy of 85.33%, while stacking-based models with RFE performed better, with a testing AUC of 96.73% and accuracy of 96.92%. Histological grade classification followed a similar trend, with stacking-based models achieving a testing AUC of 96.37% and accuracy of 96.92%. In survival prediction, the Stacking Regressor with RFE produced the best results, with testing MSEs of 132.87 for Disease-Free Survival (days) and 115.78 for Survival (days), along with the highest C-index of 0.84 for both tasks.
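For readers less familiar with the survival metrics reported above, the following is a small sketch of how a concordance index (Harrell's C) can be computed for predicted survival times, and of how an MSE converts to a root-mean-square error. The function name, the toy values, and the handling of censoring are illustrative assumptions; the abstract does not state how censored cases were treated, and the RMSE figures are only meaningful if the reported MSEs are in squared days.

```python
import math


def concordance_index(times, predicted, events):
    """Harrell's C for predicted survival times (larger = longer survival).

    A pair (i, j) is comparable when case i has an observed event
    (events[i] is truthy) strictly earlier than case j's time.
    Ties in the prediction count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored case cannot be the earlier member of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if predicted[i] < predicted[j]:
                    concordant += 1.0
                elif predicted[i] == predicted[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example (hypothetical values, in days); the third case is censored.
times = [100, 200, 300, 400]
events = [1, 1, 0, 1]
predicted = [150, 120, 350, 300]
c_index = concordance_index(times, predicted, events)
print(f"C-index = {c_index:.2f}")

# Converting the reported test MSEs to RMSE, assuming units of days squared.
rmse_dfs = math.sqrt(132.87)  # Disease-Free Survival
rmse_os = math.sqrt(115.78)   # Survival
print(f"RMSE: {rmse_dfs:.2f} days (DFS), {rmse_os:.2f} days (survival)")
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so the reported 0.84 means that roughly 84% of comparable patient pairs were ranked in the correct order.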
[CONCLUSIONS] The proposed framework demonstrates the potential of integrating histopathological imaging, immune markers, and machine learning to improve CRC prognosis and enable personalized treatment, setting a benchmark for future computational pathology research.
Highly cited papers by the same first author (5)
- From Motor Proteins to Oncogenic Factors: The Evolving Role of Kinesin Superfamily Proteins in Breast Cancer Development.
- LncRNA CRNDE and HOTAIR: Molecules behind the scenes in the progression of gastrointestinal cancers through regulating microRNAs.
- The role of hypoxia-inducible factor-1α on colon cancer progression and metastasis.
- The hidden messengers: Tumor microenvironment-derived exosomal ceRNAs in gastric cancer progression.
- Dual roles of long non-coding RNAs in thyroid cancer: regulation of programmed cell death pathways.