Machine Learning for Predicting Colorectal Cancer-Specific Mortality: The Role of Socioeconomic Inequalities in Public Policy.
- Follow-up period: 24 months
APA
Delpino FM, Friebel R, et al. (2026). Machine Learning for Predicting Colorectal Cancer-Specific Mortality: The Role of Socioeconomic Inequalities in Public Policy. ANZ Journal of Surgery. https://doi.org/10.1111/ans.70572
MLA
Delpino FM, et al. "Machine Learning for Predicting Colorectal Cancer-Specific Mortality: The Role of Socioeconomic Inequalities in Public Policy." ANZ Journal of Surgery, 2026.
PMID
41847910
Abstract
[BACKGROUND] Colorectal cancer remains a leading cause of mortality worldwide. We investigated whether adding socioeconomic information to machine learning models can improve the prediction of colorectal cancer-specific mortality.
[METHODS] Using data from the Fundação Oncocentro de São Paulo (FOSP), we analyzed individuals diagnosed with colorectal cancer between 2000 and 2023. Predictive models, however, were developed using only patients diagnosed from 2000 to 2021, ensuring a minimum follow-up of 24 months for the 2-year mortality outcome. Thirty predictor variables were included, comprising clinical factors associated with the disease and socioeconomic factors such as income, educational attainment, and HDI components, as well as distance and travel time to healthcare facilities. We tested seven machine learning algorithms using a 70/30 training/testing split. Discrimination was measured by the area under the receiver operating characteristic curve (AUC-ROC), comparing model versions with and without socioeconomic factors.
[RESULTS] The Random Forest algorithm provided the best discrimination for predicting the risk of death due to colorectal cancer within 2 years after diagnosis (AUC-ROC = 0.92). The addition of socioeconomic and access-related predictors (Human Development Index [HDI] components, education, distance/travel time to healthcare facilities, and type of coverage) improved the AUC-ROC by 0.13 (from 0.79 to 0.92) compared with the clinical-only model.
[CONCLUSION] Including socioeconomic variables alongside clinical data in machine learning models has the potential to improve the prediction of colorectal cancer-specific mortality.
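The evaluation design described in the Methods (a 70/30 split, then AUC-ROC compared between a clinical-only model and a model that also uses socioeconomic predictors) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy labels, and the two score lists below are hypothetical stand-ins for the output of any fitted classifier, and the AUC-ROC is computed with the standard pairwise-ranking definition rather than a library call.

```python
import random


def split_70_30(rows, seed=0):
    """Shuffle a copy of rows and return (train, test) with a 70/30 split."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = (7 * len(shuffled)) // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def auc_roc(labels, scores):
    """AUC-ROC: probability that a random positive case is scored higher
    than a random negative case, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical test-set labels (1 = died of colorectal cancer within 2 years).
labels = [1, 1, 1, 0, 0, 0, 0, 1]

# Hypothetical predicted risks from two models scored on the same patients.
clinical_only_scores = [0.90, 0.40, 0.60, 0.50, 0.30, 0.70, 0.20, 0.35]
clinical_plus_ses_scores = [0.90, 0.70, 0.80, 0.50, 0.30, 0.75, 0.20, 0.65]

auc_clinical = auc_roc(labels, clinical_only_scores)
auc_full = auc_roc(labels, clinical_plus_ses_scores)
auc_gain = auc_full - auc_clinical

# Toy patient identifiers, only to show the split proportions.
train_ids, test_ids = split_70_30(list(range(10)))

print(f"clinical-only AUC-ROC: {auc_clinical:.4f}")
print(f"clinical + socioeconomic AUC-ROC: {auc_full:.4f}")
print(f"gain: {auc_gain:.4f}")
```

The toy numbers are illustrative only and do not reproduce the reported 0.79 and 0.92. In practice both models would be fitted on the training portion and scored on the same held-out patients, so that the difference in AUC-ROC reflects the added predictors rather than a different test set.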