Deep learning-based predictive models for assessing the impact of clinical factors and second primary malignancy on survival in patients with colorectal cancer.
APA
Zhao Z, Zhao M, et al. (2025). Deep learning-based predictive models for assessing the impact of clinical factors and second primary malignancy on survival in patients with colorectal cancer. European journal of medical research, 31(1), 175. https://doi.org/10.1186/s40001-025-03760-4
MLA
Zhao Z, et al. "Deep learning-based predictive models for assessing the impact of clinical factors and second primary malignancy on survival in patients with colorectal cancer." European journal of medical research, vol. 31, no. 1, 2025, pp. 175.
PMID
41476205
Abstract
In this study, we first used a data set of 21,522 patients with colorectal cancer (CRC) and a second primary malignancy (SPM) to develop deep learning models for predicting 1-year, 3-year, and 5-year survival outcomes in patients with CRC who subsequently developed an SPM. Our models demonstrated high performance, achieving area under the curve (AUC) values of 0.850 (95% confidence interval [CI] 0.840-0.861), 0.856 (95% CI 0.850-0.861), and 0.848 (95% CI 0.843-0.853) for the 1-year, 3-year, and 5-year survival predictions, respectively. We then examined the impact of 33 clinical factors on these predictions and found that age, radiation therapy for the SPM, and sex were the most influential factors for 1-year survival, whereas age and metastatic status of the SPM emerged as the most critical predictors for 3-year and 5-year survival. Finally, using one-hot encoding, we evaluated the effects of various SPMs on survival outcomes in patients with CRC and provided clinical interpretations of these findings. Our analysis indicated that patients with second primary prostate cancer and CRC generally have better survival prospects than those with other SPMs, whereas patients with second primary pancreatic and gastric cancers have poor survival outcomes. These findings provide valuable insights into the intricate interactions among CRC, SPM, and various clinical factors, and may thereby improve the treatment and evaluation of patients with CRC and SPM.
Introduction
Colorectal cancer (CRC) is the third most common cancer globally and the second leading cause of cancer-related mortality [1–3]. In the United States, an estimated 153,020 individuals were projected to be diagnosed with CRC and 52,550 to die from the disease in 2023 [3]. However, with the increased emphasis on early cancer screening and significant advancements in diagnostic and therapeutic modalities, the mortality rate for patients with CRC is gradually declining [3, 4]. As survival rates improve, the incidence of second primary malignancy (SPM) in these patients is rising [5]. Notably, patients with multiple primary cancers have poorer survival outcomes, and SPMs increasingly threaten the lives of CRC survivors [5, 6]. These observations suggest that SPMs should be a critical focus of ongoing research and surveillance.
Previous research has primarily focused on identifying the risk factors for SPM in patients with CRC and developing predictive models for their occurrence [7–10]. However, few studies have comprehensively investigated the effect of an SPM on the survival of patients with CRC. A more comprehensive assessment and refined prediction of the impact of an SPM on survival in this patient population is crucial. Such an assessment would provide valuable insights for developing targeted and personalized treatment strategies tailored to patients with CRC, ultimately enhancing their quality of life and extending survival.
In recent years, the application of machine-learning techniques has led to significant advancements in oncology research [11–13]. Reportedly, machine learning offers superior performance and reliability in tumor prediction, risk assessment, and identification of key prognostic factors, achieving substantial improvements over traditional statistical methods [14, 15]. Traditional statistical methods such as the Cox proportional hazards model and linear regression often rely on linear assumptions and predefined models. In contrast, machine-learning methods can capture intricate patterns and interactions within large data sets. Among these, deep learning methods not only model highly complex relationships and feature interactions, but also automatically extract features and patterns [16]. This capability empowers researchers to conduct a more comprehensive analysis of the impact of SPM on patients with CRC, thereby providing a deeper understanding of the complex relationship between SPMs and survival rates in patients with CRC.
This study had two objectives. First, we aimed to develop and validate robust deep learning models for predicting 1-year, 3-year, and 5-year survival outcomes specifically for patients with CRC diagnosed with a single subsequent SPM. We explicitly distinguished this cohort from patients with metastatic disease originating from the primary CRC. Second, we aimed to interpret these models to identify the most significant clinical and tumor-related features that influence survival in this unique and understudied patient population, thereby providing a basis for improved clinical decision-making.
Materials and methods
Data source
Research data were obtained from the Surveillance, Epidemiology, and End Results (SEER) Research Data, 12 Registries, Nov 2023 Sub (1992–2021) in the SEER database (http://seer.cancer.gov) using SEER*Stat version 8.4.3. This analysis focused on patients with CRC and SPMs. Patients with CRC were identified using site and histological codes from the International Classification of Diseases for Oncology (ICD-O-3). Patients with colon cancer were identified using ICD-O-3 site codes C18.0 and C18.2–C18.9, while patients with rectal cancer were identified using C19.9 and C20.9. SPMs were defined as asynchronous invasive solid cancers occurring ≥ 6 months after the initial primary cancer (IPC), based on the modified Warren and Gates criteria from the National Cancer Institute [17], excluding cases presenting evidence of tumor recurrence or metastasis.
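The cohort definition above can be expressed as a small filter. The following is a minimal sketch, not the authors' extraction code: the site codes and the 6-month latency threshold come from the text, while the function names and the idea of passing the interval in months are illustrative assumptions.

```python
from typing import Optional

# Site codes as listed in the text: colon = C18.0 and C18.2-C18.9; rectum = C19.9 and C20.9.
COLON_CODES = {"C18.0"} | {f"C18.{d}" for d in range(2, 10)}
RECTAL_CODES = {"C19.9", "C20.9"}
MIN_SPM_LATENCY_MONTHS = 6  # SPM must occur >= 6 months after the initial primary cancer


def crc_subsite(icd_o3_site: str) -> Optional[str]:
    """Return 'colon', 'rectum', or None when the site code is not a CRC code."""
    code = icd_o3_site.strip().upper()
    if code in COLON_CODES:
        return "colon"
    if code in RECTAL_CODES:
        return "rectum"
    return None


def is_eligible_spm(first_site: str, months_between_diagnoses: float) -> bool:
    """Hypothetical helper: CRC first primary plus a second tumor >= 6 months later."""
    is_crc = crc_subsite(first_site) is not None
    return is_crc and months_between_diagnoses >= MIN_SPM_LATENCY_MONTHS
```

Note that C18.1 (appendix) falls outside the listed range, so the sketch treats it as non-CRC. Excluding recurrences and metastases requires registry fields that are not modeled here.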
Patient selection
The SEER database enabled the identification of SPMs and the sequence and number of multiple malignancies indexed in patients with an initial diagnosis of CRC. To isolate the effect of a subsequent tumor, we restricted our analysis to patients with CRC who developed only one SPM. We screened 21,522 eligible patients from the SEER database using the following inclusion criteria: (1) the primary cancer was CRC and the patient developed exactly one SPM; and (2) detailed survival data were available with a follow-up period of at least 5 years. The exclusion criteria were as follows: (1) patients for whom only death certificates or autopsy records were provided; (2) patients whose cause of death by the end of the study period was non-tumor-related; and (3) patients lacking data on the specific type of SPM.
Data preprocessing
We systematically re-encoded 33 variables, spanning demographic characteristics, clinical features of CRC, treatment information, and attributes related to the SPM, into a numerical format optimized for neural network input. Simultaneously, we discretized complex continuous clinical variables, such as patient survival time, into structured categorical variables to improve the interpretability and enhance the feature-learning capacity of the deep learning model. This strategy strengthened the model’s ability to capture nonlinear relationships among complex clinical factors. The final preprocessed data set was split into a feature matrix of 33 clinical and demographic variables and a binary survival outcome vector, providing a robust foundation for stratified cross-validation and deep neural network training.
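The step of turning continuous survival time into a binary outcome vector can be illustrated as follows. This is a sketch under stated assumptions: the text does not give the exact labeling rule, so the convention here (label 1 when survival time reaches the horizon, measured in months) and the function names are hypothetical.

```python
from typing import Dict, List

HORIZONS_YEARS = (1, 3, 5)  # the three prediction horizons named in the text


def survival_label(survival_months: float, horizon_years: int) -> int:
    """Assumed convention: 1 if the patient survived to the horizon, else 0."""
    return int(survival_months >= 12 * horizon_years)


def build_targets(survival_months: List[float]) -> Dict[int, List[int]]:
    """Build one binary outcome vector per horizon from the same survival times."""
    return {h: [survival_label(m, h) for m in survival_months] for h in HORIZONS_YEARS}
```

Because the inclusion criteria require at least 5 years of follow-up, a simple threshold of this kind does not have to handle patients censored before the horizon. A cohort with shorter follow-up would need explicit censoring logic.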
Construction of the models
The data set in the models comprised 33 features after processing. These features span a broad spectrum of clinical and demographic variables and are categorized as follows.
Demographic information: age, race, sex, marital status at diagnosis, median annual income, and residence.
Characteristics of CRC: initial diagnosis site, grade, histological type, extent, stage-T, stage-N, stage-M, overall stage, tumor size, and regional nodes examined.
Treatment details: surgery, radiation, and chemotherapy for CRC, along with the time from diagnosis to treatment.
Characteristics of SPM: time between tumor diagnoses, stage-T, stage-N, stage-M, overall stage, surgery, radiation, chemotherapy, regional node positivity, tumor size, metastasis, site, and histology type.
Each feature was classified during the preprocessing phase and missing essential values were removed to maintain data integrity. Using the preprocessed data set, three predictive models (Models 1, 2, and 3) were constructed to forecast the 1-year, 3-year, and 5-year survival rates of patients. These models employ deep neural network architectures, each tailored to capture the complex interactions between features and survival outcomes. It is important to note that these three models were developed as independent tools for predicting three distinct clinical endpoints. Although they share the same architecture, they were trained separately. Subsequently, the importance of each feature in these models was calculated and analyzed. This process provides critical insights into the variables that have the most significant impact on survival prediction, thereby enhancing the interpretability and reliability of the models.
Development and analysis of the deep neural network
Model framework
The models were constructed using a deep neural network (DNN) architecture implemented in TensorFlow. The architecture consists of six fully connected layers.
Input layer: The input layer comprises 512 neurons with the ReLU activation function. To prevent overfitting, L2 regularization with a coefficient of 0.0001 is applied, along with batch normalization and a dropout rate of 0.35.
Hidden layers: The network has four hidden layers:
First hidden layer: 256 neurons with ReLU activation, L2 regularization (coefficient: 0.0001), batch normalization, and a dropout rate of 0.35.
Second hidden layer: 128 neurons with ReLU activation, L2 regularization (coefficient: 0.0001), batch normalization, and a dropout rate of 0.3.
Third hidden layer: 64 neurons with ReLU activation, L2 regularization (coefficient: 0.0001), batch normalization, and a dropout rate of 0.3.
Fourth hidden layer: 32 neurons with ReLU activation and L2 regularization (coefficient: 0.0001), designed to capture more refined patterns within the data.
Output layer: A single neuron with a sigmoid activation function that outputs the probability for binary classification. Figure 1 shows a schematic of the DNN.
The model is a sequential neural network built from Dense, BatchNormalization, and Dropout layers. It starts with a dense layer of 512 units, followed by BatchNormalization and Dropout layers. This pattern continues through the dense layers of 256, 128, and 64 units. As listed above, the 32-unit dense layer carries no BatchNormalization or Dropout and feeds the final dense layer, which has a single unit. The architecture was trained independently for the 1-year, 3-year, and 5-year survival predictions.
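The layer stack described above can be written down as a compact specification. The sketch below is an illustration, not the authors' released code. The layer sizes, dropout rates, and L2 coefficient come from the text. The names `LAYER_SPEC`, `count_trainable_params`, and `build_model` are hypothetical, and the input width of 33 assumes the feature matrix without one-hot expansion. TensorFlow is imported inside `build_model` so that the specification and the parameter count can be inspected without it.

```python
# (units, dropout rate or None) for each ReLU Dense block, taken from the layer list.
LAYER_SPEC = [(512, 0.35), (256, 0.35), (128, 0.30), (64, 0.30), (32, None)]
L2_COEFF = 1e-4   # L2 regularization coefficient stated in the text
N_FEATURES = 33   # assumption: 33 input columns, no one-hot expansion


def count_trainable_params(n_features=N_FEATURES, spec=LAYER_SPEC):
    """Count the trainable weights implied by the specification."""
    total, fan_in = 0, n_features
    for units, dropout in spec:
        total += fan_in * units + units   # Dense kernel + bias
        if dropout is not None:
            total += 2 * units            # BatchNormalization gamma + beta
        fan_in = units
    return total + fan_in + 1             # single sigmoid output unit


def build_model(n_features=N_FEATURES):
    """Build the Keras model; requires TensorFlow to be installed."""
    import tensorflow as tf  # lazy import: the code above runs without TensorFlow

    reg = tf.keras.regularizers.l2(L2_COEFF)
    layers = [tf.keras.Input(shape=(n_features,))]
    for units, dropout in LAYER_SPEC:
        layers.append(tf.keras.layers.Dense(units, activation="relu", kernel_regularizer=reg))
        if dropout is not None:  # the 32-unit block has no BatchNormalization/Dropout
            layers.append(tf.keras.layers.BatchNormalization())
            layers.append(tf.keras.layers.Dropout(dropout))
    layers.append(tf.keras.layers.Dense(1, activation="sigmoid"))
    return tf.keras.Sequential(layers)
```

Under these assumptions the network is small, at roughly 0.19 million trainable weights, which is consistent with a tabular data set of about 21,000 patients.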
Optimization strategy
An Adam optimizer with a learning rate schedule based on exponential decay was employed to optimize the model and ensure efficient convergence during training. The model was trained using a binary cross-entropy loss function, and appropriate class weights were calculated and applied to address the class imbalance in the training data.
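The three ingredients named above (class weights, an exponentially decaying learning rate, and binary cross-entropy) can be sketched in plain Python. The text does not report the weighting formula, the initial learning rate, or the decay settings, so the "balanced" weighting rule and every numeric value a caller passes in are assumptions made for illustration.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Assumed 'balanced' rule: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def exponential_decay(initial_lr, decay_rate, decay_steps, step):
    """Continuous exponential decay: lr = initial_lr * decay_rate ** (step / decay_steps)."""
    return initial_lr * decay_rate ** (step / decay_steps)


def weighted_bce(y_true, p_pred, class_weights, eps=1e-7):
    """Mean binary cross-entropy, each sample scaled by the weight of its true class."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += class_weights[y] * loss
    return total / len(y_true)
```

With a 3:1 class ratio, for example, the minority class receives three times the weight of the majority class, so errors on the rarer outcome contribute proportionally more to the loss.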
Model training and validation
To ensure a robust and unbiased evaluation of the model performance, we employed a stratified fivefold cross-validation strategy across all models, preserving the class distribution within each fold. A key advantage of stratified sampling is that it maintains the proportion of survival and mortality cases consistent with the overall data set, which is particularly important for stable training and reliable evaluation of imbalanced survival data.
It is important to note that although our validation methodology was consistent, the data partitions necessarily differed across models for two reasons: (1) the target variable (survival vs. mortality) was defined separately for the 1-year, 3-year, and 5-year prediction horizons, resulting in distinct stratifications; and (2) the preprocessing for the SPM-type analysis required one-hot encoding, which altered the data set structure compared with analyses based solely on clinical factors. Fixed random seeds were used throughout the experiments to ensure reproducibility.
Furthermore, an early stopping mechanism was implemented to prevent overfitting by monitoring the validation loss. For each fold, the best-performing model was retained for subsequent analysis and performance reporting.
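The validation procedure can be sketched without any machine-learning library. The code below shows one way to build stratified folds with a fixed seed and to pick the best epoch under early stopping. The seed, the patience value, and the function names are assumptions, since the text does not report them.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, n_splits=5, seed=42):
    """Yield (train_idx, val_idx) pairs whose class ratios mirror the full data set."""
    rng = random.Random(seed)          # fixed seed for reproducible partitions
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(n_splits)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, i in enumerate(idxs):
            folds[pos % n_splits].append(i)   # deal each class round-robin across folds
    for k in range(n_splits):
        val = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, val


def early_stopping_epoch(val_losses, patience=10):
    """Return the best epoch, stopping once `patience` epochs pass without improvement."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break
    return best_epoch
```

Because the label vector differs for each horizon, calling `stratified_kfold` on the 1-year, 3-year, and 5-year labels produces different partitions even with the same seed, which matches the remark above about partitions differing across models.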
Feature importance calculation
The feature importance was inferred by analyzing the weights of the first dense layer in the trained model. The feature weights were extracted and recorded in each cross-validation fold. Subsequently, the average weights across all folds were calculated to rank the importance of the features, thereby providing insight into the relative importance of each feature in predicting the target variable.
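A weight-based ranking of this kind can be sketched as follows. The text does not state how the weights of one feature are aggregated across the units of the first layer, so the mean absolute weight used here is an assumption, as are the function names. Each kernel is laid out as in a Keras Dense layer, with one row per input feature and one column per unit.

```python
def first_layer_importance(kernel):
    """kernel[i][j] is the weight from input feature i to unit j of the first Dense layer."""
    return [sum(abs(w) for w in row) / len(row) for row in kernel]


def rank_features(fold_kernels, feature_names):
    """Average the per-fold scores and sort features from most to least important."""
    per_fold = [first_layer_importance(k) for k in fold_kernels]
    n_folds = len(per_fold)
    mean_scores = [sum(scores) / n_folds for scores in zip(*per_fold)]
    return sorted(zip(feature_names, mean_scores), key=lambda t: t[1], reverse=True)
```

First-layer weights are only comparable across features when the inputs share a similar scale, and they do not capture what deeper layers do with a feature, so a ranking of this kind is best read as a rough indicator rather than an effect size.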
Outcome evaluation and model performance
The outcome measures of the study were the 1-year, 3-year, and 5-year survival rates of patients with CRC with an SPM. A consistent 8:2 ratio was used to split the data into training and testing sets (using a randomized index). All models underwent fivefold cross-validation to ensure the reliability of the results, and the performance was additionally evaluated on held-out validation sets within each fold to confirm the model generalizability. We primarily assessed the performance of the models using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve and accuracy, each reported with a 95% confidence interval (CI). To assess the overall classification performance, we calculated the precision, recall, and F1-score. Furthermore, to evaluate the reliability of the predicted probabilities, we calculated the Brier score. The models were constructed using Python 3.9.
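The metrics listed above have short standard definitions, shown here in plain Python. This is a reference sketch rather than the evaluation code used in the study. The 0.5 decision threshold is an assumption, and the pairwise AUC is quadratic in the number of samples, which is adequate for illustration but not for large folds.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)  # ties count 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def classification_metrics(y_true, probs, threshold=0.5):
    """Accuracy, precision, recall, and F1 at an assumed decision threshold."""
    pred = [int(p >= threshold) for p in probs]
    tp = sum(1 for y, h in zip(y_true, pred) if y == 1 and h == 1)
    fp = sum(1 for y, h in zip(y_true, pred) if y == 0 and h == 1)
    fn = sum(1 for y, h in zip(y_true, pred) if y == 1 and h == 0)
    tn = sum(1 for y, h in zip(y_true, pred) if y == 0 and h == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}
```

AUC measures ranking quality only, while the Brier score also penalizes miscalibrated probabilities, which is why the two are reported together.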
Assessing the impact of SPM
To accurately assess the impact of different types of second primary tumors on patient survival rates, we one-hot encoded the primary feature of interest, the category of SPM, to handle categorical data. This encoding transforms each category into a distinct binary feature, allowing the model to interpret categorical information effectively. In this part of the study, we adapted the modeling framework accordingly; details are provided in the Supplementary Material.
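One-hot encoding of the SPM category can be sketched in a few lines. The function name and the alphabetical column order are illustrative choices, not details taken from the study.

```python
def one_hot_encode(values, categories=None):
    """Map each categorical value to a binary row with a single 1 in its category column."""
    cats = sorted(set(values)) if categories is None else list(categories)
    index = {c: i for i, c in enumerate(cats)}
    rows = []
    for v in values:
        row = [0] * len(cats)
        row[index[v]] = 1
        rows.append(row)
    return cats, rows
```

For example, `one_hot_encode(["prostate", "pancreas", "prostate"])` yields the columns `["pancreas", "prostate"]` and the rows `[[0, 1], [1, 0], [0, 1]]`. Each SPM site then has its own input column, and therefore its own row of first-layer weights.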
When calculating the feature importance, we followed a method similar to that described above. We focused primarily on the weights associated with the SPM categories. Feature weights were extracted from the model, especially those from the first hidden layer, and analyzed across all folds in the fivefold cross-validation. The average weights were calculated and recorded to provide a robust ranking of the feature importance.
Given that sample size limitations may affect the weight and significance of individual SPM categories in our models, we focused our analysis on 16 SPMs with larger sample sizes to minimize potential bias. These malignancies affect the following sites: colon; rectum; lung and bronchus; prostate; rectosigmoid junction; breast; blood, hematopoietic, and related tissues; pancreas; urinary bladder; kidney; stomach; uterus; lymph nodes; skin; liver; and small intestine. In the subsequent evaluation, we conducted a statistical analysis of the survival status of patients with these specific SPMs to further assess their impact on survival outcomes and to clarify the importance of these tumors in the model and their influence on survival rates.
Results
Participant characteristics
The clinicopathological information of 21,522 patients with CRC and an SPM was obtained from the SEER database. Among the study population, 1,731 (8.0%) were aged < 50 years, 6,749 (31.4%) were aged 50–65 years, and 13,042 (60.6%) were aged > 65 years. Males accounted for 12,147 (56.4%) of the study population, whereas females accounted for 9,375 (43.6%). The initial cancer sites were the colon in 15,301 patients (71.1%), rectosigmoid junction in 1,692 patients (7.9%), and rectum in 4,529 patients (21.0%). Histologically, 19,196 patients (89.2%) had adenomas and adenocarcinomas, whereas 2,326 patients (10.8%) had other types. Table 1 provides a summary of patient characteristics. Detailed demographic characteristics are provided in the Supplementary Materials.
Based on statistical analysis, the five most common sites for SPMs were the colon (8,745 cases, 40.63%), rectum (2,730 cases, 12.68%), lung and bronchus (1,894 cases, 8.80%), prostate (1,277 cases, 5.93%), and rectosigmoid junction (962 cases, 4.47%). The five most prevalent histological types for SPMs were adenocarcinoma, NOS (10,463 cases, 48.62%), adenocarcinoma in tubulovillous adenoma (1,208 cases, 5.61%), adenocarcinoma in adenomatous polyp (1,201 cases, 5.58%), mucinous adenocarcinoma (829 cases, 3.85%), and squamous cell carcinoma, NOS (604 cases, 2.81%). Table 2 details the site and histological types of SPM in patients with CRC.
Performance and analysis of models
For the prediction of 1-year survival rates in patients with CRC with SPM, Model 1 achieved an AUC of 0.850 (95% CI 0.840–0.861) and an accuracy of 0.799 (95% CI 0.789–0.809). Model 2 achieved an AUC of 0.856 (95% CI 0.850–0.861) and an accuracy of 0.784 (95% CI 0.778–0.790) for 3-year survival rate prediction. Model 3 achieved an AUC of 0.848 (95% CI 0.843–0.853) and an accuracy of 0.765 (95% CI 0.760–0.770) for 5-year survival rate prediction. Figure 2 illustrates the performance of these models. The detailed performance metrics are listed in Table 3.
Fig. 2. The curves illustrate the discriminative ability of the deep learning models across different prediction horizons. The plots correspond to (Model 1) 1-year survival prediction, (Model 2) 3-year survival prediction, and (Model 3) 5-year survival prediction. The area under the curve (AUC) is used as the primary metric to quantify the discriminative power of each model.
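For readers who want to reproduce this kind of metric, the sketch below computes an AUC from labels and predicted scores using the rank-based (Mann-Whitney) formulation and attaches a percentile bootstrap interval. The bootstrap is an assumption chosen for illustration: the text reports 95% CIs but this section does not say how they were derived, and they may instead have been computed across the five cross-validation folds. The function names `auc_score` and `bootstrap_auc_ci` are hypothetical.

```python
import numpy as np

def auc_score(y_true, y_score):
    """AUC via the Mann-Whitney U statistic; tied scores get average ranks."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    order = np.argsort(y_score, kind="mergesort")
    sorted_scores = y_score[order]
    ranks = np.empty(len(y_score), dtype=float)
    i = 0
    while i < len(sorted_scores):
        j = i
        while j + 1 < len(sorted_scores) and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1    # average 1-based rank of the tie group
        i = j + 1
    n_pos = int(y_true.sum())
    n_neg = len(y_true) - n_pos
    return float((ranks[y_true].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))

def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumes both classes are present)."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = rng.integers(0, n, size=n)           # resample patients with replacement
        if y_true[idx].min() == y_true[idx].max():
            continue                               # one class missing: AUC undefined
        stats.append(auc_score(y_true[idx], y_score[idx]))
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)

# Toy example: 1 = event observed within the horizon, scores = predicted risk.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = auc_score(labels, scores)                    # 0.75 for this toy input
ci_low, ci_high = bootstrap_auc_ci(labels, scores, n_boot=200)
```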
After visualizing and analyzing the prediction weights of the models, we identified the top 10 factors that contributed most significantly to the predictive survival outcomes. In Model 1, the key factors included ‘Age,’ ‘Radiation of SPM,’ ‘Sex,’ ‘Histology Types of SPM,’ ‘Stage-M of SPM,’ ‘Metastasis of SPM,’ ‘Chemotherapy of CRC,’ ‘Site of SPM,’ ‘Surgery of CRC,’ and ‘Stage-N of SPM.’ For Model 2, the most influential factors were identified as ‘Age,’ ‘Stage-M of SPM,’ ‘Metastasis of SPM,’ ‘Chemotherapy of CRC,’ ‘Site of SPM,’ ‘Surgery of CRC,’ ‘Histology Types of SPM,’ ‘Radiation of SPM,’ ‘Chemotherapy of SPM,’ and ‘Sex.’ Finally, in Model 3, the top contributing factors were ‘Age,’ ‘Metastasis of SPM,’ ‘Stage-M of SPM,’ ‘Chemotherapy of CRC,’ ‘Site of SPM,’ ‘Histology Types of SPM,’ ‘Surgery of CRC,’ ‘Stage-N of SPM,’ ‘Radiation of SPM,’ and ‘Tumor Size of SPM.’ The detailed weight of factors can be found in Fig. 3.
The impact of SPM on survival outcomes
Survival statistics indicated that the overall survival rates were consistently higher for SPMs, such as colon cancer, rectosigmoid junction tumors, rectal cancer, skin cancer, breast cancer, and prostate cancer. Conversely, survival rates were lower for SPMs, such as gastric, liver, pancreatic, lung, and bronchial cancers. Detailed survival statistics are shown in Fig. 4.
Fig. 4. The chart illustrates the variations in survival rates across different second primary malignancy sites. The horizontal axis represents the specific categories of SPMs, while the vertical axis indicates the observed survival rate.
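A per-site survival summary of this kind can be sketched as follows. The record layout (site, survival months, death indicator) and the rule of excluding patients censored before the horizon are assumptions for illustration; the section does not describe how censored patients were treated, and `survival_rate_by_site` is a hypothetical helper.

```python
from collections import defaultdict

def survival_rate_by_site(records, horizon_months):
    """Observed survival rate at a fixed horizon for each SPM site.

    records: iterable of (site, survival_months, died) tuples.
    Patients alive but followed for less than the horizon are skipped,
    because their status at the horizon is unknown (assumption).
    """
    alive = defaultdict(int)
    total = defaultdict(int)
    for site, months, died in records:
        if months < horizon_months and not died:
            continue                    # censored before the horizon
        total[site] += 1
        if months >= horizon_months:
            alive[site] += 1            # survived at least to the horizon
    return {site: alive[site] / total[site] for site in total}

# Toy records, not study data.
records = [
    ("prostate", 70, False),
    ("prostate", 30, True),
    ("pancreas", 5, True),
    ("pancreas", 8, False),
    ("pancreas", 14, True),
]
one_year = survival_rate_by_site(records, 12)
five_year = survival_rate_by_site(records, 60)
```

Excluding early-censored patients is the simplest choice but can bias the estimate; a Kaplan-Meier estimate would use their partial follow-up instead.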
Upon analyzing the predictive weights of each SPM in the models, we found that for 1-year survival predictions, prostate, pancreatic, and gastric cancers had the highest weights, whereas lymphatic system tumors, uterine cancer, and bladder cancer had the lowest weights. For 3-year survival predictions, prostate, pancreatic, and colon cancers had the highest weights, with uterine, lymphatic, and bladder cancers contributing the least. In predicting 5-year survival, prostate, pancreatic, and colon cancers again had the highest weights, whereas lymphoma, skin, and uterine cancers had the lowest weights. Figure 5 shows the specific results for each SPM weight.
Discussion
Previous studies have demonstrated the advantages of machine learning in cancer prediction and prognosis [11, 13, 18]. However, most studies have focused on predicting and assessing the occurrence of SPMs in patients with CRC. To the best of our knowledge, this is the first study to analyze the survival outcomes of patients with CRC and SPM. We developed multiple neural network models using clinical factors to predict the 1-year, 3-year, and 5-year survival rates in these patients. These models demonstrate high predictive performance, thus providing novel perspectives for clinical applications and assessments.
Traditional statistical methods, such as the Cox proportional hazards model, are often limited in their ability to capture complex non-linear relationships between clinical factors and patient survival. In contrast, deep learning methods can recognize these relationships more effectively by learning complex patterns from data without relying on linear assumptions [16]. Analysis of these models allows evaluation of the relative importance of various clinical factors in survival prediction, offering deeper insights into the determinants of patient survival.
In this study, for 1-year survival rates, factors such as ‘Age,’ ‘Radiation of SPM,’ ‘Sex,’ ‘Histology Types of SPM,’ and ‘Stage-M of SPM’ had the most significant impact on predictions. The profound influence of age is both biological and clinical. Older patients often present with diminished physiological reserves, a higher burden of comorbidities, and an age-related decline in immune function [19], which collectively compromise their ability to tolerate aggressive anticancer treatments and mount an effective anti-tumor immune response [20, 21]. Notably, in short-term survival predictions, radiation therapy for SPM substantially affected the survival rates. This is likely due to its efficacy in providing rapid local tumor control, which can quickly slow tumor progression and reduce tumor burden, especially in cases where metastasis has not yet become widespread [22, 23]. Clinically, radiation therapy effectively kills cancer cells and inhibits tumor growth, thereby alleviating symptoms and providing patients with additional survival time, which is particularly critical in the first year post-diagnosis.
For 3-year and 5-year survival rates, the most critical factors were ‘Age,’ ‘Stage-M of SPM,’ ‘Metastasis of SPM,’ ‘Chemotherapy of CRC,’ and ‘Site of SPM.’ As the observation period extended, the impact of metastasis from the second tumor on survival rates became more pronounced. This shift highlights a fundamental principle in oncology: long-term survival is overwhelmingly dictated by the presence or absence of systemic disease [24]. Metastatic tumors are not only anatomically distant; they often exhibit more aggressive biological characteristics, such as higher proliferation rates, increased genomic instability, and acquired drug resistance [25]. Moreover, the establishment of tumors in vital organs makes them more challenging to control with local therapies, such as surgery or radiation, and often leads to multi-organ failure, which accelerates disease progression and ultimately results in decreased long-term survival rates [26].
Across all models, factors such as age, metastatic status of the second tumor, site of the second tumor, histological type of the second tumor, chemotherapy status of the primary tumor, surgical treatment of the primary tumor, and radiation therapy for the second tumor were consistently found to have a significant impact on patient survival rates, emerging as critical indicators for survival assessment and clinical management. Our analysis revealed that indicators associated with SPM influenced survival rates more than those associated with primary tumors. However, it is important to note that the surgical and chemotherapy statuses of the primary tumor significantly affect survival outcomes in patients with CRC and SPM. Moreover, age consistently emerged as a pivotal predictor of survival in all models. As discussed above, this finding likely reflects the critical role of age in prognostication, and is potentially linked to age-associated variations in immune responses, treatment tolerance, and other physiological factors [19, 27].
In addition to clinical features, our models incorporated sociodemographic variables, such as marital status, race, and median annual income. While these factors contributed to the model's predictive performance, it is imperative to interpret them as social determinants of health (SDOH) rather than direct biological drivers of cancer progression. Variables such as income and race often act as proxies for healthcare accessibility, insurance coverage, and the quality of post-treatment surveillance, all of which influence survival outcomes. Similarly, marital status may serve as an indicator of psychosocial support and treatment adherence.
However, integrating these features into deep learning models necessitates caution. There is a risk that predictive models may inadvertently reinforce existing systemic biases if not carefully contextualized. For instance, a lower predicted survival probability driven by sociodemographic factors should not lead to the withholding of aggressive treatment; rather, it should signal a need for enhanced support. Therefore, we emphasize that these predictors should be utilized to identify high-risk groups warranting additional social and medical resources, rather than to justify stratified standards of care. Future research should aim not only to predict outcomes based on these disparities but also to develop strategies to mitigate the impact of SDOH on patient survival.
This study focused on the development of separate deep learning models for 1-year, 3-year, and 5-year survival, which can help translate statistical risk into actionable strategies across different phases of patient care. By identifying the most influential prognostic factors at each time horizon, these models promote a shift from uniform treatment to personalized management of patients with CRC and SPM. For the short-term horizon, where ‘Radiation of SPM’ emerged as one of the dominant predictors, two clinical implications are evident: (i) patients with non-metastatic but locally aggressive SPM may benefit from immediate local control (e.g., surgery or radiation), whereas (ii) frail or older patients with a high predicted early mortality risk may be better suited for less intensive regimens or timely palliative care. For the mid-term horizon, where metastasis of SPM was one of the strongest predictors, the model can inform surveillance intensity: high-risk patients may warrant intensified imaging (CT or PET), whereas low-risk patients may be candidates for de-escalated follow-up, minimizing radiation exposure, cost, and psychological stress. In the long term, the model supports survivorship planning by anticipating late treatment effects and providing clearer prognostic expectations to guide life and health planning.
We subsequently employed deep learning methods to assess the impact of different types of SPM on survival predictions at various timepoints. The analysis revealed that prostate, pancreatic, and gastric cancers and subsequent CRCs were consistently assigned greater predictive weights for both short- and long-term survival. This indicates that these malignancies play a significant role in predicting patient survival and are closely associated with patient prognosis. In contrast, cancers of the uterus, breast, and bladder, as well as lymphoma and skin cancer, had lower weights, suggesting a less pronounced influence on survival outcomes.
To further interpret our model's predictions, we performed a statistical analysis of patient survival status. The results indicated that patients with second primary prostate cancer or CRC generally had higher survival rates, whereas those with second primary pancreatic or gastric cancer had poorer survival outcomes. Combining this statistical analysis with model interpretations, we found that patients with second primary prostate cancer or CRC often had better survival prospects than those with other second primary tumors, likely because of the generally favorable prognosis associated with these cancers [3, 28]. Among these, prostate cancer typically exhibits lower malignancy and slower progression [28]. Moreover, similarities in tumor characteristics and treatment strategies may contribute to improved disease management and prognosis after second primary CRC. Conversely, the lower survival rates observed in patients with second primary pancreatic or gastric cancer may be attributed to higher malignancy and late-stage diagnosis [29, 30]. In addition, there may be an interactive effect between gastric cancer and CRC, where the cumulative effect of these malignancies could accelerate disease progression and decrease survival rates [31, 32].
In our models, the lower weights assigned to uterine cancer, breast cancer, bladder cancer, lymphoma, and skin cancer suggest that these SPMs have a less pronounced impact on predicting the survival of patients with CRC. This finding is likely attributable to the typically lower malignancy and more favorable prognosis associated with these cancer types [1], making their overall effect on survival rates less significant. This underscores the need to focus on the characteristics and prognosis of primary CRC when evaluating patient survival after an SPM.
Interestingly, although prostate cancer is generally considered less malignant, it holds significant weight in survival predictions for patients with an SPM. This may be related to interactions between CRC and prostate cancer [33, 34]. First, both malignancies share key epidemiological risk factors, such as advanced age and obesity [35, 36]. Therefore, patients with this dual diagnosis may represent a subpopulation with a higher comorbidity burden and poorer baseline health, predisposing them to adverse outcomes. Second, the iatrogenic effects of previous treatments are critical. Pelvic radiotherapy for prostate cancer can cause chronic rectal damage, including fibrosis and vascular injury, which can significantly complicate subsequent CRC surgery and management [37, 38], thereby negatively affecting survival. Moreover, prostate cancer and CRC may share some biological mechanisms. The androgen receptor (AR) signaling pathway, a cornerstone of prostate cancer pathogenesis, is also active in a subset of CRCs, suggesting potential biological crosstalk that could influence tumor progression and therapeutic response [31, 39, 40]. In addition, prostate cancer is more closely associated with age, itself a significant prognostic factor, than other tumors are [41]. Future studies should explore the specific mechanisms underlying these associations.
Although this study represents a significant advancement in this field, it has some limitations. First, the model lacks external validation. Although internal robustness was demonstrated through fivefold cross-validation, its generalizability to other institutions or populations remains uncertain. Second, although the deep-learning framework successfully identified complex nonlinear relationships, it could not quantify their exact magnitudes. Third, minor instabilities were observed in some feature weights. To ensure reliability, our analysis focused on the most consistent features and prioritized SPMs with sufficient sample sizes for a robust evaluation. This approach preserved the overall robustness of the analysis, but may have influenced the specific weight estimates. Future multicenter studies with larger and more diverse data sets are necessary to externally validate the model, provide more precise quantification of these correlations, and strengthen its potential for clinical translation.
Conclusions
In this study, we developed prognostic models for patients with CRC with SPM and systematically analyzed the impact of various clinical factors and SPM types on survival outcomes. Our findings enhance the prognostic assessment of these patients and provide valuable insights into tailoring treatment strategies and surveillance protocols. Furthermore, this study highlights the potential of deep learning to advance oncological prognostication and offers new research ideas for applying these techniques to predict the survival of patients with cancer.
Source: PubMed Central (JATS). The license follows the original publisher's policy; please cite the original article when quoting.