Improving Colorectal Cancer Screening and Risk Assessment through Predictive Modeling on Medical Images and Records.
APA
Jiang S, Robinson C, et al. (2026). Improving Colorectal Cancer Screening and Risk Assessment through Predictive Modeling on Medical Images and Records. The American Journal of Pathology, 196(2), 493-504. https://doi.org/10.1016/j.ajpath.2025.09.016
MLA
Jiang S, et al. "Improving Colorectal Cancer Screening and Risk Assessment through Predictive Modeling on Medical Images and Records." The American Journal of Pathology, vol. 196, no. 2, 2026, pp. 493-504.
PMID
41109668
Abstract
Colonoscopy screening effectively identifies and removes polyps before they progress to colorectal cancer (CRC), but current follow-up guidelines rely primarily on histopathologic features, overlooking other important CRC risk factors. Variability in polyp characterization among pathologists also hinders consistent surveillance decisions. Advances in digital pathology and deep learning enable the integration of pathology slides and medical records for more accurate progression risk prediction. Using data from the New Hampshire Colonoscopy Registry, including longitudinal follow-up, a transformer-based model for histopathology image analysis was adapted to predict 5-year progression risk. Multi-modal fusion strategies were further explored to combine clinical records with deep learning-derived image features. Training the model to predict intermediate clinical variables improved 5-year progression risk prediction [area under the receiver-operating characteristic curve (AUC), 0.630] compared with direct prediction (AUC, 0.615; P = 0.013). Integrating whole-slide imaging-based model predictions with nonimaging features further improved performance (AUC, 0.672), significantly outperforming the nonimaging-only approach (AUC, 0.666; P = 0.002). These results highlight the value of integrating diverse data modalities with computational methods to enhance progression risk stratification.
Materials and Methods
Study Population
The NHCR is a National Cancer Institute–funded, statewide registry that contains comprehensive longitudinal colonoscopy information from nearly all endoscopy sites in New Hampshire since 2004. It includes patient risk factors, such as: age; sex; personal and family history of polyps or CRC; weight; height; smoking status; alcohol consumption; endoscopy history; polyp sizes, locations, numbers, and treatment; pathology reports; follow-up recommendations; and follow-up outcomes.65,66 These data are extracted through a rigorous data collection effort from 31 participating practices in addition to questionnaire responses from patients. The NHCR is unique in the United States in terms of its detailed, population-based, longitudinal, and comprehensive data set. Dartmouth Hitchcock Medical Center (DHMC), a tertiary academic medical center in Lebanon, New Hampshire, has been participating in the NHCR from its start in 2004, and, among patients who have their information recorded in the NHCR, >30,000 are DHMC patients. The histology slides of these DHMC patients are stored at the Department of Pathology and Laboratory Medicine at DHMC and were available for the current study.
The sources of data in this project include the NHCR67,68 and pathology slides from the DHMC. A total of 2598 patients who underwent colonoscopy at the DHMC from 2004 to 2018 without a CRC diagnosis at the index visit, and who had digitized polyp slides from their baseline colonoscopies with CRC status reassessed after 5 years, were included in this study. After excluding 205 patients with missing clinical data, 2393 patients were included in the training and evaluation of the proposed models. This data set encompasses hematoxylin and eosin–stained whole-slide images (WSIs) with various types of polyps, along with patients’ clinical records from the NHCR.
Outcome
The NHCR and DHMC follow-up medical records are used to identify patients at high risk of CRC and to build the progression risk reference standard labels for patients. Based on polyp recurrence rate, CRC progression time, and the recommended frequency for follow-up colonoscopies,11,12 patients in this study who developed CRC, advanced adenomatous polyps, or serrated polyps with dysplasia in the 5 years after screening were considered as high risk; patients without those developments within 5 years were considered low risk. Advanced adenomatous polyps include polyps ≥1 cm, with villous components (tubulovillous adenoma/villous adenoma), or with high-grade dysplasia.69 Advanced adenomas and serrated polyps ≥1 cm or with dysplasia are known as surrogates for CRC and are widely used as indicators of high progression risk.70, 71, 72, 73, 74 The 5-year risk window is chosen to maximize the clinical utility, based on the use case in this project and the current guidelines for follow-up colonoscopy intervals.11,12
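The labeling rule above can be expressed as a small function. The following is a minimal sketch, not the study's code: the `Finding` record and its field names are hypothetical, and the rule follows the stated definition (CRC, advanced adenoma, or serrated polyp with dysplasia within the 5-year window).

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One follow-up finding within 5 years of the index colonoscopy (hypothetical schema)."""
    kind: str                          # "crc", "adenoma", or "serrated"
    size_cm: float = 0.0
    villous: bool = False              # tubulovillous or villous adenoma
    high_grade_dysplasia: bool = False
    dysplasia: bool = False            # any dysplasia; used here for serrated polyps


def is_advanced_adenoma(f: Finding) -> bool:
    # Advanced adenoma: >= 1 cm, villous component, or high-grade dysplasia.
    return f.kind == "adenoma" and (
        f.size_cm >= 1.0 or f.villous or f.high_grade_dysplasia
    )


def is_high_risk(findings: list[Finding]) -> bool:
    """Return True if any finding in the 5-year window meets the high-risk definition."""
    for f in findings:
        if f.kind == "crc" or is_advanced_adenoma(f):
            return True
        if f.kind == "serrated" and f.dysplasia:
            return True
    return False
```

A patient with no qualifying finding in the window is labeled low risk, which matches the binary reference standard described in the text.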
Clinical Features and Image Data
Under the review and approval of the Committee for the Protection of Human Subjects, the following information from the NHCR database was extracted: i) identifiers of patients with tissue removed during colonoscopy, including the pathology Case-ID, used to locate tissue slides and access WSIs; ii) relevant medical information; iii) types, numbers, and sizes of polyps identified in the index colonoscopy examination; and iv) outcome determination within 5 years after the index colonoscopy.
The medical information was collected from NHCR Procedure Forms, completed by endoscopists or endoscopy nurses at participating sites, and patient questionnaire responses. The NHCR database covers a comprehensive list of CRC risk factors based on peer-reviewed publications.65,66 The variables extracted from the NHCR database are summarized in five categories, as shown in Supplemental Table S1.
Hematoxylin and eosin–stained WSIs, scanned at DHMC (Aperio AT2; Leica Biosystems), were processed by using the MaskHIT pipeline. Briefly, a color thresholding technique was used to create tissue masks. Non-overlapping patches of size 224 μm × 224 μm (ie, 448 × 448 pixels) at a 20× magnification level (0.5 μm per pixel) were extracted, along with their positions on the WSI.
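The patch geometry described above can be illustrated with a short sketch. It is not the MaskHIT pipeline itself: the near-white cutoff, the minimum tissue fraction, and the use of a downsampled thumbnail are assumptions made for illustration. Only the patch size (448 pixels) and resolution (0.5 μm per pixel) come from the text.

```python
PATCH_PX = 448            # patch edge in pixels at 20x (from the text)
MICRONS_PER_PIXEL = 0.5   # resolution at 20x (from the text)


def patch_size_um(patch_px=PATCH_PX, mpp=MICRONS_PER_PIXEL):
    """Physical patch edge length in micrometers."""
    return patch_px * mpp


def is_tissue(rgb, background_cutoff=220):
    """Assumed thresholding rule: a pixel is tissue unless it is near-white."""
    return min(rgb) < background_cutoff


def patch_grid(thumb, scale, patch_px=PATCH_PX, min_tissue=0.5):
    """List the top-left (x, y) full-resolution coordinates of tissue patches.

    thumb: 2D list of (r, g, b) tuples from a downsampled slide.
    scale: full-resolution pixels per thumbnail pixel; patch_px is
           assumed to be a multiple of scale.
    """
    step = patch_px // scale
    rows, cols = len(thumb), len(thumb[0])
    origins = []
    # Stepping by a full patch width keeps the patches non-overlapping.
    for ty in range(0, rows - step + 1, step):
        for tx in range(0, cols - step + 1, step):
            cells = [thumb[y][x]
                     for y in range(ty, ty + step)
                     for x in range(tx, tx + step)]
            tissue_fraction = sum(is_tissue(c) for c in cells) / len(cells)
            if tissue_fraction >= min_tissue:
                origins.append((tx * scale, ty * scale))
    return origins
```

The returned coordinates play the role of the patch positions that are stored alongside each patch and later passed to the transformer.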
Risk Prediction Using WSIs
The MaskHIT architecture is used for predicting 5-year progression risk using WSIs. MaskHIT can effectively model the relative positional information of patches on a large region from the WSI. In the pretraining phase, the Masked AutoEncoder technique was used, first randomly masking out a portion of patches from the sampled region, then using the output from the transformer model to restore the feature representations of those masked locations. This process helps the model capture relationships between different patches and their histopathologic features and understand the context. The original MaskHIT model was pretrained using >10 cancer types from The Cancer Genome Atlas database (https://tcga-data.nci.nih.gov/docs/publications/tcga, last accessed October 30, 2025). The MaskHIT model achieved improved performance in cancer survival prediction and cancer subtype classification tasks compared with state-of-the-art models.
The workflow of the MaskHIT model involves the extraction of square regions, comprising up to 400 patches, from WSIs. Subsequently, a ResNet model, pretrained on ImageNet data,75 is used for feature extraction. The location information of the patches, along with the extracted features, is then fed into a transformer model, which comprises eight attention heads and 12 attention layers. The output of the transformer model yields a class token, serving as a representation of the entire region. Multiple regions can be sampled from a single WSI concurrently, and the class tokens are averaged to generate a global summarization of the WSI. This global summarization is then used for risk classification through a linear projection layer (Figure 1, A and C).
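The final aggregation step, averaging region-level class tokens and applying a linear projection, can be sketched as follows. The weights and bias are placeholders, and the sigmoid that converts the projected value into a probability is an assumption; the text specifies only the averaging and the linear projection layer.

```python
import math


def mean_pool(class_tokens):
    """Average the per-region class tokens into one slide-level vector."""
    n = len(class_tokens)
    return [sum(column) / n for column in zip(*class_tokens)]


def risk_probability(class_tokens, weights, bias=0.0):
    """Linear projection of the pooled vector, followed by an assumed sigmoid."""
    pooled = mean_pool(class_tokens)
    logit = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))
```

Because the pooling is a plain average, the same function handles four regions during fine-tuning and up to 64 regions at evaluation time without modification.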
To tailor the MaskHIT model pretrained on The Cancer Genome Atlas database to the specific context of polyp analysis, an additional pretraining phase for 200 epochs was conducted, using the same pretraining methodology as previously described.76 During the fine-tuning stage, the model randomly sampled four nonoverlapping regions from each patient. Each region comprised up to 400 patches, each of size 448 × 448 pixels at a 20× magnification level. To mitigate computational costs of fine-tuning, 25% of the patches were randomly sampled from each region. During the evaluation phase, a maximum of 64 regions were sampled from each patient across all slides for that patient to estimate progression risk.
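The fine-tuning sampling scheme (four regions per patient, 25% of the patches within each region) can be sketched as below. The input is assumed to be a list of already non-overlapping regions, each given as a list of patch identifiers; how regions are laid out on the slide is not covered here.

```python
import random


def sample_patches(regions, n_regions=4, keep_fraction=0.25, rng=None):
    """Pick up to n_regions regions, then keep a random fraction of each region's patches.

    regions: list of regions, each a list of patch identifiers
             (assumed to be non-overlapping already).
    """
    rng = rng or random.Random()
    chosen = rng.sample(regions, min(n_regions, len(regions)))
    sampled = []
    for region in chosen:
        k = max(1, round(len(region) * keep_fraction))
        sampled.append(rng.sample(region, k))
    return sampled
```

With regions of 400 patches, each sampled region contributes 100 patches per step, which is where the reduction in fine-tuning cost comes from.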
Beyond the direct risk prediction using WSIs, an alternative training approach, named “guided prediction” (Figure 1E), was explored. In this procedure, a MaskHIT model was initially fine-tuned to predict intermediate variables derived from patients’ pathology reports at the index visit. Subsequently, this model was used to predict patients’ future progression risk. Two strategies were compared: i) freezing the weights of the transformer model and exclusively fine-tuning the last linear projection layer for outcome prediction; and ii) fine-tuning both the transformer model weights and the last linear projection layer. The guided prediction approach uses intermediate variables only during the training procedure to assist the MaskHIT model in focusing on more relevant regions in the WSIs.
Combination of Clinical and Image Histopathologic Information
The nonimaging variables (Figure 1B) underwent preprocessing, in which continuous variables were standardized and categorical variables were one-hot encoded, resulting in a feature vector of dimension 69. Missing values were imputed by replacing them with either the average value for continuous variables or the most common class for categorical variables.
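The preprocessing described above (mean or mode imputation, standardization, one-hot encoding) can be sketched with the standard library. The column values below are invented for illustration, and the order of operations (impute first, then standardize using the population standard deviation) is an assumption.

```python
from collections import Counter
from statistics import mean, pstdev


def impute_and_standardize(values):
    """Fill missing continuous values (None) with the mean, then z-score the column."""
    observed = [v for v in values if v is not None]
    mu = mean(observed)
    filled = [mu if v is None else v for v in values]
    sd = pstdev(filled) or 1.0   # guard against a constant column
    return [(v - mu) / sd for v in filled]


def impute_and_one_hot(values):
    """Fill missing categories (None) with the most common class, then one-hot encode."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    filled = [mode if v is None else v for v in values]
    categories = sorted(set(filled))
    rows = [[1 if v == c else 0 for c in categories] for v in filled]
    return rows, categories
```

In a train/test setting, the mean, standard deviation, mode, and category list would normally be estimated on the training split and reused on the test split; the sketch omits that separation for brevity.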
Common approaches for risk prediction, such as penalized logistic regression, multi-layer perceptron, and random forest, were explored for modeling nonimaging variables. Each approach was evaluated through cross-validation on a subset withheld from the training data to select the optimal architecture for modeling nonimaging variables (Figure 1D).
Various strategies for integrating nonimaging variables with WSI features were investigated, including feature-level aggregation and decision-level aggregation (Figure 1F). Feature-level aggregation, an early fusion technique, concatenated nonimaging features with those extracted from the transformer output of WSIs. Subsequently, patient outcomes were predicted by using multiple layers of linear projections. In decision-level fusion, predicted risk probabilities from nonimaging variables and WSIs were combined either through averaging or by assigning different weights to each component.
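The two fusion families can be contrasted in a few lines. This is a sketch under stated assumptions: the function names are hypothetical, and the weight `w_wsi` is treated as a tunable value (for example, chosen by cross-validation), since the text does not report how the weights were set.

```python
def concat_features(wsi_features, clinical_features):
    """Feature-level (early) fusion: one joint vector for a downstream classifier."""
    return list(wsi_features) + list(clinical_features)


def weighted_fusion(p_wsi, p_clinical, w_wsi=0.5):
    """Decision-level (late) fusion: weighted average of two predicted risk probabilities.

    w_wsi = 0.5 reduces to simple averaging.
    """
    if not 0.0 <= w_wsi <= 1.0:
        raise ValueError("w_wsi must lie in [0, 1]")
    return [w_wsi * a + (1.0 - w_wsi) * b for a, b in zip(p_wsi, p_clinical)]
```

Decision-level fusion keeps the two models independent, so each can be trained and inspected separately before their outputs are combined.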
Evaluation
In this study, 25% of the data set (ie, slides and records for 600 patients) was held out as the test set for evaluation of the developed methods, and the remaining 75% was used as the training set. A fivefold cross-validation was conducted on the training data set for hyperparameter tuning. The area under the receiver-operating characteristic curve (AUC) was used to assess the model’s performance. To ensure a more robust estimate of model performance, the train/test splitting process was repeated 10 times, and the average performance along with the SD on the test splits were reported. Paired t-tests across repeated experiments were used to calculate the statistical significance (P < 0.05) of the compared methods.
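The evaluation protocol can be sketched with the standard library: repeated random train/test splits, AUC computed from pairwise comparisons of positive and negative scores, and a paired t statistic over the repeated experiments. Whether the original splits were stratified is not stated, so the sketch uses plain random splits. Converting the t statistic to a P value requires a t distribution with `n - 1` degrees of freedom, which a statistics package would normally supply.

```python
import math
import random
from statistics import mean, stdev


def repeated_splits(n_patients, test_fraction=0.25, repeats=10, seed=0):
    """Return `repeats` random (train_indices, test_indices) pairs."""
    rng = random.Random(seed)
    n_test = round(n_patients * test_fraction)
    splits = []
    for _ in range(repeats):
        order = list(range(n_patients))
        rng.shuffle(order)
        splits.append((order[n_test:], order[:n_test]))
    return splits


def roc_auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def paired_t_statistic(aucs_a, aucs_b):
    """Paired t statistic for two methods evaluated on the same repeated splits."""
    diffs = [a - b for a, b in zip(aucs_a, aucs_b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
```

The pairwise AUC is quadratic in the number of cases, which is acceptable for a test set of a few hundred patients; rank-based implementations are preferable for larger sets.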
Model Interpretation and Visualization
A significant limitation of current deep learning methods is their black-box nature, in which the focus is primarily on the efficacy of the final results, with little attention given to providing clear explanations or evidence of the factors that contribute to these outcomes. To address this issue and gain a deeper understanding of the pertinent regions on WSI influencing risk predictions, the difference in attention scores between the pretrained transformer model and the transformer model fine-tuned for outcome prediction was computed. These attention score differences were then color-coded and overlaid onto the WSIs, enabling insights into the shift in model attention for each specific outcome prediction task.
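The attention-difference overlay can be sketched as an element-wise subtraction followed by a diverging color map. The red/white/blue scheme and the scaling by the largest absolute difference are assumptions for illustration; the text states only that the differences were color-coded and overlaid on the WSI.

```python
def attention_difference(finetuned, pretrained):
    """Element-wise difference of two attention maps laid out on the same patch grid."""
    return [[f - p for f, p in zip(f_row, p_row)]
            for f_row, p_row in zip(finetuned, pretrained)]


def to_red_blue(diff):
    """Map differences to RGB: increased attention is red, decreased is blue, no change is white."""
    peak = max(abs(v) for row in diff for v in row) or 1.0

    def color(v):
        t = v / peak                      # scaled to [-1, 1]
        if t >= 0:
            fade = round(255 * (1 - t))
            return (255, fade, fade)
        fade = round(255 * (1 + t))
        return (fade, fade, 255)

    return [[color(v) for v in row] for row in diff]
```

Each cell of the resulting grid corresponds to one patch, so the colors can be drawn at the patch positions recorded during extraction.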
For the multi-modal fusion model that integrates nonimaging information with the WSI, interpretability was enhanced by calculating Shapley values for both nonimaging features and WSI risk predictions. These Shapley values were aggregated across repeated experiments, and the average scores were plotted for visualization, providing a transparent depiction of the contributions of each feature to the model’s predictions.
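For a model with only a few inputs, Shapley values can be computed exactly by enumerating feature coalitions. The sketch below does this with "absent" features replaced by a baseline value; the baseline choice, the toy linear model, and the three feature names in the usage example are assumptions. The cost grows as 2^n, so a model with many features would need an approximation instead of this exhaustive form.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input."""
    n = len(x)

    def value(subset):
        # Features outside the coalition are set to their baseline value.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = set(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Usage with a toy linear model over hypothetical inputs:
# [number of adenomas, standardized age, WSI-predicted risk].
def toy_model(z):
    return 0.3 * z[0] + 0.5 * z[1] + 0.2 * z[2]


contributions = shapley_values(toy_model, x=[2.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
```

The contributions sum to the difference between the prediction at `x` and the prediction at the baseline, which is the property that makes averaged Shapley values across repeated experiments comparable between features.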
Results
Description of Study Population
A description of the demographic features of the study population is presented in Supplemental Table S2. Of 2393 patients, 1994 (83.3%) remained in the low-risk category after 5 years, whereas 399 (16.7%) developed high-risk findings. The patients who developed high progression risk in 5 years were significantly older than those who remained in the low progression risk category (62.0 years versus 58.7 years; P < 0.001) and were more likely to be male (60.7% versus 51.9%; P = 0.002). Much of the study population was non-Hispanic White, and the distribution of race and ethnicity did not differ by risk group. Descriptions of other groups of variables are provided in Supplemental Tables S3 to S7.
Risk Prediction Using WSIs
In the direct prediction of 5-year progression risk using WSIs, the MaskHIT model attained an average AUC of 0.615. Multiple intermediate variables were evaluated, including size and number of adenomas, size and number of serrated lesions, most advanced serrated lesion, most advanced adenoma, and all of them combined (Table 1).
MaskHIT exhibited robust predictive performance for various intermediate variables, with notable AUC values. The highest AUC was achieved when predicting the most advanced serrated lesion (AUC, 0.927 ± 0.007), followed closely by predictions for the most advanced adenoma (AUC, 0.902 ± 0.004). The prediction of the number of adenomas found in colonoscopy yielded a slightly lower AUC at 0.800 ± 0.007. Overall, MaskHIT exhibited effective predictive capabilities across a range of intermediate variables.
These colonoscopy findings predicted 5-year progression risk with varying performance (Table 1). Measurements of the size and number of adenomas were better at predicting 5-year progression risk than measurements of serrated lesions. The best predictor among them was the number of adenomas (AUC, 0.643 ± 0.029), whereas the AUCs obtained using measurements of serrated lesions were no better than a random guess. Measurements of the most advanced adenoma or serrated lesion, although still contributing to prediction, achieved an AUC of approximately 0.55 in forecasting 5-year progression risk.
Using the guided attention approach, in which the MaskHIT model was initially fine-tuned for intermediate variables and subsequently fine-tuned for 5-year progression risk prediction, most intermediate variables exhibited an enhancement in outcome prediction performance. The best performance was observed when using the size of the largest known serrated lesion as the intermediate variable, achieving an AUC of 0.629 ± 0.016, although this variable itself cannot predict 5-year progression risk better than a random guess. When incorporating all colonoscopy variables as intermediate variables, the MaskHIT model achieved an average AUC of 0.622 ± 0.015 when the transformer backend was frozen. Further fine-tuning the transformer backend for risk prediction resulted in an average AUC of 0.630 ± 0.016, representing a statistically significant improvement compared with the direct prediction approach.
Risk Prediction Using Medical Records
The performance comparison of L2 penalized logistic regression, random forest, and neural network models for predicting 5-year progression risk using nonimaging variables is presented in Table 2. There was no clear winner among these three prediction methods. Variables extracted from the index colonoscopy examinations displayed the best performance in predicting 5-year progression risk (AUC, 0.654–0.662), followed by personal history–related variables (AUC, 0.588–0.593). Previous colonoscopy history variables showed limited capability to predict 5-year progression risk, with an AUC of 0.514 to 0.547. Medical and family history variables did not seem to contribute significantly to progression risk prediction.
Multi-Modal Prediction
Table 3 compares different fusion strategies, including decision-level average and weighting, and the incorporation of WSI-predicted risk score and WSI-extracted features with nonimaging features. The results were stratified according to the strategy of fine-tuning the MaskHIT model for 5-year risk prediction. In both cases, the best multi-modal fusion performance was achieved when using the weighted average of the independent probabilities from WSIs and the nonimaging information (direct prediction training AUC, 0.669 ± 0.019; guided prediction training AUC, 0.672 ± 0.019). On average, decision-level fusion provided not only improved performance but also lower variation across the 10 repeated experiments compared with feature-level fusion.
In Table 4, the 5-year progression risk prediction performances resulting from diverse combinations of medical records, colonoscopy findings, and WSI risk predictions are presented. In this experiment, weighted decisions were used to fuse WSI-based predicted probabilities with the predicted probability from nonimaging features. Using medical record variables (all nonimaging variables excluding the index colonoscopy findings) or colonoscopy-only findings yielded an average AUC of 0.592 ± 0.032 and 0.662 ± 0.030, respectively, whereas the combination of both showed some improvements (AUC, 0.666 ± 0.023).
Incorporating WSI-predicted risk scores led to noteworthy improvements. Specifically, the combination of colonoscopy findings with WSI risk scores presented an average AUC value of 0.668 ± 0.025, whereas the combination of all three modalities further improved the AUC to 0.672 ± 0.019; both improvements were statistically significant (P = 0.037 and 0.002, respectively).
Model Interpretation
Attention Map Visualization
The attention maps obtained from the MaskHIT model for two representative WSIs are presented in Figure 2A for a high-risk patient and Figure 2B for a low-risk patient. The visualization reveals that the MaskHIT model tends to focus more on the structures of polyps within the WSIs. Interestingly, the high attention areas appear similar regardless of whether guided fine-tuning methods were used.
The intensity of attention weights from the direct prediction approach and the guided prediction approach was further examined by calculating the difference in attention weights between these two methods. The results are presented in panel “Ag-Ad” of Figure 2. The redder color in these panels indicates that the highlighted region received higher attention from the guided prediction model compared with the direct prediction model. This visualization shows that the regions receiving higher attention from the guided prediction model generally align with the regions attended by both the direct prediction model and the guided prediction model. In essence, the guided prediction model exhibited greater confidence in assigning weights to regions that were deemed important for risk prediction.
Feature Importance Ranking
The top 10 most important features influencing the output of the final fusion model are presented in Figure 3. The most influential feature is the number of adenomas, showing a positive association with 5-year progression risk. Notably, the predicted risk probability from the WSI was ranked as the third most important feature in the fusion model, exceeded only by the number of adenomas and age.
Description of Study Population
A description of the demographic features of the study population is presented in Supplemental Table S2. Of 2393 patients, 1994 (83.3%) remained in the low-risk category after 5 years, whereas 399 (16.7%) developed high-risk findings. The patients who developed high progression risk in 5 years were significantly older than those who remained in the low progression risk category (62.0 years versus 58.7 years; P < 0.001) and were more likely to be male (60.7% versus 51.9%; P = 0.002). Much of the study population was non-Hispanic White, and the distribution of race and ethnicity did not differ by risk group. Descriptions of other groups of variables are provided in Supplemental Tables S3 to S7.
Risk Prediction Using WSIs
In the direct prediction of 5-year progression risk using WSIs, the MaskHIT model attained an average AUC of 0.615. Multiple intermediate variables were evaluated, including size and number of adenomas, size and number of serrated lesions, most advanced serrated lesion, most advanced adenoma, and all of them combined (Table 1).
MaskHIT exhibited robust predictive performance for various intermediate variables, with notable AUC values. The highest AUC was achieved when predicting the most advanced serrated lesion (AUC, 0.927 ± 0.007), followed closely by predictions for the most advanced adenoma (AUC, 0.902 ± 0.004). The prediction of the number of adenomas found in colonoscopy yielded a slightly lower AUC at 0.800 ± 0.007. Overall, MaskHIT exhibited effective predictive capabilities across a range of intermediate variables.
These colonoscopy findings predicted 5-year progression risk with varying performance (Table 1). Measurements of the size and number of adenomas were better at predicting 5-year progression risk than measurements of serrated lesions. The best predictor among them was the number of adenomas (AUC, 0.643 ± 0.029), whereas the AUCs obtained using measurements of serrated lesions were no better than a random guess. The most advanced adenoma and the most advanced serrated lesion, although still contributing to prediction, each achieved an AUC of approximately 0.55 in forecasting 5-year progression risk.
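As a reading aid for the "AUC, mean ± spread" values reported here, the following is a minimal sketch of how such a summary could be computed. It assumes the ± values are SDs over repeated experiments, which the text suggests but this section does not state outright. The function names `roc_auc` and `summarize_runs` and the toy data are hypothetical and are not taken from the study code.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """AUC as the chance that a random positive case outranks a random
    negative case, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def summarize_runs(aucs):
    """Mean and sample SD of AUCs from repeated experiments (assumed protocol)."""
    return mean(aucs), stdev(aucs)


# Toy data only: 1 = progressed to high risk within 5 years, 0 = stayed low risk.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to the "random guess" level mentioned above, which is why values near 0.55 indicate only weak standalone predictive value.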
Using the guided attention approach, in which the MaskHIT model was initially fine-tuned for intermediate variables and subsequently fine-tuned for 5-year progression risk prediction, most intermediate variables exhibited an enhancement in outcome prediction performance. The best performance was observed when using the size of the largest known serrated lesion as the intermediate variable, achieving an AUC of 0.629 ± 0.016, although this variable itself cannot predict 5-year progression risk better than a random guess. When incorporating all colonoscopy variables as intermediate variables, the MaskHIT model achieved an average AUC of 0.622 ± 0.015 when the transformer backend was frozen. Further fine-tuning the transformer backend for risk prediction resulted in an average AUC of 0.630 ± 0.016, representing a statistically significant improvement compared with the direct prediction approach (AUC, 0.615; P = 0.013).
Risk Prediction Using Medical Records
The performance comparison of L2 penalized logistic regression, random forest, and neural network models for predicting 5-year progression risk using nonimaging variables is presented in Table 2. There was no clear winner among these three prediction methods. Variables extracted from the index colonoscopy examinations displayed the best performance in predicting 5-year progression risk (AUC, 0.654–0.662), followed by personal history–related variables (AUC, 0.588–0.593). Previous colonoscopy history variables showed limited capability to predict 5-year progression risk, with AUCs of 0.514 to 0.547. Medical and family history variables did not seem to contribute significantly to progression risk prediction.
Multi-Modal Prediction
Table 3 compares different fusion strategies, including decision-level averaging and weighting, and the incorporation of WSI-predicted risk scores and WSI-extracted features with nonimaging features. The results were stratified according to the strategy of fine-tuning the MaskHIT model for 5-year risk prediction. In both cases, the best multi-modal fusion performance was achieved when using the weighted average of the independent probabilities from WSIs and the nonimaging information (direct prediction training AUC, 0.669 ± 0.019; guided prediction training AUC, 0.672 ± 0.019). On average, decision-level fusion provided not only improved performance but also lower variation across the 10 repeated experiments compared with feature-level fusion.
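The weighted decision-level fusion described above can be illustrated with a short sketch. It assumes that each modality yields one predicted probability per patient and that a single scalar weight is selected on held-out validation data. The grid search, the function names, and the toy numbers are illustrative assumptions, not the authors' implementation.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) AUC; ties between a positive and a negative count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def fuse(p_wsi, p_nonimaging, w):
    """Weighted average of two probability lists; w is the weight on the WSI model."""
    if len(p_wsi) != len(p_nonimaging):
        raise ValueError("both modalities must score the same patients")
    return [w * a + (1.0 - w) * b for a, b in zip(p_wsi, p_nonimaging)]


def pick_weight(labels, p_wsi, p_nonimaging, grid=None):
    """Choose the fusion weight that maximizes AUC on a validation split.

    Hypothetical selection rule: an 11-point grid from 0.0 to 1.0.
    The weight should be chosen on validation data, never on the test set.
    """
    grid = grid or [i / 10 for i in range(11)]
    return max(grid, key=lambda w: roc_auc(labels, fuse(p_wsi, p_nonimaging, w)))


# Toy validation data only (1 = high-risk findings at 5 years).
val_labels = [0, 0, 1, 1]
val_p_wsi = [0.2, 0.6, 0.4, 0.9]
val_p_nonimaging = [0.5, 0.1, 0.3, 0.8]

best_w = pick_weight(val_labels, val_p_wsi, val_p_nonimaging)
fused = fuse(val_p_wsi, val_p_nonimaging, best_w)
```

Because the grid includes 0.0 and 1.0, the selected weight can never do worse on the validation split than either modality alone. Setting the weight to 0.5 recovers the plain decision-level average listed in Table 3.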
In Table 4, the 5-year progression risk prediction performances resulting from diverse combinations of medical records, colonoscopy findings, and WSI risk predictions are presented. In this experiment, weighted decisions were used to fuse WSI-based predicted probabilities with the predicted probability from nonimaging features. Using medical record variables (all nonimaging variables excluding the index colonoscopy findings) or colonoscopy-only findings yielded an average AUC of 0.592 ± 0.032 and 0.662 ± 0.030, respectively, whereas the combination of both showed some improvements (AUC, 0.666 ± 0.023).
Incorporating WSI-predicted risk scores led to noteworthy improvements. Specifically, the combination of colonoscopy findings with WSI risk scores presented an average AUC value of 0.668 ± 0.025, whereas the combination of all three modalities further improved the AUC to 0.672 ± 0.019; both improvements were statistically significant (P = 0.037 and 0.002, respectively).
Model Interpretation
Attention Map Visualization
The attention maps obtained from the MaskHIT model for two representative WSIs are presented in Figure 2A for a high-risk patient and Figure 2B for a low-risk patient. The visualization reveals that the MaskHIT model tends to focus more on the structures of polyps within the WSIs. Interestingly, the high attention areas appear similar regardless of whether guided fine-tuning methods were used.
The intensity of attention weights from the direct prediction approach and the guided prediction approach was further examined by calculating the difference in attention weights between these two methods. The results are presented in panel “Ag-Ad” of Figure 2. The redder color in these panels indicates that the highlighted region received higher attention from the guided prediction model compared with the direct prediction model. This visualization shows that the regions receiving higher attention from the guided prediction model generally align with the regions attended by both the direct prediction model and the guided prediction model. In essence, the guided prediction model exhibited greater confidence in assigning weights to regions that were deemed important for risk prediction.
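The "Ag-Ad" comparison can be sketched as a per-patch subtraction of two attention maps. The sketch below assumes that each map stores one non-negative weight per patch coordinate and that both maps are normalized to sum to 1 before subtraction. The normalization step, the data layout, and the function names are assumptions made for illustration, since this section does not specify them.

```python
def normalize(attn):
    """Rescale non-negative patch weights so that they sum to 1."""
    total = sum(attn.values())
    if total <= 0:
        raise ValueError("attention weights must have a positive sum")
    return {patch: weight / total for patch, weight in attn.items()}


def attention_difference(attn_guided, attn_direct):
    """Per-patch difference Ag - Ad between guided and direct attention.

    Positive values mark patches that the guided model weights more heavily,
    that is, the redder regions in the difference panel.
    """
    if attn_guided.keys() != attn_direct.keys():
        raise ValueError("both maps must cover the same patches")
    g = normalize(attn_guided)
    d = normalize(attn_direct)
    return {patch: g[patch] - d[patch] for patch in g}


# Toy maps keyed by (row, column) patch position on the slide.
guided = {(0, 0): 3.0, (0, 1): 1.0}
direct = {(0, 0): 1.0, (0, 1): 1.0}
diff = attention_difference(guided, direct)
```

Under this normalization the differences sum to zero, so any region that gains attention in the guided model is balanced by regions that lose it.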
Feature Importance Ranking
The top 10 most important features influencing the output of the final fusion model are presented in Figure 3. The most influential feature is the number of adenomas, showing a positive association with 5-year progression risk. Notably, the predicted risk probability from the WSI was ranked as the third most important feature in the fusion model, exceeded only by the number of adenomas and age.
Discussion
The accurate prediction of future progression risk is crucial for informed decisions regarding follow-up colonoscopy visits. Existing guidelines recommend leveraging polyp characteristics identified in colonoscopy examinations, as well as some personal and family history risk factors, for patient risk stratification to determine the timing of subsequent colonoscopies.11 This study sought to advance future progression risk prediction by integrating automatic deep learning–based analysis of WSIs and incorporating CRC-related medical information in a predictive multi-modal pipeline.
Relying exclusively on colonoscopy findings resulted in an average AUC of 0.662. However, by incorporating deep learning–predicted probabilities and information from medical records, a statistically significant improvement in the prediction AUC to 0.672 was observed. This finding underscores the potential of leveraging advanced computational techniques and multi-modal data fusion to enhance progression risk assessment beyond conventional guidelines. Such an approach provides a more robust foundation for personalized and effective follow-up strategies in clinical practice.
To enhance the prediction performance using WSIs, the recently developed model MaskHIT was adopted and adapted. MaskHIT is a transformer-based method that leverages the location information of patches extracted from the entire slide. The unique aspect of the transformer model as a patch-level feature fusion technique lies in its capacity to incorporate spatial details, enabling the deep learning model to capture high-level structural information of the polyps. This approach stands in contrast to commonly used multiple-instance learning approaches, offering a more nuanced and comprehensive representation of the intricate characteristics of colorectal polyps in the predictive model.
In addition, experiments involving a guided prediction approach to improve the transformer model for 5-year progression risk prediction were conducted. As shown in Table 1, predicting 5-year progression risk using WSIs is challenging, as many factors beyond histopathologic features from colonoscopy examinations can influence future progression. Consequently, the MaskHIT model may face difficulties in accurately identifying visual features linked to progression risk in this complex context.
To tackle this challenge, a guided prediction approach was adopted, enabling the transformer model to first predict histopathologic features derived from the colonoscopy examination. Notably, MaskHIT exhibited strong performance in this task, with AUCs exceeding 0.8 and considerably smaller SDs than those in risk prediction tasks. Subsequently, fine-tuning the MaskHIT model for risk prediction led to a statistically significant improvement compared with the direct prediction approach. Interestingly, certain variables, although ineffective at predicting future progression risk independently, also contributed to enhancing MaskHIT’s accuracy in 5-year progression risk prediction.
Attention map visualizations supported the hypothesis, revealing that the guided prediction model assigned greater attention weights to locations relevant for risk prediction (ie, polyps) compared with the direct prediction model. This nuanced approach shows the effectiveness of leveraging the guided prediction approach to enhance the interpretability and performance of deep learning models in the context of 5-year progression risk prediction from WSIs.
Various approaches for combining information from colonoscopy examinations, WSIs, and medical records were also explored. In general, decision-level fusion produced superior results compared with feature-level models that combine nonimage features with risk predictions from the slides. Due to the high predictive value of colonoscopy variables, the signal from WSI predictions can be easily overwhelmed by noise in clinical features, a known issue in multi-modal fusion.77 However, through the application of decision-level fusion techniques, this challenge can be addressed, resulting in improved outcomes compared with using either modality in isolation, consistent with findings in previous studies.62,63
As future steps, the multi-modal progression risk model will be validated using additional data sets, including prospective and multicenter cohorts. The ultimate goal is to integrate the model-derived risk score into existing guideline-based risk stratification for patients undergoing screening or surveillance colonoscopy. By complementing established criteria, such as polyp type, size, and number, the model could help personalize follow-up intervals, identifying patients who may benefit from earlier surveillance as well as those suitable for extended intervals. The potential health outcomes and cost implications of such an approach could also be assessed through follow-up clinical trials and prospective studies.
Conclusions
In this study, the integration of the transformer-predicted risk score and additional clinical information resulted in an improvement in the performance of progression risk stratification. Notably, variables describing colonoscopy and microscopy findings of polyps were identified as contributors to enhanced performance in predicting 5-year progression risk using deep learning models. Despite its simplicity in multi-modal fusion, decision-level fusion showed superior performance improvements when combining imaging and nonimaging information. Future research is essential to refine deep learning methods to include more related clinical information and to evaluate the additional benefits of an accurate progression risk stratification in colonoscopy screening programs.
Disclosure Statement
None declared.
Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please cite the original article when quoting.