Benchmarking pathology foundation models for predicting microsatellite instability in colorectal cancer histopathology.
- Sample size (n): 409
APA
Bilal M, Gulzar MA, et al. (2026). Benchmarking pathology foundation models for predicting microsatellite instability in colorectal cancer histopathology. Computerized medical imaging and graphics : the official journal of the Computerized Medical Imaging Society, 127, 102680. https://doi.org/10.1016/j.compmedimag.2025.102680
MLA
Bilal M, et al. "Benchmarking pathology foundation models for predicting microsatellite instability in colorectal cancer histopathology." Computerized medical imaging and graphics : the official journal of the Computerized Medical Imaging Society, vol. 127, 2026, pp. 102680.
PMID
41352179
Abstract
The rapid evolution of pathology foundation models necessitates rigorous benchmarking for clinical tasks. We evaluated three leading foundation models, UNI, Virchow2, and CONCH, for predicting microsatellite instability status from colorectal cancer whole-slide images, an essential routine clinical test. Our comprehensive framework assessed stain, tissue, and resolution invariance using datasets from The Cancer Genome Atlas (TCGA, USA; n = 409) and Pathology Artificial Intelligence Platform (PAIP, South Korea; training n = 47, testing n = 21 and n = 78). We developed an efficient pipeline with minimal preprocessing, omitting stain normalization, color augmentation, and tumor segmentation. To improve contextual encoding, we applied a five-crop strategy per patch, averaging embeddings from the center and four peripheral crops. We compared three slide-level aggregation and four efficient adaptation strategies. CONCH, using 2-cluster aggregation and ProtoNet adaptation, achieved top balanced accuracies (0.775 and 0.778) in external validation on PAIP. Conversely, UNI, with mean-averaging aggregation and ANN adaptation, excelled in TCGA cross-validation (0.778) but not in external validation (0.764), suggesting potential overfitting. The proposed five-crop augmentation enhances robustness to scale in UNI and CONCH and reflects intrinsic invariance achieved by Virchow2 through large-scale pretraining. For prescreening, CONCH demonstrated specificity of 0.65 and 0.45 at sensitivities of 0.90 and 0.94, respectively, highlighting its effectiveness in identifying stable cases and minimizing the number of rapid molecular tests needed. Interestingly, a fine-tuned ResNet34 adaptation achieved superior performance (0.836) in the smaller internal validation cohort, suggesting that current training recipes for pathology foundation models may not sufficiently generalize without task-specific fine-tuning.
Interpretability analyses using CONCH's multimodal embeddings identified plasma cells as key morphological features differentiating microsatellite instability from stability, validated by pathologists (accuracy up to 92.4%). This study underscores the feasibility and clinical significance of adapting foundation models to enhance diagnostic efficiency and patient outcomes.
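
The five-crop strategy and mean-averaging aggregation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract specifies only "the center and four peripheral crops", so placing the peripheral crops at the four corners is an assumption. All function names are hypothetical, and `toy_encoder` is a stand-in for a foundation model such as UNI, Virchow2, or CONCH; it returns two simple intensity statistics so that the example runs without model weights.

```python
from statistics import fmean
from typing import Callable, List, Sequence

Patch = List[List[float]]                  # grayscale patch as rows of pixel values
Encoder = Callable[[Patch], List[float]]   # maps a crop to an embedding vector


def crop(patch: Patch, top: int, left: int, size: int) -> Patch:
    """Return a size x size window whose upper-left corner is (top, left)."""
    return [row[left:left + size] for row in patch[top:top + size]]


def five_crops(patch: Patch, size: int) -> List[Patch]:
    """Center crop plus four corner crops (corner placement is an assumption)."""
    height, width = len(patch), len(patch[0])
    if size > height or size > width:
        raise ValueError("crop size exceeds patch size")
    offsets = [
        ((height - size) // 2, (width - size) // 2),  # center
        (0, 0),                                       # top-left
        (0, width - size),                            # top-right
        (height - size, 0),                           # bottom-left
        (height - size, width - size),                # bottom-right
    ]
    return [crop(patch, top, left, size) for top, left in offsets]


def mean_vectors(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    return [fmean(column) for column in zip(*vectors)]


def five_crop_embedding(patch: Patch, size: int, encoder: Encoder) -> List[float]:
    """Average the encoder outputs over the five crops of one patch."""
    return mean_vectors([encoder(c) for c in five_crops(patch, size)])


def slide_embedding(patches: Sequence[Patch], size: int, encoder: Encoder) -> List[float]:
    """Mean-averaging aggregation: average the patch embeddings of one slide."""
    return mean_vectors([five_crop_embedding(p, size, encoder) for p in patches])


def toy_encoder(crop_pixels: Patch) -> List[float]:
    """Hypothetical stand-in for a foundation model: [mean, max] intensity."""
    flat = [value for row in crop_pixels for value in row]
    return [fmean(flat), max(flat)]
```

In practice the encoder would be a pretrained network and the patches would be image tensors; the averaging logic stays the same. The 2-cluster aggregation and the adaptation heads (ProtoNet, ANN) mentioned in the abstract are not covered by this sketch.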
MeSH Terms
Microsatellite Instability; Humans; Colorectal Neoplasms; Benchmarking; Artificial Intelligence