Uniform processing of diverse sequencing data: A cross-population comparison of colon cancer genomic landscapes from The Cancer Genome Atlas and a Chinese cohort.
APA
Chang HY, Huang CJ, et al. (2026). Uniform processing of diverse sequencing data: A cross-population comparison of colon cancer genomic landscapes from The Cancer Genome Atlas and a Chinese cohort. Cancer epidemiology, biomarkers & prevention : a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology. https://doi.org/10.1158/1055-9965.EPI-24-1709
MLA
Chang HY, et al. "Uniform processing of diverse sequencing data: A cross-population comparison of colon cancer genomic landscapes from The Cancer Genome Atlas and a Chinese cohort." Cancer epidemiology, biomarkers & prevention : a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology, 2026.
PMID
41824468
Abstract
[BACKGROUND] Colon cancer is genetically heterogeneous, necessitating standardized genomic analyses for cross-cohort comparisons. While The Cancer Genome Atlas (TCGA)-colon adenocarcinoma (COAD) is a widely used dataset, its comparability to other ethnically different populations remains unclear. This study systematically compares the genomic characteristics of TCGA-COAD and ChangKang, a Chinese colon cancer cohort, using an identical data-processing pipeline to minimize methodological biases.
[METHODS] Whole-exome sequencing data from both cohorts were uniformly processed to analyze five key genomic features: tumor mutation burden (TMB), microsatellite instability (MSI), significantly mutated genes, mutational signatures, and copy number variation (CNV). Samples were classified into hypermutated and non-hypermutated subgroups for further comparisons.
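The abstract does not state how TMB was computed or where the hypermutated cut-off was placed. The sketch below is a minimal illustration, assuming TMB is taken as somatic mutations per megabase of sequenced target. The cut-offs `HYPER_THRESHOLD` (10 mutations/Mb) and `ULTRA_THRESHOLD` (100 mutations/Mb), the function names, and the example target size are hypothetical placeholders, not values from the study. Because the study splits samples into only two subgroups, ultramutated status is returned here as a flag inside the hypermutated subgroup rather than as a third class.

```python
from typing import Tuple

# Hypothetical cut-offs in mutations per megabase; NOT taken from the study.
HYPER_THRESHOLD = 10.0
ULTRA_THRESHOLD = 100.0


def tumor_mutation_burden(n_mutations: int, target_size_mb: float) -> float:
    """Return TMB as somatic mutations per megabase of sequenced target."""
    if target_size_mb <= 0:
        raise ValueError("target_size_mb must be positive")
    return n_mutations / target_size_mb


def classify(
    tmb: float,
    hyper: float = HYPER_THRESHOLD,
    ultra: float = ULTRA_THRESHOLD,
) -> Tuple[str, bool]:
    """Assign a TMB value to a subgroup and flag ultramutated samples.

    Returns (subgroup, is_ultramutated); ultramutated samples fall inside
    the hypermutated subgroup, mirroring the two-subgroup design.
    """
    subgroup = "hypermutated" if tmb >= hyper else "non-hypermutated"
    return subgroup, tmb >= ultra


if __name__ == "__main__":
    # Hypothetical per-sample mutation counts and a hypothetical 30 Mb target.
    counts = {"S1": 150, "S2": 600, "S3": 9000}
    for sample, n in counts.items():
        tmb = tumor_mutation_burden(n, 30.0)
        print(sample, tmb, classify(tmb))
```

Applying the same function and thresholds to both cohorts is the point of a uniform pipeline: any TMB difference then reflects the samples rather than differences in how the numbers were derived.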
[RESULTS] The TCGA-COAD cohort exhibited a higher overall TMB, driven by a greater proportion of hypermutated samples. However, the hypermutated subgroup of the ChangKang cohort included more ultramutated cases with POLE exonuclease domain mutations, leading to a higher subgroup TMB. MSI was more prevalent in TCGA-COAD, while significantly mutated gene frequencies varied, with lower APC and ACVR2A mutation rates in the ChangKang cohort. CNV patterns were largely similar, though CNV frequencies were higher in TCGA-COAD.
[CONCLUSIONS] Despite differences in subgroup distributions and mutation frequencies, the overall genomic characteristics of colon cancer remain consistent between these ethnically different cohorts. This suggests that cross-population analyses are feasible when standardized processing methods are applied.
[IMPACT] This study provides a systematic, unbiased comparison of TCGA-COAD and the Chinese ChangKang cohort, demonstrating that the genomic characteristics remain largely consistent across ethnically distinct populations.