Bioinformatics combined with machine learning for the identification of malignant transformation markers in colorectal polyps.
APA
He T, Liang C, et al. (2026). Bioinformatics combined with machine learning for the identification of malignant transformation markers in colorectal polyps. Frontiers in molecular biosciences, 13, 1785464. https://doi.org/10.3389/fmolb.2026.1785464
MLA
He T, et al. "Bioinformatics combined with machine learning for the identification of malignant transformation markers in colorectal polyps." Frontiers in molecular biosciences, vol. 13, 2026, pp. 1785464.
PMID
41953046
Abstract
[BACKGROUND] Colorectal polyps are crucial precancerous lesions of colorectal cancer (CRC), but their origin and evolutionary mechanisms remain incompletely understood, which limits the early prevention and control of CRC. This study aimed to screen core genes regulating colorectal tumorigenesis and construct a reliable diagnostic model for CRC.
[METHODS] The edgeR package and weighted gene co-expression network analysis (WGCNA) were first used to analyze the GSE209741 dataset to identify differentially expressed genes (DEGs) and module genes, followed by functional enrichment analysis to reveal core biological pathways and functions. Combined with the GSE161277 single-cell RNA sequencing dataset, 57 epithelial cell-specific regulatory molecules were screened. Based on the TCGA-COADREAD cohort, feature genes were selected by the combined application of the Boruta algorithm, LASSO regression and XGBoost model. Finally, a ridge regression diagnostic model was established using six core genes (EIF2S3, GTF3A, HMGA1, HSP90AB1, PABPC1, S100A11), and its performance was verified in the internal validation set and the external independent cohort GSE41258. Meanwhile, the UALCAN database was used to validate the protein expression levels of core genes in tumor tissues, survival analysis was performed to explore their correlation with CRC prognosis, and qRT-PCR was applied to verify the mRNA expression differences of the six core genes between CRC cell lines (SW480, HCT116) and the normal colorectal epithelial cell line NCM460.
[RESULTS] The diagnostic model exhibited excellent diagnostic efficacy in both internal and external datasets. The UALCAN database confirmed that the protein expression of the six genes was significantly upregulated in CRC tissues. Survival analysis revealed that high expression of EIF2S3 and S100A11 was associated with poor prognosis in CRC patients. qRT-PCR further verified that the mRNA expression levels of the six core genes were significantly elevated in CRC cell lines.
[CONCLUSION] This study identified six key genes regulating colorectal tumorigenesis and constructed a high-performance diagnostic model. These findings provide novel insights into the molecular mechanisms underlying the initiation and progression of CRC, and offer potential biomarkers and therapeutic targets for the clinical diagnosis and treatment of CRC.
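The final modeling step described in [METHODS] (a ridge regression diagnostic model on six genes, checked on held-out data) can be illustrated with a minimal sketch. This is not the authors' code: it assumes the model is an L2-penalized (ridge) logistic classifier, uses simulated expression values rather than TCGA-COADREAD or GSE41258 data, and all function names, the penalty strength `lam`, and the learning settings are hypothetical choices made for illustration. Only the six gene symbols come from the abstract.

```python
import math
import random

# Gene symbols taken from the abstract; everything else below is illustrative.
GENES = ["EIF2S3", "GTF3A", "HMGA1", "HSP90AB1", "PABPC1", "S100A11"]

def make_synthetic(n, seed):
    """Simulated standardized expression (NOT real data): tumor samples
    (label 1) are shifted upward by one unit for every gene."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(1.0 if label else 0.0, 1.0) for _ in GENES])
        y.append(label)
    return X, y

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_ridge_logistic(X, y, lam=1.0, lr=0.1, epochs=500):
    """Batch gradient descent on the logistic loss with an L2 (ridge)
    penalty on the weights; the intercept is left unpenalized."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        w = [wj - lr * (gw + lam * wj) / n for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(X, w, b):
    """Predicted probability of the tumor class for each sample."""
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]

def auc(y, scores):
    """AUC as the probability that a random positive sample scores higher
    than a random negative one (ties count as 0.5)."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Train on one simulated cohort, evaluate on an independently simulated one.
X_train, y_train = make_synthetic(200, seed=0)
X_test, y_test = make_synthetic(100, seed=1)
w, b = fit_ridge_logistic(X_train, y_train, lam=1.0)
test_auc = auc(y_test, predict(X_test, w, b))
print(f"held-out AUC on synthetic data: {test_auc:.3f}")
```

The ridge penalty shrinks all six coefficients toward zero without dropping any of them, which is one plausible reason to use it for the final model after the feature set has already been fixed by a separate selection step. In a real analysis the penalty strength would normally be chosen by cross-validation rather than set by hand.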
Highly cited papers by the same first author (5)
- Value of Multitracer Imaging in Hepatocellular Carcinomas with Different Metastatic Potential.
- Advancements in the study of exosomes in disease diagnosis and treatment.
- Stable Disease without Tumor Shrink Cannot Benefit from Surgery following Immune-Based Therapy in Potentially Resectable Hepatocellular Carcinoma.
- Dual targeting of lipid metabolic reprogramming and immunosuppressive sentinel lymph nodes potentiates anti-metastatic therapy for triple negative breast cancer.
- Twenty-four-week anti-PD-1 antibody regimen promoted HBsAg reduction and concurrently enhanced HBV-specific T cell responses in patients with chronic hepatitis B.