
Leveraging Machine Learning for Severity Level-Wise Biomarker Identification in Prostate Cancer Microarray Gene Expression Data.

Biomedicines 2025 Vol.13(10)

Marouf AA, Bismar TA, Ghosh S, Rokne JG, Alhajj R


Cite this paper

APA Marouf AA, Bismar TA, et al. (2025). Leveraging Machine Learning for Severity Level-Wise Biomarker Identification in Prostate Cancer Microarray Gene Expression Data. Biomedicines, 13(10). https://doi.org/10.3390/biomedicines13102350
MLA Marouf AA, et al. "Leveraging Machine Learning for Severity Level-Wise Biomarker Identification in Prostate Cancer Microarray Gene Expression Data." Biomedicines, vol. 13, no. 10, 2025.
PMID 41153637

Abstract

Prostate cancer is the most commonly occurring cancer amongst men. The detection and treatment of this cancer are therefore of great importance. The severity level of this cancer, which is established as a score in the Gleason Grading Group (GGG), guides its treatment. In this paper, traditional machine learning (ML) classification methods such as Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), and XGBoost (XGB), which have recently been shown to accurately identify biomarkers in computational biology, are leveraged to find potential biomarkers for the different GGG scores. An ML framework that maps the GGG into five severity levels (low, intermediate-low, intermediate, intermediate-high, and high) has been developed using the above methods. The microarray data for this ML method have been derived from immunohistochemical tests. The study includes severity level-wise biomarker identification, incorporating missing value imputation, class imbalance handling using the SMOTE-Tomek link method, and stratified k-fold validation to ensure robust biomarker selection. The framework is evaluated on prostate cancer tissue microarray gene expression data from 1119 samples. A combination of high-aggressive and low-aggressive signatures is used in four experimental setups. The results demonstrate the effectiveness of the approach in distinguishing between critical biomarkers with highly accurate models, obtaining 96.85% accuracy using the XGBoost method. Leveraging ML provides a basis for involving domain experts, and the satisfactory results support this. In future work, a physician-in-the-loop approach could be tested to further assess its impact on diagnosis.
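
Two steps named in the abstract, the five-level severity mapping and stratified k-fold validation, can be illustrated with a minimal standard-library Python sketch. It is not the authors' implementation. The one-to-one mapping from grade groups 1 to 5 onto the five labels is an assumption, and the names `SEVERITY_BY_GGG`, `severity_level`, and `stratified_k_fold` are hypothetical helpers introduced here. Imputation, SMOTE-Tomek resampling, and the classifiers are left out.

```python
import random
from collections import defaultdict

# ASSUMPTION: grade groups 1-5 map in order onto the five severity labels
# listed in the abstract; the paper's exact mapping may differ.
SEVERITY_BY_GGG = {
    1: "low",
    2: "intermediate-low",
    3: "intermediate",
    4: "intermediate-high",
    5: "high",
}


def severity_level(ggg):
    """Return the severity label for a grade group (hypothetical helper)."""
    if ggg not in SEVERITY_BY_GGG:
        raise ValueError(f"unknown grade group: {ggg!r}")
    return SEVERITY_BY_GGG[ggg]


def stratified_k_fold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs whose class mix roughly matches the full set."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle within each class, then deal indices round-robin into k folds
    # so that every fold receives a near-equal share of each class.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    # Each fold serves once as the test set; the rest form the training set.
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Toy grade-group labels for illustration only (not data from the paper).
toy_ggg = [1] * 10 + [5] * 5
toy_splits = list(stratified_k_fold(toy_ggg, k=5))
```

With the toy labels above, each of the five test folds holds two samples of grade group 1 and one of grade group 5, so the minority class appears in every fold. Any resampling step would typically be fitted on the training indices only, so that synthetic samples do not leak into the held-out fold.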
