Profile-guided Hybrid Approach for block-wise missing data handling in multi-omics: a breast cancer case study.

BioData mining 2026 Vol.19(1)

Abdelaziz EH, Amin E, Ismail R, Mabrouk M

Cite this paper

APA Abdelaziz EH, Amin E, et al. (2026). Profile-guided Hybrid Approach for block-wise missing data handling in multi-omics: a breast cancer case study. BioData mining, 19(1). https://doi.org/10.1186/s13040-026-00530-8
MLA Abdelaziz EH, et al. "Profile-guided Hybrid Approach for block-wise missing data handling in multi-omics: a breast cancer case study." BioData mining, vol. 19, no. 1, 2026.
PMID 41862959

Abstract

[BACKGROUND] Block-wise missingness is a common challenge in multi-omics data, hindering the development of robust and generalizable machine learning models, as real-world cohorts rarely contain complete omic profiles. Many current methods either discard incomplete samples, use available-case models that need retraining when faced with new missingness patterns, or depend on full-dataset imputation, which can risk biological integrity and model stability.

[METHODS] Using a complete four-omics breast cancer dataset (705 patients, 1,937 features), up to 60% block-wise missingness was simulated across five clinically relevant scenarios and used to compare four strategies for handling missing data: an Imputation-Based model, Dynamic and Exhaustive Available-Case approaches, and the proposed Hybrid Approach that combines profile-guided modeling with selective, test-time imputation. Performance was evaluated using accuracy, F1 score, balanced accuracy, inference time, and variability across 15 random seeds, with significance assessed using the Wilcoxon signed-rank test.
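The simulation step described above can be illustrated with a minimal sketch. It is not the authors' procedure: the block names, the choice of which samples are affected, and the rule that every sample keeps at least one omics block are all assumptions made for illustration, and "60%" is read here as the share of samples that lose one or more whole blocks.

```python
import random
from collections import Counter

# Placeholder block names: the abstract reports four omics layers but does not name them.
BLOCKS = ["omics_1", "omics_2", "omics_3", "omics_4"]


def simulate_blockwise_missingness(n_samples, blocks, missing_rate, seed=0):
    """Return one availability profile per sample (tuple of booleans, one per block).

    Assumption: a fraction `missing_rate` of samples loses between one and
    len(blocks) - 1 entire blocks, so no sample is left with zero data.
    """
    rng = random.Random(seed)
    n_affected = round(missing_rate * n_samples)
    affected = set(rng.sample(range(n_samples), n_affected))

    profiles = []
    for i in range(n_samples):
        available = [True] * len(blocks)
        if i in affected:
            n_drop = rng.randint(1, len(blocks) - 1)
            # Drop whole blocks, not individual features: this is what makes
            # the missingness "block-wise".
            for j in rng.sample(range(len(blocks)), n_drop):
                available[j] = False
        profiles.append(tuple(available))
    return profiles


# 705 patients as in the abstract; the seed value is arbitrary.
profiles = simulate_blockwise_missingness(705, BLOCKS, missing_rate=0.60, seed=0)

# Samples sharing the same tuple share a missingness profile.
profile_counts = Counter(profiles)
n_incomplete = sum(1 for p in profiles if not all(p))
```

Grouping samples by their availability tuple (`profile_counts`) gives the set of distinct missingness profiles, which is the unit that available-case and profile-guided strategies operate on. Repeating the call with different `seed` values mirrors the multi-seed evaluation described in the abstract.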

[RESULTS] The Hybrid Approach consistently achieved the strongest and most stable performance. Relative to the complete-data baseline, it reached an average accuracy of 103.7%, F1 score of 123.3%, and balanced accuracy of 104.8%, outperforming the Imputation-Based method and matching or exceeding both Dynamic and Exhaustive Available-Case strategies. Statistical testing confirmed that these improvements were significant. The method also demonstrated fast and predictable inference (~2 s) and an average total runtime of ~49 s per configuration, about 2.5 times faster than the Exhaustive approach (~124 s), while maintaining high reproducibility and low variance across seeds, a key indicator of computational stability.
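The relative figures above can be read with a short sketch. It assumes that "relative to the complete-data baseline" means a method's score divided by the baseline score, expressed as a percentage; the abstract does not spell out the formula. The raw accuracies below are hypothetical illustration values, not results from the paper. Only the two runtimes are taken from the abstract.

```python
def relative_to_baseline(score, baseline_score):
    """Express a score as a percentage of the baseline (assumed definition)."""
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")
    return 100.0 * score / baseline_score


def speedup(slow_seconds, fast_seconds):
    """How many times faster the fast method is than the slow one."""
    if fast_seconds <= 0:
        raise ValueError("fast_seconds must be positive")
    return slow_seconds / fast_seconds


# Hypothetical raw accuracies, chosen only to show that a value above 100%
# means the method scored higher than the complete-data baseline.
example_relative = relative_to_baseline(score=0.83, baseline_score=0.80)

# Approximate total runtimes reported in the abstract (seconds per configuration).
runtime_ratio = speedup(slow_seconds=124.0, fast_seconds=49.0)
```

Under this reading, a relative accuracy of 103.7% means the Hybrid Approach scored slightly above the model trained on complete data, and the reported runtimes correspond to a ratio of roughly 2.5.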

[CONCLUSION] By selectively combining lightweight imputation with profile-specific modeling, the Hybrid Approach provides a computationally efficient and statistically robust solution for block-wise missing data. This framework offers a generalizable strategy for multi-omics data mining, and lays the foundation for future systems incorporating cross-profile learning and advanced imputation.