PSM-SMOTE: propensity score matching and synthetic minority oversampling for handling unbalanced microbiome data.
Technical report
- Study design: case-control
APA
Moon J, Liu Z, Park T (2025). PSM-SMOTE: propensity score matching and synthetic minority oversampling for handling unbalanced microbiome data. Genes & genomics, 47(11), 1175-1185. https://doi.org/10.1007/s13258-025-01688-x
MLA
Moon J, et al. "PSM-SMOTE: propensity score matching and synthetic minority oversampling for handling unbalanced microbiome data." Genes & genomics, vol. 47, no. 11, 2025, pp. 1175-1185.
PMID
41045399
Abstract
[BACKGROUND] Predictive models using microbiome data often suffer from covariate imbalance and class imbalance, biasing results. Propensity Score Matching (PSM) balances covariates but reduces sample size, while borderline synthetic minority oversampling technique (borderline-SMOTE) oversamples minority classes but can generate uninformative examples.
[OBJECTIVE] To develop and evaluate PSM-SMOTE, a novel hybrid sampling method that integrates PSM and borderline-SMOTE to handle both covariate and class imbalance in microbiome data.
[METHODS] We developed PSM-SMOTE, a three-step hybrid sampling algorithm for microbiome data: (1) PSM at four caliper levels to balance covariates, (2) selection of at least ten robust differential markers via seven statistical tests with false discovery rate correction, and (3) application of borderline-SMOTE on the marker-based distance matrix to oversample minority classes. We evaluated PSM-SMOTE on three publicly available microbiome case-control datasets: pancreatic ductal adenocarcinoma (PDAC), colorectal cancer (CRC), and obesity, using logistic regression (LR), random forest (RF), and support vector machine (SVM) classifiers. Performance was assessed via area under the ROC curve (AUC).
[RESULTS] PSM-SMOTE improved test AUCs in multiple model-dataset combinations compared with using PSM alone. Notably, for the RF model, PSM-SMOTE consistently enhanced AUC across nearly all oversampling settings in the PDAC and obesity cohorts. For the SVM model, PSM-SMOTE also achieved a significant AUC increase in the CRC cohort. For the LR model, PSM-SMOTE showed modest improvement under strict matching.
[CONCLUSION] PSM-SMOTE effectively addresses dual imbalance in microbiome data and consistently enhances performance, providing a practical solution for imbalanced data analyses.
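The three steps named in METHODS can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes propensity scores have already been estimated (for example by logistic regression on the covariates), uses greedy 1:1 caliper matching, uses Benjamini-Hochberg as one common false discovery rate correction, and measures distance as plain Euclidean distance on marker vectors rather than the paper's marker-based distance matrix. The function names `caliper_match`, `bh_adjust`, and `borderline_smote` are hypothetical, and the four caliper levels, seven statistical tests, and ten-marker minimum from the abstract are not reproduced.

```python
import math
import random


def caliper_match(case_ps, control_ps, caliper):
    """Greedy 1:1 nearest-neighbour matching on propensity scores.

    Returns (case_index, control_index) pairs whose score difference is
    within `caliper`. Unmatched samples are dropped, which is why PSM
    reduces the sample size.
    """
    available = set(range(len(control_ps)))
    pairs = []
    for i, ps in enumerate(case_ps):
        if not available:
            break
        j = min(available, key=lambda c: (abs(control_ps[c] - ps), c))
        if abs(control_ps[j] - ps) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def borderline_smote(minority, majority, k, n_new, rng):
    """Borderline-SMOTE1-style oversampling; needs at least two minority samples."""
    labelled = [(x, 1) for x in minority] + [(x, 0) for x in majority]
    minority_pool = [(x, 1) for x in minority]

    def neighbours(x, pool):
        # k nearest points to x in pool, excluding x itself (by identity)
        others = [p for p in pool if p[0] is not x]
        return sorted(others, key=lambda p: math.dist(p[0], x))[:k]

    # "Danger" samples: at least half, but not all, of the k neighbours are majority.
    danger = []
    for x in minority:
        n_major = sum(1 for _, label in neighbours(x, labelled) if label == 0)
        if k / 2 <= n_major < k:
            danger.append(x)

    # Interpolate between a danger sample and one of its minority neighbours.
    synthetic = []
    while danger and len(synthetic) < n_new:
        x = rng.choice(danger)
        nn, _ = rng.choice(neighbours(x, minority_pool))
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


# Toy usage with made-up numbers (not data from the paper).
pairs = caliper_match([0.30, 0.80], [0.32, 0.50, 0.78, 0.10], caliper=0.05)
print(pairs)  # [(0, 0), (1, 2)]
print(bh_adjust([0.01, 0.04, 0.03, 0.5]))
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
majority = [[3.0, 3.0], [2.5, 2.5], [3.0, 2.0], [2.0, 3.0]]
print(borderline_smote(minority, majority, k=5, n_new=3, rng=random.Random(0)))
```

In the toy data only the minority point at (2, 2) sits near the class boundary, so all synthetic samples are interpolated from it. A tighter caliper returns fewer matched pairs, which illustrates the sample-size cost of strict matching that the oversampling step is meant to offset.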