
PSM-SMOTE: propensity score matching and synthetic minority oversampling for handling unbalanced microbiome data.

Genes & genomics, 2025, Vol. 47(11), pp. 1175-1185

Moon J, Liu Z, Park T


Key clinical statistics (automatically extracted from the abstract; verification against the original is recommended)
  • Study design: case-control

Cite this paper

APA Moon J, Liu Z, Park T (2025). PSM-SMOTE: propensity score matching and synthetic minority oversampling for handling unbalanced microbiome data. Genes & genomics, 47(11), 1175-1185. https://doi.org/10.1007/s13258-025-01688-x
MLA Moon J, et al. "PSM-SMOTE: propensity score matching and synthetic minority oversampling for handling unbalanced microbiome data." Genes & genomics, vol. 47, no. 11, 2025, pp. 1175-1185.
PMID 41045399

Abstract

[BACKGROUND] Predictive models using microbiome data often suffer from covariate imbalance and class imbalance, biasing results. Propensity Score Matching (PSM) balances covariates but reduces sample size, while borderline synthetic minority oversampling technique (borderline-SMOTE) oversamples minority classes but can generate uninformative examples.

[OBJECTIVE] To develop and evaluate PSM-SMOTE, a novel hybrid sampling method that integrates PSM and borderline-SMOTE to handle both covariate and class imbalance in microbiome data.

[METHODS] We developed PSM-SMOTE, a three-step hybrid sampling algorithm for microbiome data: (1) PSM at four caliper levels to balance covariates, (2) selection of at least ten robust differential markers via seven statistical tests with false discovery rate correction, and (3) application of borderline-SMOTE on the marker-based distance matrix to oversample minority classes. We evaluated PSM-SMOTE on three publicly available microbiome case-control datasets: pancreatic ductal adenocarcinoma (PDAC), colorectal cancer (CRC), and obesity, using logistic regression (LR), random forest (RF), and support vector machine (SVM) classifiers. Performance was assessed via area under the ROC curve (AUC).
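
The first and third steps of the pipeline can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: propensity scores are taken as already estimated (for example by a logistic regression on the covariates), matching is greedy 1:1 nearest-neighbour without replacement (the abstract does not name the matching algorithm), plain Euclidean distance on marker abundances stands in for the marker-based distance matrix, and the marker-selection step is omitted. The function names `caliper_match`, `danger_points`, and `borderline_smote` and the parameters `m` and `k` are hypothetical.

```python
import math
import random


def caliper_match(ps_cases, ps_controls, caliper):
    """Greedy 1:1 nearest-neighbour matching on propensity scores.

    Assumption: matching without replacement; a case is dropped when its
    closest remaining control is farther away than the caliper.
    """
    available = set(range(len(ps_controls)))
    pairs = []
    for i, p in enumerate(ps_cases):
        if not available:
            break
        j = min(sorted(available), key=lambda c: abs(ps_controls[c] - p))
        if abs(ps_controls[j] - p) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def danger_points(minority, majority, m=5):
    """Minority samples whose m nearest neighbours are at least half,
    but not entirely, majority samples (the "borderline" region)."""
    labelled = [(x, 1) for x in minority] + [(x, 0) for x in majority]
    danger = []
    for x in minority:
        others = [pt for pt in labelled if pt[0] is not x]
        nearest = sorted(others, key=lambda pt: euclidean(x, pt[0]))[:m]
        n_major = sum(1 for _, label in nearest if label == 0)
        if len(nearest) / 2 <= n_major < len(nearest):
            danger.append(x)
    return danger


def borderline_smote(minority, majority, n_new, m=5, k=3, seed=0):
    """Generate n_new synthetic minority samples by interpolating between a
    borderline minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        return []
    rng = random.Random(seed)
    danger = danger_points(minority, majority, m)
    synthetic = []
    while danger and len(synthetic) < n_new:
        x = rng.choice(danger)
        mates = sorted((z for z in minority if z is not x),
                       key=lambda z: euclidean(x, z))[:k]
        z = rng.choice(mates)
        gap = rng.random()  # position along the segment from x to z
        synthetic.append([a + gap * (b - a) for a, b in zip(x, z)])
    return synthetic
```

A tighter caliper discards more cases and controls in exchange for better covariate balance, which is why oversampling after matching may help recover sample size in the minority class. In this sketch, synthetic points are only drawn around borderline samples, so minority samples deep inside their own class or surrounded entirely by the majority class contribute nothing new.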

[RESULTS] PSM-SMOTE improved test AUCs in multiple model-dataset combinations compared with using PSM alone. Notably, for the RF model, PSM-SMOTE consistently enhanced AUC across nearly all oversampling settings in the PDAC and obesity cohorts. For the SVM model, PSM-SMOTE also achieved a significant AUC increase in the CRC cohort. For the LR model, PSM-SMOTE showed modest improvement under strict matching.

[CONCLUSION] PSM-SMOTE effectively addresses dual imbalance in microbiome data and consistently enhances performance, providing a practical solution for imbalanced data analyses.
