
Sparse robust discriminant analysis for high-dimensional and heavy-tailed data.

Biometrics 2026 Vol.82(1)

Huang W, Mai Q, Zeng J


Cite this paper

APA Huang, W., Mai, Q., & Zeng, J. (2026). Sparse robust discriminant analysis for high-dimensional and heavy-tailed data. Biometrics, 82(1). https://doi.org/10.1093/biomtc/ujag039
MLA Huang, W., et al. "Sparse Robust Discriminant Analysis for High-Dimensional and Heavy-Tailed Data." Biometrics, vol. 82, no. 1, 2026.
PMID 41744040

Abstract

With advances in data-collection techniques, large-scale data have become increasingly prevalent in medical science. For instance, gene expression data provide information on tens of thousands of genes, while diagnostic imaging, such as magnetic resonance imaging, generates a vast number of pixels. While various sparse linear discriminant analysis methods have been developed to handle high-dimensional medical data, they often assume light-tailed predictors, an assumption that is frequently violated in real applications. In this paper, we propose a robust classifier under an elliptically contoured discriminant analysis (EDA) model, which accommodates both light-tailed and heavy-tailed data. In addition, we assess prediction accuracy using the balanced rate, a more appropriate metric when the data are imbalanced. Under the EDA model, we identify the intrinsic dimension-reduction subspace that captures all the information in the predictors needed to achieve the lowest balanced rate. Leveraging this subspace, we propose a robust high-dimensional classifier that first reduces dimensionality by projecting the data onto the subspace and then makes predictions on the reduced data. Theoretically, our proposal simultaneously achieves consistency in subspace estimation, variable selection, and prediction accuracy under only a finite fourth-moment condition on the predictors. Numerically, we apply our method to synthetic data and three real datasets: two lung cancer datasets and one leukemia dataset. The empirical findings support the superiority of our approach over other state-of-the-art methods.
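The abstract argues that the balanced rate is a better accuracy metric than the plain error rate when classes are imbalanced. The minimal Python sketch below illustrates why, assuming "balanced rate" means the balanced error rate (the unweighted average of the per-class misclassification rates); the paper's exact definition may differ. The function names and the toy labels are hypothetical illustrations and are not taken from the paper.

```python
from collections import defaultdict


def balanced_error_rate(y_true, y_pred):
    """Unweighted average of per-class misclassification rates.

    Assumption: this is one common reading of a "balanced rate";
    it is not necessarily the paper's exact definition.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")
    totals = defaultdict(int)  # number of samples in each true class
    errors = defaultdict(int)  # misclassified samples in each true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t != p:
            errors[t] += 1
    per_class = [errors[k] / totals[k] for k in totals]
    # Every class counts equally, regardless of its size.
    return sum(per_class) / len(per_class)


def overall_error_rate(y_true, y_pred):
    """Ordinary misclassification rate, shown for comparison."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)


if __name__ == "__main__":
    # Hypothetical imbalanced labels: 90 samples of class 0, 10 of class 1.
    y_true = [0] * 90 + [1] * 10
    # A trivial classifier that always predicts the majority class.
    y_pred = [0] * 100
    print(overall_error_rate(y_true, y_pred))   # 0.1
    print(balanced_error_rate(y_true, y_pred))  # 0.5
```

The majority-class rule looks accurate under the ordinary error rate (10% error) but is no better than a coin flip under the balanced error rate (50%), because it misclassifies every sample in the minority class.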
