Predicting non-alcoholic fatty liver disease (NAFLD) using machine learning algorithms: Evidence from a large-scale community cohort in Taiwan.

Bioscience trends 2026 Vol.20(1) p. 80-90

Lin TC, Wei YJ, Liang PC, Tsai PC, Lin YH, Hsieh MH, Jang TY, Wang CW, Hsieh MY, Lin ZY, Yeh ML, Huang JF, Huang CF, Chuang WL, Yu ML, Dai CY, Shi HY

One-line summary for patients

Closely associated with metabolic disorders, non-alcoholic fatty liver disease (NAFLD) substantially increases the risk of hepatocellular carcinoma.

Cite this paper

APA Lin TC, Wei YJ, et al. (2026). Predicting non-alcoholic fatty liver disease (NAFLD) using machine learning algorithms: Evidence from a large-scale community cohort in Taiwan. Bioscience trends, 20(1), 80-90. https://doi.org/10.5582/bst.2025.01323
MLA Lin TC, et al. "Predicting non-alcoholic fatty liver disease (NAFLD) using machine learning algorithms: Evidence from a large-scale community cohort in Taiwan." Bioscience trends, vol. 20, no. 1, 2026, pp. 80-90.
PMID 41765506

Abstract

Closely associated with metabolic disorders, non-alcoholic fatty liver disease (NAFLD) substantially increases the risk of hepatocellular carcinoma. This study aimed to apply machine learning (ML) algorithms to a community-based cohort in southern Taiwan to identify key risk factors for NAFLD and to develop predictive models with clinical applicability. Data were derived from community health examinations, and eighteen clinical and demographic features were analyzed. Five ML algorithms were evaluated: logistic regression (LR), random forest (RF), K-nearest neighbors (KNN), adaptive boosting (AdaBoost), and extreme gradient boosting (XGBoost). Model performance was assessed using accuracy, precision, recall, F1 score, and area under the receiver operating characteristic curve (AUROC). A total of 7,510 participants were included (38.8% male; mean age 50.9 ± 15.0 years). The dataset was randomly divided into training (80%) and testing (20%) subsets, with no significant differences observed between groups in most independent variables. The Synthetic Minority Over-sampling Technique (SMOTE) was employed to balance NAFLD and non-NAFLD groups in the training dataset. Among all models, XGBoost achieved the highest performance, with an accuracy of 83.48%, precision of 84.31%, recall of 81.21%, F1 score of 82.72%, and AUROC of 92.85%. Feature importance analysis identified low-density lipoprotein cholesterol (LDL-C), body mass index (BMI), waist circumference, fasting plasma glucose (FPG), and triglycerides (TG) as the most influential predictors of NAFLD. ML algorithms, particularly XGBoost, demonstrated high accuracy in predicting NAFLD and effectively identified key clinical predictors. These findings may enhance early diagnosis and facilitate the development of targeted intervention strategies in the management of NAFLD.
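
The abstract reports accuracy, precision, recall, and F1 score for each model. As a minimal sketch of how these four metrics relate to one another, the Python below derives them from binary confusion-matrix counts, with NAFLD assumed to be the positive class. The names `ConfusionCounts`, `f1_from`, and `classification_metrics` are hypothetical, and the example counts are invented for illustration; they are not taken from the paper. AUROC is omitted because it needs per-participant predicted scores, which the abstract does not provide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (positive class assumed to be NAFLD)."""
    tp: int  # NAFLD predicted, NAFLD present
    fp: int  # NAFLD predicted, NAFLD absent
    tn: int  # non-NAFLD predicted, NAFLD absent
    fn: int  # non-NAFLD predicted, NAFLD present


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return accuracy, precision, recall, and F1 as fractions in [0, 1]."""
    total = c.tp + c.fp + c.tn + c.fn
    predicted_pos = c.tp + c.fp
    actual_pos = c.tp + c.fn
    precision = c.tp / predicted_pos if predicted_pos else 0.0
    recall = c.tp / actual_pos if actual_pos else 0.0
    return {
        "accuracy": (c.tp + c.tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Hypothetical counts for illustration only; NOT results from the study.
example = ConfusionCounts(tp=80, fp=20, tn=90, fn=10)
metrics = classification_metrics(example)

# Consistency check using the XGBoost precision and recall quoted in the
# abstract (84.31% and 81.21%), expressed as fractions.
reported_f1 = f1_from(0.8431, 0.8121)
```

Under these assumptions, `reported_f1` comes out at roughly 0.827, which agrees with the F1 score of 82.72% stated in the abstract up to rounding of the published precision and recall.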

MeSH Terms

Humans; Non-alcoholic Fatty Liver Disease; Taiwan; Male; Machine Learning; Middle Aged; Female; Cohort Studies; Risk Factors; Adult; Algorithms; ROC Curve; Aged; Logistic Models
