Accelerating Discovery of Leukemia Inhibitors Using AI-Driven Quantitative Structure-Activity Relationship: Algorithm Development and Validation.
APA
Kakraba S, Agyemang EF, Shmookler Reis RJ (2026). Accelerating Discovery of Leukemia Inhibitors Using AI-Driven Quantitative Structure-Activity Relationship: Algorithm Development and Validation. JMIR AI, 5, e81552. https://doi.org/10.2196/81552
MLA
Kakraba S, et al. "Accelerating Discovery of Leukemia Inhibitors Using AI-Driven Quantitative Structure-Activity Relationship: Algorithm Development and Validation." JMIR AI, vol. 5, 2026, pp. e81552.
PMID
41358925
DOI
10.2196/81552
Abstract
[BACKGROUND] Leukemia treatment remains a major challenge in oncology. While thiadiazolidinone analogs show potential to inhibit leukemia cell proliferation, they often lack sufficient potency and selectivity. Traditional drug discovery struggles to efficiently explore the vast chemical landscape, highlighting the need for innovative computational strategies. Machine learning (ML)-enhanced quantitative structure-activity relationship (QSAR) modeling offers a promising route to identify and optimize inhibitors with improved activity and specificity.
[OBJECTIVE] We aimed to develop and validate an integrated ML-enhanced QSAR modeling workflow for the rational design and prediction of thiadiazolidinone analogs with improved antileukemia activity by systematically evaluating molecular descriptors and algorithmic approaches to identify key determinants of potency and guide future inhibitor optimization.
[METHODS] We analyzed 35 thiadiazolidinone derivatives with confirmed antileukemia activity, removing outliers for data quality. Using Schrödinger MAESTRO, we calculated 220 molecular descriptors (1D-4D). Seventeen ML models, including random forests, XGBoost, and neural networks, were trained on 70% of the data and tested on 30%, using stratified random sampling. Model performance was assessed with 12 metrics, including mean squared error (MSE), coefficient of determination (explained variance; R²), and Shapley additive explanations (SHAP) values, and optimized via hyperparameter tuning and 5-fold cross-validation. Additional analyses, including train-test gap assessment, comparison to baseline linear models, and cross-validation stability analysis, were performed to assess genuine learning rather than overfitting.
[RESULTS] Isotonic regression ranked first with the lowest test MSE (0.00031 ± 0.00009), outperforming baseline models by over 15% in explained variance. Ensemble methods, especially LightGBM and random forest, also showed superior predictive performance (LightGBM: MSE=0.00063 ± 0.00012; R²=0.9709 ± 0.0084). Training-to-test performance degradation of LightGBM was modest (ΔR²=-0.01, ΔMSE=+0.000126), suggesting genuine pattern learning rather than memorization. SHAP analysis revealed that the most influential features contributing to antileukemia activity were global molecular shape (r_qp_glob; mean SHAP value=0.52), weighted polar surface area (r_qp_WPSA; ≈0.50), polarizability (r_qp_QPpolrz; ≈0.49), partition coefficient (r_qp_QPlogPC16; ≈0.48), solvent-accessible surface area (r_qp_SASA; ≈0.48), hydrogen bond donor count (r_qp_donorHB; ≈0.48), and the sum of topological distances between oxygen and chlorine atoms (i_desc_Sum_of_topological_distances_between_O.Cl; ≈0.47). These features highlight the importance of steric complementarity and the 3D arrangement of functional groups. Aqueous solubility (r_qp_QPlogS; ≈0.47) and hydrogen bond acceptor count (r_qp_accptHB; ≈0.44) were also among the top 10 features. The significance of these descriptors was consistent across multiple algorithmic models, including random forest, XGBoost, and partial least squares approaches.
[CONCLUSIONS] Integrating advanced ML with QSAR modeling enables systematic analysis of structure-activity relationships in thiadiazolidinone analogs on this dataset. While ensemble methods capture complex patterns with high internal validation metrics, external validation on independent compounds and prospective experimental testing are essential before broad therapeutic claims can be made. This work provides a methodological foundation and identifies molecular features for future validation efforts.
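The 70/30 stratified split and the train-test gap check described in the Methods can be sketched in plain Python. This is an illustrative sketch only, not the authors' code: the quantile-binning approach to stratifying a continuous activity value, the helper names (`stratified_split`, `fit_line`, `mse`, `r2`), the synthetic 35-point data set, and the single-descriptor least-squares line standing in for the 17 models are all assumptions made for the example. The gap is computed as test minus train, which is consistent with the signs of the reported ΔR² and ΔMSE.

```python
import math
import random
from statistics import mean


def mse(y_true, y_pred):
    """Mean squared error."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def stratified_split(y, test_frac=0.3, n_bins=5, seed=0):
    """Split indices into train/test, stratified on quantile bins of y.

    Assumption: a continuous target is stratified by sorting it and
    cutting it into n_bins contiguous chunks of roughly equal size.
    """
    rng = random.Random(seed)
    order = sorted(range(len(y)), key=lambda i: y[i])
    size = math.ceil(len(y) / n_bins)
    train, test = [], []
    for start in range(0, len(order), size):
        chunk = order[start:start + size]
        rng.shuffle(chunk)
        k = round(len(chunk) * test_frac)  # test items drawn from this bin
        test.extend(chunk[:k])
        train.extend(chunk[k:])
    return sorted(train), sorted(test)


def fit_line(x, y):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Synthetic stand-in data: 35 "compounds", one descriptor, noisy activity.
data_rng = random.Random(42)
x = [i / 34 for i in range(35)]
y = [2.0 * xi + data_rng.gauss(0, 0.05) for xi in x]

train_idx, test_idx = stratified_split(y, test_frac=0.3, n_bins=5, seed=1)
slope, intercept = fit_line([x[i] for i in train_idx], [y[i] for i in train_idx])


def predict(indices):
    return [slope * x[i] + intercept for i in indices]


y_train = [y[i] for i in train_idx]
y_test = [y[i] for i in test_idx]
r2_train, r2_test = r2(y_train, predict(train_idx)), r2(y_test, predict(test_idx))
mse_train, mse_test = mse(y_train, predict(train_idx)), mse(y_test, predict(test_idx))

# Train-test gap, computed as test minus train.
gap = {"delta_r2": r2_test - r2_train, "delta_mse": mse_test - mse_train}
print(len(train_idx), len(test_idx), gap)
```

With 35 samples and five bins of seven, each bin contributes two test items, so the split is 25/10 rather than an exact 70/30. On a data set this small, a gap estimate from a single split is noisy, which is one reason to repeat the check across cross-validation folds as the abstract describes.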
Related open-access articles (matched on this paper's keywords and MeSH terms)
- From in-silico QSAR modeling to in-vitro MTT assay: experimental validation of novel uPAR leads for triple-negative breast cancer (TNBC) and skin cancer.
- Machine learning-driven computational drug repurposing to identify new tubulin inhibitors against cancer.
- In Silico Development of Novel Quinazoline-Based EGFR Inhibitors via 3D-QSAR, Docking, ADMET, and Molecular Dynamics.
- Selective phosphoinositide 3-kinase inhibitors and implication in diabetic retinopathy as pharmacological tools.
- Integrative lipophilicity assessment and pharmacokinetic correlation of pyrimidine precursors and artesunate-pyrimidine hybrids: development of QSAR models for anticancer activity and interaction with P-glycoprotein.
- Computational optimization of MALT1 inhibitors against DLBCL: a QSAR-guided molecular docking and dynamics study.