LMSCDA: A Secondary Structure Enhanced Language Model for Predicting CircRNA and Disease Associations.
APA
Lu MS, Wang L, et al. (2026). LMSCDA: A Secondary Structure Enhanced Language Model for Predicting CircRNA and Disease Associations. IEEE Journal of Biomedical and Health Informatics, PP. https://doi.org/10.1109/JBHI.2026.3672861
MLA
Lu MS, et al. "LMSCDA: A Secondary Structure Enhanced Language Model for Predicting CircRNA and Disease Associations." IEEE Journal of Biomedical and Health Informatics, vol. PP, 2026.
PMID
41805502
Abstract
Circular RNA (circRNA) is a class of non-coding RNA that is widely present in cells and plays a critical role in the onset and treatment of disease. Unraveling the relationships between circRNAs and diseases has therefore become a focus for diagnosis. Although computational methods for predicting circRNA-disease associations (CDAs) exist, they often oversimplify the representation of circRNA structure. To address this gap, we propose LMSCDA, a novel method that uses language models to enhance circRNA and disease representations for CDA prediction. Specifically, we first compute the circRNA secondary structure from chemical principles. We then employ a hierarchical feature extraction model to extract structural and semantic features of circRNAs, and amplify these features with an attention mechanism. Concurrently, disease semantic features are encoded with a biomedical language model, while behavioral features of circRNAs and diseases are captured from the circRNA-miRNA and circRNA-disease networks. We integrate these features into a comprehensive representation to predict CDAs. LMSCDA achieves an AUC of 0.9877 and an AUPR of 0.9881 in 5-fold cross-validation on the CircR2Disease dataset. Our approach yields competitive results when evaluated against prominent existing models. Our case study on breast cancer first validated the predictive accuracy of LMSCDA, with 19 of the top 20 predicted circRNA-breast cancer associations confirmed by literature evidence. An analysis of an independent clinical transcriptomic dataset identified highly differentially expressed circRNAs among those predicted by LMSCDA, pinpointing candidates for future investigation.
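The evaluation protocol named in the abstract (5-fold cross-validation scored by AUC and AUPR) can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, the stratified fold split is an assumption, AUPR is approximated by average precision, and the scores stand in for the outputs of whatever model is being evaluated on each held-out fold.

```python
import random

def roc_auc(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) identity, with tied scores sharing an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def average_precision(labels, scores):
    """AUPR estimate: mean of the precision values at each correctly ranked positive."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            total += hits / rank
    return total / hits

def stratified_folds(labels, k=5, seed=0):
    """Split indices into k folds, keeping positives and negatives balanced (assumed design)."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)
    return [pos[f::k] + neg[f::k] for f in range(k)]

def mean_fold_metrics(labels, scores, k=5, seed=0):
    """Average AUC and AUPR over the k held-out folds."""
    aucs, auprs = [], []
    for fold in stratified_folds(labels, k, seed):
        y = [labels[i] for i in fold]
        s = [scores[i] for i in fold]
        aucs.append(roc_auc(y, s))
        auprs.append(average_precision(y, s))
    return sum(aucs) / k, sum(auprs) / k
```

In practice the scores for each fold would come from a model trained on the remaining four folds. Here they are passed in directly so that only the metric computation is shown.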