EC2Seq2Sql: Patient-trial matching with LLM agents.
APA
Yang L, Han Y, et al. (2026). EC2Seq2Sql: Patient-trial matching with LLM agents. PLOS ONE, 21(2), e0341827. https://doi.org/10.1371/journal.pone.0341827
MLA
Yang L, et al. "EC2Seq2Sql: Patient-trial matching with LLM agents." PLOS ONE, vol. 21, no. 2, 2026, e0341827.
PMID
41678526
Abstract
Timely identification of patients who meet clinical trial eligibility criteria is a persistent bottleneck in trial recruitment because the criteria are written in flexible natural language, while hospital EHRs are stored in structured schemas. To bridge this gap, we propose EC2Seq2Sql, an end-to-end, two-stage framework that automatically converts narrative eligibility criteria into executable SQL queries for EHR-based patient screening. In the first stage, a BART-based semantic parser transforms free-text trial criteria into lightweight structured pattern sequences defined over seven common clinical domains. In the second stage, an LLM-based agent, guided by system- and human-designed prompts, grounds these structured patterns to the target database schema and generates syntactically valid and logically coherent SQL statements. We evaluated the framework on the ClinicalTrials.gov eligibility-criteria dataset and further validated it on a de-identified real-world hepatocellular carcinoma EHR cohort from Zhongshan Hospital, Fudan University. The BART parser outperformed representative Seq2Seq baselines, achieving ROUGE_L 0.8067 and BLEU 0.8427, while the SQL generation stage reached an exact-match accuracy of 0.84 and an execution accuracy of 0.91 after SQL normalization. On the real-world cohort, the generated queries achieved a clinical match accuracy of 0.88 after expert review, indicating that the proposed pipeline can retrieve trial-eligible patients from operational EHR data. These results suggest that EC2Seq2Sql can substantially reduce manual screening effort and provide a reproducible path from narrative criteria to database-level cohort identification, although broader multi-center validation and ontology-based normalization will be needed for large-scale deployment.
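The abstract reports two SQL-level metrics: exact-match accuracy after SQL normalization, and execution accuracy. The sketch below illustrates how the two can differ on a toy example. It is not the authors' evaluation code: the `patient` table, its columns, the sample queries, and the normalization rules in `normalize_sql` are all assumptions made for illustration, and the paper's own normalization may be different.

```python
import re
import sqlite3


def normalize_sql(sql: str) -> str:
    """Hypothetical normalization: trim, drop a trailing semicolon,
    collapse whitespace, and lowercase. Lowercasing also affects string
    literals, which is a simplification acceptable only for this toy case."""
    sql = sql.strip().rstrip(";")
    return re.sub(r"\s+", " ", sql).lower()


def exact_match(pred: str, gold: str) -> bool:
    """True when the two queries are textually identical after normalization."""
    return normalize_sql(pred) == normalize_sql(gold)


def execution_match(conn: sqlite3.Connection, pred: str, gold: str) -> bool:
    """True when both queries return the same rows, ignoring row order."""
    try:
        pred_rows = conn.execute(pred).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to run counts as a miss
    gold_rows = conn.execute(gold).fetchall()
    return sorted(pred_rows) == sorted(gold_rows)


# Toy in-memory schema and data (assumed, not from the paper).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (patient_id INTEGER, age INTEGER, diagnosis TEXT)")
conn.executemany(
    "INSERT INTO patient VALUES (?, ?, ?)",
    [(1, 45, "HCC"), (2, 17, "HCC"), (3, 60, "NSCLC")],
)

gold = "SELECT patient_id FROM patient WHERE age >= 18 AND diagnosis = 'HCC';"
predictions = [
    # Same query with different casing and spacing: exact and execution match.
    "select patient_id  from patient where age >= 18 and diagnosis = 'HCC'",
    # Logically equivalent on integer ages, different text: execution match only.
    "SELECT patient_id FROM patient WHERE diagnosis = 'HCC' AND age > 17",
    # Missing the diagnosis filter: neither metric matches.
    "SELECT patient_id FROM patient WHERE age >= 18",
]

exact_acc = sum(exact_match(p, gold) for p in predictions) / len(predictions)
exec_acc = sum(execution_match(conn, p, gold) for p in predictions) / len(predictions)
print(f"exact-match accuracy: {exact_acc:.2f}")
print(f"execution accuracy:   {exec_acc:.2f}")
```

In this toy set, one of the three predictions matches the gold query exactly after normalization, while two of the three return the gold result set. This is one plausible reason an execution-based score can exceed an exact-match score, as it does in the reported figures (0.91 versus 0.84).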