EC2Seq2Sql: Patient-trial matching with LLM agents.

PLOS ONE, 2026, Vol. 21(2), e0341827

Yang L, Han Y, Liu L, Jiang X, Li Y, Huang J, Su Q

Cite this paper

APA Yang L, Han Y, et al. (2026). EC2Seq2Sql: Patient-trial matching with LLM agents. PLOS ONE, 21(2), e0341827. https://doi.org/10.1371/journal.pone.0341827
MLA Yang L, et al. "EC2Seq2Sql: Patient-trial matching with LLM agents." PLOS ONE, vol. 21, no. 2, 2026, e0341827.
PMID 41678526

Abstract

Timely identification of patients who meet clinical trial eligibility criteria is a persistent bottleneck in trial recruitment because the criteria are written in flexible natural language, while hospital EHRs are stored in structured schemas. To bridge this gap, we propose EC2Seq2Sql, an end-to-end, two-stage framework that automatically converts narrative eligibility criteria into executable SQL queries for EHR-based patient screening. In the first stage, a BART-based semantic parser transforms free-text trial criteria into lightweight structured pattern sequences defined over seven common clinical domains. In the second stage, an LLM-based agent, guided by system- and human-designed prompts, grounds these structured patterns to the target database schema and generates syntactically valid and logically coherent SQL statements. We evaluated the framework on the ClinicalTrials.gov eligibility-criteria dataset and further validated it on a de-identified real-world hepatocellular carcinoma EHR cohort from Zhongshan Hospital, Fudan University. The BART parser outperformed representative Seq2Seq baselines, achieving ROUGE_L 0.8067 and BLEU 0.8427, while the SQL generation stage reached an exact-match accuracy of 0.84 and an execution accuracy of 0.91 after SQL normalization. On the real-world cohort, the generated queries achieved a clinical match accuracy of 0.88 after expert review, indicating that the proposed pipeline can retrieve trial-eligible patients from operational EHR data. These results suggest that EC2Seq2Sql can substantially reduce manual screening effort and provide a reproducible path from narrative criteria to database-level cohort identification, although broader multi-center validation and ontology-based normalization will be needed for large-scale deployment.
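
The abstract reports two SQL-level metrics, exact-match accuracy after SQL normalization and execution accuracy. The sketch below is one plausible way to compute them, not the authors' evaluation code. The normalization rules (whitespace collapsing, lowercasing, dropping a trailing semicolon), the toy `patient` table, and every function name are assumptions made purely for illustration.

```python
import re
import sqlite3


def normalize_sql(sql: str) -> str:
    """Collapse whitespace, drop a trailing semicolon, and lowercase the query.

    Assumed normalization: lowercasing also touches string literals, which is
    fine for this toy comparison but too aggressive for case-sensitive data.
    """
    collapsed = re.sub(r"\s+", " ", sql.strip().rstrip(";").strip())
    return collapsed.lower()


def exact_match(predicted: str, reference: str) -> bool:
    """True when the two queries are identical after normalization."""
    return normalize_sql(predicted) == normalize_sql(reference)


def execution_match(conn: sqlite3.Connection, predicted: str, reference: str) -> bool:
    """True when both queries run and return the same rows, ignoring order."""
    try:
        predicted_rows = conn.execute(predicted).fetchall()
    except sqlite3.Error:
        # A query that fails to execute counts as a miss.
        return False
    reference_rows = conn.execute(reference).fetchall()
    return sorted(predicted_rows) == sorted(reference_rows)


def build_toy_db() -> sqlite3.Connection:
    """Create a hypothetical in-memory patient table (not the paper's schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patient (patient_id INTEGER, age INTEGER, diagnosis TEXT)"
    )
    conn.executemany(
        "INSERT INTO patient VALUES (?, ?, ?)",
        [(1, 45, "HCC"), (2, 17, "HCC"), (3, 70, "cirrhosis")],
    )
    return conn


def evaluate(conn: sqlite3.Connection, pairs: list[tuple[str, str]]) -> tuple[float, float]:
    """Return (exact-match accuracy, execution accuracy) over (predicted, reference) pairs."""
    total = len(pairs)
    exact = sum(exact_match(p, r) for p, r in pairs)
    executed = sum(execution_match(conn, p, r) for p, r in pairs)
    return exact / total, executed / total


# Illustrative (predicted, reference) pairs; none come from the paper.
PAIRS = [
    # Same query up to case, line breaks, and a trailing semicolon.
    ("select patient_id from patient where age >= 18;",
     "SELECT patient_id\nFROM patient WHERE age >= 18"),
    # Different text, same result set on integer ages.
    ("SELECT patient_id FROM patient WHERE age > 17",
     "SELECT patient_id FROM patient WHERE age >= 18"),
    # Wrong table name, so the predicted query fails to execute.
    ("SELECT patient_id FROM patients WHERE diagnosis = 'HCC'",
     "SELECT patient_id FROM patient WHERE diagnosis = 'HCC'"),
]

if __name__ == "__main__":
    em, ex = evaluate(build_toy_db(), PAIRS)
    print(f"exact match: {em:.2f}, execution accuracy: {ex:.2f}")
```

In this toy setup the second pair differs as text but returns the same rows, which illustrates why execution accuracy (0.91 in the abstract) can exceed exact-match accuracy (0.84): logically equivalent queries need not be textually identical.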
