Synthetic Lung-cancer Cohorts Generated by a Large Language Model: Epidemiological Validity Assessment.
APA
Fuentes-Martín Á, Mayol J, et al. (2026). Synthetic Lung-cancer Cohorts Generated by a Large Language Model: Epidemiological Validity Assessment. Open Respiratory Archives, 8(1), 100533. https://doi.org/10.1016/j.opresp.2025.100533
MLA
Fuentes-Martín Á, et al. "Synthetic Lung-cancer Cohorts Generated by a Large Language Model: Epidemiological Validity Assessment." Open Respiratory Archives, vol. 8, no. 1, 2026, article 100533.
PMID
41624079
Abstract
Large language models (LLMs) are increasingly used in medicine for clinical reasoning and educational simulation. This study assessed the epidemiological plausibility of a synthetic lung-cancer cohort generated by ChatGPT-4.0. A total of 102 virtual cases were created in Spanish using structured prompts including demographic, histologic, and molecular variables. When descriptively compared with international datasets (GLOBOCAN 2020, SEER, and biomarker meta-analyses), the cohort reproduced general disease patterns but showed statistically significant deviations (p < 0.05): early-stage disease and EGFR-positive tumors were overrepresented, while advanced stages, ALK rearrangements, and extreme PD-L1 values were underrepresented. These discrepancies likely reflect biases in model training data and the probabilistic nature of generative language models. Despite this quantified generative bias, the utility of these cohorts for non-epidemiological tasks like educational simulation is discussed, provided methodological transparency is maintained.
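The kind of comparison summarized in the abstract (a proportion observed in the 102-case synthetic cohort set against a published reference proportion) can be illustrated with a short sketch. The abstract does not state which statistical test the authors used, so the exact two-sided binomial test below is an assumption chosen for illustration. The observed count and the reference share are hypothetical placeholders, not values from the study; only the cohort size of 102 comes from the abstract.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p0: float) -> float:
    """Exact two-sided binomial test of H0: true proportion equals p0.

    Sums the probabilities of every outcome that is no more likely than
    the observed count k under H0.
    """
    pmf = [comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(p for p in pmf if p <= threshold))


N_CASES = 102  # cohort size reported in the abstract

# HYPOTHETICAL values for illustration only; not taken from the study.
observed_count = 40     # assumed number of synthetic cases with some feature
reference_share = 0.15  # assumed proportion in a reference dataset

p_value = binomial_two_sided_p(observed_count, N_CASES, reference_share)
print(f"observed share = {observed_count / N_CASES:.3f}, p = {p_value:.3g}")
```

A p-value below 0.05 would flag the feature as over- or underrepresented relative to the reference. With several variables compared at once, a multiple-comparison correction would normally be considered as well.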
Ethical statement
This study did not involve real patients or human subjects. Instead, it was based entirely on synthetic data generated through an artificial intelligence model (ChatGPT-4.0), and no identifiable or confidential patient information was used. Therefore, the requirement for informed consent was waived. Nevertheless, the study protocol was reviewed and approved by the Clinical Research Ethics Committee of our institution (Reference: PI-25-146-C), and the project was conducted in accordance with the ethical principles of the Declaration of Helsinki (2013 revision). The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Declaration of generative AI and AI-assisted technologies in the writing process
During the preparation of this work, the authors used the Generative Pre-trained Transformer 4 (ChatGPT-4) not only for grammar review and translation, but also for the structured generation of a synthetic cohort of 102 virtual patients with lung cancer, based on predefined clinical, molecular, and psychosocial parameters. This simulated dataset was used for research purposes within the framework of this study. After using this tool, the authors reviewed, validated, and edited the output as necessary, and take full responsibility for the content of the publication.
Funding
This research received no external funding.
Authors’ contributions
All authors contributed substantially to the design of the study, data analysis, manuscript drafting, and critical revision of its content. All authors have read and approved the final version of the manuscript.
Conflicts of interest
The authors declare no conflicts of interest.
Data availability
All data used in this study were synthetically generated using the ChatGPT-4.0 model (OpenAI) and do not correspond to real individuals. The full methodology used for data generation is described in the Methods section. Examples of the synthetic cases are provided in the Supplementary Material. No real patient data were accessed or used in this study.
Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please cite the original article when quoting.