Synthetic Lung-cancer Cohorts Generated by a Large Language Model: Epidemiological Validity Assessment.

Fuentes-Martín Á; Mayol J; Segura Méndez B; Cilleruelo-Ramos Á

doi:10.1016/j.opresp.2025.100533


Open respiratory archives 2026 Vol.8(1) p. 100533


Free full text available on PubMed Central (PMCID: PMC12860333)



PMID: 41624079


Abstract

Large language models (LLMs) are increasingly used in medicine for clinical reasoning and educational simulation. This study assessed the epidemiological plausibility of a synthetic lung-cancer cohort generated by ChatGPT-4.0. A total of 102 virtual cases were created in Spanish using structured prompts including demographic, histologic, and molecular variables. When descriptively compared with international datasets (GLOBOCAN 2020, SEER, and biomarker meta-analyses), the cohort reproduced general disease patterns but showed statistically significant deviations (p < 0.05): early-stage disease and EGFR-positive tumors were overrepresented, while advanced stages, ALK rearrangements, and extreme PD-L1 values were underrepresented. These discrepancies likely reflect biases in model training data and the probabilistic nature of generative language models. Despite this quantified generative bias, the utility of these cohorts for non-epidemiological tasks like educational simulation is discussed, provided methodological transparency is maintained.
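
The abstract does not name the statistical test behind the reported deviations. As an illustration only, the sketch below assumes a chi-square goodness-of-fit test on a two-category split (early versus advanced stage). The function name, the observed counts, and the reference proportions are all hypothetical placeholders; they are not values from the study or from GLOBOCAN or SEER.

```python
import math


def chi_square_gof_two_groups(observed, expected_props):
    """Chi-square goodness-of-fit for exactly two categories (df = 1).

    observed: counts in the synthetic cohort, e.g. [early, advanced].
    expected_props: reference proportions for the same categories.
    Returns (statistic, p_value).
    """
    if len(observed) != 2 or len(expected_props) != 2:
        raise ValueError("this sketch handles exactly two categories")
    if abs(sum(expected_props) - 1.0) > 1e-9:
        raise ValueError("expected proportions must sum to 1")

    n = sum(observed)
    expected = [n * p for p in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # With one degree of freedom the upper-tail probability has a
    # closed form: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical example: 102 synthetic cases, half labelled early stage,
# compared with an assumed reference split of 30% early / 70% advanced.
observed_counts = [51, 51]
reference_props = [0.30, 0.70]

stat, p_value = chi_square_gof_two_groups(observed_counts, reference_props)
print(f"chi-square = {stat:.2f}, p = {p_value:.2g}")
```

With more than two categories (for example, four stage groups) the closed form above no longer applies, and a library routine such as `scipy.stats.chisquare` would be the usual choice.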


Ethical statement

This study did not involve real patients or human subjects. Instead, it was based entirely on synthetic data generated through an artificial intelligence model (ChatGPT-4.0), and no identifiable or confidential patient information was used. Therefore, the requirement for informed consent was waived. Nevertheless, the study protocol was reviewed and approved by the Clinical Research Ethics Committee of our institution (Reference: PI-25-146-C), and the project was conducted in accordance with the ethical principles of the Declaration of Helsinki (2013 revision). The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Declaration of generative AI and AI-assisted technologies in the writing process

During the preparation of this work, the authors used the Generative Pre-trained Transformer 4 (ChatGPT-4) not only for grammar review and translation, but also for the structured generation of a synthetic cohort of 102 virtual patients with lung cancer, based on predefined clinical, molecular, and psychosocial parameters. This simulated dataset was used for research purposes within the framework of this study. After using this tool, the authors reviewed, validated, and edited the output as necessary, and take full responsibility for the content of the publication.

Funding

This research received no external funding.

Authors’ contributions

All authors contributed substantially to the design of the study, data analysis, manuscript drafting, and critical revision of its content. All authors have read and approved the final version of the manuscript.

Conflicts of interest

The authors declare no conflicts of interest.

Data availability

All data used in this study were synthetically generated using the ChatGPT-4.0 model (OpenAI) and do not correspond to real individuals. The full methodology used for data generation is described in the Methods section. Examples of the synthetic cases are provided in the Supplementary Material. No real patient data were accessed or used in this study.
