
Synthetic Lung-cancer Cohorts Generated by a Large Language Model: Epidemiological Validity Assessment.

Open respiratory archives 2026 Vol.8(1) p. 100533

Fuentes-Martín Á, Mayol J, Segura Méndez B, Cilleruelo-Ramos Á


Cite this article

APA Fuentes-Martín Á, Mayol J, et al. (2026). Synthetic Lung-cancer Cohorts Generated by a Large Language Model: Epidemiological Validity Assessment. Open respiratory archives, 8(1), 100533. https://doi.org/10.1016/j.opresp.2025.100533
MLA Fuentes-Martín Á, et al. "Synthetic Lung-cancer Cohorts Generated by a Large Language Model: Epidemiological Validity Assessment." Open respiratory archives, vol. 8, no. 1, 2026, p. 100533.
PMID 41624079

Abstract

Large language models (LLMs) are increasingly used in medicine for clinical reasoning and educational simulation. This study assessed the epidemiological plausibility of a synthetic lung-cancer cohort generated by ChatGPT-4.0. A total of 102 virtual cases were created in Spanish using structured prompts including demographic, histologic, and molecular variables. When descriptively compared with international datasets (GLOBOCAN 2020, SEER, and biomarker meta-analyses), the cohort reproduced general disease patterns but showed statistically significant deviations (p < 0.05): early-stage disease and EGFR-positive tumors were overrepresented, while advanced stages, ALK rearrangements, and extreme PD-L1 values were underrepresented. These discrepancies likely reflect biases in model training data and the probabilistic nature of generative language models. Despite this quantified generative bias, the utility of these cohorts for non-epidemiological tasks like educational simulation is discussed, provided methodological transparency is maintained.
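
The abstract reports significant deviations from reference distributions but does not name the statistical test used. As an illustration only, the sketch below shows one common way such a comparison could be made for a single binary attribute (for example, biomarker-positive versus biomarker-negative): a chi-square goodness-of-fit test with one degree of freedom. The function name `binary_gof_test`, the observed count, and the reference proportion are all hypothetical placeholders and are not taken from the study; only the cohort size of 102 comes from the abstract.

```python
import math


def binary_gof_test(observed_positive: int, n: int,
                    reference_proportion: float) -> tuple[float, float]:
    """Chi-square goodness-of-fit test for one binary attribute (1 df).

    Returns the chi-square statistic and its p-value. This is a generic
    textbook test, not necessarily the method the authors applied.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= observed_positive <= n:
        raise ValueError("observed_positive must lie between 0 and n")
    if not 0.0 < reference_proportion < 1.0:
        raise ValueError("reference_proportion must be strictly between 0 and 1")

    # Expected counts if the cohort followed the reference proportion.
    expected_positive = n * reference_proportion
    expected_negative = n - expected_positive
    observed_negative = n - observed_positive

    chi2 = ((observed_positive - expected_positive) ** 2 / expected_positive
            + (observed_negative - expected_negative) ** 2 / expected_negative)

    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


COHORT_SIZE = 102                         # stated in the abstract
HYPOTHETICAL_OBSERVED_POSITIVE = 30       # placeholder, NOT from the study
HYPOTHETICAL_REFERENCE_PROPORTION = 0.15  # placeholder, NOT from the study

chi2, p_value = binary_gof_test(
    HYPOTHETICAL_OBSERVED_POSITIVE,
    COHORT_SIZE,
    HYPOTHETICAL_REFERENCE_PROPORTION,
)
print(f"chi2 = {chi2:.3f}, p = {p_value:.2e}, significant at 0.05: {p_value < 0.05}")
```

With real data, the placeholders would be replaced by the observed count in the synthetic cohort and the proportion reported by the chosen reference source. Attributes with more than two categories, such as stage, would need a multi-category version of the test, and small expected counts would call for an exact test instead.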


Ethical statement

This study did not involve real patients or human subjects. Instead, it was based entirely on synthetic data generated through an artificial intelligence model (ChatGPT-4.0), and no identifiable or confidential patient information was used. Therefore, the requirement for informed consent was waived. Nevertheless, the study protocol was reviewed and approved by the Clinical Research Ethics Committee of our institution (Reference: PI-25-146-C), and the project was conducted in accordance with the ethical principles of the Declaration of Helsinki (2013 revision). The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Declaration of generative AI and AI-assisted technologies in the writing process

During the preparation of this work, the authors used the Generative Pre-trained Transformer 4 (ChatGPT-4) not only for grammar review and translation, but also for the structured generation of a synthetic cohort of 102 virtual patients with lung cancer, based on predefined clinical, molecular, and psychosocial parameters. This simulated dataset was used for research purposes within the framework of this study. After using this tool, the authors reviewed, validated, and edited the output as necessary, and take full responsibility for the content of the publication.

Funding

This research received no external funding.

Authors’ contributions

All authors contributed substantially to the design of the study, data analysis, manuscript drafting, and critical revision of its content. All authors have read and approved the final version of the manuscript.

Conflicts of interest

The authors declare no conflicts of interest.

Data availability

All data used in this study were synthetically generated using the ChatGPT-4.0 model (OpenAI) and do not correspond to real individuals. The full methodology used for data generation is described in the Methods section. Examples of the synthetic cases are provided in the Supplementary Material. No real patient data were accessed or used in this study.

Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please cite the original article when quoting.
