Evaluation of an Artificial Intelligence Defined Lung Nodule Malignancy Score in Incidental Pulmonary Nodules: The CREATE Study.
APA
Koksal D, Govindarajan A, et al. (2026). Evaluation of an Artificial Intelligence Defined Lung Nodule Malignancy Score in Incidental Pulmonary Nodules: The CREATE Study. Mayo Clinic Proceedings: Digital Health, 4(1), 100335. https://doi.org/10.1016/j.mcpdig.2026.100335
MLA
Koksal D, et al. "Evaluation of an Artificial Intelligence Defined Lung Nodule Malignancy Score in Incidental Pulmonary Nodules: The CREATE Study." Mayo Clinic Proceedings: Digital Health, vol. 4, no. 1, 2026, p. 100335.
PMID
41716936
Abstract
[OBJECTIVE] To evaluate the effectiveness of the artificial intelligence-based qXR lung nodule malignancy score (qXR-LNMS) in detecting high-risk incidental pulmonary nodules (IPNs) on chest X-rays (CXRs).
[PATIENTS AND METHODS] The CREATE study (NCT05817110), a prospective, observational study of participants aged 35 years or older with IPN (size, ≥8 to ≤30 mm) on CXR, enrolled 712 participants (high-risk: 498 and low-risk: 214) between April 1, 2023, and December 31, 2024. Participants were flagged by the Food and Drug Administration-cleared qXR detection algorithm and confirmed by radiologists. The threshold for success was set at 20% for positive predictive value (PPV) and 70% for negative predictive value (NPV). The primary and secondary outcomes included the PPV and NPV of qXR-LNMS against the risk of malignancy assessed by radiologists using low-dose computed tomography (LDCT) and binarized risk categories based on the Lung-RADS score and the Mayo Clinic model, and PPVs and NPVs by clinicodemographic characteristics, with 95% CIs calculated using the Wilson score method.
[RESULTS] Overall, the PPV and the NPV of qXR-LNMS risk prediction against radiologists' assessment on LDCT were 54.2% (95% CI, 49.8-58.5) and 93.5% (95% CI, 89.3-96.1), respectively. Agreement between the Mayo Clinic model and qXR-LNMS was observed in 70.6% of participants (Spearman correlation, 0.247). Results across key subgroups were consistent, with all PPV and NPV point estimates crossing the prespecified thresholds.
[CONCLUSION] The results demonstrate the potential of qXR-LNMS in predicting benign and malignant IPN on CXR, thereby supporting lung cancer screening, particularly in resource-limited settings, although further validation is needed.
[TRIALS REGISTRATION] clinicaltrials.gov Identifier: NCT05817110.
Patients and Methods
Study Design
The CREATE study (Cohort Study to Validate Effectiveness of an Artificial Intelligence defined Lung Nodule Malignancy Score in Patients with Pulmonary Nodule; NCT05817110) was a multicenter, prospective, observational study that enrolled individuals with IPNs on CXR identified by qXR (AI) and confirmed by a radiologist, from 23 sites across Egypt, India, Indonesia, Mexico, and Turkey between April 2023 and December 2024. A total of 185,700 anonymous CXRs from individuals visiting the sites for any clinical reason were evaluated for IPNs. Of these, 15,100 scans were flagged by qXR for the presence of at least 1 nodule and referred to a site radiologist for independent assessment to confirm the presence of a nodule or classify the detection as a false positive. Among these, 9577 CXRs were confirmed by radiologists as demonstrating IPNs. CXRs with radiologist-confirmed nodules were considered for eligibility screening, and all individuals with radiologist-confirmed nodules were contacted for potential inclusion. A total of 716 individuals from the prescreened cohort provided informed consent to participate in the study.
Radiologists were only involved in confirming AI-flagged findings, ensuring consecutive and unbiased inclusion. A detailed study workflow is depicted in Supplemental Figure 1 (available online at https://www.mcpdigitalhealth.org/). The LNMS algorithm provided a continuous malignancy score ranging from 0 to 100 for each detected nodule in the individual. The highest score among all nodules in the image was assigned as the malignancy score for the individual. This continuous score was then binarized into high-risk or low-risk categories using a predefined threshold calibrated on internal datasets of biopsy-confirmed nodules.
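The scoring rule described above (per-nodule score, maximum per image, then a binary cutoff) can be sketched as follows. This is a minimal illustration, not the proprietary qXR-LNMS implementation: the text does not state the cutoff, so `ASSUMED_THRESHOLD` is a hypothetical placeholder, as is the choice to treat a score equal to the cutoff as high-risk.

```python
from typing import Sequence

# Hypothetical cutoff for illustration only; the study's calibrated
# threshold is not given in the text.
ASSUMED_THRESHOLD = 50.0


def participant_score(nodule_scores: Sequence[float]) -> float:
    """Return the highest per-nodule malignancy score (0-100) in one image."""
    if not nodule_scores:
        raise ValueError("at least one nodule score is required")
    return max(nodule_scores)


def risk_category(nodule_scores: Sequence[float],
                  threshold: float = ASSUMED_THRESHOLD) -> str:
    """Binarize the participant-level score into high-risk or low-risk.

    Assumption: a score equal to the threshold counts as high-risk.
    """
    score = participant_score(nodule_scores)
    return "high-risk" if score >= threshold else "low-risk"


# Example: three nodules in one CXR; the largest score drives the category.
print(risk_category([12.0, 71.5, 40.0]))
print(risk_category([10.0, 20.0]))
```

Taking the maximum means a single suspicious nodule is enough to place the participant in the high-risk group, regardless of how many low-scoring nodules accompany it.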
The study protocol was approved by the local ethics committee/institutional review board/independent ethics committee/country-specific regulatory authority before the commencement of study recruitment at any center. All participants provided written informed consent before study entry. The study adhered to the Declaration of Helsinki, Good Clinical Practice guidelines, and local regulations on observational studies. The study has 2 phases with a study duration of approximately 30 months from enrollment for each participant. We report the results of phase 1 of the study as per Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) checklist.16
Study Population and Data Collection
Adults aged 35 years or older diagnosed with IPN on CXR by qXR and confirmed by a site radiologist, with a nodule size of 8 mm or greater and 30 mm or less, were enrolled. Individuals with contraindications to CT findings, nondigital CXRs, LDCT performed >6 months after CXR, a prior diagnosis of lung cancer, or a CXR referral for suspected lung cancer were excluded.
In phase 1, the IPNs on CXRs detected by qXR were categorized as high-risk or low-risk. The consented participants underwent LDCT within 6 months of study enrollment. LDCT findings were assessed using the Lung-RADS system17 by 2 radiologists blinded to the qXR-LNMS risk analysis. They assigned a risk of malignancy for each CT scan; a Lung-RADS score of 4A or higher was considered indicative of high risk for malignancy. Disagreements in binary risk assessments were resolved by a third radiologist. Only nodules detected on CXR by qXR and confirmed by the CXR-reading radiologist were included in the outcome analysis, whereas additional nodules found on CT were documented but excluded from the outcome analysis. The risk of malignancy of IPNs on CXRs was also assessed using the Mayo Clinic model risk calculator.18 Further, CT scans were interpreted by radiologists using a 5-point Likert scale: 1, nonmalignant; 2, probably nonmalignant; 3, uncertain; 4, probably malignant; and 5, malignant. A total of 42 radiologists participated, of whom 28 were general radiologists with an average of 5 years of clinical experience, and 14 had expertise in thoracic imaging with an average of 10 years of experience.
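The reference-standard rule above (Lung-RADS 4A or higher is high-risk, with a third reader breaking ties) can be sketched as below. The function names are hypothetical, and the sketch assumes the third radiologist also supplies a Lung-RADS category; the text only says that disagreements in the binary assessment were resolved by a third radiologist.

```python
from typing import Optional

# Lung-RADS categories at or above 4A.
HIGH_RISK_CATEGORIES = {"4A", "4B", "4X"}


def is_high_risk(lung_rads: str) -> bool:
    """Binarize one reader's Lung-RADS category."""
    return lung_rads.strip().upper() in HIGH_RISK_CATEGORIES


def reference_label(reader1: str, reader2: str,
                    adjudicator: Optional[str] = None) -> bool:
    """Return True for high-risk, False for low-risk.

    The two blinded readers are compared on the binary label; the
    adjudicator is consulted only when they disagree.
    """
    first, second = is_high_risk(reader1), is_high_risk(reader2)
    if first == second:
        return first
    if adjudicator is None:
        raise ValueError("readers disagree; a third read is required")
    return is_high_risk(adjudicator)


print(reference_label("4A", "4B"))       # readers agree on the binary label
print(reference_label("3", "4A", "4B"))  # disagreement settled by third reader
```

Note that readers who assign different categories (for example 4A and 4B) still agree on the binary label, so no adjudication is triggered.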
Data collected from participants' medical records included the following: sociodemographic characteristics (age, sex, ethnicity, geographical location, and smoking status), medical history (lung disease history, diagnosis and characteristics, and nodule characteristics), concomitant medications, nodule characteristics (number, type, size, homogeneity, spiculation, and calcification), qXR-LNMS interpretation, CT report with Lung-RADS, and Mayo Clinic model risk score.
In the ongoing phase 2, participants will be followed for up to 2 years from the date of the first CT scan. Data related to available investigations (CT scan, positron emission tomography, biopsy, lung function tests, histology, biomarkers, and blood tests), lung cancer diagnosis and staging (if applicable), performance status (if available and applicable), lung conditions like tuberculosis diagnosis (acid-fast bacillus test/GeneXpert, if available), and other relevant clinical data will be collected during these visits.
Study Outcomes
Primary outcomes included the PPV and the NPV of qXR-LNMS categorized as high-risk LNMS or low-risk LNMS, against the reference standard of malignancy risk as assessed by radiologists on LDCT using Lung-RADS. The PPV was calculated as the proportion of participants identified as high-risk by both qXR and the reference standard (true positive) of the total number of high-risk participants according to qXR-LNMS (PPV = true positive/[true positive + false positive]). The NPV was calculated as the proportion of participants identified as low-risk by both qXR and the reference standard (true negative) of the total number of low-risk participants according to qXR-LNMS (NPV = true negative/[true negative + false negative]).
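As a worked instance of these two definitions, the short sketch below plugs in the counts given in the Results (270 of 498 high-risk LNMS cases and 200 of 214 low-risk LNMS cases agreed with the reference standard). The function names are illustrative.

```python
def ppv(true_positive: int, false_positive: int) -> float:
    """PPV = TP / (TP + FP)."""
    return true_positive / (true_positive + false_positive)


def npv(true_negative: int, false_negative: int) -> float:
    """NPV = TN / (TN + FN)."""
    return true_negative / (true_negative + false_negative)


# Counts taken from the Results section of this article.
tp, fp = 270, 498 - 270   # high-risk LNMS: confirmed vs not confirmed
tn, fn = 200, 214 - 200   # low-risk LNMS: confirmed vs not confirmed

print(f"PPV = {ppv(tp, fp):.1%}")  # PPV = 54.2%
print(f"NPV = {npv(tn, fn):.1%}")  # NPV = 93.5%
```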
Secondary outcomes were agreement level between qXR-LNMS and alternate malignancy risk scoring models (Mayo Clinic model on CXRs and radiologists’ Likert scale on CT scan). Other secondary outcomes included PPV and NPV assessed by demographic and clinical characteristics of individuals with high- and low-risk qXR-LNMS with their probability of malignancy on CT scan.
Statistical Analysis
Considering a threshold PPV of 20% with a 5% margin of error and an NPV of 70% with a 10% margin of error with a power of 80%, a sample size of 500 high-risk LNMS cases and 200 low-risk LNMS cases was deemed to be adequate. These thresholds were used for sample size estimation purposes only and were not intended as clinical cutoffs. The PPV reflects expected diagnostic yield enrichment, whereas the NPV serves to characterize residual malignancy risk among low-risk cases, not for clinical rule out. Data analysis was performed using R software (version 4.3.2; R Foundation for Statistical Computing). Categorical variables were expressed as counts and percentages with 95% CIs, and continuous variables were summarized using medians and range values. The modified Wilson method was applied to calculate 95% CIs. Point biserial correlation between LNMS (high/low) and Mayo score was computed using Spearman method.
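For readers who want to check the interval arithmetic, here is a minimal sketch of the standard (uncorrected) Wilson score interval. The article states that a "modified" Wilson method was used in R without specifying the variant, so this is an assumption about the calculation rather than the authors' code; with the counts from the Results it gives the same bounds as reported, to one decimal place.

```python
import math
from typing import Tuple

Z_95 = 1.959964  # two-sided 95% normal quantile


def wilson_interval(successes: int, n: int, z: float = Z_95) -> Tuple[float, float]:
    """Standard Wilson score interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    center = (p + z2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    return center - half_width, center + half_width


for label, k, n in [("PPV", 270, 498), ("NPV", 200, 214)]:
    low, high = wilson_interval(k, n)
    print(f"{label}: {k / n:.1%} (95% CI, {low * 100:.1f}-{high * 100:.1f})")
```

Unlike the simple Wald interval, the Wilson interval is not symmetric around the point estimate, which is why the NPV bounds (89.3 to 96.1) sit unevenly around 93.5%.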
Results
Baseline and Clinical Characteristics
Of 716 screened individuals, 712 were enrolled: 498 with high-risk LNMS and 214 with low-risk LNMS (Figure 1). Of the 4 excluded participants, 3 had LDCT done more than 6 months after the qXR-enabled CXR, and 1 was younger than 35 years. The median age was 59 years (range, 35-89 years), with 61.2% (n=436) of participants aged 55 years or older; 63.1% (n=449) were male. The majority (97.2%, n=692) did not report a family history of lung cancer. Overall, 71.6% (n=510) were never smokers. Among current/former smokers (28.4%, n=202), the median duration of smoking was 24 years (range, 0-60 years). A history of concomitant disease was reported by 29.8% (n=212) of participants, including diabetes in 6.6% (n=47), chronic obstructive pulmonary disease in 2.5% (n=18), emphysema in 1.4% (n=10), and interstitial pulmonary fibrosis in 1.1% (n=8).
Most participants (68.0%, n=484) had solitary nodules, and 32.0% (n=228) had multiple nodules; 72.6% (n=517) had solid nodules. Nodules were homogeneous, were calcified, and had spiculations in 49.7% (n=354), 18.5% (n=132), and 28.9% (n=206) of the participants, respectively (Table 1; Supplemental Table 1).
qXR-LNMS Performance Metrics
In total, the agreement between qXR-LNMS and radiologist panel was observed for 270 of 498 cases with high-risk LNMS and 200 of 214 cases with low-risk LNMS. The qXR-LNMS system exhibited an overall PPV of 54.2% (95% CI, 49.8-58.5) and NPV of 93.5% (95% CI, 89.3-96.1) when benchmarked against the radiologist panel’s binary decisions based on the Lung-RADS score as the reference standard (Table 2).
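The counts in this paragraph are enough to rebuild the full 2x2 table of qXR-LNMS category against the reference standard. The sketch below does that reconstruction and checks it against the reference-standard totals quoted later in the Mayo Clinic model paragraph (284 high-risk and 428 low-risk). It is an arithmetic cross-check only; Table 2 itself is not reproduced here, and the dictionary layout is illustrative.

```python
# Rows: qXR-LNMS category; columns: reference standard (Lung-RADS on LDCT).
table = {
    "high-risk LNMS": {"ref_high": 270, "ref_low": 498 - 270},
    "low-risk LNMS": {"ref_high": 214 - 200, "ref_low": 200},
}

ref_high_total = sum(row["ref_high"] for row in table.values())
ref_low_total = sum(row["ref_low"] for row in table.values())
total = ref_high_total + ref_low_total
concordant = table["high-risk LNMS"]["ref_high"] + table["low-risk LNMS"]["ref_low"]

for name, row in table.items():
    print(f"{name:>15}: ref high = {row['ref_high']:3d}, ref low = {row['ref_low']:3d}")
print(f"reference totals: high = {ref_high_total}, low = {ref_low_total}, all = {total}")
print(f"concordant with reference: {concordant} of {total}")
```

The column totals come out to 284 and 428, matching the reference-standard counts used for the Mayo Clinic model comparison, and the 470 concordant cases match the figure cited there.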
The predictive performance of qXR-LNMS varied across demographic groups, with overlapping CIs (Figure 2). For males, the PPV and NPV were 57.4% (95% CI, 51.9%-62.7%) and 93.8% (95% CI, 88.3%-96.8%), respectively. For females, the PPV and NPV were 48.6% (95% CI, 41.4%-55.9%) and 92.9% (95% CI, 85.3%-96.7%), respectively. Individuals aged 55 years or older had a PPV and NPV of 60.1% (95% CI, 54.5%-65.5%) and 92.9% (95% CI, 87.4%-96.1%), respectively, whereas those younger than 55 years had a PPV of 45.5% (95% CI, 38.8%-52.4%) and an NPV of 94.6% (95% CI, 86.9%-97.9%). Participants with solitary nodules had a PPV and NPV of 49.6% (95% CI, 44.3%-54.8%) and 92.2% (95% CI, 86.6%-95.6%), and those with multiple nodules had a PPV and NPV of 64.5% (95% CI, 56.7%-71.6%) and 95.9% (95% CI, 88.6%-98.6%), respectively, with CIs overlapping across categories for both PPV and NPV (Figure 3).
qXR-LNMS and Alternative Malignancy Risk Scoring Models
Mayo Clinic Model
The Mayo score, used to assess the malignancy risk of nodules on CXR, ranged from 0.7 to 95.6, with a median of 13.8 (Supplemental Figure 2, available online at https://www.mcpdigitalhealth.org/). In the high-risk LNMS and low-risk LNMS groups, the median Mayo scores were 17.4 (range, 0.7-94.4) and 8.6 (range, 1.2-95.6), respectively. The Spearman correlation coefficient between LNMS groups and Mayo score was 0.247. When the Mayo score was binarized at a 2% threshold, agreement between the Mayo Clinic model and qXR-LNMS was observed in 70.6% of participants (high-risk LNMS group: 487; low-risk LNMS group: 16) (Supplemental Table 2, available online at https://www.mcpdigitalhealth.org/). The Mayo Clinic model demonstrated an accuracy of 42.8%, correctly categorizing 281 of 284 high-risk and 24 of 428 low-risk individuals according to the reference standard (Supplemental Table 3, available online at https://www.mcpdigitalhealth.org/). Of the 470 participants (high-risk: 270 and low-risk: 200) for whom qXR-LNMS and the reference standard agreed, the Mayo score disagreed in 187 (39.8%). Of the 305 participants for whom the Mayo score and the reference standard agreed, qXR-LNMS disagreed in 22 (7.2%) (Supplemental Figure 3, available online at https://www.mcpdigitalhealth.org/).
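The percentages in this paragraph follow directly from the stated counts. The sketch below recomputes them as a consistency check; the variable names and the `pct` helper are illustrative, and no data beyond the counts quoted above are used.

```python
TOTAL = 712  # enrolled participants


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Agreement between binarized Mayo score and qXR-LNMS category.
lnms_mayo_agreement = pct(487 + 16, TOTAL)

# Mayo Clinic model accuracy against the Lung-RADS reference standard.
mayo_correct = 281 + 24
mayo_accuracy = pct(mayo_correct, TOTAL)

# Mayo disagreement where qXR-LNMS and the reference standard agreed.
mayo_disagrees = pct(187, 270 + 200)

# qXR-LNMS disagreement where Mayo and the reference standard agreed.
lnms_disagrees = pct(22, mayo_correct)

print(lnms_mayo_agreement, mayo_accuracy, mayo_disagrees, lnms_disagrees)
```

The low accuracy of the binarized Mayo score comes almost entirely from the low-risk side: at the 2% cutoff it labels only 24 of the 428 reference low-risk participants as low-risk.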
Radiologist Assessment
On the Likert scale, 59.6% (297/498) of high-risk LNMS participants were classified by radiologists into 1 of the following categories: malignant, probably malignant, uncertain, or probably nonmalignant. In the low-risk LNMS group, 81.8% (175/214) of participants were classified as nonmalignant (Supplemental Figure 4, available online at https://www.mcpdigitalhealth.org/).
Discussion
The initial data from the CREATE study provide insights into the real-world utility of AI-enabled triage of IPN for follow-up. We report the predictive performance of qXR-LNMS, a proprietary malignancy risk scoring component of the qXR system,14 in stratifying nodules by risk of malignancy, with an overall PPV of 54.2% and NPV of 93.5%. The study results crossed the predefined thresholds of 20% and 70% for the PPV and NPV, which were selected as precision targets for sample size estimation rather than clinical decision cutoffs. The 20% PPV threshold reflects a conservative estimate based on the expected lower lung cancer prevalence in real-world, low-screening settings, compared with that in higher-risk cohorts such as the National Lung Screening Trial.19 The NPV threshold of 70% was intended to describe residual risk among AI-flagged low-risk cases. All participants, regardless of AI risk categorization, continued standard radiologist review and clinical follow-up.
The initial validation study of the qXR algorithm demonstrated a high area under the curve (AUC = 0.99), sensitivity of 1.0, and specificity of 0.90 (range, 0.87-0.92).15 The CREATE study provides the first data to prospectively assess the clinical utility of qXR-LNMS to predict the likely risk of benign and malignant IPNs on CXR against radiologist assessment of LDCT in clinical settings across diverse populations, thereby having the potential to help in ruling out low-risk nodules and reducing unnecessary referrals for CT scans and numerous invasive procedures.20, 21, 22 It is important to note that this tool is intended to assist clinical workflows and should not be used as a standalone method to exclude malignancy.
Although LDCT is recommended for lung cancer screening by the US Preventive Services Task Force,3 its utility in resource-limited regions is constrained by substantial costs and limited accessibility.10 Additionally, only individuals at high risk (eg, smokers and those aged >50 years) are recommended for LDCT-based screening.3,4 Hence, AI-enabled CXR reading could potentially help triage individuals for LDCT who may not be eligible for screening as per current guidelines. Over the years, AI has shown potential in detecting and classifying pulmonary nodules as malignant or benign.23 Further, AI-enabled interpretation of CXRs may offer a complementary triage pathway to identify at-risk individuals who fall outside standard screening eligibility criteria. Although chest radiography is commonly used for the diagnosis of lung diseases, its application for lung cancer screening is limited.24, 25, 26, 27 Integrating AI assistance for IPN interpretation on CXR in real-world clinical settings is promising.28,29
Demographic factors, such as age and smoking history, are well-established risk factors for lung cancer and are prime considerations for lung cancer screening programs.30,31 The current National Comprehensive Cancer Network4 and American Society of Clinical Oncology32 guidelines recommend annual LDCT screening for individuals aged 50 to 80 years with a 20-pack-year or more smoking history. However, the lung cancer screening landscape is changing because recent studies highlight the benefits of lung cancer screening for nonsmokers, showing early detection rates of 92.7% and significantly better survival rates.5,33, 34, 35, 36, 37 The CREATE study cohort predominantly comprised men (63.1%) and individuals aged 55 years or older (61.2%), most being never smokers (71.6%) and without a family history of lung cancer (97.2%). The PPV and NPV in our study crossed the predefined thresholds in high-risk groups, such as males (PPV, 57.4%; NPV, 93.8%) and individuals aged 55 years or older (PPV, 60.1%; NPV, 92.9%), as well as in low-risk groups, such as nonsmokers (PPV, 56.1%; NPV, 94.2%), the younger population (PPV, 45.5%; NPV, 94.6%), and those without a family history of lung cancer (PPV, 55.0%; NPV, 93.3%). Notably, the NPVs were consistently high across all clinicodemographic categories, demonstrating the effectiveness of qXR in identifying benign nodules and reducing false positive cases. False positive results are associated with increased costs for further follow-ups and invasive procedures and with increased anxiety.38,39 Additionally, the PPV in subgroups who would normally not be eligible for lung cancer screening programs, such as nonsmokers, younger individuals (aged <55 years), and those without a family history of lung cancer, was also higher than the predetermined threshold, highlighting the utility of qXR for lung cancer risk stratification across varied segments of the population, particularly in low-risk individuals.
The qXR-LNMS model generated a malignancy score (0-100) for each detected nodule based on various well-established risk dimensions related to nodule characteristics, such as size,40 calcification,41 spiculation,19,42, 43, 44, 45 and homogeneity. In the study cohort, participants primarily had solitary (68.0%), solid (72.6%), nonspiculated (71.1%), and noncalcified (81.5%) nodules. Overall, the subgroup-specific PPVs (range, 25.7%-83.3%) and NPVs (range, 72.7%-100%) crossed the predefined thresholds set for the study, including those for high-risk groups such as participants with multiple (PPV, 64.5%) or spiculated (PPV, 80.9%) nodules (Figures 2 and 3).
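To make the threshold comparison concrete, the following is a minimal sketch of how a PPV and an NPV with Wilson score 95% CIs can be computed from a 2x2 table. The function names are hypothetical, and the cell counts are not reported in this section: they are back-calculated, as an assumption, from the overall PPV (54.2%) and NPV (93.5%) and the group sizes given in the abstract (498 high-risk and 214 low-risk participants). Comparing the point estimate with the threshold is likewise an assumption about how success was judged.

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half

def predictive_values(tp, fp, tn, fn):
    """PPV and NPV with Wilson intervals from 2x2 cell counts."""
    return {
        "ppv": tp / (tp + fp),
        "ppv_ci": wilson_ci(tp, tp + fp),
        "npv": tn / (tn + fn),
        "npv_ci": wilson_ci(tn, tn + fn),
    }

# ASSUMPTION: counts back-calculated from the reported percentages and
# group sizes; they are illustrative, not counts published by the authors.
result = predictive_values(tp=270, fp=228, tn=200, fn=14)

# Predefined study thresholds: 20% for PPV and 70% for NPV.
# ASSUMPTION: the point estimate is compared with the threshold.
ppv_ok = result["ppv"] >= 0.20
npv_ok = result["npv"] >= 0.70

print(f"PPV {result['ppv']:.1%}, 95% CI {result['ppv_ci'][0]:.1%}-{result['ppv_ci'][1]:.1%}")
print(f"NPV {result['npv']:.1%}, 95% CI {result['npv_ci'][0]:.1%}-{result['npv_ci'][1]:.1%}")
```

The same function can be applied to each subgroup's 2x2 table; for small subgroups the Wilson interval stays within 0 to 1, which is one reason it is often preferred over the simple normal approximation.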
The study assessed the level of agreement between qXR-LNMS and alternative malignancy risk scoring approaches, namely the Mayo Clinic model and the radiologists' Likert scale. A positive, although weak, correlation (Spearman correlation coefficient, 0.247) between LNMS and the Mayo score suggests that the 2 tools may complement each other. Notably, qXR-LNMS demonstrated high agreement with the Mayo score when binarized at a 2% threshold, accurately identifying high-risk cases, although some discrepancies were observed in low-risk classifications. The level of agreement with the radiologists' assessment on the Likert scale showed effective risk stratification in both the high-risk and low-risk qXR-LNMS groups, with 81.8% of cases categorized as nonmalignant in the low-risk group and 59.6% of cases categorized as malignant in the high-risk group.
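The two agreement measures described above can be sketched as follows: a Spearman rank correlation between two continuous scores, and the proportion of cases on which a binary LNMS risk flag matches the Mayo probability binarized at 2%. This is an illustration only, not the authors' analysis code. The function names and toy values are hypothetical, the choice of ">=" at the 2% cutoff is an assumption, and the LNMS cutoff used to define high risk is not stated in this section, so the LNMS flag is taken as a given input.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def binarized_agreement(lnms_high, mayo_prob, mayo_cutoff=0.02):
    """Share of cases where the LNMS flag matches the binarized Mayo risk."""
    # ASSUMPTION: a Mayo probability at or above the cutoff counts as high risk.
    mayo_high = [p >= mayo_cutoff for p in mayo_prob]
    matches = sum(a == b for a, b in zip(lnms_high, mayo_high))
    return matches / len(lnms_high)

# Toy values invented for illustration; they are not study data.
lnms_scores = [12, 35, 35, 80, 64]
mayo_probs = [0.01, 0.015, 0.10, 0.30, 0.04]
lnms_flags = [False, True, True, True, True]

rho = spearman(lnms_scores, mayo_probs)
agreement = binarized_agreement(lnms_flags, mayo_probs)
```

A coefficient near 0.25, as reported, would indicate that the two scores rank cases only loosely alike, so binarized agreement and rank correlation can tell different stories about the same pair of tools.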
Despite their widespread use in clinical practice, CXRs have traditionally been considered suboptimal for lung cancer screening owing to their lower sensitivity than LDCT. However, recent advancements in AI-driven imaging analysis have renewed interest in using CXRs with AI interpretation for early lung cancer risk stratification.46, 47, 48 Health care systems currently face a critical challenge in accurately identifying individuals at high risk of lung cancer, which is crucial for further imaging and subsequent treatment. Artificial intelligence tools can enable earlier detection, improve diagnostic accuracy, and prioritize urgent cases, offering a cost-effective route to a global shift in cancer stage at the time of diagnosis and, in turn, to improved disease management and clinical outcomes for individuals with lung cancer. Our results support the potential utility of qXR-LNMS in identifying high-risk nodules that require follow-up and further evaluation while ruling out individuals with a low risk of malignancy. Moreover, qXR could also serve as a triage tool for radiologists in high-workload settings to prioritize CXRs with a high-risk LNMS for further workup. The ongoing phase 2 of this study is expected to provide deeper insight into the proportion of participants identified at baseline as high risk for malignancy by AI-driven risk stratification who are subsequently diagnosed with lung cancer after further referral, workup (including biopsies), and long-term follow-up.
Although the study demonstrates the utility of qXR-LNMS in predicting benign and malignant IPNs on CXRs, the results need to be interpreted with caution. In this study, although qXR-LNMS was applied to all consecutive CXRs, radiologists confirmed the IPNs on CXR and facilitated selection for the study, which might have introduced selection bias. Moreover, most of the study population was from a single region. Thus, the results need to be confirmed in large-scale multicenter studies in primary care settings to assess the utility of qXR-LNMS where specialists may not be available.
Conclusion
In conclusion, the observed PPV (54.2%) and NPV (93.5%), which crossed the predefined thresholds of 20% and 70%, respectively, demonstrate the utility of qXR-LNMS in predicting benign and malignant IPNs on CXRs across diverse health care settings. The PPVs and NPVs were consistent across all subgroups, including participants aged younger than 55 years and nonsmokers. The CREATE study findings suggest the potential utility of AI-enabled triaging of IPNs on CXRs to support lung cancer screening workflows, particularly in resource-limited settings. However, further validation, including prospective studies and dedicated reader studies, is necessary to establish the impact of qXR-LNMS on clinical decision making.
Potential Competing Interests
Dr Gonuguntla has received lecture fees from Fujifilm, Erbe, and Pfizer. Dr Cordova has received honoraria from AstraZeneca. Drs Sen and Agrawal are employees of Qure.ai. Drs McCutcheon, Saha, and Kantharaju are employees of AstraZeneca. The other authors report no competing interests. The qXR-Lung Nodule Malignancy Score (qXR-LNMS) system is a proprietary product of Qure.ai. Medical writing support was provided by Fortrea Scientific. Dr Cordova was provided a laptop by AstraZeneca for installation of the Qure.ai program.
Ethics Statement
The CREATE study was conducted across a total of 23 clinical sites. Ethical approval was obtained from the respective local Ethics Committees at all participating sites. At the principal investigator’s institution, Hacettepe University Faculty of Medicine, Turkey, approval was granted by the Ministry of Health Turkish Medicines & Devices EC, Turkey, under reference number E-24931227-511.06.01.02-5655491. Participants gave informed consent to participate in the study before enrollment.
Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please cite the original article when quoting.