Clinicopathological characteristics associated with intrapulmonary metastasis rather than single primary lung cancer at first diagnosis: a study based on the Surveillance, Epidemiology, and End Results database using Bayesian networks and structural equation modeling.
APA
Liu W, Chan JCS, et al. (2026). Clinicopathological characteristics associated with intrapulmonary metastasis rather than single primary lung cancer at first diagnosis: a study based on the Surveillance, Epidemiology, and End Results database using Bayesian networks and structural equation modeling. Translational lung cancer research, 15(1), 9. https://doi.org/10.21037/tlcr-2025-1085
PMID
41659273
Abstract
[BACKGROUND] At initial lung cancer diagnosis, intrapulmonary metastasis (IPM) usually reflects more advanced intrathoracic disease than single primary lung cancer (SPLC). However, the clinical and pathological characteristics associated with presenting as IPM rather than SPLC, and with more extensive IPM patterns, are not well described. This study aimed to characterise how sociodemographic and tumor features at diagnosis are associated with IPM compared with SPLC and with different intrapulmonary metastatic patterns in a large population-based registry.
[METHODS] We conducted a cross-sectional analysis of patients with non-small cell lung cancer in the Surveillance, Epidemiology, and End Results database from 2000 to 2019. IPM and SPLC were defined using the "Separate Tumor Nodules Ipsilateral Lung" recode. Bayesian network modeling and structural equation modeling were used to describe conditional association structures among sociodemographic variables, tumor characteristics, and lung cancer type. Simulated interventions in the Bayesian network yielded model-based risk ratios (RRs) with 95% confidence intervals (CIs) for IPM versus SPLC. Logistic regression was used in an exploratory subgroup analysis of IPM patterns comparing disease confined to the same lobe, disease in different lobes, and disease in both the same and different lobes.
[RESULTS] Among 45,194 patients, 9,302 had IPM and 35,892 had SPLC. In the Bayesian network, tumor grade and laterality showed the strongest direct associations with lung cancer type, and the model discriminated IPM from SPLC with an area under the curve of 0.919. Sociodemographic variables showed weaker and less consistent associations with lung cancer type after adjustment for tumor characteristics. Simulated interventions suggested progressively higher model-based risk of IPM with poorer differentiation (RR of well-differentiated to poorly differentiated grade: 1.664, 95% CI: 1.571-1.772) and with right-sided disease (RR of right-sided to left-sided disease: 1.136, 95% CI: 1.093-1.178). In subgroup analyses, higher grade and lower and middle lobe location were associated with IPM patterns involving multiple lobes.
[CONCLUSIONS] In this large registry-based study, intrapulmonary metastatic disease at first lung cancer diagnosis was more strongly associated with tumor differentiation, laterality, and anatomical distribution than with measured sociodemographic factors. These observational associations may help characterise patients who present with more extensive intrapulmonary disease.
Introduction
Lung cancer remains one of the leading causes of cancer-related mortality worldwide (1). Its development reflects the combined influence of genetic, environmental, and clinical factors (2,3). With the widespread use of low-dose chest computed tomography (CT) screening, an increasing number of patients are being diagnosed with multiple ground-glass nodules (GGNs), including GGNs that may represent very early or subclinical disease (4-6). In clinical practice, these nodules may correspond to a single primary lung cancer (SPLC) or to a more advanced presentation in which cancer cells from an index tumor have already spread within the lung to form additional intrapulmonary metastasis (IPM) (7,8). Distinguishing these diagnostic patterns at the time of first lung cancer diagnosis is crucial because IPM usually indicates higher tumor burden, more advanced stage, and poorer clinical outcomes compared with SPLC.
Most population-based and clinical studies have examined SPLC in isolation and have focused on incidence, treatment, and survival in non-small cell lung cancer (9,10). In contrast, less attention has been paid to the characteristics of patients who already have IPM at their initial diagnosis. Existing work has mainly evaluated survival after resection, patterns of recurrence, or the performance of pathologic and imaging criteria used to separate intrapulmonary spread from separate primary tumors (11-13). As a result, there is limited evidence on how routine sociodemographic and tumor characteristics at first diagnosis differ between patients recorded as SPLC and those recorded as IPM in large cancer registries. Furthermore, among patients with IPM, even less is known about how patterns of lobe involvement, such as metastases confined to one lobe, metastases in different lobes, or metastases involving both the same and different lobes, relate to more aggressive intrapulmonary disease. The Surveillance, Epidemiology, and End Results (SEER) program provides large-scale data on sociodemographic variables and tumor characteristics (14,15). A detailed description of how sociodemographic features and tumor characteristics are jointly associated with SPLC versus IPM at initial presentation can improve clinical understanding of which patients tend to present with intrapulmonary metastatic disease rather than a single primary tumor. In addition, examining how these characteristics relate to different intrapulmonary metastatic patterns within the IPM group may provide further insight into the clinical profile of more extensive intrapulmonary spread.
Bayesian networks and structural equation modeling (SEM) are complementary multivariable approaches that can summarize complex association structures in such data. Bayesian networks represent conditional dependence relationships among variables in a probabilistic graphical form and can be used to explore how changes in one characteristic are associated with changes in the predicted distribution of others (16-19). SEM provides a flexible framework to represent direct and indirect associations among observed variables and an outcome and has been widely used in cancer epidemiology to describe patterns in covariance structures (20-22). In observational cross-sectional data, these methods do not establish causality but can offer an integrated view of how sociodemographic and tumor characteristics co-vary with diagnostic patterns.
In this study, we conducted a cross-sectional analysis of SEER data from 2000 to 2019 to characterize how sociodemographic and tumor features at diagnosis are associated with SPLC and IPM. Our primary objective was to describe the associations between these features and the diagnostic pattern recorded as SPLC or IPM at the time of first lung cancer diagnosis. As a secondary objective, among patients with IPM and known lobe status, we explored how these characteristics are related to different intrapulmonary metastatic patterns defined by lobe involvement, in order to supplement clinical knowledge about the profiles of more aggressive intrapulmonary metastatic disease. We present this article in accordance with the STROBE reporting checklist (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2025-1085/rc).
Methods
Study cohort, data processing and baseline analysis
In this retrospective cross-sectional analysis, we retrieved clinicopathological data, including age, sex, race, pathology, location, grade, laterality, tumor node metastasis (TNM) stage, T stage, N stage, M stage, median household income, position on the rural-urban continuum, and marital status, from patients with confirmed lung cancer diagnoses in the 17 SEER registries (November 2021 submission) using SEER*Stat 8.4.4 software. The site record International Classification of Diseases for Oncology, third edition (ICD-O-3)/World Health Organization (WHO) 2008 was set to “Lung and Bronchus”. The inclusion criteria were as follows: (I) a diagnosis between 2000 and 2019; (II) age over 18 years; (III) histologically confirmed non-small cell adenocarcinoma, non-small cell neuroendocrine carcinoma (NEC), non-small cell neuroendocrine tumors (NETs), or NEC; (IV) the record of the patient’s first visit, with sequence numbers marked as “1st of 2 or more primaries” or “One primary only”; (V) laterality recorded as “Right-origin of primary”, “Bilateral, single primary”, or “Left-origin of primary”. The exclusion criteria were as follows: (I) prior malignancies before the diagnosis of primary lung cancer; (II) incomplete information. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.
Definition of IPM and SPLC
The “Separate Tumor Nodules Ipsilateral Lung” recode for SPLC was set as “None; No intrapulmonary mets; Foci in situ/minimally invasive adenocarcinoma”. For IPM, the recode was set as “Separate nodules of same hist type in ipsilateral lung, different lobe”, “Separate nodules of same hist type in ipsilateral lung, same lobe”, “Separate nodules of same hist type in ipsilateral lung, same AND different lobes”, and “Separate tumor nodules, ipsilateral lung, unknown if same or different lobe”.
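The recode-based definition above can be written as a small lookup. The following is a minimal sketch in Python: the recode strings are those quoted in the text, while the function name `classify_recode` and the short pattern labels are hypothetical names introduced only for illustration.

```python
# Recode value that defines SPLC, as quoted in the text.
SPLC_RECODE = (
    "None; No intrapulmonary mets; "
    "Foci in situ/minimally invasive adenocarcinoma"
)

# Recode values that define IPM, mapped to a short pattern label
# (the labels are illustrative, not SEER terminology).
IPM_RECODES = {
    "Separate nodules of same hist type in ipsilateral lung, same lobe":
        "same lobe",
    "Separate nodules of same hist type in ipsilateral lung, different lobe":
        "different lobe",
    "Separate nodules of same hist type in ipsilateral lung, "
    "same AND different lobes":
        "same and different lobes",
    "Separate tumor nodules, ipsilateral lung, "
    "unknown if same or different lobe":
        "unknown lobe status",
}


def classify_recode(recode):
    """Return (cancer_type, ipm_pattern); (None, None) if the value is not listed."""
    if recode == SPLC_RECODE:
        return ("SPLC", None)
    if recode in IPM_RECODES:
        return ("IPM", IPM_RECODES[recode])
    return (None, None)
```

Records whose recode matches none of the listed values return `(None, None)` here; how such records were handled in the study is not stated beyond the exclusion of incomplete information.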
Bayesian network analysis
We employed a Bayesian network to describe the joint distribution of clinical variables and lung cancer type and to explore conditional dependence structures. The network structure was learned using the hill-climbing search algorithm (pgmpy.estimators), with the Bayesian information criterion (BIC) as the scoring function. To respect a plausible temporal and clinical ordering, we constrained the structure learning as follows. Edges from birth variables to sociodemographic and tumor variables were allowed. Edges from sociodemographic variables to sociodemographic and tumor variables were allowed. Edges from tumor variables to other tumor variables were allowed. Edges from later tiers back to earlier tiers were prohibited, and edges from composite TNM stage to its components T, N, and M stages were not allowed. Directed edges were therefore interpreted as conditional dependence relationships within the fitted probabilistic model rather than as evidence of causal effects. The performance of the Bayesian network in predicting outcomes was assessed using the receiver operating characteristic (ROC) curve and the area under the curve (AUC), computed with sklearn and plotted with matplotlib.
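The tiered edge constraints described above can be illustrated by enumerating the prohibited edges explicitly. This is a minimal sketch using only the Python standard library. The assignment of variables to tiers and the variable names are assumptions (the text names the tiers but does not list their members), and the resulting list is the kind of input that could be handed to a structure learner that accepts forbidden edges.

```python
from itertools import product

# Assumed tier membership, ordered from earliest to latest (hypothetical names).
TIERS = [
    ["sex", "race"],                                      # birth variables
    ["age", "marital_status", "income", "rural_urban"],   # sociodemographic
    ["pathology", "location", "grade", "laterality",      # tumor variables
     "tnm_stage", "t_stage", "n_stage", "m_stage", "cancer_type"],
]

# Composite TNM stage may not point to its own components.
TNM_TO_COMPONENTS = [
    ("tnm_stage", "t_stage"),
    ("tnm_stage", "n_stage"),
    ("tnm_stage", "m_stage"),
]


def forbidden_edges(tiers, extra=()):
    """List every (parent, child) edge that runs from a later tier to an earlier one."""
    banned = set(extra)
    for later_index, later_tier in enumerate(tiers):
        for earlier_tier in tiers[:later_index]:
            banned.update(product(later_tier, earlier_tier))
    return sorted(banned)


BANNED = forbidden_edges(TIERS, extra=TNM_TO_COMPONENTS)
```

Only the prohibitions stated in the text are encoded. Whether edges within the birth tier (for example between sex and race) were also excluded is not stated, so they are left unconstrained in this sketch.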
SEM analysis
We then fitted an SEM as a path analysis representation of the main conditional associations suggested by the Bayesian network. Residual covariances were allowed between TNM, T, N, and M stages. The model was fitted with maximum likelihood in the semopy package. Global fit indices, including the goodness of fit index (GFI), chi-squared statistic, comparative fit index (CFI), Tucker-Lewis index (TLI), normed fit index (NFI), adjusted GFI (AGFI), root mean square error of approximation (RMSEA), and standardised root mean square residual (SRMR), were used to describe the concordance between the model-implied and observed covariance structures. Given the cross-sectional design and imperfect fit, the SEM was interpreted as a descriptive association model rather than a validated causal structure.
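Three of the fit indices named above have simple closed forms in terms of the model and baseline chi-squared statistics. The sketch below uses the common textbook definitions; it is not taken from the paper, and individual packages may differ in detail (for example, using N rather than N − 1 in the RMSEA denominator). The function and argument names are illustrative.

```python
import math


def rmsea(chi2_model, df_model, n_obs):
    """Root mean square error of approximation (N - 1 convention)."""
    excess = max(chi2_model - df_model, 0.0)
    return math.sqrt(excess / (df_model * (n_obs - 1)))


def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative fit index relative to the baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def tli(chi2_model, df_model, chi2_base, df_base):
    """Tucker-Lewis index, built from chi-squared / df ratios."""
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    return (ratio_base - ratio_model) / (ratio_base - 1.0)
```

With made-up inputs such as a model chi-squared of 200 on 50 degrees of freedom, a baseline chi-squared of 2,000 on 66 degrees of freedom, and 1,001 observations, these functions give an RMSEA of about 0.055, a CFI of about 0.92, and a TLI of about 0.90. These numbers are purely illustrative and are unrelated to the study data.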
Simulated intervention
Conditional probabilities were estimated using maximum likelihood. Inference was performed with variable elimination. We used the fitted network to evaluate how the model-implied probability of IPM versus SPLC changed when fixing specific variables at different levels while leaving the empirical distribution of other variables unchanged. These contrasts represent differences within the fitted joint distribution and are interpreted as statistical associations. We contrasted each level with a clinically relevant reference level and derived model-based risk ratios (RRs) and risk differences for IPM versus SPLC. For each of one thousand bootstrap samples, we refitted the Bayesian network under the same structural constraints and recomputed the RRs and risk differences. Two-sided P values were derived from the bootstrap distributions and confidence intervals (CIs) were obtained from bootstrap percentiles. Results were displayed as forest plots.
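The logic of fixing one variable while leaving the distribution of the others unchanged, followed by a percentile bootstrap, can be sketched on a toy problem. The code below is a deliberate simplification and not the authors' pipeline: it standardises over a single covariate directly from counts instead of refitting a Bayesian network and running variable elimination, and the data in `toy_rows` are synthetic values invented for the example.

```python
import random
from collections import Counter


def standardized_risk(rows, exposure, level, outcome="ipm", adjust="laterality"):
    """P(outcome = 1) with `exposure` fixed at `level`, averaged over the
    empirical distribution of the `adjust` variable in `rows`."""
    weights = Counter(r[adjust] for r in rows)
    total = len(rows)
    risk = 0.0
    for stratum, count in weights.items():
        cell = [r[outcome] for r in rows
                if r[exposure] == level and r[adjust] == stratum]
        if not cell:
            return None  # stratum has no data at this level: not estimable
        risk += (count / total) * (sum(cell) / len(cell))
    return risk


def risk_ratio(rows, exposure, level, reference):
    """Ratio of standardized risks, or None if it cannot be computed."""
    num = standardized_risk(rows, exposure, level)
    den = standardized_risk(rows, exposure, reference)
    if num is None or not den:
        return None
    return num / den


def bootstrap_ci(rows, exposure, level, reference, n_boot=1000, seed=0):
    """Percentile 95% interval from resampling rows with replacement."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        sample = [rng.choice(rows) for _ in rows]
        rr = risk_ratio(sample, exposure, level, reference)
        if rr is not None:
            stats.append(rr)
    stats.sort()
    lower = stats[int(0.025 * (len(stats) - 1))]
    upper = stats[int(0.975 * (len(stats) - 1))]
    return lower, upper


def toy_rows():
    """Synthetic records: (grade, laterality) -> (n, number with IPM)."""
    cells = {("well", "left"): (100, 10), ("well", "right"): (100, 12),
             ("poor", "left"): (100, 20), ("poor", "right"): (100, 24)}
    rows = []
    for (grade, laterality), (n, events) in cells.items():
        for i in range(n):
            rows.append({"grade": grade, "laterality": laterality,
                         "ipm": 1 if i < events else 0})
    return rows
```

On the synthetic data, `risk_ratio(toy_rows(), "grade", "poor", "well")` compares a standardized risk of 0.22 with 0.11, and `bootstrap_ci` returns the 2.5th and 97.5th percentiles of the resampled ratios. As in the text, such a contrast describes the fitted joint distribution and is read as an association.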
Subgroup analysis of IPM patterns
As an exploratory subgroup analysis, we restricted the cohort to patients with IPM and examined factors associated with different patterns of intrapulmonary disease. Two separate binary logistic regression models were fitted: one comparing IPM in different lobes with IPM in the same lobe, and one comparing IPM in both the same and different lobes with IPM in the same lobe. The same set of variables as in the main analyses was used. Age was entered as a continuous variable. Grade, income, rural-urban continuum, and TNM, T, N, and M stages were treated as ordered variables using the clinical order. Race, sex, pathology, tumor location, laterality, and marital status were treated as nominal variables and were coded with indicator variables, with White race, male, non-small cell adenocarcinoma, upper lobe location, left-sided laterality, and married status as reference categories. Odds ratios (ORs) with 95% CIs and P values were reported for each comparison. These models were used to describe how sociodemographic and tumor characteristics were associated with more extensive intrapulmonary metastatic patterns relative to the reference group of IPM in the same lobe.
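Indicator coding against a reference category, as described for the nominal variables, can be sketched as follows. The helper name `indicators` and the level labels are hypothetical; only the choice of reference categories (for example, left-sided laterality and upper lobe location) comes from the text.

```python
def indicators(variable, value, levels, reference):
    """Return 0/1 indicator columns for every level except the reference."""
    if value not in levels:
        raise ValueError(f"unexpected level {value!r} for {variable}")
    return {
        f"{variable}={level}": int(value == level)
        for level in levels
        if level != reference
    }


# Assumed level labels; the reference categories follow the text.
LATERALITY_LEVELS = ["Left", "Right", "Bilateral"]
LOCATION_LEVELS = ["Upper lobe", "Middle lobe", "Lower lobe"]
```

A right-sided tumor is coded as `{"laterality=Right": 1, "laterality=Bilateral": 0}`, while a left-sided tumor, the reference, is coded as all zeros. Each fitted coefficient is therefore a contrast with the reference category.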
Statistical analyses
Continuous variables were summarized as means with standard deviations, and categorical variables as counts and percentages. Group comparisons between SPLC and IPM used chi-squared tests and analysis of variance. Analyses were performed in Python 3.12 and R 4.3.2, with a two-tailed P value less than 0.05 considered significant.
Results
Baseline analysis
A total of 45,194 lung cancer patients were included in this study, consisting of 9,302 patients with IPM and 35,892 patients with SPLC. The clinicopathological characteristics of the patients are summarized in Table 1. The mean ages for the two groups were 67.24±11.02 and 66.93±10.79 years, respectively (P=0.01). Significant differences between the two lung cancer types were found for sex, race, pathology, location, grade, laterality, TNM stage, T stage, N stage, M stage, marital status, and median household income. No significant difference was observed in position on the rural-urban continuum.
Bayesian network analysis
The Bayesian network captured the main conditional dependencies among sociodemographic and tumor-related variables and lung cancer type (Figure 1A). Sex and race were connected to marital status, age, median household income, and the rural-urban continuum, while TNM, T, N, and M stages, grade, laterality, tumor location, pathology, and lung cancer type formed a clinically plausible cluster of tumor characteristics. In this structure, grade and laterality were directly linked to lung cancer type. When the Bayesian network was used as a probabilistic classifier, the AUC for distinguishing IPM from SPLC was 0.919 (Figure 1B), indicating good discrimination based on the joint pattern of clinical and pathological variables. This performance reflects the network’s ability to summarise multivariable associations.
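The AUC has a direct interpretation: it is the probability that a randomly chosen IPM case receives a higher predicted probability than a randomly chosen SPLC case, with ties counted as one half. The sketch below computes it by pairwise comparison using only the standard library. It is an illustration of the definition, not the sklearn routine used in the study, and the function name is an assumption.

```python
def pairwise_auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    `scores` are predicted probabilities of the positive class (here IPM)
    and `labels` are 1 for positive and 0 for negative (here SPLC).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

For example, with scores `[0.9, 0.8, 0.4, 0.3]` and labels `[1, 0, 1, 0]`, three of the four positive-negative pairs are ordered correctly, so the function returns 0.75. The quadratic loop is fine for illustration but would be slow at the scale of the study cohort.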
SEM
To represent the main associations suggested by the Bayesian network in a conventional path analysis framework, we fitted an SEM with only observed variables (Figure 2A). The model included regressions of T stage on tumor location and lung cancer type, M stage on T stage and lung cancer type, N stage on M stage and T stage, grade on TNM stage, pathology on grade and location, location on laterality, lung cancer type on grade and laterality, marital status on race and sex, median household income on race and the rural-urban continuum, the rural-urban continuum on race, and age on race and marital status.
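The regressions listed above can be written compactly in lavaan-style syntax, which is the kind of model description that path-analysis packages such as semopy accept. The sketch below only builds that text from a dictionary; the variable names are hypothetical stand-ins for the study variables, and the residual covariances among the stage variables mentioned in the Methods are omitted because their exact specification is not given.

```python
# outcome -> predictors, following the regressions listed in the text
# (variable names are illustrative).
REGRESSIONS = {
    "t_stage": ["location", "cancer_type"],
    "m_stage": ["t_stage", "cancer_type"],
    "n_stage": ["m_stage", "t_stage"],
    "grade": ["tnm_stage"],
    "pathology": ["grade", "location"],
    "location": ["laterality"],
    "cancer_type": ["grade", "laterality"],
    "marital_status": ["race", "sex"],
    "income": ["race", "rural_urban"],
    "rural_urban": ["race"],
    "age": ["race", "marital_status"],
}


def to_model_description(regressions):
    """Render each regression as one 'outcome ~ x1 + x2' line."""
    return "\n".join(
        f"{outcome} ~ {' + '.join(predictors)}"
        for outcome, predictors in regressions.items()
    )


MODEL_DESCRIPTION = to_model_description(REGRESSIONS)
```

Keeping the specification in a dictionary makes it easy to check that each path named in the text appears exactly once before the description is passed to a fitting routine.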
Most specified paths were statistically significant (Figure 2B, Table S1). Higher TNM stage was associated with poorer histological grade (estimate −0.1428, P<0.001), and poorer grade in turn was associated with higher probability of IPM relative to SPLC (lung cancer type on grade: estimate −0.0452, P<0.001, with grade coded from undifferentiated to well-differentiated). Lung cancer type was positively associated with T stage (estimate 1.6194, P<0.001) and negatively associated with M stage (estimate −0.5600, P=0.02), consistent with more advanced local disease among patients classified as IPM. Laterality showed a small but significant association with lung cancer type (estimate 0.0202, P<0.001), reflecting the strong concentration of bilateral disease within the IPM group. As expected, race was strongly related to the rural-urban continuum and median household income, and sex was associated with marital status (all P<0.001).
Global fit indices indicated only moderate concordance between the model-implied and observed covariance structures (Figure 2C). The CFI, NFI and GFI were all approximately 0.91; the TLI and AGFI were approximately 0.88; the RMSEA was 0.066; and the SRMR was 0.071. These values support treating the SEM as a descriptive association model that captures the main dependency patterns highlighted by the Bayesian network.
Simulated intervention
Using the fitted Bayesian network, we performed model-based simulated interventions to examine how the predicted probability of IPM versus SPLC changed when fixing grade or laterality at different levels while keeping the empirical distribution of other variables unchanged. For histological grade, compared with well-differentiated tumors, the model predicted a higher probability of IPM for moderately differentiated, poorly differentiated, and undifferentiated tumors, with correspondingly lower probabilities for SPLC (Figure 3A). For example, moving from well-differentiated to moderately differentiated grade increased the model-based risk of IPM by about 24% (RR 1.235, 95% CI: 1.160–1.313), whereas moving from well-differentiated to poorly differentiated grade increased the risk by about 66% (RR 1.664, 95% CI: 1.571–1.772). In contrast, RRs comparing undifferentiated with poorly differentiated tumors were not statistically significant, suggesting a plateau in the association at the highest levels of poor differentiation.
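The relative risks above translate into percentage changes as (RR − 1) × 100, and a contrast is conventionally read as statistically significant when its 95% CI excludes 1. A small Python sketch of that arithmetic, using only the values reported in the text (the helper names are hypothetical):

```python
def rr_to_percent_change(rr):
    """Convert a relative risk into a percentage change in risk."""
    return (rr - 1.0) * 100.0


def ci_excludes_one(ci_low, ci_high):
    """True if the interval lies entirely above or entirely below 1."""
    return ci_low > 1.0 or ci_high < 1.0


# (RR, lower 95% limit, upper 95% limit) as reported in the text.
contrasts = {
    "moderate vs well": (1.235, 1.160, 1.313),
    "poor vs well": (1.664, 1.571, 1.772),
}

results = {
    name: (rr_to_percent_change(rr), ci_excludes_one(low, high))
    for name, (rr, low, high) in contrasts.items()
}

for name, (pct, significant) in results.items():
    print(f"{name}: {pct:+.1f}% change in risk, CI excludes 1: {significant}")
```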
For laterality, bilateral lesions were associated with markedly higher model-based risk of IPM (Figure 3B). Fixing laterality to bilateral versus left produced an estimated RR for IPM of around five (RR 5.215, 95% CI: 4.987–5.411), and bilateral versus right produced an RR of around four and a half (RR 4.591, 95% CI: 4.419–4.736). Compared with left-sided disease, right-sided tumors also showed a modestly higher predicted probability of IPM (RR 1.136, 95% CI: 1.093–1.178) and a lower probability of SPLC (RR 0.968, 95% CI: 0.960–0.978). These simulated intervention contrasts summarise how the fitted joint distribution links grade and laterality to the likelihood of IPM.
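The idea of fixing one variable while keeping the empirical distribution of the others can be sketched as a weighted average of conditional probabilities. The Python example below is a toy illustration with a single "other" variable and entirely hypothetical probabilities. It is not the authors' implementation, and a full Bayesian network would also account for the graph structure around the intervened node.

```python
# Hypothetical marginal distribution of laterality (illustrative only).
p_laterality = {"left": 0.40, "right": 0.55, "bilateral": 0.05}

# Hypothetical conditional probabilities P(IPM | grade, laterality).
p_ipm = {
    ("well", "left"): 0.10,
    ("well", "right"): 0.12,
    ("well", "bilateral"): 0.60,
    ("poor", "left"): 0.18,
    ("poor", "right"): 0.20,
    ("poor", "bilateral"): 0.80,
}


def p_ipm_fixing_grade(grade):
    """Average P(IPM | grade, laterality) over the unchanged
    laterality distribution, with grade held at the given level."""
    return sum(p_ipm[(grade, lat)] * weight
               for lat, weight in p_laterality.items())


risk_well = p_ipm_fixing_grade("well")
risk_poor = p_ipm_fixing_grade("poor")
rr = risk_poor / risk_well
print(f"P(IPM | grade fixed to well) = {risk_well:.3f}")
print(f"P(IPM | grade fixed to poor) = {risk_poor:.3f}")
print(f"Model-based RR (poor vs well) = {rr:.3f}")
```

Because the laterality weights are identical in both scenarios, the resulting RR reflects only the change in grade within the fitted model, which is why such contrasts describe the model and not the effect of a real clinical intervention.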
Subgroup analysis of IPM patterns
In an exploratory subgroup analysis restricted to patients with IPM, we further examined factors associated with different IPM patterns. Higher histological grade, reflecting poorer differentiation, remained significantly associated with more extensive intrapulmonary spread (IPM in a different lobe versus IPM in the same lobe: OR per grade level 1.018, 95% CI: 1.005–1.030; IPM in the same and different lobes versus IPM in the same lobe: OR per grade level 1.777, 95% CI: 1.423–2.219) (Figure 4A,4B). Lesions in the lower lobe (IPM in a different lobe versus IPM in the same lobe: OR versus upper lobe 2.025, 95% CI: 1.488–2.755; IPM in the same and different lobes versus IPM in the same lobe: OR 1.832, 95% CI: 1.322–2.538) and in the middle lobe (IPM in a different lobe versus the same lobe: OR 6.671, 95% CI: 2.668–16.677; IPM in the same and different lobes versus the same lobe: OR 3.478, 95% CI: 1.343–9.005) also showed higher odds of involving multiple lobes compared with upper lobe disease. In contrast, most sociodemographic variables, including race, sex, marital status, income, and rural-urban continuum, were not consistently associated with intrapulmonary metastatic pattern after adjustment for tumor characteristics. Overall, these subgroup findings suggest that, within this cohort, variation in IPM pattern is more strongly associated with tumor grade and anatomical distribution than with sociodemographic characteristics.
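The precision of these odds ratios differs considerably, which is easier to see on the log scale. Assuming the intervals are Wald-type 95% CIs computed on the log-odds scale (an assumption, since the article does not state the interval method), the standard error of the log OR can be recovered from the CI limits, and the geometric mean of the limits should reproduce the point estimate. A minimal Python sketch using two of the reported results:

```python
import math


def log_scale_summary(ci_low, ci_high, z=1.96):
    """Recover the standard error of log(OR) and the implied point
    estimate from a CI assumed symmetric on the log scale."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    center = math.exp((math.log(ci_low) + math.log(ci_high)) / 2)
    return se, center


# Middle lobe vs upper lobe, different-lobe IPM vs same-lobe IPM.
se_middle, center_middle = log_scale_summary(2.668, 16.677)

# Per grade level, different-lobe IPM vs same-lobe IPM.
se_grade, center_grade = log_scale_summary(1.005, 1.030)

print(f"Middle lobe: SE(log OR) = {se_middle:.3f}, implied OR = {center_middle:.3f}")
print(f"Grade:       SE(log OR) = {se_grade:.4f}, implied OR = {center_grade:.3f}")
```

Under this assumption the implied point estimates agree closely with the reported ORs of 6.671 and 1.018, while the standard error for the middle-lobe contrast is far larger than that for grade, so the large middle-lobe OR is estimated with much less precision.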
Baseline analysis
A total of 45,194 lung cancer patients were included in this study, consisting of 9,302 patients with IPM and 35,892 patients with SPLC. The clinicopathological characteristics of the patients are summarized in Table 1. The mean ages of the two groups were 67.24±11.02 and 66.93±10.79 years, respectively (P=0.01). Significant differences between the lung cancer types were found for sex, race, pathology, location, grade, laterality, TNM stage, T stage, N stage, M stage, marital status, and median household income. No significant difference was observed in position on the rural-urban continuum.
Discussion
In this study, we applied Bayesian network modelling and SEM to describe how clinicopathological variables relate to the probability of IPM compared with SPLC. The analyses consistently indicated that tumor grade, laterality, and anatomical distribution showed the strongest associations with intrapulmonary metastatic spread, whereas sociodemographic characteristics had a much weaker and less consistent role after adjustment for tumor features.
Higher histological grade was associated with a higher probability of IPM rather than SPLC in both the Bayesian network and the SEM. The simulated intervention analysis further illustrated that, within the fitted joint distribution, moving from well-differentiated to moderately or poorly differentiated grade was accompanied by substantial increases in the model-based risk of IPM, while SPLC became less likely. Taken together, these results are in line with the widely accepted view that less differentiated tumors tend to present with more extensive disease (23), although our findings should be interpreted as statistical associations conditioned on the available variables rather than as evidence of temporal progression.
Laterality and tumor location also showed important associations with intrapulmonary metastatic patterns. In the Bayesian network and simulated intervention analyses, right-sided tumors were linked to a higher predicted probability of IPM. In the subgroup analysis restricted to patients with IPM, lesions in the lower and middle lobes had higher odds of involving different lobes or both the same and different lobes compared with upper lobe lesions. Previous studies have either not examined laterality in detail or have reported no significant association between laterality and IPM (24) or prognosis (25-27). Interestingly, one retrospective-prospective study found that among patients with more than five GGNs, most nodules occurred unilaterally, mainly in the right lung and upper lobe (28). This divergence could be due to the large, diverse sample used in this study, which may have captured subtle associations that previous research could not detect. Differences in study methodologies, such as sample sizes and statistical modeling techniques, may also account for these discrepancies.
In contrast, sociodemographic variables such as sex, race, marital status, income, and rural-urban continuum were not consistently associated with intrapulmonary metastatic patterns once tumor characteristics were taken into account. The Bayesian network did connect these variables to one another in clinically plausible ways (for example, race with the rural-urban continuum and median household income, and sex with marital status), but these chains did not extend strongly to lung cancer type in the final models. This pattern suggests that, in this SEER cohort, variation in IPM versus SPLC is more tightly linked to tumor biology and anatomical distribution than to measured social factors. However, residual confounding by unmeasured exposures and access to care cannot be excluded.
Several limitations should be acknowledged. First, the study relied on data from the SEER database, which may have introduced bias due to missing or incomplete data, particularly for variables such as smoking history, genetic mutations, and treatment modalities. Second, because SEER data are cross-sectional and lack temporal information, we interpret both models as descriptive summaries of association patterns rather than as validated causal structures. The simulated interventions should likewise be understood as contrasts within the fitted probabilistic model and not as predictions of the effects of real-world clinical interventions. Future work that incorporates imaging-based measures, molecular profiles, and longitudinal follow-up in prospective cohorts will be needed to clarify temporal sequences and to evaluate how these association patterns can best be integrated into diagnostic and treatment decision-making.
Conclusions
In conclusion, poorer histological grade and right-sided tumors were consistently associated with a higher probability of IPM, while measured sociodemographic variables showed weaker and less consistent associations after adjustment for tumor features. These findings, interpreted as observational associations, suggest that detailed assessment of tumor differentiation and anatomical extent may assist in characterising complex intrapulmonary disease patterns.
Supplementary
The article’s supplementary files as
Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please cite the original article when quoting.