Translating tumor epigenetic subtyping into methylome network-based prognostic models in early-stage NSCLC: results from the prospective MOBIT study.

Journal of translational medicine, 2026, Vol. 24(1)

Chwialkowska K, Niemira M, Zeller A, Ostrowska A, Michalska-Falkowska A, Reszec-Gielazyn J

Cite this article

APA: Chwialkowska K, Niemira M, et al. (2026). Translating tumor epigenetic subtyping into methylome network-based prognostic models in early-stage NSCLC: results from the prospective MOBIT study. Journal of translational medicine, 24(1). https://doi.org/10.1186/s12967-026-07957-x
MLA: Chwialkowska K, et al. "Translating tumor epigenetic subtyping into methylome network-based prognostic models in early-stage NSCLC: results from the prospective MOBIT study." Journal of translational medicine, vol. 24, no. 1, 2026.
PMID: 41821028

Abstract

[BACKGROUND] DNA methylation (DNAm) is a key epigenetic feature contributing to tumorigenesis with coordinated methylation changes across the genome potentially reflecting tumor biology and disease aggressiveness. There is a clinical need for identification of patients with high risk of short time to recurrence (TTR) and overall survival (OS) post-surgery in an early-stage Non-Small Cell Lung Cancer (NSCLC). The aim of this study was to translate tumor epigenetic subtyping and methylome network approach into DNAm-based prognostic models for post-surgical risk stratification in early-stage NSCLC.

[METHODS] We have investigated 252 genome-wide methylomes derived from Next Generation Sequencing (NGS) in the Polish MOBIT prospective study cohort of 126 NSCLC patients. Epigenetic subtyping was performed in adenocarcinoma (AC) and squamous cell carcinoma (SCC) tumors methylome profiles using a hierarchical clustering and Monte Carlo simulations. We have developed an approach of elastic net-based machine learning survival modelling informed by weighted correlation network analysis (WGCNA) of cancer methylomes. Obtained models were cross-validated and subjected to further survival and biomarker selection analyses. Epitypes were characterized by tumor immune microenvironment (TIME) using RNA sequencing (NGS) based deconvolution.

[RESULTS] Five epitypes of AC and SCC with different TIMEs were detected. In SCC, epitype 1a was associated with significantly worse OS compared to epitype 3b (p = 0.0018). We developed a model for 5-year post-surgery OS in SCC (AUC = 0.762; p = 0.007) that included sex, TNM staging, and a single epitype 1a-related, network-based DNAm biomarker independently associated with survival. In AC, using the network strategy, we identified a DNAm biomarker significantly associated with shorter TTR within 5 years post-surgery (p = 0.0011). A joint AC recurrence model combining a single DNAm locus with SUVmax reached an AUC of 0.8857 (p = 0.0035), compared to AUC = 0.71 for SUVmax only (p = 0.0111).

[CONCLUSIONS] Epigenetic subtyping and methylome network analysis can be translated into prognostic models in NSCLC. The developed risk stratification models incorporate single-locus DNAm biomarkers and show promising performance for 5-year mortality prediction in SCC and 5-year recurrence prediction in AC. DNAm signatures carry prognostic value beyond established clinical variables, suggesting potential utility as a decision support tool for post-surgical risk stratification and improvement of individualized therapy in early-stage NSCLC.

[SUPPLEMENTARY INFORMATION] The online version contains supplementary material available at 10.1186/s12967-026-07957-x.

Background

The leading cause of cancer-related death worldwide is lung cancer. Based on the population data collected by central cancer registries, it is estimated that in the USA, in 2023, 350 people died each day from lung cancer (LC) [1]. Non-small cell lung cancer (NSCLC) is the most common subtype of lung cancer, and it can be further subdivided into adenocarcinoma (AC) and squamous cell carcinoma (SCC) for the majority of cases.
The 5-year overall survival rate for lung cancer patients is comparatively lower than other common cancers such as colorectal, breast, and prostate, even considering early-stage disease. Surgery is a foundation for managing early-stage LC as the most potentially curative treatment [2, 3]. However, post-operative recurrence is common, with rates ranging from 30% to 77% [4]. These data point towards a great need for the utilization of accurate prognostic markers after surgery in the clinical management of NSCLC patients that would facilitate the choice of optimal post-operative treatment and diagnostic procedures.
DNA methylation (DNAm) aberration has been associated with the oncogenic process, and extensive changes in methylation patterns have been reported in many tumor types [5–8]. Tumors can be distinguished based on the epigenetic profile (epitype – epigenetic subtype). Epitype stratification can be used as a predictor of the time to first treatment, time to progression after treatment, and overall survival [9]. It has recently been increasingly discussed that the DNAm profile may become a key component in prognostic biomarker models and other molecular and clinical features. There is, however, still limited information on the specific associations of DNAm prognostic biomarkers in AC and SCC [10].
Thus, the primary objective of the presented study was to translate tumor epigenetic subtyping and methylome network analysis into prognostic models for early-stage NSCLC, incorporating DNAm biomarkers for post-surgical risk estimation. This required stratification of adenocarcinoma (AC) and squamous cell carcinoma (SCC) tumors based on methylome profiles (epitypes), followed by network-based prioritization of single-locus prognostic biomarkers. The prespecified hypotheses tested whether DNA methylation levels at specific loci in the NSCLC tumors at the time of surgery are associated with five-year mortality or recurrence.

Materials and methods

Study design and participants
The presented study involved clinical data and genome-wide analysis of DNAm from 252 samples (tumor and non-cancerous adjacent lung tissue) from 126 NSCLC patients from the “Molecular Biomarkers for Individualized Therapy (MOBIT)” project [11] that underwent lung tumor resection surgery in two separate clinical hospitals in Poland from 2015 to 2018. None of the patients were treated with radio- or chemotherapy prior to surgery (exclusion criteria). Pathological samples were reviewed independently by two pathomorphologists (medical doctors) to confirm the diagnosis of AC, SCC (inclusion criteria), or non-malignant lung tissue and evaluated for the completely resected tumor (free resection margins). Biospecimens were subjected to frozen section histology by a pathologist for precise tissue sectioning. Samples from the tumor center with ≥50% tumor cell content, according to the threshold established by pathology team based on the clinical practice and biobank standard operating procedures, and always with paired non-cancerous adjacent tissue samples from the same patient were the inclusion criteria for the sample’s further molecular evaluation. Samples were flash frozen before nucleic acids isolation. The clinical dataset covered information about patient demographics, tumor Maximum Standardized Uptake Value (SUVmax) evaluated before surgery, TNM (Tumor Node Metastasis) staging data (8th edition) [12], lymph node spread, and metastasis. Obesity was defined as a BMI ≥ 30 kg/m2. Pack-years were calculated as one pack year being equal to smoking one pack per day for one year per one patient. Follow-up data involved analysis of recurrence, metastasis, and death. For end-point analyses, the follow-up period was censored at five years from surgery, whereas in the survival analysis, the whole follow-up period was evaluated.
Patients signed written informed consent forms after receiving detailed information on the study and associated risks. The study was conducted in accordance with the Declaration of Helsinki and was approved by the local ethics committee of the Medical University of Bialystok, Poland (approval number: R-I-002/36/2014). AC and SCC were analyzed separately as these histotypes arise from distinct cellular origins within the respiratory epithelium (alveolar type II cells for AC versus bronchial basal cells for SCC) [13, 14], exhibit significantly different prognostic outcomes at equivalent disease stages [15], and display fundamentally different molecular landscapes including distinct mutational as well as DNA methylation profiles [16–19]. This design is consistent with major genomic consortia approach including The Cancer Genome Atlas [18, 19]. Thus, only samples classified as AC or SCC were used. In total, from the 130 patients, the dataset in the presented study included 126 patients: 54 AC and 72 SCC. The molecular profiling, including DNAm, was performed at the baseline, and the patients were further followed up for up to several years. The primary endpoint was five years mortality (5yM, death within five years from surgery), and the secondary endpoint was occurrence of recurrence within five years from surgery (5 yR). The study is adherent to the Reporting Recommendations for Tumor Marker Prognostic Studies (REMARK) guidelines [20].

DNA methylation analysis using next generation sequencing (NGS)
DNA was isolated from each patient’s bio-banked tumor and non-cancerous adjacent lung tissue. Reduced Representation Bisulfite Sequencing (RRBS) was selected as the methylation profiling method because it offers a cost-effective approach for genome-wide methylation analysis at single-base resolution compared to whole-genome bisulfite sequencing (WGBS), while maintaining significantly broader genomic coverage than array-based platforms. RRBS enables detailed methylome profiling covering millions of CpGs in the genome, compared to the typical hundreds of thousands analyzed using methylation microarrays, without being constrained to pre-defined probe sets. RRBS libraries were sequenced on the Illumina HiSeq 4000 platform. Raw sequencing methylome data were quality checked and processed using BBduk2 (U.S. Department of Energy Joint Genome Institute, Walnut Creek, USA) and mapped to the human reference genome GRCh37 (Ensembl 75, soft masked) using Bismark in paired-end mode, followed by methylation calling for each single CpG (a chromosomal position where cytosine is followed by a guanine residue) in the genome [14]. CpG filtering removed cytosines that were located on sex chromosomes, overlapped known SNPs, were missing in more than 20% of patients, had coverage below 10× per site, or had coverage above 10 times the 95th percentile of coverage.
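The coverage and missingness filters described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the function name `filter_cpgs`, the dictionary layout, and the choice to treat calls outside the coverage window as missing before applying the 20% rule are all assumptions made for the example. The sex-chromosome and SNP filters are omitted.

```python
from statistics import quantiles


def filter_cpgs(coverage, min_cov=10, max_missing=0.20, cap_factor=10):
    """Return the CpG ids that pass the coverage and missingness filters.

    coverage: dict mapping a CpG id to a list of per-sample read depths,
              with None where the site was not covered (hypothetical layout).
    """
    observed = [d for depths in coverage.values() for d in depths if d is not None]
    # Upper coverage cap: cap_factor times the 95th percentile of observed depth.
    cap = cap_factor * quantiles(observed, n=100)[94]

    kept = []
    for cpg, depths in coverage.items():
        # Assumption: a call below min_cov or above the cap counts as missing.
        usable = [d for d in depths if d is not None and min_cov <= d <= cap]
        missing_fraction = 1 - len(usable) / len(depths)
        if missing_fraction <= max_missing:
            kept.append(cpg)
    return kept


example = {
    "cpgA": [20, 25, 30, 22, 18],      # well covered in every sample
    "cpgB": [20, None, None, 25, 30],  # missing in 40% of samples
    "cpgC": [5, 6, 30, 22, 18],        # under-covered in 40% of samples
}
kept_sites = filter_cpgs(example)
```

In this toy input only `cpgA` survives, because the other two sites lack usable calls in more than 20% of samples.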

Statistical analysis
All statistical tests were conducted under α = 0.01 or 0.05 (as specified per each of the test performed), two-tailed and multiple comparison correction was applied with the Bonferroni or Benjamini-Hochberg FDR (false discovery rate) method.

Power analysis
Power analyses were performed in STATA based on reference values from International Association for the Study of Lung Cancer (IASLC) data sets [21] regarding information about 5-year overall survival and hazard ratios of different TNM stages (Supplementary Figure S1). If the smallest subgroup contains at least 10 patients a power of 80% is reached for: a proportion test in the comparison of death within 5 years and a log-rank test of survivor functions in the comparison of 5-year reference values for TNM stage IA1 vs IIIB, IIIC, IV and a Cox PH regression with hazard ratio > 3.0.

Methylome data analysis
To reduce the number of statistical tests and to identify broader genomic regions in which DNAm differs between non-cancerous lung tissue and NSCLC tumors, a paired statistical analysis utilizing a circular regularization algorithm for dynamic selection of loci and their borders with the Metilene tool was used [22]. Genome segmentation into regions of variable length enriched for CpGs with similar methylation profiles was performed under default parameters, with the additional requirement of at least five CpGs per region. Further statistical analyses on the methylome data were performed in the R statistical environment (R 4.3.2), and clinical statistical analyses were performed in STATA (18.0). Each statistical test was conducted under α = 0.05, two-tailed, and multiple comparison correction was applied with the Bonferroni or Benjamini-Hochberg FDR (false discovery rate) method. For accurate tumor clinical stratification, loci were selected using three criteria: a minimum of five patients in whom tumor and non-tumor tissue differed by more than 0.1 β-value (10% difference); high inter-tumor variance of ≥0.1 SD (standard deviation of β-value, corresponding to substantial methylation differences at 10% ranges); and a stable methylation profile among non-tumor tissues, defined as low inter-individual variance of ≤0.05 SD of β-value (indicating minimal inter-individual baseline variation relative to the methylation percentage range, and lower than tumor variability). These thresholds are based on standard values in methylome studies [23–26]. The regulatory context of analyzed genomic loci was evaluated using annotations in ENCODE SCREEN [27], ENSEMBL [28] and GTEx databases [29]. Functional enrichments in cancer-related pathways were conducted with Enrichr [30].
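The three locus-selection thresholds can be expressed as a single predicate. The sketch below assumes paired β-values per patient for one locus and uses the sample standard deviation. The function name `is_informative_locus` and the use of `statistics.stdev` are illustrative choices, not details taken from the study.

```python
from statistics import stdev


def is_informative_locus(tumor, normal, min_patients=5, min_delta=0.1,
                         min_tumor_sd=0.1, max_normal_sd=0.05):
    """Apply the three selection thresholds to one locus.

    tumor, normal: paired beta-values (0..1), one pair per patient.
    """
    # Number of patients whose tumor differs from paired normal by > min_delta.
    n_diff = sum(abs(t - n) > min_delta for t, n in zip(tumor, normal))
    return (n_diff >= min_patients
            and stdev(tumor) >= min_tumor_sd     # variable across tumors
            and stdev(normal) <= max_normal_sd)  # stable across normal tissue


variable_tumor = [0.90, 0.80, 0.50, 0.70, 0.85, 0.30]
stable_tumor = [0.31, 0.30, 0.29, 0.30, 0.31, 0.30]
normal_tissue = [0.30, 0.32, 0.28, 0.31, 0.29, 0.30]
```

Here `variable_tumor` passes all three thresholds against `normal_tissue`, while `stable_tumor` fails because no patient shows a tumor-versus-normal difference above 0.1.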

Clustering-informed network analyses
Hierarchical clustering was performed using the Euclidean distance metric and the Ward D2 agglomeration method to identify distinct DNAm epitypes in each of the AC and SCC sets. Statistical significance for branch cutting was evaluated at each node along the dendrogram, starting from the root, using a Gaussian null hypothesis test and Monte Carlo simulation-based significance testing (5000 simulations per node, minimum five observations, α = 0.05) with a family-wise error rate (FWER) controlling procedure. Epitypes were labeled sequentially from left to right on each dendrogram, with letter subdivisions (a, b) denoting statistically significant splits identified through Monte Carlo simulation testing at individual nodes. DNAm profiles were subjected to Weighted Gene Co-expression Network Analysis (WGCNA) [31] to identify co-regulated regions within modules. This allows for data dimensionality reduction and minimizes the number of statistical tests performed in the subsequent analyses. Signed WGCNA networks with a soft thresholding power of 12 were constructed, following the WGCNA authors’ recommendation for signed networks and sample sizes larger than 40. Network robustness and module preservation were assessed by two approaches: a random 50% split of the datasets into test and reference sets for evaluation of preservation metrics, and assessment of quality and robustness on the whole datasets. Module preservation was assessed using Zsummary statistics (split analysis) and robustness using Zsummary.qual (whole set), with metrics defined by the WGCNA authors (Z ≥ 2) [32]. Hub loci were selected based on connectivity and gene/locus significance metrics. Module eigenvalues were subjected to module-wise correlation analyses. Moreover, the automaticNetworkScreening command from the WGCNA package was used to identify loci correlated with specific clinical variables. 
Associations significant under α = 0.05 after multiple testing correction were further selected for modelling.
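To make the soft-thresholding step concrete, the sketch below computes a signed co-methylation adjacency between two loci with power 12. It assumes the usual signed-network definition, a = ((1 + r) / 2)^power with Pearson correlation r. The study used the WGCNA R package, so this Python version and the helper names `pearson` and `signed_adjacency` are illustrative only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def signed_adjacency(r, power=12):
    """Signed-network adjacency: maps r in [-1, 1] to [0, 1]."""
    return (0.5 + 0.5 * r) ** power


# Two hypothetical loci measured in the same five tumors (beta-values).
locus_a = [0.10, 0.35, 0.50, 0.70, 0.90]
locus_b = [0.15, 0.30, 0.55, 0.65, 0.95]
adjacency = signed_adjacency(pearson(locus_a, locus_b))
```

With a power of 12, uncorrelated loci (r = 0) get an adjacency of 0.5^12, roughly 0.0002, and anticorrelated loci approach 0, so only strongly positively correlated loci remain connected.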

Tumor microenvironment deconvolution
Infiltrating cell-type fractions in the tumor microenvironment (TIME - tumor immune microenvironment) were estimated using xCell [33] and EPIC (Estimating the Proportions of Immune and Cancer cells) [34] models based on matched RNA-Seq data from the same patient samples published before [35]. Relationships among features were evaluated through Spearman’s correlation coefficient or point biserial correlation when one of them was dichotomous (α = 0.05 after FDR correction – where corrected P-values are denoted as q-values).
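The q-values mentioned above come from the Benjamini-Hochberg procedure, which can be sketched as follows. This is a generic textbook implementation under the name `bh_qvalues`, which is a hypothetical helper; it is not the routine used by the authors.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


example_q = bh_qvalues([0.01, 0.04, 0.03, 0.005])
```

An association is then called significant when its q-value falls below the chosen α (0.05 in this section).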

Statistical modelling of survival outcomes
Clinical values were compared using appropriate statistical tests: Student’s t-test after homoscedasticity evaluation, Wilcoxon rank-sum test, and χ2 or Fisher’s exact test where appropriate. Univariable and multivariable logistic regression was applied after checking for multicollinearity among variables, linearity between the predictors and the logit (Box-Tidwell test), and independence of observations. Five-year mortality and 5-year recurrence models were developed with logistic regression with a binary outcome of death or recurrence, respectively. The covariates suggestively associated (p < 0.25) on univariate analysis were considered for inclusion in a multivariate logistic regression model developed using manual forward stepwise selection and compared to an elastic net for prediction and model selection. Given the moderate sample sizes and sparse events in some subgroups, sensitivity analyses were performed using Firth’s penalized logistic regression to assess model stability and potential separation issues. Firth’s correction applies a bias-reduction method that produces more reliable estimates in small samples where confidence intervals are wide. Both standard maximum likelihood and Firth-corrected models were compared to evaluate the robustness of findings. Receiver operating characteristic (ROC) analysis was done to assess the accuracy of model classification predictions considering true and false positive rates. Given the relatively small datasets, model validation was performed internally with leave-one-out cross-validation (LOOCV). However, the LOOCV approach has some limitations, as it may overestimate model performance with high variance in small samples, potentially overfitting to individual observations. To better assess model stability and quantify uncertainty, we additionally computed 95% confidence intervals for AUC using bootstrap resampling with 1000 iterations. 
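The bootstrap confidence interval for the AUC can be illustrated with a short, dependency-free sketch. It assumes a simple percentile bootstrap over patients with 1000 resamples. The exact bootstrap variant used in the study is not stated, so the function names and the percentile method are assumptions.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties = 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        boot_labels = [labels[i] for i in idx]
        # Skip resamples that contain only one class; AUC is undefined there.
        if 0 < sum(boot_labels) < n:
            stats.append(auc(boot_labels, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((1 - level) / 2 * n_boot)]
    upper = stats[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper


toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(toy_labels, toy_scores)
```

With few events, many resamples yield an AUC of exactly 1.0, which is one reason intervals in small cohorts can reach the upper bound.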
Survival data were analyzed using Kaplan-Meier survival curves, the log-rank test, and univariable and multivariable Cox proportional-hazards (PH) regression models with the Breslow method for handling tied failures. Proportional-hazards assumptions were evaluated based on Schoenfeld residuals. Analyses validating results or evaluating conclusions against external cohort methylation data were performed using The Cancer Genome Atlas (TCGA) [18, 36].
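For readers unfamiliar with the Kaplan-Meier estimator, the product-limit calculation can be sketched as below. The function name `kaplan_meier` and the toy follow-up data are hypothetical, and the study itself used STATA for these analyses.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time per patient (e.g. months from surgery)
    events: 1 if the event (death or recurrence) was observed, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        ties = sum(1 for tt, _ in data if tt == t)
        deaths = sum(1 for tt, e in data if tt == t and e == 1)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= ties  # events and censored cases both leave the risk set
        i += ties
    return curve


km_curve = kaplan_meier([5, 8, 12, 12, 20], [1, 0, 1, 1, 0])
```

In the toy data the estimate drops to 0.8 at month 5 and then to 0.8 × 1/3 at month 12, because the patient censored at month 8 leaves the risk set without counting as an event.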

Results

Study sample
The characteristics of the participants at baseline (surgery) and during follow-up, stratified by histopathological subtype and overall survival, are presented in Table 1, and stratified by recurrence during follow-up in Table 2. In total, 126 NSCLC patients were included in the study: 72 were diagnosed with squamous cell carcinoma (SCC) and 54 with adenocarcinoma (AC). The average follow-up period in both histological subtypes was 42.9 months. In the SCC group, resected tumors were classified as stages I-III, of which 47.2% were stage II, and none had advanced NSCLC of stage IV. The mean SUVmax was 10.35. Almost all patients were smokers. During the course of the study, 21 SCC patients (29.2%) had recurrence, and 29 (40.3%) died within five years from the surgery. In the AC group, resected tumors were classified as stages I-III, of which 57.4% were stage I, and none had advanced NSCLC of stage IV. The mean SUVmax was 6.15. The majority (87%) of patients were smokers. During the course of the study, 11 AC patients (20.4%) had recurrence, and 23 (42.6%) died within five years from the surgery. In both AC and SCC, the groups stratified by death or recurrence within five years did not differ in the majority of clinicopathological parameters, except for the sex ratio and TNM stage in the SCC patients who died, and the age of AC patients who exhibited recurrence. The main differences involved the percentages of patients who exhibited metastasis, recurrence or died during follow-up.

DNA-methylation based epigenetic tumor subtyping
RRBS-based methylome profiling allowed for an analysis of 7.95 million raw CpG positions (±3.13 million SD) in each of the 252 samples, with an average coverage of 20.85× (SD 6.85). After applying strict quality filters, the final dataset included 1,608,143 reliably covered CpGs, defined as sites with coverage above 10× and present in more than 80% of samples. After circular regularization of broader regional CpG clustering into genomic loci, followed by low inter-individual and high inter-tumor variance filtering, the final methylome dataset covered 21,546 regions for SCC and 19,200 for AC. Using hierarchical clustering for both AC and SCC, we detected three main branches that could be further divided into five distinct subclusters supported by statistical analysis with Monte Carlo simulations (Fig. 1). In the SCC methylomes, the epitypes are 1a, 1b, 2, 3a, and 3b (Fig. 1 – A), whereas in the AC methylomes, the epitypes are 1, 2a, 2b, 3a, and 3b (Fig. 1 – B). Statistical analysis of the distribution of the five epigenetic subtypes stratified by death or recurrence within five years from surgery using Fisher’s exact test revealed significant differences in the epitypes of SCC patients who died during follow-up in comparison to those who were still alive at the end of the study (Table 3; p=0.038). Specifically, there were more SCC patients with epitype 1a among those who died (p=0.014) and more with epitype 3b among those who were alive (p=0.042). The epitype distribution analysis stratified by recurrence revealed no differences in SCC or AC patients. Moreover, stratification by death did not show any differences in epitype enrichment in AC, in contrast to SCC (Tables 3 and 4).

Epigenetic subtyping prognostic evaluation in SCC
Analysis of the relationship of the clinicopathological and epitype parameters with the outcome of death within five years from surgery (5yM) using univariable logistic regression (Table 5) revealed significance of female sex with OR = 0.36 (95% CI: 0.13–0.99867; p=0.05); TNM staging, where TNM stage III had OR = 5.0 (95% CI: 1.19–20.9; p=0.028); and epigenetic subtyping, where epitype 1a had OR = 5.13 (95% CI: 1.42–18.51; p=0.012). Sensitivity analyses using Firth’s penalized logistic regression confirmed the stability of these findings despite wide confidence intervals in some estimates (Supplementary Tables S1–S2). The simplified model retaining only epitype 1a (without epitype 3b) showed consistent results across both standard (OR = 4.98, 95% CI: 1.14–21.57, p = 0.032) and Firth-corrected estimation (OR = 4.19, 95% CI: 1.07–16.42, p = 0.039), supporting epitype 1a as a robust predictor of 5-year mortality. For multivariable logistic regression modeling, an elastic net regression with cross-validation was performed to select variables predictive of 5-year death (5yM) for a final model, considering sex, TNM staging with all three stages, and epigenetic subtyping with all five epitypes. In the elastic net analysis, cross-validation selected α = 0.500 and λ=0.0858 with the variables female sex, TNM staging with all three stages, epitype 1a, and epitype 3b. The selected variables were entered into the multivariable logistic regression model of the logit of death within five years from surgery. Comparison of different models showed that adding the information about epigenetic subtypes 1a and 3b improves the predictive model (AUROC = 0.7683, bootstrap 95% CI: 0.6572–0.8794; standard error (SE) = 0.0575; Table 5, Figure 2). Then, to estimate model performance and evaluate possible overfitting, the models were internally validated using Leave One Out Cross-Validation (LOOCV). 
Considering MAE and pseudo-R2, the best performing model was again the full model (MAE = 0.413; pseudo-R2 = 0.113; Table 6).
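For reference, the roles of the two tuning parameters can be written out explicitly. The text does not state which parameterization the software used, so the form below is an assumption: it is the common convention in which α mixes the lasso and ridge penalties and λ scales the total penalty, applied to a logistic log-likelihood ℓ over N patients and p candidate variables.

```latex
\hat{\beta} \;=\; \arg\min_{\beta_0,\,\beta}\;
  -\frac{1}{N}\sum_{i=1}^{N} \ell\!\left(y_i,\; \beta_0 + x_i^{\top}\beta\right)
  \;+\; \lambda \sum_{j=1}^{p}\left[\alpha\,\lvert\beta_j\rvert
  \;+\; \frac{1-\alpha}{2}\,\beta_j^{2}\right]
```

Under this convention, the reported α = 0.500 and λ = 0.0858 correspond to a per-coefficient penalty of 0.0429 |β_j| + 0.02145 β_j², an equal blend of variable selection (lasso) and shrinkage (ridge).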
As a separate analysis, it was evaluated whether we can build a prognostic model for recurrence within five years (5 yR) from surgery in SCC patients. However, no variables exhibited statistical significance in univariable logistic regression (Table 7). Moreover, elastic net regression with cross-validation also did not select any variables for the predictive model.

Development of epigenetic markers for 5-year mortality prognosis in SCC
In the practical clinical setting, performing genome-wide RRBS-based epigenetic subtyping of resected tumors would be inefficient and too laborious. Thus, we aimed to identify single-locus epigenetic markers that could be implemented in simple diagnostic tests, such as MS-qPCR (Methylation Sensitive quantitative Polymerase Chain Reaction). Markers were selected using two methods based on the WGCNA network analysis (Supplementary File S1). The SCC co-methylation network showed high module preservation in the split assessment (0.89 of Z ≥ 2) and robustness in terms of module quality on the whole dataset (0.94 of Z ≥ 2). The first marker was chosen as a high-connectivity hub locus in the WGCNA blue module, which was significantly correlated with epitype 1a (ρ=0.6856, P-value after Bonferroni correction < 0.0001); this locus, chr12:79438980-79439273 (GRCh37), is located in intron 2 of the SYT1 gene (Fig. 3 - B). Then, a WGCNA internal function for network screening was applied to select loci directly associated with epitype 1a, which yielded chr1:184970798-184971004 (GRCh37) (correlation weighted = 0.7814, weighted q-value = 1.8 × 10⁻¹²), located in intron 2 of the lncRNA gene LINC01633 (Fig. 3 – D). For epitype 3b, network screening picked the locus chr6:84417712-84417918 (GRCh37) (correlation weighted = 0.906, weighted q-value < 1.0 × 10⁻¹⁴), located in intron 1 of the SNAP91 gene (Fig. 3 – F). Selected markers were implemented in the logistic regression models, and ROC comparison was performed (Fig. 3 – A, C, E). The analysis revealed that for epitype 1a, the best marker is the one from WGCNA network screening (chr1:184970798-184971004 - LINC01633): AUROC = 0.7620; bootstrap 95% CI: 0.6430–0.88210; SE = 0.0615; p=0.007; the goodness-of-fit test showed a good fit of the model (p > χ2 = 0.34). 
This marker showed slightly better performance than the hub marker across multiple evaluation metrics, including higher AUROC (0.7620 vs 0.7393), improved sensitivity (65.5% vs 55.2%), better overall classification accuracy (71.4% vs 68.6%), and lower prediction error in internal LOOCV validation (MAE = 0.422 vs 0.439; RMSE = 0.475 vs 0.484; Pseudo-R2 = 0.089 vs 0.058; Table 8). Adding the marker for epitype 3b did not provide much improvement in the model: AUROC was marginally better with a value of 0.7653 (bootstrap 95% CI: 0.6493–0.8814; SE = 0.06; p = 0.0137), classification accuracy (71.4%) and sensitivity (65.5%) were similar, but prediction errors in internal validation were slightly higher (MAE = 0.427 vs 0.422; RMSE = 0.480 vs 0.475) and model fit was lower (Pseudo-R2 = 0.077 vs 0.089; Table 8). There was no evidence for confounding or effect modification under interactions among independent variables. Logistic regression for the 5yM model revealed that the WGCNA network screening marker for epitype 1a is an independent risk factor in the final model with p=0.019 and OR = 10.70 (95% CI: 1.47–77.64) (Fig. 4A). Fitted probabilities of death within 5 years post-surgery (5yM) ranged from 7.9% for a female patient with stage I SCC and a 9% methylation level of the epitype 1a methylation marker up to 82.2% for a male patient with stage III SCC and a 96.9% methylation level of the same marker (Fig. 4B). We also externally evaluated TCGA-based survival results in relation to 450k array methylation probes located within 1000 bp of the selected methylation markers. For epitype 1a, data for the SYT1 first exon methylation marker (probe cg12430457, located within 274 bp) showed that a high methylation level was associated with lower OS, with HR = 1.56 (log rank p = 0.0019) with sex and stage as model covariates and HR = 1.485 (log rank p = 0.019) for the unadjusted model (Supplementary Figure S2). 
For epitype 3b, the SNAP91 first intron marker (probe cg07335294, 15 bp away) demonstrated that high methylation was associated with better OS: HR = 0.856 (log rank p = 0.021) adjusted for sex and stage, and HR = 0.848 (log rank p = 0.38) unadjusted (Supplementary Figure S3). The direction of the associations is in agreement with the results obtained in our MOBIT cohort. However, differences in the overlap between the RRBS-targeted genome and the array technology used in TCGA, as well as probe QC dropout in the final TCGA data, do not allow for further comparative validation of other methylation markers.

Overall survival analysis in SCC
Data covering time to death from the surgery were used to construct Kaplan-Meier (K-M) survival curves that were further statistically evaluated with log-rank tests and univariable and multivariable Cox proportional hazards (Cox PH) regression (Fig. 5 – A-F). The analysis of K-M curves stratified by all five epigenetic subtypes suggested a significant difference in survival curves with p=0.0492 in Cox PH, while the log-rank test was slightly above the threshold (p=0.0559). Different stratifications for epitype 1a and adjustments for TNM were evaluated, all showing statistically significant differences among K-M curves (Fig. 5 – B-E). The final model, with stratification by a single-locus marker (the WGCNA hub marker for epitype 1a, which had better statistical parameters than the network screening marker in the survival analysis) and adjustment for TNM, was significant in the log-rank test with p=0.026 and a hazard ratio of 2.13 (95% CI: 1.00–4.53) under p=0.05, showing that the marker is an independent risk factor for overall survival time in SCC patients (Fig. 5 – F).

Epigenetic subtyping prognostic evaluation in AC
In the AC patients, the analysis of the relationship of the clinicopathological and epitype parameters with the outcome of death within five years (5yM) from surgery using univariable logistic regression did not reveal any variables of predictive significance (Table 9).
Analysis of the relationship of the clinicopathological and epitype parameters with the outcome of recurrence within five years (5 yR) from surgery in AC using univariable logistic regression (Table 10) revealed significance only of SUVmax with OR = 1.37 (95% CI: 1.03–1.82; p=0.029). For multivariable logistic regression modeling, an elastic net regression with cross-validation was performed to evaluate whether any 5 yR model can be constructed considering SUVmax and all five epitypes. In the elastic net analysis, cross-validation selected α = 0.500 and λ=0.047 with the SUVmax and epitype 2b variables. In the multivariable logistic regression model, epitype 2b was associated with OR = 0.09 (0.004–2.43, p = 0.154), while Firth’s method correction gave OR = 0.18, 95% CI: 0.01–2.17, p = 0.177 (Supplementary Table S3). The wide intervals reflect the limited number of modelled events and supported the decision to focus on continuous single-locus markers for AC recurrence prediction. In addition, a WGCNA network was constructed, and it showed high module preservation in the split assessment (0.85 of Z ≥ 2) and robustness in terms of module quality on the whole dataset (0.91 of Z ≥ 2). By applying the co-methylation network-wise screening approach, a marker for recurrence in AC (chr1:15945890-15946234 in GRCh37, correlation weighted = −0.79, weighted q-value < 2.39 × 10⁻⁸, located in intron 1 of the DDI2 gene) was identified separately along with SUVmax (elastic net regression with selection of α = 0.500 and λ=0.090). The selected variables were entered into logistic regression models of the logit of recurrence within five years from surgery (5 yR) (Tables 11 and 12; Fig. 6 - A). There was no evidence for effect modification under interactions among independent variables; SUVmax confounded the WGCNA network screening marker for recurrence. Selected 5 yR models were internally validated with LOOCV. 
The best-performing model contained SUVmax and the WGCNA network screening marker for recurrence as independent variables, with AUROC = 0.8857 (bootstrap 95% CI: 0.6732–1.0), SE = 0.0905, MAE = 0.211, good model fit (goodness-of-fit test p > χ2 = 0.27), and correct classification of 92.31% of cases, which was an improvement over the SUVmax-only model for recurrence (AUROC = 0.7143; bootstrap 95% CI: 0.3601–1.0). Decision curve analysis demonstrated that the model incorporating SUVmax and the DNAm marker provided higher net benefit than the SUVmax-only model across clinically relevant threshold probabilities (Supplementary Figure S4).
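The two performance measures used above, AUROC and decision-curve net benefit, can be illustrated with a short sketch. The labels and predicted probabilities below are hypothetical toy values, not study data, and the functions use the textbook definitions (AUROC as the fraction of correctly ordered case/non-case pairs; net benefit as TP/n − FP/n × pt/(1 − pt) at threshold probability pt), which may differ in detail from the software used in the study.

```python
def auroc(y_true, y_prob):
    """AUROC as the share of (case, non-case) pairs ranked correctly; ties count 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def net_benefit(y_true, y_prob, threshold):
    """Decision-curve net benefit at one threshold probability (0 < threshold < 1)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

# Hypothetical toy data: 1 = recurrence within five years, 0 = no recurrence
y_toy = [1, 1, 1, 0, 0, 0, 0, 0]
p_toy = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.1]

print(auroc(y_toy, p_toy))                 # 14 of 15 pairs ordered correctly
print(net_benefit(y_toy, p_toy, 0.5))      # model-guided strategy
print(net_benefit(y_toy, [1.0] * 8, 0.5))  # "treat all" reference strategy
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds for each model and for the "treat all" and "treat none" (net benefit 0) strategies.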

Time to recurrence analysis in AC
Data on time to recurrence from surgery were used to construct K-M survival curves, which were further evaluated with log-rank tests and univariable and multivariable Cox PH regression (Fig. 6B). K-M curves stratified by the WGCNA network screening marker for recurrence differed significantly (log-rank p < 0.0001), with HR = 14.28 (95% CI: 3.17–64.33; p = 0.0011) in the univariable Cox PH model, suggesting that the marker can be evaluated as a risk factor for short time to recurrence (TTR) in AC patients. However, the wide confidence interval, which reflects the sample size, should be considered when assessing the effect size.
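For readers less familiar with the K-M curves referred to above, the product-limit estimator can be sketched in a few lines. The follow-up times and event flags below are hypothetical toy values, and the function is a plain illustration of the estimator, not the software used in the study; the log-rank test and Cox PH model are not reproduced here.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of recurrence-free probability.

    times  : follow-up time for each patient (e.g. months from surgery)
    events : 1 if recurrence was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = 0  # recurrences observed at time t
        c = 0  # patients leaving the risk set at time t (events + censored)
        while i < len(data) and data[i][0] == t:
            d += data[i][1]
            c += 1
            i += 1
        if d > 0:
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        at_risk -= c
    return curve

# Hypothetical toy cohort of six patients
toy_times = [6, 12, 12, 20, 30, 60]
toy_events = [1, 1, 0, 1, 0, 0]
for t, s in kaplan_meier(toy_times, toy_events):
    print(t, round(s, 4))
```

Computing one such curve per marker-defined group gives the stratified curves that the log-rank test then compares.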

Genomic annotation of prognostic DNA methylation markers
To characterize the regulatory context of the identified prognostic markers, we performed genomic annotation using regulatory databases (Supplementary Table S4). All four markers are located within intronic regions and overlap with candidate cis-regulatory elements (cCREs). The SCC WGCNA blue module hub marker for epitype 1a (SYT1 intron 2) and the SCC WGCNA network screening marker for epitype 3b (SNAP91 intron 1) overlap with proximal enhancers (EH38E3029022 and EH38E3718863, respectively), while the SCC WGCNA network screening marker for epitype 1a (LINC01633 intron 2) overlaps with a distal enhancer (EH38E3985974). The AC WGCNA network screening marker for recurrence (DDI2 intron 1) maps to a region classified as transcription factor-associated (EH38E3955575). Transcription factor binding analysis revealed enrichment for cancer-relevant regulatory networks, including SMAD2/3 signaling, the FOXA1 network, retinoblastoma protein regulation, NFAT-dependent transcription in lymphocytes, and Notch-mediated signaling. An eQTL colocalizing with the LINC01633 marker (network screening for SCC epitype 1a) was associated with NIBAN1 expression in GTEx (p = 9.78 × 10⁻²⁴).

Immune cell infiltration evaluation in AC and SCC epigenetic subtypes
The analysis of tumor immune microenvironment (TIME) heterogeneity among the identified epigenetic subtypes showed significant differences (Fig. 7, Supplementary File S2). In SCC, epitype 1a was characterized by high MicroenvironmentScore (rpb = 0.53, q-value = 4.11 × 10⁻⁶), StromaScore (rpb = 0.49, q-value = 2.1 × 10⁻⁵), and ImmuneScore (rpb = 0.39, q-value = 8.7 × 10⁻⁴), by an enrichment of, for example, CD8+ T-cells (rpb = 0.39, q-value = 1.1 × 10⁻³), eosinophils (rpb = 0.36, q-value = 3.0 × 10⁻³) and CD4+ naive T-cells (rpb = 0.28, q-value = 2.1 × 10⁻²), and by a depletion of, for example, MEP (megakaryocytic-erythroid progenitors, rpb = −0.46, q-value = 7.7 × 10⁻⁵) and Th2 cells (rpb = −0.41, q-value = 5.3 × 10⁻⁴). In contrast, epitype 3b of SCC was characterized by an enrichment of MEP (rpb = 0.33, q-value = 6.4 × 10⁻³) and Th2 cells (rpb = 0.26, q-value = 3.3 × 10⁻²), and a depletion of MSC (mesenchymal stem cells, rpb = −0.27, q-value = 2.3 × 10⁻²) and CAF (cancer-associated fibroblasts, rpb = −0.26, q-value = 3.0 × 10⁻²).
Epitype 2b of AC was characterized by high ImmuneScore (rpb = 0.29, q-value = 4.5 × 10⁻⁵) and an enrichment of class-switched memory B-cells (rpb = 0.45, q-value = 1.9 × 10⁻²), CD8+ T-cells (rpb = 0.43, q-value = 2.1 × 10⁻³), CD8+ Tcm (rpb = 0.42, q-value = 2.5 × 10⁻³), Tregs (rpb = 0.41, q-value = 3.0 × 10⁻³) and CD4+ naive T-cells (rpb = 0.38, q-value = 6.8 × 10⁻³), as well as a depletion of neutrophils (rpb = −0.30, q-value = 3.9 × 10⁻²).
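The rpb values reported for the epitypes can be read with the help of the following sketch, which assumes that rpb denotes the point-biserial correlation between binary epitype membership and a continuous cell-type or microenvironment score (the section itself does not expand the abbreviation). The membership flags and scores are hypothetical toy values, and the q-values, which require multiple-testing correction across all cell types, are not reproduced.

```python
import math

def point_biserial(group, values):
    """Point-biserial correlation between a 0/1 group flag and a continuous score.

    Equivalent to the Pearson correlation with the flag coded as 0/1.
    """
    n = len(values)
    x1 = [v for g, v in zip(group, values) if g == 1]
    x0 = [v for g, v in zip(group, values) if g == 0]
    n1, n0 = len(x1), len(x0)
    mean1 = sum(x1) / n1
    mean0 = sum(x0) / n0
    mean_all = sum(values) / n
    # Population standard deviation of the score over all samples
    sd = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    return (mean1 - mean0) / sd * math.sqrt(n1 * n0 / n ** 2)

# Hypothetical toy data: 1 = tumor belongs to the epitype of interest
in_epitype = [1, 1, 1, 0, 0, 0]
immune_score = [0.8, 0.6, 0.7, 0.2, 0.4, 0.3]
print(round(point_biserial(in_epitype, immune_score), 3))
```

A positive value indicates that the score tends to be higher in tumors of the given epitype (enrichment), and a negative value that it tends to be lower (depletion).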

Discussion

Principal findings
The primary objective of this study was to translate tumor epigenetic profiling into clinically applicable prognostic tools for early-stage NSCLC. Toward this goal, we have shown that DNAm markers are of significant prognostic value in mortality and recurrence risk estimation models in NSCLC. Epigenetic subtyping and network-based biomarker selection served as methodological steps enabling the identification of single-locus signatures with potential for routine clinical implementation. Based on genome-wide DNAm profiling of NSCLC tumors, we have detected five epigenetic subtypes of AC and SCC separately. The identified epitypes were clinically and immunologically distinct. Adding the information from the clinicopathological analysis of data collected at the time of surgery and during five-year follow-up, we found that in SCC epitype 1a is associated with the worst prognosis in terms of overall survival (OS), while epitype 3b is associated with the best OS prognosis. We have developed a prognostic model for death within five years (5yM) from surgery in SCC that included sex, TNM staging, and a single DNAm-based marker for epitype 1a. In AC, epitype 2b is associated with the best prognosis in terms of time to recurrence (TTR). We have developed a prognostic model for recurrence within five years (5 yR) from surgery in AC that included sex, SUVmax, and a single DNAm-based marker for recurrence.

In the context of the current literature
Technological advances in the practical application of next-generation sequencing methods for global profiling of DNA mutations, cytosine methylation, and gene expression have expanded the clinical opportunities for comprehensive evaluation of the molecular landscape in cancer and for the search for novel, clinically effective biomarkers with much better predictive values [37]. We have identified five epitypes (epigenetic subtypes) within each of the two main NSCLC histological subtypes, AC and SCC. Survival curve analysis of our patients’ data with TNM-based stratification recapitulated the OS data for the different TNM stages in NSCLC reported by the International Association for the Study of Lung Cancer (IASLC, 8th edition) and its independent validations [21]. The hazard ratio (HR) for TNM stage III versus stage I was 4.12 (95% CI: 1.4–12.2, p = 0.01), while the comparison of epitype 1a versus 3b resulted in HR = 11.65 (95% CI: 1.48–91.77, p = 0.0018). Adjusting the model for TNM still yielded HR = 9.77 (95% CI: 1.2–79.60, p = 0.033) for epitype 1a vs 3b. These findings suggest that DNAm profiling may be an independent predictor of OS in SCC; however, the wide confidence intervals, which reflect the sample size, should be taken into account when evaluating the effect size.
This stratification of NSCLC tumors based on DNAm aligns with the literature data for lung and other solid cancers [38–43]. Efficient categorization of different subtypes of lung cancer is crucial for making treatment decisions, with the TNM classification playing a significant role in guiding therapy and predicting prognosis [44]. Our results led to the development of a predictor of 5-year post-surgery mortality (5yM) in SCC that includes one DNAm marker, sex, and TNM stage; with an AUC of 0.762 (p = 0.0070), it showed the best performance. Few published models use a DNA methylation biomarker approach, and those are mainly based on the TCGA dataset. One of them is a TCGA-based DNAm signature model with 11 markers as a prognostic predictor of patients’ survival, which had an AUC for 5-year survival of 0.737 but on its own performed worse than a nomogram combining clinical data such as histologic grade, tumor stage, lymph node stage, metastasis stage, and tobacco smoking (AUC for 5-year survival of 0.811) [45]. Our model lies in between, while containing both basic clinical parameters and only one methylation biomarker, which allows for straightforward implementation as a laboratory test. A prognostic model based on expression of the N6-methyladenine RNA methyltransferase METTL3 in SCC, also developed using the TCGA patient cohort, was used to stratify patients into high- and low-risk groups for overall survival with a general AUC of 0.706 [46]. A TCGA-based prognostic methylation risk model built on four genes (FGA, GPR39, RRAD, TINAGL1) associated with lymph node metastasis predicted 5-year survival in SCC with an AUC of 0.68 in the independent test dataset [47]. Another study, independent of TCGA, explored the impact of intratumor transcriptomic heterogeneity on prognosis in localized NSCLC using multiregional RNA-seq profiling, developing a disease-free survival (DFS) prognostic model that achieved a maximum C-index of 0.693 and included data from 26 genes [48].
Research indicates that the maximum standardized uptake value (SUVmax) determined by FDG-PET is independently associated with the risk of recurrence after resection of early-stage lung cancer [49]. SUVmax might be used to stratify early-stage lung cancer patients for adjuvant therapy and to optimize surveillance frequency [49]. Several studies have independently shown that SUVmax of the primary tumor is a significant prognostic marker for AC but not SCC [50, 51]. SUVmax has been demonstrated to be an important predictor of disease-free survival in AC [51]. In our study, SUVmax was likewise significant only for recurrence in AC, and not for recurrence or overall survival in SCC. For AC, we have established a model for 5-year post-surgery recurrence (5 yR) that incorporated one DNAm marker and SUVmax, with an AUC of 0.8857 (p = 0.0035), compared with an AUC of 0.71 (p = 0.0111) for the SUVmax-only model. The observed increase in net benefit in decision curve analysis suggests that inclusion of the DNAm marker may reduce unnecessary interventions while maintaining detection of true recurrence events. In the multivariable models, neither SUVmax nor DNAm markers were independently associated with recurrence outcomes within five years of follow-up. The identified DNAm biomarkers allowed for AC patient stratification in terms of time to recurrence (TTR) (p = 0.0011). Based on stage I AC patients from TCGA, a 13-DNA methylation signature for predicting recurrence-free survival (RFS) was developed and achieved an AUC of 0.909 [52]. Also in early-stage AC, an immune-based recurrence signature developed in the TCGA cohort demonstrated predictive performance for 5-year recurrence risk, with an AUC of 0.789 [53].
A tumor microenvironment–related three-gene signature (ADAM12, BTK, and ERG) stratified patients into low- and high-risk groups, achieving AUC values ranging from 0.569 to 0.650 in the validation datasets and an AUC of 0.738 in the TCGA training cohort [54]. A 16-CpG methylation classifier of overall survival was developed with an AUC of around 0.69 for 5-year survival in the AC TCGA dataset [55]. The corresponding nomogram for predicting patients’ five-year survival probability, which consisted of age, sex, EGFR status, TNM staging, and the 16-CpG-based model, showed similar performance with an AUC of 0.7. Another methylomics-associated nomogram, with 16 loci, for predicting 5-year survival of stage I–II AC in TCGA exhibited an AUC of 0.831 in the training set and 0.904 in the validation set [56]. A multicenter study using data from TCGA and several clinical institutions for AC and SCC combined developed a prognostic scoring method using DNAm and gene expression data, including main effects and G×G interactions [57]. The trans-omics model integrating methylome and transcriptome data was superior to other published models for 5-year overall survival in early-stage NSCLC, with AUC = 0.89. However, it requires information on 54 biomarkers, significantly restricting its practical clinical use in standard patient management.
Compared with the previously published models described above, our 5-year mortality model for early-stage SCC and 5-year recurrence model for early-stage AC demonstrate discriminatory performance comparable to that of complex models, warranting further external validation and evaluation in a clinical laboratory setting. Notably, most DNA-methylation-based predictor models rely on the same TCGA datasets and use only information from methylation arrays. The present study analyzes a real-life independent cohort with extended genome-wide DNAm profiling using next-generation sequencing. We propose models based on the analysis of just one CpG, which could be easily implemented with commonly used qPCR techniques in the post-COVID era.

Potential mechanisms
The DNA-methylation markers selected in the present study as predictive of survival or recurrence are located in functional genic loci. Although, based on the present evidence, they cannot be assumed to be oncogenic in lung cancer or mechanistically related to SCC or AC progression, their molecular role can be discussed in general terms of potential contribution to some oncogenic pathways. Given their location within genes, we first discuss the potential mechanisms related to the directly associated genes. The hub module methylation marker for epitype 1a in SCC is located in an SYT1 intron, and SYT1 has previously been associated with overall survival in lung adenocarcinoma based on expression networks from TCGA data [58]. The WGCNA network screening marker for epitype 1a is located in LINC01633, a lncRNA gene that was found to be differentially expressed in SCC [59]. LncRNAs, in general, may act by sponging miRNA and thereby regulate oncogenesis-related genes [60–62]. The epitype 3b network screening marker is located in the SNAP91 gene, which has been identified as a regulator of metastasis in prostate cancer [63] and as a transcriptomic marker with diagnostic and prognostic value in glioblastoma [64] as well as in lung cancer of SCC histotype [65]. The WGCNA network screening marker for recurrence in AC is located within the DDI2 gene, which was shown to be responsible for the timely degradation of some ubiquitylated proteins, promoting tumor metastasis and resistance to pharmacologically induced apoptosis in colorectal cancer [66] and influencing adaptation to proteasome inhibition in multiple myeloma [67]. Second, further genomic annotation revealed that all four prognostic markers overlap with enhancer or transcription factor-associated regulatory elements that may modulate gene expression through distal regulatory mechanisms rather than direct silencing of nearby genes.
Transcription factor binding enrichment analysis of the overlapping enhancer/TF features highlighted significantly over-represented cancer-related pathways. The proximal enhancer colocalized with the SCC hub marker for epitype 1a showed enrichment for SMAD2/3 signaling and the FOXA1 transcription factor network, which are implicated in epithelial-mesenchymal transition (EMT), the tumor immune microenvironment and lung cancer progression [68–76]. The distal enhancer at the locus of the SCC network screening marker for epitype 1a exhibited, for example, enrichment for ATM pathway signaling, with a potential link to DNA damage-related cell cycle regulation and EMT [77–79]. ATM has been suggested to be a moderate-penetrance germline risk gene for lung cancer and is often somatically mutated in lung tumors; moreover, it was shown to have potential as a clinical biomarker [80, 81]. Additionally, this marker colocalized with an eQTL for NIBAN1 (FAM129A), a gene previously implicated in cell survival, tumor proliferation and invasion, as well as stress response in cancer [82–84]. The SCC WGCNA network screening marker for epitype 3b overlapped with transcription factors enriched in regulation of retinoblastoma protein, a key tumor suppressor pathway regulating the cell cycle [85]. Moreover, it has been shown that retinoblastoma protein may regulate the metabolic reprogramming of lung cancer, and RB1 mutant status is associated with worse outcomes in NSCLC [86, 87]. The transcription factor binding site colocalized with the AC WGCNA network screening marker for recurrence was enriched for Notch-HES/HEY signaling-related TFs, which play roles in EMT, cancer stem cell maintenance and therapy resistance [88–91]. Nonetheless, any causal involvement of these marker-related loci in lung cancer-associated biological processes has not been established, and further research is required before any relationship with lung cancer oncogenesis and progression can be drawn.
Moreover, while we cannot exclude indirect regulatory relationships with known AC and SCC cancer drivers, the prognostic value of these epigenetic markers may reflect biological programs not captured by mutation-centric analyses. Future studies integrating mutation profiling with methylome data in early-stage NSCLC cohorts are warranted to elucidate potential interactions between genetic and epigenetic alterations.
Previous research indicates that the immune microenvironment of NSCLC can serve as a post-resection prognostic marker [92]. It has been proposed that immune cell infiltration levels and the interaction between immune cells and tumor vary between the SCC and AC histological subtypes [65]. The impact of the immune microenvironment on prognosis also differs [92]. SCC tumors might be more vulnerable to Natural Killer (NK) cell regulation [65]. In AC, a high immune score and a high level of adaptive immune system cells were correlated with improved progression-free survival [92]. In breast cancer, a molecular subtype characterized by better survival was associated with greater estimates of activated NK cells [93]. Our results for SCC show that epitype 1a, characterized by the worst OS prognosis, had high MicroenvironmentScore and ImmuneScore and was enriched with endothelial cells, fibroblasts, eosinophils, hematopoietic stem cells, CD8+ T-cells and CD4+ naive T-cells, but depleted of MEP, CLP, Th2, and Th1 cells. On the other hand, SCC epitype 3b, characterized by the best OS prognosis, exhibited an enrichment of MEP and Th2 cells and a depletion of CAF and MSC. This points to antagonistic pathways related to different immune cell infiltration in SCC epitypes with contrasting OS prognoses. In AC, epitype 2b, which had lower recurrence risk, was characterized by high ImmuneScore, an enrichment of Tregs, class-switched memory B-cells, CD8+ Tcm, CD8+ T-cells, and CD4+ naive T-cells, and a depletion of neutrophils. A number of studies have shown that Tregs play an important role in the development and progression of primary lung cancer and metastasis and are recruited to the tumor tissue to facilitate tumor cell escape from immunological surveillance [94]. The population of Tregs within the tumor microenvironment is heterogeneous and may thus have a dual action in carcinogenesis at different stages of disease development.
The potential anti-tumor action of Tregs, including suppression of tumor-promoting inflammatory responses and boosting of anti-tumor immunity, has been reported for various cancers [95–97]; thus, increased infiltration of Tregs may reduce the risk of recurrence in some NSCLC by a currently unknown mechanism. It has been shown in breast cancer that epigenetic reprogramming of tumor cells in a carcinogenesis suppressor pathway is related to extended infiltration of Th2 cells [98]. Th2 cells have been shown to affect tumor regression via inflammation-related processes. Th2-dependent cell immunity in breast cancer leads to the activation of cell differentiation, suppressing epithelial-mesenchymal transition [99]. Notably, CD4+ T-helper and CD8+ cytotoxic T-cells are among the key players in the immune response and are thus frequently used in modern cancer immunotherapies such as CAR T-cell therapy [100]. Unfortunately, we were not able to detect any enrichment of NK cells, a potent anticancer population of innate immunity. Taking the obtained results together with the limitations of analyzing single TIME populations, we can conclude that the observed differences in disease-free and overall survival between the detected epitypes in SCC and AC might be associated with the distinct immune microenvironments of the tumors.

Implications
Survival data from real-world scenarios indicate that patients with locoregional recurrence of NSCLC generally face a poor prognosis [101]. Post-surgery recurrence occurs in approximately 10–50% of patients treated for early-stage NSCLC. These data underscore the necessity of developing efficient preventive treatments aimed at reducing the incidence of locoregional recurrence. Furthermore, they emphasize the importance of additional research to improve treatment strategies and optimize the outcomes for patients who experience recurrence [101]. The present study shows that the risk of recurrence or death in early-stage NSCLC can be estimated more precisely when the prognostic models incorporate DNAm data on top of commonly used clinicopathological characteristics such as TNM stage or SUVmax. The developed prognostic models for AC and SCC could be used post-surgery or post-biopsy of the primary tumor. They include just a single methylation biomarker, which can be easily implemented in medical diagnostic laboratories using routine MS-qPCR (methylation-sensitive quantitative PCR). Quantitative PCR techniques are now widespread across the globe, as they were the foundation of SARS-CoV-2 testing during the recent COVID-19 pandemic. Methylation biomarkers are analyzed in DNA samples extracted from resected tumors or tumor biopsies, and this could be the same DNA sample that would be used for mutation testing for EGFR, KRAS, ALK, etc. DNAm biomarker analysis would involve just an additional qPCR reaction on this DNA sample. This approach is deliberately designed for cost-effective clinical translation. DNA methylation measured in resected tumor tissue via MS-qPCR offers high repeatability owing to standardized bisulfite conversion protocols and the stability of DNA and methylation markers compared with RNA or circulating biomarkers, facilitating both clinical reproducibility and retrospective validation.
The proposed tumor tissue-based DNAm approach can be contextualized among emerging prognostic technologies. Circulating tumor DNA (ctDNA) offers non-invasive monitoring and has shown promise for minimal residual disease detection, but sensitivity remains limited in early-stage NSCLC due to low tumor burden, particularly post-resection. The proposed approach would complement pre-operative radiomics, protein assays and post-operative ctDNA monitoring. Any performance comparison should rely on data from larger validation cohorts.
Early identification of patients at risk of recurrence or shortened survival will allow for proper management and better clinical outcomes during disease treatment. It is suggested that patients may benefit from tumor molecular profiling in early-stage disease, allowing for the timely start of appropriate first-line treatment [102]. Such data about elevated risk can guide policy decisions leading to a possible extension of disease-free and overall survival of early-stage NSCLC patients. The developed risk stratification models could be integrated into clinical decision-making for post-operative management of early-stage NSCLC. For SCC, the 5-year mortality model with female sex, TNM staging, and the epitype 1a DNAm marker could identify patients with high epigenetic risk despite favorable clinical staging. These patients could be candidates for adjuvant chemotherapy or intensified surveillance, as their molecular profile suggests elevated mortality risk not captured by the other parameters alone. For AC, the 5-year recurrence model with SUVmax and the recurrence-associated DNAm marker would address a similar clinical gap. SUVmax is an established prognostic indicator, and our decision curve analysis demonstrates that adding the DNAm marker provides consistently higher net benefit. This suggests the combined model could meaningfully inform clinical decisions: patients with low SUVmax but high DNAm-defined risk could be considered for adjuvant therapy or intensified follow-up imaging. The discussed strategies would represent risk stratification rather than treatment de-escalation. Specific decision thresholds for risk reclassification would need to be established through further prospective validation studies evaluating treatment response and outcomes across the risk strata defined by the developed biomarkers. Clinicians might be able to use the estimated risk information to support clinical decisions regarding the type, intensity, and duration of post-surgery adjuvant therapies.
Further adjuvant and recurrence-related therapies include various strategies, such as chemotherapy or radiation therapy, more precise targeted therapies (based on tumor molecular profile), and immunotherapy tailored to specific patient needs and tumor immunological profile. Generally, tumors can be divided into immunologically “hot” and “cold” regarding their immunophenotypes [103]. We show that the immunological microenvironment might be related to the epigenetic subtypes of the NSCLC tumors, and a single DNAm locus might be used as a biomarker.
Nevertheless, as the gold standard, we can envisage a prospective clinical trial designed to further verify the prognostic power of the developed DNA-methylation biomarkers. In such a study, biomarker testing could be applied to resected cancer tissue in early-stage NSCLC. By applying the prognostic models, patients would be stratified into cohorts of low, intermediate, and high risk of recurrence or short post-surgery survival, then randomized to standard-of-care or prognostic model-guided adjuvant therapy and followed up for five years. Although this would be an ideal scenario for validation, it would involve numerous patient groups and clinical centers, run long-term, and be very costly [104].

Strengths and limitations
Several main strengths of this study should be highlighted, including the genome-wide DNAm profiling of NSCLC tumors with RRBS next-generation sequencing (NGS), which allows detailed methylome profiling covering millions of CpGs in the genome, compared with the hundreds of thousands typically analyzed using methylation microarrays. Thanks to this, the majority of CpGs in the human genome could be screened. Moreover, the analysis is based on new clinical data rather than on TCGA, unlike most studies addressing this topic. In addition, adjacent non-cancerous lung tissue from each patient was also examined using RRBS to directly compare methylome profiles in tumor patients. Also, samples were collected under strictly controlled conditions after resection, allowing for high-quality NGS-based omics translational research [105]. Nonetheless, several limitations should be clearly stated. First, the study population was of European ethnicity, so caution should be taken when extrapolating to other ethnic groups, in which survival may differ. Moreover, patients were mostly smokers, and we cannot be sure that the estimates are accurate for non-smoking NSCLC patients. The main limitation is the size of the study population from an epidemiological point of view; however, it is a relatively large sample in terms of costly and laborious NGS methylome analysis (RRBS). Additionally, RRBS inherently enriches CpG-rich regions including promoters and CpG islands, which may provide limited coverage of distal regulatory enhancers, though our analysis focused on direct functional gene elements. On the other hand, because methylomes were analyzed with RRBS, we cannot fully validate the developed models using external datasets such as TCGA, which are methylation microarray-based and lack DNAm-level information for some of the loci we have analyzed.
Because of the sample size, as well as missing data points for some measurements such as SUVmax, the power of our study might be limited. Moreover, we could not split the samples into separate validation and test cohorts to evaluate performance and possible overfitting in an external, independent dataset. To mitigate these limitations, LOOCV internal validation was performed. Wide confidence intervals observed in some multivariable models reflect these sample size constraints, though sensitivity analyses using Firth’s penalized regression confirmed the stability of our main findings. The results obtained might be of limited generalizability and should be further externally and independently validated in multi-institutional cohorts with long-term follow-up before clinical application is considered. Additionally, some patients were lost to follow-up before the end of the study period for various reasons. Notably, most of the tumors were not tested molecularly for mutations common in NSCLC, such as EGFR, KRAS, ALK, etc., since comprehensive molecular profiling of early-stage lung cancers, like those in our study, is not standard in clinical practice and, at the time of the patients’ surgery, was recommended for advanced and metastatic tumors [3]. Consequently, we could not evaluate potential interactions between methylation markers and driver mutation status, which represents an area for future investigation. Moreover, further implementation of our models in practice should also take into account the possibility of inter- and intra-tumor heterogeneity, and a single cell-based approach might be considered.

Conclusions

Coordinated changes in the tumor methylome can aid the stratification of early NSCLC into clinically and immunologically relevant epigenetic subtypes. Epitype classification might be predictive of tumor recurrence and patient survival. Methylome network-based analysis enables prioritization of single-locus DNAm biomarkers with prognostic value. Translation of these approaches into prognostic models incorporating a single DNAm biomarker alongside established clinicopathological variables demonstrates promising performance for 5-year mortality prediction in SCC and 5-year recurrence prediction in AC. DNAm signatures carry prognostic value beyond TNM staging and SUVmax, suggesting that methylome-informed risk stratification could support post-surgical management and adjuvant therapy decisions in early-stage NSCLC pending further external validation.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Source: PubMed Central (JATS). The license follows the original publisher’s policy; please cite the original article when quoting.
