Lipidomic signatures as predictive biomarkers for early-onset lung cancer: Identification and development of a risk prediction model.

Journal of advanced research, 2026, Vol. 79, pp. 679-690

Wang F, Guo Z, Tang W, Cao W, Dong X, Xu Y


Cite this article

APA: Wang F, Guo Z, et al. (2026). Lipidomic signatures as predictive biomarkers for early-onset lung cancer: Identification and development of a risk prediction model. Journal of advanced research, 79, 679-690. https://doi.org/10.1016/j.jare.2025.03.045
MLA: Wang F, et al. "Lipidomic signatures as predictive biomarkers for early-onset lung cancer: Identification and development of a risk prediction model." Journal of advanced research, vol. 79, 2026, pp. 679-690.
PMID 40180245

Abstract

[INTRODUCTION] Lung cancer is the leading cause of cancer-related mortality worldwide. While traditionally associated with older adults, early-onset lung cancer (EOLC) is rising, particularly in Asia, which accounts for 75.9% of EOLC cases worldwide. Existing lung cancer screening guidelines primarily focus on older populations, which may result in missed opportunities for early detection in younger individuals. Given its distinct clinical characteristics, EOLC warrants dedicated research and targeted interventions.

[OBJECTIVES] This study aims to characterize the lipidomic profiles specific to EOLC patients (aged 18-49 years) and develop a biomarker-based predictive model to improve risk assessment and early detection.

[METHODS] The discovery and validation sets included 117 EOLC cases and 121 non-EOLC controls, all aged 18-49 years. Targeted lipidomics analysis, combined with logistic regression, was performed on plasma samples to identify differentially expressed lipid species. Clustering and pathway analyses were conducted to uncover and visualize the internal signatures of the identified lipids. Key lipids were refined using the LASSO-bootstrap regression method combined with the Boruta algorithm. A random forest model was subsequently employed to develop a robust prediction model for EOLC.

[RESULTS] A total of 843 lipids were identified, with 60 differentially expressed lipids detected, of which 33 were validated in the validation set. Cluster analysis revealed that passive smoking (OR: 2.75, 95% CI: 1.08-7.29) and current smoking (OR: 15.65, 95% CI: 2.55-142.10) were associated with elevated lipid metabolite profiles in EOLC patients. The validated lipids were further refined using LASSO and Boruta methods, which ultimately selected 6 lipids for inclusion in a prediction model constructed with random forest. This model achieved an area under the curve (AUC) of 0.874 in the validation set.

[CONCLUSION] Our study identified lipidomic signatures associated with the risk of EOLC, offering potential translational implications for lung cancer prevention strategies.

Introduction

Lung cancer remains the leading cause of cancer-related mortality worldwide, accounting for 12.4 % of all new cancer cases and 18.7 % of all cancer deaths [1]. In 2022, China accounted for over 40 % of all newly diagnosed lung cancers and lung cancer deaths worldwide [2]. Although cancer predominantly affects older adults, with over 50 % of malignancies diagnosed in patients aged 65 years and older [2], the incidence of early-onset cancer among individuals under 50 is rising globally [3]. Early-onset lung cancer (EOLC) refers to lung cancer diagnosed in individuals younger than 50 years [4]. According to GLOBOCAN 2022, Asia accounted for 75.9 % of EOLC cases globally, with China contributing 64.1 % of these cases [2]. In China, lung cancer incidence increases rapidly after the age of 40 [5]. Between 1990 and 2021, the incidence rate of lung cancer among individuals aged 15–49 increased from 5.38 to 7.98 per 100 000, with an average annual percentage change of 1.29 %, indicating a rising trend [6]. In 2022, individuals under 50 years old accounted for 12.5 % of all new lung cancer cases in China (17.8 % among females and 9 % among males) [2]. These statistics underscore the need for continued research and targeted interventions to address EOLC as a significant public health concern.
Low-dose computed tomography (LDCT) screening has proven effective in reducing lung cancer mortality and alleviating its burden [7]. Current lung cancer screening guidelines primarily target older populations (≥50 years) and individuals with established risk factors, such as smoking history [8], [9]. However, emerging evidence indicates that EOLC differs from late-onset lung cancer in its epidemiological, genetic, and clinical characteristics: EOLC is more frequently observed in non-smokers and is often associated with adenocarcinoma and unique genetic risk loci [4], [10], [11]. Given these differences, screening models developed based on traditional high-risk criteria may be less effective in identifying younger individuals at risk. Consequently, excluding this population from current screening guidelines may result in delayed diagnoses and poorer outcomes, including reduced survival [10]. Identifying individuals at particularly high risk of EOLC is therefore essential for improving its early detection, and efficient biomarkers are urgently needed.
Lipids play a key role in a range of metabolic processes in tumor cells, and numerous studies have established their close association with lung cancer risk. Wang et al. combined single-cell RNA sequencing with lipidomics in a cohort of 311 participants to uncover dysregulated lipid metabolism in lung cancer [12]. Similarly, Sun et al. analyzed the lipid metabolic profiles of lung adenocarcinoma and identified PE (18:0/18:1) as a potential lipid signature biomarker [13]. Lipidomics analysis has proven to be a powerful tool for characterizing disease progression and identifying key variables for prediction model development. Lipidomics-based models have achieved an area under the curve (AUC) exceeding 0.90, substantially outperforming traditional lung cancer risk prediction models that rely solely on macro-epidemiological factors [12]. However, most serum lipidomic biomarkers have been developed to predict lung cancer risk in the general population, and research specifically focused on EOLC is lacking. Given the potentially different mechanisms underlying lung cancer in the general population and in EOLC patients [11], further studies are needed to quantify the impact of lipids on EOLC. Such research could facilitate the early detection of EOLC and contribute to improved survival outcomes.
Accordingly, the aim of our study is twofold: (1) to conduct an in-depth lipidomics analysis of EOLC and characterize a plasma lipidomic profile specific to EOLC patients; (2) to develop a prediction model for EOLC incorporating individual lipid metabolites. Ultimately, our goal is to establish and optimize an EOLC risk prediction model that can enhance screening practices for younger individuals at high risk of lung cancer.

Material and methods


Study participants
Our study had a discovery stage and a validation stage involving a total of 117 EOLC cases and 121 non-EOLC controls aged 18–49 years, recruited from cancer hospitals in Beijing, Anhui, Zhejiang, and Shandong provinces. We defined the age range of 18–49 years for participant selection based on two key considerations. First, EOLC is commonly classified as lung cancer diagnosed before the age of 50, distinguishing younger patients from the traditionally high-risk population (≥50 years) [4]. Second, this classification aligns with existing epidemiological and clinical studies on EOLC [14], ensuring comparability and contributing to a standardized definition within the research community. In this study, lung cancer cases were classified based on the patient's age at initial diagnosis, whereas the age of control participants was determined at the time of LDCT screening during the corresponding period.
The discovery stage was designed as an unmatched case-control study and involved 111 participants, including 53 EOLC patients and 58 controls. Cases were required to have a pathological diagnosis of lung cancer, no previous history of cancer, and no prior radiotherapy or chemotherapy. Controls were eligible if no positive nodules were detected on LDCT. Based on the lung cancer screening guidelines in China [15], positive nodules were defined as solid or part-solid nodules with a solid component ≥ 6 mm, or non-solid nodules ≥ 8 mm. The validation set included 127 individuals: 64 EOLC patients and 63 non-EOLC controls. Details of the participants in each set are presented in Fig. 1 and Table S1.
All participants provided informed consent. The study was approved by the ethics committee of the National Cancer Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (IRB approval number: 23/300-4042).

Plasma sample collection
Peripheral venous blood-derived plasma samples were collected from cases and controls, with samples from the case group obtained prior to surgery and those from the control group collected before undergoing LDCT. Fasting peripheral venous blood samples (5 mL) were collected from participants using ethylenediaminetetraacetic acid (EDTA) vacuum anticoagulant tubes. Centrifugation and plasma separation were performed within 2 h of collection. The separated plasma was then aliquoted into five 1.5 mL centrifuge tubes, labeled accordingly, and stored at −80 °C. Within 30 days, all blood samples were transported to the National Cancer Center Biobank for subsequent testing and analysis.

Targeted lipidomics profiling
Targeted lipidomics analysis of plasma samples was performed using ultra-high-performance liquid chromatography coupled with mass spectrometry (LC-MS), as described in detail in the Supplementary Files. A total of 843 lipids were detected across all 238 samples. Quality control (QC) samples were prepared from pooled plasma, with one QC sample inserted after every 30 study samples. In total, 10 QC samples were included in the run to monitor instrument stability and correct for within-run variation.

Bioinformatics and statistical analyses
Bioinformatics and statistical analyses were conducted in the discovery and validation sets to identify lipid signatures associated with EOLC, comparing EOLC cases with non-EOLC controls.

Identification and validation of differential lipids
Plasma lipid levels were log-transformed and normalized. In the discovery set, orthogonal projections to latent structures discriminant analysis (OPLS-DA) was used to visualize the separation between cases and controls and to calculate variable importance in projection (VIP) values [16]. Differential lipids were initially identified using a two-sided Wilcoxon rank-sum test, with P values adjusted for multiple comparisons using the Benjamini–Hochberg (BH) method to control the false discovery rate (FDR). Lipids meeting all criteria in the discovery set (fold change [FC] > 1.25 or < 0.8, FDR < 0.05, and VIP > 1) [16], [17] were further examined in the validation set using logistic regression models adjusted for age, sex, and smoking status (Fig. 1). Lipids showing a consistent direction of effect and FDR < 0.05 in the validation stage were deemed successfully validated.
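The discovery-set filter can be sketched in Python as follows. This is a minimal illustration, not the authors' R code: it assumes the two-sided Wilcoxon rank-sum P values, fold changes, and OPLS-DA VIP scores have already been computed (for example with R's `wilcox.test` or SciPy's `scipy.stats.mannwhitneyu`), and the function names and toy lipid labels are hypothetical. The BH step follows the standard step-up procedure, as in R's `p.adjust(method = "BH")`.

```python
from typing import Dict, List, Tuple

def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values (step-up FDR control)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def is_differential(fdr: float, fold_change: float, vip: float) -> bool:
    """Discovery-set criteria: FDR < 0.05, VIP > 1, and FC > 1.25 or FC < 0.8."""
    return fdr < 0.05 and vip > 1 and (fold_change > 1.25 or fold_change < 0.8)

def select_discovery_lipids(stats: Dict[str, Tuple[float, float, float]]) -> List[str]:
    """stats maps a lipid name to (raw P value, fold change, VIP)."""
    names = list(stats)
    fdrs = benjamini_hochberg([stats[name][0] for name in names])
    return [name for name, q in zip(names, fdrs)
            if is_differential(q, stats[name][1], stats[name][2])]

# Toy input with hypothetical lipid labels.
toy = {
    "lipid_A": (0.001, 1.60, 1.8),  # up-regulated, passes all filters
    "lipid_B": (0.002, 0.60, 1.5),  # down-regulated, passes all filters
    "lipid_C": (0.003, 1.10, 1.2),  # significant, but fold change too small
    "lipid_D": (0.500, 1.50, 1.3),  # large fold change, but not significant
}
print(select_discovery_lipids(toy))  # ['lipid_A', 'lipid_B']
```

Lipids passing this filter would then be re-tested in the validation set with covariate-adjusted logistic regression, which is not shown here.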

Visualizing the internal signatures of identified lipids
To further explore the internal signatures of the identified lipids in EOLC patients, we performed clustering analysis on EOLC cases from both the discovery and validation sets. The Partitioning Around Medoids (PAM) algorithm, a robust partitioning technique widely used in unsupervised learning, was employed to assign each patient to a cluster [18]. The optimal number of clusters was determined using the silhouette method, which evaluates clustering quality by assessing how well each observation fits within its assigned cluster relative to the nearest neighboring cluster; a higher average silhouette width indicates better-defined clusters [19]. Multivariable logistic regression was used to identify characteristics significantly associated with membership in the clusters, that is, with the distinct lipidomic profiles of EOLC patients.
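To make the choice of cluster number concrete, here is a small, unoptimized Python sketch of PAM (a greedy BUILD step followed by SWAP moves until no swap lowers the total distance) together with the average silhouette width, where each observation's silhouette is (b - a) / max(a, b), with a its mean distance to its own cluster and b its mean distance to the nearest other cluster. It assumes Euclidean distance and uses hypothetical function names and toy 2-D points; the authors worked in R, and the distance metric and implementation they used are not stated here.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]

def total_cost(points: List[Point], medoids: List[int]) -> float:
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)

def assign(points: List[Point], medoids: List[int]) -> List[int]:
    """Label each point with the index (0..k-1) of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda j: math.dist(p, points[medoids[j]]))
            for p in points]

def pam(points: List[Point], k: int) -> Tuple[List[int], List[int]]:
    n = len(points)
    medoids: List[int] = []
    # BUILD: greedily add the point that most reduces the total cost.
    for _ in range(k):
        best = min((i for i in range(n) if i not in medoids),
                   key=lambda i: total_cost(points, medoids + [i]))
        medoids.append(best)
    # SWAP: try replacing each medoid with each non-medoid; keep improvements.
    current = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for slot, cand in itertools.product(range(k), range(n)):
            if cand in medoids:
                continue
            trial = medoids[:slot] + [cand] + medoids[slot + 1:]
            cost = total_cost(points, trial)
            if cost < current - 1e-12:
                medoids, current, improved = trial, cost, True
    return medoids, assign(points, medoids)

def mean_silhouette(points: List[Point], labels: List[int]) -> float:
    n = len(points)
    scores = []
    for i in range(n):
        same = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not same:  # singleton cluster: silhouette defined as 0
            scores.append(0.0)
            continue
        a = sum(math.dist(points[i], points[j]) for j in same) / len(same)
        b = min(
            sum(math.dist(points[i], points[j]) for j in range(n) if labels[j] == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / n

def choose_k(points: List[Point], k_values: Sequence[int]) -> Tuple[int, Dict[int, float]]:
    """Pick the k with the highest average silhouette width."""
    widths = {k: mean_silhouette(points, pam(points, k)[1]) for k in k_values}
    return max(widths, key=widths.get), widths

# Two well-separated toy groups.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
best_k, widths = choose_k(pts, [2, 3])
print(best_k)  # 2
```

Exhaustive SWAP re-evaluates the full cost for every candidate, which is fine for a few hundred patients but is not how optimized PAM implementations work.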
The BioPAN tool (https://lipidmaps.org/biopan/) was used for lipid pathway analysis comparing the two groups [20]. BioPAN calculates Z-scores for all possible lipid pathways and predicts the most likely lipid-transforming genes. An absolute Z-score greater than 1.645 indicates that a pathway differs significantly between cases and controls.
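The 1.645 cutoff is the 95th percentile of the standard normal distribution, i.e. the critical value of a one-sided test at α = 0.05; this sketch assumes that is the rationale for the threshold. The short Python example below recovers that value with the standard library and labels a pathway as activated or suppressed by the sign of its Z-score. BioPAN computes the pathway Z-scores itself; the helper name is hypothetical, and the two example Z-scores are the extreme values reported in the Results.

```python
from statistics import NormalDist

# Critical value for a one-sided test at alpha = 0.05 (about 1.645).
Z_CRIT = NormalDist().inv_cdf(0.95)

def classify_pathway(z: float, z_crit: float = Z_CRIT) -> str:
    """Label a pathway Z-score using the |Z| > 1.645 rule."""
    if z > z_crit:
        return "activated"
    if z < -z_crit:
        return "suppressed"
    return "not significant"

print(round(Z_CRIT, 3))          # 1.645
print(classify_pathway(5.201))   # activated (O-PC -> O-LPC)
print(classify_pathway(-6.356))  # suppressed (PC -> PS)
```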

Development and validation of the prediction model
To develop the EOLC prediction model, we employed the LASSO-bootstrap regression method and the Boruta algorithm to refine key lipid predictors. Given the high inter-individual variability of lipidomics data, a bootstrap procedure with 1000 resamples was applied in the discovery set to enhance the robustness and generalizability of variable selection [21]. For each bootstrap sample, ten-fold cross-validation was conducted to determine the optimal regularization parameter (λ) that minimized prediction error, and features with non-zero coefficients in the resulting LASSO model were recorded. After all 1000 bootstrap iterations, features selected by LASSO in at least 60 % of the resamples were retained [21], [22]. The Boruta algorithm, a random forest-based feature selection method, identifies variables strongly associated with the dependent variable rather than merely optimizing the feature set for a specific model [23]. It evaluates each feature by comparing its importance Z-score with those of “shadow features,” which are randomly permuted copies of the original features; a feature is considered important if its Z-score exceeds the maximum Z-score observed among the shadow features. Lipids selected by both LASSO and Boruta were used to develop the prediction model. Combining the two methods leverages their complementary strengths: LASSO’s shrinkage controls model complexity, while Boruta’s iterative elimination of non-informative features enhances the robustness of feature selection. This combined approach mitigates the risk of overfitting, a prevalent challenge in high-dimensional data analysis. Furthermore, a prior study demonstrated that ensemble feature selection methods generally achieve higher classification accuracy than single-method approaches [24]. By integrating these techniques, this strategy facilitates a more comprehensive and reliable identification of relevant biomarkers.
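The two selection rules and their intersection can be sketched in Python. This is a simplified illustration under stated assumptions, not the authors' pipeline: the per-resample LASSO fit (with λ tuned by 10-fold cross-validation) is represented by a caller-supplied `select` function, and Boruta is reduced to a single comparison of real-feature importance Z-scores against the best shadow feature, whereas the full algorithm repeats random-forest fits and applies a statistical test to the accumulated hits. All function names and the seed are hypothetical. The Methods describe the LASSO cutoff as "at least 60 %" while the Results report "selection frequency > 600", so the boundary case is ambiguous; this sketch uses ≥ 60 %.

```python
import random
from collections import Counter
from typing import Callable, Dict, Iterable, List, Sequence, Set

def bootstrap_selection_counts(rows: Sequence, select: Callable[[List], Iterable[str]],
                               n_boot: int = 1000, seed: int = 2024) -> Counter:
    """Count how often each feature is selected across bootstrap resamples.

    `select` stands in for one LASSO fit on a resample and must return the
    names of features with non-zero coefficients (hypothetical hook).
    """
    rng = random.Random(seed)
    n = len(rows)
    counts: Counter = Counter()
    for _ in range(n_boot):
        resample = [rows[rng.randrange(n)] for _ in range(n)]  # with replacement
        counts.update(set(select(resample)))  # count each feature once per resample
    return counts

def stable_features(counts: Counter, n_boot: int, min_freq: float = 0.6) -> Set[str]:
    """Features selected in at least `min_freq` of the resamples."""
    return {f for f, c in counts.items() if c / n_boot >= min_freq}

def boruta_hits(real_z: Dict[str, float], shadow_z: Sequence[float]) -> Set[str]:
    """One Boruta-style comparison: real features beating the best shadow feature."""
    best_shadow = max(shadow_z)
    return {f for f, z in real_z.items() if z > best_shadow}

# Toy usage: LASSO-stable features intersected with Boruta hits.
counts = Counter({"lipid_A": 950, "lipid_B": 620, "lipid_C": 400})
lasso_set = stable_features(counts, n_boot=1000)
boruta_set = boruta_hits({"lipid_A": 4.1, "lipid_B": 3.0, "lipid_D": 2.8}, [1.2, 2.5, 0.7])
print(sorted(lasso_set & boruta_set))  # ['lipid_A', 'lipid_B']
```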
Given the high dimensionality of omics data, we used machine learning techniques to develop a prediction model for EOLC. Specifically, we constructed models using logistic regression (LR), random forest (RF), and support vector machine (SVM) methods. The models were evaluated in the validation set using accuracy (the proportion of correctly predicted outcomes among all samples) [25], the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), the calibration curve, and the F1 score (F1 = 2 × Precision × Recall / (Precision + Recall), where precision equals PPV and recall equals sensitivity) [26]. The model with the highest validation-set AUC was selected as the final model.
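For reference, the listed metrics can be computed from true labels and predicted probabilities as in the Python sketch below. The AUC uses its rank (Mann-Whitney) interpretation: the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The 0.5 classification threshold is an assumption, since the text does not state the cutoff used, and the function names and toy data are hypothetical.

```python
from typing import Dict, List

def _ratio(num: float, den: float) -> float:
    return num / den if den else float("nan")

def auc_rank(y_true: List[int], scores: List[float]) -> float:
    """AUC as P(score of a case > score of a control), ties counted as 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if s_case > s_ctrl else 0.5 if s_case == s_ctrl else 0.0
               for s_case in pos for s_ctrl in neg)
    return wins / (len(pos) * len(neg))

def evaluate(y_true: List[int], scores: List[float], threshold: float = 0.5) -> Dict[str, float]:
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
    precision = _ratio(tp, tp + fp)  # = PPV
    recall = _ratio(tp, tp + fn)     # = sensitivity
    return {
        "accuracy": _ratio(tp + tn, len(y_true)),
        "auc": auc_rank(y_true, scores),
        "sensitivity": recall,
        "specificity": _ratio(tn, tn + fp),
        "ppv": precision,
        "npv": _ratio(tn, tn + fn),
        "f1": _ratio(2 * precision * recall, precision + recall),
    }

y = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
print(round(evaluate(y, probs)["auc"], 3))  # 0.889
```

The pairwise AUC is quadratic in sample size, which is negligible for a validation set of 127 participants.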
To enhance the interpretability of the machine learning model, we applied the SHapley Additive exPlanations (SHAP) method [27]. SHAP values were computed for each feature and each participant to assess the importance of the features for prediction; features with higher mean absolute SHAP values were considered more influential in the model’s predictions.
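To illustrate what a SHAP value measures, the Python sketch below computes exact Shapley values by enumerating every feature coalition, with features outside a coalition set to a single baseline vector, and then averages their absolute values across observations to rank features. This is a deliberate simplification: SHAP implementations for random forests (for example TreeSHAP) exploit the tree structure and use a background data distribution rather than one baseline, and brute-force enumeration is practical only for a handful of features, such as the six lipids here. The function names, toy model, and baseline are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence

Model = Callable[[List[float]], float]

def shapley_values(model: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    n = len(x)

    def value(coalition: set) -> float:
        # Features in the coalition take their observed value; others the baseline.
        return model([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def mean_abs_shap(model: Model, X: Sequence[Sequence[float]],
                  baseline: Sequence[float]) -> List[float]:
    """Global importance: mean |SHAP| of each feature over all observations."""
    rows = [shapley_values(model, x, baseline) for x in X]
    return [sum(abs(r[j]) for r in rows) / len(rows) for j in range(len(baseline))]

def linear(z: List[float]) -> float:
    return 2 * z[0] - z[1] + 0.5 * z[2]

print([round(v, 6) for v in shapley_values(linear, [1, 2, 3], [0, 0, 0])])  # [2.0, -2.0, 1.5]
```

For a linear model, each Shapley value reduces to the coefficient times the feature's deviation from the baseline, which makes this a convenient sanity check.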

Sample size calculation
A previously developed lung cancer risk prediction model based on macro-epidemiological characteristics demonstrated an AUC of 0.7 [28]. This study hypothesizes that incorporating lipid metabolism biomarkers can enhance predictive performance, achieving an AUC of 0.85. Given a significance level of α = 0.05 and an allowable margin of error (δ) of 0.10, a minimum of 59 EOLC cases and 59 controls are required to ensure statistical power [29]. Thus, the sample size in this study meets the necessary methodological requirements.
All analyses were performed using R v4.4.2 (R Foundation for Statistical Computing, Vienna, Austria), and P < 0.05 was considered significant unless otherwise specified.

Results


Lipids associated with the occurrence of EOLC
The discovery set comprised 53 cases of lung adenocarcinoma, with 8 (15.09 %) classified as Stage 0 and 45 (84.91 %) as Stage I. In the validation set, 64 cases of lung adenocarcinoma were included, with 10 (15.63 %) at Stage 0, 52 (81.25 %) at Stage I, and 2 (3.13 %) at Stage III. Table S1 shows the characteristics of the study participants in the discovery and validation sets. Never smokers accounted for 83.02 % of cases in the discovery set and 87.50 % in the validation set. Additionally, 75.47 % of cases in the discovery set were female, compared with 78.12 % in the validation set.
Principal component analysis revealed high consistency among the QC samples, with a Spearman correlation of 0.997 between the first and last QC samples (Fig. S1A-B). All QC samples were distributed within 2 standard deviations on the first principal component (Fig. S1C). In addition, 99.1 % of the lipids had relative standard deviation (RSD) values below 30 % (Fig. S1D), indicating consistent and reliable measurements.
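The RSD criterion can be illustrated with a minimal Python sketch: for each lipid, RSD is the standard deviation of its QC measurements divided by their mean, times 100, and the summary statistic is the share of lipids under the 30 % cutoff. The use of the sample (n - 1) standard deviation, the function names, and the toy intensities are assumptions, since the text does not specify them.

```python
from statistics import mean, stdev
from typing import Dict, Sequence

def rsd_percent(values: Sequence[float]) -> float:
    """Relative standard deviation (%) of one lipid across QC injections."""
    return 100 * stdev(values) / mean(values)

def share_below_cutoff(qc: Dict[str, Sequence[float]], cutoff: float = 30.0) -> float:
    """Fraction of lipids whose QC RSD is below the cutoff."""
    rsds = [rsd_percent(v) for v in qc.values()]
    return sum(r < cutoff for r in rsds) / len(rsds)

toy_qc = {
    "lipid_A": [100, 110, 90],  # RSD = 10 %
    "lipid_B": [10, 30, 50],    # RSD of about 66.7 %
}
print(share_below_cutoff(toy_qc))  # 0.5
```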
In the discovery set, 843 lipids were identified through targeted lipidomics analysis (Fig. 2A). OPLS-DA distinguished the lipidomic profiles of EOLC patients from those of non-EOLC controls (Fig. S2). A total of 60 lipids, spanning 12 lipid classes, were differentially expressed between EOLC and non-EOLC participants (Fig. 2B). The most frequent class was lysophosphatidylcholine (LysoPC, n = 28), followed by phosphatidylethanolamine (PE, n = 11) and lysophosphatidylethanolamine (LysoPE, n = 6). Of these 60 lipids, 48 were upregulated and 12 downregulated (adjusted P < 0.05 and FC > 1.25 or < 0.8), and all had VIP values > 1 (Fig. S3).
When these 60 lipids were tested in the validation set using logistic regression, 33 showed consistent associations (FDR < 0.05, Table 1). Of these, 21 lipids were positively associated with EOLC risk, including 12 LysoPC, 3 LysoPE, 3 diacylglycerol (DG), 2 PE, and 1 triacylglycerol (TG) species. The remaining 12 lipids were negatively associated with EOLC risk, including 6 PE, 2 lactosylceramide (LacCer), 1 ceramide (Cer), 1 phosphatidylcholine (PC), 1 phosphatidylserine (PS), and 1 sphingomyelin (SM) species. The strongest positive association was observed for DG (16:0/18:1) (OR = 3.57, 95 % CI: 2.12–6.62), followed by LysoPC (16:0) (OR = 3.17, 95 % CI: 1.89–5.83). Conversely, the strongest negative association was observed for PS (18:0/18:1) (OR = 0.31, 95 % CI: 0.17–0.50), followed by PE (P-18:0/22:5) (OR = 0.32, 95 % CI: 0.18–0.53). The distributions and correlations of the 33 lipids are shown in Fig. S4; most lipids were positively correlated with one another.

The internal signatures of identified lipids
The highest average silhouette width was observed when the number of clusters was set to two (Fig. S5A). Therefore, we classified EOLC patients into two distinct clusters. The lipids in the two clusters were distinctly separated (Fig. S5B). Cluster 2 exhibited elevated levels of most lipid metabolites, while cluster 1 showed reduced lipid levels (Fig. S5C). Multivariable logistic regression analysis revealed that EOLC patients with exposure to passive smoking (OR: 2.75, 95 % CI: 1.08–7.29, P = 0.037) and those who were current smokers (OR: 15.65, 95 % CI: 2.55–142.10, P = 0.006) were more likely to exhibit the elevated lipid metabolite profiles observed in cluster 2 (Table S2). Subgroup analyses conducted specifically among non-smokers identified 21 lipids that were differentially expressed in both the discovery and validation sets (FDR < 0.05, VIP > 1, and FC > 1.25 or < 0.8) when comparing patients with EOLC to those without EOLC (Table S3). Among these lipids, 10 species exhibited upregulation, while 11 were downregulated. Notably, 17 of the 21 differentially expressed lipids were consistent with those previously identified in the overall study population, which included both smokers and non-smokers. The remaining four lipid species—PE (O-16:0/18:3), PE (P-16:0/18:1), PE (P-18:0/18:3), and TG (56:8-FA20:4)—were uniquely associated with non-smokers.
BioPAN pathway analysis of lipid metabolism in EOLC patients identified 6 significantly activated and 9 significantly suppressed pathways (absolute Z-score > 1.645, Table S4). The conversion of O-PC to O-LPC was the most significantly activated pathway (Z-score: 5.201), whereas the conversion of PC to PS was the most significantly suppressed (Z-score: −6.356). The corresponding predicted genes were PLA2 (phospholipase A2) and PTDSS1, respectively.

Development and validation of prediction models using identified lipids
Using the LASSO-bootstrap method with 1000 resamples, 6 of the 33 validated lipids were selected for model construction (selection frequency > 600, Fig. 3A): PE (P-18:0/22:4), PS (18:0/18:1), PE (18:2/18:2), LysoPC (18:0), LysoPC (P-18:1), and LysoPC (O-18:1). The Boruta algorithm classified 23 lipids as important (Z-score exceeding the maximum shadow-feature Z-score, Fig. 3B), including all 6 lipids selected by LASSO (Fig. 3C). These 6 lipids were therefore included in the prediction model. Their distributions were similar in the discovery and validation sets (Fig. S6). PE (P-18:0/22:4) and PS (18:0/18:1) were negatively associated with EOLC, while the remaining four lipids were positively associated with EOLC.
The RF and SVM models performed comparably in the validation set, with AUCs of 0.874 (95 % CI: 0.806–0.933) and 0.873 (95 % CI: 0.804–0.928), respectively (Fig. 4). The LR model had a slightly lower AUC of 0.846 (95 % CI: 0.774–0.914). The sensitivity and specificity of the RF model were intermediate between those of the LR and SVM models (sensitivity: 0.797 for RF, 0.734 for LR, and 0.844 for SVM; specificity: 0.841, 0.889, and 0.794, respectively; Table S5). The LR model achieved the highest positive predictive value (0.870), followed by the RF (0.836) and SVM (0.806) models, but had a lower negative predictive value (0.767) than the RF (0.803) and SVM (0.833) models. The accuracy and F1 scores of the SVM and RF models were comparable and higher than those of the LR model. We selected the RF model as the final prediction model because it had the highest AUC together with balanced sensitivity and specificity. The calibration plot for the RF model showed good agreement between observed and predicted probabilities (Fig. S7), with a Brier score of 0.144, indicating a small mean squared difference between predicted probabilities and observed outcomes.
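As a consistency check, the reported validation-set metrics can be reconciled with whole-number confusion matrices, given 64 cases and 63 controls. The Python sketch below back-calculates true/false positives and negatives from each model's reported sensitivity and specificity, confirms that these counts reproduce the reported PPV and NPV to three decimals, and derives accuracy and F1, which the text compares only qualitatively. The counts are a reconstruction, not values reported in the paper; the Brier score helper (mean squared difference between predicted probability and the 0/1 outcome) is included for completeness and would need the model's individual predictions, which are not available here.

```python
from typing import Dict, Sequence, Tuple

N_CASES, N_CONTROLS = 64, 63  # validation-set sizes

# Reported validation-set metrics (Table S5, as quoted in the text).
REPORTED: Dict[str, Dict[str, float]] = {
    "RF":  {"sens": 0.797, "spec": 0.841, "ppv": 0.836, "npv": 0.803},
    "LR":  {"sens": 0.734, "spec": 0.889, "ppv": 0.870, "npv": 0.767},
    "SVM": {"sens": 0.844, "spec": 0.794, "ppv": 0.806, "npv": 0.833},
}

def back_calculate(sens: float, spec: float) -> Tuple[int, int, int, int]:
    """Nearest whole-number (tp, fn, tn, fp) for the given sensitivity/specificity."""
    tp = round(sens * N_CASES)
    tn = round(spec * N_CONTROLS)
    return tp, N_CASES - tp, tn, N_CONTROLS - tn

def metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    return {
        "sens": tp / (tp + fn),
        "spec": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),  # equals 2PR / (P + R)
    }

def brier_score(y_true: Sequence[int], p_pred: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)

derived = {}
for model, rep in REPORTED.items():
    m = metrics(*back_calculate(rep["sens"], rep["spec"]))
    # True when every back-calculated metric rounds to the reported value.
    m["consistent"] = all(abs(m[k] - rep[k]) < 5e-4 for k in rep)
    derived[model] = m
    print(model, m["consistent"], round(m["accuracy"], 3), round(m["f1"], 3))
```

With these counts, RF and SVM both reach an accuracy of 104/127 (about 0.819) versus 103/127 (about 0.811) for LR, and F1 is about 0.816 for RF, 0.824 for SVM, and 0.797 for LR, in line with the qualitative comparison above.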
To interpret the RF model, the mean absolute SHAP values of the selected features, reflecting each lipid's contribution to the prediction, are shown in Fig. 5A. PE (P-18:0/22:4) contributed most to the model, followed by PS (18:0/18:1) and LysoPC (O-18:1). Examination of the per-participant SHAP values showed that high levels of PE (P-18:0/22:4) and PS (18:0/18:1) were predominantly associated with negative SHAP values, indicating a negative association with EOLC (Fig. 5B). In contrast, for the other lipids, low levels were generally associated with negative SHAP values, suggesting a positive association with EOLC.

Discussion

In this study, we comprehensively characterized lipidomic signatures associated with the development of EOLC. Of 843 detected lipids, 33 were significantly associated with EOLC risk in both the discovery and validation sets. Cluster analysis further revealed that EOLC patients with tobacco exposure were more likely to exhibit elevated levels of these lipids than those without tobacco exposure. Feature selection among these lipids identified six key lipids, which were incorporated into a machine learning prediction model for EOLC. This model demonstrated strong predictive performance, with an AUC of 0.874. These findings underscore the potential of lipidomics to identify individuals at high risk of EOLC, offering a valuable tool for targeted LDCT-based lung cancer screening and early diagnosis.
Previous metabolomics studies have explored lipid metabolites in lung cancer patients, highlighting their considerable potential for the early diagnosis of lung cancer. Klupczynska et al. conducted a targeted analysis to investigate differences in lipid metabolism between patients with stage I non-small cell lung cancer and healthy controls [30]. Their findings identified choline-containing lipids as potential biomarkers for early-stage lung cancer. Zhu et al. conducted a lipidomic analysis of 54 patients with different subtypes of lung cancer and identified PE (36:2, 18:0/18:2, and 18:1/18:1) as specific to small cell lung cancer, while LPC (20:1 and 22:0 sn-position-1) and PC (19:0/19:0 and 19:0/21:2) were specific to adenocarcinoma [31]. However, most existing studies have focused on the general population, and lipidomic analyses specifically targeting EOLC patients remain scarce.
Six lipid species were incorporated into the final predictive model for EOLC: PE (P-18:0/22:4), PS (18:0/18:1), PE (18:2/18:2), LysoPC (18:0), LysoPC (P-18:1), and LysoPC (O-18:1). LysoPCs, derived from the hydrolysis of phosphatidylcholine, are membrane lipids with pro-inflammatory properties that play critical roles in signal transduction and cancer metastasis [32]. In our study, LysoPC levels were elevated in EOLC patients, consistent with findings from Goldberg et al., who reported similar increases in a cohort of 42 lung cancer patients, particularly in adenocarcinoma and acinar cell carcinoma [33]. The upregulation of LysoPC in EOLC patients may be attributable to its pro-inflammatory function, as specific LysoPC species (e.g., LPC18:1, LPC16:0, and LPC18:0) are known to induce monocyte chemotaxis and stimulate pro-inflammatory cytokine production by macrophages [34]. However, studies of late-onset lung cancer have reported decreased LysoPC levels, suggesting potential context-dependent effects [35]. Our study also observed reduced PS levels in EOLC patients, in line with a previous analysis of 162 non-small cell lung cancer patients that likewise demonstrated decreased PS levels [36]. This reduction may be attributable to increased serine depletion, as serine serves as a common precursor for multiple phospholipids within tumor cells [36]. Similarly, PE levels were reduced, potentially linked to PD-1 signaling. Research on the tumor microenvironment has shown that PE levels are significantly diminished in CD8+ T cells within lung cancer tissues, a phenomenon correlated with decreased expression of phospholipid phosphatase 1 (PLPP1). This reduction renders CD8+ T cells more susceptible to ferroptosis, a form of programmed cell death [37].
Clustering analysis of lipidomic profiles in EOLC patients revealed two distinct clusters: the first showed reduced levels of most lipid species, while the second displayed the opposite profile. These findings align with our previous research on blood lipids and lipoproteins, which indicated that both elevated and decreased levels of TG and total cholesterol are associated with an increased risk of lung cancer, suggesting a U-shaped relationship between lipid levels and disease risk [38]. Further analysis of patient characteristics suggested that tobacco exposure may influence lipid metabolism profiles. This is consistent with previous studies indicating that lung cancer in smokers and never-smokers may represent two distinct diseases with differing etiologies and molecular characteristics [39]. The tumorigenic mechanisms of lung cancer in smokers and never-smokers differ substantially with respect to EGFR, KRAS, and TP53 alterations, as well as the tumor microenvironment [39], [40]. In subgroup analyses, we identified four lipid species with distinct abundance profiles in non-smoking EOLC patients compared with controls: PE (O-16:0/18:3), PE (P-16:0/18:1), PE (P-18:0/18:3), and TG (56:8-FA20:4). TG plays a critical role in modulating inflammation, redox homeostasis, and autophagy, primarily through peroxisome proliferator-activated receptor alpha (PPARα) signaling pathways [41]. In non-smokers, TG accumulation may activate PPARα-independent pathways such as NF-κB-mediated inflammation [42].
Our lipid pathway analysis identified PLA2 as a key gene potentially implicated in the pathogenesis of EOLC. PLA2 catalyzes the hydrolysis of the sn-2 ester bond of membrane phospholipids, for example converting PC to LPC [34]. PLA2 plays a vital role in regulating tumor angiogenesis by altering the metabolism of phospholipids, which are essential constituents of cellular membranes; through these modifications, it influences the processes that govern angiogenesis and thereby tumor growth [43]. The analysis also implicated PTDSS1, a gene involved in PS biosynthesis. Growing evidence indicates that PTDSS1 may contribute to the expansion of tumor-associated macrophages and to tumor growth, suggesting a role in cancer progression [44].
A review published in CA: A Cancer Journal for Clinicians in 2021 highlights the comprehensive capability of metabolomics in capturing molecular alterations at the DNA, RNA, and protein levels, establishing it as a highly sensitive tool for detecting pathological changes [45]. Our predictive model based on six lipid species demonstrated strong performance (AUC = 0.874), exceeding the predictive accuracy of traditional lung cancer risk models, which typically report AUC values below 0.80 [28], [46]. Compared to other emerging biomarkers, such as circulating tumor DNA (ctDNA) and proteomics, lipidomics offers distinct advantages in lung cancer screening programs. The utility of ctDNA is constrained by its low abundance in early-stage tumors and technical limitations in detecting rare mutations, such as EGFR or KRAS variants [47]. In contrast, lipidomics is highly sensitive to biological and pathological changes, offering a more reliable reflection of systemic alterations [12]. Lipidomic profiling enables the identification of tumor-specific lipid biomarkers, such as PC and PE, even in early-stage non-small cell lung cancer, where proteomic and ctDNA-based biomarkers often lack sufficient discriminatory power [48]. For instance, a study by Shang et al. demonstrated that an eight-metabolite panel (including lipids) achieved an AUC of 0.922 for lung cancer detection [49], outperforming ctDNA-based assays (AUC = 0.839) [50] and a 10-protein biomarker model (AUC = 0.87) [51]. Furthermore, advancements in lipidomics technology have improved its feasibility and cost-effectiveness, reinforcing its potential as a promising tool for large-scale lung cancer screening initiatives [12].
Current lung cancer screening guidelines (e.g., USPSTF, NCCN) primarily target older adults (aged 50–80) with a substantial smoking history (≥20 pack-years) [8], [9]. However, the rising incidence of EOLC underscores the need to extend screening to younger, high-risk individuals, a gap that current recommendations do not address. Lowering the age threshold indiscriminately could lead to over-screening, increased healthcare burdens, and unnecessary radiation exposure. Our lipidomic-based predictive model offers a more targeted approach by identifying high-risk individuals under 50 on the basis of their lipidomic signatures. This enables precision screening, directing low-dose computed tomography (LDCT) to those most likely to benefit while minimizing redundant testing in low-risk populations. A sequential workflow, in which the six-lipid model first stratifies risk in asymptomatic younger individuals and targeted LDCT follows for those classified as high risk, may enhance screening efficiency and cost-effectiveness. In addition, for individuals with LDCT-detected pulmonary nodules, lipidomic profiling may improve risk stratification by prioritizing high-risk nodules for histopathological evaluation and sparing low-risk nodules unnecessary invasive procedures, thereby lowering false-positive rates. Successful implementation of this model will require interdisciplinary collaboration among oncologists, radiologists, lipid biologists, and policymakers to ensure rigorous validation and to inform future screening guidelines, ultimately improving early detection and reducing lung cancer mortality in this underserved population.
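The first stage of the sequential workflow described above can be sketched as a simple filter that decides who proceeds to targeted LDCT. This is a minimal Python illustration under stated assumptions: each person already has a predicted EOLC probability from a lipid-based model, and `RISK_THRESHOLD` is a hypothetical cut-off chosen purely for illustration, since no operating threshold is specified here. The names `Participant` and `refer_for_ldct` are likewise hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cut-off for illustration only; a real threshold would have to be
# chosen and validated against sensitivity and specificity targets.
RISK_THRESHOLD = 0.5


@dataclass
class Participant:
    participant_id: str
    age: int
    lipid_risk: float  # assumed: predicted EOLC probability from a lipid-based model


def refer_for_ldct(participants, threshold=RISK_THRESHOLD):
    """Stage 1: keep adults aged 18-49 whose lipid-model risk meets the threshold.

    The returned IDs are the people who would proceed to stage 2 (targeted LDCT);
    everyone else is not referred by this rule.
    """
    return [
        p.participant_id
        for p in participants
        if 18 <= p.age <= 49 and p.lipid_risk >= threshold
    ]


cohort = [
    Participant("A", 35, 0.82),
    Participant("B", 42, 0.21),
    Participant("C", 55, 0.90),  # outside the 18-49 range targeted here
    Participant("D", 28, 0.50),
]
print(refer_for_ldct(cohort))
```

In this toy cohort only A and D would be referred. Keeping the threshold as a parameter makes the trade-off explicit: lowering it refers more people to LDCT (higher sensitivity, more scans), while raising it reduces referrals at the cost of missed cases.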
This study has several limitations. First, our findings indicate an association between lipidomic signatures and EOLC rather than a causal relationship, a limitation inherent to our case-control design and the absence of mechanistic investigations. Because plasma lipids were measured at the time of patient enrollment, reverse causation cannot be ruled out: the observed lipid changes may follow lung cancer onset rather than precede it. However, a Mendelian randomization study, which minimizes reverse causation by using genetic variants as instrumental variables, identified a causal link between specific lipid profiles and increased lung cancer risk [52]. In addition, prior studies have shown that lipid profiles associated with lung cancer risk remain consistent across disease stages [53], supporting the notion that lipidomic alterations may arise early in the disease process and contribute to lung cancer development rather than merely being a consequence of the disease [54]. Prospective cohort studies are needed to establish the temporal relationship between lipid alterations and lung cancer onset, and mechanistic investigations are needed to elucidate the biological mechanisms linking lipid metabolism to lung cancer; together, these efforts will determine whether the observed association is causal. Second, although our study was adequately powered to identify significant lipid biomarkers for EOLC and to develop predictive models, it was underpowered to assess potential interactions between EOLC risk factors and lipid profiles. Moreover, we could not perform separate stratified analyses for smokers and non-smokers, because 83% of EOLC cases and 71% of controls in the discovery set were non-smokers. We therefore conducted subgroup analyses restricted to non-smokers to investigate lipid profiles that might be distinctive to this subset. Adequately powered studies that include both smokers and non-smokers are needed to validate the differential expression of lipids between these groups. Third, residual confounding remains possible: factors that were not fully captured, such as detailed medication histories, determinants of hospital choice, and variation in sample collection procedures across institutions, could affect the observed associations between lipid levels and EOLC. To minimize such biases, stringent standardization protocols were applied throughout sample preparation, and multivariable models adjusted for demographic characteristics, lifestyle factors, and baseline comorbidities. Nevertheless, further research is warranted to corroborate our findings. Fourth, all participants were recruited in hospital settings, which may limit the generalizability of our results to community-based populations. Furthermore, because all EOLC patients in our study had adenocarcinoma, caution is warranted when extrapolating our findings to other histological subtypes of lung cancer; prior studies have demonstrated distinct lipid metabolism patterns in small cell versus non-small cell lung cancer [55]. Future population-based studies incorporating more diverse populations and additional histological subtypes are needed to validate the broader applicability of our findings. Finally, despite a comprehensive targeted lipidomics approach and BioPAN pathway analysis, the biological mechanisms underlying the observed lipid dysregulation remain unclear. Further mechanistic research on the metabolic pathways of the identified lipids is essential to lay the groundwork for early diagnostic strategies and targeted therapeutic interventions. Examining these metabolic shifts together with bioelectric alterations, particularly the electrophoretic properties of malignant cells, may also yield insights into how such changes interact with lipid metabolism and cancer progression [56].

Conclusion
This study reveals that lipid metabolites are associated with the occurrence of EOLC, providing important insights and a theoretical basis for the early detection of EOLC patients. Our findings highlight the clinical significance of lipidomic profiling as a potential tool for improving lung cancer screening strategies and facilitating early intervention. Additionally, the identified lipid signatures may shed light on metabolic reprogramming and tumor microenvironment dynamics in EOLC, offering potential avenues for therapeutic target discovery. However, large-scale, prospective studies across diverse populations are needed to confirm the predictive accuracy and generalizability of the model, as well as to explore its clinical utility in real-world settings.

CRediT authorship contribution statement
Fei Wang: Conceptualization, Funding acquisition, Methodology, Project administration, Resources, Software, Validation, Writing – original draft, Writing – review & editing. Zeming Guo: Data curation, Formal analysis, Software, Writing – original draft, Writing – review & editing. Wei Tang: Data curation, Investigation, Methodology, Resources, Writing – review & editing. Wei Cao: Methodology, Visualization, Writing – review & editing. Xuesi Dong: Methodology, Software, Writing – review & editing. Yongjie Xu: Methodology, Software, Writing – review & editing. Chenran Wang: Investigation, Writing – review & editing. Jiaxin Xie: Data curation, Writing – review & editing. Xiaoyue Shi: Investigation, Writing – review & editing. Zilin Luo: Data curation, Writing – review & editing. Yadi Zheng: Investigation, Writing – review & editing. Guochao Zhang: Investigation, Writing – review & editing. Na Ren: Investigation, Writing – review & editing. Nan Zhang: Investigation, Resources, Writing – review & editing. Donghua Wei: Investigation, Resources, Writing – review & editing. Lingbin Du: Funding acquisition, Investigation, Writing – review & editing. Ni Li: Conceptualization, Funding acquisition, Methodology, Project administration, Supervision, Writing – review & editing. Fengwei Tan: Conceptualization, Data curation, Funding acquisition, Methodology, Project administration, Resources, Supervision, Writing – review & editing.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please cite the original article when quoting.
