Prediction of EGFR mutation status in non-small cell lung cancer based on multiparametric MRI radiomics.
PICO automatic extraction (heuristic, conf 2/4)
P · Population (target patients / population)
91 patients with NSCLC (72 in the training cohort and 19 in the validation cohort) were included in this study; 1708 radiomics features were extracted from the MRI (T2W and CET1w) sequences.
I · Intervention (intervention / procedure)
Not extracted
C · Comparison (control / comparison)
Not extracted
O · Outcome (results / conclusion)
The calibration curve showed good predictive performance, and the decision curve indicated that the radiomics nomogram had high clinical benefits. [CONCLUSION] The model based on MRI radiomics shows strong diagnostic efficacy for predicting EGFR mutation status in NSCLC, guiding individualized targeted therapies.
APA
Wang Y, Hu H, et al. (2025). Prediction of EGFR mutation status in non-small cell lung cancer based on multiparametric MRI radiomics. BMC medical imaging, 25(1), 503. https://doi.org/10.1186/s12880-025-02029-w
MLA
Wang Y, et al. "Prediction of EGFR mutation status in non-small cell lung cancer based on multiparametric MRI radiomics." BMC medical imaging, vol. 25, no. 1, 2025, pp. 503.
PMID
41387785
Abstract
[BACKGROUND] The purpose of the study was to establish and validate a model for predicting the mutation status of epidermal growth factor receptor (EGFR) in non-small cell lung cancer (NSCLC) using magnetic resonance imaging (MRI) radiomics features combined with clinicopathological factors.
[METHODS] Overall, 91 patients with NSCLC (72 in the training cohort and 19 in the validation cohort) were included in this study; 1708 radiomics features were extracted from the MRI (T2W and CET1w) sequences. The variance threshold method combined with the univariate selection method and the least absolute shrinkage and selection operator (LASSO) regression was used to screen important radiomics features, calculate radiomics scores, and construct a radiomics model. Multivariate logistic regression analysis was used to combine radiomics scores (Rad-scores) and independent predictive factors to construct a radiomics nomogram for predicting EGFR mutation status. The predictive performance and clinical practicality of the model were evaluated using the area under the curve (AUC), calibration curves, and clinical decision curves.
[RESULT] EGFR mutations were identified in 30.8% (28/91) of patients; 854 radiomics features were extracted from each of T2WI and CET1w, making a total of 1708 features. First, the variance threshold method was used to screen out features with variance < 0.8, yielding 1702 features. Features with insignificant differences (p ≥ 0.05) were screened out using the univariate selection method, yielding 43 features. Finally, all features were fitted based on the type of gene mutation using the LASSO algorithm, and thirteen important radiomics features were retained. The radiomics model based on T2WI combined with CET1w provided better classification of EGFR mutant and wild-type, with AUCs of 0.846 and 0.808 in the training and validation cohorts, respectively. The radiomics nomogram model, built by multivariate logistic regression from the combined T2WI–CET1w radiomics label and independent predictors (gender and maximum diameter), showed higher diagnostic efficiency, with AUCs of 0.880 and 0.859, respectively. The calibration curve showed good predictive performance, and the decision curve indicated that the radiomics nomogram had high clinical benefits.
[CONCLUSION] The model based on MRI radiomics shows strong diagnostic efficacy for predicting EGFR mutation status in NSCLC, guiding individualized targeted therapies.
Background
Lung cancer is a common malignant tumor with the highest mortality rate worldwide [1]. In China, approximately 733,000 new cases of lung cancer occur annually, and more than 610,000 people die from this disease [2]. According to the latest national cancer statistics released by the National Cancer Center in 2022, the incidence and mortality rates of lung cancer were highest in China [3], accounting for 37% of new cases and 39% of deaths worldwide. Non-small cell lung cancer (NSCLC) accounts for 80–90% of all primary lung cancers. Approximately 50% of NSCLC patients are diagnosed at an advanced stage (stage III or IV), and the 5-year survival rate is only 18% [4]. Recently, a breakthrough has been made in the treatment of patients with advanced NSCLC: the use of molecular-targeted drugs has significantly improved the survival time of patients with gene mutations [5]. Reportedly, mutations in the EGFR gene lead to the continuous activation of the tyrosine kinase domain of the epidermal growth factor receptor (EGFR) in lung cancer tumor tissue. Therefore, EGFR mutation status is a key factor in determining the therapeutic efficacy of EGFR tyrosine kinase inhibitors (EGFR-TKIs) [6]. Riely et al. [7] found that the response rate to EGFR-TKIs in EGFR-mutant patients (60–80%) was significantly higher than in patients with wild-type or unknown EGFR status (10–20%). Furthermore, several clinical trials have shown that, compared with EGFR wild-type patients, EGFR-mutant patients with NSCLC who receive EGFR-TKI treatment have improved 2-year progression-free survival (PFS) with few relapses. Moreover, patients who receive EGFR-TKIs again after recurrence can also derive sustained benefit [8]. However, current research on EGFR-TKI treatment primarily focuses on stage II and III patients [9–11], and better clinical benefits are achieved when the treatment is applied early [12].
Therefore, early and accurate differentiation between EGFR-mutant and wild-type patients is crucial for achieving early, combined, and personalized precision treatment.
Currently, EGFR mutations in NSCLC are mostly detected via puncture or postoperative histopathological biopsy. However, owing to the extensive heterogeneity of tumors, biopsies for detecting EGFR mutations must accurately target the affected tissue area, and the procedure may increase the risk of cancer metastasis. Difficulties such as repeated sampling, poor tissue sampling, and high costs limit the applicability of pathological biopsies [13]. Circulating tumor DNA analysis is an emerging tool for assessing EGFR mutation status [14]. However, its high false-negative rate and cost limit its widespread clinical application [15]. Therefore, a non-invasive, simple, rapid, and reliable method for accurately determining EGFR mutation status is urgently required. Recently, with the advent of the big data era, artificial intelligence (AI) has shown unparalleled advantages in mining medical image information. It is expected to be used to extract key imaging markers and dominant features related to EGFR mutations from multimodal and multi-parameter imaging data, enabling the prediction of EGFR mutation status in patients with NSCLC. Radiomics is an emerging non-invasive technology that combines medical image analysis with data mining methods. It extracts a large amount of high-throughput quantitative information, combines it with clinical data to establish predictive models through machine learning and other methods, and ultimately guides clinical decision-making; it has been widely used for tumor diagnosis [16]. In radiomics, quantitative imaging features, such as intensity, shape, texture, and wavelet features, are mined from medical images. By combining medical statistics and machine learning, key radiomics features can be screened to evaluate tumor heterogeneity and provide valuable information for tumor grading, staging, treatment, efficacy evaluation, and survival prediction [17].
Radiomics provides noninvasive preoperative mutation prediction, avoiding unnecessary surgery [18], and supports accurate prognostic models that can predict PFS and overall survival of patients with EGFR mutations after TKI treatment [19]. Notably, three-dimensional (3D) features (such as entropy and GLCM contrast) can be used to quantify tumor heterogeneity, achieving full tumor coverage and dynamic monitoring [20]. Many studies [21–25] have revealed a potential relationship between EGFR mutation status and radiomics features. One study that evaluated the quality and performance of a radiomics model for predicting EGFR mutation status in patients with NSCLC [24] reported high accuracy, with an area under the curve (AUC) of 0.801. Another analysis of the value of radiomics in predicting EGFR gene mutations in NSCLC [25] reported an AUC of 0.84 in the training cohort and 0.82 in the validation cohort. However, there are currently only a few reports involving magnetic resonance imaging (MRI), and these studies have mainly focused on patients with advanced lung cancer.
Therefore, the purpose of this study was to establish and validate a predictive model based on multiparametric MRI radiomics features combined with clinical pathological factors to predict EGFR mutation status in patients with early NSCLC, thereby guiding personalized targeted therapy.
Methods
Patient cohort and image data
The Institutional Review Board of the First Affiliated Hospital of Nanjing Medical University approved this retrospective study (No. 2022-NT-11). The study included 92 patients with histopathologically confirmed malignant pulmonary nodules at the First Affiliated Hospital of Nanjing Medical University between August 2019 and May 2021; patients were enrolled consecutively during this period. The diagnoses of 69 patients were confirmed by surgical pathology, and 23 by computed tomography (CT)-guided biopsy. Among the 92 cases of malignant nodules, the pathological subtypes were adenocarcinoma in 61 cases, squamous cell carcinoma in 4 cases, and not otherwise specified (NOS) in 27 cases. We included patients (a) who underwent MRI examination within 1 month before surgery or biopsy, (b) who did not receive antitumor treatment before MRI examination, (c) whose diagnoses were confirmed by surgery or biopsy pathology, and (d) who had EGFR mutation detection results. Patients with the following conditions were excluded: (a) no EGFR mutation test results (n = 1), and (b) poor MRI image quality or missing image data (n = 0). Ultimately, 91 patients were included in the study and randomly assigned to the training (n = 72) and validation (n = 19) cohorts, with an 8:2 ratio (Fig. 1). Baseline clinical and pathological information of the patients was obtained from the medical record system, including age, sex, and lymph node metastasis (Table 1).
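The 8:2 random assignment can be illustrated with a minimal Python sketch. The seed value (42), the use of sequential patient IDs, and rounding the training size down are assumptions made here for illustration; the paper states only that the split was random with an 8:2 ratio and produced 72 and 19 patients.

```python
import math
import random


def split_cohort(patient_ids, train_fraction=0.8, seed=42):
    """Shuffle patient IDs with a fixed seed and split them into two cohorts.

    The seed and the floor rounding are illustrative assumptions.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # seeded, so the split is reproducible
    n_train = math.floor(train_fraction * len(ids))  # floor(0.8 * 91) = 72
    return ids[:n_train], ids[n_train:]


# Hypothetical patient IDs 1..91
train_ids, validation_ids = split_cohort(range(1, 92))
print(len(train_ids), len(validation_ids))  # 72 19
```

Rounding down reproduces the reported cohort sizes; rounding to the nearest integer would give 73 and 18 instead.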
MRI image acquisition and analysis
The MRI scans of all patients were performed using a Siemens 3.0 T MR scanner (Verio Tim) and a 16-channel body phased array coil. Conventional MR scans included axial T1-weighted imaging with a repetition time (TR) and echo time (TE) of 140/2.5 ms and axial free-breathing BLADE T2-weighted imaging (TR/TE, 1200/93 ms). The dynamic contrast-enhanced (DCE) scan used a 3D volumetric interpolated breath-hold examination sequence with TR/TE = 3.19/1.13 ms, slice thickness = 3 mm, field of view = 400 mm², matrix = 160 × 224, and flip angle = 15°. Through elbow vein puncture, a high-pressure injector was used to inject 0.1 mmol/kg of gadopentosamide (GE Health Care) at a flow rate of 4.0 mL/s. Subsequently, 20 mL of saline was injected at the same rate. The DCE-MR acquisition consisted of four baseline and 31 enhanced images. The time resolution was 8.8 s, and the total acquisition time was 5 min 33 s. After completing DCE-MR, another set of enhanced T1-weighted images was acquired.
EGFR gene testing
EGFR mutation detection results were obtained from surgically removed or biopsied tumor tissue specimens. The amplification refractory mutation system polymerase chain reaction (ARMS-PCR) method was used to detect mutations in four exons (18–21) of the EGFR coding region. The results were determined based on the interpretation principles provided in the detection kit. If a mutation was detected in any exon, the tumor was classified as EGFR mutant; otherwise, it was classified as EGFR wild-type.
Image segmentation and radiomics feature extraction
We used the default PyRadiomics parameter configuration for feature extraction: the resampling function was turned off because the images had already been resampled before feature extraction, the gray-level discretization bin width was 25, and the remaining settings were left at their defaults. The extractor was instantiated as radiomics.featureextractor.RadiomicsFeatureExtractor(). Feature extraction followed the Image Biomarker Standardization Initiative (IBSI), and the software used, Syngo via Frontier 1.2.1, VB10B version (Siemens Healthineers, Germany), complies with IBSI. With both chest radiologists blinded to the patient's clinical information, a radiologist with 10 years of experience in chest diagnosis initially delineated the volume of interest (VOI) around the tumor semi-automatically, and another radiologist with 4 years of experience in chest diagnosis confirmed it. First, enhanced T1WI (contrast-enhanced T1 weighted, CET1w) and T2WI (T2 weighted, T2w) images were imported into the radiomics prototype software (Radiomics, Frontier, Siemens), and several segmentation tools in the radiomics segmentation module were used for the semi-automatic segmentation of tumors. In semi-automatic segmentation, the tumor boundary line was manually drawn; subsequently, using the random walk algorithm, adjacent voxels with the same grayscale as the drawn boundary line were automatically identified in 3D space, ultimately generating segmentation results for solid or subsolid lung lesions. The most effective method for obtaining a voxel set for a portion of a lesion is to implement a 3D region-growing algorithm starting from the center point of the region of interest (ROI). Subsequently, based on the density distribution of the grayscale values in the ROI, the threshold of the lesion was adaptively determined.
The region-growing algorithm was initially used to obtain the complete lesion area and additional partial vascular system; subsequently, morphological segmentation algorithms were used to remove the blood vessels from the segmentation results. If the segmentation was incorrect, the operator could manually correct the delineation in 3D using the radiomics software prototype. A linear interpolation algorithm was used to resample all MRI images to an isotropic voxel size of 1.00 × 1.00 × 1.00 mm³, with B-spline interpolation. After image preprocessing, 854 radiomics features were extracted from the tumor VOI on each of the two MR sequences (CET1w and T2W images, based on PyRadiomics). The extracted radiomics features mainly included (a) intensity statistical features, which quantitatively describe the distribution of voxel intensity in MRI images using commonly used measures, (b) shape and size features, which reflect the shape- and size-related information of the ROI, (c) texture features, which can be used to quantify the heterogeneity of the ROI (obtained using the gray-level run length and gray-level co-occurrence matrices), and (d) higher-order statistical features, obtained by applying exponential, logarithmic, square root, wavelet (including wavelet-LHL, wavelet-LHH, wavelet-HLL, wavelet-LLH, wavelet-HLH, wavelet-HHH, wavelet-HHL, and wavelet-LLL), and other filters and recalculating the intensity, texture, and other features on the transformed image. For the detailed feature calculation formulas, please refer to the website https://pyradiomics.readthedocs.io/en/latest/ (Fig. 2).
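To make the gray-level discretization setting (bin width = 25) concrete, the following minimal Python sketch applies the fixed-bin-width rule as described in the PyRadiomics documentation: bin index = floor(x / W) − floor(min(x) / W) + 1. The function name and the voxel values are hypothetical and chosen only for illustration; this is not the authors' code.

```python
import math


def discretize_fixed_bin_width(intensities, bin_width=25):
    """Map gray values to 1-based bin indices using a fixed bin width.

    Bins are aligned to multiples of bin_width, and the lowest occupied
    bin is numbered 1.
    """
    lowest_bin = math.floor(min(intensities) / bin_width)
    return [math.floor(v / bin_width) - lowest_bin + 1 for v in intensities]


# Hypothetical voxel intensities inside a VOI
voxels = [10, 30, 55, 100]
print(discretize_fixed_bin_width(voxels))  # [1, 2, 3, 5]
```

With a fixed width, the number of bins depends on the intensity range of each VOI, which is one reason texture features are sensitive to this setting.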
Selection of radiomics features
A computer-generated random seed was used to split the dataset into 80% for the training cohort and 20% for the validation cohort. The radiomics features most relevant to mutation type were selected from the training cohort. Before feature selection, all radiomics features were standardized by Z-score normalization: the mean was subtracted from each value and the result was divided by the standard deviation, so that each feature had a mean of 0 and a variance of 1. Because the number of radiomics features was large, Spearman correlation analysis was used to eliminate features with a correlation above 0.9. The following methods were used to reduce the dimensionality of the extracted MRI radiomics features and avoid model over-fitting and multicollinearity: first, the variance threshold method was used to delete features with variance < 0.8; second, the univariate selection method was used to remove features that were not significant (p > 0.05); and finally, the LASSO regression algorithm was used to fit the features most relevant to the research objective and to obtain their weights. After feature dimensionality reduction, the Rad-score of each patient (T2WI_Rad-score and CET1w_Rad-score) was calculated. This score was considered the comprehensive embodiment of the radiomics features and was included in the subsequent model construction.
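Two of the preprocessing steps described above, Z-score normalization and the removal of features with Spearman correlation above 0.9, can be sketched in plain Python. The greedy keep-first strategy, the function names, and the toy feature values are assumptions for illustration; the paper does not specify which member of a correlated pair was dropped.

```python
from statistics import mean, pstdev


def z_score(values):
    """Standardize one feature column to mean 0 and variance 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def drop_correlated(features, threshold=0.9):
    """Greedy filter (an assumption): keep a feature only if its absolute
    Spearman correlation with every already kept feature is <= threshold."""
    kept = []
    for name, column in features.items():
        if all(abs(spearman(column, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature columns (one value per patient)
toy = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 11], "c": [5, 1, 4, 2, 3]}
print(drop_correlated(toy))  # ['a', 'c']
```

In the toy data, feature "b" is a monotonic function of "a" (Spearman correlation 1.0) and is dropped, while "c" is only weakly correlated with "a" and is kept.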
The Rad-score was calculated as follows:
Rad-score = intercept + Σ (coefficient[i] × feature[i])
Here, intercept is the fitting intercept term generated in the LASSO regression, feature[i] is the value of the i-th selected feature, and coefficient[i] is the weight coefficient of that feature generated in the LASSO regression.
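As a minimal sketch, the Rad-score is a linear combination of the selected feature values with the LASSO weights plus the intercept. The numbers below are hypothetical and are not coefficients from the study.

```python
def rad_score(intercept, coefficients, features):
    """Return intercept + sum(coefficient[i] * feature[i])."""
    if len(coefficients) != len(features):
        raise ValueError("coefficients and features must have the same length")
    return intercept + sum(c * f for c, f in zip(coefficients, features))


# Hypothetical LASSO output for three selected, standardized features
example_score = rad_score(
    intercept=-0.5,
    coefficients=[0.2, -0.1, 0.4],
    features=[1.0, 2.0, -0.5],
)
print(round(example_score, 3))  # -0.7
```

Features whose LASSO coefficient is zero drop out of the sum, which is why only the retained features contribute to each patient's score.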
Construction and training of the radiomics prediction model
The T2w_Rad-Score and CET1w_Rad-Score were used as inputs to construct the differential diagnostic model of EGFR mutation status. All the data were randomly divided into training and validation cohorts. The training cohort was used to train the model, and the validation cohort was used to evaluate its generalization ability. The differential diagnosis model for EGFR mutations was constructed using logistic regression, and three radiomics models were built: T2W, CET1w, and T2W combined with CET1w. Receiver operating characteristic (ROC) curves were plotted and the AUC was calculated, and the accuracy, precision, sensitivity, and specificity of the model were evaluated.
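For readers unfamiliar with the AUC, the following minimal Python sketch computes it from its rank interpretation: the probability that a randomly chosen mutant case receives a higher score than a randomly chosen wild-type case, with ties counted as one half. The labels and scores are hypothetical; the authors performed their analyses in R.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 = EGFR mutant, 0 = wild-type; scores: model outputs.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical predictions for four patients
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The pairwise loop is quadratic in the number of patients, which is acceptable for cohorts of this size.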
Analysis of clinicopathological factors and construction of the prediction model
Clinical factors were initially analyzed using univariate analysis to screen for significant clinical factors related to EGFR mutation status. A two-sided p < 0.05 was considered statistically significant. Clinical factors significantly related to EGFR mutation status were selected as inputs to build a clinical model for predicting it, which was constructed using a logistic regression algorithm. An ROC curve was generated, and the AUC was calculated; the model's accuracy, precision, sensitivity, and specificity were also evaluated.
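The four threshold-based metrics named above follow directly from the confusion matrix. The sketch below is illustrative only; the labels and predictions are hypothetical, and the function name is not from the paper.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, sensitivity, and specificity for binary labels
    (1 = EGFR mutant, 0 = wild-type)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Hypothetical ground truth and thresholded predictions for five patients
metrics = classification_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
print(metrics["accuracy"], metrics["precision"], metrics["sensitivity"])  # 0.6 0.5 0.5
```

Unlike the AUC, these metrics depend on the probability cut-off used to turn model outputs into mutant or wild-type calls.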
Establishment of radiomics nomogram
A radiomics nomogram combining the T2w_Rad-Score, CET1w_Rad-Score, and significant clinicopathological factors was established based on a multivariate logistic regression model. ROC curves were plotted and AUC values were calculated for the nomogram model. In addition, the calibration curve of the nomogram was drawn to evaluate its calibration and discrimination ability. Decision curve analysis was performed to quantify the net benefit of the nomogram model and thereby evaluate its clinical practicability; the accuracy, precision, sensitivity, and specificity of the combined model were also evaluated.
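Decision curve analysis plots net benefit against the threshold probability. A minimal sketch, assuming the standard definition net benefit = TP/n − (FP/n) × pt/(1 − pt), is shown below together with the "treat all" reference strategy. The labels and predicted probabilities are hypothetical and do not come from the study.

```python
def net_benefit(labels, probabilities, threshold):
    """Net benefit of acting on patients whose predicted probability is
    at or above the threshold probability pt."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probabilities) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probabilities) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: treat every patient as mutation-positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical nomogram outputs for four patients
y = [1, 1, 0, 0]
p = [0.8, 0.3, 0.6, 0.1]
for pt in (0.2, 0.5):
    print(pt, round(net_benefit(y, p, pt), 4), round(net_benefit_treat_all(y, pt), 4))
```

A model is considered clinically useful over the range of thresholds where its curve lies above both the "treat all" curve and the "treat none" line at zero.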
Statistical analysis
In the analysis of clinical data, categorical variables were described using frequencies and proportions. Continuous variables were described and compared depending on whether the data followed a normal distribution. The Kolmogorov–Smirnov test was used to determine the normality of the data. Normally distributed data are expressed as the mean and standard deviation (mean ± SD), and these variables were compared using the independent-sample t-test. SD is an important indicator of the dispersion of a group of data relative to its mean. Non-normally distributed data are expressed as the median (interquartile range [IQR]), and the Mann–Whitney U test was used to compare these variables. The AUC and 95% confidence interval (95% CI) were used to evaluate the model's effectiveness, and the DeLong test was used to compare the diagnostic ability of different models. Two-sided p-values < 0.05 were considered statistically significant. All statistical analyses were performed using R software (version 4.3.0; https://www.r-project.org) and PyRadiomics (version 3.8.0).
Patient cohort and image data
The Institutional Review Board of the First Affiliated Hospital of Nanjing Medical University approved this retrospective study (No. 2022-NT-11). The study included 92 patients with histopathologically confirmed malignant pulmonary nodules at the First Affiliated Hospital of Nanjing Medical University between August 2019 and May 2021; patients were enrolled consecutively during this period. The diagnoses of 69 patients were confirmed by surgical pathology, and 23 by computed tomography (CT)-guided biopsy. Among 92 cases of malignant nodules, the pathological subtypes were adenocarcinoma in 61 cases, squamous cell carcinoma in 4 cases, and 27 cases that were not otherwise specified (NOS). We included (a) patients who underwent MRI examination within 1 month before surgery or biopsy, (b) those who did not receive antitumor treatment before MRI examination, (c) those whose diagnoses were confirmed by surgery or biopsy pathology, and (d) EGFR mutation detection results. Patients with the following conditions were excluded: (a) no EGFR mutation test results (n = 1), and (b) poor MRI image quality or missing image data (n = 0). Ultimately, 91 patients were included in the study and randomly assigned to the training (n = 72) and validation (n = 19) cohorts, with an 8:2 ratio (Fig. 1). Baseline clinical and pathological information of the patients was obtained from the medical record system, including age, sex, and lymph node metastasis (Table 1).
MRI image acquisition and analysis
The MRI scans of all patients were performed using a Siemens 3.0 T MR scanner (Verio Tim) and a 16-channel body phased array coil. Conventional MR scans include axial T1-weighted imaging with a repetition time (TR) and echo time (TE) of 140/2.5 ms and axial free-breathing BLADE T2-weighted imaging (TR/TE, 1200/93 ms). The dynamic contrast-enhanced (DCE) scan included a 3D volumetric interpolated breath-hold examination sequence, TR/TE = 3.19/1.13 ms, layer thickness = 3 mm, field of view = 400 mm², matrix = 160 × 224, and flip angle = 15°. Through elbow vein puncture, a high-pressure injector was used to inject 0.1 mmol/kg of gadopentosamide (GE Health Care) at a flow rate of 4.0 mL/s. Subsequently, 20 mL of saline was injected at the same rate. DCE-MR consists of four baseline and 31 enhanced images. The time resolution was 8.8 s, and the total acquisition time was 5 min 33 s. After completing DCE MR, another set of enhanced T1-weighted images was acquired.
EGFR gene testing
EGFR mutation detection results were obtained from surgically removed or biopsied tumor tissue specimens. The amplification blockade mutation system polymerase chain reaction (ARMS-PCR) method was used to detect mutations in four exons (18–21) of the EGFR coding region. The results were determined based on the interpretation principles provided in the detection kit. If any exon mutation was detected, the tumor was identified as an EGFR mutant; otherwise, it was identified as a wild-type EGFR.
Image segmentation and radiomics feature extraction
We used the default PyRadiomics parameter configuration for feature extraction: resampling was turned off because the images had already been resampled before feature extraction, and the gray-level discretization bin width was 25, among other settings. The extractor was instantiated as radiomics.featureextractor.RadiomicsFeatureExtractor(). Radiomics feature extraction followed the Image Biomarker Standardization Initiative (IBSI), with which syngo.via Frontier 1.2.1, VB10B version (Siemens Healthineers, Germany) complies. With both chest radiologists blinded to the patients' clinical information, a radiologist with 10 years of experience in chest diagnosis initially semi-automatically delineated the volume of interest (VOI) around the tumor, and another radiologist with 4 years of experience in chest diagnosis confirmed it. First, enhanced T1WI (contrast-enhanced T1 weighted, CET1w) and T2WI (T2 weighted, T2w) images were imported into the radiomics prototype software (Radiomics, Frontier, Siemens), and several segmentation tools in the radiomics segmentation module were used for the semi-automatic segmentation of tumors. In semi-automatic segmentation, the tumor boundary line was manually drawn; subsequently, using the random walk algorithm, adjacent voxels with the same grayscale as the drawn boundary line were automatically identified in 3D space, ultimately generating segmentation results for solid or subsolid lung lesions. The most effective method for obtaining a voxel set for a portion of a lesion is to implement a 3D region-growing algorithm starting from the center point of the region of interest (ROI). Subsequently, based on the density distribution of the grayscale values in the ROI, the threshold of the lesion was adaptively determined. 
The region-growing algorithm was initially used to obtain the complete lesion area and part of the adjacent vascular system; subsequently, morphological segmentation algorithms were used to remove the blood vessels from the segmentation results. If the segmentation was incorrect, the operator could manually correct the delineation in 3D using the radiomics software prototype. A linear interpolation algorithm was used to resample all MRI images to an isotropic voxel size of 1.00 × 1.00 × 1.00 mm³, with B-spline interpolation. After image preprocessing, 854 radiomics features were extracted from the tumor VOI on each of the two MR sequences (CET1w and T2W images, based on PyRadiomics). The extracted radiomics features mainly included (a) intensity statistical features, which quantitatively describe the distribution of voxel intensity in MRI images using commonly used measures, (b) shape and size features, which reflect the shape- and size-related information of the ROI, (c) texture features, which quantify the heterogeneity of the ROI (obtained using the gray-level run-length and gray-level co-occurrence matrices), and (d) higher-order statistical features, for which exponential, logarithm, square root, wavelet (including wavelet-LHL, wavelet-LHH, wavelet-HLL, wavelet-LLH, wavelet-HLH, wavelet-HHH, wavelet-HHL, and wavelet-LLL), and other filters were applied and the intensity, texture, and other features were recalculated on the transformed image. For the detailed feature calculation formulas, please refer to the website https://pyradiomics.Readthedocs.io/en/latest/.html (Fig. 2).
Selection of radiomics features
A computer-generated random seed was used to split the dataset into a training cohort (80%) and a validation cohort (20%). The radiomics features most relevant to mutation type were selected in the training cohort. Before feature selection, all radiomics features were standardized by Z-score normalization: the mean was subtracted from each value and the result was divided by the standard deviation, so that each feature had a mean of 0 and a variance of 1. Because the number of radiomics features was large, Spearman correlation analysis was used to eliminate features with a correlation above 0.9. The following methods were used to reduce the dimension of the extracted MRI radiomics features and to avoid model overfitting and multicollinearity: first, the variance threshold method was used to delete features with variance < 0.8; second, the univariate selection method was used to delete features that were not significant (p > 0.05); and finally, the LASSO regression algorithm was used to select the features most relevant to the research objective and to obtain their weights. After feature dimensionality reduction, the Rad-score of each patient (T2WI_Rad-score and CET1w_Rad-score) was calculated. This score was considered a comprehensive summary of the radiomics features and was included in the subsequent model construction.
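The split and standardization steps described above can be sketched in a few lines. This is a minimal illustration and not the authors' code: the function names, the seed value of 42, and the use of the population standard deviation are assumptions made for the example.

```python
import random
import statistics


def split_cohorts(patient_ids, train_fraction=0.8, seed=42):
    """Shuffle with a fixed seed, then split into training and validation cohorts.

    The seed value is hypothetical; the paper does not report the one used.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train_fraction)  # floor, so 91 * 0.8 = 72.8 gives 72
    return ids[:n_train], ids[n_train:]


def zscore_columns(rows):
    """Standardize each feature column to mean 0 and variance 1.

    Assumes the population standard deviation; constant columns map to zeros.
    """
    scaled_columns = []
    for column in zip(*rows):
        mean = statistics.fmean(column)
        sd = statistics.pstdev(column)
        scaled_columns.append([(v - mean) / sd if sd > 0 else 0.0 for v in column])
    return [list(row) for row in zip(*scaled_columns)]


train_ids, val_ids = split_cohorts(range(91))
print(len(train_ids), len(val_ids))  # 72 19
```

With 91 patients, flooring 80% reproduces the 72/19 cohort sizes reported in the text. In practice the scaling statistics are often estimated on the training cohort only and then applied to the validation cohort; the text does not state which approach was used.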
The Rad-score was calculated as follows:
$$\text{Rad-score} = \text{intercept} + \sum_{i} \text{coefficient}[i] \times \text{feature}[i]$$
Here, intercept is the fitted intercept term generated in the LASSO regression, feature[i] is the value of the i-th selected feature, and coefficient[i] is the corresponding feature weight coefficient generated in the LASSO regression.
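As a minimal sketch, assuming the Rad-score is the usual LASSO linear combination (the intercept plus the sum of each coefficient[i] multiplied by its feature[i]), the calculation could look as follows. The function name, feature names, and numbers are hypothetical and are not values from the study.

```python
def rad_score(intercept, coefficients, features):
    """Linear Rad-score: intercept + sum(coefficient[i] * feature[i]).

    `coefficients` and `features` are dicts keyed by feature name.
    """
    missing = set(coefficients) - set(features)
    if missing:
        raise KeyError(f"missing feature values: {sorted(missing)}")
    return intercept + sum(coefficients[name] * features[name] for name in coefficients)


# Hypothetical coefficients and standardized feature values, for illustration only.
example_coefficients = {"feature_a": 0.5, "feature_b": -0.25}
example_features = {"feature_a": 2.0, "feature_b": 4.0}
example_score = rad_score(0.1, example_coefficients, example_features)
print(example_score)
```

Only features with non-zero LASSO coefficients contribute, so the dictionary of coefficients needs to contain just the selected features.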
Construction and training of the radiomics prediction model
The T2w_Rad-Score and CET1w_Rad-Score were used as inputs to construct the diagnostic model of EGFR mutation status. All the data were randomly divided into training and validation cohorts. The training cohort was used to train the model, and the validation cohort was used to evaluate its generalization ability. The diagnostic model for EGFR mutations was constructed using logistic regression, and three radiomics models were built: T2W, CET1w, and combined T2W and CET1w. Receiver operating characteristic (ROC) curves were plotted and the AUC was calculated; the accuracy, precision, sensitivity, and specificity of each model were also evaluated.
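The evaluation metrics named above can be illustrated with a short sketch. It assumes binary labels (1 = EGFR mutant, 0 = wild-type), computes the AUC from its rank interpretation, and uses a hypothetical probability cut-off of 0.5. The data are invented and the function names are not from the paper.

```python
def auc_score(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def _ratio(numerator, denominator):
    # Return NaN instead of failing when a metric is undefined.
    return numerator / denominator if denominator else float("nan")


def classification_metrics(labels, scores, threshold=0.5):
    """Accuracy, precision, sensitivity, and specificity at a given cut-off."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 0)
    return {
        "accuracy": _ratio(tp + tn, len(labels)),
        "precision": _ratio(tp, tp + fp),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
    }


# Invented labels and predicted probabilities, for illustration only.
demo_labels = [1, 1, 1, 0, 0, 0]
demo_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
print(auc_score(demo_labels, demo_scores))
print(classification_metrics(demo_labels, demo_scores))
```

The AUC is independent of the cut-off, whereas the other four metrics change with it, which is why both kinds of measure are usually reported together.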
Analysis of clinicopathological factors and construction of the prediction model
Clinical factors were first screened using univariate analysis to identify those significantly related to EGFR mutation status; a two-sided p < 0.05 was considered statistically significant. The significant clinical factors were then used as inputs to build a clinical model, which was constructed using a logistic regression algorithm. An ROC curve was generated and the AUC was calculated, and the model's accuracy, precision, sensitivity, and specificity were evaluated.
Establishment of radiomics nomogram
A radiomics nomogram combining the T2w_Rad-Score, CET1w_Rad-Score, and significant clinicopathological factors was established based on a multivariate logistic regression model. The ROC curve was drawn and the AUC was calculated for the nomogram model. The calibration curve of the nomogram was drawn to evaluate its calibration and discrimination ability. Decision curve analysis was performed to quantify the net benefit of the nomogram model and thereby evaluate its clinical practicability. The accuracy, precision, sensitivity, and specificity of the comprehensive model were also evaluated.
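Decision curve analysis plots net benefit against the threshold probability. The sketch below assumes the standard definition, net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The data and function names are hypothetical and do not come from the study.

```python
def net_benefit(labels, probabilities, threshold):
    """Net benefit of treating patients whose predicted probability is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probabilities) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probabilities) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: every patient is treated as if positive."""
    return net_benefit(labels, [1.0] * len(labels), threshold)


# Invented labels and predicted probabilities, for illustration only.
dca_labels = [1, 1, 1, 0, 0, 0]
dca_probabilities = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
for pt in (0.2, 0.5, 0.7):
    print(pt, net_benefit(dca_labels, dca_probabilities, pt), net_benefit_treat_all(dca_labels, pt))
```

A model is considered useful over the range of thresholds where its curve lies above both the treat-all curve and the treat-none line (net benefit of 0).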
Statistical analysis
In the analysis of clinical data, categorical variables were described using frequencies and proportions. Continuous variables were described and compared according to whether the data followed a normal distribution, which was assessed using the Kolmogorov–Smirnov test. Normally distributed data are expressed as the mean and standard deviation (mean ± SD) and were compared using the independent-sample t-test. SD is an important indicator of the dispersion of a group of data relative to its mean. Non-normally distributed data are expressed as the median (interquartile range [IQR]) and were compared using the Mann–Whitney U test. The AUC and 95% confidence interval (95% CI) were used to evaluate the model's effectiveness, and the DeLong test was used to compare the diagnostic ability between different models. Two-sided p-values < 0.05 were considered statistically significant. All statistical analyses were performed using R software (version 4.3.0; https://www.r-project.org) and PyRadiomics (version 3.8.0).
Results
Clinical features
The demographic and clinicopathological characteristics of the patients in the training and validation cohorts are presented in Table 1. The patients were randomly divided into training and validation cohorts, with 72 and 19 patients, respectively. Sex or age did not significantly differ between the training and validation cohorts (p > 0.05). The other clinicopathological features are shown in Table 1, which shows that the baseline data between the training and validation cohorts were balanced.
Selection of important radiomics features and establishment of radiomics signature
In total, 854 radiomics features were extracted from T2WI. First, the variance threshold method was used to screen out features with variances < 0.8, leaving 851 features. Features without a significant difference (p > 0.05) were then screened out using the univariate selection method, leaving 49 features. Finally, the LASSO algorithm was used to fit the features against gene mutation type (Fig. 3a). Seven important radiomics features were selected (Fig. 4), and the Rad-score was calculated. Likewise, 854 radiomics features were extracted from CET1w. Using the same methods, four important radiomics features were selected (Fig. 5), and the Rad-score was calculated (Fig. 3b). The four selected features were wavelet-LHH_firstorder_Kurtosis, wavelet-LLL_glcm_DifferenceEntropy, wavelet-LLL_glcm_Imc2, and wavelet-LLH_glcm_InverseVariance. A total of 1708 radiomics features were extracted from the T2WI and CET1w images combined. Using the same methods, 13 important radiomics features were selected (Fig. 6), and the Rad-score was calculated (Fig. 3c).
Analysis of clinicopathological factors
Multivariate logistic regression analysis of the clinicopathological factors showed that sex (OR = 0.155) and maximum diameter (OR = 0.716) were independent predictors (Table 2).
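In logistic regression, an odds ratio is the exponential of the fitted coefficient, so the reported values can be converted back to log-odds coefficients. The helper below is only an illustration of that relationship; the variable coding (reference sex, unit of diameter) is not stated in this section and is not assumed here.

```python
import math


def log_odds_coefficient(odds_ratio):
    """Logistic regression coefficient corresponding to an odds ratio (OR = exp(beta))."""
    if odds_ratio <= 0:
        raise ValueError("an odds ratio must be positive")
    return math.log(odds_ratio)


# Odds ratios as reported in the text for sex and maximum diameter.
beta_sex = log_odds_coefficient(0.155)
beta_diameter = log_odds_coefficient(0.716)
print(round(beta_sex, 3), round(beta_diameter, 3))
```

Both odds ratios are below 1, so both coefficients are negative: under whatever coding was used, the non-reference sex category and a larger maximum diameter are each associated with lower odds of the modeled outcome.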
Construction and prediction efficiency of multimodal prediction model
The AUCs of the prediction model based on the T2WI radiomics signature were 0.780 (95% CI: 0.653–0.871) and 0.632 (95% CI: 0.417–0.904) in the training and validation cohorts, respectively. The corresponding AUCs were 0.725 (95% CI: 0.591–0.826) and 0.684 (95% CI: 0.478–0.857) for the prediction model based on the CET1w radiomics signature, and 0.846 (95% CI: 0.742–0.919) and 0.808 (95% CI: 0.571–0.941) when T2WI was combined with CET1w. A clinical model was established using the independent predictive factors (sex and maximum diameter), with AUCs of 0.720 (95% CI: 0.595–0.819) and 0.679 (95% CI: 0.450–0.871) in the training and validation cohorts, respectively. The combined model was constructed using multivariate logistic regression analysis based on the T2WI–CET1w radiomics signature combination and the independent predictors (sex and maximum diameter). This combined model had AUCs of 0.880 (95% CI: 0.790–0.948) and 0.859 (95% CI: 0.657–0.983) in the training and validation cohorts, respectively (Table 3). The performance of the different models was compared using the DeLong test (Table 4). Figure 7a and b show the AUCs of the different prediction models in the training and validation cohorts. Based on the multivariate logistic regression analysis, the independent predictors (Rad-Score, sex, and maximum diameter) were used to construct a radiomics nomogram (Fig. 8). The calibration curve showed that the nomogram prediction results were in good agreement with the pathological results (Fig. 9a, b). Decision curve analysis showed that, within a reasonable threshold range, the combined model had a higher overall net benefit than the T2WI, CET1w, T2WI–CET1w combination, and clinical models (Fig. 10).
Discussion
In this study, we established five prediction models based on multiparametric MRI to extract radiomics features combined with clinicopathological factors. We also classified EGFR mutation status in NSCLC, and compared the classification efficiencies of the five prediction models. The results showed that the five prediction models could effectively classify EGFR mutation status in NSCLC. However, the combined prediction model, incorporating radiomics scores with independent predictors, had the best classification efficiency. The AUCs of the training and validation cohorts were 0.880 and 0.859, respectively, showing that our model can be effectively used to diagnose and classify EGFR mutation status in NSCLC. Compared with the clinical or single radiomics model, the comprehensive and multiparametric radiomics models showed a higher AUC value, while the DeLong test showed that the difference was statistically significant, indicating that combining multiple sequences effectively enhanced the diagnostic ability of the model. However, the sample size was limited, and the DeLong test showed that the AUCs for the comprehensive prediction and the multiparametric radiomics model differed but were not statistically significant (p > 0.05).
Following EGFR-TKIs treatment, the median survival time of patients with EGFR mutation can reach 25 months, which can improve their quality of life [26]. The objective remission rate is approximately 70–80%, and the PFS can reach 9–14 months. Therefore, EGFR-TKIs have become the standard first-line treatment for patients with EGFR-mutant NSCLC [27]. However, its clinical efficacy in patients with wild-type EGFR is poor, possibly leading to drug resistance and related adverse reactions [28], resulting in poor clinical applicability. Currently, determining EGFR mutation status mainly depends on gene detection, which is expensive and time-consuming. Early and accurate prediction of this status in patients with early-stage NSCLC can more accurately guide the clinical selection of targeted therapy populations. Therefore, screening for more comprehensive and effective lesion heterogeneity and microenvironment characteristics through noninvasive examination and achieving accurate prediction of EGFR mutation status in NSCLC are the key scientific problems addressed in this study.
Emerging AI technologies, including deep learning feature extraction and segmentation technology, deep survival analysis, and radiomics analysis, can be used to quantitatively extract deep features of the lung tumor lesion area and accurately segment it, simulate the nonlinear risk score function for survival analysis modeling, extract the high-throughput features contained in the image, quantitatively describe the heterogeneity of the lesion, achieve accurate prediction of disease recurrence and prognosis, and non-invasively and comprehensively quantify the heterogeneity and microenvironment of the lesion. These capabilities provide guidance for the in-depth mining and application of medical images, which will aid in selecting an appropriate, effective, and personalized treatment scheme. However, the application of radiomics and deep learning to lung lesions currently focuses primarily on CT or positron emission tomography/CT images. Felfli [24] and Ma [25], in their meta-analyses, found that most CT radiomics studies extracted only plain-scan image features of the lesions, which has certain limitations. The latest meta-analysis [29] on the diagnostic accuracy of MRI radiomics features for predicting EGFR mutation status in patients with NSCLC who have brain metastases showed that the pooled sensitivity and specificity of MRI radiomics features for detecting EGFR mutation were 0.86 and 0.83, respectively, which is consistent with CT and 18F-FDG positron-emission tomography (PET)/CT radiomics. The main reason is that MRI offers multiple sequences and parameters. Different sequences, such as T1WI, T2WI, and diffusion-weighted imaging (DWI), can reflect the shape, structure, metabolism, and function of the tumors from different angles and comprehensively capture their characteristics. 
Combining multiple sequences can more comprehensively capture tumor heterogeneity and improve the model's ability to describe tumor characteristics by integrating complementary information and reducing the limitations of a single sequence. Li et al. [30] also confirmed these findings. In an experiment to distinguish between T790M resistance and the absence of a T790M mutation in patients with NSCLC who have brain metastasis, a prediction model was established using radiomics features extracted from T2WI, T2 fluid attenuation inversion recovery (T2-FLAIR), DWI, and T1-CE sequences. The AUCs in the training and validation cohorts were 0.886 and 0.850, respectively. Furthermore, a multicenter study [31] showed that combining CT and MRI dual-mode radiomics to predict EGFR status in patients with brain metastases from NSCLC yielded good calibration and discrimination in the internal (AUC = 0.866) and external test cohorts (AUC = 0.818). Similarly to multi-modal and multi-sequence MRI, extracting features of different modes from CT images can help achieve a comparable effect. Zhang X [32] combined CT images and clinical-pathological data, and the resulting “deep radscore” learning radiomics model, based on the data from the 3D tumor region, achieved an impressive AUC in predicting EGFR mutation (AUC = 0.884). Zhang G et al. [33] constructed a radiomics model from the radiomics features extracted from multi-phase CT (non-enhanced and enhanced CT, including arterial phase and venous phase CT). The results showed good performance in identifying EGFR mutation status in patients with lung adenocarcinoma (AUC = 0.925) and good consistency with the prediction performance of the clinical radiomics comprehensive model (AUC = 0.927). 
Compared with those in the MRI correlation studies, the T2WI plain MRI scan images, as well as the T1-enhanced scan features combined with clinical pathological factors, were extracted in this study. The final joint prediction model showed the best performance. The AUCs of the training and validation cohorts were 0.880 and 0.859, respectively, which were better than those of the single radiomics or clinical models. However, compared with those in relevant CT studies, our AUC value was slightly lower. In the diagnosis and treatment path of NSCLC, CT, with its excellent lung spatial resolution, fast scanning speed, and highly standardized process, serves as a first-line imaging technique for screening, diagnosis, and staging primary lung lesions. In contrast, MRI is not a conventional method for evaluating primary lesions because it poorly displays aerated lung tissue, is susceptible to motion artifacts, and is costly to perform. However, the non-radiation advantage of MRI, and its unparalleled soft-tissue resolution and multiparametric functional imaging capabilities have made it irreplaceable in specific clinical scenarios. The main indications include evaluating brain/spinal metastasis [34, 35], determining local invasion of tumor into key structures, such as the chest wall and mediastinum, and serving as an alternative for patients with contraindications to CT contrast agent. For this reason, the clinical rationale of our study was not to challenge the universal use of CT, but to accurately identify the above-mentioned patients who require MRI examination. Beyond scanning for clear clinical indications (such as brain metastasis assessment), MRI provides the additional benefit of “gene prediction” to primary lung lesion imaging, integrating precision medicine more efficiently. 
The aforementioned studies have proven that multimodal and fused features can provide more comprehensive information, improve diagnostic accuracy, guide personalized treatment programs, and enhance the robustness and generalization ability of the model than those of single-image features. Multimodal feature fusion enables the model to learn the characteristics of data from multiple perspectives, reducing dependence on single-mode data, which helps to improve the model’s adaptability to different datasets and clinical scenes, reducing the risk of overfitting.
MRI can be used effectively to distinguish the different pathological features of tissues. Multiparametric MRI can display tumor heterogeneity, provide more sensitive quantitative and qualitative imaging markers, and help predict the EGFR mutation status of patients with lung cancer to guide clinical treatments. Currently, research on EGFR mutation status in NSCLC using MRI radiomics is mainly focused on patients with advanced lung cancer metastasis, whereas research on the EGFR mutation status of patients with early lung cancer is relatively limited. Fan Ying et al. [36] studied the EGFR mutation status of 230 patients with lung cancer complicated by brain metastasis. Their findings showed good predictive efficiency, and the AUCs in the training, validation, and external validation cohorts were 0.896, 0.856, and 0.889, respectively. Cao et al. [37], in their multicenter study, showed that MRI radiomics has good predictive value for EGFR mutation status and subtypes in patients with lung cancer spinal metastasis. They also showed that T1WI has a higher predictive efficiency than T2WI. The combined model integrating the two sequences and clinicopathological factors showed the best predictive efficiency for EGFR mutation status, including 19 and 21 site mutations: training cohort (0.829 vs. 0.885 vs. 0.919), validation cohort (0.760 vs. 0.777 vs. 0.811), and external validation (0.780 vs. 0.846 vs. 0.818). Park et al. [38] predicted the EGFR mutation status in patients with brain metastasis using DTI and the T1-enhanced MRI sequence. The results showed that the prediction efficiency of integrating DWI, T1WI, and DTI sequences was significantly higher than that of single sequences, and the AUC, accuracy, sensitivity, and specificity in the test cohort were 0.73, 78.6%, 81.3%, and 76.9%, respectively. Additionally, Wang et al. [39] predicted the EGFR mutation status of NSCLC using multiparametric MRI radiomics. 
The AUC of the diffusion coefficient prediction model was 0.805. Nonetheless, the multi-sequence prediction model showed better prediction efficiency, with nomogram AUCs of 0.925 in the training cohort and 0.727 in the validation cohort. Based on analyses of existing studies, T1WI, T2WI, and CET1w sequences have been routinely used to predict EGFR mutation status in NSCLC. In addition, although the role of DWI and DTI sequences in predicting EGFR mutations in patients with brain metastases needs further verification, the ability of DWI to reflect tumor heterogeneity by capturing subtle changes in the internal and surrounding microenvironment of the tumor, and thereby predicting EGFR mutation status, has been preliminarily confirmed. Therefore, new diffusion imaging technologies such as time-dependent diffusion [40] are expected to provide more specific characteristics of cell microstructure and help identify the changes in the tumor microenvironment caused by treatment. These findings may be closely related to EGFR mutation status, providing a new perspective for more comprehensively and deeply characterizing tumor biological characteristics. The above study showed that multiparametric MRI radiomics can be used as a non-invasive examination to objectively and scientifically describe the morphological and internal structural characteristics of lung cancer. The integration of clinical and pathological features may replace biomarkers, achieving early, sensitive, and accurate prediction of EGFR mutation status in lung cancer, providing guidance for early targeted treatment of NSCLC, and improving prognosis.
At present, studies have shown that multiparametric MRI radiomics has certain value in predicting the pathological classification of NSCLC. For example, models based on MRI imaging characteristics from sequences such as T1WI, T2WI, and apparent diffusion coefficient (ADC) maps have been built to predict the pathological classification of NSCLC, achieving good results [41]. Studies on the prediction of EGFR mutation status are relatively few; nevertheless, the theoretical and technical basis of MRI radiomics and its successful application in other areas provide a foundation for predicting EGFR mutation status. Compared with traditional pathological biopsy, MRI radiomics does not require an invasive procedure, which can reduce pain and risk for patients, especially for those who cannot tolerate biopsy or for whom biopsy is challenging. Additionally, MRI can provide multidimensional information such as tumor morphology, structure, and metabolism. Through radiomics analysis, a large number of features can be extracted, which helps to reflect the biological characteristics of tumors more comprehensively and to predict EGFR mutation status. When the EGFR mutation status is accurately predicted, doctors can choose more appropriate targeted therapy drugs, optimize treatment strategies, improve the treatment effect, prolong the survival time of patients, and improve the prognosis. However, some challenges and limitations exist. First, the problem of data standardization: differences in imaging parameters (such as field strength, scanning sequence, and slice thickness) between MRI devices may lead to low feature repeatability, affecting the model's generalizability. Second, feature redundancy and overfitting: high-dimensional image features often lead to multicollinearity, and existing dimensionality reduction methods may lose key biological information, leading to overfitting and insufficient generalization. 
Finally, insufficient sample size and diversity: the current studies are mostly single-center and small-sample studies, lacking the support of multi-center and large-sample data. Hence, the models' extrapolation and universality require further verification. In addition, the study participants were mostly patients with lung adenocarcinoma, and the applicability to other pathological types (such as lung squamous cell carcinoma) is not clear. Therefore, we make the following recommendations. First, establish standardized imaging protocols and formulate unified MRI scanning parameters and imaging processes. Second, optimize feature selection and model construction, and combine biological knowledge and clinical experience to screen meaningful imaging features related to EGFR mutation, thereby avoiding feature redundancy and overfitting. Advanced machine learning algorithms (such as deep learning models) can be used to construct models, improving their accuracy and robustness. Third, conduct multi-center, large-sample prospective studies, including patients of different races, pathological types, and treatment stages, to verify the generalizability and clinical practicability of the models.
Our study addresses the knowledge gap that, at present, research on radiomics in predicting the EGFR mutation status of patients with lung cancer is mainly focused on CT and PET/CT, with MRI research being relatively rare. Radiation is absent in MRI, and multiparametric MRI can be used to non-invasively observe various metabolic and pathological changes in the living tissues in the early stage, providing functional imaging, which plays a key role in identifying the EGFR mutation status. Therefore, the model established in this study has a high prediction efficiency. Second, current MRI radiomics research aimed at predicting EGFR mutation status in NSCLC mainly focuses on patients with advanced lung cancer. In this study, patients with early-stage NSCLC were the main focus. Hence, the findings are of great significance for the early diagnosis and treatment of patients with EGFR mutations in early lung cancer, providing a more scientific diagnostic and treatment scheme for clinical practice. Moreover, targeted therapy can be used before surgery to improve patients’ long-term prognosis. Third, all patients were scanned using a standardized protocol with the same MRI device, avoiding heterogeneity caused by differences in scanning and reconstruction parameters, thus making the findings more stable and reliable. In addition, semi-automatic segmentation tools were used in our radiomics research, which largely limited individual differences in manual rendering.
Despite these identified strengths, the main limitations of this study were as follows: first, the retrospective design may have introduced a selection bias. Second, this is a single-center study with a small sample size and no external validation; the current research is mainly based on the critical size validation cohort (n = 19), and the EGFR mutation status category is unbalanced. In addition, because the data were divided into a training and a validation cohort, overfitting may be unavoidable, the statistical effect may be insufficient, and subgroup analysis cannot be performed. The sample size should be expanded in future studies (and establish a standardized process) through multicenter research to improve clinical reliability. Third, mutation sites were not studied because of the small sample size. The prognosis of different mutation sites and the benefits of targeted therapy vary. Therefore, it is important to identify mutation sites before targeted therapy. Hence, the next step will be to study the mutation sites. Finally, owing to the limited sample size, the DeLong test results showed that the AUC values of the comprehensive prediction and the multiparametric radiomics models differed, but were not statistically significant (p > 0.05). This result prompted us to further reflect on whether introducing clinicopathological factors can indeed improve the model’s predictive efficiency. To clarify this issue, the focus of the follow-up research should be on expanding the sample size to enhance the statistical testing power and including more refined feature selection methods to avoid diluting the contribution of the imaging signal due to variable redundancy. In addition, a nonlinear fusion strategy or a machine learning integration method could be used to integrate multi-source features more effectively to verify the added value effectiveness of the comprehensive model in a larger sample.
In this study, we built five prediction models from multiparametric MRI radiomics features and clinicopathological factors, and compared how well each classified EGFR mutation status in NSCLC. All five models classified EGFR mutation status effectively, but the combined prediction model, which incorporates radiomics scores with independent predictors, performed best, with AUCs of 0.880 and 0.859 in the training and validation cohorts, respectively. Compared with the clinical or single radiomics models, the comprehensive and multiparametric radiomics models showed higher AUC values, and the DeLong test showed that these differences were statistically significant, indicating that combining multiple sequences enhanced the diagnostic ability of the model. However, with the limited sample size, the DeLong test showed that the difference in AUC between the comprehensive prediction model and the multiparametric radiomics model was not statistically significant (p > 0.05).
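The structure of the combined model described above, a radiomics score fed into a logistic regression together with clinical predictors, can be sketched as follows. This is a minimal illustration only: the feature names, coefficients, intercepts, and the clinical variable `smoking_history` are hypothetical placeholders, not values or predictors reported by the study.

```python
import math

# Hypothetical LASSO-style coefficients, for illustration only (not the study's values).
RAD_COEFFICIENTS = {"t2w_glcm_entropy": 0.8, "cet1w_firstorder_skewness": -0.5}
RAD_INTERCEPT = 0.1

# Hypothetical weights of the combined logistic model (also illustrative).
COMBINED_INTERCEPT = -0.3
COMBINED_WEIGHTS = {"rad_score": 1.2, "smoking_history": -0.9}


def rad_score(features):
    """Rad-score: intercept plus a weighted sum of standardized radiomics features."""
    return RAD_INTERCEPT + sum(
        weight * features[name] for name, weight in RAD_COEFFICIENTS.items()
    )


def mutation_probability(features, clinical):
    """Logistic model that combines the Rad-score with clinical predictors."""
    inputs = {"rad_score": rad_score(features), **clinical}
    linear = COMBINED_INTERCEPT + sum(
        weight * inputs[name] for name, weight in COMBINED_WEIGHTS.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


# One made-up patient with standardized feature values.
patient = {"t2w_glcm_entropy": 1.0, "cet1w_firstorder_skewness": -1.0}
probability = mutation_probability(patient, {"smoking_history": 0})
print(f"Rad-score: {rad_score(patient):.2f}, predicted probability: {probability:.3f}")
```

A nomogram is a graphical rendering of the same linear predictor, so the sketch also shows why a weak clinical weight adds little beyond the Rad-score alone.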
Following EGFR-TKI treatment, the median survival time of patients with EGFR mutation can reach 25 months, and their quality of life can improve [26]. The objective remission rate is approximately 70–80%, and progression-free survival (PFS) can reach 9–14 months. Therefore, EGFR-TKIs have become the standard first-line treatment for patients with EGFR-mutant NSCLC [27]. However, their clinical efficacy in patients with wild-type EGFR is poor, and their use may lead to drug resistance and related adverse reactions [28], which limits their clinical applicability. Currently, determining EGFR mutation status depends mainly on genetic testing, which is expensive and time-consuming. Early and accurate prediction of this status in patients with early-stage NSCLC can guide the selection of candidates for targeted therapy more precisely. Therefore, the key scientific problems addressed in this study are how to capture lesion heterogeneity and microenvironment characteristics more comprehensively and effectively through noninvasive examination, and how to predict EGFR mutation status in NSCLC accurately.
Emerging AI technologies, including deep learning feature extraction and segmentation, deep survival analysis, and radiomics analysis, can accurately segment the lung tumor region and quantitatively extract its deep features, model nonlinear risk score functions for survival analysis, and extract high-throughput image features that quantitatively describe lesion heterogeneity. Together, these capabilities allow non-invasive and comprehensive quantification of lesion heterogeneity and the tumor microenvironment, as well as accurate prediction of disease recurrence and prognosis. They provide guidance for the in-depth mining and application of medical images, which will aid in selecting an appropriate, effective, and personalized treatment scheme. However, radiomics and deep learning studies of lung lesions currently focus primarily on CT or positron emission tomography (PET)/CT images. Felfli [24] and Ma [25], in their meta-analyses, found that most CT radiomics studies extracted only plain-scan image features of the lesions, which has certain limitations. The latest meta-analysis [29] on the diagnostic accuracy of MRI radiomics features for predicting EGFR mutation status in patients with NSCLC who have brain metastases showed that the pooled sensitivity and specificity of MRI radiomics features for detecting EGFR mutation were 0.86 and 0.83, respectively, which is consistent with CT and 18F-FDG PET/CT radiomics. The main reason is that MRI offers multiple sequences and parameters. Different sequences, such as T1WI, T2WI, and diffusion-weighted imaging (DWI), reflect the shape, structure, metabolism, and function of tumors from different angles and comprehensively capture their characteristics.
Combining multiple sequences integrates complementary information, reduces the limitations of any single sequence, captures tumor heterogeneity more comprehensively, and improves the model’s ability to describe tumor characteristics. Li et al. [30] also confirmed these findings. To distinguish between T790M resistance and the absence of a T790M mutation in patients with NSCLC who have brain metastasis, a prediction model was established using radiomics features extracted from T2WI, T2 fluid-attenuated inversion recovery (T2-FLAIR), DWI, and T1-CE sequences. The AUCs in the training and validation cohorts were 0.886 and 0.850, respectively. Furthermore, a multicenter study [31] found that combined CT and MRI dual-mode radiomics for predicting EGFR status in patients with brain metastases from NSCLC had good calibration and discrimination in the internal (AUC = 0.866) and external (AUC = 0.818) test cohorts. Beyond multimodal and multi-sequence MRI, extracting features of different modes from CT images can achieve a comparable effect. Zhang X [32] combined CT images with clinicopathological data, and the resulting “deep radscore” learning radiomics model, based on data from the 3D tumor region, achieved a high AUC in predicting EGFR mutation (AUC = 0.884). Zhang G et al. [33] constructed a radiomics model from features extracted from multi-phase CT (non-enhanced and enhanced CT, including arterial-phase and venous-phase CT). The model performed well in identifying EGFR mutation status in patients with lung adenocarcinoma (AUC = 0.925), in good agreement with the performance of the clinical-radiomics comprehensive model (AUC = 0.927).
As in the related MRI studies, this study extracted radiomics features from non-contrast T2WI and contrast-enhanced T1 images and combined them with clinicopathological factors. The final joint prediction model showed the best performance, with AUCs of 0.880 and 0.859 in the training and validation cohorts, respectively, which were better than those of the single radiomics or clinical models. However, our AUC values were slightly lower than those reported in the relevant CT studies. In the diagnosis and treatment pathway of NSCLC, CT, with its excellent spatial resolution in the lung, fast scanning speed, and highly standardized process, serves as the first-line imaging technique for screening, diagnosing, and staging primary lung lesions. In contrast, MRI is not a conventional method for evaluating primary lesions because it displays aerated lung tissue poorly, is susceptible to motion artifacts, and is costly to perform. However, the absence of ionizing radiation, excellent soft-tissue resolution, and multiparametric functional imaging capabilities make MRI irreplaceable in specific clinical scenarios. The main indications include evaluating brain or spinal metastasis [34, 35], determining local tumor invasion into key structures such as the chest wall and mediastinum, and serving as an alternative for patients with contraindications to CT contrast agents. For this reason, the clinical rationale of our study was not to challenge the universal use of CT, but to target the above-mentioned patients who require MRI examination. For these patients, beyond scanning for clear clinical indications (such as brain metastasis assessment), MRI adds the benefit of “gene prediction” to primary lung lesion imaging, integrating precision medicine more efficiently.
The aforementioned studies indicate that, compared with single-image features, multimodal and fused features provide more comprehensive information, improve diagnostic accuracy, guide personalized treatment programs, and enhance the robustness and generalization ability of the model. Multimodal feature fusion enables the model to learn the characteristics of the data from multiple perspectives and reduces its dependence on single-mode data, which helps the model adapt to different datasets and clinical scenarios and reduces the risk of overfitting.
MRI can effectively distinguish the pathological features of different tissues. Multiparametric MRI can comprehensively display tumor heterogeneity, provide more sensitive quantitative and qualitative imaging markers, and help predict the EGFR mutation status of patients with lung cancer to guide clinical treatment. Currently, MRI radiomics research on EGFR mutation status in NSCLC focuses mainly on patients with advanced, metastatic lung cancer, whereas research on patients with early lung cancer is relatively limited. Fan Ying et al. [36] studied the EGFR mutation status of 230 patients with lung cancer complicated by brain metastasis. Their model showed good predictive efficiency, with AUCs of 0.896, 0.856, and 0.889 in the training, validation, and external validation cohorts, respectively. Cao et al. [37], in a multicenter study, showed that MRI radiomics has good predictive value for EGFR mutation status and subtypes in patients with spinal metastasis from lung cancer. They also showed that T1WI has a higher predictive efficiency than T2WI. The combined model integrating the two sequences and clinicopathological factors showed the best predictive efficiency for EGFR mutation status, including 19 and 21 site mutations: training cohort (0.829 vs. 0.885 vs. 0.919), validation cohort (0.760 vs. 0.777 vs. 0.811), and external validation (0.780 vs. 0.846 vs. 0.818). Park et al. [38] predicted EGFR mutation status in patients with brain metastasis using diffusion tensor imaging (DTI) and the T1-enhanced MRI sequence. The results showed that the prediction efficiency of integrating DWI, T1WI, and DTI sequences was significantly higher than that of single sequences, and the AUC, accuracy, sensitivity, and specificity in the test cohort were 0.73, 78.6%, 81.3%, and 76.9%, respectively. Additionally, Wang et al. [39] predicted the EGFR mutation status of NSCLC using multiparametric MRI radiomics.
The AUC of the diffusion coefficient prediction model was 0.805. Nonetheless, the multi-sequence prediction model showed better prediction efficiency, with nomogram AUCs of 0.925 in the training cohort and 0.727 in the validation cohort. Existing studies show that T1WI, T2WI, and CET1w sequences are routinely used to predict EGFR mutation status in NSCLC. In addition, although the role of DWI and DTI sequences in predicting EGFR mutations in patients with brain metastases needs further verification, the ability of DWI to reflect tumor heterogeneity by capturing subtle changes in the internal and surrounding microenvironment of the tumor, and thereby to predict EGFR mutation status, has been preliminarily confirmed. New diffusion imaging technologies such as time-dependent diffusion [40] are therefore expected to provide more specific characteristics of cell microstructure and to help identify treatment-induced changes in the tumor microenvironment. Such characteristics may be closely related to EGFR mutation status, providing a new perspective for characterizing tumor biology more comprehensively and deeply. The above studies show that multiparametric MRI radiomics can serve as a non-invasive examination that objectively describes the morphological and internal structural characteristics of lung cancer. Integrated with clinical and pathological features, it may replace biomarkers, achieving early, sensitive, and accurate prediction of EGFR mutation status in lung cancer, providing guidance for early targeted treatment of NSCLC, and improving prognosis.
Studies have shown that multiparametric MRI radiomics has some value in predicting the pathological classification of NSCLC. For example, models based on the imaging characteristics of MRI sequences such as T1WI, T2WI, and apparent diffusion coefficient (ADC) maps have been built to predict the pathological classification of NSCLC, achieving good results [41]. Studies on predicting EGFR mutation status are relatively few; nevertheless, the theoretical and technical basis of MRI radiomics and its successful application in other areas provide a foundation for this task. Compared with traditional pathological biopsy, MRI radiomics does not require an invasive procedure, which reduces pain and risk for patients, especially those who cannot tolerate biopsy or for whom biopsy is challenging. Additionally, MRI provides multidimensional information on tumor morphology, structure, and metabolism. Radiomics analysis can extract a large number of features, which helps to reflect the biological characteristics of tumors more comprehensively and to predict EGFR mutation status. When EGFR mutation status is accurately predicted, doctors can choose more appropriate targeted therapy drugs, optimize treatment strategies, improve the treatment effect, prolong patient survival, and improve prognosis. However, some challenges and limitations exist. First, the problem of data standardization: differences in imaging parameters (such as field strength, scanning sequence, and slice thickness) between MRI devices may lead to low feature repeatability, affecting the model’s generalizability. Second, feature redundancy and overfitting: high-dimensional image features often lead to multicollinearity, and existing dimensionality reduction methods may lose key biological information, leading to overfitting and insufficient generalization.
Finally, insufficient sample size and diversity: current studies are mostly single-center, small-sample studies that lack the support of multi-center, large-sample data. Hence, the extrapolation and universality of the models require further verification. In addition, the study participants were mostly patients with lung adenocarcinoma, and the applicability to other pathological types (such as lung squamous cell carcinoma) is not clear. We therefore make the following recommendations. First, establish standardized imaging protocols and formulate unified MRI scanning parameters and imaging processes. Second, optimize feature selection and model construction, combining biological knowledge and clinical experience to screen meaningful imaging features related to EGFR mutation, thereby avoiding feature redundancy and overfitting. Advanced machine learning algorithms (such as deep learning models) can be used to construct models and improve their accuracy and robustness. Third, conduct multi-center, large-sample prospective studies that include patients of different races, pathological types, and treatment stages, to verify the generalizability and clinical practicability of the model.
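One simple way to reduce the feature redundancy discussed above is to drop features that are highly correlated with a feature already kept. The sketch below illustrates that idea only; it is not the selection pipeline used in this study (which relied on a variance threshold, univariate selection, and LASSO), and the feature names, values, and the 0.9 threshold are assumptions chosen for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def prune_correlated(columns, threshold=0.9):
    """Greedily keep a feature only if |r| with every kept feature is below threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[other])) < threshold for other in kept):
            kept.append(name)
    return kept


# Made-up feature columns (one value per patient); names are hypothetical.
features = {
    "t2w_mean": [1.0, 2.0, 3.0, 4.0],
    "t2w_median": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with t2w_mean
    "cet1w_entropy": [4.0, 1.0, 3.0, 2.0],
}
selected = prune_correlated(features)
print(selected)
```

The greedy order matters: the first feature of a correlated group is the one retained, so in practice the columns would be ordered by some prior measure of relevance before pruning.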
Our study has several strengths. First, it addresses a knowledge gap: radiomics research on predicting the EGFR mutation status of patients with lung cancer currently focuses mainly on CT and PET/CT, and MRI research is relatively rare. MRI involves no ionizing radiation, and multiparametric MRI can non-invasively reveal various metabolic and pathological changes in living tissue at an early stage, providing functional imaging that plays a key role in identifying EGFR mutation status. This may explain the high prediction efficiency of the model established in this study. Second, current MRI radiomics research aimed at predicting EGFR mutation status in NSCLC focuses mainly on patients with advanced lung cancer, whereas this study focused on patients with early-stage NSCLC. Hence, the findings are relevant to the early diagnosis and treatment of patients with EGFR mutations in early lung cancer and can inform diagnostic and treatment planning in clinical practice. Moreover, targeted therapy can be used before surgery to improve patients’ long-term prognosis. Third, all patients were scanned using a standardized protocol on the same MRI device, which avoided heterogeneity caused by differences in scanning and reconstruction parameters and made the findings more stable and reliable. In addition, semi-automatic segmentation tools were used in our radiomics research, which largely limited inter-observer differences in manual delineation.
Despite these strengths, this study has several limitations. First, the retrospective design may have introduced selection bias. Second, this is a single-center study with a small sample size and no external validation; the validation cohort is of borderline size (n = 19), and the EGFR mutation status classes are unbalanced. In addition, because the data were divided into a training and a validation cohort, overfitting may be unavoidable, statistical power may be insufficient, and subgroup analysis could not be performed. Future multicenter studies should expand the sample size and establish a standardized process to improve clinical reliability. Third, mutation sites were not studied because of the small sample size. Prognosis and the benefit of targeted therapy vary between mutation sites, so it is important to identify the mutation site before targeted therapy; the next step will be to study the mutation sites. Finally, owing to the limited sample size, the DeLong test showed that the difference in AUC between the comprehensive prediction model and the multiparametric radiomics model was not statistically significant (p > 0.05). This result prompted us to reflect on whether introducing clinicopathological factors truly improves the model’s predictive efficiency. To clarify this issue, follow-up research should expand the sample size to increase statistical power and use more refined feature selection methods so that variable redundancy does not dilute the contribution of the imaging signal. In addition, a nonlinear fusion strategy or a machine learning ensemble method could be used to integrate multi-source features more effectively and to verify the added value of the comprehensive model in a larger sample.
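The effect of a 19-patient validation cohort on the precision of an AUC estimate can be illustrated with the Hanley-McNeil approximation for the standard error of an AUC. The sketch below is an illustration under stated assumptions rather than an analysis from this study: the 9/10 split between mutant and wild-type patients is hypothetical (the actual split is not given in this section), and the normal-approximation interval is crude for samples this small.

```python
import math


def auc_from_scores(positive_scores, negative_scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


def hanley_mcneil_se(auc, n_positive, n_negative):
    """Hanley-McNeil (1982) approximate standard error of an AUC."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_positive - 1) * (q1 - auc ** 2)
        + (n_negative - 1) * (q2 - auc ** 2)
    ) / (n_positive * n_negative)
    return math.sqrt(variance)


def approximate_ci(auc, n_positive, n_negative, z=1.96):
    """Normal-approximation confidence interval, clipped to the range [0, 1]."""
    se = hanley_mcneil_se(auc, n_positive, n_negative)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


# Assumed 9 mutant / 10 wild-type split of the 19 validation patients (hypothetical).
low, high = approximate_ci(0.859, n_positive=9, n_negative=10)
print(f"approximate 95% CI for AUC 0.859: {low:.2f} to {high:.2f}")
```

Under the assumed split the interval is wide, which is consistent with the limitation stated above that differences between models are hard to establish in a validation cohort of this size.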
Conclusion
Compared with single-sequence, single-parameter, and single-mode models, multiparametric MRI radiomics combined with clinicopathological features has greater predictive diagnostic value for EGFR mutation in patients with NSCLC, and can guide individualized targeted therapy.
Source: PubMed Central (JATS). The license follows the original publisher’s policy; please cite the original article when quoting.