Accurate diagnosis of non-small cell and small cell lung cancer by using machine learning models trained with physical science features extracted from pathological images.
APA
Shao W, Bao Y, et al. (2026). Accurate diagnosis of non-small cell and small cell lung cancer by using machine learning models trained with physical science features extracted from pathological images. Journal of Microscopy. https://doi.org/10.1111/jmi.70090
MLA
Shao W, et al. "Accurate diagnosis of non-small cell and small cell lung cancer by using machine learning models trained with physical science features extracted from pathological images." Journal of Microscopy, 2026.
PMID
41964375
Abstract
[BACKGROUND] Accurate differentiation between non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC) is crucial for optimising treatment strategies and improving patient outcomes in lung cancer management. Early and precise classification supports tailored therapeutic decisions and enhances prognosis prediction.
[PURPOSE] This study develops a novel method that uses machine learning models trained on physical science features extracted from pathological images to classify lung cancer as NSCLC or SCLC with high accuracy and robustness.
[METHODS] Physical science features were used to quantify the cellular microarchitecture of cancer cells in histopathological images. A Random Forest algorithm was applied to identify the 20 most informative features, which were then used to train and evaluate four machine learning classifiers: Support Vector Machine (SVM), Gradient Boosting, Logistic Regression, and Decision Tree. The dataset comprised pathological images from 240 histologically confirmed lung cancer cases, randomly divided into training (80%) and test (20%) sets. Model performance was evaluated using accuracy, recall, F1 score, and area under the receiver operating characteristic curve (AUC), and robustness was validated with fivefold cross-validation.
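The selection-and-evaluation protocol in the Methods paragraph can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`top_k_features`, `stratified_kfold`, `binary_metrics`, `roc_auc`, `cross_validate`) are hypothetical, the Random Forest importance scores are assumed to be computed elsewhere, labels are assumed to be coded 1 = SCLC and 0 = NSCLC, and the four classifiers are stood in for by a generic `fit` callback that returns a scoring function. Stratified fold assignment is also an assumption; the abstract states only "fivefold cross-validation".

```python
# Hypothetical sketch of top-k feature selection and k-fold evaluation.
# Assumption: label 1 = SCLC (positive class), label 0 = NSCLC.
import random
from typing import Callable, Dict, List, Sequence, Tuple

Row = Sequence[float]
# A fitted model maps one feature row to a score in [0, 1] for the positive class.
Scorer = Callable[[Row], float]


def top_k_features(importances: Sequence[float], k: int = 20) -> List[int]:
    """Indices of the k largest importance scores (ties broken by lower index)."""
    order = sorted(range(len(importances)), key=lambda i: (-importances[i], i))
    return order[:k]


def stratified_kfold(labels: Sequence[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds that roughly preserve the class ratio."""
    rng = random.Random(seed)
    ordered: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        ordered.extend(idx)
    # Dealing the class-grouped indices round-robin spreads each class across all folds.
    folds: List[List[int]] = [[] for _ in range(k)]
    for j, i in enumerate(ordered):
        folds[j % k].append(i)
    return folds


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Accuracy, recall and F1, treating label 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, recall, f1


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes in the evaluation set")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validate(
    X: Sequence[Row],
    y: Sequence[int],
    fit: Callable[[List[Row], List[int]], Scorer],
    k: int = 5,
    threshold: float = 0.5,
    seed: int = 0,
) -> List[Dict[str, float]]:
    """Train on k-1 folds, score the held-out fold, and collect per-fold metrics."""
    results = []
    for test_idx in stratified_kfold(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_test = [y[i] for i in test_idx]
        scores = [model(X[i]) for i in test_idx]
        preds = [1 if s >= threshold else 0 for s in scores]
        accuracy, recall, f1 = binary_metrics(y_test, preds)
        results.append({"accuracy": accuracy, "recall": recall, "f1": f1,
                        "auc": roc_auc(y_test, scores)})
    return results
```

For example, calling `cross_validate(X, y, fit)` with a `fit` that trains, say, a logistic regression on the training folds returns five dictionaries of accuracy, recall, F1 and AUC, from which a median accuracy or a per-fold AUC range can be read. Running the top-20 selection inside `fit` keeps each held-out fold out of the feature ranking.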
[RESULTS] Logistic Regression achieved the highest overall performance, with a median accuracy near 90% and an AUC consistently above 0.90 across fivefold cross-validation. SVM and Gradient Boosting followed closely, each surpassing 0.90 in AUC, demonstrating reliable discrimination between NSCLC and SCLC. Decision Tree showed broader variability, though it maintained acceptable recall for SCLC. Random Forest feature selection revealed refractive index percentiles and polarisation histograms as top contributors to model performance.
[CONCLUSIONS] Machine learning models trained on physical science features have the potential to serve as a highly accurate and robust framework for differentiating NSCLC from SCLC.
Highly cited papers by the same first author (5)
- Clinical trials of bispecific antibody therapy for colorectal cancer: advanced and next steps.
- A systematic review and meta-analysis of exposure-response analysis of osimertinib in patients with non-small-cell lung cancer.
- Targeted lipidomics meets transcriptomics: how cinobufagin rewires fatty acid, sphingolipid, and glycerophospholipid metabolism to combat hepatoma cell growth.
- Neural network-aided unsupervised input function estimation for dual-time-window PET Patlak analysis.
- Association between diffusion tensor imaging analysis along the perivascular space (DTI-ALPS)-based glial-lymphatic dysfunction and cognitive impairment in non-small cell lung cancer.