Non-small cell lung cancer subtype classification based on cross-scale multi-instance learning.
Citation: Jiang P, Chen W, et al. Non-small cell lung cancer subtype classification based on cross-scale multi-instance learning. Scientific Reports 15, 43210 (2025). https://doi.org/10.1038/s41598-025-27337-7. PMID: 41350354.
Abstract
Lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC), the two major subtypes of non-small cell lung cancer (NSCLC), present significant diagnostic challenges with direct implications for treatment planning. In this study, we propose a novel multi-instance learning (MIL) pathological image classification model that incorporates an additive attention mechanism and a new category classifier to enhance subtype discrimination. The model further integrates a cross-scale focal region detection strategy to improve sensitivity to key histological features. Trained on the Cancer Genome Atlas (TCGA) dataset, our model achieved a subtype classification accuracy (ACC) of 97.0% and an area under the ROC curve (AUC) of 0.978, outperforming state-of-the-art methods including ABMIL, CLAM, DS-MIL, DTFD-MIL, FR-MIL, and WIKG-MIL across multiple evaluation metrics. Ablation studies validate the contribution of each module to overall performance improvement. Generalization experiments conducted on the Clinical Proteomic Tumor Analysis Consortium (CPTAC) Cancer Imaging Archive (TCIA) Lung dataset and an external dataset from Yantai Yuhuangding Hospital demonstrate the robustness of our model, achieving ACCs of 91.2% and 93.0%, and AUCs of 0.967 and 0.968, respectively. These results underscore the model's strong generalization ability and its potential as a reliable tool for accurate NSCLC subtype classification across diverse clinical scenarios.
Introduction
Lung cancer remains the leading cause of cancer-related mortality globally, with an estimated 1.8 million deaths (18.7% of all cancer deaths) reported in 2022, according to the International Agency for Research on Cancer (IARC)1. Among all lung cancers, approximately 80–85% are classified as non-small cell lung cancer (NSCLC), which includes two major histological subtypes: lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC). Accurate identification of these subtypes is crucial for determining treatment strategies2.
With the advancement of Whole Slide Image (WSI) technology, histopathological analysis has entered the digital era3. Deep learning models applied to WSIs have demonstrated great promise in automating subtype classification. However, most of these approaches require exhaustive pixel- or tile-level annotations, which are time-consuming and heavily dependent on expert pathologists4. To alleviate this burden, weakly supervised learning strategies, particularly multi-instance learning (MIL), have been introduced. MIL leverages slide-level labels to infer instance-level (tile-level) features, enabling more scalable and annotation-efficient model training. Despite their promise, existing MIL approaches often exhibit two key limitations: (1) limited capacity to capture nuanced morphological differences between LUAD and LUSC, and (2) inadequate integration of multi-scale contextual information, which is vital for accurate diagnosis.
To overcome these challenges, we propose a novel MIL-based framework that mimics the diagnostic workflow of pathologists through a coarse-to-fine, cross-scale analysis paradigm. Specifically, the model first conducts a global assessment of low-magnification WSIs to identify candidate regions of interest (ROIs), simulating the initial screening phase in routine pathology. These regions are subsequently examined at higher magnification to extract fine-grained morphological features, thereby emulating the pathologist’s zoom-in process for detailed inspection. Furthermore, we introduce an additive attention mechanism and a category-specific classifier to improve the model’s capacity for subtype discrimination. To mitigate inconsistencies between different magnification levels, we also incorporate a tagging and prompting mechanism to promote consistent and reliable diagnostic outcomes.
The main contributions of this study are as follows:
- We propose a novel MIL framework that integrates an attention mechanism and a category-specific classifier to enhance the model’s ability to recognize and distinguish subtle histopathological features of NSCLC subtypes. This framework addresses the challenge of limited annotations in pathological images and improves the robustness of subtype classification.
- Inspired by the diagnostic workflow of pathologists, our model employs a coarse-to-fine, multi-scale detection strategy that simulates the human screening process. This hierarchical approach not only reflects real-world clinical reasoning but also enables the model to capture both global contextual cues and fine-grained morphological variations critical for accurate subtype discrimination.
- To address the challenge of prediction inconsistency across different magnification levels, we introduce a cross-scale consistency mechanism. By integrating detection results from multiple magnifications and automatically flagging discrepant cases for further review, our model enhances diagnostic reliability and supports the identification of diagnostically relevant regions. This contributes to the interpretability and clinical applicability of AI-assisted histopathological analysis.
Related work
Multi-instance learning
MIL has garnered increasing attention in the field of WSI analysis, primarily due to its unique advantages in handling incomplete data labeling. In the MIL framework, the training set consists of data labeled as “bags,” where each bag is a collection of unlabeled instances5. Specifically, in WSI analysis, if at least one instance within a bag is a positive sample, the bag is labeled as positive; if all instances are negative samples, the bag is labeled as negative. The objective of MIL is to predict the labels of new bags, which is particularly crucial for datasets where detailed annotations are difficult to obtain6.
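To make the standard MIL assumption concrete, here is a minimal sketch (ours, for illustration; not from the paper) of how a bag label follows from its instances:

```python
from typing import List

def bag_label(instance_labels: List[int]) -> int:
    """Standard MIL assumption: a bag is positive (1) if at least one
    instance is positive, and negative (0) only if every instance is."""
    return int(any(label == 1 for label in instance_labels))

# A WSI ("bag") of tiles ("instances"): one tumor tile makes the slide positive.
print(bag_label([0, 0, 1, 0]))  # -> 1
print(bag_label([0, 0, 0, 0]))  # -> 0
```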
The popularity of MIL in WSI analysis is driven by two key factors. First, it requires only slide-level (weak) labels for training and implementation, eliminating the need for detailed tissue-level annotations. This significantly reduces annotation costs and improves efficiency. Second, MIL can effectively handle the incompleteness of instance-level annotations in WSIs. WSIs typically contain numerous image tiles, each of which can be considered an instance. Pathologists often provide diagnostic results for the entire slide rather than detailed annotations for each tile.
The application of MIL in WSI analysis extends beyond basic cancer detection to more complex tasks such as cancer subtype classification and tumor grading. Courtiol et al.7 applied MIL to cancer subtype classification via WSI, specifically in distinguishing LUAD from LUSC within non-small cell lung cancer (NSCLC) subtypes. In the field of tumor grading, Bulten et al.8 used two pre-trained models to outline tumor contours and remove epithelial tissue from WSIs, then applied pathologist-reported Gleason patterns for labeling, demonstrating the potential of MIL in tumor grading.
To further enhance the performance of MIL in WSI analysis, several advanced models have been proposed. The ABMIL model9 introduces an attention mechanism to uncover the contribution of each instance to the bag label, thereby enhancing model interpretability. The CLAM model10 leverages attention mechanisms to identify diagnostically significant subregions and combines instance-level clustering to optimize the feature space. This approach effectively addresses multiclass subtype problems while maintaining efficiency under limited data conditions. The DS-MIL model11 employs an innovative MIL aggregator with two parallel streams: one identifies the key instance (the instance with the highest score), while the other calculates the distances between instances and the key instance, using these distances as weights for soft instance selection. By integrating features from WSIs at varying magnifications, DS-MIL improves the recognition of critical regions.
Building upon ABMIL, the DTFD-MIL model12 derives instance probabilities, providing a more reliable metric for detecting positive regions than traditional attention scores. Its feature distillation and dual-layer MIL architecture enhance performance on small samples and imbalanced datasets. The FR-MIL model13 introduces a feature recalibration technique to optimize MIL performance by adjusting the distribution of key instance features. Additionally, it proposes a novel metric-based loss function to improve bag separability and incorporates a transformer-based pooling method to derive bag-level representations from instance features. The WIKG-MIL14 model adopts an innovative dynamic graph representation approach, dynamically constructing neighbors and directed edge embeddings for WSIs through head-tail embeddings. It integrates a knowledge-aware attention mechanism capable of learning joint attention scores for each neighbor and edge to effectively update the head node features.
Multi-scale in digital pathology
In digital pathology, the morphological features of tissue samples exhibit different characteristics at different scales. Pathologists must carefully examine biopsy samples across multiple magnifications to capture the key morphological patterns essential for disease diagnosis. When processing high-resolution WSIs, however, analyzing information across multiple scales becomes a significant challenge. Traditional supervised deep learning systems often require pixel-level or tile-level annotations, which are costly and labor-intensive to produce; moreover, most existing pathology models are trained on standard public datasets, which may not accurately reflect real-world applications.
To address these challenges, researchers have proposed various multi-scale learning strategies. Hashimoto et al.15 proposed a method based on convolutional neural networks (CNNs) that integrates multi-instance learning, domain adversarial learning, and multi-scale learning frameworks to combine knowledge from different scales. Deng et al.16 introduced the Cross-scale Multi-Instance Learning (CS-MIL) algorithm, which features an attention-based “early fusion” paradigm that explicitly models interactions between scales during the feature extraction phase.
Although existing multi-scale learning methods have made significant progress in integrating features at different magnifications, they typically rely on feature concatenation, which cannot precisely locate lesion regions at low magnification. These methods overlook the importance of first identifying lesion areas at low magnification and then examining those regions in detail at higher magnifications. To address this limitation, this study proposes an innovative cross-scale learning strategy that efficiently identifies lesion regions at low magnification and subsequently performs in-depth analysis of these critical regions at higher magnifications, offering a more accurate and comprehensive solution for multi-scale analysis in digital pathology.
Methods
Data acquisition
The dataset used in this study was derived from the LUSC and LUAD datasets of The Cancer Genome Atlas (TCGA). We collected 389 images of lung squamous cell carcinoma from TCGA-LUSC and 412 images of lung adenocarcinoma from TCGA-LUAD, for a total of 801 images. To evaluate the model’s generalization ability and reduce the risk of overfitting, we split the dataset into separate training, validation, and test sets, using a ratio of 5:2.5:2.5 to ensure adequate data for each stage of the learning and evaluation process. To avoid label leakage, we ensured that the training, validation, and test sets were strictly separated at the patient level, with no overlap of WSIs across different subsets.
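A patient-level split of this kind can be implemented as sketched below (an illustration under the assumption that TCGA slide IDs embed the case ID in their first three fields; not the authors' code):

```python
import random
from collections import defaultdict

def patient_level_split(slide_ids, ratios=(0.5, 0.25, 0.25), seed=42):
    """Split WSIs 5:2.5:2.5 at the patient level so that no case
    contributes slides to more than one subset. Assumes TCGA-style IDs,
    e.g. 'TCGA-05-4244-01Z-00-DX1' -> case 'TCGA-05-4244'."""
    by_patient = defaultdict(list)
    for sid in slide_ids:
        by_patient["-".join(sid.split("-")[:3])].append(sid)
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    n = len(patients)
    cut1, cut2 = int(ratios[0] * n), int((ratios[0] + ratios[1]) * n)
    groups = (patients[:cut1], patients[cut1:cut2], patients[cut2:])
    return [[sid for p in group for sid in by_patient[p]] for group in groups]

train_ids, val_ids, test_ids = patient_level_split(
    ["TCGA-05-4244-01Z-00-DX1", "TCGA-05-4249-01Z-00-DX1",
     "TCGA-05-4249-01Z-00-DX2", "TCGA-22-1000-01Z-00-DX1"])
```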
Data preprocessing
In this study, to crop WSIs into smaller patches suitable for model input and to remove background regions, we employed a series of image processing techniques. First, we converted each WSI from the RGB color space to the HSV color space to better separate color information from luminance. Next, median blurring was applied to the saturation channel to reduce image noise and improve the accuracy of subsequent processing. The image was then binarized through thresholding, and edge smoothing was applied to generate a binary mask of the tissue regions. To further refine the mask, morphological closing was performed to fill small gaps and holes, ensuring the mask’s completeness. Based on an area threshold, we filtered the approximate contours of the foreground objects to remove noise and small irrelevant regions, yielding a more accurate tissue region mask9. Finally, patches of 256 × 256 pixels were cropped at 10× and 20× magnifications based on the mask, ensuring that each patch contained sufficient tissue information while avoiding overlap. The specific process is illustrated in Fig. 1a. This approach not only significantly improves data processing efficiency but also reduces redundant computation during model training, laying the foundation for efficient training of subsequent models.
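A minimal sketch of this masking pipeline using OpenCV is shown below; the kernel sizes and the area threshold are illustrative assumptions, not the paper's exact settings:

```python
import cv2
import numpy as np

def tissue_mask(rgb_thumbnail: np.ndarray, area_thresh: float = 1e4) -> np.ndarray:
    """Segment tissue from background on a WSI thumbnail, mirroring the
    described steps: RGB->HSV, median-blur the saturation channel,
    threshold, morphological closing, then drop small contours."""
    hsv = cv2.cvtColor(rgb_thumbnail, cv2.COLOR_RGB2HSV)
    sat = cv2.medianBlur(hsv[:, :, 1], 7)                       # denoise saturation
    _, binary = cv2.threshold(sat, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (9, 9))
    closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)  # fill gaps/holes
    contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    mask = np.zeros_like(closed)
    for cnt in contours:
        if cv2.contourArea(cnt) >= area_thresh:                 # keep large tissue
            cv2.drawContours(mask, [cnt], -1, 255, thickness=-1)
    return mask
```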
Furthermore, we employed a ResNet17 feature extractor pretrained on ImageNet to obtain compact representations for each image patch. Leveraging a pretrained model enables the transfer of rich semantic features learned from large-scale natural image datasets, which is beneficial for downstream classification performance. Compared to directly using raw image patches, this approach significantly reduces the input dimensionality and allows all patches from a whole slide to be processed simultaneously within GPU memory. This eliminates the need for patch sampling and reduces the risk of introducing noisy labels, thereby improving both the efficiency and accuracy of model training10.
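As an illustration, a truncated ImageNet-pretrained ResNet-50 from torchvision can serve as such a patch-level feature extractor (the paper does not specify the exact ResNet variant or truncation point, so these are assumptions):

```python
import torch
import torchvision

# ImageNet-pretrained ResNet-50 with the classification head removed:
# each 256x256 patch is mapped to a single 2048-d embedding.
backbone = torchvision.models.resnet50(weights="IMAGENET1K_V1")
extractor = torch.nn.Sequential(*list(backbone.children())[:-1]).eval()

@torch.no_grad()
def embed_patches(patches: torch.Tensor) -> torch.Tensor:
    """patches: (N, 3, 256, 256), ImageNet-normalized -> (N, 2048) features."""
    return extractor(patches).flatten(1)
```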
Model construction
In this study, we propose a model termed Cross-scale Focus Area Multi-instance Learning (CFAMIL), designed to improve both the accuracy and efficiency of pathological image analysis, as illustrated in Fig. 1b. Building upon the CLAM model, we introduce a novel additive attention mechanism (AddNet), as shown in Fig. 1c. Specifically, input features are processed through two parallel linear layers, which extract different aspects of the input representations. The outputs are then merged and passed through a non-linear transformation using a Tanh activation function. Finally, the transformed features are integrated via a linear layer to capture complex feature interactions, thereby enhancing the model’s sensitivity to critical regions and improving its overall feature extraction capability. Additionally, the CFAMIL model incorporates a new category classifier, as illustrated in Fig. 1d. Unlike traditional MIL models, CFAMIL assigns independent linear classifiers to LUAD and LUSC, enabling more fine-grained learning of class-specific features. This design not only enhances the model’s sensitivity to inter-category differences but also improves its robustness when handling imbalanced datasets. The classifier consists of several independent neural network branches, each corresponding to a specific category and responsible for learning and predicting the input features for that category. Each branch comprises: (1) a linear layer that maps the input features into a lower-dimensional latent space; (2) a non-linear activation function to enhance model expressiveness; (3) a dropout layer that randomly deactivates neurons to prevent overfitting and improve generalization; and (4) a final linear layer that maps the intermediate features to a scalar output representing the prediction score for the corresponding class. This architecture provides a customized classification pathway for each category while maintaining parameter efficiency, thereby achieving more accurate recognition and finer class discrimination.
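The description above corresponds roughly to the following PyTorch sketch of AddNet and the category-specific classifier (our paraphrase of Fig. 1c,d; hidden dimensions, the dropout rate, and the summation used to merge the two branches are assumptions):

```python
import torch
import torch.nn as nn

class AddNet(nn.Module):
    """Additive attention: two parallel linear branches are summed,
    passed through Tanh, and projected to one attention score per instance."""
    def __init__(self, in_dim=512, hidden=256):
        super().__init__()
        self.branch_a = nn.Linear(in_dim, hidden)
        self.branch_b = nn.Linear(in_dim, hidden)
        self.score = nn.Linear(hidden, 1)

    def forward(self, h):                          # h: (num_patches, in_dim)
        merged = torch.tanh(self.branch_a(h) + self.branch_b(h))
        return self.score(merged)                  # (num_patches, 1) raw scores

class CategoryClassifier(nn.Module):
    """One independent branch per subtype (LUAD, LUSC): linear -> LeakyReLU
    -> dropout -> linear, each emitting a scalar class score."""
    def __init__(self, in_dim=512, hidden=256, num_classes=2, p_drop=0.25):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, hidden), nn.LeakyReLU(),
                          nn.Dropout(p_drop), nn.Linear(hidden, 1))
            for _ in range(num_classes))

    def forward(self, bag_feat):                   # bag_feat: (1, in_dim)
        return torch.cat([b(bag_feat) for b in self.branches], dim=1)  # (1, C)
```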
Finally, the CFAMIL model introduces an instance-level evaluation mechanism10, which allows the model to evaluate and optimize the attention distribution of each instance during training. This mechanism refines the understanding of the model at the instance level by computing instance-level losses on attention branches within and outside the category, thereby enabling more precise predictions and deeper analysis in multimodal tasks.
In this model, we employ the LeakyReLU activation function18 to mitigate the neuron-death problem that may arise with the traditional ReLU activation during training. Unlike ReLU, which outputs zero for negative input values, LeakyReLU allows a small negative output by introducing a slope parameter $\alpha$ that is multiplied by the input value. Specifically, the mathematical expression for LeakyReLU is given by Eq. (1):
$$f(x) = \begin{cases} x, & x \ge 0 \\ \alpha x, & x < 0 \end{cases} \qquad (1)$$
For optimizer selection, we employed Nadam19 as the model optimizer. Nadam combines the strengths of the Adam optimizer and the Nesterov Accelerated Gradient (NAG), enhancing gradient prediction through the incorporation of Nesterov momentum. The Nadam algorithm leverages both the first-order and second-order moment estimates of gradients from Adam while utilizing the Nesterov momentum to update the parameters in advance. This approach enables the optimizer to capture the direction of the gradient more effectively, thereby accelerating convergence. The update rule for Nadam is presented in Algorithm 1.
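In PyTorch, this optimizer is available as torch.optim.NAdam; a minimal usage sketch follows (the learning rate and weight decay are illustrative placeholders, not the paper's reported settings; see Table 1):

```python
import torch

model = torch.nn.Linear(512, 2)  # stand-in for the CFAMIL network
optimizer = torch.optim.NAdam(model.parameters(), lr=1e-4, weight_decay=1e-5)

loss = model(torch.randn(8, 512)).logsumexp(dim=1).mean()  # dummy objective
loss.backward()
optimizer.step()       # Adam-style moments with a Nesterov momentum correction
optimizer.zero_grad()
```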
Key area location
In the field of digital pathology, analyzing WSIs is a complex and time-intensive process. To improve the efficiency and accuracy of diagnosis, we propose a detection pipeline based on key region extraction that mimics the diagnostic workflow of pathologists (as shown in Fig. 2a). Unlike traditional multi-scale detection algorithms, which typically rely on simple concatenation of features from different layers (as shown in Fig. 2b), the proposed algorithm automatically identifies and extracts critical regions within pathological slides, providing more precise target areas for subsequent in-depth analysis.
The overall multi-scale detection workflow of the CFAMIL model is illustrated in Fig. 2c. Initially, we trained the model using 10× magnification data to obtain the preliminary training weights. These weights are then utilized to compute attention scores for patches in the target pathology image at the 10× scale, where each score reflects the model’s evaluation of the importance of each patch.
We subsequently rank these attention scores and convert them into percentiles, yielding a focused attention score array at the 10× magnification level. Based on a predefined threshold, we select the patch coordinates whose attention percentile exceeds 70%. These coordinates correspond to critical regions in the pathology slide that warrant further analysis and review. At the 20× magnification level, we reslice these focal regions and extract their features. Using the weights trained at the 20× magnification level, we then perform predictions on the extracted features to obtain the final results.
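A sketch of this percentile-based selection step (the 70% cutoff follows the text; array shapes and names are our assumptions):

```python
import numpy as np

def select_focus_coords(attn_scores: np.ndarray, coords: np.ndarray,
                        cutoff: float = 70.0) -> np.ndarray:
    """Rank 10x patch attention scores as percentiles and keep the patch
    coordinates above the cutoff for re-slicing at 20x magnification."""
    ranks = attn_scores.argsort().argsort()          # rank 0..N-1 per patch
    percentiles = 100.0 * ranks / (len(attn_scores) - 1)
    return coords[percentiles > cutoff]              # (M, 2) kept coordinates
```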
Compared to conventional single-scale detection strategies, the proposed method captures multi-scale information from histopathological images, substantially improving detection accuracy. As illustrated in Fig. 2c (Evaluation of Multi-scale Detection Results), our system integrates detection outcomes from 10× and 20× magnifications. When discrepancies arise between predictions at different scales, the system automatically flags the corresponding image with a “Discrepancy Detected” alert and generates a report highlighting these discordant cases. This facilitates targeted review by pathologists, allowing them to focus on and further investigate diagnostically critical regions.
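The cross-scale consistency check then reduces to comparing the two slide-level predictions; a minimal sketch (label strings and report format are assumptions):

```python
def cross_scale_report(pred_10x: str, pred_20x: str, slide_id: str) -> dict:
    """Flag slides whose 10x and 20x predictions disagree for pathologist review."""
    consistent = pred_10x == pred_20x
    return {
        "slide": slide_id,
        "prediction": pred_20x if consistent else "Discrepancy Detected",
        "needs_review": not consistent,
    }

print(cross_scale_report("LUAD", "LUSC", "slide_001"))
# {'slide': 'slide_001', 'prediction': 'Discrepancy Detected', 'needs_review': True}
```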
Experiment setup
To ensure the rigor of the experiments and minimize random errors, this study maintained consistent parameters across the comparative models to guarantee uniform experimental conditions. Additionally, the performance of different models at varying magnification levels was evaluated, with a particular focus on their accuracy in detecting and classifying lung cancer. The experiments were conducted on Ubuntu 22.04 with CUDA 11.8, Python 3.10, and PyTorch 2.1.2, using two Nvidia RTX 3080 GPUs and an Intel Xeon Platinum 8352V CPU. Table 1 provides detailed experimental parameters and relevant explanations.
Model evaluation
In this study, we utilized several key metrics to evaluate the performance of the model comprehensively, including accuracy (ACC), area under the curve (AUC), sensitivity, and specificity. The ACC was selected to measure the overall performance of the model, representing the proportion of correct predictions. Sensitivity refers to the proportion of true positive samples correctly identified as positive, whereas specificity denotes the proportion of true negative samples correctly identified as negative. The AUC metric was employed to assess the ability of the model to distinguish between positive and negative classes, with values closer to 1 indicating stronger classification performance.
The metrics are computed as follows:
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (2)$$
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \qquad (3)$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP} \qquad (4)$$
In Eqs. (2), (3), and (4), TP (True Positive) represents the number of samples correctly classified as positive, TN (True Negative) denotes the number of samples correctly classified as negative, FP (False Positive) is the number of samples incorrectly classified as positive, and FN (False Negative) is the number of samples incorrectly classified as negative.
$$\mathrm{AUC} = \frac{1}{N^{+} N^{-}} \sum_{i=1}^{N^{+}} \sum_{j=1}^{N^{-}} \mathbb{I}\!\left(s_{i}^{+} > s_{j}^{-}\right) \qquad (5)$$
In Eq. (5), $N^{+}$ represents the number of positive samples, $N^{-}$ denotes the number of negative samples, $s_{i}^{+}$ is the predicted score for the $i$-th positive sample, and $s_{j}^{-}$ is the predicted score for the $j$-th negative sample.
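For a quick sanity check, the same four metrics can be computed with scikit-learn on toy labels (an illustration, not the study's data or code):

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 0])            # 1 = LUAD, 0 = LUSC (toy labels)
y_score = np.array([0.9, 0.2, 0.8, 0.4, 0.3, 0.6])
y_pred = (y_score >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
acc = accuracy_score(y_true, y_pred)             # Eq. (2)
sensitivity = tp / (tp + fn)                     # Eq. (3)
specificity = tn / (tn + fp)                     # Eq. (4)
auc = roc_auc_score(y_true, y_score)             # Eq. (5)
print(acc, sensitivity, specificity, auc)
```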
Results
Comparative experiment results
In our study, we comprehensively summarized the experimental results of the aforementioned models on the utilized datasets, with the detailed data presented in Table 2 and Fig. 3. The table includes not only existing models such as ABMIL, CLAM-SB, CLAM-MB, DS-MIL, DTFD-MIL, FR-MIL, and WIKG-MIL but also our proposed model. Through thorough comparisons and analyses, we provide a holistic view of each model’s performance on the key evaluation metrics. We adopted identical parameters and settings across all comparative experiments to ensure the consistency and rigor of the results, guaranteeing an unbiased evaluation of model performance. According to the experimental data, our model achieved ACC and AUC scores of 0.970 and 0.978, respectively, surpassing the performance of existing models. This accomplishment underscores the exceptional capability and significant potential of our model in NSCLC subtype classification tasks.
Ablation experiment results
In this study, we conducted a series of ablation experiments to evaluate the impact of different model configurations on performance. The results of these experiments are summarized in Table 3, which includes the performance of CLAM-MB, our CFAMIL model under single-scale conditions, and its performance under multi-scale conditions.
The purpose of these ablation studies is to investigate the model’s performance across different scales and to assess the effectiveness of cross-scale focal region detection in enhancing model accuracy. The experimental results indicate that the CFAMIL model outperforms other baseline models in key performance metrics, such as the ACC and AUC. These findings further validate the efficacy of our model design, particularly in the context of cross-scale feature fusion, offering new perspectives and methodologies for the field of cross-scale multi-instance learning.
Generalization experiment results
To verify the broad applicability and scalability of our model, we further tested its performance on two independent datasets: the Clinical Proteomic Tumor Analysis Consortium (CPTAC) Cancer Imaging Archive (TCIA)20, and pathological images from Yantai Yuhuangding Hospital. These datasets provide an additional platform for evaluating model performance across different sources and conditions.
TCIA dataset: A total of 179 LUAD and 182 LUSC WSIs were obtained from the CPTAC TCIA pathology portal. To ensure consistency and comparability, all images underwent the same preprocessing pipeline as those from the TCGA dataset. The dataset was subsequently partitioned into training, validation, and test sets in a 5:2.5:2.5 ratio.
Yantai Yuhuangding hospital dataset: We collaborated with Yantai Yuhuangding Hospital to obtain 36 pathological images of LUAD and 7 images of LUSC. All image data were acquired and used in accordance with relevant ethical guidelines and legal regulations, with approval from the Ethics Committee of Yantai Yuhuangding Hospital (approval number 2025-698). The images were subjected to the same standardized preprocessing pipeline as the TCGA dataset. This cohort was used exclusively to evaluate the generalization performance of the model trained on the TCGA dataset. Owing to the retrospective nature of the study, the Ethics Committee of Yantai Yuhuangding Hospital waived the requirement for informed consent.
Table 4 and Figs. 4 and 5 summarize the classification performance of the proposed CFAMIL model on the CPTAC TCIA Lung and Yantai Yuhuangding Hospital datasets. On the TCIA dataset, CFAMIL achieved an ACC of 0.912 and an AUC of 0.967. On the external Yantai Yuhuangding Hospital dataset, the model maintained high performance, with an ACC of 0.930 and an AUC of 0.968.
These results demonstrate that CFAMIL consistently outperforms baseline methods across most evaluation metrics, validating its strong generalization capability. The model’s stable performance on both internal and external datasets highlights its robustness and reliability in real-world clinical scenarios, where variations in staining, scanning equipment, and population characteristics often pose significant challenges to model generalization.
Model interpretability
LUAD and LUSC exhibit significant differences in terms of cell morphology, pathological structure, and disease progression. LUAD typically originates from glandular cells of the lung and is characterized by nuclear atypia and unique glandular structures, whereas lung squamous cell carcinoma arises from squamous epithelial cells on the respiratory tract surface, with pathological features such as keratinization and distinct intercellular bridges. Therefore, the ability of our model to accurately differentiate between these two subtypes of lung cancer is essential, as it directly affects whether patients receive accurate diagnoses and timely and effective treatments.
In Fig. 6, we present a detailed visualization of the model’s ability to distinguish between LUAD and LUSC. The figure shows the model’s performance on different datasets, with images shown in three parts: the leftmost image shows the cancerous regions annotated by pathologists, the middle image is a visualization heatmap generated by the model, and the rightmost image highlights the areas with high attention. By comparing the annotated images with the regions of high attention in the model’s heatmap, we clearly demonstrate how the model accurately identifies and distinguishes the pathological features of LUAD and LUSC, revealing the distribution of the model’s focus.
In these heatmaps, the model’s sensitivity to the specific pathological features of LUAD and LUSC is particularly noteworthy. For LUAD images, the model focuses on regions with nuclear atypia and glandular structures, which are characteristic of this subtype. The heatmap highlights these critical areas in red, indicating the patches that the model attends to most strongly, which correspond to the most significant cancerous areas. Similarly, for LUSC images, the model focuses on areas with keratinization and intercellular bridges, the key pathological features of this subtype. The degree of focus in these regions is positively correlated with the degree of malignancy of the tissue, providing valuable visual cues for pathologists and precisely pointing to the potential malignant changes identified by the model.
Through this intuitive comparison, the model successfully identifies the distinct pathological features of LUAD and LUSC, demonstrating its exceptional interpretability and clinical applicability in differentiating between these two subtypes of lung cancer.
Discussion
In this study, we propose an innovative MIL framework for cross-scale key region detection to classify NSCLC subtypes. The superior performance of our method over existing MIL-based approaches can be attributed to three primary factors. First, the additive attention mechanism improves feature aggregation, focusing on diagnostically relevant instances more effectively than standard attention mechanisms. Second, the separate category classifier enhances subtype discrimination by capturing inter-class heterogeneity, which is often overlooked by general-purpose classifiers. Third, the cross-scale focal region detection strategy enables the model to integrate morphological cues across multiple magnification levels, a critical factor in histopathological image interpretation. Collectively, these architectural innovations contribute to improved sensitivity, robustness, and generalization, as reflected by consistently high performance on both internal and external datasets.
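To make the first factor concrete, the sketch below shows gated additive attention pooling in the ABMIL style on which this line of work builds; it is a generic illustration, not CFAMIL's exact layer, whose dimensions and gating are given in the Methods.

```python
# Generic gated additive attention pooling in the ABMIL style (Ilse et al., 2018);
# a sketch of the attention family the paper builds on, not CFAMIL's exact layer.
import torch
import torch.nn as nn

class AdditiveAttentionPool(nn.Module):
    def __init__(self, dim=512, hidden=256):
        super().__init__()
        self.V = nn.Linear(dim, hidden)  # tanh branch
        self.U = nn.Linear(dim, hidden)  # sigmoid gate branch
        self.w = nn.Linear(hidden, 1)    # scalar attention score per instance

    def forward(self, h):                # h: (num_tiles, dim) bag of tile features
        scores = self.w(torch.tanh(self.V(h)) * torch.sigmoid(self.U(h)))  # (N, 1)
        a = torch.softmax(scores, dim=0)  # attention weights sum to 1 over the bag
        z = (a * h).sum(dim=0)            # (dim,) slide-level representation
        return z, a.squeeze(-1)

# Usage: bag = torch.randn(1000, 512); slide_repr, attn = AdditiveAttentionPool()(bag)
```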
Compared with previous studies, our proposed model achieves substantial improvements in ACC. We performed a comprehensive comparative analysis against existing MIL models, including ABMIL, CLAM, DS-MIL, DTFD-MIL, FR-MIL, and WIKG-MIL. The experimental results demonstrate that our model outperforms these methods on critical metrics such as ACC and AUC, achieving an ACC of 97% on the TCGA Lung dataset. Unlike other multi-scale detection approaches, our ROI localization process not only integrates multi-scale pathological features but also effectively reduces interference from irrelevant regions in pathology slides, improving detection efficiency. Although WIKG-MIL performs relatively well in the comparative studies, its knowledge-guided attention mechanism must compute attention scores between each node and its neighbors, so the number of matrix operations grows with the neighborhood size rather than linearly with the number of tiles, demanding substantial computational resources. In contrast, our model is less resource-intensive and compatible with a broader range of devices. The generalization experiments further confirmed that CFAMIL maintains its advantage on key metrics when applied directly to detection tasks on other datasets.
Beyond numerical performance, an important consideration is the model’s potential clinical applicability. Although CFAMIL generates predictions rapidly, we did not conduct formal statistical analyses of diagnostic time, so no definitive conclusions can be drawn on that point. Nevertheless, the model’s ability to operate continuously without manual intervention highlights its practical advantages in real-world workflows. This is consistent with recent advances in AI-assisted clinical tools: for example, Ding et al. demonstrated that prompt-engineered ChatGPT can generate electronic medical records before patient visits, improving workflow efficiency in lung nodule screening21. In digital pathology, Feng et al. proposed STASNet for real-time detection of spread-through-air-space patterns in lung adenocarcinoma, achieving high tile-level accuracy and successful deployment in diagnostic settings22. Similarly, Ding et al. developed a ResNet34-based model for LUAD subtype classification that produces spatial prognosis scores and performs comparably to senior pathologists23. Like these studies, our CFAMIL framework complements existing AI-assisted clinical tools, emphasizing cross-scale feature detection while maintaining computational efficiency and strong generalizability, which makes it well suited to integration into high-throughput diagnostic pipelines.
Moreover, CFAMIL may provide substantial benefits in regions with limited access to experienced pathologists. In such contexts, automated tools can help less-experienced clinicians improve diagnostic accuracy and provide reliable support, echoing findings from related studies on AI-assisted pathology. By alleviating routine diagnostic workload, our model can enable pathologists to devote more time to complex or ambiguous cases requiring expert interpretation, complementing rather than replacing human expertise. In future work, we plan to perform systematic and statistically rigorous comparisons between CFAMIL and pathologists, evaluating not only diagnostic efficiency but also accuracy and inter-observer consistency, to further validate its practical advantages.
However, this study has several limitations. The current framework was primarily validated in distinguishing LUAD from LUSC, demonstrating its feasibility but not yet covering the full spectrum of NSCLC subtypes. Future research should extend the classification to include more detailed LUAD subtypes and other clinically relevant categories, while further optimization of the architecture may enhance accuracy in subtype identification. Despite these limitations, the findings highlight the potential of our framework to advance lung cancer diagnostics and provide a foundation for more personalized and precise therapeutic strategies.
Conclusions
This study presents a multi-instance learning approach for cross-scale focal region detection designed specifically for the classification of NSCLC subtypes. By employing a novel additive attention mechanism and category classifier in conjunction with our multi-scale feature detection pipeline, we significantly enhanced the accuracy of NSCLC subtype classification. In comparative analyses with previous multi-instance learning models, including ABMIL, CLAM, DS-MIL, DTFD-MIL, FR-MIL, and WIKG-MIL, our method demonstrates substantial improvements in key metrics such as ACC and AUC, with up to 97% of pathological images correctly classified. This highlights the efficiency and feasibility of our system for NSCLC classification. Generalization experiments further validate the model’s strong adaptability: it continues to outperform other models when transferred directly to different datasets, laying a solid foundation for its application to a broader range of medical image classification tasks.
By integrating cutting-edge machine learning techniques with medical imaging, this study not only adds significant momentum to the field of lung cancer diagnosis but also underscores the broad potential of artificial intelligence tools to transform healthcare. Such tools can deliver faster, more accurate diagnoses and support personalized treatment plans, offering valuable insights for future research and practical exploration in related areas.