A data fusion deep learning approach for accurate organelle-based classification of cancer cells.

Health Information Science and Systems, 2026, Vol. 14(1), p. 37 (Open Access)

Yee H, Bouyea M, Goldwag J, Lamar JM, Intes X, Kruger U, Barroso M


DOI: 10.1007/s13755-025-00425-8 · PMID: 41659840

Abstract

[PURPOSE] Microscopy-based cancer cell classification traditionally relies on cell-based morphological features, while subcellular organelle organization remains underutilized. Existing machine learning methods often require manual preprocessing and handcrafted feature extraction, limiting scalability and introducing user bias. This study proposes an automated, interpretable, and organelle-focused deep learning framework for classifying breast cancer cell lines from high-resolution fluorescence microscopy images.

[METHODS] We developed an end-to-end framework that incorporates patch-based sampling, sparsity filtering, and a channel-wise intermediate fusion strategy to independently extract and integrate organelle-specific features. Model interpretability was assessed using Grad-CAM visualizations and single-organelle classifier analyses. The framework was evaluated on fluorescence microscopy images from six breast cancer cell lines using 5-fold cross-validation.

[RESULTS] The proposed framework achieved a classification accuracy of %, performing comparably to or exceeding conventional handcrafted feature-based approaches while eliminating the need for manual segmentation and 3D rendering steps. Interpretability and classifier analyses revealed inter-organelle dependencies and mitochondria as the most informative contributors to classification decisions.

[CONCLUSION] Organelle morphology and spatial organization provide strong discriminative signals for cancer cell classification. The proposed framework offers a scalable, automated, and interpretable deep learning solution that advances microscopy-based phenotyping and supports broader applications in computational pathology and cellular informatics.

Introduction

High phenotypic and morphologic heterogeneity is found across the spectrum of human breast cancer cell lines. An increasing focus has been placed on understanding how differences in morphology affect cancer cell phenotypes. Most research has been based on the monitoring of cancer biomarkers using genetic, biochemical, and microscopic analysis of cancer cells. Recently, phenotypic analysis through microscopy, such as immunofluorescence and immunohistochemistry, has been an emerging approach in cancer research due to its ability to provide a comprehensive view of cellular organization and morphology, capturing spatial relationships that are not observable through biochemical and molecular assays alone [1, 2]. Furthermore, advances in super-resolution microscopy have greatly improved visualization of subcellular organelle architecture, allowing for a deeper understanding of how organelle structure and morphology regulate signaling, trafficking, and metabolic processes [3–7].
Building on these technological advances, computational image analysis now increasingly uses machine learning and deep learning methods to detect, classify, and quantitatively interpret cancer cell morphology from microscopy slides. Classical machine learning methods for object recognition typically use image filtering for feature extraction and machine learning classifiers. In microscopy-based cancer cell classification, these methods have been employed to extract morphological, topological, and textural object features from cells and their nuclei, using k-nearest neighbors, support vector machine, naive Bayes, or decision tree classifiers [8–11].
Despite the wide variety of techniques, these analyses have largely centered on readily detectable cellular and nuclear features, often involving lower-resolution images with numerous cells and emphasizing cell morphology rather than specific biomarkers or subcellular organization. Even when biomarkers are included, they generally capture bulk cancer-related properties rather than subcellular organelle morphology and distribution. In contrast, high-resolution fluorescence microscopy enables direct visualization of subcellular organelles, such as mitochondria, early endosomes, and recycling endosomes, which are known to undergo functional and structural alterations during breast cancer progression [12, 13]. Previously, we developed the Organelle Topology Cell Classification Pipeline (OTCCP) to classify different cancer cell lines based on the shape and spatial distribution (morphology and topology) of immunostained subcellular organelles using high-resolution microscopy [14]. The importance of OTCCP lies in its ability to leverage organelle morphology and topology as distinctive identifiers for breast cancer cell line classification, achieving over 90% accuracy and thus establishing spatial organelle organization as a biologically meaningful and discriminative feature of cancer phenotypes. However, OTCCP relies on labor-intensive manual preprocessing to perform cell segmentation, interpolation, and three-dimensional (3D) object rendering, thereby introducing potential user variability. Moreover, differences in cell size, segmentation accuracy, and imaging quality can lead to biased feature extraction and reduced scalability.
Recent advances in deep learning have provided end-to-end alternatives for image-based cancer analysis, reducing dependence on manual feature engineering. A wide range of architectures and strategies have been explored to optimize performance across different imaging modalities and conditions. In terms of breast cancer histopathology, many approaches have been implemented on the BACH dataset with variations in network architecture and preprocessing strategy to determine the most effective classifier for whole slide histology images [15]. Ensemble CNN models have improved whole-slide classification performance [16], and hybrid networks combining convolutional, recurrent, or transformer components have shown further improvements [17, 18]. To address variability in imaging conditions, several methods have been developed to improve robustness and reduce classification bias, including magnification-independent models such as MCUa [19], multiple-instance learning with ResNet50 for whole-slide classification [20], and adversarial multiple-instance learning strategies that further enhance generalization [21, 22]. However, these deep learning approaches continue to prioritize classification accuracy while functioning largely as black-box models with limited interpretability, thereby obscuring the contribution of underlying subcellular features to their predictions.
In this work, we present an end-to-end deep learning framework for organelle-based breast cancer cell line classification using high-resolution Airyscan fluorescence microscopy. Rather than relying on whole-cell or nuclear morphology, the method focuses exclusively on subcellular organelles captured in preprocessed fluorescence channels, treating them as biologically informative biomarkers of cancer-associated processes. We developed a novel framework for multiplexed channel analysis that uses a channel-wise intermediate data-fusion module to independently extract and integrate organelle-specific features prior to classification. In addition to the fused multi-organelle model, we evaluate single-channel and multi-channel configurations and use Grad-CAM visualizations to identify the organelles most indicative of each cancer cell type, providing interpretable insights into the biological determinants of model predictions. Compared to our previous OTCCP methodology, this framework attains similar classification accuracy while offering substantially higher throughput and consistency through fully automated organelle-level analysis.
Herein, we describe the experimental design, covering cell preparation, imaging and dataset processing (Section 2). This is followed by the description of the network architecture, and validation strategy. Subsequently, we present classification results together with comparative analyses of fusion strategies and visual interpretability assessments (Section 3). The discussion then addresses the implications of these findings, highlights potential limitations, and outlines directions for future work (Section 4). The paper concludes with a summary of the key insights gained from this study (Section 5).

Methods


Cell culture
Cell lines acquired from ATCC (Manassas, VA, USA) were grown at 37 °C in a 5% CO2 incubator and tested routinely for mycoplasma contamination. MCF10A cells were cultured in DMEM/F12 (catalog 11320; ThermoFisher, Waltham, MA, USA) with 5% horse serum (catalog 16050, ThermoFisher), 20 ng/mL EGF, 0.5 mg/mL hydrocortisone, 100 ng/mL cholera toxin, and 10 µg/mL bovine insulin with 1x penicillin/streptomycin. T47D, MDA-MB-231, MDA-MB-436, and MDA-MB-468 cells were cultured in DMEM (catalog 11965-092, ThermoFisher) with 10% fetal bovine serum (FBS; catalog 30-2020, ATCC), 4 mM L-glutamine, and 10 mM HEPES, pH 7.4. AU565 cells were cultured in RPMI-1640 medium (catalog 22400-089; Gibco by Life Technologies) with 10% fetal bovine serum (FBS; catalog 30-2020, ATCC). Clear imaging media consisted of phenol-free DMEM with 0.5% bovine serum albumin (BSA; catalog A9085, Millipore-Sigma), 4 mM L-glutamine, and 20 mM HEPES, pH 7.4. The characteristics of the breast cancer cell lines are as follows: AU565 cells display amplification and overexpression of human epidermal growth factor receptor 2 (HER2) and expression of HER3, HER4, and p53 oncogenes. T47D cells are of the luminal A subtype, expressing estrogen receptor (ER) and progesterone receptor (PR). MDA-MB-231 and MDA-MB-468 are of the basal triple-negative breast cancer (TNBC) subtype, lacking expression of HER2, ER, and PR; MDA-MB-231 shows lower EGFR expression, while MDA-MB-468 displays EGFR amplification. MDA-MB-436 cells are of the claudin-low TNBC subtype. MCF10A is an immortalized, non-tumorigenic human mammary epithelial cell line used as a model for normal breast epithelial cells.

Transferrin (Tf) internalization, cell fixation, immunofluorescence and airyscan microscopy
Cells were cultured overnight on a µ-Slide 8-well glass-bottom plate (Lot 171005/3, Ibidi), pre-incubated for 30 min with clear imaging media, incubated with AF568-transferrin (Tf) (catalog T-23365, Invitrogen) (25 µg/mL) for 10 min at 37 °C to label recycling endosomes, and then washed and fixed with 4% paraformaldehyde (PFA) for 10 min. Tf-containing cells were permeabilized with 0.1% Triton X-100 in phosphate-buffered saline (ThermoFisher BP151) for 15 min at room temperature and blocked for 90 min on a gentle rocker-shaker in 2% fish skin gelatin (FSG; G7765, Millipore-Sigma, St. Louis, MO) with 1% bovine serum albumin in PBS. Subsequent washing and antibody blocking were completed with 0.5% FSG, 0.05% TX-100 in PBS. Two primary antibodies were used per experiment. Primary antibodies included anti-EEA1 (catalog 610456, BD Bioscience) and Tom20 (FL-145) (catalog sc-11415, Santa Cruz Biotechnology, Inc.). EEA1 (early endosomal antigen 1) is a marker for early endosomes, while Tom20 is a marker for the outer mitochondrial membrane.
Upon completion of both primary and secondary antibody staining using F(ab')2 secondary antibodies labeled with AF488 or AF647 (catalog A11070 and A21237, Life Technologies, respectively), cells underwent postfixation with 4% PFA for 5 min followed by nuclear labeling with DAPI counterstaining for 15 min (1:1). All solutions were 0.2 µm syringe-filtered. Twenty-six three-dimensional (z-stack), multichannel image series were collected on a Zeiss LSM880 with Airyscan detector in SR mode under Nyquist sampling and subjected to Airyscan processing (pixel reassignment). Four channels allowed for the visualization of the following organelles: i) early endosomal compartments immunostained with anti-EEA1, ii) mitochondria immunostained with anti-Tom20, iii) recycling endosomes directly labeled with AF568-Tf, and iv) the nucleus labeled with DAPI. Only the EEA1-, Tom20-, and Tf-labeled images were used for further analysis.

Patch sampling and sparsity filtering
In the OTCCP workflow, three-dimensional rendering of organelle objects and extraction of handcrafted organelle topology and morphology features are performed using Imaris image analysis software, as described in our previous work [14]. This object-based preprocessing results in many individual organelle objects being analyzed per image, thereby augmenting the dataset to focus specifically on organelles of interest. These handcrafted organelle-level features were subsequently used as inputs to a deep neural network (DNN) classifier, and the resulting performance is reported in Table 1. The proposed framework instead performs classification directly on microscopy images rather than relying on 3D object rendering. Microscopy images are spatial arrays in which fluorescently labeled cellular markers produce sparse, structured signals against a predominantly dark background, and pixel intensities directly reflect the presence and abundance of biological targets. Our end-to-end deep learning pipeline therefore integrates a sparsity-filtered image patch preprocessing approach to form a larger imaging dataset for deep learning. First, raw 3D fluorescence confocal microscopy images were partitioned by z-stack to form 2D images of each XY plane. Patches from each slice were then extracted at a specified pixel size, resulting in a dataset of patches derived from the original microscopy images. Each patch was subsequently assigned a class label corresponding to the cell line of the original image, as the confocal images in this experiment were acquired from monocultures.
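The patch-extraction step can be sketched as follows. This is a minimal illustration assuming the raw image is available as a NumPy array of shape (Z, Y, X, C); the function name and the non-overlapping stride are illustrative choices, not the authors' exact implementation.

```python
import numpy as np

def extract_patches(volume, patch_size, stride=None):
    """Split a 3D multichannel stack (Z, Y, X, C) into 2D patches per z-slice.

    Each patch keeps all organelle channels; the class label is assigned later
    from the cell line of the source image (monoculture acquisition).
    """
    stride = stride or patch_size  # non-overlapping patches by default
    z_dim, y_dim, x_dim, n_channels = volume.shape
    patches = []
    for z in range(z_dim):  # partition the stack into individual XY planes
        plane = volume[z]
        for y in range(0, y_dim - patch_size + 1, stride):
            for x in range(0, x_dim - patch_size + 1, stride):
                patches.append(plane[y:y + patch_size, x:x + patch_size, :])
    return np.stack(patches) if patches else np.empty((0, patch_size, patch_size, n_channels))
```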

Second, to facilitate classification based on subcellular features and avoid the misinterpretation of sparse backgrounds as informative features, a threshold-based filtering step is applied after the initial patch generation. Global Otsu thresholding is applied to the entire 3D confocal image to determine a threshold value that separates cell material from background. This threshold is then applied to each patch, yielding a cellular organelle mask per patch. A foreground ratio is subsequently calculated for each patch, and sparse patches falling below a set cutoff are discarded from the dataset. Figure 1 illustrates this patch preprocessing pipeline.
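A compact sketch of the sparsity filter, assuming scikit-image is available; the 5% foreground cutoff is a placeholder rather than the value used in the paper.

```python
import numpy as np
from skimage.filters import threshold_otsu

def filter_sparse_patches(volume, patches, min_foreground=0.05):
    """Keep only patches whose foreground ratio reaches `min_foreground`.

    A single global Otsu threshold computed on the full 3D image binarizes
    every patch consistently; the 0.05 cutoff is a hypothetical value.
    """
    threshold = threshold_otsu(volume)        # global threshold for the whole stack
    kept = []
    for patch in patches:
        mask = patch > threshold               # cellular organelle mask for this patch
        if mask.mean() >= min_foreground:      # fraction of foreground pixels
            kept.append(patch)
    return np.stack(kept) if kept else np.empty((0,) + patches.shape[1:])
```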

Patch dataset specifications
To avoid potential bias in the deep learning analysis, the image channel corresponding to the nucleus was excluded, restricting the analysis to organelle features. Consequently, the resulting images had XY plane dimensions of 1248x1248 or 1364x1364 pixels, z-stack depths ranging from 43 to 207 slices, and 3 channels. To further improve network robustness, on-the-fly data augmentation was applied during training, including random horizontal and vertical flips, 90-degree rotations, cropping, and small-angle rotations.
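The on-the-fly augmentation can be expressed with standard TensorFlow image ops. This is a sketch under the assumption that `train_ds` is a `tf.data.Dataset` of (patch, label) pairs; the 336-pixel crop size and the small rotation range are illustrative values, not the authors' settings.

```python
import tensorflow as tf

# Small-angle rotation (roughly ±3.6 degrees); the factor is an illustrative value.
small_rotation = tf.keras.layers.RandomRotation(factor=0.01)

def augment(patch, label):
    """Training-time augmentation: flips, 90-degree rotations, crop, small rotation."""
    patch = tf.image.random_flip_left_right(patch)
    patch = tf.image.random_flip_up_down(patch)
    k = tf.random.uniform([], minval=0, maxval=4, dtype=tf.int32)
    patch = tf.image.rot90(patch, k)                          # random multiple of 90 degrees
    patch = tf.image.random_crop(patch, size=[336, 336, 3])   # hypothetical crop size
    patch = tf.image.resize(patch, [350, 350])                # back to the network input size
    patch = small_rotation(patch, training=True)
    return patch, label

# Assumed: train_ds yields (350x350x3 patch, one-hot label) pairs.
# train_ds = train_ds.map(augment, num_parallel_calls=tf.data.AUTOTUNE)
```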

Network architecture, training, and validation
We implemented several homogeneous data fusion strategies, all within a standardized pipeline that maintained identical preprocessing, encoder architecture, and hyperparameter tuning to enable fair comparison. Each network used a ResNet50 backbone in which image patches were processed through convolutional layers for feature extraction, initialized with ImageNet pretraining for improved performance and generalization [23–25]. The resulting features were passed to fully connected layers that assigned a cell line label to each patch. Hyperparameter tuning, implemented through the Keras Hyperband tuner, was used to determine optimal hyperparameters for the network. This resulted in a batch size of 32, an Adam optimizer with an initial learning rate of 3 under exponential decay of 0.95 every 175 steps, dropout layers set to 0.4, batch normalization applied after each convolutional block, L2 regularization for fully connected layers, and categorical cross-entropy as the loss function. Early stopping based on validation accuracy, with a maximum of 100 epochs and a patience of 15 epochs, was used to further prevent overfitting. Training employed stratified 5-fold cross-validation at the patch level, ensuring each fold contained a balanced distribution of patches from all cell lines. Furthermore, patches derived from the same original microscopy image were kept within the same fold to avoid data leakage and artificially inflated performance. The dataset was split into 80% training and 20% validation within each fold, with the final evaluation performed on a held-out test set. To address class imbalance in the number of patches per cell line, class weighting was applied to the loss function, where each class weight was inversely proportional to its relative frequency in the training set. This weighting ensured that minority classes contributed proportionally more to the loss, reducing bias toward more represented cell lines. The early fusion architecture contained approximately 26 million trainable parameters, whereas the intermediate and late fusion networks each contained approximately 72 million parameters. All network training was performed in TensorFlow and Keras on an NVIDIA RTX A6000 GPU.
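The class weighting, learning-rate schedule, early stopping, and image-grouped stratified folds described above can be set up along the following lines. This is a sketch, not the authors' code: the initial learning rate of 1e-3 is a placeholder (the value in the text appears truncated), and `patches`, `labels`, and `image_ids` are assumed arrays.

```python
import numpy as np
import tensorflow as tf
from sklearn.model_selection import StratifiedGroupKFold

def inverse_frequency_weights(labels):
    """Class weights inversely proportional to each class's patch frequency."""
    classes, counts = np.unique(labels, return_counts=True)
    return {int(c): len(labels) / (len(classes) * n) for c, n in zip(classes, counts)}

# Exponential decay of 0.95 every 175 steps; 1e-3 is a placeholder initial rate.
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3, decay_steps=175, decay_rate=0.95)
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)

# Early stopping on validation accuracy with a patience of 15 epochs.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_accuracy", patience=15, restore_best_weights=True)

# Stratified 5-fold cross-validation that keeps patches from the same source
# microscopy image in the same fold (groups = source image identifiers).
cv = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=0)
# for train_idx, val_idx in cv.split(patches, labels, groups=image_ids): ...
```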

Results


Data fusion framework
We evaluated three main architectures for combining organelle information: single-channel classifiers, early fusion, and two homogeneous fusion strategies known as intermediate fusion and late fusion. The single-channel classifiers use one organelle channel at a time, with each organelle-specific image preprocessed to fit the ResNet50 encoder input requirement for patch-level classification. The early fusion network serves as the baseline classifier and is shown in Figure 2, where all organelle channels are concatenated at the input and processed together in a single ResNet50 encoder. This architecture represents the standard approach, in which all channels jointly contribute to one feature extraction pathway.
The homogeneous data fusion strategies are illustrated in Figure 3. The intermediate fusion architecture extracts features from each organelle channel separately: each channel is passed through its own ResNet50 encoder with a modified first convolutional layer for single-channel input. After global average pooling, the resulting feature vectors are concatenated and supplied to a shared dense layer for classification, as sketched below. In contrast, the late fusion architecture performs separate feature extraction and classification for each organelle: each organelle channel is processed by its own ResNet50 encoder and MLP classifier, and the resulting prediction scores are combined into a final label using a weighting scheme.
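A minimal Keras sketch of the channel-wise intermediate fusion idea follows. It uses randomly initialized per-channel ResNet50 encoders for simplicity, whereas the paper adapts the first convolution so ImageNet pretraining can be reused; the dense-layer width and regularization strength are illustrative.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import ResNet50

def build_intermediate_fusion(patch_size=350, n_channels=3, n_classes=6):
    """One ResNet50 encoder per organelle channel; feature vectors are
    concatenated (intermediate fusion) before a shared dense classifier."""
    inputs = layers.Input(shape=(patch_size, patch_size, n_channels))
    channel_features = []
    for c in range(n_channels):
        # Slice out a single organelle channel and give it its own encoder.
        single = layers.Lambda(lambda x, i=c: x[..., i:i + 1])(inputs)
        encoder = ResNet50(include_top=False, weights=None, pooling="avg",
                           input_shape=(patch_size, patch_size, 1))
        encoder._name = f"resnet50_channel_{c}"   # unique name per nested encoder
        channel_features.append(encoder(single))
    fused = layers.Concatenate()(channel_features)             # intermediate fusion point
    x = layers.Dropout(0.4)(fused)
    x = layers.Dense(256, activation="relu",
                     kernel_regularizer=tf.keras.regularizers.l2(1e-4))(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    return Model(inputs, outputs, name="intermediate_fusion")

model = build_intermediate_fusion()
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```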

Classifier performance evaluation
Classifier performance for each framework was evaluated using 5-fold cross-validation at the previously determined optimal patch receptive field size of 350x350 pixels for several classifiers: single-channel classifiers, early fusion, intermediate fusion, and late fusion. All classifiers were evaluated using fluorescence microscopy images of early endosomes, recycling endosomes, and mitochondria, selected based on their relevance to breast cancer biology and prior use as informative biomarkers for cancer cell line classification [14]. In addition, a previously developed handcrafted feature–based classifier from the Organelle Topology Cell Classification Pipeline (OTCCP) was evaluated as a baseline comparator, using Topological Parameter Group (TPG) features that quantify the spatial distribution of either individual organelles or inter-organelle contacts within cells, followed by classification using a deep neural network (DNN) [14]. These results are summarized in Table 1, which reports balanced weighted accuracy, Area Under the Receiver Operating Characteristic Curve (AUROC), Matthews Correlation Coefficient (MCC), and macro-averaged F1 score [26]; a sketch of how such metrics can be computed per fold is given below. Furthermore, Figures 4 and 5 together summarize classifier-level performance, with the confusion matrices detailing the true positive, true negative, false positive, and false negative predictions for each class, and the corresponding precision–recall (PR) curves providing complementary evaluation of the same classifiers. To assess the robustness of these results with respect to the cross-validation strategy, the analysis was repeated using 30-fold cross-validation, with performance metrics summarized in Table 2 and corresponding confusion matrices and precision–recall curves shown in Figures 9 and 10.
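A small helper for computing these metrics with scikit-learn, assuming integer class labels and per-class predicted probabilities; scikit-learn's balanced accuracy is used here as a stand-in for the balanced weighted accuracy reported in the tables.

```python
import numpy as np
from sklearn.metrics import (balanced_accuracy_score, f1_score,
                             matthews_corrcoef, roc_auc_score)

def evaluate_fold(y_true, y_prob):
    """Summarize one cross-validation fold.

    y_true: integer class labels, shape (n_samples,)
    y_prob: predicted class probabilities, shape (n_samples, n_classes)
    """
    y_pred = np.argmax(y_prob, axis=1)
    return {
        "balanced_accuracy": balanced_accuracy_score(y_true, y_pred),
        "auroc": roc_auc_score(y_true, y_prob, multi_class="ovr", average="macro"),
        "mcc": matthews_corrcoef(y_true, y_pred),
        "macro_f1": f1_score(y_true, y_pred, average="macro"),
    }
```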
In terms of the multi-channel data fusion classifier approaches, the intermediate fusion network outperformed the early and late fusion approaches based upon the classification performance metrics. As summarized in Table 1, the TPG-based OTCCP classifiers constructed from inter-organelle contact features achieved the highest overall classification performance, indicating that relationships between multiple organelles provide more discriminative information than single-organelle features alone. Within the deep learning frameworks, a similar trend was observed, as the intermediate fusion network demonstrated comparable performance and represented the strongest-performing end-to-end deep learning approach. The enhanced performance of the intermediate fusion network suggests that combining features at an intermediate stage enables the network to capture and leverage more complex relationships between organelles, leading to improved classification accuracy. Although the intermediate fusion network did not surpass the best handcrafted feature classifier, it achieved strong classification performance while offering the advantage of extracting features across all organelles directly from the image. In addition, the model demonstrated efficient runtime characteristics, with patch extraction and sparsity filtering preprocessing for a full microscopy image requiring approximately 4 seconds, and inference taking only 2 seconds per full image, supporting its suitability for high-throughput analysis.
A strength of the single-channel early fusion classifiers is their ability to reveal which organelle contributes most to model performance. When evaluated individually, these classifiers showed that mitochondria were the most predictive organelle for the end-to-end networks, achieving performance comparable to the intermediate fusion network. This indicates that mitochondrial structure and organization carry substantial discriminatory information across the six cell lines. Notably, however, this finding contrasts with results from the individual TPG-based classifiers, where early endosomal features were identified as the most indicative using handcrafted topological parameters. This discrepancy may reflect differences in feature representation and preprocessing, as the handcrafted TPG analysis relies on manual or semi-automated extraction, which may introduce variability and bias. To further examine these observations, Grad-CAM was applied to the trained intermediate fusion model on the 350x350 patch receptive field dataset, both with and without overlap, resulting in Figure 6. The close similarity between the mitochondrial Grad-CAM heatmaps and those of the intermediate fusion classifier demonstrates that the model consistently prioritizes mitochondrial features for prediction. This further supports the conclusion that mitochondria are the most important organelle driving accurate classification in this dataset.
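A generic Grad-CAM sketch for a Keras classifier is shown below; the target convolutional layer name is an assumption, and for the nested per-channel encoders in the fusion model the layer would be addressed through the corresponding encoder sub-model rather than the top-level model.

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, patch, conv_layer_name, class_index=None):
    """Grad-CAM heatmap for one patch with respect to a chosen conv layer."""
    grad_model = tf.keras.Model(
        model.inputs, [model.get_layer(conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(patch[np.newaxis, ...])
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))    # explain the predicted class
        score = preds[:, class_index]
    grads = tape.gradient(score, conv_out)             # d(score) / d(feature maps)
    weights = tf.reduce_mean(grads, axis=(1, 2))       # global-average-pooled gradients
    cam = tf.reduce_sum(weights[:, tf.newaxis, tf.newaxis, :] * conv_out, axis=-1)
    cam = tf.nn.relu(cam)[0]                           # keep positive contributions only
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()
```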

Dataset distribution and optimization
Following preprocessing, the total number of patches available for training, validation, and testing was determined jointly by the number of microscopy images and the volumetric field of view from which patches were extracted, where larger images produced more patches and smaller images yielded fewer. Furthermore, final dataset size is also shaped by the chosen patch receptive field and the sparsity filtering applied during patch preprocessing, both of which reduced the number of retained patches. To maintain organelle-resolved structure across depth, patches were extracted independently from each z-stack rather than from a full 3D volume. Patch extraction was performed in 2D, with the receptive field size in the XY plane treated as a tunable hyperparameter to accommodate differences in imaging conditions and dataset characteristics. This design, combined with sparsity filtering, directly influences both the number of usable patches and the attrition rate, which reflects underlying image sparsity across samples. To clearly show the impact of preprocessing, we quantified the number of extracted patches and the corresponding attrition rate after preprocessing for each patch size. Figure 7 summarizes these results for the dataset used in this study.

To determine the optimal patch receptive field size for preprocessing, a grid search was performed using 5-fold cross-validation with patch sizes ranging from 200x200 to 500x500 pixels in 50-pixel increments on the intermediate fusion classifier. Figure 8 illustrates the trend in average weighted classification accuracy from cross-validation as patch size varies. Classification accuracy peaked at 350x350 pixels, corresponding to an area of approximately 12.37 by 12.37 µm, with accuracy generally decreasing at patch sizes larger and smaller than this optimum. For patch sizes smaller than 350 by 350 pixels, the receptive field covers an area smaller than a cell, potentially reducing the total number of analyzed organelles within a cell.

Consequently, the analysis of features is more closely associated with nearby organelles rather than overall organelle topology in a cell. The decreasing accuracy trend for larger patch sizes can be attributed to the incorporation of image analysis of entire cell features while potentially losing informative organelle features due to downsampling. Notably, the accuracy also dips at 300x300 pixels, possibly due to specificities in the procedural patch preprocessing and dataset. To further investigate this trend, new augmented datasets were generated by rotating the entire images and subsequently evaluated using the same intermediate fusion classifier, thus assessing the effect of extracting patches from different locations. This process was repeated multiple times to thoroughly evaluate the performance; however, the lower accuracy and performance metrics remained consistent across all trials.

Discussion

We have achieved high-accuracy cancer cell classification through the implementation of a channel-wise intermediate fusion network with patch extraction and sparsity filtering. In our previous handcrafted feature-based classifier, the features yielding the best classification accuracy originated from the separate analysis and classification of individual organelle types [14]. Although this method yielded high classification accuracy, it was limited by its reliance on manual implementation and external software to extract features relevant to organelle morphology and topology. Consequently, this method has limitations similar to other machine learning classification algorithms in terms of throughput, bias, and interpretable feature analysis, thus highlighting the need for an end-to-end method for cell classification directly from raw microscopy images. Compared to the previous Imaris-based feature extraction, our proposed deep-learning workflow offers several advantages.
First, our framework performs organelle-focused analysis with increased throughput and reduced susceptibility to bias through specialized preprocessing. Traditional approaches often involve manual or automated cell segmentation to determine regions of interest. In contrast, our method identifies informative regions by removing sparse patches and excluding the nuclear channel, ensuring that neither non-informative image regions nor nuclear morphology introduce bias into the analysis. Building on this organelle-focused design, we further demonstrate through the patch-size grid search that discriminative organelle-level features can reside within subregions smaller than an entire cell. By implementing preprocessing in this manner, the workflow mitigates data scarcity and bias while maintaining organelle-level feature analysis more effectively than traditional preprocessing or manual segmentation methods, as indicated by the classification performance results.
Second, our methodology extracts organelle features separately before fusion, thus adhering to the gold standard of individual organelle-specific analysis while also capturing inter-organelle relationships. This emphasis on inter-organelle interactions was first demonstrated in our prior OTCCP handcrafted feature analysis, where the best performing DNN classifier was constructed from features describing inter-organelle contacts, indicating that inter-organelle features can be more discriminative than single-organelle features alone. Building on this observation, we evaluated organelle-aware fusion strategies using deep learning. When benchmarked against the gold standard ResNet50 implementation, our intermediate fusion approach demonstrated superior classification performance, confirming the benefit of extracting features from each organelle channel separately prior to fusion. Furthermore, intermediate fusion outperformed both single-channel early fusion and multi-channel late fusion methods, providing additional evidence that complementary information across organelle channels contributes to improved classification performance. Altogether, these findings identify intermediate fusion as the most effective strategy, as it leverages the spatial distribution and organization of all organelles.
Third, our framework provides flexibility for analyzing multiplexed fluorescence images, as each organelle channel is processed independently before fusion. This design also enables interpretable analysis by allowing both organelle-specific single-channel classifiers and post-hoc Grad-CAM analysis of the fusion model to elucidate the contribution of each organelle to the final prediction. Using these approaches, mitochondria emerged as the most informative organelle for distinguishing the six breast cancer cell lines within the end-to-end workflow. This finding is consistent with the established role of mitochondria in cancer-associated metabolic and bioenergetic processes. While this contrasts with our prior handcrafted OTCCP analysis, in which endosomal features appeared most discriminative, the deep learning-based results are more consistent with established biological understanding, and the earlier handcrafted approach may have been more susceptible to biases introduced during manual preprocessing and 3D rendering. Together, these findings highlight both the flexibility and interpretability of the framework for assessing organelle importance within multiplexed imaging datasets.
Despite these promising results, however, the limitations of the current work must be considered. Generalizability remains a key consideration, as the model has so far been validated only on a specific set of breast cancer cell lines imaged under controlled laboratory conditions. Translation to clinical settings, including tissue biopsies or heterogeneous patient samples, will require additional validation with larger, more diverse datasets. Additionally, the intermediate fusion network contained approximately 72 million parameters, which posed no issue for training and inference on the high-memory NVIDIA RTX A6000 GPU used in this study; however, deployment in resource-constrained environments may require model distillation or compression techniques. Moreover, inference speed is rapid enough for high-throughput analysis, but the computational demands of training may be prohibitive without access to suitable hardware. Furthermore, like most deep learning architectures, the network remains a black box in terms of direct biological interpretation. While Grad-CAM provides coarse localization of discriminative regions, further work is required to connect these visual patterns explicitly to biologically meaningful topological features. Extending XAI analyses to include dimensionality reduction methods such as UMAP, or other saliency-based approaches beyond Grad-CAM, may provide deeper insight into the learned feature space and better align model predictions with biological function. In summary, we demonstrate a flexible and robust framework for multiplexed microscopy image analysis with improved interpretability through the integration of channel-wise analysis, single-organelle classifiers, and Grad-CAM, in contrast to traditional image-based deep learning models that lack organelle-resolved analysis and interpretability.

Conclusion

This work presents a deep learning-based workflow for breast cancer cell line classification that combines patch extraction, sparsity filtering, and intermediate fusion of organelle-specific features to achieve both high accuracy and interpretability. The framework automates organelle-level feature extraction without manual segmentation or user-dependent preprocessing, improving throughput and reducing variability. By employing separate feature extraction of fluorescence channels corresponding to specific organelles and intermediate channel fusion, the method maintains biological relevance and minimizes dependence on whole-cell or nuclear morphology. Compared to the gold standard early fusion ResNet50 architecture, our approach achieved superior classification accuracy by separately encoding each organelle channel before feature fusion, enabling the network to capture organelle-specific information more effectively. Furthermore, this method eliminates the need for manual segmentation, reduces bias through exclusion of the nuclear channel, and enhances interpretability via organelle-level analysis supported by Grad-CAM visualizations. These characteristics collectively provide a more automated and biologically informative analysis in comparison to traditional pipelines.
The findings underscore the value of organelle-specific feature extraction for high-resolution microscopy image analysis and demonstrate the feasibility of automating such pipelines with minimal classification performance loss. Our experiments showed that relevant classification features can reside within regions smaller than entire cells, and optimal patch receptive field sizes can be identified through systematic grid search. The network achieved rapid inference speeds, thus highlighting its suitability for high-throughput workflows. While the specific features identified by the network may not always correspond directly to the topological features of interest, this work lays the foundation for refining deep learning models to focus on the exclusive extraction of biologically relevant topological characteristics.
Future work will expand validation to larger and more heterogeneous datasets, including additional cultured cell lines, patient-derived tissue sections, and publicly available microscopy datasets. Increasing dataset diversity will be essential for enhancing model generalizability and clinical applicability. We also plan to investigate combinatorial organelle analysis, in which direct interactions between different combinations of organelles of interest are modeled to improve classification performance and yield more biologically meaningful insights. Furthermore, we aim to extend explainability efforts beyond GradCAM by incorporating additional saliency-based techniques, as well as methods such as UMAP for feature space visualization, to further enhance model interpretability. Additionally, model distillation, compression, and alternative architectures will be explored to reduce computational requirements for deployment in resource-limited environments. These directions will ensure that the methodology is not only accurate but also interpretable, scalable, and ready for translation into clinical and research settings. Overall, these developments will advance organelle-aware deep learning approaches for biomedical imaging and support their integration into practical analytical workflows.
