Neural Latent Filtering for Gene Discovery in Breast Cancer Subtypes.

Menegatti D; Fiscon G; Giuseppi A; Paci P; Pietrabissa A

doi:10.1016/j.btre.2025.e00942

Biotechnology reports (Amsterdam, Netherlands) 2026 Vol.49() p. e00942

Free full text available in PMC (PMCID: PMC12814072)

PMID: 41560779

Abstract

Gene selection from expression data is a challenging task, primarily because of the high dimensionality of the data and the vast number of genes that would be identified, many of which may be unrelated to cancer-relevant biological processes. To tackle this issue, filtering methods are an effective way to identify the most informative genes, which can serve as potential biomarkers to tailor cancer therapies. This work proposes a novel neural-based filtering approach that identifies genes by means of their latent representation extracted from RNA Sequencing expression data. The approach has been applied to a breast invasive carcinoma dataset, with the aim of identifying the most relevant genes of two breast cancer subtypes, Luminal-A and Basal-like, to better investigate their molecular landscape.

1 Introduction
Cancer is one of the leading causes of death worldwide and poses a significant challenge to global health. It is characterized by the uncontrolled proliferation and spread of abnormal cells, which can arise in virtually any tissue of the body, invade surrounding tissues, and metastasize to distant organs [1]. Moreover, tumors such as breast carcinoma can include several distinct subtypes, each with different biological characteristics, including varying gene expression, mutation patterns, and clinical behavior [2]. This intratumoral and intertumoral diversity means that two tumors placed in the same general category can respond very differently to the same treatment or progress at markedly different rates. Therefore, given the heterogeneity observed even within the same tumor class, it is essential to have tools that can capture the subtypes’ molecular specificities. To this end, RNA Sequencing (RNASeq) expression analysis has emerged as a promising paradigm for tailoring therapies to individual patients, thereby integrating genomic technologies into clinical practice [3].
RNASeq is a sequencing technique that analyzes the transcriptome of a cell or tissue, that is, it quantifies the complete set of genes that are actively expressed at a given time, providing insights into the biological processes in which they may be involved. Due to its advantages over traditional methods, it can identify a wide range of genes, including those that are not involved in cancer activity [4]. Given the large amount of data produced within the corresponding gene expression matrix, identifying the relevant genes associated with cancer becomes a crucial task. Standard approaches aim to identify differences in expression between two or more experimental conditions through statistical analysis, including the fold change and the $p$-value (and the adjusted $p$-value for multiple testing) obtained from statistical hypothesis tests such as analysis of variance (ANOVA) and the t-test [5], [6]. Feature selection techniques, instead, aim to pinpoint the most informative genes; they can be classified into filter and wrapper methods [7].
Filter methods aim to eliminate irrelevant data by estimating the importance of each gene, ranking the genes accordingly, and applying a thresholding scheme to retain only the most relevant ones [8]. The authors of [9] tackle the problem of detecting cancer from RNASeq data via a combined approach that uses $k$-Nearest Neighbour (kNN) to mitigate noise in the data and ANOVA for gene selection, while [10] carries out a neural network-based gene selection procedure based on a Discriminative Index derived from a combination of the model outputs. The use of an overlapping feature selection method is explored in [11], where five selection methods with ranking abilities are employed, i.e., Differential Gene Expression Analysis (DGE), Principal Component Analysis (PCA), Least Absolute Shrinkage and Selection Operator (Lasso), minimal-Redundancy-Maximal-Relevance (mRMR), and Extreme Gradient Boosting (XGBoost). DGE, as implemented by methods like DESeq2, attempts to detect changes in gene expression by combining a normalization method with a negative binomial distribution, but is sensitive to outliers and prone to false positives [12]; PCA is a dimensionality reduction method that decomposes data into orthogonal principal components, but its output cannot be easily interpreted [13]; Lasso is a linear regression method that suffers from false discoveries [14]; mRMR maximizes relevance between features but may suffer from weakly correlated ones [15]; XGBoost leverages an ensemble of decision trees and is prone to overfitting [16]. The underlying idea of the overlapping scheme is therefore to exploit the strengths and limit the weaknesses of each method, thus producing reliable results.
Wrapper methods leverage classification algorithms to evaluate the importance of data features. The authors of [17] tackle the problem of classifying the type of cancer from RNASeq data by means of a combined approach, leveraging an autoencoder for feature extraction and a support vector machine (SVM) as the classifier. The neural model is trained to reconstruct the gene expressions given as input, while gene selection is performed by exploiting its architecture and backtracking the most important neural connections. A different approach is followed in [18], where a decision tree is employed to classify cancer types from bulk RNASeq data.
The main innovations of the present work are:

• The introduction of a novel neural-based filtering approach for gene selection based on the integration of a U-Net [19] architecture with the Image Generator for Tabular Data (IGTD) algorithm to identify relevant genes from RNASeq data through their latent representations [20].

• The design of a custom pipeline, depicted in Fig. 1, to transform gene expression data into images where spatial proximity reflects biological similarity, allowing the analysis through convolutional neural networks.

• The automatic, data-driven discovery of important genes with no a priori statistical assumption on their relevance based on standard metrics. Through an unsupervised learning strategy, where the neural network is trained to reconstruct an image-like representation of RNASeq data, the overall neural pipeline is forced to learn the most salient features of gene expression profiles in its latent space, surpassing the typical limitations of expert-driven and supervised solutions.

In the remainder of the paper, Section 2.1 describes the cancer dataset; Section 2.2 provides background information about the IGTD algorithm as well as its application to the image-based transformation of gene expression data; Section 2.3 details the proposed neural latent-based approach, while Section 2.4 describes the gene identification procedure. Section 3 reports the results obtained on the two breast cancer subtypes. Finally, Section 4 draws the conclusions and outlines future research directions.

2 Methodology

2.1 Dataset
This work is based on the breast invasive carcinoma gene expression dataset (BRCA) obtained from The Cancer Genome Atlas (TCGA) repository. In particular, the BRCA dataset of RNASeq data (FPKM normalized) is composed of 20,531 genes from 1,182 samples divided into four main subtypes (Luminal A, Luminal B, HER2-enriched, and Basal-like) from clinical data on the basis of the presence or absence of hormone receptors for estrogen (ER) and progesterone (PR), the human epidermal growth factor receptor 2 (HER2), and the cellular proliferation index Ki67 [21].
Within the proposed analysis, two subtypes have been examined: Luminal A (229 samples), which is characterized by positivity for ER and PR, negativity for HER2, and Ki67 less than 20%, and Basal-like (98 samples), also known as Triple-Negative Breast Cancer (TNBC), defined by the lack of expression of ER, PR, and HER2 receptors. After loading the RNASeq gene expression matrix, a log2 transformation with a pseudocount of 1 is applied to mitigate the influence of zero values and enhance the interpretability of expression variability.
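The log2 transformation with a pseudocount of 1 can be illustrated with a minimal sketch. The function name `log2_pseudocount` and the toy FPKM values are assumptions made for illustration and are not taken from the paper.

```python
import math

def log2_pseudocount(matrix, pseudocount=1.0):
    """Apply log2(x + pseudocount) element-wise to a samples x genes matrix."""
    return [[math.log2(value + pseudocount) for value in row] for row in matrix]

# Hypothetical FPKM values for 2 samples x 3 genes (illustrative only).
fpkm = [[0.0, 1.0, 7.0],
        [3.0, 15.0, 0.0]]

log_expr = log2_pseudocount(fpkm)
# Zero counts map to 0 instead of being undefined, since log2(0 + 1) = 0.
```

The pseudocount keeps zero-expression entries finite while compressing the dynamic range of highly expressed genes.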

2.2 Image-based transformation of gene expression data
Prior to using gene expression data as input to a bespoke neural architecture, an additional processing step is performed using the Image Generator for Tabular Data (IGTD) algorithm [20]. The idea of this approach is to convert the tabular representation of gene expression into an image format, where spatial proximity among pixels reflects similarity in gene expression profiles; by mapping genes with similar expression levels close to each other, the resulting spatial structure can be effectively exploited by a convolutional neural network. The underlying assumption is that neighboring pixels share similar features, as they represent biologically related genes, thereby enhancing the network’s ability to perform meaningful feature extraction. Before the transformation, Min-Max normalization is performed.
Let $G = \{g_1, \dots, g_M\}$ be the set of genes associated with the gene expression matrix $X \in \mathbb{R}^{N \times M}$, where $N$ is the number of samples and $M$ that of genes, and let $x_i$ be its $i$-th row, with $i = 1, \dots, N$: each row is mapped to an image of dimensions $N_r \times N_c$, subject to a constraint linking the number of pixels $N_r \times N_c$ to the number of genes $M$. The algorithm transforms tabular data into grayscale image representations by reorganizing features on the basis of distance metrics; in particular, the gene-to-pixel mapping is carried out using a feature distance matrix and a pixel distance matrix, which are the foundation of an iterative swapping mechanism designed to optimize spatial similarity.
Given that genes are regarded as features, their pairwise similarity is quantified via the Pearson correlation coefficient. The resulting distances are sorted in ascending order and each of them is associated with a proportional value referred to as its rank, which forms an element of a symmetric feature distance matrix $R$, where each diagonal element $r_{i,i} = 0$ and each off-diagonal term $r_{i,j}$ represents the distance between the $i$-th and $j$-th features. In addition to $R$, a second symmetric pixel distance matrix $Q$ is introduced to capture the spatial relationships between pixels in the corresponding image; also in this case the diagonal terms are zero. While $R$ captures feature-wise relationships within each sample, matrix $Q$ reflects the spatial arrangement of genes; in particular, their position is optimized to reflect their similarity by minimizing the difference between matrices $R$ and $Q$, which is formulated as follows:

$$\mathrm{err}(R, Q) = \sum_{i=2}^{M} \sum_{j=1}^{i-1} \mathrm{diff}(r_{i,j}, q_{i,j}),$$

where $\mathrm{diff}(\cdot, \cdot)$ represents the mean absolute error of the two quantities. The algorithm performs an iterative procedure that looks for the best swap partner among features, i.e., the one resulting in the greatest error reduction; if the latter exceeds a predefined threshold the exchange is executed, otherwise the configuration stays the same. If the improvement falls below a threshold for a given number of consecutive iterations, the algorithm stops; as a result, updated matrices $R$ and $Q$ are obtained. Fig. 2 shows the application of the algorithm to the Luminal A subtype. Finally, the resulting image is constructed on the basis of the feature distance matrix $R$.
In the final image representation, each pixel is explicitly associated with a unique gene from the expression matrix. The mapping procedure is driven by pairwise similarity, hence genes exhibiting correlated expression profiles are placed in spatial proximity, so that neighboring pixels correspond to biologically related genes. This is achieved by iteratively minimizing the discrepancy between the feature distance matrix $R$, derived from the Pearson correlation among genes, and the pixel distance matrix $Q$, which encodes spatial adjacency in the image. As a result, the spatial arrangement of pixels reflects the similarity structure of the gene set, ensuring that local neighborhoods in the image correspond to clusters of genes with related expression patterns. Once the optimization converges, the mapping becomes deterministic, meaning that the same gene always occupies the same pixel position across all samples, enabling consistent analysis and interpretation.
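The rank matrices and the swap-based optimization described above can be sketched on a toy example. This is a simplified greedy variant written for illustration, not the reference IGTD implementation: the use of 1 minus the Pearson correlation as feature distance, the squared Euclidean distance between pixel coordinates, the average-rank tie handling, the stopping rule, and all function names and toy values are assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rank_matrix(dist):
    """Replace pairwise distances by ascending ranks (ties share the average rank)."""
    n = len(dist)
    pairs = sorted((dist[i][j], i, j) for i in range(n) for j in range(i))
    ranks = [[0.0] * n for _ in range(n)]  # diagonal terms stay zero
    k = 0
    while k < len(pairs):
        end = k
        while end + 1 < len(pairs) and pairs[end + 1][0] == pairs[k][0]:
            end += 1
        avg = (k + end) / 2 + 1  # average 1-based rank of the tied block
        for _, i, j in pairs[k:end + 1]:
            ranks[i][j] = ranks[j][i] = avg
        k = end + 1
    return ranks

def mapping_error(R, Q, assign):
    """Sum of |r - q| over the lower triangle; assign[p] is the gene at pixel p."""
    n = len(assign)
    return sum(abs(R[assign[i]][assign[j]] - Q[i][j])
               for i in range(n) for j in range(i))

def greedy_swaps(R, Q, max_iter=100):
    """Repeatedly apply the single swap that reduces the error the most."""
    n = len(R)
    assign = list(range(n))
    err = mapping_error(R, Q, assign)
    for _ in range(max_iter):
        best = None
        for a in range(n):
            for b in range(a):
                assign[a], assign[b] = assign[b], assign[a]
                candidate = mapping_error(R, Q, assign)
                assign[a], assign[b] = assign[b], assign[a]  # undo trial swap
                if candidate < err and (best is None or candidate < best[0]):
                    best = (candidate, a, b)
        if best is None:  # no improving swap left
            break
        err, a, b = best
        assign[a], assign[b] = assign[b], assign[a]
    return assign, err

# Hypothetical toy matrix: 5 samples x 4 genes. Columns 0 and 3 are strongly
# correlated, as are columns 1 and 2, but they start on the image diagonals.
expr = [[1.0, 5.0, 5.2, 1.1],
        [2.0, 1.0, 1.1, 2.0],
        [3.0, 4.0, 3.9, 3.2],
        [4.0, 2.0, 2.0, 3.9],
        [5.0, 3.0, 3.1, 5.1]]
genes = list(zip(*expr))  # one tuple of expression values per gene
M = len(genes)
feat_dist = [[0.0 if i == j else 1.0 - pearson(genes[i], genes[j])
              for j in range(M)] for i in range(M)]
R = rank_matrix(feat_dist)

rows, cols = 2, 2  # a 2 x 2 image, one pixel per gene
coords = [(r, c) for r in range(rows) for c in range(cols)]
pix_dist = [[(a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 for b in coords]
            for a in coords]
Q = rank_matrix(pix_dist)

initial_err = mapping_error(R, Q, list(range(M)))
assign, final_err = greedy_swaps(R, Q)
```

After the swaps, `assign` is still a permutation of the gene indices, so every pixel corresponds to exactly one gene, and the correlated gene pairs end up on adjacent pixels rather than on the diagonals.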

2.3 Latent neural representation
Following the transformation of samples into image representations, these are subsequently utilized to train a convolutional neural network with the aim of extracting latent features from input data [22], which serve as compact encodings of the original gene expression profiles and are later exploited to identify relevant genes via a neural-based procedure.
Developed using the PyTorch framework, the neural model adopts a U-Net architecture comprising a contraction path followed by an expansive one [19]; the former, referred to as the encoder, is responsible for extracting salient information from input data, while the latter, known as the decoder, is designed to reconstruct input data from the latent representation produced by the encoder.
The encoder is composed of 3 × 3 convolutional layers, each followed by ReLU activation functions and 2 × 2 max pooling operations [23]. This architecture increases the number of feature channels while reducing spatial resolution, thereby distilling the most salient aspects of the data. The decoder, on the other hand, uses a combination of transposed convolutions and up-sampling operations to rebuild the image; skip connections between encoder and decoder ensure that fine-grained structural details are preserved, mitigating the information loss caused by downsampling by reusing information from the corresponding encoder layers. Both the encoder and the decoder follow a symmetric architecture comprising four convolutional blocks each, configured respectively with 64, 128, 256, and 512 filters. Interposed between them is a bottleneck section consisting of two convolutional blocks, each containing 1024 filters; its output is the latent representation of the input data.
In the context of gene identification, the network is trained to perform an image reconstruction task: given an input image derived from gene expression data, the model learns to reproduce it faithfully. Since each sample of the tumor subtype under consideration is associated with an image generated through the IGTD algorithm, forcing the neural network to accurately reconstruct these images ensures that the latent representation produced at the bottleneck layer is truly representative of the original sample. In other words, the latent vector encodes the essential information contained in the image, which directly corresponds to the genes mapped to its pixels. By successfully reconstructing the image, the network demonstrates that it has captured the most relevant features of the gene expression profile, thereby providing a biologically meaningful latent representation that can be exploited for downstream gene selection. Let $\mathcal{I} = \{I_1, \dots, I_N\}$ denote the set of images associated with the $N$ samples, each comprising $M$ genes; it is employed to train the neural network in the image reconstruction task, where the output is expected to replicate the input. Once training is completed, the latent representation $z_i$ associated with each sample $i$, $i = 1, \dots, N$, is evaluated; given the neural network’s ability to accurately reconstruct its input data, it can be inferred that the model has learned to identify and extract the most salient features from the input, to transform them into a compact representation within the latent space, and later to decode them to reproduce the original input structure.
To derive subtype-specific gene importance, the latent representations of all samples belonging to a given tumor subtype are aggregated. Since each latent representation can be interpreted as an encoding of the image associated with an individual sample, where pixels correspond to genes, the aggregation step allows us to move from single-sample encodings to a consensus representation of the entire subtype. This is achieved by aggregating the latent vectors $z_i$, $i = 1, \dots, N$, into a single compact representation $\bar{z}$:

$$\bar{z} = \frac{1}{N} \sum_{i=1}^{N} z_i,$$

where $N$ is the number of subtype samples and $\bar{z}$ is the average of all latent vectors. The resulting representation $\bar{z}$ summarizes the common features shared within the group, thus capturing the subtype’s characteristic gene expression signature and ensuring that the subsequent decoding highlights genes that are consistently relevant across the population rather than those specific to individual samples.
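The aggregation step amounts to an element-wise mean over the samples of a subtype. The sketch below assumes that each latent representation has been flattened into a plain list of numbers; the function name `mean_latent` and the toy values are hypothetical. In the actual pipeline the latents are multi-dimensional bottleneck tensors, and the same averaging would be taken along the sample axis.

```python
def mean_latent(latents):
    """Element-wise average of equally long latent vectors (one per sample)."""
    n = len(latents)
    return [sum(values) / n for values in zip(*latents)]

# Hypothetical flattened latent vectors for three samples of one subtype.
latents = [[0.0, 2.0, 4.0],
           [2.0, 4.0, 0.0],
           [4.0, 0.0, 2.0]]

z_bar = mean_latent(latents)  # consensus representation of the subtype
```

Components that vary strongly from sample to sample are smoothed out by the mean, while components shared across the group are preserved.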
Once a representative latent encoding of the tumor subtype has been obtained, the model’s inherent capability to decode data from the latent space is exploited to associate this encoding with an image in which each pixel corresponds to a specific gene. Since the IGTD algorithm was employed to generate images for all training samples - ensuring that the spatial arrangement of genes (pixels) is consistent across samples - the decoded image preserves the same pixel positions as those used during training. Therefore, the intensity of each pixel reflects the importance or expression level of the corresponding gene for the tumor subtype under consideration. Consequently, the decoded image derived from the aggregated latent representation provides a biologically meaningful visualization, highlighting the genes that are most relevant to the subtype as a whole rather than to individual samples.

2.4 Gene identification
Starting from the image obtained by decoding the latent representation associated with the tumor subtype, each pixel intensity is first normalized to range between 0 and 1. In this normalized representation, pixel values can be interpreted as indicators of gene relevance: values close to 1 correspond to highly expressed genes, while values near 0 denote low-expression ones. To further analyze the distribution of these intensities, a frequency histogram is constructed, where the $x$-axis reports the normalized intensity values and the $y$-axis indicates the number of times a gene with that intensity occurs in the image. Based on the resulting distribution, similar to that shown in Fig. 3, a threshold value of 0.45 is selected as cut-off. Genes with normalized intensity below this threshold are retained, while those above are excluded. Consequently, the filtering step reduces the gene expression matrix to only the columns corresponding to the selected genes, ensuring that subsequent analyses focus on the most representative subset for the tumor subtype under consideration.
The gene lists obtained from the procedure were assessed by means of a functional enrichment analysis, conducted by querying the enrichR tool [24]. The aim is to obtain statistically significant associations between the selected genes and functional annotations such as KEGG pathways [25] and GO terms [26], as well as disease-gene associations according to the DisGeNET knowledge platform [27]. For a better visualization, we exploited the extension provided by the enrichr-KG tool [28], which shows queried genes and functional categories as a bipartite graph (as shown in Fig. 6), where a link occurs between a gene and a functional category if the gene shows a statistically significant association with that category. A threshold of 0.05 on the adjusted $p$-value (FDR-based procedure) was set to identify the functional annotations and disease-gene associations in which the selected genes appeared to be significantly enriched, as shown in Fig. 4.
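A minimal sketch of this filtering step is given below. It follows the text in retaining the genes whose normalized intensity falls below the 0.45 cut-off; the gene names, intensities, expression values, bin count, and function names are hypothetical placeholders.

```python
def min_max(values):
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def histogram(values, bins=5):
    """Count how many values in [0, 1] fall into each equal-width bin."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1  # v == 1.0 goes to the last bin
    return counts

def select_genes(names, intensities, threshold=0.45):
    """Keep genes whose normalized decoded intensity is below the threshold."""
    normalized = min_max(intensities)
    return [g for g, v in zip(names, normalized) if v < threshold]

# Hypothetical decoded pixel intensities, one per gene (same order as names).
names = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
decoded = [0.0, 2.0, 4.0, 8.0, 10.0]

counts = histogram(min_max(decoded))
selected = select_genes(names, decoded)

# Reduce a toy samples x genes matrix to the columns of the selected genes.
expr = [[1.0, 2.0, 3.0, 4.0, 5.0],
        [6.0, 7.0, 8.0, 9.0, 10.0]]
keep = [i for i, g in enumerate(names) if g in selected]
filtered = [[row[i] for i in keep] for row in expr]
```

In practice the cut-off would be chosen by inspecting the histogram, as the text describes, rather than being fixed in advance.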
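The FDR-based adjustment and the 0.05 cut-off can be illustrated with the Benjamini-Hochberg procedure. That the enrichment tool uses exactly this correction is an assumption, and the term names and raw $p$-values below are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical enrichment terms with raw p-values (illustrative only).
terms = ["term_A", "term_B", "term_C", "term_D"]
raw_p = [0.01, 0.04, 0.03, 0.5]

adj_p = benjamini_hochberg(raw_p)
significant = [t for t, p in zip(terms, adj_p) if p < 0.05]
```

Here three raw $p$-values are below 0.05, but only one term remains significant after the correction, which is why the threshold is applied to the adjusted values.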

3 Results
The workflow has been applied to RNASeq data of tissues extracted from patients affected by breast invasive carcinoma retrieved from The Cancer Genome Atlas (TCGA). In this case study, we focused on two breast cancer subtypes, stratified according to the well-consolidated PAM50 classification, which are particularly distinct for their molecular profile and clinical prognosis [29]: Luminal A subtype (229 samples) and Basal-like subtype (98 samples). Luminal A is the subtype with the most favorable prognosis, characterized by slower tumor growth and a strong dependence on hormones, which allows effective treatment with endocrine therapies, reducing the need for chemotherapy [30]. Basal-like, on the other hand, is the most aggressive subtype, with a high rate of cell proliferation and a greater probability of early recurrence. The absence of hormone receptors and HER2 makes it less treatable with standard therapies, which is why research is focused on new therapeutic approaches, including PARP inhibitors, immunotherapy, and targeted therapies [30].
To identify common genes relevant to both subtypes, we apply the proposed methodology separately to the samples of each subtype. The same neural network architecture is employed in both cases, with each model trained independently on its respective subtype to ensure subtype-specific specialization. Following training, the mean latent representation of each subtype, $\bar{z}_{\mathrm{LumA}}$ and $\bar{z}_{\mathrm{Basal}}$, is computed and decoded into an image representation, from which gene filtering is performed. This process leads to a filtered gene expression matrix for each subtype, enabling comparative analysis of shared genetic features (see Fig. 5).
The U-Net architecture is trained for 10 epochs using the Mean Absolute Error (MAE) as the loss function. Optimization is performed with Adam [31], employing a batch size of 32 and a learning rate of 0.001. The network follows a symmetric encoder–decoder architecture: the encoder consists of four blocks, each composed of two convolutional layers (kernel size 3 × 3, ReLU activation, “same” padding) followed by a 2 × 2 max pooling operation. The number of filters increases progressively from 64 to 512, while the spatial resolution is reduced at each stage. At the bottleneck, the latent representation has dimension 23 × 23 × 1024, capturing a compact yet informative latent encoding of the input. The decoder mirrors the encoder structure, with four blocks that perform upsampling via transposed convolutions, concatenation with the corresponding encoder features, and two convolutional layers with a decreasing number of filters, from 512 to 64. The final output layer is a 1 × 1 convolution with sigmoid activation, reconstructing the normalized input image. Separate models are trained for each tumor subtype following the same training protocol. The dataset corresponding to each subtype is partitioned into training and testing sets using an 80/20 split. The image reconstruction error on the test set is of the order of $10^{-2}$, indicating satisfactory model performance. The IGTD algorithm is configured to execute up to 30,000 iterations to generate an image of dimension 46 × 46, with convergence assessed over consecutive steps: convergence is reached when the error reduction stays below the threshold.
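Once each subtype has its own filtered gene list, the comparison reduces to set operations. The sketch below uses hypothetical placeholder gene names, not the genes reported in the paper.

```python
# Hypothetical filtered gene sets, one per subtype (placeholders only).
luminal_a_genes = {"GENE_A", "GENE_B", "GENE_C"}
basal_like_genes = {"GENE_B", "GENE_C", "GENE_D"}

# Genes retained for both subtypes, and genes specific to each one.
shared = sorted(luminal_a_genes & basal_like_genes)
luminal_a_only = sorted(luminal_a_genes - basal_like_genes)
basal_like_only = sorted(basal_like_genes - luminal_a_genes)
```

The shared list is the natural input for the enrichment analysis of common features, while the subtype-specific lists can be inspected separately.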
Studying the genes associated with these subtypes allows us to better understand the biological differences underlying the progression of breast cancer and to identify potential key biomarkers for personalized therapies.
Functional enrichment analyses conducted via the enrichR-KG tool [28] on DisGeNET [32], KEGG pathways [33], and Gene Ontology Biological Process [34] highlighted that numerous genes identified by the analysis play crucial roles in the progression of breast carcinoma. The list of selected genes overlaps with high statistical significance with mammary neoplasms and breast cancer, and in particular with the subtypes under study, as shown in Fig. 4. Moreover, the analysis of the most relevant pathological categories, including Triple Negative Breast Cancer Neoplasms (TNBC), Familial Breast Cancer, Aneuploidy, and Carcinogenesis, enabled the identification of key genes with potential diagnostic and therapeutic implications, as shown in Figs. 4 and 6.
The genes discussed here were among those significantly enriched in pathways and diseases directly related to the pathology under investigation, thereby confirming their relevance within the context of our study.
Among these genes, XBP1 stands out for its involvement in regulating endoplasmic reticulum stress and the Unfolded Protein Response. Its splicing mediated by the enzyme IRE1 generates the active form XBP1s, promoting cell survival under stress conditions and contributing to tumor progression. Recent studies have shown that XBP1 is highly expressed in luminal tumors, where its activation is correlated with the expression of the estrogen receptor (ESR1), which promotes tumor growth through direct regulation of estrogen signaling and favors resistance to hormone therapy. In triple-negative tumors, XBP1s interacts with the transcription factor HIF1, modulating the hypoxic response and promoting angiogenesis and neoplastic proliferation. XBP1 inhibition has been shown to reduce tumor growth in preclinical TNBC models, increasing chemotherapy sensitivity, suggesting its potential use as a prognostic biomarker and therapeutic target [35].
Another gene related to cell survival is BIRC5, known for its role in apoptosis inhibition and cell cycle regulation. Its overexpression has been associated with treatment resistance, particularly in triple-negative tumors and those characterized by high proliferation. Bioinformatic analyses using public databases such as TCGA and Oncomine revealed significantly higher levels of BIRC5 in tumor tissues compared to normal ones, suggesting its possible use as a prognostic biomarker. Additionally, Kaplan–Meier survival curve analysis indicated that high BIRC5 levels are associated with reduced recurrence-free survival (RFS), overall survival (OS), and distant metastasis-free survival (DMFS). For these reasons, therapeutic strategies targeting BIRC5 inhibition are under development, aiming to increase tumor cell sensitivity to chemotherapy and radiotherapy [36].
In the context of luminal breast carcinoma, GATA3, a transcription factor involved in gene expression regulation and cell differentiation, plays a key role. Its loss has been associated with poorly differentiated tumors, increased metastatic potential, and a worse prognosis. In BRCA1-mutated tumors, GATA3 is transcriptionally repressed via promoter methylation, suggesting that its restoration could represent a therapeutic strategy to limit tumor progression and improve response to hormonal treatments [37].
Alongside genes involved in cell survival and treatment resistance, a central role is played by those implicated in cell cycle regulation and mitotic division. CHEK1, for example, is a regulator of cell cycle checkpoints, crucial for the DNA damage response. High CHEK1 levels have been observed in basal-like tumors, suggesting its involvement in the genomic instability mechanisms typical of these tumor subtypes. Immunohistochemistry studies confirmed significantly higher CHEK1 expression in neoplastic tissues compared to normal ones, making it a potential target for new therapeutic strategies [38].
Looking at the KEGG cell cycle pathway illustrated in Fig. 7, it emerges that while CHEK1 is involved in DNA damage checkpoint regulation, the E2F transcription factor family (E2F1, E2F2, E2F3) regulates cell cycle progression and the transition from the G1 to the S phase. Dysregulation of the E2F transcription factors has been widely described in breast tumorigenesis, where it promotes uncontrolled proliferation and genomic instability. In particular, E2F3 has been associated with the epithelial-mesenchymal transition (EMT), a key process in metastatic progression, and its overexpression is correlated with increased invasive capacity of tumor cells in TNBC. This suggests that E2F3 modulation could represent a therapeutic strategy to limit breast carcinoma progression [39].
Another key group of genes involved in tumor progression includes the mitotic kinases AURKA and AURKB, whose overexpression has been correlated with poor prognosis in highly proliferative tumors such as basal-like ones. These kinases regulate mitosis and chromosome segregation, and their inhibition is currently under study to counter tumor growth. AURKA and AURKB have been shown to regulate oncogenic pathways such as ErbB signaling and platinum drug resistance, making them promising therapeutic targets to overcome chemoresistance in breast tumors. High levels of AURKA and AURKB have been associated with increased metastatic risk and reduced recurrence-free survival, supporting the hypothesis that inhibition of AURKA and AURKB could be an effective therapeutic strategy to counteract uncontrolled growth and improve the response to treatment in breast carcinoma [40].
Similarly, CDC20 emerges as an essential regulator of mitotic progression, whose overexpression has been associated with greater tumor aggressiveness and poor prognosis in triple-negative and basal-like tumors. Recent studies have highlighted the link between CDC20 and TPX2, a protein involved in spindle assembly, suggesting that its inhibition could counteract uncontrolled proliferation and chromosomal instability [41].
The kinase MELK is highly expressed in triple-negative tumors, where it has been associated with a higher risk of metastasis and worse prognosis. Recent studies have shown that MELK plays a role in maintaining cancer stem cells (CSCs) and promoting the epithelial-mesenchymal transition (EMT), two key processes in TNBC progression. Pharmacological inhibition of MELK with specific molecules has been shown to reduce cell proliferation, invasive capacity, and lung metastasis formation in murine models, suggesting that it could represent a promising therapeutic target, especially for patients with metastatic disease [42].
Among the genes involved in mitotic division, another relevant gene is RAD51, a key component in DNA repair through homologous recombination. Specifically, RAD51 plays a crucial role in replication fork protection, preventing DNA degradation in response to replicative stress. In BRCA-mutated tumors, the inhibition of RAD51 has been proposed to increase sensitivity to DNA repair inhibitors, such as PARP inhibitors, suggesting new therapeutic perspectives for breast carcinoma [43].

4 Conclusions
In the context of gene selection, this study introduces a novel neural-based filtering approach that leverages latent representations extracted from gene expression data. The proposed methodology is built upon a custom pipeline comprising a tabular-to-image conversion algorithm and a bespoke U-Net architecture, which generates an image in which each pixel corresponds to a gene and its intensity reflects the gene’s relevance to a specific tumor subtype. The process revolves around an encoded representation of the samples associated with the target tumor subtype. A key advantage of this approach is its data-driven nature: it relies only on features learned by the neural network, eliminating the need to select features on the basis of prior assumptions. The application of this analysis to the breast carcinoma dataset highlights several genes worthy of further investigation, which play crucial roles in the progression of the disease and specifically in the subtypes under study, suggesting that the method is a promising candidate for application to other cancer datasets as well.

CRediT authorship contribution statement

Danilo Menegatti: Writing – original draft, Methodology, Conceptualization. Giulia Fiscon: Writing – original draft, Methodology, Data curation. Alessandro Giuseppi: Writing – review & editing. Paola Paci: Writing – review & editing, Supervision, Project administration. Antonio Pietrabissa: Writing – review & editing, Supervision, Project administration.

Declaration of competing interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Given their role as Guest Editors for this special issue, PP and GF will not take part in the editorial peer-review process for this article and will not have access to any information related to its evaluation. Full editorial responsibility for this manuscript has been assigned to an independent editor. All other authors declare that they have no known competing financial interests or personal relationships that could have influenced the work reported in this paper.

Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please cite the original article when quoting.
