
Utilization of machine learning algorithms for the identification of the RLN associated prognostic model and feature biomarkers of RLN-related subtypes in breast cancer.

Translational Oncology, 2026, Vol. 65, p. 102684

Du Y, Yuan Q, Yu H, Ye R, Lin H, Yu G


Cite this paper

APA Du Y, Yuan Q, et al. (2026). Utilization of machine learning algorithms for the identification of the RLN associated prognostic model and feature biomarkers of RLN-related subtypes in breast cancer. Translational Oncology, 65, 102684. https://doi.org/10.1016/j.tranon.2026.102684
MLA Du Y, et al. "Utilization of machine learning algorithms for the identification of the RLN associated prognostic model and feature biomarkers of RLN-related subtypes in breast cancer." Translational Oncology, vol. 65, 2026, p. 102684.
PMID 41581316

Abstract

[BACKGROUND] Breast cancer (BC) is the most common malignancy afflicting women worldwide, yet the role of relaxin-related genes (RLN) in BC progression remains unclear. This study aims to elucidate the relationship between RLN and BC outcomes through immune microenvironment and metabolic pathway analysis.

[METHODS] Gene expression and clinical data were collected from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO). Relaxin-related genes were identified using the KEGG and GeneCards databases. A prognostic model, the RLN Associated Prognostic Model (TRAPM), was established using 101 combinations of 10 machine learning algorithms and validated at the single-cell level. Multi-omics analysis, including the IMvigor210 cohort, was performed to assess TRAPM's applicability in immunotherapy and drug selection.

[RESULTS] TRAPM, comprising nine prognostic genes (MMP1, RXFP1, PRKCZ, JUN, NFKBIA, GNAI2, NOS2, MMP9, and MMP13), showed significant associations with immune and metabolic profiles. Using TRAPM, a novel BC subtype RC3 and its key marker genes (MTHFD1L, CAVIN4, MMP1, ADGRG6, B3GNT5, SMYD2, and TFRC) were identified. Experimental validation through RT-qPCR and Western Blot confirmed the role of these markers in six BRCA cell lines.

[CONCLUSIONS] The identification of TRAPM and the RC3 subtype enhances our understanding of BC heterogeneity and highlights potential therapeutic targets. This study provides a foundation for personalized treatment strategies by clarifying the biological significance and clinical relevance of the RC3 subtype.


Background

BC is the most commonly diagnosed malignancy and the leading cause of cancer-related mortality among women globally [1]. Despite advances in understanding risk factors, early diagnosis, and treatment, including chemotherapy, targeted therapy, and immunotherapy, the prognosis of many patients remains poor [2]. Prognosis varies considerably across the pathological subtypes of BC, which are typically categorized as HER2-positive, hormone receptor-positive, and triple-negative breast cancer (TNBC). Compared with other subtypes, TNBC is characterized by poor differentiation, high invasiveness, and a greater likelihood of early recurrence and metastasis. Over one-third of TNBC patients experience recurrence or distant metastasis, and this subgroup has a particularly poor prognosis, with a 5-year survival rate of <15 %, well below the 5-year survival rate of patients with metastatic BC overall (31 %) [3,4]. Identifying new BC subtypes could therefore help characterize inter-patient heterogeneity [5,6], support the discovery of novel biomarkers [7], and improve decision-making in precision medicine.
Relaxin, a two-chain peptide hormone of approximately 6 kDa, regulates reproductive and non-reproductive processes through receptor binding [8,9] and is increasingly implicated in the etiology and progression of BC. Relaxin levels in the tissue and serum of BC patients are elevated compared with those of healthy individuals and are associated with clinicopathological characteristics and prognosis [10,11]. Relaxin activates its receptors and downstream signaling pathways to promote BC cell proliferation, migration, invasion, angiogenesis, and inflammatory responses [12].
The human relaxin gene family comprises four homologs: RLN1, RLN2, RLN3, and RLN4. RLN1 and RLN2 are the main sources of relaxin, whereas the roles of RLN3 and RLN4 remain unclear. RLN1 and RLN2 expression is regulated by estrogen, progesterone, growth hormones, and cytokines, and mutations or aberrant methylation in these genes can alter relaxin levels, affecting BC development and progression [13]. Relaxin pathway genes (RPG) include the receptors and downstream effectors of the relaxin signaling pathway. The human relaxin receptor genes (RXFP1, RXFP2, RXFP3, and RXFP4) belong to the G protein-coupled receptor superfamily [14], and effector genes of the pathway include nitric oxide synthase, phosphatidylinositol 3-kinase, mitogen-activated protein kinase, NF-κB, VEGF, and matrix metalloproteinases. Relaxin genes encode the hormone itself, whereas relaxin signaling pathway genes transmit its signals and regulate downstream physiological processes [15]. This study seeks to clarify the roles of RLN in BC.
In this study, we constructed the prognostic model TRAPM from RLN and the BRCA transcriptome dataset. The model demonstrated prognostic utility and accuracy in both the training and validation cohorts and was used to classify BC into three subgroups: RC1, RC2, and RC3. Marker genes of the RC3 subtype may serve as new detection targets and offer new directions for BC immunotherapy research.

Materials and methods

Data collection and processing
BRCA transcriptome data (TCGA-BRCA), copy number variation (CNV) data, and SNP data were downloaded from TCGA (https://portal.gdc.cancer.gov), and the bulk RNA transcriptome dataset GSE20685 and the single-cell transcriptome dataset GSE176078 were downloaded from GEO (http://www.ncbi.nlm.nih.gov/geo). TCGA-BRCA provides clinical and molecular data for 1105 BC patients. The CNV data, derived from the Affymetrix Genome-Wide Human SNP Array 6.0 platform, cover 20,531 genes and 1212 samples. The SNP data, called with the Mutect2 algorithm, comprise 20,601 mutation sites across 1098 samples. GSE20685 contains gene expression profiles of 327 BC patients measured on the Affymetrix Human Genome U133 Plus 2.0 Array platform. GSE176078 provides single-cell RNA-seq data from five HER2+ patients generated on the 10x Genomics Chromium Single Cell 3′ v3 platform. Before analysis of the single-cell data, quality control was performed to retain cells meeting specific gene-count and mitochondrial UMI-percentage criteria, and batch effects were mitigated using the Harmony package [16]. PCA was then used for dimensionality reduction, UMAP [17] was used to visualize the cell distribution in the reduced space, and SingleR [18] was used to annotate cells on the basis of common marker genes.
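The QC criteria above are stated without numeric thresholds. The dependency-free Python sketch below shows how per-cell detected-gene counts and mitochondrial UMI percentages can be computed and used as filters; the cutoffs (200 to 6000 detected genes, at most 20 % mitochondrial UMIs) and the "MT-" prefix rule for human mitochondrial genes are illustrative assumptions, not the values used in the study, and the names `Cell`, `qc_metrics`, and `filter_cells` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Cell:
    barcode: str
    counts: dict[str, int]  # gene symbol -> UMI count for this cell


def qc_metrics(cell: Cell) -> tuple[int, float]:
    """Return (number of detected genes, percent of UMIs from mitochondrial genes)."""
    n_genes = sum(1 for c in cell.counts.values() if c > 0)
    total = sum(cell.counts.values())
    # assumption: human mitochondrial genes carry the "MT-" prefix
    mito = sum(c for gene, c in cell.counts.items() if gene.startswith("MT-"))
    pct_mito = 100.0 * mito / total if total else 0.0
    return n_genes, pct_mito


def passes_qc(cell: Cell, min_genes=200, max_genes=6000, max_pct_mito=20.0) -> bool:
    # default thresholds are illustrative assumptions, not the study's settings
    n_genes, pct_mito = qc_metrics(cell)
    return min_genes <= n_genes <= max_genes and pct_mito <= max_pct_mito


def filter_cells(cells: list[Cell], **thresholds) -> list[Cell]:
    """Keep only cells that pass every QC criterion."""
    return [c for c in cells if passes_qc(c, **thresholds)]
```

In practice this step is usually run inside a single-cell toolkit before Harmony integration; the sketch only makes the two QC metrics explicit.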

Acquisition of RLN
Relaxin-related genes were obtained from the KEGG pathway REACTOME RELAXIN RECEPTORS and from GeneCards. The REACTOME RELAXIN RECEPTORS pathway, part of the KEGG Pathways database, comprises the downstream signaling pathways activated when relaxin binds its receptor, including cAMP/PKA, NO/cGMP, PI3K/AKT, and MAPK. The gene list of the REACTOME_RELAXIN_RECEPTORS pathway was downloaded from the official KEGG website. GeneCards, a comprehensive gene information database, was searched for relaxin-related genes using "Relaxin" as the keyword.

Consensus cluster analysis and principal component analysis (PCA)
PCA is a dimensionality reduction technique that transforms high-dimensional data into a lower-dimensional space while retaining as much of the important information as possible. Consensus clustering was performed with the "ConsensusClusterPlus" R package (version 1.60.0) [19] using the Partitioning Around Medoids (PAM) algorithm [20], with 500 resampling iterations, each drawing 80 % of the samples.
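The resampling scheme described above (500 iterations on 80 % subsamples) can be sketched as follows. This is a simplified, dependency-free illustration, not ConsensusClusterPlus itself: it assumes Euclidean distance and uses a basic k-medoids routine (farthest-first initialization plus alternating assign/update steps) in place of the full PAM BUILD/SWAP search. It returns only the consensus matrix, i.e. the fraction of subsamples in which two samples, when drawn together, fall into the same cluster; all function names are hypothetical.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_medoids(points, k, max_iter=100):
    """Simplified k-medoids: farthest-first initialization (a stand-in for PAM's
    BUILD step) followed by alternating assign/update steps (no SWAP search)."""
    n = len(points)
    dist = [[euclidean(points[i], points[j]) for j in range(n)] for i in range(n)]
    # first medoid: the most central point; then add the point farthest from all medoids
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))
    labels = []
    for _ in range(max_iter):
        # assign every point to its nearest medoid
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        # move each medoid to the member with the smallest total distance to its cluster
        new = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if members:
                new.append(min(members, key=lambda i: sum(dist[i][j] for j in members)))
            else:
                new.append(medoids[c])
        if new == medoids:
            break
        medoids = new
    return labels


def consensus_matrix(points, k, reps=500, frac=0.8, seed=1):
    """Fraction of subsamples in which items i and j, when drawn together, share a cluster."""
    n = len(points)
    rng = random.Random(seed)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    m = max(k, int(round(frac * n)))
    for _ in range(reps):
        idx = rng.sample(range(n), m)  # draw a fraction `frac` of the items without replacement
        labels = k_medoids([points[i] for i in idx], k)
        for a in range(m):
            for b in range(m):
                i, j = idx[a], idx[b]
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]
```

Entries close to 0 or 1 indicate stable co-assignment across subsamples; the package derives its final cluster labels from this kind of consensus matrix.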

Construction and validation of TRAPM based on machine learning
TRAPM is a machine learning-based prognostic model that integrates multi-omics data and advanced computational algorithms [21]. First, univariate Cox regression analysis was used to identify prognosis-associated genes (p < 0.05). These genes were then refined and modeled with 10 machine learning algorithms to determine the optimal model [22,23], using TCGA-BRCA as the training set and GSE20685 as the validation set. The predictive accuracy of the model was assessed with ROC curves. In addition, copy number variation of the model genes was examined, tumor heterogeneity was evaluated with the mutant-allele tumor heterogeneity (MATH) algorithm [24], mutations in the high- and low-risk groups were analyzed with the maftools R package [25], and co-occurrence and mutual exclusivity among the top 20 mutated genes were investigated.
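Univariate Cox screening fits one gene at a time. The minimal sketch below fits a single-covariate Cox model by Newton-Raphson on the partial likelihood (Breslow handling of tied event times) and returns a Wald p-value. In practice this is done with R's survival::coxph, which defaults to the Efron approximation, so results can differ slightly when event times are tied; the function name and inputs are hypothetical.

```python
import math


def cox_univariate(times, events, x, max_iter=50, tol=1e-9):
    """Single-covariate Cox proportional-hazards fit by Newton-Raphson.

    Uses the Breslow form of the partial likelihood for tied event times and
    returns (beta, standard error, two-sided Wald p-value).
    """
    beta = 0.0
    info = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for t_i, e_i, x_i in zip(times, events, x):
            if not e_i:
                continue  # censored subjects contribute only through risk sets
            # risk set: subjects still under observation at time t_i
            risk = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
            w = [math.exp(beta * x_j) for x_j in risk]
            s0 = sum(w)
            s1 = sum(wj * xj for wj, xj in zip(w, risk))
            s2 = sum(wj * xj * xj for wj, xj in zip(w, risk))
            score += x_i - s1 / s0            # derivative of the log partial likelihood
            info += s2 / s0 - (s1 / s0) ** 2  # observed information
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, p
```

A gene would pass the screen described above when the returned p-value is below 0.05; a positive beta means higher expression is associated with higher hazard.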

Correlation of TRAPM with immune profiles
Differences between the risk groups in immune cell infiltration, immunostimulatory profiles, immunosuppressive profiles, and immune-related markers were evaluated [26]. To assess differences in the tumor microenvironment between the high- and low-risk subgroups, the ESTIMATEScore [27], StromalScore [28], and ImmuneScore [29] were calculated with the estimate R package. The ssGSEA algorithm was also used to compute immune pathway scores for BRCA (pathways from KEGG) and compare them between the high- and low-risk groups. The immune response associated with the model was further explored in the IMvigor210 dataset [30] by comparing the distribution of complete response (CR), partial response (PR), stable disease (SD), and progressive disease (PD) across risk groups. Data on BC immune responses were obtained from TCIA (https://www.tcia.at/patients) to analyze the response to immune checkpoint inhibitors in each group.
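ssGSEA scores a gene set within a single sample by walking down the sample's genes ranked by expression and summing the running difference between a rank-weighted empirical distribution of gene-set members and the distribution of non-members. The sketch below follows the Barbie et al. definition with the commonly used weight exponent of 0.25; it omits the cross-sample normalization that the GSVA implementation applies by default, so absolute values will not match GSVA output. The function name is hypothetical.

```python
def ssgsea_score(expression, gene_set, alpha=0.25):
    """Single-sample GSEA enrichment score (Barbie et al. definition, unnormalized).

    expression: dict gene -> expression value for ONE sample
    gene_set:   set of gene symbols
    """
    # order genes from highest to lowest expression within the sample
    ordered = sorted(expression, key=expression.get, reverse=True)
    n = len(ordered)
    in_set = [g in gene_set for g in ordered]
    n_hit = sum(in_set)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must contain some, but not all, profiled genes")
    # members are weighted by rank**alpha, where the top-ranked gene has rank n
    weights = [(n - pos) ** alpha if hit else 0.0 for pos, hit in enumerate(in_set)]
    total_w = sum(weights)
    score, cum_w, cum_miss = 0.0, 0.0, 0
    for w, hit in zip(weights, in_set):
        if hit:
            cum_w += w
        else:
            cum_miss += 1
        # running difference between the weighted member ECDF and the non-member ECDF
        score += cum_w / total_w - cum_miss / (n - n_hit)
    return score
```

A pathway whose genes sit near the top of a sample's ranking gets a positive score and one concentrated at the bottom gets a negative score; group comparisons are then made on these per-sample scores.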

Expression of TRAPM at the single-cell level
Single cells were classified into eight types: T cells, iCAFs, macrophages, endothelial cells, myCAFs, epithelial cells, B cells, and plasma cells. The expression of the nine prognostic genes in the model was displayed for each cell type, and the performance of the prognostic model at the single-cell level was assessed using ssGSEA [31].

TRAPM predictive sensitive drugs
The Cancer Therapeutics Response Portal (CTRP) and PRISM drug datasets were downloaded [32]. CTRP links cancer-derived models and their cellular characteristics to small-molecule sensitivity, covering cell viability data for 970 cancer cell lines treated with 403 compounds, together with genomic, transcriptomic, and proteomic data. PRISM uses molecular barcoding and multiplexed screening to test 578 tumor-derived cultures against 4518 drugs for anticancer activity. Sensitive drugs were selected based on a correlation coefficient of <0 and a p-value of <0.05. In parallel, the pRRophetic package [33] was used to predict the sensitivity of each risk group to chemotherapeutic drugs.

TRAPM filtering of BC subtypes with biomarkers therein
TCGA-BRCA samples were clustered on the model genes with the ConsensusClusterPlus R package, and the resulting subtypes were correlated with the immune microenvironment and metabolic pathways (pathways sourced from PMC10504766 [34]). For the subtype of interest, strongly correlated gene modules were identified by weighted gene co-expression network analysis (WGCNA) [35], and marker genes from these modules were selected with multiple machine learning approaches [36]. Marker gene expression was then examined across immune cell types, and metabolic pathways related to the marker genes were explored.

Protein extraction and western blot
The cells or tissues were lysed using RIPA lysis buffer (P0013B, Beyotime Biotechnology, Shanghai, China) containing PMSF (100 mM, Beyotime Biotechnology). The lysates were incubated on ice for 30 min and centrifuged at 12,000 rpm for 10 min at 4 °C using a FRESCO 17 tabletop refrigerated centrifuge (Thermo Fisher Scientific). The supernatant was collected for protein quantification and further analysis. Total protein content was determined using the bicinchoninic acid (BCA) assay (P0012, Beyotime Biotechnology) following the manufacturer’s instructions. Protein samples were denatured by mixing with 5 × loading buffer at a ratio of 4:1 and heating at 100 °C for 10 min.
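The 4:1 mixing ratio dilutes the 5× loading buffer to the usual 1× working concentration. Reading the ratio as sample volume to buffer volume, and taking a hypothetical 20 μL of lysate as an example:

```latex
c_{\text{final}} = 5\times \cdot \frac{1}{4 + 1} = 1\times,
\qquad
V_{\text{buffer}} = \frac{V_{\text{sample}}}{4} = \frac{20\ \mu\text{L}}{4} = 5\ \mu\text{L}
```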
The denatured proteins, along with Spectra™ Multicolor Broad Range Protein Ladder (BP107 and BP117, Bio-Platform), were separated by SDS-PAGE using a DYCZ-24DN vertical electrophoresis system (Beijing Liuyi). The separation was conducted with an initial voltage of 80 V for 40 min in the stacking gel, followed by 120 V for 30–50 min in the resolving gel. Proteins were transferred onto PVDF membranes (IPVH00010, Merck Millipore) using an eBlot™ L1 fast transfer system (GenScript Biotech) at 100 V constant voltage.
Membranes were blocked in 5 % bovine serum albumin (BSA)-PBST solution for 1 h at room temperature. Primary antibodies diluted in 5 % BSA-PBST were incubated with the membranes overnight at 4 °C. After washing the membranes five times with PBST (6 min each), secondary antibodies diluted in PBST were applied, and the membranes were incubated for 1 h at room temperature. Following another series of five washes, protein bands were visualized using BeyoECL Plus chemiluminescence detection reagent (P0018S, Beyotime Biotechnology) and captured using the Clinx ChemiScope 6000 imaging system (Shanghai Qinxiang).
The intensity of each band was analyzed using ImageJ software.
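Band intensities exported from ImageJ are typically normalized to the loading control in the same lane and then expressed relative to a reference lane. GAPDH as the loading control is taken from the Results; using the first lane as the reference and the function name are assumptions for illustration.

```python
def relative_band_intensity(target, gapdh, reference_lane=0):
    """Normalize target band intensities to GAPDH per lane, then to a reference lane.

    target, gapdh: background-subtracted intensities (e.g. from ImageJ), one per lane.
    """
    if len(target) != len(gapdh):
        raise ValueError("need exactly one GAPDH intensity per lane")
    ratios = [t / g for t, g in zip(target, gapdh)]  # correct for unequal loading
    ref = ratios[reference_lane]
    return [r / ref for r in ratios]  # fold change versus the reference lane
```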

RNA extraction and RT-qPCR
RNA was extracted using Trizol reagent (TianGen) according to the manufacturer’s protocol. Briefly, 500 μL Trizol was added to the cell or tissue samples in 1.5 mL EP tubes (BS2001015, Baisebio), followed by the addition of 100 μL chloroform (Fuzhou Guangbo). After mixing and centrifugation at 12,000 rpm for 10 min at 4 °C (H2100R refrigerated centrifuge, Hunan Xiangyi Laboratory Instrument Development Co., Ltd.), the aqueous phase was transferred to a new 1.5 mL EP tube. RNA was precipitated with an equal volume of isopropanol (Sinopharm Chemical Reagent Co., Ltd.), washed with 75 % ethanol prepared with DEPC-treated water (Sigma, Cat. No. 40718), and resuspended in 25 μL DEPC water. RNA concentration and purity were measured using a Nano-200 micro-nucleic acid analyzer (Hangzhou Allsheng Instruments Co., Ltd.).
Reverse transcription was performed using RevertAid Reverse Transcriptase (Thermo, Cat. No. EP0441) in a 20 μL reaction system according to the manufacturer’s instructions. The obtained cDNA was either used immediately for qPCR or stored at 4 °C for later use.
RT-qPCR was conducted on a Real-Time PCR System (ABI, Model Q1) using PerfectStart Green qPCR SuperMix (TransGen Biotech, Cat. No. AQ601-04). Reactions were prepared in 0.2 mL PCR eight-tube strips with caps (Saipu, Cat. No. 802022-18). Each reaction was performed in triplicate. Primer sequences were synthesized by Beijing Tsingke Biotech Co., Ltd. (Beijing, China).
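The section does not state how relative mRNA levels were calculated. A common choice is the 2^−ΔΔCt method, sketched below under three assumptions: GAPDH (the internal control named in the Results) is the reference gene, amplification efficiencies of target and reference are close to 100 %, and the triplicate Ct values are averaged. The function name is hypothetical.

```python
from statistics import mean


def ddct_fold_change(ct_target, ct_gapdh, ct_target_cal, ct_gapdh_cal):
    """Relative expression by the 2^-ddCt method.

    Each argument is a list of replicate Ct values: target gene and GAPDH in the
    sample of interest, and the same two genes in the calibrator sample.
    Assumes ~100 % amplification efficiency for both genes.
    """
    dct_sample = mean(ct_target) - mean(ct_gapdh)  # normalize to the reference gene
    dct_cal = mean(ct_target_cal) - mean(ct_gapdh_cal)
    ddct = dct_sample - dct_cal  # compare with the calibrator
    return 2.0 ** (-ddct)
```

For example, a target whose ΔCt is 2 cycles lower than in the calibrator corresponds to a 4-fold higher mRNA level.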

Analytical statistics
All data analyses and graphics were performed in R (version 4.1.1). Correlations between two continuous variables were evaluated with Pearson's correlation test. Continuous variables were compared with t-tests or Wilcoxon rank-sum tests, and categorical variables with chi-square tests. P < 0.05 was considered statistically significant (*P < 0.05, **P < 0.01, ***P < 0.001; ns, not significant).

Result

A graphical summary of the study is presented in Fig. 1.

Identification of DEGs in RLN
A total of 80 RLN were identified from KEGG and GeneCards (Fig. 2A). Univariate Cox regression analysis then identified nine genes, MMP1, RXFP1, PRKCZ, JUN, NFKBIA, GNAI2, NOS2, MMP9, and MMP13, as significant prognostic factors for BC (Fig. 2B). Of these, MMP1, RXFP1, PRKCZ, JUN, NFKBIA, and GNAI2 showed differential expression in BC cells (p < 0.05) (Fig. 2C). Notably, expression levels of NOS2, MMP1, and NFKBIA were associated with differences in disease-specific survival (DSS), overall survival (OS), and progression-free survival (PFS) (Fig. 2D). These results suggest that these genes act as key regulators of relaxin signaling in BC and are potential biomarkers or targets for therapeutic intervention.

Application of machine learning in constructing and validating the prognostic model
The ten machine learning algorithms (CoxBoost, Enet, GBM, Lasso, plsRcox, Ridge, RSF, stepwise Cox regression, SuperPC, and survival-SVM) were chosen for their individual strengths in handling high-dimensional genomic data and predicting survival outcomes [23]. They fall into several methodological families: (1) penalized regression models (Enet, Lasso, Ridge), which perform feature selection and help mitigate overfitting in high-dimensional settings; (2) ensemble and tree-based methods (GBM, RSF), which capture complex non-linear interactions and are robust to noise; (3) dimension reduction methods (plsRcox, SuperPC), suited to highly correlated predictors; and (4) boosting and support vector approaches (CoxBoost, survival-SVM) adapted to censored survival data. Our objective was to include diverse modeling philosophies so that the resulting prognostic signature would not be biased by the assumptions or constraints of any single algorithm. Each algorithm was run with the default or widely accepted hyperparameters of its R package, and for key algorithms (e.g., α in Enet, the number of trees in RSF) a grid search over a predefined parameter range was performed with 5-fold cross-validation on the training set. The performance of the 101 algorithm combinations was assessed by the C-index, and the final model was the combination with the highest average C-index across the training (TCGA) and validation (GSE20685) cohorts. A comparative overview of all combinations is provided in Supplementary File 2. Among the 101 combinations, RSF + Enet [alpha = 0.5] showed the best predictive performance, with an average C-index of 0.59, and was selected as the optimal model (Fig. 3A).
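Harrell's C-index, used above to rank the 101 combinations, is the fraction of comparable patient pairs in which the patient with the higher predicted risk experiences the event first. A minimal sketch follows (pairs with tied event times are skipped, and tied risk scores count one half); R's survival package handles ties and censoring details somewhat differently, so values may differ slightly. The function name is hypothetical.

```python
def harrell_c_index(times, events, risk):
    """Harrell's concordance index for right-censored survival data.

    A pair is comparable when subject i had an event and subject j was still
    under follow-up later (times[j] > times[i]); it is concordant when i has
    the higher predicted risk. Tied risk scores count as half-concordant.
    """
    concordant, comparable = 0.0, 0
    for t_i, e_i, r_i in zip(times, events, risk):
        if not e_i:
            continue
        for t_j, r_j in zip(times, risk):
            if t_j > t_i:
                comparable += 1
                if r_i > r_j:
                    concordant += 1.0
                elif r_i == r_j:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

Under this definition, 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk.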
Regression coefficients (coef) were then calculated by Cox regression analysis; coef > 0 indicates an association with poor prognosis in BC, and coef < 0 the opposite. The coef values of RXFP1, NOS2, NOS1, and MMP1 were >0, whereas those of JUN, GNAI2, GNG8, PRKCZ, and NFKBIA were <0 (Fig. 3B). Patients were classified into high- and low-risk groups based on the selected model combination. Kaplan-Meier analysis showed that the high-risk group had poorer survival, and the gap between the groups widened over time (p < 0.05) (Fig. 3C, D). The accuracy of the model in predicting 1-, 3-, 5-, 7-, and 9-year survival in BC was evaluated with receiver operating characteristic (ROC) curves; in the validation cohort, the areas under the curve (AUC) for OS at these time points were 0.59, 0.64, 0.67, 0.68, and 0.69, respectively (Fig. 3E).
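The section does not specify how the cutoff between risk groups was chosen. As an illustration only, the sketch below computes a Cox-style linear risk score (the sum of coefficient × expression over the model genes) and splits patients at the median score. The gene names and coefficients are placeholders, the median cutoff is an assumption, and the selected RSF + Enet combination produces its own risk score rather than this linear predictor.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """Cox-style linear predictor: sum of coefficient * expression over model genes.

    expression:   list of dicts (one per patient) mapping gene -> expression value
    coefficients: dict mapping gene -> regression coefficient (placeholder values)
    """
    return [sum(coef * patient[gene] for gene, coef in coefficients.items())
            for patient in expression]


def split_by_median(scores):
    """Label scores above the median 'high' risk and the rest 'low' (assumed cutoff)."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]
```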
CNV analysis of the model genes showed high deletion frequencies for NOS2 and NFKBIA and high amplification frequencies for PRKCZ, GNAI2, and MMP1 (Fig. S1A). In the assessment of tumor heterogeneity, the high-risk group had higher MATH scores than the low-risk group (Fig. S1B). Gene mutations in the two groups were analyzed with the maftools R package: in the low-risk group, 475 of 538 samples carried mutations (88.29 %) (Fig. S1C), and in the high-risk group, 361 of 407 samples did (88.7 %) (Fig. S1D). Missense mutations were the predominant type. The high-risk group thus combined a slightly higher mutation rate with higher MATH scores, suggesting that mutation burden rises with MATH score; this supports the notion that both CNV and tumor heterogeneity reflect genomic instability. Analysis of co-occurrence and mutual exclusivity among the top 20 mutated genes showed that TP53 had the most significant mutual exclusivity with other genes, whereas HMCN1 showed strong co-occurrence with other mutated genes. Mutually exclusive interactions predominated in high-risk individuals, whereas co-occurring interactions were more frequent in low-risk individuals (Fig. S1E, F). These findings may inform the development of targeted therapies or immunotherapies for BC based on gene composition and mutation profiles.
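The MATH score summarizes the spread of variant allele frequencies (VAFs) within a tumor as 100 × MAD / median(VAF), where MAD is the median absolute deviation of the VAFs from their median, scaled by 1.4826. A minimal sketch follows; maftools computes this from the VAF column of a MAF file, whereas here the VAFs of one tumor are passed in directly, and the function name is hypothetical.

```python
from statistics import median


def math_score(vafs):
    """Mutant-allele tumor heterogeneity (MATH) score for one tumor.

    MATH = 100 * MAD / median(VAF), where MAD = 1.4826 * median(|VAF - median(VAF)|).
    """
    med = median(vafs)
    mad = 1.4826 * median([abs(v - med) for v in vafs])
    return 100.0 * mad / med
```

A tumor whose mutations all share one VAF scores 0; a wider VAF spread, indicating more subclonal diversity, gives a higher score.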

Validation of prognostic models for correlation with immune profiles
Using the Immuno-Oncology Biological Research (IOBR) R package [37], the BC tumor microenvironment (TME) was systematically analyzed, focusing on immune-related features [38]. The extent of immune cell infiltration differed between the low- and high-risk groups. The low-risk group showed higher infiltration of immune cells, including T cells, B cells, and NK cells, indicating greater immunological activity and a "hot tumor" phenotype (Fig. 4A) [39]. Immune suppression analysis showed that the high-risk group had higher levels of immune checkpoint molecules, myeloid-derived suppressor cells (MDSCs), and regulatory T cells (Tregs) than the low-risk group, suggesting a more pronounced immunosuppressive profile in the high-risk group (Fig. 4B). Immune exclusion analysis further identified significant differences in M2 macrophages and tumor-associated macrophages (TAMs) between the high- and low-risk groups, indicating that these cells may be key contributors to immune suppression (Fig. 4C). Examination of immune markers revealed that mismatch repair-associated markers were enriched in the high-risk group (Fig. 4D), suggesting that research on mismatch repair mechanisms may aid the diagnosis and treatment of BC in this subgroup.
The ESTIMATEScore, StromalScore, and ImmuneScore were calculated with the estimate R package to compare the TME of the high- and low-risk groups (Fig. S2A–C). As expected, the low-risk group had higher immune and TME scores. ssGSEA was then used to compute BC immune pathway scores (pathways from KEGG) and compare them between the two groups (Fig. S2D). The low-risk group showed higher scores for multiple immune pathways, including complement and coagulation cascades, hematopoietic cell lineage, T cell and B cell receptor signaling, chemokine signaling, natural killer cell-mediated cytotoxicity, leukocyte transendothelial migration, NOD-like and Toll-like receptor signaling, and cytosolic DNA sensing, among others.
Immune checkpoint inhibitors have produced significant clinical results in recent years. The immune response associated with the RLN-related model was therefore examined to assess clinical progression and sensitivity to immune checkpoint inhibitors in BC patients. Pre-treatment tumor samples from the IMvigor210 cohort [40], a major phase 2 trial, were used for biomarker analysis; patients with CR or PR were classified as responders and those with SD or PD as non-responders [30]. The distribution of CR, PR, SD, and PD differed significantly between the risk groups (Fig. S3A, B) (P < 0.05). BC immune response data were obtained from The Cancer Immunome Atlas. Responses to immune checkpoint inhibitors such as anti-PD-1 and anti-CTLA-4 differed significantly between the groups (p < 0.0001), indicating that patients in the low-risk group were more likely to benefit from immunotherapy (Fig. S3C–F).
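Collapsing IMvigor210 outcomes into responders (CR/PR) and non-responders (SD/PD), as described above, gives a 2 × 2 table against risk group that can be tested with a Pearson chi-square test, the test the statistics section names for categorical variables. This binary grouping is a simplification of the four-category comparison reported above. The sketch omits the Yates continuity correction that R's chisq.test applies to 2 × 2 tables by default, so its p-values will be somewhat smaller; the function names are hypothetical.

```python
import math

RESPONDER = {"CR", "PR"}


def response_table(groups, responses):
    """2x2 counts: rows = risk group (high, low); columns = (responder, non-responder)."""
    table = {"high": [0, 0], "low": [0, 0]}
    for group, response in zip(groups, responses):
        table[group][0 if response in RESPONDER else 1] += 1
    return [table["high"], table["low"]]


def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) and p-value, df = 1."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # survival function of the chi-square distribution with 1 degree of freedom
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p
```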

Display of the model and the genes at the single-cell level
To investigate TRAPM expression at the single-cell level and uncover more specific expression characteristics, single cells were categorized into eight types: T cells, iCAFs, macrophages, endothelial cells, myCAFs, epithelial cells, B cells, and plasma cells (Fig. 5A). Cell types were annotated using the SingleR automatic annotation method [41] combined with marker genes from previous literature [42] (Fig. 5C). ssGSEA was then used to assess TRAPM at the single-cell level, showing that macrophages had a higher risk score (Fig. 5B, D). Finally, expression of the model genes was evaluated in each cell type: NFKBIA and JUN were highly expressed across all cell types, suggesting their potential as "housekeeping genes" and as predictive markers of sensitivity to targeted therapy and immunotherapy [43] (Fig. 5E).

TRAPM assesses drug sensitivity
Drugs associated with TRAPM were screened in CTRP and PRISM (Fig. 6A, B), selecting sensitive drugs with a correlation above 0 and a Wilcoxon test p-value below 0.05. Candidates from CTRP included nutlin-3, canertinib, neratinib, oligomycin A, and afatinib, of which canertinib (p = 0.016) and neratinib (p = 0.015) were significant. Candidates from PRISM included PD-168393, voxtalisib, AZD8931, CGM097, and poziotinib, of which PD-168393 (p = 0.0015), AZD8931 (p = 0.035), and poziotinib (p = 0.042) were significant. In conclusion, BC patients in the TRAPM low-risk group are more sensitive to canertinib, AZD8931, poziotinib, neratinib, and the EGFR tyrosine kinase inhibitor PD-168393. These small-molecule inhibitors provide a theoretical basis for BC treatment and new directions for research into the related pathways. To investigate the use of TRAPM in drug selection for precise, individualized BC treatment, we selected 20 chemotherapeutic and targeted drugs whose half-maximal inhibitory concentration (IC50) values differed between the TRAPM risk groups (Fig. S4). BC patients in the low-risk group were more sensitive to these drugs than those in the high-risk group and were therefore more likely to benefit from them. These findings suggest that TRAPM can help clinicians select tailored drugs and design individualized BC treatment plans.
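The between-group comparison of drug response can be sketched as a two-sided Wilcoxon rank-sum (Mann-Whitney U) test. The version below uses the normal approximation with tie and continuity corrections; R's wilcox.test switches to an exact p-value for small samples without ties, so results can differ there. Inputs would be, for example, drug AUC or predicted sensitivity values for the low- and high-risk groups; the function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test; returns (U for x, normal-approximation p-value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    mu = n1 * n2 / 2.0
    # tie correction: each distinct average rank marks one tie group
    counts = {}
    for r in ranks:
        counts[r] = counts.get(r, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    sigma = math.sqrt(n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (abs(u - mu) - 0.5) / sigma  # continuity correction
    p = math.erfc(max(z, 0.0) / math.sqrt(2.0))
    return u, p
```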

TRAPM classification of BC
Most current BC subtypes are defined by gene expression levels, which may reflect specific biological functions. We therefore explored new BC subtypes with the aim of enabling more precise diagnosis and treatment. All samples were clustered on the nine TRAPM genes with the R package “ConsensusClusterPlus”, and after comprehensive consideration of the clustering results, k = 3 was chosen as the optimal number of clusters (Fig. 7A–C). BC was thus divided into three subtypes: RC1, RC2, and RC3 (Fig. 7D). The PCA expression heatmap also showed distinct expression profiles among the three subtypes (Fig. 7E). RC3 was enriched in the Basal-like and HER2-enriched subtypes, suggesting overlap with aggressive forms of BC (Supplementary File 3).

Differences in immune characteristics of BC subtypes
The immune characteristics of the BC subtypes were determined by analyzing the correlation of each subtype with immune checkpoints and immune cells. Immune checkpoint expression differed significantly among the three subtypes; notably, most immune checkpoints, including CTLA4, CD274, TNFRSF9, and PDCD1LG2, were expressed at higher levels in RC3 than in RC1 and RC2 (P < 0.001; Fig. 8A). Given these differences, immune cell infiltration was examined further to build a comprehensive immunological profile. Four algorithms, MCP, QUA, CIBERSORT, and xCell, were used to estimate the infiltration levels of various immune cell populations (Fig. 8B–E). Both RC3 and RC1 showed substantial infiltration of immune and stromal cells, including regulatory T cells, CD4+ T cells, B cells, dendritic cells, M0, M1, and M2 macrophages, natural killer cells, mast cells, eosinophils, neutrophils, endothelial cells, and fibroblasts. Notably, RC3 showed greater macrophage infiltration than RC1 and RC2.

Differences in metabolic profiles among BC subtypes
Because these subtypes were defined by RLN, we investigated whether they also have distinct metabolic characteristics. First, 81 metabolic pathways were scored using the “GSVA” R package. A differential analysis was then conducted to identify the metabolic signatures unique to each subtype, which were assessed using the GSVA scores in the corresponding subtypes. The three subtypes differed across diverse pathways, with RC3 showing higher metabolic scores in most pathways (Fig. 9).

WGCNA identifies BC subtype-related marker genes
RC3 was selected for further analysis. In WGCNA, samples were first clustered (Fig. 10A), and a soft-thresholding power of 6 was chosen as optimal for achieving a scale-free network with adequate connectivity (Fig. 10C). Genes were grouped into 13 modules, each assigned a unique color (Fig. 10B). The MEyellow module showed the strongest correlation with RC3 (Fig. 10D). Genes with a module membership (MM) > 0.8 in the yellow module were defined as marker (hub) genes (Fig. 10E) and subjected to GO enrichment analysis (Fig. 10F). These marker genes were predominantly enriched in terms such as organelle fission, nuclear division, chromosome segregation, chromosomal region, and DNA replication, suggesting their relevance for research on and treatment of the RC3 subtype.
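In an unsigned WGCNA network, co-expression similarity is raised to the soft-thresholding power, a_ij = |cor(x_i, x_j)|^β, which suppresses weak correlations while keeping the network weighted; module membership is the correlation of a gene with its module eigengene. The minimal sketch below uses β = 6 and the MM > 0.8 cutoff stated above. The module eigengene (in WGCNA, the first principal component of the module's expression) is passed in directly rather than computed, and the gene names and function names are placeholders.

```python
import math


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def soft_adjacency(profiles, power=6):
    """Unsigned WGCNA adjacency a_ij = |cor(x_i, x_j)|**power, diagonal set to 0."""
    genes = list(profiles)
    return {g: {h: 0.0 if g == h else abs(pearson(profiles[g], profiles[h])) ** power
                for h in genes}
            for g in genes}


def connectivity(adj):
    """Whole-network connectivity: sum of each gene's adjacencies."""
    return {g: sum(row.values()) for g, row in adj.items()}


def hub_genes(profiles, eigengene, threshold=0.8):
    """Genes whose module membership, cor(gene, module eigengene), exceeds the threshold."""
    return sorted(g for g, x in profiles.items() if pearson(x, eigengene) > threshold)
```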

Machine learning to identify key predictors of RC3
Three machine learning algorithms were used to identify potential biomarkers of RC3. The Boruta algorithm [44] reduced the feature genes to 47 variables (Fig. 11A), the RF algorithm [45] identified the top 20 feature genes (Fig. 11B), and the XGBoost algorithm [46] selected a subset of 20 features (Fig. 11C). The genes shared by all three subsets were retained, yielding seven marker genes (MTHFD1L, CAVIN4, MMP1, ADGRG6, B3GNT5, SMYD2, and TFRC; p < 0.05) (Fig. 11D). ROC curve analysis was used to evaluate the ability of each marker gene to predict RC3. The AUC values were 0.65 for ADGRG6, 0.657 for B3GNT5, 0.723 for CAVIN4, 0.996 for MMP1, 0.707 for MTHFD1L, 0.639 for SMYD2, and 0.683 for TFRC, with MMP1 showing the highest AUC (Fig. 11E). Given the predominance of macrophage infiltration in RC3, the seven marker genes were examined in the eight cell types, revealing that B3GNT5 and TFRC were notably expressed in macrophages (Fig. 11F). To evaluate associations between the marker genes and metabolism, all 81 metabolic pathways were included, and the Mantel test [47] was applied to examine pathway-to-pathway and gene-to-pathway correlations (Fig. 12).
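The selection step above (keeping genes chosen by Boruta, RF, and XGBoost alike) is a set intersection, and each gene's AUC can be read as the probability that a randomly chosen RC3 sample has a higher expression value than a randomly chosen non-RC3 sample. A minimal sketch of both, with hypothetical function names, placeholder gene names, and toy values:

```python
def consensus_features(*feature_lists):
    """Genes selected by every algorithm (intersection of the selected lists)."""
    common = set(feature_lists[0])
    for features in feature_lists[1:]:
        common &= set(features)
    return sorted(common)


def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    labels: 1 for the class of interest (e.g. RC3), 0 otherwise; ties count 1/2.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```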

Differential expression of prognostic genes in BRCA cell lines
To explore the expression patterns of the seven marker genes, mRNA and protein levels were analyzed in six BRCA cell lines representing three molecular subtypes: TNBC (MDA-MB-231 and MDA-MB-468), hormone receptor-positive (HR+; MCF-7 and T47D), and HER2-positive (MDA-MB-435 and SKBR3). mRNA levels were quantified by RT-qPCR (Fig. 13A–G) and protein levels by Western blot (Fig. 13H, I), with GAPDH as the internal control for normalization.
MTHFD1L and CAVIN4 showed high expression levels in HR+ cells at both mRNA and protein levels, particularly in MCF-7 and T47D. MMP1 and CD71/TFRC were most strongly expressed in HER2-positive cell lines, with SKBR3 displaying the highest levels. ADGRG6 and B3GN5 were predominantly expressed in TNBC cells, including MDA-MB-231 and MDA-MB-468, while exhibiting lower expression in HR+ and HER2-positive subtypes. SMYD2 was significantly elevated in HR+ cells, particularly in MCF-7, across both mRNA and protein levels. These findings reveal subtype-specific expression patterns for these marker genes. The detailed experimental data are provided in the supplementary file.
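Relative mRNA levels normalized to an internal control such as GAPDH are commonly computed with the 2^-ΔΔCt method. The text states only that GAPDH was used for normalization, so the use of 2^-ΔΔCt in this sketch is an assumption; the method also assumes near-100% amplification efficiency for both the target gene and GAPDH. The function name, the Ct values, and the choice of MDA-MB-231 as calibrator are all hypothetical.

```python
def fold_change_ddct(ct_target, ct_gapdh, ct_target_cal, ct_gapdh_cal):
    """Relative expression by 2^-ΔΔCt, normalized to GAPDH and a calibrator sample.

    Assumes ~100% PCR efficiency for both the target gene and GAPDH.
    """
    d_ct = ct_target - ct_gapdh              # ΔCt in the sample of interest
    d_ct_cal = ct_target_cal - ct_gapdh_cal  # ΔCt in the calibrator sample
    dd_ct = d_ct - d_ct_cal                  # ΔΔCt
    return 2.0 ** (-dd_ct)

# Hypothetical (target Ct, GAPDH Ct) per cell line; not measurements from this study.
ct = {
    "MCF-7": (22.0, 16.0),
    "T47D": (23.0, 16.5),
    "MDA-MB-231": (25.0, 16.0),
}
calibrator = "MDA-MB-231"
fold = {line: fold_change_ddct(t, g, *ct[calibrator]) for line, (t, g) in ct.items()}
# MCF-7: ΔCt = 6, calibrator ΔCt = 9, ΔΔCt = -3, fold change = 8
```

Because Ct is logarithmic in template amount, each one-cycle difference corresponds to roughly a twofold change, which is why the result is a power of two.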

Discussion

According to data from the National Cancer Institute (NCI) and the American Cancer Society (ACS), the overall five-year survival rate for BC patients is approximately 90%, owing largely to advances in immunotherapy and targeted therapy [48]. However, certain subtypes, such as TNBC, remain highly aggressive, with high recurrence rates and a lack of effective targeted treatments [49]. Accumulating evidence highlights the systemic hormonal function of relaxin, particularly its synergy with PD-L1 blockade in inhibiting tumorigenesis, mainly by enhancing T cell-mediated cytotoxicity and macrophage phagocytic activity [50]. While previous research has confirmed relaxin's therapeutic efficacy in pancreatic cancer xenograft models [51], its role in BC remains complex, with some studies suggesting that it contributes to tumor proliferation and metastasis [52,53]. Relaxin exerts various bioactive effects through interaction with its primary receptor, RXFP1, or by directly modulating related metabolic pathways [54]. We therefore used machine learning to develop TRAPM, a novel model integrating multi-omics data, and used it to classify BC into three distinct subtypes: RC1, RC2, and RC3.
TRAPM was employed in this study to evaluate the BC immune microenvironment. The analysis revealed that high infiltration levels of M2 macrophages and TAMs are major contributors to high-risk status and poor prognosis. TAMs exhibit immunosuppressive and tumor-promoting properties by secreting IL-10, TGF-β, and VEGF, which facilitate tumor growth and angiogenesis [55]. Additionally, TAMs suppress the immune response via the PD-L1 and CTLA-4 pathways, thereby reducing the efficacy of immune checkpoint inhibitors (ICIs) [56].
A key contribution of this study is the identification of the RC3 subtype, which is characterized by distinct immune and metabolic features, particularly elevated macrophage infiltration and increased expression of immune checkpoints such as CTLA4, PD-L1, and TNFRSF9. Based on these findings, we propose a therapeutic approach in which RC3 patients are identified using specific biomarkers and, guided by drug sensitivity testing, receive macrophage-targeted combination therapies, for example with chemotherapy or ICIs, to overcome the immunosuppressive, immunologically cold tumor microenvironment and ultimately improve patient outcomes.
In this study, seven biomarkers were identified as RC3-specific markers: MTHFD1L [57], CAVIN4 [58], MMP1 [59], ADGRG6 [60], B3GNT5, SMYD2 [61], and TFRC. GO enrichment analysis revealed that these genes are significantly involved in processes such as organelle fission, nuclear division, chromosome segregation, and DNA replication, pointing to a potential role in promoting cellular proliferation and genomic instability. Notably, MMP1 and TFRC have previously been linked to extracellular matrix remodeling and iron metabolism, respectively, both of which are known to influence tumor invasion and immune modulation. KEGG pathway analysis further suggests that several of these genes interface with relaxin signaling and its downstream effectors. For instance, MMP1 is a known downstream target of relaxin via the NF-κB and MAPK pathways, and its overexpression could enhance matrix degradation and metastatic potential. Similarly, TFRC-mediated iron uptake may support the metabolic demands of proliferating tumor cells and of alternatively activated macrophages within the tumor microenvironment (TME), thereby reinforcing an immunosuppressive niche. The enrichment of metabolic pathways such as folate metabolism (MTHFD1L) and glycosphingolipid biosynthesis (B3GNT5) in RC3 further suggests that these genes may help rewire cellular metabolism to favor tumor growth and immune evasion.
While these bioinformatic associations provide compelling hypotheses, functional validation is essential to establish causality. Future studies employing CRISPR-based knockout or overexpression of these genes in relevant BC cell lines and co-culture systems will clarify their specific contributions to relaxin-mediated signaling, macrophage polarization, and metabolic reprogramming. Such experiments would not only test whether these genes drive the RC3 phenotype but might also reveal new therapeutic vulnerabilities in this high-risk subgroup.
WB and RT-qPCR analyses demonstrated distinct expression patterns: MMP1 exhibited consistently high expression across all three BC cell lines, whereas TFRC displayed uniformly low expression. Additionally, ADGRG6 and B3GNT5 were highly expressed in TNBC cell lines, whereas MTHFD1L, CAVIN4, and SMYD2 were predominantly expressed in hormone receptor-positive cell lines. These findings further support the molecular characterization of the RC3 subtype. Notably, the high expression of B3GNT5 and TFRC in macrophages suggests a potential role for both genes in macrophage-mediated tumor progression. As the transferrin receptor, TFRC plays a crucial role in macrophage function: in the tumor microenvironment, macrophages, particularly TAMs, often exhibit elevated TFRC expression, which increases iron uptake and promotes tumor cell proliferation. Moreover, an iron-rich environment can induce macrophage polarization toward the M2 phenotype, potentially contributing to tumor progression [62]. B3GNT5, a glycosyltransferase involved in glycosphingolipid biosynthesis, has been implicated in immune cell interactions within the tumor microenvironment [63]. Future studies will employ PDX models and macrophage co-culture systems to investigate the roles of B3GNT5 and TFRC in immune modulation.
ICIs restore T cell-mediated tumor cytotoxicity by blocking interactions between immune checkpoint receptors and their ligands. Among the most extensively studied immune checkpoints are PD-1/PD-L1 and CTLA-4, with PD-L1 inhibitors playing a particularly important role in breast cancer treatment. However, some breast cancer patients are resistant to ICIs, potentially owing to low PD-L1 expression, inadequate T cell infiltration, or other immune evasion mechanisms [64]. Targeting TAMs in BC may enhance antitumor immunity by disrupting the PD-1/PD-L1 or CTLA-4 pathways and mitigating T cell suppression. In the RC3 subtype, B3GNT5 overexpression may promote TAM recruitment and polarization, thereby influencing the PD-L1 and CTLA-4 pathways and modulating antitumor immune responses. We therefore hypothesize that targeting B3GNT5 and TFRC could be a promising therapeutic strategy for modulating macrophage behavior and enhancing the tumor immune response.
Dandan Yi et al. demonstrated that MTHFD1L knockout in nude mice downregulated the Notch2, Hes1, CCND1, Bcl-2, and PCNA proteins, thereby suppressing thyroid cancer cell proliferation [65]. Caveolae-related proteins are strongly implicated in the development and progression of breast cancer. Although the roles of Cav-1, Cav-2, and Cav-3 in tumor cells have been extensively studied, reports on CAVIN4 remain scarce [58,66]. In head and neck squamous cell carcinoma (HNSCC) cells, MMP1 knockdown markedly inhibits cell proliferation, migration, and invasion, while simultaneously inducing apoptosis and epithelial-mesenchymal transition (EMT) [67]. Additionally, Tahmina Akter, Md Abdul Aziz, and colleagues investigated the MMP1 domain and its role in breast cancer metastasis and invasion, providing crucial insights into MMP1's potential as a genetic biomarker [68]. Song Wu et al. performed whole-genome and targeted sequencing of urothelial bladder cancer (UBC), and their functional assays further revealed that ADGRG6 depletion in UBC cells impaired the cells' ability to recruit endothelial cells and induce tube formation, underscoring the therapeutic potential of ADGRG6 as an anti-angiogenic target [69]. Furthermore, the lysine methyltransferase SMYD2 has been identified as a key regulator of breast cancer metastasis: in a mammary epithelium-specific SMYD2 knockout mouse model, SMYD2 deletion effectively inhibited the metastatic potential of primary tumor cells and prolonged overall survival [70].
To the best of our knowledge, this study is the first to employ machine learning to develop a relaxin-based breast cancer prognostic model and to perform a corresponding subtype analysis. The identified and validated biomarker genes provide potential molecular targets for further investigation of BC immune and metabolic mechanisms. However, this study has several limitations. Although TRAPM was the integrated model selected from the algorithm combinations, its predictive accuracy was moderate, with a C-index of approximately 0.59 and time-dependent AUCs ranging from 0.59 to 0.69. This may be attributable to several factors inherent to the study design and to biological complexity. First, the transcriptomic and clinical data were sourced from public repositories encompassing diverse patient populations, treatment histories, and sequencing platforms, introducing unavoidable batch effects and biological heterogeneity that may have compromised the model's generalizability. Second, breast cancer progression is influenced by a dynamic interplay of genetic, microenvironmental, and systemic factors that may not be fully captured by a static, RNA-based signature alone; although relaxin-related genes provide a novel and biologically grounded feature set, they represent only one layer of a multifactorial disease process. Third, breast cancer progression is inherently time-dependent, whereas our model is based on static features (e.g., gene expression) and does not incorporate time-series methods such as long short-term memory (LSTM) networks or time-dependent Cox regression models [71]. Finally, model development relied primarily on retrospective data, because prospective cohort studies were constrained by challenges in patient recruitment, data collection, and long-term follow-up.
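To make the C-index of about 0.59 concrete: a concordance index estimates the fraction of comparable patient pairs in which the patient predicted to be at higher risk experiences the event first, so 0.5 corresponds to random ordering and 1.0 to perfect ordering. The sketch below implements Harrell's estimator for right-censored data. The text does not state which C-index variant was used, so this choice is an assumption; the toy survival times, event indicators, and risk scores are invented, and pairs with tied survival times are skipped for simplicity.

```python
import numpy as np

def harrell_c_index(time, event, risk):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an observed event and a
    shorter follow-up time than j. It is concordant when i also has the higher
    predicted risk; tied risks count 1/2. Pairs with tied times are skipped.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue                      # censored subjects cannot anchor a pair
        for j in range(len(time)):
            if time[i] < time[j]:         # j was still at risk when i had the event
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

# Toy data (hypothetical): follow-up time, event indicator (1 = event), predicted risk.
time = [5, 10, 15, 20]
event = [1, 1, 0, 1]
c_perfect = harrell_c_index(time, event, [0.9, 0.7, 0.3, 0.1])  # 5 of 5 pairs concordant
c_partial = harrell_c_index(time, event, [0.9, 0.1, 0.5, 0.3])  # 3 of 5 pairs concordant
```

In the toy case, c_partial is 0.6: three of the five comparable pairs are ordered correctly, roughly the level of discrimination reported for TRAPM.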
Future models integrating multi-omics data collected at serial time points, together with clinically accessible variables, are warranted to improve prognostic precision.
Our analysis also revealed that the RC3 subtype exhibits elevated activity across multiple metabolic pathways, including glycolysis, glutamine metabolism, and folate metabolism. This metabolic reprogramming is not merely a hallmark of rapid tumor proliferation but may also be a key enabler of the immunosuppressive microenvironment characteristic of RC3. Specifically, high glycolytic flux can lead to lactate accumulation in the tumor milieu, which has been shown to promote the polarization of tumor-associated macrophages (TAMs) toward an M2-like, protumor phenotype while inhibiting cytotoxic T cell function. Similarly, enhanced glutamine metabolism supports the biosynthetic and oxidative needs of both cancer cells and infiltrating myeloid cells, further stabilizing an immune-evasive niche.
To address these limitations and further characterize the RC3 subtype, we plan to perform integrative multi-omics analysis of the bioinformatically identified marker genes and to conduct functional assays, together with in vitro and in vivo experiments, to elucidate the roles of key genes in this subtype. Another major limitation is the absence of clinical validation for the drug sensitivity analysis, as this study focused primarily on computational modeling. Moving forward, we plan to collaborate with clinical researchers to systematically evaluate the therapeutic potential of the identified candidate drugs and assess their applicability in real-world clinical settings. Future work will also include functional assays in BRCA organoids and xenograft models to validate the efficacy of candidate drugs such as canertinib and poziotinib.

Conclusion

Leveraging 10 machine learning algorithms, our RLN-associated TRAPM achieved higher predictive accuracy than traditional frameworks. The model identified three BC subgroups with significant differences in clinical, metabolic, and immune infiltration features, revealing the heterogeneity of BC patients. Seven marker genes were also identified for the RC3 subtype (MTHFD1L, CAVIN4, MMP1, ADGRG6, B3GNT5, SMYD2, and TFRC), which may enable more tailored diagnostic and therapeutic approaches. Further unraveling the complex links between RLN and BC pathogenesis should improve clinical prognosis prediction and treatment decision-making, ultimately improving BC patient care.

Ethics approval and consent to participate

Not applicable

Consent for publication

Not applicable

Availability of data and materials

The datasets used and analysed during the current study are available from the corresponding author on reasonable request.

Funding

This work was supported by the Beijing Heart to Heart Foundation [grant number HXXT2022ktyj001, HXXT2022ktyj002].

CRediT authorship contribution statement

Yi Du: Writing – original draft, Visualization, Methodology, Conceptualization. Quan Yuan: Writing – original draft, Visualization, Validation, Software, Data curation. Hao Yu: Visualization, Validation, Software, Data curation. Rongjie Ye: Software, Resources, Data curation. Huan Lin: Writing – review & editing, Visualization, Supervision. Ge Yu: Visualization, Supervision, Project administration, Funding acquisition. Ming Niu: Writing – review & editing, Supervision, Investigation, Funding acquisition, Conceptualization. Huilei Qiu: Writing – review & editing, Validation, Supervision, Software, Project administration, Formal analysis, Conceptualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please credit the original article when citing.
