Comparative Transcriptomic Analysis Identifies Predictive Biomarkers of Pathological Complete Response in Triple-negative Breast Cancer.

Cancer genomics & proteomics, 2026, Vol. 23(1), p. 66-80

Chen FM, Li CL, Pan MR, Huang YC, Huang LJ, Moi SH

Cite this article

APA Chen FM, Li CL, et al. (2026). Comparative Transcriptomic Analysis Identifies Predictive Biomarkers of Pathological Complete Response in Triple-negative Breast Cancer. Cancer genomics & proteomics, 23(1), 66-80. https://doi.org/10.21873/cgp.20561
MLA Chen FM, et al. "Comparative Transcriptomic Analysis Identifies Predictive Biomarkers of Pathological Complete Response in Triple-negative Breast Cancer." Cancer genomics & proteomics, vol. 23, no. 1, 2026, pp. 66-80.
PMID 41482355 ↗
DOI 10.21873/cgp.20561

Abstract

[BACKGROUND/AIM] Pathologic complete response (pCR) to neoadjuvant chemotherapy (NACT) is a strong prognostic indicator in triple-negative breast cancer (TNBC). However, reliable predictive biomarkers for pCR remain limited. This study aimed to identify gene expression signatures associated with pCR in TNBC to facilitate more precise treatment stratification.

[MATERIALS AND METHODS] Tumor samples from 16 TNBC patients treated with NACT at the Kaohsiung Medical University Hospital (KMUH) were analyzed, including 5 pCR and 11 non-pCR cases. RNA sequencing (RNA-seq) was performed, and differentially expressed genes (DEGs) were identified using DESeq2 (|log2FC| ≥2, adjusted p<0.05). Gene expression profiles were compared with a validation cohort of 27 NACT-responsive TNBC cases from The Cancer Genome Atlas (TCGA). Overlapping DEGs were identified using Venn diagram analysis, and drug-gene interaction databases were queried to explore therapeutic relevance.

[RESULTS] In the KMUH cohort, 175 DEGs were identified, including 146 up-regulated and 29 down-regulated genes in non-pCR tumors. Fifteen DEGs demonstrated consistent differential expression patterns between KMUH and TCGA datasets, showing enrichment in pCR samples. These genes may serve as predictive biomarkers for NACT response. Notably, several of these genes are potentially druggable, suggesting opportunities for targeted therapy in chemoresistant TNBC.

[CONCLUSION] We identified and validated a 15-gene signature associated with pCR in TNBC across independent cohorts. These findings offer a promising basis for improving patient stratification, guiding treatment decisions, and developing targeted therapies for NACT-resistant TNBC.

Introduction

Triple-negative breast cancer (TNBC) represents one of the most aggressive and therapeutically challenging subtypes of breast cancer. Characterized by the absence of estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) expression, TNBC accounts for approximately 15-20% of all breast cancer cases (1,2). It disproportionately affects younger women and those with BRCA1 mutations (3), and is associated with a high risk of early recurrence and distant metastasis (4). The lack of hormone receptors and HER2 amplification limits the use of targeted therapies, leaving chemotherapy as the mainstay of systemic treatment (2).
Neoadjuvant chemotherapy (NACT) has become the standard of care for locally advanced TNBC, allowing tumor downstaging and enabling in vivo assessment of chemosensitivity. Achieving a pathological complete response (pCR), defined as the absence of invasive cancer in the breast and axillary lymph nodes following NACT, is strongly associated with improved disease-free and overall survival (5,6). However, a substantial proportion of patients fail to achieve pCR, and these non-responders face a significantly higher risk of relapse and limited therapeutic options after standard treatment (7).
Given this variability in treatment response, there is an urgent need to identify biomarkers capable of predicting which patients are more likely to benefit from NACT. Predictive biomarkers could enable personalized treatment approaches, in which likely non-responders are considered for alternative regimens such as immunotherapy or targeted agents (8,9), while predicted responders may be spared overtreatment and its associated toxicities.
Recent advances in high-throughput RNA sequencing (RNA-seq) have made it possible to investigate the transcriptomic landscape of TNBC in detail. This technology enables the identification of differentially expressed genes (DEGs) between responder and non-responder groups, offering insights into the molecular mechanisms underlying chemotherapy sensitivity or resistance (10). Furthermore, some DEGs may represent druggable targets, providing opportunities to improve therapeutic efficacy in chemoresistant cases (11).
Although several studies have explored gene expression profiles associated with pCR in TNBC, challenges such as limited cohort sizes and insufficient cross-cohort validation have hindered the generalizability of their findings (12,13). Integrating publicly available datasets, such as The Cancer Genome Atlas (TCGA), provides an opportunity to enhance the robustness of candidate gene signatures through validation in larger and more diverse populations (14,15).
In this study, we analyzed RNA-seq data from TNBC patients treated with NACT at the Kaohsiung Medical University Hospital (KMUH) and compared gene expression profiles with those from a chemotherapy-responsive TNBC cohort in TCGA. Our goal was to identify DEGs associated with pCR and validate their predictive value across independent datasets. We also evaluated the therapeutic potential of these genes using drug-gene interaction databases. Our findings may aid in the development of predictive biomarkers and targeted treatment strategies for TNBC patients undergoing neoadjuvant chemotherapy.

Materials and Methods

Patient cohorts and sample collection. Tumor samples were collected from 16 patients with histologically confirmed TNBC treated at KMUH. All patients received anthracycline- and/or taxane-based NACT. Pathologic response was evaluated postoperatively and classified as either pCR (n=5) or non-pCR (n=11). All procedures were approved by the Institutional Review Board of KMUH [IRB No: KMUHIRB-G(I)-20200039], and written informed consent was obtained from all participants.
RNA extraction and sequencing (KMUH cohort). Fresh-frozen tumor tissues were used for total RNA extraction using the RNeasy Mini Kit (Qiagen, Germantown, MD, USA), following the manufacturer’s protocol. RNA quality was assessed with a NanoDrop spectrophotometer and Agilent 2100 Bioanalyzer. Samples with RNA integrity number (RIN) ≥7.0 were included. Library preparation was performed using the TruSeq Stranded mRNA Library Prep Kit (Illumina, San Diego, CA, USA) and paired-end 150 bp sequencing was conducted on the Illumina NovaSeq 6000 platform, generating 30-50 million reads per sample.
RNA-seq data processing. Raw reads underwent quality control using FastQC, followed by adapter trimming. Clean reads were aligned to the human reference genome (GRCh38, Ensembl release 104) using HISAT2. Gene-level counts were generated using featureCounts, and expression values were reported in both fragments per kilobase of exon per million reads mapped (FPKM) and raw counts for downstream analysis.
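As a rough illustration of how the expression units used in this study relate to raw gene-level counts, the following Python sketch computes FPKM and TPM. It is not the authors' pipeline: the gene names, counts, and gene lengths are hypothetical toy values, and taking the sum of assigned counts as the library size is an assumption.

```python
def fpkm(counts, lengths_bp):
    """FPKM per gene: count * 1e9 / (gene length in bp * total assigned fragments)."""
    total = sum(counts.values())  # assumption: library size = sum of assigned counts
    return {g: c * 1e9 / (lengths_bp[g] * total) for g, c in counts.items()}


def tpm(counts, lengths_bp):
    """TPM per gene: length-normalized rate, rescaled so all genes sum to one million."""
    rates = {g: c / lengths_bp[g] for g, c in counts.items()}
    scale = sum(rates.values())
    return {g: r * 1e6 / scale for g, r in rates.items()}


# Hypothetical toy inputs (not data from the study).
counts = {"GENE_A": 100, "GENE_B": 300, "GENE_C": 600}
lengths_bp = {"GENE_A": 1000, "GENE_B": 2000, "GENE_C": 3000}

fpkm_values = fpkm(counts, lengths_bp)
tpm_values = tpm(counts, lengths_bp)
```

TPM values always sum to one million within a sample, which makes them easier to compare across samples than FPKM; neither unit is a substitute for the raw counts that DESeq2 expects as input.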
Public dataset integration (TCGA cohort). RNA-seq data from 27 TNBC patients were downloaded from TCGA via the Genomic Data Commons (GDC) portal. Patients were selected based on ER-/PR-/HER2- receptor status and receipt of anthracycline- and/or taxane-based chemotherapy. Only patients with favorable clinical response (i.e., no disease progression during follow-up) were included. mRNA expression levels were obtained using batch-normalized RSEM data (Illumina HiSeq_RNASeqV2 pipeline).
Differential gene expression analysis. Differential gene expression analysis between pCR and non-pCR groups in the KMUH cohort was conducted using the DESeq2 package (R version 4.4.1, R Foundation, Vienna, Austria) (16). Genes with an absolute log2 fold change ≥2.0 and a Benjamini-Hochberg adjusted p-value <0.05 were considered significant DEGs. To focus on biologically meaningful targets, only protein-coding genes were retained by filtering DEGs against the Ensembl database (17).
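The selection rule described above (absolute log2 fold change ≥2.0 and Benjamini-Hochberg adjusted p-value <0.05) can be sketched in Python as follows. This is only an illustration of the thresholding step, not a reimplementation of DESeq2, which also applies steps such as independent filtering, so its adjusted p-values may differ. The function names and the toy results are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(results, lfc_cutoff=2.0, alpha=0.05):
    """Keep genes with |log2FC| >= lfc_cutoff and adjusted p < alpha.

    results: list of (gene, log2_fold_change, raw_p_value) tuples.
    """
    padj = bh_adjust([p for _, _, p in results])
    return [
        (gene, lfc, q)
        for (gene, lfc, _), q in zip(results, padj)
        if abs(lfc) >= lfc_cutoff and q < alpha
    ]


# Hypothetical toy results (not data from the study).
toy_results = [
    ("GENE_A", 3.1, 0.001),
    ("GENE_B", -2.4, 0.01),
    ("GENE_C", 1.2, 0.02),   # fails the fold-change cutoff
    ("GENE_D", 2.8, 0.5),    # fails the adjusted p-value cutoff
]
degs = select_degs(toy_results)
```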
Gene Ontology (GO) enrichment and Protein-Protein Interaction (PPI) analysis. GO enrichment and PPI analyses were performed using the STRING database through the rbioapi R package (18). GO enrichment was assessed across three major categories: molecular function, biological process, and cellular component. For PPI analysis, STRING was used to construct interaction networks that incorporated both physical interactions and functional associations, covering biological mechanisms such as activation, inhibition, catalysis, and expression regulation.
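For readers unfamiliar with over-representation analysis, the sketch below shows the generic hypergeometric test that GO enrichment tools are commonly built on. It is an assumption that this matches the statistic used in this study; the code does not call STRING or rbioapi, and all counts in the example are hypothetical.

```python
from math import comb


def enrichment_p(n_background, n_term, n_query, n_overlap):
    """Upper-tail hypergeometric p-value for GO term over-representation.

    n_background: genes in the background set
    n_term:       background genes annotated with the GO term
    n_query:      genes in the query list (e.g. the DEGs)
    n_overlap:    query genes annotated with the GO term
    Returns P(X >= n_overlap) when n_query genes are drawn without replacement.
    """
    denom = comb(n_background, n_query)
    upper = min(n_term, n_query)
    numer = sum(
        comb(n_term, k) * comb(n_background - n_term, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return numer / denom


# Hypothetical toy example: 10 background genes, 3 annotated, 2 queried, 1 hit.
p_toy = enrichment_p(10, 3, 2, 1)
```

In practice the resulting p-values would be corrected for multiple testing across all GO terms before any term is reported as enriched.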
Candidate biomarker analysis. To identify candidate biomarkers, normalized log-transformed transcripts per million (TPM) values were compared among KMUH pCR, KMUH non-pCR, and TCGA responder groups. DEGs that differed significantly between KMUH non-pCR tumors and TCGA responders, but not between KMUH pCR tumors and TCGA responders, were selected as candidate markers associated with chemotherapy resistance. A Venn diagram was used to visualize the overlap of DEGs among groups. The overall analysis workflow is summarized in Figure 1.
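The two-condition selection rule described above amounts to a set intersection, which the following Python sketch makes explicit. It assumes that per-gene p-values from the pairwise group comparisons are already available; the function name, gene names, and p-values are hypothetical and not taken from the study.

```python
def select_candidates(p_nonpcr_vs_tcga, p_pcr_vs_tcga, alpha=0.05):
    """Return genes that differ in non-pCR tumors but not in pCR tumors.

    p_nonpcr_vs_tcga: gene -> p-value, KMUH non-pCR vs TCGA responders
    p_pcr_vs_tcga:    gene -> p-value, KMUH pCR vs TCGA responders
    """
    differs_from_responders = {g for g, p in p_nonpcr_vs_tcga.items() if p < alpha}
    resembles_responders = {g for g, p in p_pcr_vs_tcga.items() if p >= alpha}
    # The overlap corresponds to the intersection region of the Venn diagram.
    return sorted(differs_from_responders & resembles_responders)


# Hypothetical toy p-values (not data from the study).
p_nonpcr_vs_tcga = {"GENE_A": 0.001, "GENE_B": 0.03, "GENE_C": 0.40}
p_pcr_vs_tcga = {"GENE_A": 0.60, "GENE_B": 0.01, "GENE_C": 0.75}

candidates = select_candidates(p_nonpcr_vs_tcga, p_pcr_vs_tcga)
```

Note that a non-significant difference is treated here as resemblance, which is how the rule is stated; with small groups, a lack of significance can also reflect low statistical power.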
Drug-gene interaction (DGI) analysis was performed on selected DEGs using the Drug-Gene Interaction Database (DGIdb, version 5.0.9) and other curated pharmacogenomic resources to evaluate their therapeutic potential.
Statistical analysis. Descriptive statistics were used to summarize baseline characteristics. Continuous variables were expressed as mean±standard deviation (SD), while categorical variables were reported as counts and percentages. Comparisons between pCR and non-pCR groups within the KMUH cohort were conducted using the Wilcoxon rank sum test for continuous variables and Fisher’s exact test for categorical variables. Pairwise comparisons between the TCGA responder cohort and KMUH subgroups were also conducted using the Wilcoxon test. All tests were two-tailed, with a p-value <0.05 considered statistically significant. All analyses were performed using R software version 4.4.1 (R Foundation).
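As a worked example of the categorical comparison, the sketch below implements a two-sided Fisher's exact test in plain Python and applies it to tumor grade. The cell counts (9 of 11 non-pCR and 1 of 5 pCR tumors at grade 3) are an assumption inferred from the percentages reported in the Results (82% and 20%); the study itself used R, and this function is only an illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables at least as extreme (no more probable) than the observed one;
    # the small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Rows: non-pCR, pCR. Columns: grade 3, lower grade (counts inferred, see above).
p_grade = fisher_exact_two_sided(9, 2, 1, 4)  # about 0.036
```

Under these inferred counts the result agrees with the p=0.036 reported for tumor grade.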

Results

Baseline clinicopathological characteristics. The clinical characteristics of patients from both the KMUH and TCGA cohorts are summarized in Table I. In the KMUH cohort (n=16), patients were stratified according to their pathological response to NACT, yielding 11 non-pCR cases and 5 pCR cases. There were no statistically significant differences between the two groups in terms of age (p=0.061), tumor stage (p=1.000), tumor size (p=0.144), or surgical margin status (p=0.083). In contrast, tumor grade differed significantly between groups. A higher proportion of non-pCR tumors were classified as grade 3 (82%) compared to the pCR group (20%) (p=0.036), indicating a trend toward more aggressive histopathological features among non-responders. Regarding proliferative activity, all patients in the non-pCR group had a Ki-67 index of ≥20%, while 80% of pCR patients met the same threshold; however, this difference was not statistically significant (p=0.313).
All patients underwent surgery as part of their treatment. The use of radiotherapy was comparable between groups (p=1.000). A higher proportion of non-pCR patients received targeted therapy (55% vs. 20%), and hormone therapy was administered exclusively to the non-pCR group (36%), though neither difference reached statistical significance (p=0.308 and p=0.245, respectively). No cases of disease progression or death occurred in the pCR group, whereas one progression and three deaths were observed among non-pCR patients. However, survival comparisons should be interpreted cautiously due to the limited sample size.
For reference, the TCGA cohort consisted of 27 TNBC patients who demonstrated a favorable response to chemotherapy and showed no evidence of disease progression during follow-up. Due to differences in clinical annotations and dataset structure, statistical comparisons between the TCGA and KMUH cohorts were not performed.
Identification of protein-coding DEGs. To further refine the list of potential biomarkers, we focused on identifying protein-coding DEGs that may have functional relevance and translational potential. By filtering RNA-seq data through protein-coding annotations, we aimed to prioritize genes most likely involved in the biological mechanisms underlying differential chemotherapy response. Figure 2A displays the normalized expression distributions of all 16 tumor samples, encompassing both pCR and non-pCR groups. The consistent distribution across samples suggests high-quality data and effective normalization, with no evidence of batch effects. Figure 2B shows the principal component analysis (PCA) of transcriptomic profiles, revealing a partial separation between pCR and non-pCR groups. Notably, non-pCR samples tend to cluster together, indicating shared transcriptional characteristics within this subgroup. Differential expression analysis (DEA) identified 175 DEGs between the two groups, including 146 genes up-regulated and 29 down-regulated in the non-pCR group (Figure 2C). DEGs were defined by an absolute log2 fold change ≥2 and a false discovery rate (FDR)-adjusted p-value <0.05. Among these, 132 genes (104 up-regulated and 28 down-regulated) were annotated as protein-coding based on Ensembl. These protein-coding DEGs were further analyzed through unsupervised hierarchical clustering to assess their expression patterns across samples. As shown in Figure 2D, up-regulated and down-regulated DEGs formed clearly distinct clusters, indicating consistent intra-group expression and marked transcriptional differences between pCR and non-pCR tumors. These results support the potential of these genes as predictive biomarkers for treatment response in TNBC.
To further investigate the functional implications of these DEGs, we performed GO enrichment and PPI analyses. The results are provided in Supplementary Figure S1, where Figures S1A-C present the top 10 enriched terms for molecular function, biological process, and cellular component, respectively. Figure S1D illustrates the STRING-based PPI network, highlighting protein-level functional relationships among the DEGs. Comprehensive GO enrichment results are summarized in Supplementary Table S1.
Cohort comparison using the TCGA cohort. To validate the transcriptomic findings, a cohort-level comparison was performed for 117 protein-coding DEGs between the KMUH cohort and TCGA chemotherapy-responsive TNBC cases. The complete expression profiles are provided in Supplementary Table S2. Among these genes, 43 protein-coding DEGs exhibited significantly different expression between the KMUH non-pCR and KMUH pCR groups. In contrast, 57 DEGs showed no significant expression difference between the KMUH pCR group and TCGA responders, suggesting a closer transcriptomic resemblance between these two groups (Figure 1). Overall, 15 DEGs met both criteria, differing significantly between the KMUH non-pCR and KMUH pCR groups while resembling TCGA responders, and were retained as candidate DEGs for subsequent analysis.
Candidate DEGs for pCR estimation. To refine potential biomarkers predictive of pCR, we integrated RNA-seq data from the KMUH cohort with transcriptomic profiles from TCGA chemotherapy-responsive TNBC cases. Among the differentially expressed genes, 15 protein-coding genes (CD74, PYCARD, IFI27L2, HCST, ASPHD2, RPL27, FAU, OTOA, C1QA, SSR4, NDUFA4, HLA-DRB1, HSD17B8, PSMB9, and CHCHD10) exhibited highly concordant expression patterns between the KMUH pCR group and TCGA responders. As shown in Figure 3A, boxplots reveal that all 15 genes were significantly up-regulated in both the KMUH pCR group and TCGA responders compared to the KMUH non-pCR group, suggesting their potential role in favorable treatment response. In contrast, these genes showed consistently lower expression in the non-pCR group, indicating their discriminatory capacity. Figure 3B presents a heatmap of the 15 candidate genes within the KMUH cohort, where most genes were down-regulated in non-pCR samples and up-regulated in pCR samples. This clear distinction in expression profiles underscores their relevance as potential biomarkers. Table II summarizes the differential expression statistics, including log-transformed expression levels, fold changes, and significance values across KMUH and TCGA cohorts. The consistent expression patterns observed between the KMUH pCR group and the TCGA responder group, along with their distinct separation from non-pCR samples, support the validity and potential clinical utility of these 15 candidate genes in predicting pCR in TNBC patients receiving neoadjuvant chemotherapy.
Druggable DEGs and functional annotation. To enhance the clinical relevance of the 15 candidate DEGs associated with pCR prediction, we conducted GO enrichment and DGI analyses. As summarized in Table III, GO analysis revealed that several DEGs were significantly enriched in immune-related and metabolic biological processes (p<0.05). In particular, CD74 and HSD17B8 were implicated in multiple pathways, including carboxylic acid metabolism, organic acid biosynthesis, and small molecule metabolism, while C1QA was enriched in the myeloid leukocyte activation process. These results indicate that the identified DEGs are not only differentially expressed between responders and non-responders but are also functionally involved in biological processes potentially linked to treatment response.
To explore their therapeutic relevance, we queried public drug-gene interaction databases. As illustrated in Figure 4, a chord diagram highlights four genes (CD74, HLA-DRB1, NDUFA4, and PSMB9) with known or predicted interactions with clinically relevant compounds. These include proteasome inhibitors (e.g., bortezomib, carfilzomib, marizomib) and immune-targeting agents (e.g., milatuzumab). The arc width in the diagram reflects the strength or number of interactions, suggesting varying degrees of therapeutic accessibility.
The convergence of predictive and actionable features within this gene panel underscores its translational potential. These DEGs may serve not only as biomarkers for predicting chemotherapy response but also as targets for developing novel therapeutic strategies, particularly for TNBC patients who are less likely to benefit from standard neoadjuvant regimens.

Discussion

In this study, we identified and validated a 15-gene signature associated with pCR to NACT in TNBC patients. By integrating transcriptomic data from KMUH tumor samples and chemotherapy-responsive cases from TCGA, we discovered protein-coding DEGs that not only distinguish responders from non-responders but also exhibit strong cross-cohort reproducibility and therapeutic relevance.
Consistent with prior research highlighting pCR as a robust prognostic marker in TNBC (19-21), our findings support the utility of gene expression signatures for patient stratification prior to NACT. Notably, the 15 DEGs were significantly up-regulated in both KMUH pCR and TCGA responder groups, but down-regulated in KMUH non-pCR tumors. This expression pattern, confirmed through boxplots, heatmaps, and statistical analysis, reinforces the predictive value of the identified gene panel.
Among the DEGs, several genes such as CD74, HLA-DRB1, PYCARD, C1QA, and HCST are involved in immune-related pathways such as antigen presentation, inflammasome signaling, and activation of lymphoid or myeloid cells (22,23). CD74 and HLA-DRB1, for instance, are components of the MHC class II pathway that enhance tumor immunogenicity and promote T cell-mediated cytotoxicity (24,25). PYCARD encodes ASC, a key adaptor in inflammasome-mediated pyroptosis (26), while C1QA and HCST contribute to macrophage phagocytic activity and NK/T-cell signaling (27,28). IFI27L2, an interferon-stimulated gene, reflects activation of type I interferon pathways, which have been associated with improved responses to cytotoxic and immune-based therapies (29). Other DEGs such as ASPHD2, SSR4, RPL27, and FAU are involved in protein synthesis, endoplasmic reticulum (ER) function, and cellular stress responses. SSR4, although underexplored in breast cancer, has been identified as a prognostic biomarker in colon adenocarcinoma and is associated with immune infiltration (30). RPL27 has been implicated in chemotherapy sensitivity (31), and FAU, typically considered a tumor suppressor, shows increased expression in pCR cases, possibly indicating enhanced apoptotic capacity (32). NDUFA4 and CHCHD10 are mitochondrial regulators of respiration and oxidative stress (33,34), while HSD17B8, a metabolism-related gene, is associated with better outcomes in breast cancer (35). OTOA, though not traditionally expressed in breast tissue, may represent a cancer-specific transcriptional aberration, and elevated expression of PSMB9, a component of the immunoproteasome, has been linked to favorable TNBC prognosis (36). Together, these functionally diverse genes represent biological processes related to immune activation, metabolism, apoptosis, and stress responses, all of which may contribute synergistically to enhanced chemosensitivity.
Integrated into a 15-gene signature, this panel demonstrated strong predictive performance for pCR in patients receiving standard NACT, including anthracyclines, taxanes, and platinum agents. Tumors with this expression profile may be more susceptible to cytotoxic damage, more capable of immune-mediated clearance, or less able to resist treatment, making this gene panel a promising tool for guiding clinical decisions and tailoring therapy.
Additionally, we found that DEGs such as HSD17B8, RPL27, and NDUFA4 were enriched in metabolic and biosynthetic processes, including mitochondrial function and carboxylic acid metabolism. These findings suggest that chemoresistant tumors may exhibit distinct metabolic phenotypes, a hypothesis supported by literature on metabolic reprogramming in TNBC (37,38). Importantly, DGI analysis identified four potentially druggable genes: CD74, HLA-DRB1, NDUFA4, and PSMB9. These genes interact with existing or investigational compounds, including proteasome inhibitors (bortezomib, carfilzomib, marizomib) and anti-CD74 monoclonal antibodies (milatuzumab). As visualized in the chord diagram, the therapeutic accessibility of these targets enhances the translational utility of the gene panel and suggests opportunities for repurposing agents in chemoresistant TNBC (39,40).
Study limitations. Several limitations warrant consideration. First, the KMUH cohort was relatively small (n=16), potentially affecting statistical power and generalizability. However, cross-validation with TCGA responders enhances the robustness of the findings. Second, TCGA lacked explicit pCR data, and response status was inferred from clinical outcomes, which may introduce classification bias. Third, while DGI analysis suggested potential therapeutic targets, functional validation through in vitro or in vivo experiments is necessary. Lastly, the interaction data were derived from curated databases and require experimental confirmation to establish clinical relevance.

Conclusion

In summary, we identified a reproducible 15-gene expression signature predictive of pCR in TNBC. This gene panel includes several immune- and metabolism-related genes with known or potential druggability, offering a foundation for precision treatment strategies. Future validation in larger, prospective cohorts and functional studies will be essential to translate these findings into clinical practice.

Supplementary Material

The supplementary material for this article, including Supplementary Figure S1 and Supplementary Tables S1-S2, is openly available in a Figshare repository at the following DOI: 10.6084/m9.figshare.30294595.

Conflicts of Interest

All Authors state that they have no competing interests related to this work.

Authors’ Contributions

Fang-Ming Chen and Chung-Liang Li: Writing - original draft. Mei-Ren Pan: Investigation. Yun-Cian Huang and Li-Ju Huang: Data curation. Sin-Hua Moi, Shu-Jyuan Chang, and Yi-Hsiung Lin: Methodology. Ping-Fu Yang, Jung-Yu Kan, Chieh-Ni Kao, Li-Kun Ko, Hidenobu Takahashi, Chia-Yu Kuo, and Shen-Liang Shih: Resources. Ming-Feng Hou: Supervision. Chi-Wen Luo: Conceptualization, Writing - review & editing.

Acknowledgements

This study was supported by the following grants: (1) Grants 112-2314-B-037-044 and 113-2320-B-037-006 from the National Science and Technology Council, Taiwan; (2) Grant KMUH113-3R34 from Kaohsiung Medical University Hospital, Taiwan; (3) MOHW111-TDU-B-221-114016 and MOHW112-TDU-B-222-124016 from the Ministry of Health and Welfare, Taiwan.

Artificial Intelligence (AI) Disclosure

During the preparation of this manuscript, a large language model (ChatGPT, OpenAI) was used solely for language editing and stylistic improvements in select paragraphs. No sections involving the generation, analysis, or interpretation of research data were produced by generative AI. All scientific content was created and verified by the authors. Furthermore, no figures or visual data were generated or modified using generative AI or machine learning-based image enhancement tools.

Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please cite the original article when quoting.
