Copy number alteration fingerprint predicts the clinical response of oxaliplatin-based chemotherapy in metastatic colorectal cancer.
APA
Weng J, Wang J, et al. (2026). Copy number alteration fingerprint predicts the clinical response of oxaliplatin-based chemotherapy in metastatic colorectal cancer. npj Precision Oncology, 10(1). https://doi.org/10.1038/s41698-026-01354-9
MLA
Weng J, et al. "Copy number alteration fingerprint predicts the clinical response of oxaliplatin-based chemotherapy in metastatic colorectal cancer." npj Precision Oncology, vol. 10, no. 1, 2026.
PMID
41807588
Abstract
Oxaliplatin-based chemotherapy is a standard treatment for metastatic colorectal cancer (mCRC), yet accurate biomarkers to identify responders are still lacking. In this study, we developed and validated a genomic copy number alteration (CNA)-based biomarker to predict clinical response to oxaliplatin-based chemotherapy. A total of 297 samples were collected, and shallow sequencing was used to extract CNA features. The resulting model, named "CNA fingerprint", is an XGBoost model trained on 7 CNA features. The model was validated in three independent test cohorts from two centers, achieving areas under the receiver operating characteristic curve (AUCs) of 0.87, 0.87, and 0.85, respectively. The primary predictor was the number of DNA segments with high absolute copy numbers. Our findings suggest that the CNA fingerprint could serve as a biomarker for predicting response to oxaliplatin-based chemotherapy in mCRC. Further prospective clinical trials are warranted to evaluate the CNA fingerprint's performance in clinical applications.
Introduction
In 2020, colorectal cancer (CRC) was the third most commonly diagnosed cancer and the second leading cause of cancer death worldwide, with an estimated 1,880,725 new cases and 915,880 deaths [1]. About 23% of patients with CRC already have metastatic disease at diagnosis [2], 40% experience recurrence after surgical resection [3], and 25–50% of patients with early-stage disease eventually develop metastases [4]. Data from the US National Cancer Institute show that 5-year survival for metastatic colorectal cancer (mCRC) is less than 15% [2]. Chemotherapy has long been the mainstay of systemic treatment for advanced CRC [4].
For advanced mCRC, the National Comprehensive Cancer Network colon/rectal cancer guidelines recommend regimens based on oxaliplatin or irinotecan, and oxaliplatin-based regimens are usually preferred in clinical practice [5]. These include leucovorin calcium, fluorouracil, and oxaliplatin (FOLFOX), and capecitabine plus oxaliplatin (XELOX). However, oxaliplatin can cause significant neurotoxicity [6], and choosing an ineffective regimen can allow the disease to progress. Existing biomarkers for predicting oxaliplatin response are mostly based on mRNA expression or germline DNA polymorphisms [6–10]. mRNA quantification requires fresh tumor tissue and strict processing procedures, which limits the clinical applicability of mRNA-based biomarkers, and the accuracy of germline polymorphism-based biomarkers has not been validated clinically. Accurately and cost-effectively identifying patients who respond to oxaliplatin-based chemotherapy therefore remains a significant challenge.
Somatic DNA alterations are the driving force of cancer and fall into two major types: (1) small-scale alterations, including single base substitutions (SBS) and small insertions and deletions, and (2) large-scale alterations such as copy number alterations (CNAs), which are prevalent in human cancer [11]. Many types of SBS have been reported as cancer biomarkers, but the use of CNAs in cancer precision medicine remains very limited, probably because CNA information is complex to extract and process.
Here, we developed a novel strategy for applying CNA information to predict the clinical response of mCRC to oxaliplatin. The method computes stable CNA features, including those described in our previous studies [12,13], and then trains a machine learning model on them; we refer to this CNA feature-based model as the "CNA fingerprint". We validated its predictive performance in three independent test cohorts, where the CNA fingerprint performed more robustly than clinical information or other SBS-based biomarkers in predicting clinical response to oxaliplatin in mCRC. Because CNA information, and hence the CNA fingerprint, can be derived from cost-effective shallow whole-genome sequencing (sWGS), the CNA fingerprint could be further developed for clinical application.
Results
mCRC cohorts and patient characteristics
We aimed to identify biomarkers that predict clinical response to oxaliplatin in mCRC, following the REMARK guidelines [14]. Three mCRC cohorts from Fudan University Shanghai Cancer Center (FUSCC) and one cohort from Tongji Hospital were collected in separate batches, and the genomic DNA of each cohort was sequenced independently for subsequent biomarker analysis. Samples from FUSCC and Tongji Hospital were selected according to the criteria listed in Fig. 1A. Tumor tissue was collected during surgical resection before chemotherapy, and only samples clearly labeled as mCRC with available clinical response data for oxaliplatin-based chemotherapy were used for model development and validation.
In total, 297 mCRC patients were included in the analysis: 271 received oxaliplatin-based chemotherapy and 26 received other chemotherapy regimens. Patient characteristics of each cohort are detailed in Table 1. More than 30% of patients harbored KRAS mutations (Supplementary Fig. 1A). The oxaliplatin-based regimens fell mainly into two types: 202 patients (74.5%) received FOLFOX/XELOX, and 49 patients (18.0%) received FOLFOX/XELOX combined with cetuximab or bevacizumab (Supplementary Fig. 1B).
CNA fingerprint biomarker development
The study workflow is shown in Fig. 1B. We first extracted stable CNA features and then applied them to predict response to oxaliplatin-based chemotherapy. The CNA features used are listed in Supplementary Data 1; some of them reflect the underlying mutational processes that generate CNAs. The feature distributions in our cohorts were similar to those in The Cancer Genome Atlas (TCGA) (Supplementary Fig. 1C). In total, 310 CNA features were extracted from the CNA profile of each mCRC sample and served as inputs to machine learning models predicting clinical response to oxaliplatin-based chemotherapy in mCRC.
We performed an internal benchmark in the training dataset to compare different CNA feature sets and machine learning models. An XGBoost model trained on the CNA features described in our previous studies [12,13] performed best in five-fold cross-validation on the training dataset (Supplementary Fig. 2A). After this benchmark, the final model was built from 7 features; we refer to this CNA feature-based XGBoost model as the "CNA fingerprint" (Supplementary Fig. 2B, C). The CNA fingerprint achieved an AUC of 0.87 (95% CI, 0.75–0.99) in the test1 cohort, 0.87 (95% CI, 0.76–0.98) in the test2 cohort, and 0.85 (95% CI, 0.73–0.97) in the test3 cohort (Fig. 2A–F). A threshold of 0.58 was chosen for classifying clinical response to oxaliplatin-based chemotherapy (Supplementary Fig. 2D). The accuracy, sensitivity, specificity, precision, and F1 score in the three test cohorts are shown in Fig. 2E–H, and the confusion matrices are shown in Supplementary Fig. 2E.
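The evaluation metrics named above can all be computed from predicted scores and observed labels. The following is a minimal pure-Python sketch, not the authors' implementation, assuming that the fingerprint outputs a score in [0, 1] where higher means more likely to respond, that a score at or above the 0.58 threshold is called a responder, and that labels are coded 1 = responder and 0 = non-responder. The function names are hypothetical; the AUC uses the Mann-Whitney pairwise definition, which equals the area under the empirical ROC curve.

```python
from typing import Dict, Sequence


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random responder (label 1) scores higher
    than a random non-responder (label 0); ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one responder and one non-responder")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(scores: Sequence[float], labels: Sequence[int],
                      threshold: float = 0.58) -> Dict[str, float]:
    """Confusion matrix and summary metrics, calling score >= threshold a responder."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(a: int, b: int) -> float:
        # Return 0.0 instead of dividing by zero for degenerate cohorts.
        return a / b if b else 0.0

    sensitivity = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy scores and labels for illustration only (not study data).
    toy_scores = [0.9, 0.7, 0.6, 0.4, 0.3]
    toy_labels = [1, 1, 0, 1, 0]
    print(round(auc_mann_whitney(toy_scores, toy_labels), 3))  # 0.833
    print(threshold_metrics(toy_scores, toy_labels))
```

The pairwise AUC is quadratic in cohort size, which is negligible for test cohorts of 40 to 55 patients; the confidence intervals reported above would additionally require a method such as bootstrapping or DeLong's variance, which this sketch does not attempt.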
Kaplan-Meier analysis showed that, in cohorts with available OS data, patients predicted by the CNA fingerprint to be responders had better overall survival (OS) after oxaliplatin-based chemotherapy than predicted non-responders (Fig. 3B). In the no-oxaliplatin cohort, the CNA fingerprint did not predict clinical response (Supplementary Fig. 3A, B). We also applied the CNA fingerprint to the GSE36864 cohort, which included 349 previously untreated patients with metastatic colorectal cancer who received oxaliplatin-based, irinotecan, or capecitabine chemotherapy [15], and observed a consistent but non-significant trend (log-rank P = 0.085, Fig. 3C). Patients treated with oxaliplatin-based chemotherapy and predicted to be responders had better progression-free survival (PFS), whereas no survival differences were observed for the other regimens (log-rank P = 0.37, Fig. 3D, Supplementary Fig. 3C, D). These results suggest that the CNA fingerprint is specific to predicting clinical response to oxaliplatin-based chemotherapy and has potential for broad clinical applicability.
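The survival comparisons above rest on the Kaplan-Meier estimator and the two-group log-rank test. The following is a minimal, dependency-free sketch of both, assuming per-patient follow-up times and event indicators (1 = death or progression observed, 0 = censored), with patients split into two groups by the fingerprint's predicted label. The function names are hypothetical, and in practice an established survival analysis library would normally be used instead.

```python
import math
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate as (time, survival probability) at each event time.
    events[i] is 1 for an observed event and 0 for a censored observation."""
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = removed = 0
        # Group all patients sharing this time; censored ones are still at risk at t.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def logrank_test(times_a: Sequence[float], events_a: Sequence[int],
                 times_b: Sequence[float], events_b: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, P value with 1 df)."""
    data = ([(t, e, 0) for t, e in zip(times_a, events_a)]
            + [(t, e, 1) for t, e in zip(times_b, events_b)])
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e, _ in data if e}):
        n = sum(1 for tt, _, _ in data if tt >= t)
        n_a = sum(1 for tt, _, g in data if tt >= t and g == 0)
        d = sum(1 for tt, e, _ in data if tt == t and e)
        d_a = sum(1 for tt, e, g in data if tt == t and e and g == 0)
        # Observed minus expected events in group A at this event time.
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of group-A events at this event time.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

For example, `logrank_test(os_months_resp, death_resp, os_months_nonresp, death_nonresp)` would compare predicted responders against predicted non-responders, where the four argument names are hypothetical per-group arrays.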
Important CNA features contributing to the prediction
Feature importance in the CNA fingerprint model was calculated from the relative contribution of each feature to the model (Fig. 4A). The CNA feature "CN[>8]" contributed most, indicating that the number of DNA segments with high absolute copy number (>8) is a major contributor to predicting clinical response to oxaliplatin-based chemotherapy (Fig. 4A). Oxaliplatin responders had significantly lower CN[>8] counts than non-responders in the training cohort and in all three test cohorts (Fig. 4B). The remaining features are CNA burden, E:LL:9+BB, and the amplification levels of chromosomes 8, 20, 11, and 5 (quantified according to Eq. 1), all of which differed significantly between groups (Fig. 4B, C). Several of these features are tied to high copy number segments: E:LL:9+BB is a subgroup of CN[>8], CNA burden measures the proportion of the genome affected by CNAs, and high amplification of a chromosome produces high copy number segments. These observations reinforce that DNA segments with high absolute copy number are a novel predictor of response to oxaliplatin-based chemotherapy in mCRC. This finding requires further validation in larger cohorts, and its biological implications need further investigation.
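Two of the features described above have simple definitions that can be computed directly from a segmented absolute copy number profile. The following is a minimal sketch under stated assumptions: each segment carries an integer absolute copy number, coordinates are half-open, the normal baseline is diploid (2 copies), and CNA burden is taken as the altered fraction of the profiled length. The names `Segment`, `count_high_cn_segments`, and `cna_burden` are hypothetical, the authors' exact feature definitions may differ, and the chromosome amplification level of Eq. 1 is not reproduced here.

```python
from typing import NamedTuple, Sequence


class Segment(NamedTuple):
    """One segment of an absolute copy number profile, covering [start, end)."""
    chrom: str
    start: int
    end: int
    copy_number: int


def count_high_cn_segments(segments: Sequence[Segment], cutoff: int = 8) -> int:
    """CN[>8]-style feature: number of segments with absolute copy number above cutoff."""
    return sum(1 for seg in segments if seg.copy_number > cutoff)


def cna_burden(segments: Sequence[Segment], baseline: int = 2) -> float:
    """Fraction of profiled bases whose copy number differs from the baseline."""
    total = sum(seg.end - seg.start for seg in segments)
    altered = sum(seg.end - seg.start for seg in segments if seg.copy_number != baseline)
    return altered / total if total else 0.0


if __name__ == "__main__":
    # Toy profile for illustration only (not study data).
    profile = [
        Segment("1", 0, 100, 2),
        Segment("1", 100, 150, 10),
        Segment("8", 0, 50, 3),
        Segment("8", 50, 100, 9),
    ]
    print(count_high_cn_segments(profile))  # 2
    print(cna_burden(profile))  # 0.6
```

Because CN[>8] counts segments rather than bases, it is sensitive to how finely the segmentation step splits amplified regions, which is one reason such features depend on a consistent CNA calling pipeline across cohorts.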
CNA fingerprint comparison and application
No well-established biomarkers currently exist for predicting clinical response to oxaliplatin-based chemotherapy. We compared the CNA fingerprint with readily available biomarkers, including HRD status, KRAS mutation, aneuploidy, and primary tumor location. HRD status has been reported to predict the clinical efficacy of PARP inhibitors and platinum drugs; however, we observed that HRD status did not predict the clinical effect of oxaliplatin-based chemotherapy (Fig. 2A–F, H), consistent with a previous study [16]. Aneuploidy [17,18] and right-sided CRC [19] have been associated with poor chemotherapy response, but neither aneuploidy nor primary tumor location predicted clinical response to oxaliplatin-based chemotherapy in our cohorts (Fig. 2H). The CNA fingerprint also outperformed KRAS mutation status (Fig. 2G). These results show that the CNA fingerprint can predict the efficacy of oxaliplatin-based chemotherapy and performs robustly in the test cohorts. To facilitate its application, we built an easy-to-use R package, CNA fingerprint, which calculates the CNA fingerprint value for each input CNA profile file.
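When a continuous score such as the CNA fingerprint is compared on the AUC scale with a binary marker such as KRAS mutation status or primary tumor location, it helps to recall that a 0/1 marker yields a single-point ROC curve whose AUC (ties counted as one half) equals the mean of its sensitivity and specificity. A short sketch, assuming the marker is coded 1 = predicted responder and labels 1 = responder; the function names are hypothetical, and this is not necessarily how the comparison in Fig. 2 was computed.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(marker: Sequence[int], labels: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity and specificity of a 0/1 marker, where 1 means 'predicted responder'."""
    tp = sum(1 for m, y in zip(marker, labels) if m == 1 and y == 1)
    fn = sum(1 for m, y in zip(marker, labels) if m == 0 and y == 1)
    tn = sum(1 for m, y in zip(marker, labels) if m == 0 and y == 0)
    fp = sum(1 for m, y in zip(marker, labels) if m == 1 and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


def binary_marker_auc(marker: Sequence[int], labels: Sequence[int]) -> float:
    """AUC of a binary marker: the area under its single-point ROC curve."""
    sens, spec = sensitivity_specificity(marker, labels)
    return (sens + spec) / 2


if __name__ == "__main__":
    marker = [1, 1, 0, 0, 1]  # hypothetical marker calls (1 = predicted responder)
    labels = [1, 0, 1, 0, 1]  # hypothetical observed responses (1 = responder)
    print(round(binary_marker_auc(marker, labels), 4))  # 0.5833
```

A binary marker therefore cannot trade sensitivity against specificity by moving a threshold, whereas a continuous score can, which is part of why a tuned score can exceed a single mutation call on this scale.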
Discussion
Here, we developed and validated a CNA fingerprint biomarker to predict the effectiveness of oxaliplatin-based regimens in mCRC patients. This machine learning biomarker was developed retrospectively from comparatively low-cost shallow sequencing data, and its performance was validated in three independent mCRC test cohorts. By contrast, clinical features alone or KRAS mutation status could not predict clinical response to oxaliplatin therapy in our cohorts.
We identified, for the first time, a predictive role for CNA features and fingerprints in the clinical response to a chemotherapy drug, demonstrating the clinical value of CNA patterns. The CNA feature "CN[>8]" (the number of DNA segments with absolute copy number greater than 8) was the most important contributor to the model, suggesting that extremely high levels of copy number amplification may affect chemotherapy efficacy in mCRC patients. Oxaliplatin responders showed lower CN[>8] counts than non-responders, and the biological implications of CN[>8] for oxaliplatin response need further investigation. Conventional CNA biomarkers rely on CNAs at specific genomic loci: for example, amplification of TYMS [20] and STRAP [21] has been associated with the effect of 5-FU treatment, and copy number gain at chromosome 8q24.1–q24.2 has been associated with a more favorable response and survival under irinotecan combined with bevacizumab [22]. For oxaliplatin response prediction, we did not identify a specific genomic locus with predictive value; instead, our findings highlight the effect of widespread copy number amplification. Previous research suggests that cancer cells exploit aneuploidy-induced genomic instability, for instance through amplification of DNA damage repair pathways [23], to survive chemotherapy [24]. Studies in CRC cell lines further indicate that whole-genome duplication and aneuploidy may confer a selective advantage for therapy resistance [25], potentially by slowing cell proliferation [17]. However, the aneuploidy score alone has proven inadequate for identifying oxaliplatin-sensitive patients, highlighting the complexity of the underlying mechanisms. The CNA features and the derived CNA fingerprint represent a novel type of clinically relevant CNA information that warrants further investigation and validation.
Oxaliplatin, a third-generation platinum compound, binds DNA and forms cross-links that disrupt DNA replication and repair, ultimately leading to cell death [26]. Other platinum compounds, such as carboplatin and cisplatin, are not effective in patients with CRC [27]. Like other chemotherapeutic agents, oxaliplatin has toxic side effects [28], so accurately identifying the patients who will benefit can spare others ineffective treatment. Several studies have addressed this challenge. A recent study trained a classification model on mRNA expression data to predict clinical response to oxaliplatin-based therapy [7]. A meta-analysis of 1150 patients with metastatic/advanced CRC treated with oxaliplatin-based chemotherapy showed a close association between the rs11615-T allele and reduced chemotherapy response in Asian patients [8]. A 67-gene signature predicts the efficacy of oxaliplatin-based chemotherapy in mCRC [9]. These existing studies are based on mRNA expression or germline DNA polymorphisms. Compared with mRNA, DNA is more stable and does not require strict experimental conditions, while the predictive accuracy of germline DNA polymorphisms has not been solidly validated. Compared with these approaches, the CNA fingerprint biomarker reported here could have advantages in both prediction accuracy and practical clinical applicability.
In clinical practice, oxaliplatin is usually combined with other drugs such as 5-FU. The CNA fingerprint does not account for the specific drug combination; this broadens the settings in which the biomarker can be applied but may also reduce its predictive accuracy. Prediction could be improved by modeling different oxaliplatin combinations separately once more data are available. As new therapeutic strategies for mCRC become available, more clinical data from non-oxaliplatin-treated mCRC patients will be needed to better identify the patients who benefit from oxaliplatin-based chemotherapy.
This study used retrospectively collected mCRC samples, and the genomic DNA of the three different mCRC cohorts was sequenced separately. The unbalanced distribution of patient characteristics and the limited size of these retrospective cohorts may have influenced the performance of the predictive biomarker. We are initiating prospective clinical trials to evaluate the CNA fingerprint for predicting chemotherapy response in CRC. Because CNA status can be obtained with a cost-effective sWGS approach, similar CNA biomarkers could be developed in other cancer types for different clinical purposes.
Here, we constructed and validated, for the first time, a CNA-based biomarker, the "CNA fingerprint", to predict the effectiveness of oxaliplatin-based regimens in mCRC patients. The CNA fingerprint showed robust performance in multiple independent validation datasets and is cost-effective compared with existing markers. Further prospective clinical trials are warranted to evaluate its performance in clinical applications.
Here, we developed and validated a CNA fingerprint biomarker to predict the effectiveness of an oxaliplatin-based regimen in mCRC patients. This machine learning model biomarker was developed retrospectively using competitively low-cost shallow sequencing data; the performance of the CNA fingerprint has been validated in three independent test mCRC cohorts. While clinical features alone or KRAS mutations could not predict the clinical response of oxaliplatin therapy in our cohorts.
We identified for the first time that CNA features and fingerprints play a predictive role in chemotherapy drug’s clinical response, demonstrating the significant clinical value of CNA patterns. Among them, the CNA feature “CN[>8]” (meaning: number of DNA segments with absolute copy numbers greater than 8) was the most important contributor in the model, indicating that extremely high levels of copy number amplification may impact the chemotherapy efficacy in mCRC patients. Oxaliplatin responders show lower CN[>8] counts compared with non-responders. The biological implication of CN[>8] in oxaliplatin clinical response needs further investigation. Conventional CNA biomarker using specific genome locus CNA as a biomarker, for example, amplification of TYMS20 and STRAP21 has been reported to be associated with the effects of 5-FU treatment, gene copy number gain in chromosome 8p24.1-p24.2 shows more favorable response and survival when treated with irinotecan in combination with bevacizumab22. In the case of oxaliplatin response prediction, we did not identify the specific genome locus that has predictive values. Instead, our findings underscore the impact of widespread copy number amplification. Previous research suggests that cancer cells exploit aneuploidy-induced genomic instability, for instance, through amplification of DNA damage repair pathways23, to survive chemotherapy24. Studies on CRC cell lines further indicate that whole genome duplication and aneuploidy may provide a selective advantage for therapy resistance25, potentially by slowing cell proliferation17. However, the aneuploidy score alone has proven inadequate for identifying oxaliplatin-sensitive patients, highlighting the complexity of the underlying mechanisms. The CNA features and derived CNA fingerprint represent a novel type of CNA information that has clinical value, and this needs further investigation and validation.
Oxaliplatin, a third-generation platinum, acts by binding to DNA and forming cross-links to disrupt DNA replication and repair processes, ultimately leading to cell death26. The other platinum compounds, such as carboplatin and cisplatin, are not effective in patients with CRC27. Like other chemotherapeutic agents, oxaliplatin also exhibits toxic side effects28. Accurately distinguishing benefiting patients can prevent ineffective treatment. Some research seeks to address this challenge. A recent study trained a classification model using mRNA expression data to predict the clinical response of oxaliplatin-based therapy7. A meta-analysis involving 1150 patients with metastatic/advanced CRC treated with oxaliplatin-based chemotherapy showed a close association between the rs11615-T allele and reduced chemotherapy response in Asians8. A 67-gene signature predicts the efficacy of oxaliplatin-based chemotherapy in mCRC9. These existing oxaliplatin chemotherapy response prediction studies are based on mRNA gene expression or germline DNA polymorphism. Compared with mRNA, DNA is more stable and does not require strict experimental conditions, while the prediction accuracy of germline DNA polymorphism has not been solidly validated. Compared with existing studies, the CNA fingerprint biomarker reported here could has advantage in both prediction accuracy and actual clinical applicability.
In clinical settings, oxaliplatin is usually applied in combination with other drugs such as 5-FU. Here, our CNA fingerprint biomarker does not consider the identity of the specific combination of drugs; this can expand the application scenarios of this biomarker and, at the same time, influence the accuracy of the prediction biomarker. Future improvement can be obtained when different oxaliplatin combinations are considered when more data is available. With the availability of new therapeutic strategies for mCRC, to further identify the patients with clinical benefit from oxaliplatin-based chemotherapy, more clinical data from these additional non-oxaliplatin-treated mCRC patients are needed.
This study used retrospectively collected mCRC samples, and the genomic DNA of the three mCRC cohorts was sequenced separately. The unbalanced distribution of patient characteristics and the limited size of these retrospective cohorts may affect the performance of the predictive biomarker. We are initiating prospective clinical trials to evaluate the CNA fingerprint biomarker for predicting chemotherapy response in CRC. Because CNA status can be obtained with a cost-effective sWGS approach, similar CNA biomarkers could also be developed in other cancer types for different clinical purposes.
Here, we constructed and validated, for the first time, a CNA-based biomarker, the “CNA fingerprint”, to predict the effectiveness of oxaliplatin-based regimens in mCRC patients. The CNA fingerprint showed robust performance across multiple independent validation datasets and is cost-effective compared with existing markers. Further prospective clinical trials are warranted to evaluate its performance in clinical applications.
Methods
Patient cohorts
The study was conducted in accordance with the Declaration of Helsinki and approved by the institutional review boards of Fudan University Shanghai Cancer Center (FUSCC, 2309-ZZK-100) and Tongji Hospital, Tongji University School of Medicine (SBKT-2021-116). Written consent was obtained from all study participants. The sample inclusion and exclusion criteria are detailed in Fig. 1A. Gender and age were not used as selection criteria. Participants were randomly sampled from the preliminarily enrolled patients, and strict exclusion criteria were then applied to form the final cohort. Frozen tumor tissue was stored at −80 °C. Response to oxaliplatin-based chemotherapy was assessed according to the Response Evaluation Criteria in Solid Tumors (RECIST), by computed tomography every 6 weeks until week 24 and every 12 weeks thereafter. Treatment efficacy was categorized as complete response (CR), partial response (PR), stable disease (SD), or progressive disease (PD). Patients whose metastatic lesions were evaluated as CR, PR, or SD were labeled as responders, and patients with PD were labeled as non-responders. The cohort with the largest number of patients (n = 121) was assigned as the training set, and the remaining three cohorts were used as independent test set 1 (n = 55), test set 2 (n = 55), and test set 3 (n = 40) to validate the model’s performance. We additionally collected a cohort of patients from FUSCC who did not receive oxaliplatin-based chemotherapy (n = 26, the “no-oxaliplatin” cohort); these patients were treated with capecitabine- or irinotecan-based regimens.
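The response labeling and cohort split described above can be summarized in a short sketch. It is a minimal illustration, not the authors' code: the function name `label_response`, the 1/0 encoding, and the category strings are assumptions. The cohort sizes are the ones reported above; their sum (297) matches the total number of samples reported for the study.

```python
# Hypothetical helper: binary response labels following the Methods
# (CR, PR or SD -> responder = 1; PD -> non-responder = 0).
RESPONDER = {"CR", "PR", "SD"}
NON_RESPONDER = {"PD"}

# Cohort sizes as reported in the Methods.
COHORT_SIZES = {
    "training": 121,
    "test_set_1": 55,
    "test_set_2": 55,
    "test_set_3": 40,
    "no_oxaliplatin": 26,
}


def label_response(recist_category: str) -> int:
    """Map a RECIST best-response category to 1 (responder) or 0 (non-responder)."""
    category = recist_category.strip().upper()
    if category in RESPONDER:
        return 1
    if category in NON_RESPONDER:
        return 0
    raise ValueError(f"unknown RECIST category: {recist_category!r}")


if __name__ == "__main__":
    print([label_response(c) for c in ("CR", "PR", "SD", "PD")])  # [1, 1, 1, 0]
    print(sum(COHORT_SIZES.values()))  # 297
```

Counting SD as a response means the label reflects disease control rather than objective response, which is worth keeping in mind when comparing with studies that use CR/PR only.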
The copy number segment data of the TCGA cohort, comprising 54 untreated patients with mCRC, were downloaded with the UCSCXenaTools R package29. We also downloaded the GSE36864 dataset from Gene Expression Omnibus, which contains CNA data from 349 previously untreated mCRC patients assigned to one of three first-line therapies: capecitabine (n = 105), capecitabine and irinotecan (n = 111), or capecitabine, oxaliplatin, and bevacizumab (n = 133)15.
Sequencing data preprocessing
Frozen tumor tissue samples underwent paired-end sequencing on the DNBSEQ T7 platform (PE150) to an average depth of about 2×, yielding raw data in FASTQ format. Reads were preprocessed with fastp (v0.22.0)30 using default parameters for adapter trimming, removal of reads whose N-base content exceeded the threshold, removal of low-quality bases, and sliding-window trimming of low-quality bases. Reads were aligned to the hg38 human reference genome with BWA-MEM (v0.7.17)31, and the resulting SAM files were converted to BAM files with Samtools (v1.9). Segmentation and relative copy number estimation were performed with QDNAseq (v1.30.0)32, a copy number variation tool for shallow sequencing data, and absolute copy number segment data were then obtained with the ACE package (v1.17.0)33,34.
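For planning purposes, the relation between read yield and the roughly 2× target depth follows the standard estimate C = N × L / G (N reads of length L over a genome of size G). The sketch below applies it to PE150 data; the 3.1 Gb genome size is an approximation chosen for illustration, and the estimate describes raw yield before trimming, duplicate removal, or unmapped reads reduce usable coverage. Function names are hypothetical.

```python
import math

# Assumption: approximate hg38 size, used only for illustration (about 3.1 Gb).
GENOME_SIZE_BP = 3_100_000_000
READ_LENGTH_BP = 150  # PE150: two 150-bp reads per sequenced fragment


def depth_from_pairs(n_pairs: int, genome_size_bp: int = GENOME_SIZE_BP,
                     read_length_bp: int = READ_LENGTH_BP) -> float:
    """Raw average depth C = N * L / G, counting both reads of each pair."""
    return n_pairs * 2 * read_length_bp / genome_size_bp


def pairs_for_depth(target_depth: float, genome_size_bp: int = GENOME_SIZE_BP,
                    read_length_bp: int = READ_LENGTH_BP) -> int:
    """Smallest number of read pairs whose raw yield reaches target_depth."""
    return math.ceil(target_depth * genome_size_bp / (2 * read_length_bp))


if __name__ == "__main__":
    n = pairs_for_depth(2.0)
    print(n)                              # 20666667
    print(round(depth_from_pairs(n), 3))  # 2.0
```

Under these assumptions, about 20.7 million read pairs give a raw depth of about 2×, which illustrates why shallow sequencing is inexpensive relative to deep whole-genome sequencing.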
CNA features extraction
The absolute copy number segment data of each patient were used for CNA feature extraction with the R package Sigminer (v2.2.0)13. In addition to the CNA features described in our previous studies12,13, we extracted global CNA status indicators: CNA burden (the proportion of the genome affected by CNAs35); chromosomal amplification (AMP) or deletion (DEL) ratio; and chromosomal AMP/DEL level (the magnitude of change relative to the diploid state). The AMP/DEL level of chromosome X is defined by Eq. 1, where L_i and C_i are the length and copy number, respectively, of each individual AMP/DEL segment on that chromosome. The denominator of Eq. 1 represents the theoretical baseline copy number (diploid state) over the combined length of all amplified regions. In total, 310 CNA features were included in this study (Supplementary data 1). CNA feature distributions were visualized with the Sigminer “show_catalogue” function.
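Eq. 1 is not reproduced in this text. The sketch below assumes the reading Level_X = Σ_i L_i C_i / (2 Σ_i L_i), which matches the description of the denominator as the diploid baseline over the combined segment length, but it should be checked against the published equation and the Sigminer implementation. It also assumes that AMP and DEL mean copy number above and below 2, and approximates CNA burden as the fraction of profiled length whose copy number differs from 2. The `Segment` type and the function names are hypothetical.

```python
from typing import List, NamedTuple, Optional

DIPLOID = 2  # baseline copy number of the diploid reference state


class Segment(NamedTuple):
    chrom: str
    start: int          # assumption: 0-based, half-open coordinates
    end: int
    copy_number: float  # absolute copy number, e.g. as produced by ACE

    @property
    def length(self) -> int:
        return self.end - self.start


def chromosome_level(segments: List[Segment], chrom: str, kind: str) -> Optional[float]:
    """Assumed reading of Eq. 1: sum(L_i * C_i) / (2 * sum(L_i)) over AMP or DEL segments."""
    if kind == "AMP":
        chosen = [s for s in segments if s.chrom == chrom and s.copy_number > DIPLOID]
    elif kind == "DEL":
        chosen = [s for s in segments if s.chrom == chrom and s.copy_number < DIPLOID]
    else:
        raise ValueError("kind must be 'AMP' or 'DEL'")
    total_length = sum(s.length for s in chosen)
    if total_length == 0:
        return None  # no segments of this kind on the chromosome
    weighted = sum(s.length * s.copy_number for s in chosen)
    return weighted / (DIPLOID * total_length)


def cna_burden(segments: List[Segment]) -> float:
    """Fraction of profiled length whose copy number differs from the diploid state."""
    total = sum(s.length for s in segments)
    altered = sum(s.length for s in segments if s.copy_number != DIPLOID)
    return altered / total if total else 0.0


if __name__ == "__main__":
    segs = [
        Segment("chr8", 0, 50, 4),
        Segment("chr8", 50, 100, 2),
        Segment("chr8", 100, 150, 6),
        Segment("chr18", 0, 100, 1),
    ]
    print(chromosome_level(segs, "chr8", "AMP"))   # 2.5
    print(chromosome_level(segs, "chr18", "DEL"))  # 0.5
    print(cna_burden(segs))                        # 0.8
```

Under this reading an AMP level above 1 means the amplified segments carry more copies than a diploid genome would over the same length, and a DEL level below 1 means fewer.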
Feature selection and model training
We trained a classification model, the CNA fingerprint, that classifies patients as responders or non-responders to oxaliplatin-based chemotherapy (see the Supplementary data and code for details of the training procedure). We first conducted an internal benchmark to compare the performance of different feature sets and machine learning models (Supplementary Fig. 2A). The models were (1) extreme gradient boosting (XGBoost), (2) neural networks (NNET), (3) random forest (RF), (4) naive Bayes, and (5) light gradient boosting machine (LGBM). The feature sets were (1) the CNA features used here, reported in our previous studies12,13; (2) the CNA signatures reported by Drews et al., quantified with the R package CINSignatureQuantification36; and (3) clinical features. For XGBoost, RF, and LGBM, features were selected using the models’ “importance” property; the remaining models were trained on features that differed significantly between responders and non-responders.
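The two selection routes can be sketched in plain Python. `select_by_importance` keeps the top-k features by a model-supplied importance score (as for XGBoost, RF, and LGBM), and `select_by_test` keeps features whose responder and non-responder distributions differ by a two-sided Wilcoxon rank-sum test. The 0.05 significance level, the top-k rule, and all function names are assumptions. The p-value uses a plain normal approximation, whereas R's `wilcox.test` by default uses an exact test for small samples without ties and otherwise applies continuity and tie corrections.

```python
import math
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_p_value(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided rank-sum p-value, normal approximation without corrections."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        return 1.0
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # Mann-Whitney U for group x
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))


def select_by_test(features: Dict[str, Sequence[float]], labels: Sequence[int],
                   alpha: float = 0.05) -> List[str]:
    """Keep features that differ between responders (1) and non-responders (0)."""
    selected = []
    for name, values in features.items():
        resp = [v for v, y in zip(values, labels) if y == 1]
        non = [v for v, y in zip(values, labels) if y == 0]
        if rank_sum_p_value(resp, non) < alpha:
            selected.append(name)
    return selected


def select_by_importance(importance: Dict[str, float], top_k: int) -> List[str]:
    """Keep the top_k features by model importance."""
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:top_k]]
```

In the R workflow the importance values would come from the fitted learners; here they are simply passed in as a dictionary.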
The optimal feature-model combination was selected by five-fold cross-validation, with performance evaluated by the area under the receiver operating characteristic curve (AUC) on the validation folds. The final model was then trained with this optimal combination, again using five-fold cross-validation. The internal benchmark and model training were performed with the R package mlr3verse (v0.3.1)37. The threshold separating responders from non-responders was set to 0.58 by bootstrapping on the training cohort with the R package cutpointr (v1.1.2), using maximal accuracy as the criterion (Supplementary Fig. 2D).
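The threshold step can be illustrated with a simplified bootstrap: in each resample of the training cohort, find the observed score that maximizes accuracy, then average the resampled cutoffs. This follows the idea of bootstrapped metric maximization in cutpointr but is not that package's implementation; the rule that scores at or above the cutoff are called responders, the 200 resamples, and the fixed seeds are assumptions. A simple unstratified five-fold split is included for completeness. Function names are hypothetical.

```python
import random
from statistics import mean
from typing import List, Sequence


def accuracy_at(cutoff: float, scores: Sequence[float], labels: Sequence[int]) -> float:
    """Accuracy when scores >= cutoff are called responders (label 1)."""
    correct = sum((s >= cutoff) == (y == 1) for s, y in zip(scores, labels))
    return correct / len(labels)


def best_cutoff(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Observed score that maximizes accuracy; ties resolve to the smallest cutoff."""
    return max(sorted(set(scores)), key=lambda c: accuracy_at(c, scores, labels))


def bootstrap_cutoff(scores: Sequence[float], labels: Sequence[int],
                     n_boot: int = 200, seed: int = 0) -> float:
    """Mean of the accuracy-maximizing cutoffs over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(scores)
    cutoffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        cutoffs.append(best_cutoff([scores[i] for i in idx], [labels[i] for i in idx]))
    return mean(cutoffs)


def five_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal them into k folds (not stratified by label)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]
```

Averaging over resamples makes the cutoff less sensitive to a few borderline training samples than a single in-sample maximization would be.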
Quantify additional markers
We used the R packages HRDCNA38 and scarHRD (v0.1.1)39 to assess HRD scores, and the R package AneuploidyScore18 to calculate the aneuploidy score (the total number of arm-level gains and losses).
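A minimal sketch of an arm-level aneuploidy count follows. It assumes each arm's segments are given as (length, copy number) pairs, that an arm is called gained or lost when at least a fraction `min_fraction` of its length lies above or below copy number 2, and that the score is the number of arms called either way. The 0.8 threshold and the fixed diploid baseline are illustrative assumptions, and the function names are hypothetical; AneuploidyScore may define arm calls differently (for example, relative to sample ploidy).

```python
from typing import Dict, List, Tuple

DIPLOID = 2.0  # assumption: fixed diploid baseline, no ploidy adjustment


def arm_call(segments: List[Tuple[int, float]], min_fraction: float = 0.8) -> int:
    """Return +1 (gain), -1 (loss) or 0 (neutral) for one chromosome arm."""
    total = sum(length for length, _ in segments)
    if total == 0:
        return 0
    gained = sum(length for length, cn in segments if cn > DIPLOID) / total
    lost = sum(length for length, cn in segments if cn < DIPLOID) / total
    if gained >= min_fraction:
        return 1
    if lost >= min_fraction:
        return -1
    return 0


def aneuploidy_score(arms: Dict[str, List[Tuple[int, float]]],
                     min_fraction: float = 0.8) -> int:
    """Number of arms called as gained or lost."""
    return sum(1 for segs in arms.values() if arm_call(segs, min_fraction) != 0)


if __name__ == "__main__":
    arms = {
        "8q": [(100, 4)],            # fully gained
        "18q": [(90, 1), (10, 2)],   # 90% lost
        "17p": [(50, 3), (50, 2)],   # only half gained
    }
    print(aneuploidy_score(arms))  # 2
```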
Statistical analysis
Differences between two groups were assessed with the Wilcoxon rank-sum test. All reported P values are two-tailed. The confusion matrix, accuracy, sensitivity, precision, recall, F1 score, and specificity were calculated with the “confusionMatrix” function in the R package caret (v6.0.93)40. ROC curves were plotted and AUCs computed with the “roc” function in the R package pROC (v1.18.0)41. Kaplan–Meier survival analyses were performed with the R packages survival (v3.5.8) and survminer (v0.4.9), and P values were calculated with the log-rank test. All analyses were performed in R 4.2.
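The classification metrics and the AUC listed above reduce to simple counts. The sketch treats responders (label 1) as the positive class; caret's `confusionMatrix` uses the first factor level as the positive class unless `positive` is set, so the factor coding matters in the R analysis. `empirical_auc` uses the Mann-Whitney formulation, which equals the area under the empirical ROC curve. Function names are hypothetical.

```python
from typing import Dict, Sequence


def _ratio(num: float, den: float) -> float:
    """Safe division returning NaN when the denominator is zero."""
    return num / den if den else float("nan")


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Metrics with responder (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)  # identical to sensitivity
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": recall,
        "specificity": _ratio(tn, tn + fp),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


def empirical_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability a responder outscores a non-responder (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one responder and one non-responder")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

Because sensitivity and recall are the same quantity, reporting both is redundant but harmless.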
Data availability
Raw NGS sequencing data are available upon request. The analysis code is available on GitHub (https://xsliulab.github.io/CNAfingerprint_mCRC/), and processed data can be found at https://github.com/XSLiuLab/CNAfingerprint_mCRC. To facilitate application of the CNA fingerprint biomarker, we built an easy-to-use R package, CNAfingerprint (https://github.com/XSLiuLab/CNAfingerprint), which calculates the CNA fingerprint value for each input CNA profile file.
Supplementary information
Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please cite the original article when quoting.