Comparative effectiveness of CDK4/6 inhibitors in metastatic breast cancer: using the target trial emulation framework to investigate overall survival in routine care.
APA
Brufsky AM, Finn RS, et al. (2026). Comparative effectiveness of CDK4/6 inhibitors in metastatic breast cancer: using the target trial emulation framework to investigate overall survival in routine care. Breast cancer research and treatment, 216(2). https://doi.org/10.1007/s10549-026-07935-3
MLA
Brufsky AM, et al. "Comparative effectiveness of CDK4/6 inhibitors in metastatic breast cancer: using the target trial emulation framework to investigate overall survival in routine care." Breast cancer research and treatment, vol. 216, no. 2, 2026.
PMID
41793559
Abstract
[PURPOSE] Cyclin-dependent kinase 4/6 inhibitor (CDK4/6i) plus aromatase inhibitor (AI) is the recommended first-line treatment for hormone receptor-positive/human epidermal growth factor receptor 2-negative metastatic breast cancer (mBC). CDK4/6i head-to-head trials have not been conducted, and randomized controlled trials (RCTs) report inconsistent overall survival (OS) results despite similar effects on the primary endpoint of progression-free survival. Real-world evidence can complement RCTs but selection biases and confounders can challenge interpretation. Target trial emulation applies the principles of RCTs to observational data to overcome such challenges. We emulated a hypothetical target trial to investigate whether causal differences in OS between patients receiving first-line CDK4/6i plus AI exist in the real-world clinical setting.
[METHODS] We used de-identified data (Flatiron Health mBC Enhanced Data Mart) from patients ≥ 18 years old at primary diagnosis who were treated with first-line palbociclib/ribociclib/abemaciclib plus AI for mBC between 2018 and 2024. Statistical adjustments included stabilized inverse-probability weighting (sIPTW), investigation of missing data mechanisms, and analyses for unmeasured confounders.
[RESULTS] 2626 patients were included (palbociclib n = 1686; ribociclib n = 537; abemaciclib n = 403). After sIPTW, baseline characteristics were balanced between groups and there was no observable difference in real-world OS (ribociclib vs palbociclib: adjusted hazard ratio 1.00, 95% CI: 0.81-1.24; abemaciclib vs palbociclib: adjusted hazard ratio 0.91, 95% CI: 0.74-1.14). Results were consistent after sensitivity analyses.
[CONCLUSION] Using target trial emulation, real-world OS is similar with palbociclib/ribociclib/abemaciclib plus AI. These findings may contribute to the development of combination strategies to improve clinical outcomes and to guide clinical decision-making.
Keywords / MeSH
- Humans
- Breast Neoplasms
- Female
- Cyclin-Dependent Kinase 4
- Cyclin-Dependent Kinase 6
- Antineoplastic Combined Chemotherapy Protocols
- Protein Kinase Inhibitors
- Pyridines
- Aminopyridines
- Piperazines
- Middle Aged
- Benzimidazoles
- Purines
- Aged
- Aromatase Inhibitors
- Neoplasm Metastasis
- Adult
- Cyclin-dependent kinase 4/6
- Metastatic breast cancer
- Overall survival
- Real-world data
- Target trial emulation
Introduction
Endocrine-based therapy (ET) with a cyclin-dependent kinase 4/6 inhibitor (CDK4/6i) plus an aromatase inhibitor (AI) is the recommended first-line treatment for patients with hormone receptor (HR)-positive/human epidermal growth factor receptor 2 (HER2)-negative metastatic breast cancer (mBC) [1–3]. These recommendations are based on results from three pivotal randomized controlled trials (RCTs) showing that all three CDK4/6is (palbociclib [PALOMA-2] [4], ribociclib [MONALEESA-2] [5], and abemaciclib [MONARCH 3] [6]) plus AI significantly prolonged progression-free survival (PFS), the primary endpoint, versus placebo plus AI. Palbociclib plus letrozole was the first CDK4/6i plus AI combination to gain US Food and Drug Administration approval in 2015, based on the results of PALOMA-1 [7], followed by ribociclib plus AI in 2017, and abemaciclib plus AI in 2018 [8].
However, findings for overall survival (OS), a key secondary endpoint, with these combinations have been inconsistent. Ribociclib plus letrozole significantly improved median OS versus letrozole in MONALEESA-2 [9], while OS was not significantly improved with palbociclib plus letrozole versus letrozole in PALOMA-2 [10], nor with abemaciclib plus AI versus AI in MONARCH 3 [11]. Understanding potential differences in survival between CDK4/6is could help inform clinical decision-making around combination therapy in the first-line setting. No head-to-head RCTs have been conducted to compare CDK4/6is, and trials with a primary endpoint of OS may be considered impractical, given the time to reach data maturity when the potential for long survival outcomes is high; loss to follow-up and the impact of subsequent therapies further complicate the survival analysis. Previous indirect treatment comparison studies did not report significant OS differences by CDK4/6i plus ET (either AI or selective estrogen receptor degrader [SERD]) [12–14]; however, differences in study designs and eligibility criteria between these trials, resulting in subtle differences in patient population, make these comparisons difficult.
Observational studies can complement the findings from RCTs; observational data are often more generalizable to a broader patient population than data from RCTs as they are less limited by geographical and socioeconomic factors and thus can overcome limitations such as reduced patient diversity [15]. Several studies have used observational data to compare the effectiveness of CDK4/6is plus ET (or AIs specifically) in routine care [16–23]. However, these studies were often small, single-center evaluations and have produced inconsistent findings.
Other observational data from recent studies have suggested no differences in survival outcomes. However, these studies have been exploratory in nature and have not followed a structured process aimed at answering a causal question of interest. For example, a recent comparative effectiveness analysis that leveraged the US Flatiron Health electronic health record (EHR)-derived deidentified longitudinal database found no significant differences in OS between palbociclib, ribociclib, or abemaciclib plus AI in the real-world setting [24]. However, this study did not address data missingness or investigate the potential impact of unmeasured confounders, and it used a dataset comprising machine learning-derived data elements that are not exhaustively validated by human review. Another recent study, which leveraged observational data from the German multicenter clinical registry OPAL, found no significant differences in real-world PFS or OS between palbociclib plus ET and ribociclib plus ET in matched patient cohorts [21]. In patients with a treatment-free interval (TFI) < 12 months, OS was significantly longer with ribociclib, although the number of events in this subgroup was low. However, this study also did not address data missingness, and the design was not aligned to the corresponding RCTs (PALOMA-2 or MONALEESA-2), which could result in inadequate proxying of trial inclusion/exclusion criteria. One real-world study conducted in the UK reported a significant difference in real-world OS between the CDK4/6is (P = 0.004), with the longest median OS observed in the palbociclib plus ET arm [40], but this was a single-center study and was limited by a small sample size.
Using observational data to answer causal questions relating to treatments and outcomes is challenging due to inherent selection biases, unmeasured confounding factors, and the inability to ensure comparable treatment groups, as they would be in RCTs, leading to biased estimates of treatment effects and questionable conclusions about efficacy [25]. However, the principled approach of target trial emulation is gaining wider acceptance as an approach to answer causal questions using observational data: the target trial emulation framework applies the rigorous design principles of an RCT to observational data [25–27], which addresses issues of selection bias in observational studies. This approach provides a more robust estimate of a causal effect, especially when an RCT is not feasible, by specifying eligibility criteria, causal estimand, causal contrast, censoring strategies, and the emulation of these components using observational data. Additionally, it allows for transparency in any assumptions being made due to limitations of the available data [25]. Target trial emulation has previously been used in oncology research, and an analysis of published emulations of oncology RCTs, including studies using data derived from EHR, has shown that the emulation hazard ratio estimate fell within the 95% confidence interval of the RCT findings in nine out of eleven evaluable studies [28]. One study specifically emulated the PALOMA-2 RCT with EHR data [29].
We aimed to emulate a hypothetical target trial using contemporaneous observational data from patients with HR-positive/HER2-negative mBC receiving first-line CDK4/6i plus AI as part of routine care to establish if there is a difference in real-world OS between patients receiving palbociclib, ribociclib, or abemaciclib plus AI in routine clinical practice.
Methods
This retrospective open-cohort study adhered to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines.
Target trial emulation framework
We began by framing the causal question that we wished to answer through a target trial: Is there a difference in real-world OS between patients receiving first-line CDK4/6i plus AI? As prescribed by the target trial emulation framework [30], we specified the various components of the target trial protocol (Table 1), which aligned to common criteria across three pivotal CDK4/6i trials [4–6], the emulation of these components using observational data, and any assumptions made. The analysis of comparative effectiveness focused on palbociclib plus AI versus ribociclib plus AI or abemaciclib plus AI.
Cohort definition
The study used data from the US Flatiron Health mBC Enhanced Data Mart, which comprises deidentified, structured and unstructured, longitudinal patient-level data curated via a technology-enabled abstraction method [31]. During the study period, data originated from approximately 800 sites of care, mostly in the community oncology setting.
Inclusion/exclusion criteria were applied to closely emulate a hypothetical trial (Fig. 1). Patients eligible for inclusion were ≥ 18 years of age at primary BC diagnosis with evidence of HR-positive/HER2-negative disease that was documented from a year prior to diagnosis of mBC until the initiation of first-line therapy. Patients had initiated first-line CDK4/6i (palbociclib/ribociclib/abemaciclib) plus AI (anastrozole/letrozole/exemestane) for mBC between January 1, 2018, and August 30, 2024 (follow-up data were collected until November 30, 2024). Patients had an Eastern Cooperative Oncology Group performance status (ECOG PS) of 0 or 1, or unknown at initiation of first-line therapy. Including patients with an unknown ECOG PS allowed for a larger patient cohort, with the missingness being handled through multiple imputation (see Supplemental Methods). To ensure that patients with endocrine-resistant tumors were excluded, and in alignment with the pivotal clinical trials, patients were required to have a TFI ≥ 12 months between last evidence of receipt of adjuvant ET and mBC diagnosis (see Supplemental Methods). Patients with recurrent BC who had received AI or CDK4/6i prior to their metastatic diagnosis and patients with multiple distinct primary tumors were also excluded. All patients were subject to a maximum follow-up time of 72 months after first-line treatment initiation.
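The eligibility rules above can be read as a simple row-level filter. The following Python sketch is illustrative only: the `Patient` record, its field names, and the convention that a missing TFI denotes a patient with no prior adjuvant ET are assumptions made for this example, not the structure of the Flatiron Health data or the authors' code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Patient:
    # All field names are hypothetical, chosen for this illustration.
    age_at_primary_dx: int
    hr_positive: bool
    her2_negative: bool
    first_line_start: date
    ecog_ps: Optional[int]        # None means ECOG PS unknown
    tfi_months: Optional[float]   # None means no prior adjuvant ET (assumption)
    prior_ai_or_cdk46i: bool      # AI or CDK4/6i received before mBC diagnosis
    multiple_primaries: bool


# Index-date window stated in the cohort definition.
WINDOW_START = date(2018, 1, 1)
WINDOW_END = date(2024, 8, 30)


def is_eligible(p: Patient) -> bool:
    """Apply the inclusion/exclusion criteria described in the text."""
    return (
        p.age_at_primary_dx >= 18
        and p.hr_positive
        and p.her2_negative
        and WINDOW_START <= p.first_line_start <= WINDOW_END
        # ECOG PS 0 or 1, or unknown (unknown values are imputed later).
        and (p.ecog_ps is None or p.ecog_ps in (0, 1))
        # TFI of at least 12 months when adjuvant ET was received.
        and (p.tfi_months is None or p.tfi_months >= 12)
        and not p.prior_ai_or_cdk46i
        and not p.multiple_primaries
    )
```

Keeping unknown ECOG PS as `None` rather than dropping those rows mirrors the choice described above to retain such patients and handle the missingness separately.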
Statistical analyses
Statistical analyses were predefined in the statistical analysis plan and conducted using R, version 4.1.0. Patient demographics and clinical characteristics at baseline were summarized descriptively by CDK4/6i. Year of first-line therapy initiation (2018–2020/2021–2024) was captured as a baseline variable.
Real-world OS by CDK4/6i was estimated using Kaplan–Meier methodology. Treatment effects were analyzed using adjusted Cox proportional hazards models, including covariate and propensity score adjustments. Covariates used in the Cox proportional hazards model were age (< 65 years/≥ 65 years), race (white/black/Asian/other), socioeconomic status (ranked from 1 [lowest] to 5 [highest]), ECOG PS (0 or 1), de novo mBC (yes/no), prior adjuvant AI (yes/no), prior adjuvant ET (yes/no), first-line AI partner (anastrozole/letrozole), disease site (liver/non-liver, visceral/non-visceral), and year of first-line therapy initiation (2018–2020/2021–2024) to account for differences in the approval dates and uptake of ribociclib and abemaciclib compared with palbociclib. Missing values in race, socioeconomic status, ECOG PS, and disease site were assigned their own category of “unknown” in the primary analysis.
To evaluate the comparative effectiveness of the three CDK4/6is plus AI, we first used these covariates in an unweighted analysis, in which a multivariate Cox proportional hazards model was fitted to the data to estimate the treatment effect. We then performed a weighted analysis, in which treatment groups were subjected to stabilized inverse probability of treatment weighting (sIPTW) with the average treatment effect as the estimand; the covariates listed above were used for weighting and to assess the balance between groups.
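For readers unfamiliar with stabilized weights, each patient's weight is the marginal probability of the treatment they received divided by the probability of that treatment given their covariates. The sketch below is a minimal illustration under a strong simplifying assumption: the propensity score is estimated empirically within levels of a single categorical covariate (`strata`), whereas the study would have used a model over all listed covariates. The function name and inputs are hypothetical.

```python
from collections import Counter


def stabilized_weights(treatments, strata):
    """Return stabilized inverse-probability-of-treatment weights.

    weight_i = P(A = a_i) / P(A = a_i | X = x_i), where both probabilities
    are estimated as simple proportions (an assumption of this sketch).
    """
    n = len(treatments)
    marginal = Counter(treatments)          # counts per treatment arm
    stratum_sizes = Counter(strata)         # counts per covariate level
    joint = Counter(zip(treatments, strata))

    weights = []
    for arm, level in zip(treatments, strata):
        p_marginal = marginal[arm] / n
        p_conditional = joint[(arm, level)] / stratum_sizes[level]
        weights.append(p_marginal / p_conditional)
    return weights
```

When treatment is unrelated to the covariate, every weight equals 1; the stabilization keeps the weighted sample size close to the original one, which is why stabilized weights tend to be less variable than unstabilized ones.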
A Cox proportional hazards model was then fitted to the weighted cohort and adjusted with the same covariates used for weighting to produce a doubly robust model [32]. Doubly robust models have been proposed as a more reliable approach for the estimation of causal effects in observational studies because they are more robust to model mis-specification. Estimates resulting from a doubly robust model remain consistent even if one of the models (propensity score or outcome model) is mis-specified [33].
Sensitivity analyses
Detailed methodologies are provided in the Supplemental Methods. Multiple sensitivity analyses were performed to evaluate whether varying the approach to handle missing values would change the results: complete case analysis, where only patients with no missing covariate data were included, and analysis of missing values using the multiple imputation by chained equations (MICE) approach [34]. We further investigated missingness mechanisms (missing completely at random [MCAR], missing at random [MAR], and missing not at random [MNAR]) in each covariate using the smdi R package [35]. Based on our findings, we conducted a further sensitivity analysis to quantify the effect of deviations from MAR mechanism when missing data are imputed using MICE.
Supplementary sensitivity analyses considering specific patient and disease characteristics were also conducted, including sIPTW analysis of a cohort excluding patients with a TFI < 12 months associated with adjuvant AI (versus the primary analysis that excluded patients with a TFI < 12 months associated with any adjuvant ET), as well as quantitative bias analysis on the effect of menopausal status, considered an unmeasured confounder in the primary analysis, using the E-value computation procedure [36] and the array approach [37].
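The E-value mentioned above has a closed form for a ratio estimate RR of at least 1: RR + sqrt(RR × (RR − 1)), with estimates below 1 inverted first. The sketch below applies that formula directly to a ratio; treating a hazard ratio as an approximate risk ratio is an assumption of this example, and the exact procedure used in the study is described in its Supplemental Methods, not here.

```python
import math


def e_value(ratio: float) -> float:
    """E-value for a ratio estimate (risk ratio scale).

    Assumption: the input can be interpreted on the risk ratio scale.
    Ratios below 1 are inverted before the formula is applied.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    rr = ratio if ratio >= 1 else 1 / ratio
    return rr + math.sqrt(rr * (rr - 1))
```

An estimate of exactly 1 gives an E-value of 1, meaning no unmeasured confounding is needed to explain a null association; larger values indicate that a stronger confounder would be required to move the estimate to the null.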
Results
Cohort description
Overall, 2626 patients were included (Fig. 2), with 1686 (64%) patients receiving palbociclib plus AI, 537 (20%) receiving ribociclib plus AI, and 403 (15%) receiving abemaciclib plus AI as first-line therapy for mBC. Almost three-quarters (1919/2626; 73%) of patients were from community clinics, one-quarter (657/2626; 25%) were from academic clinics, and the remainder (50/2626; 2%) were from both.
Patient demographics and clinical characteristics were generally well-balanced across the CDK4/6i arms (Table 2), even before weighting to adjust for differences. A greater proportion of patients receiving palbociclib were aged ≥ 65 years (37% vs 27% in the ribociclib arm and 34% in the abemaciclib arm) and were White (69% vs 64% vs 60%). Among patients receiving first-line ribociclib, 81% were initiated on treatment in 2021 or later, which was a greater proportion than among those receiving abemaciclib (68%) or palbociclib (48%). Patients in the palbociclib arm had the longest median follow-up (42.3 months, 95% CI: 40.7–45.0) compared with those in the ribociclib (16.9 months, 95% CI: 16.1–18.5) and abemaciclib arms (28.0 months, 95% CI: 25.2–31.5).
Comparative effectiveness of first-line AI plus CDK4/6i
Unweighted analysis
Without accounting for differences in population characteristics across arms, the median real-world OS was 55.7 months (95% CI: 52.0–62.0) in the palbociclib arm, 52.6 months (95% CI: 45.8–64.6) in the ribociclib arm, and 68.2 months (95% CI: 56.5–not available [NA]) in the abemaciclib arm (Fig. 3). Hazard ratio estimates using a multivariate Cox proportional hazards model showed no observable difference in OS across the three arms.
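As a minimal sketch of how a median OS can be read off a Kaplan-Meier curve, the following pure-Python example estimates the survival function and returns the first time at which it falls to 50% or below. The function names and toy data are illustrative assumptions, not the authors' code. The second toy cohort shows how a median can be "not available" when the estimated curve never reaches 50% within follow-up; the same logic applied to a confidence band would give an NA bound.

```python
from fractions import Fraction


def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct death time.

    times[i]  : follow-up time for patient i (e.g., months)
    events[i] : True if death was observed, False if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = Fraction(1)
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            leaving += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= Fraction(at_risk - deaths, at_risk)
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which survival is <= 50%, or None if never reached."""
    for t, s in curve:
        if s <= Fraction(1, 2):
            return t
    return None


# Toy cohort with complete follow-up: median is reached.
complete = kaplan_meier([1, 2, 3, 4], [True, True, True, True])
# Toy cohort with heavy censoring: survival stays above 50%, so no median.
censored = kaplan_meier([2, 3, 5, 8, 9], [True, False, True, False, False])

print(median_survival(complete), median_survival(censored))
```

Exact fractions are used here only to keep the toy arithmetic free of floating-point rounding at the 50% threshold; production survival software works with floating-point estimates and interpolated confidence bands.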
Weighted analysis
sIPTW produced a well-balanced distribution of covariates across the three arms; standardized mean differences < 0.011 were achieved across all confounders for each arm-pair. The effective sample sizes (rounded to the nearest integer) were 1587 for palbociclib, 375 for ribociclib, and 326 for abemaciclib. In a doubly robust model, after weighting and adjusting for differences across arms, hazard ratio estimates using a multivariate Cox proportional hazards model showed no observable difference in OS across the three arms (Fig. 4).
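The weighting step and the effective sample sizes reported above can be illustrated with a short sketch. It assumes that each patient's propensity of receiving each arm has already been estimated (for example, by a multinomial model, which is not shown), and it uses the Kish approximation for effective sample size; the source does not state which formula the authors used, so both the formula and the names below are assumptions.

```python
from collections import Counter


def stabilized_weights(arms, propensities):
    """Stabilized inverse probability of treatment weights.

    arms[i]         : arm actually received by patient i
    propensities[i] : dict mapping each arm to its estimated probability
                      for patient i, given that patient's covariates
    Weight = marginal P(arm received) / conditional P(arm received | covariates).
    """
    n = len(arms)
    marginal = {arm: count / n for arm, count in Counter(arms).items()}
    return [marginal[a] / p[a] for a, p in zip(arms, propensities)]


def effective_sample_size(weights):
    """Kish approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


# Hypothetical four-patient example (values are made up for illustration).
arms = ["palbociclib", "palbociclib", "ribociclib", "abemaciclib"]
propensities = [
    {"palbociclib": 0.50, "ribociclib": 0.25, "abemaciclib": 0.25},
    {"palbociclib": 0.80, "ribociclib": 0.10, "abemaciclib": 0.10},
    {"palbociclib": 0.40, "ribociclib": 0.40, "abemaciclib": 0.20},
    {"palbociclib": 0.25, "ribociclib": 0.25, "abemaciclib": 0.50},
]
weights = stabilized_weights(arms, propensities)
print(weights, effective_sample_size(weights))
```

Under this approximation, equal weights return the nominal sample size, and more variable weights shrink it, which is consistent with the smaller arms above losing proportionally more effective sample size than the palbociclib arm.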
Sensitivity analyses and missingness mechanisms
After repeating the doubly robust analysis on the subset of patients with complete covariate data (complete case analysis), estimates of real-world OS across the three arms remained consistent with the unweighted and weighted analyses (Fig. 5). The analysis was then repeated with missing values imputed by MICE, which generated five imputed datasets using a random forest for fully conditional specification. After sIPTW within each imputed dataset, fitting of a multivariate Cox model to each weighted dataset, and pooling of results using Rubin’s rules, hazard ratios of OS for abemaciclib and ribociclib when compared with palbociclib were 0.94 (95% CI: 0.74–1.19) and 1.09 (95% CI: 0.87–1.36), respectively.
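The pooling step can be sketched as follows. This is a minimal illustration of Rubin's rules applied to log hazard ratios, using made-up inputs rather than the study's per-dataset estimates, and a normal approximation for the confidence interval instead of the t reference distribution that full implementations typically use.

```python
import math
from statistics import fmean, variance


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates with Rubin's rules.

    estimates : point estimates (here, log hazard ratios), one per imputed dataset
    variances : squared standard errors of those estimates
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    pooled = fmean(estimates)
    within = fmean(variances)        # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


def hazard_ratio_ci(log_hr, total_var, z=1.96):
    """Back-transform a pooled log HR to the HR scale with a normal-approximation CI."""
    se = math.sqrt(total_var)
    return math.exp(log_hr), math.exp(log_hr - z * se), math.exp(log_hr + z * se)


# Hypothetical log HRs and squared standard errors from five imputed datasets.
log_hrs = [-0.07, -0.05, -0.06, -0.08, -0.04]
squared_ses = [0.0148, 0.0150, 0.0147, 0.0152, 0.0149]

pooled, total = pool_rubin(log_hrs, squared_ses)
hr, lower, upper = hazard_ratio_ci(pooled, total)
print(f"HR {hr:.2f} (95% CI: {lower:.2f}-{upper:.2f})")
```

Pooling is done on the log scale because the log hazard ratio is closer to normally distributed than the hazard ratio itself; the interval is exponentiated only at the end.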
Since non-missing ECOG PS values were restricted to 0 or 1 in the analysis dataset as a result of the trial eligibility criteria, MICE imputed values of ECOG PS were also restricted to this range. We performed a subsequent sensitivity analysis by repeating the MICE imputation procedure using an expanded dataset obtained by relaxing the ECOG PS exclusion criterion as the starting point. Each imputed dataset was filtered to retain only patients with an ECOG PS score of 0 or 1, mirroring the trial eligibility criterion. After sIPTW within each imputed dataset, fitting of a multivariate Cox model to each weighted dataset, and results pooling using Rubin’s rules, hazard ratios of OS for abemaciclib and ribociclib when compared with palbociclib were 0.93 (95% CI: 0.69–1.17) and 1.12 (95% CI: 0.89–1.34), respectively.
To ensure that MICE was applicable to all covariates, the missingness mechanisms were investigated using various diagnostics, which, in aggregate, suggested an MCAR mechanism for the missingness of ECOG PS data, a MAR mechanism for missingness of race and disease site data, and an MNAR mechanism for socioeconomic status data. Detailed interpretation of missingness mechanisms can be found in the Supplemental Results and in Supplementary Table 1. As the missingness for socioeconomic data could deviate from a MAR mechanism, a not at random fully conditional specification (NARFCS) sensitivity analysis was performed, which showed that the hazard ratios for OS remained unaffected by this deviation from the MAR mechanism (Supplementary Fig. 1).
Supplementary sIPTW analyses on a patient cohort excluding patients with a TFI < 12 months associated with adjuvant AI (as opposed to any adjuvant ET) did not affect estimated hazard ratios of OS (see Supplemental Results). Quantitative bias analyses on the effect of menopausal status using the E-value computation procedure [36] and the array approach [37], using the palbociclib and ribociclib arms as a reference, found that menopausal status would need to have an implausibly strong association with death (on the risk ratio scale) to mask even a moderate effect on survival between the palbociclib and ribociclib arms (see Supplemental Results and Supplementary Fig. 2).
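For readers unfamiliar with the E-value, a minimal sketch of the standard formula is given below: for a risk ratio RR ≥ 1, the E-value is RR + sqrt(RR × (RR − 1)), and a protective ratio is inverted first. The details of the authors' bias analysis are in their supplement and are not reproduced here; the function name and the input value are illustrative assumptions, and treating a hazard ratio as a risk ratio is itself only an approximation.

```python
import math


def e_value(risk_ratio):
    """Minimum strength of association (risk ratio scale) that an unmeasured
    confounder would need with both treatment and outcome to fully explain
    away an observed risk ratio."""
    if risk_ratio <= 0:
        raise ValueError("risk ratio must be positive")
    if risk_ratio < 1:
        # Protective estimates are inverted so the formula applies symmetrically.
        risk_ratio = 1 / risk_ratio
    return risk_ratio + math.sqrt(risk_ratio * (risk_ratio - 1))


# Hypothetical "moderate" effect chosen only for illustration.
moderate_effect = 0.8
print(round(e_value(moderate_effect), 2))
```

A null estimate (ratio of 1) gives an E-value of 1, meaning no confounding is needed to explain it; the further the ratio is from 1, the stronger the hypothetical confounder must be.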
Discussion
To our knowledge, this is the first study investigating the comparative effectiveness of all three approved CDK4/6i + AI combinations using the target trial emulation approach. We found that the relative real-world OS outcomes were similar when palbociclib, ribociclib, or abemaciclib were combined with AI as first-line treatment.
Leveraging the target trial emulation approach ensured that the rigorous design principles of an RCT were applied to observational data [25] to provide a more robust estimate of a causal effect and bridge the gap between RCTs, which are resource-intensive, and real-world clinical practice. This approach ensures transparency regarding the assumptions by specifying the emulated trial’s components in a prespecified protocol. The results were consistent when data were analyzed using doubly robust methods, with multiple sensitivity analyses adding confidence to our findings.
In the target trial emulation framework, applying inclusion/exclusion criteria that best emulate a hypothetical trial is important to improve comparability between data collected from RCTs and real-world databases. In this study, clinically relevant characteristics, such as the proportion of patients with de novo metastatic disease (range 53–60% across arms) or visceral disease (47–49%), were consistent with those in the CDK4/6i registrational studies [4–6]. The median OS estimates in our study were consistent with those in PALOMA-2 [10], MONALEESA-2 [9], and MONARCH-3 [11], as demonstrated by our estimates falling within the 95% CIs reported in these studies.
Our results build upon previous observational studies that showed no difference in real-world OS between patients who received palbociclib, ribociclib, or abemaciclib when combined with AI [22, 24, 38, 39] or plus ET (either AI or SERD) [17, 19, 21, 23, 40], while also addressing key methodological limitations of these studies and of indirect treatment comparisons. Specifically, we corroborated and strengthened the conclusions of previous studies by addressing common limitations of observational studies through the principled study design prescribed by the target trial emulation framework and through rigorous sensitivity analyses, including those investigating the impact of missing data.
Use of the target trial emulation framework and the focus on the causal estimand are key strengths of this study. Our primary analysis adopted a doubly robust approach, which addresses measured confounding using both propensity score weighting and covariate-adjusted outcome regression and provides reliable estimates even if one of the two models is mis-specified. We performed multiple sensitivity analyses to evaluate the stability of our results to different approaches of handling missing data and additionally investigated the potential impact of unmeasured confounders (premenopausal status). Missing data imputation and unmeasured confounding have not been rigorously addressed in prior studies on the real-world effectiveness of CDK4/6i plus AI or ET. We utilized a previously validated dataset that is fit-for-purpose [31, 41]. Additionally, the US Flatiron Health mortality data have been validated against the gold standard mortality dataset, the US National Death Index [42]. To our knowledge, no prior studies focused on this question have utilized the approach taken in this study.
It is important to note that our study did not analyze data on types of prior treatments or surgical interventions for early BC, comorbidities, or types of treatments used for mBC in the second-line setting and beyond, each of which may have impacted OS outcomes. For example, we did not consider evidence of prior adjuvant chemotherapy in our analysis, as a previous study has shown that this covariate had a limited impact on metastatic survival [43], and the proportion of patients with evidence of prior chemotherapy was low across CDK4/6i groups. The time between mBC diagnosis and treatment initiation could differ across patients and treatment arms, especially if delays were influenced by clinical or systemic factors. If time-to-treatment is associated with prognosis, this could introduce bias. However, given the prolonged disease control typically observed with first-line CDK4/6i-based therapy, this bias is unlikely to meaningfully impact the reliability of the OS comparisons. Additionally, to limit this bias, we restricted variability in time-to-treatment by requiring all patients to have initiated first-line therapy < 90 days after their mBC diagnosis.
Follow-up times in our study were shorter than those in pivotal trials due to the higher attrition of patients who were lost to follow-up. Furthermore, we observed differences in follow-up times across the arms despite data collection starting after all three CDK4/6is were approved. This is likely due to palbociclib, the first approved CDK4/6i, becoming the established treatment choice. To address this timing imbalance and its effect on follow-up duration, we included a categorical variable in our weighting and adjusted models that captured when patients initiated their first-line treatment. In an additional sensitivity analysis restricted to patients who initiated first-line treatment between 2018 and 2020, follow-up times across CDK4/6i groups were comparable, and results did not alter the qualitative conclusions of this study (see Supplemental Results). However, some residual time-window bias may remain and warrants a subsequent study to improve the precision of the median real-world OS estimates.
Almost three-quarters of patients were from community oncology practices in the US, which aligns with the most recent US oncology practice census [44]; although the deidentified data used in this study represented a large sample of community oncology practices, the data may not have been representative of all oncology practices. As this study reports real-world OS outcomes in a US population only, the external validity of the study outside of the US may be lower. However, it is also appropriate to acknowledge that in the global PALOMA-2 [10] and MONALEESA-2 [9] registrational RCTs, prespecified exploratory analysis of OS by region did show a survival benefit that was consistent with that observed in the overall study population.
Conclusions
Using a target trial approach, we emulated hypothetical Phase 3 registrational RCTs for palbociclib, ribociclib, and abemaciclib, leveraging observational data with mature follow-up to address the causal question: Is there a difference in real-world OS between patients receiving first-line AI plus CDK4/6i in the real-world clinical setting? We found that real-world OS was similar when palbociclib, ribociclib, or abemaciclib was combined with AI as first-line treatment for patients with HR-positive/HER2-negative mBC, and findings were consistent when data were subjected to propensity weighting (sIPTW) and multiple sensitivity analyses that investigated potential confounders and the impact of missing data. As first-line treatment evolves, these findings may contribute to the ongoing development of combination strategies, including novel oral SERDs, to improve clinical outcomes in patients with HR-positive/HER2-negative mBC and to guide clinical decision-making.
Supplementary Information
Below is the link to the electronic supplementary material.
Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please cite the original article when quoting.