Adjuvant nivolumab vs pembrolizumab in stage IIB/IIC melanoma: a reinforcement learning-based simulation study.
PICO automatic extraction (heuristic, conf 3/4)
P · Population (target patients / population)
Not extracted
I · Intervention (intervention / procedure)
Adjuvant nivolumab
C · Comparison (control / comparator)
pembrolizumab in stage IIB/IIC melanoma
O · Outcome (results / conclusion)
[CONCLUSION] Our RL framework complements existing comparative methods by making treatment trade-offs explicit and scenario-dependent. Rather than declaring a universal "best" PD-1 inhibitor, the model contextualizes efficacy-toxicity balances, supporting transparent decision-making in settings where small absolute differences may meaningfully influence patient and clinician preferences.
APA
Perkin P, Köş FT (2026). Adjuvant nivolumab vs pembrolizumab in stage IIB/IIC melanoma: a reinforcement learning-based simulation study. Clinical & translational oncology : official publication of the Federation of Spanish Oncology Societies and of the National Cancer Institute of Mexico, 28(4), 1422-1430. https://doi.org/10.1007/s12094-025-04090-x
MLA
Perkin P, et al. "Adjuvant nivolumab vs pembrolizumab in stage IIB/IIC melanoma: a reinforcement learning-based simulation study." Clinical & translational oncology : official publication of the Federation of Spanish Oncology Societies and of the National Cancer Institute of Mexico, vol. 28, no. 4, 2026, pp. 1422-1430.
PMID
41182652
Abstract
[BACKGROUND] Adjuvant programmed cell death protein 1 (PD-1) inhibitors (nivolumab, pembrolizumab) improve recurrence-free survival (RFS) in stage IIB-IIC melanoma, yet no head-to-head trial directly compares them. Traditional indirect methods estimate relative efficacy but often fail to integrate toxicity and patient-level trade-offs. Reinforcement learning (RL) provides a framework to simulate decision-making under uncertainty and competing clinical priorities.
[METHODS] We developed an RL model treating each simulated patient as the environment, with state variables including age, ECOG status, stage, time-to-recurrence, and adverse event (AE) outcomes. Actions were treatment choices between nivolumab and pembrolizumab. Rewards combined gains in RFS (+1 per 2 months) with penalties for grade 3-4 AEs and discontinuations, incorporating both raw and placebo-adjusted AE rates. Q-learning was iterated across 1000 virtual trial episodes until policy convergence.
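The reward and update rule described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' implementation. Only the +1 per 2 months of RFS, the two actions, and the 1000 episodes come from the abstract. The penalty sizes, the numbers in `PLACEHOLDER_ARMS`, the state discretization, the exponential RFS model, and the one-step episode (which reduces the Q-learning target to the immediate reward) are assumptions made for the example.

```python
import random
from collections import defaultdict

ACTIONS = ("nivolumab", "pembrolizumab")

# Hypothetical placeholder inputs for illustration only, NOT values from the study.
PLACEHOLDER_ARMS = {
    "nivolumab": {"mean_rfs_months": 30.0, "p_grade34_ae": 0.10, "p_discontinue": 0.15},
    "pembrolizumab": {"mean_rfs_months": 32.0, "p_grade34_ae": 0.16, "p_discontinue": 0.17},
}


def reward(rfs_months, grade34_ae, discontinued, ae_penalty=5.0, disc_penalty=3.0):
    """+1 per 2 months of RFS (from the abstract); penalty sizes are assumed."""
    r = rfs_months / 2.0
    if grade34_ae:
        r -= ae_penalty
    if discontinued:
        r -= disc_penalty
    return r


def sample_state(rng):
    """Assumed coarse state: (age band, ECOG status, stage)."""
    return (rng.choice(("<65", ">=65")), rng.choice((0, 1)), rng.choice(("IIB", "IIC")))


def simulate_outcome(action, arms, rng):
    """Assumed outcome model: exponential RFS time, independent Bernoulli AE events."""
    arm = arms[action]
    rfs_months = rng.expovariate(1.0 / arm["mean_rfs_months"])
    grade34_ae = rng.random() < arm["p_grade34_ae"]
    discontinued = rng.random() < arm["p_discontinue"]
    return rfs_months, grade34_ae, discontinued


def q_learning(arms, episodes=1000, alpha=0.1, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration over one-step episodes."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        state = sample_state(rng)
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
        r = reward(*simulate_outcome(action, arms, rng))
        # No successor state in a one-step episode, so the target is just r.
        q[(state, action)] += alpha * (r - q[(state, action)])
    return dict(q)
```

Calling `q_learning(PLACEHOLDER_ARMS)` returns a table keyed by `(state, action)`; the greedy action per state is the learned policy. A multi-step version would add the discounted maximum over next-state values to the target.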
[RESULTS] The RL-derived policies reflected conditional treatment preferences rather than a single optimal agent. In scenarios weighted toward tolerability, nivolumab was favored due to lower grade 3-4 AE and discontinuation rates. When incremental RFS gains were prioritized, pembrolizumab emerged as the preferred option. Placebo-adjusted versus raw AE modeling materially influenced the balance of preferences, underscoring the importance of attribution in comparative safety assessment.
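To illustrate how scenario weights and AE attribution can each change which agent is preferred, the sketch below compares expected rewards directly. It is a toy calculation under stated assumptions: the `Arm` fields, every number in `HYPOTHETICAL_ARMS`, the weights `w_rfs` and `w_tox`, and the penalty sizes are invented for the example and are not taken from the study or from any trial.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arm:
    mean_rfs_months: float
    ae_rate: float            # raw grade 3-4 AE rate on treatment
    disc_rate: float          # raw discontinuation rate on treatment
    placebo_ae_rate: float    # same endpoints in that trial's placebo arm
    placebo_disc_rate: float


# Hypothetical placeholder inputs for illustration only, NOT study or trial values.
HYPOTHETICAL_ARMS = {
    "nivolumab": Arm(30.0, 0.10, 0.15, 0.04, 0.03),
    "pembrolizumab": Arm(32.0, 0.16, 0.17, 0.09, 0.05),
}


def expected_reward(arm, w_rfs, w_tox, placebo_adjusted,
                    ae_penalty=5.0, disc_penalty=3.0):
    """Weighted RFS gain (+1 per 2 months) minus weighted toxicity penalties."""
    ae, disc = arm.ae_rate, arm.disc_rate
    if placebo_adjusted:
        # Attribute to the drug only the excess over the placebo arm.
        ae = max(ae - arm.placebo_ae_rate, 0.0)
        disc = max(disc - arm.placebo_disc_rate, 0.0)
    toxicity = ae_penalty * ae + disc_penalty * disc
    return w_rfs * arm.mean_rfs_months / 2.0 - w_tox * toxicity


def preferred(arms, w_rfs, w_tox, placebo_adjusted):
    """Name of the arm with the highest expected reward in this scenario."""
    return max(
        arms,
        key=lambda name: expected_reward(arms[name], w_rfs, w_tox, placebo_adjusted),
    )


for label, w_tox, adjusted in [
    ("efficacy-weighted, raw AE", 1.0, False),
    ("tolerability-weighted, raw AE", 5.0, False),
    ("tolerability-weighted, placebo-adjusted AE", 5.0, True),
]:
    print(label, "->", preferred(HYPOTHETICAL_ARMS, 1.0, w_tox, adjusted))
```

With these placeholder inputs, raising `w_tox` from 1 to 5 moves the preferred arm from pembrolizumab to nivolumab under raw rates, and switching to placebo-adjusted rates at the same weights moves it back. The direction of that last shift is a property of the made-up numbers, not a finding of the study.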
[CONCLUSION] Our RL framework complements existing comparative methods by making treatment trade-offs explicit and scenario-dependent. Rather than declaring a universal "best" PD-1 inhibitor, the model contextualizes efficacy-toxicity balances, supporting transparent decision-making in settings where small absolute differences may meaningfully influence patient and clinician preferences.
MeSH Terms
Nivolumab; Humans; Antibodies, Monoclonal, Humanized; Melanoma; Antineoplastic Agents, Immunological; Neoplasm Staging; Chemotherapy, Adjuvant; Computer Simulation; Skin Neoplasms; Reinforcement, Psychology; Immune Checkpoint Inhibitors