Multiple imputation for missing values in ordinal variables from cancer registry data when performing Cox proportional hazards regression.

Kästner A; Hoffmann W; Hüsing J; Stang A; Hüsing A

doi:10.1186/s12874-026-02790-8

BMC Medical Research Methodology 2026, Vol. 26(1)

Cite this paper

APA Kästner, A., Hoffmann, W., Hüsing, J., Stang, A., & Hüsing, A. (2026). Multiple imputation for missing values in ordinal variables from cancer registry data when performing Cox proportional hazards regression. BMC Medical Research Methodology, 26(1). https://doi.org/10.1186/s12874-026-02790-8

MLA Kästner, A., et al. "Multiple Imputation for Missing Values in Ordinal Variables from Cancer Registry Data When Performing Cox Proportional Hazards Regression." BMC Medical Research Methodology, vol. 26, no. 1, 2026.

PMID 41645114

DOI 10.1186/s12874-026-02790-8

Abstract

[BACKGROUND] Scientists working with cancer registry data are often confronted with large proportions of missing values in ordinal variables, such as tumor stage, grading or the general health status (ECOG-PS scored 0 to 5). Despite the long-standing issue, research on handling missing ordinal cancer registry data remains sparse.

[METHODS] A simulation study was conducted using complete lung cancer cases (2019–2022) from the North Rhine-Westphalia Cancer Registry. Missing values in ECOG-PS were generated under different missingness mechanisms (missing completely at random, MCAR; missing at random, MAR; missing not at random, MNAR), missingness proportions (10% to 50%) and sample sizes (n = 500, n = 1,000, n = 5,000). The missing values were then imputed using multivariate imputation by chained equations (MICE) with ordinal logistic regression (POLR), multinomial regression (POLYREG), predictive mean matching (PMM) or random forests (RF), and using the joint model (JM). Performance was assessed in terms of bias, mean squared error (MSE), width of the 95% confidence interval (CI) and coverage.
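The missingness-generation step can be illustrated with a minimal Python sketch (standard library only). It is not the authors' code: `make_missing` is a hypothetical helper, a fully observed `age` covariate is assumed to drive MAR missingness, and under MNAR higher ECOG-PS values are assumed to be more likely to go missing. The abstract does not describe the study's actual missingness models, and the ECOG-PS distribution used below is invented for illustration.

```python
import random


def make_missing(ecog, age, proportion, mechanism, rng):
    """Return a copy of `ecog` with round(proportion * n) values set to None.

    Hypothetical helper: `age` stands in for a fully observed covariate that
    drives MAR missingness. The study's actual missingness models may differ.
    """
    n = len(ecog)
    if mechanism == "MCAR":
        # Pure noise: every record has the same chance of being deleted.
        scores = [rng.random() for _ in range(n)]
    elif mechanism == "MAR":
        # Depends only on an observed covariate (older -> more often missing).
        scores = [a + rng.gauss(0, 5) for a in age]
    elif mechanism == "MNAR":
        # Depends on the ECOG-PS value itself (worse status -> more often missing).
        scores = [e + rng.gauss(0, 0.5) for e in ecog]
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    k = round(proportion * n)
    # Delete the k records with the highest scores.
    drop = set(sorted(range(n), key=scores.__getitem__, reverse=True)[:k])
    return [None if i in drop else e for i, e in enumerate(ecog)]


# Usage with synthetic data (hypothetical ECOG-PS 0-4 distribution, not registry data).
rng = random.Random(2026)
n = 1_000
age = [rng.gauss(70, 10) for _ in range(n)]
ecog = [rng.choices(range(5), weights=[30, 35, 20, 12, 3])[0] for _ in range(n)]
for mechanism in ("MCAR", "MAR", "MNAR"):
    ecog_missing = make_missing(ecog, age, 0.3, mechanism, rng)
    print(mechanism, sum(v is None for v in ecog_missing))
```

Deleting the k highest-scoring records fixes the missingness proportion exactly. A logistic selection model with a calibrated intercept is a common alternative when the proportion only needs to hold on average.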

[RESULTS] Severe bias, high MSE, wide 95% CIs and poor coverage were found in scenarios with sample sizes of n = 500 and n = 1,000, 30% or more missing data, and a low prevalence of ECOG-PS = 4. MICE with POLYREG maintained low bias across all scenarios with n = 5,000, while MICE with RF and PMM performed well with up to 30%-50% missing data. MICE with POLR and the JM yielded low bias with up to 10%-20% missing data. Compared with complete case analysis, the MI methods evaluated did not offer a systematic advantage in terms of bias or MSE.
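The four performance measures can be computed from the simulation replicates roughly as in the sketch below. It assumes each replicate yields one pooled Cox coefficient (for example, a log hazard ratio for one ECOG-PS category) with its 95% CI, and that the reference value is the estimate from the complete data before any values were deleted. The `Replicate` type, the choice of reference value and the example numbers are all hypothetical.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class Replicate:
    """One simulation replicate (hypothetical structure)."""
    estimate: float  # pooled coefficient, e.g. log hazard ratio for ECOG-PS = 4
    lower: float     # lower limit of the 95% CI
    upper: float     # upper limit of the 95% CI


def performance(replicates, true_value):
    """Bias, MSE, mean 95% CI width and coverage across replicates."""
    estimates = [r.estimate for r in replicates]
    return {
        "bias": fmean(estimates) - true_value,
        "mse": fmean((e - true_value) ** 2 for e in estimates),
        "ci_width": fmean(r.upper - r.lower for r in replicates),
        # Share of replicates whose CI contains the reference value.
        "coverage": fmean(
            1.0 if r.lower <= true_value <= r.upper else 0.0 for r in replicates
        ),
    }


# Usage with made-up replicates (not results from the study).
reps = [
    Replicate(0.52, 0.30, 0.74),
    Replicate(0.80, 0.58, 1.02),
    Replicate(0.35, 0.10, 0.60),
    Replicate(0.45, 0.20, 0.70),
]
print(performance(reps, true_value=0.50))
```

MSE combines squared bias and variance, so a method can have low bias yet a high MSE when its estimates scatter widely across replicates.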

[CONCLUSION] Sample size and the distribution of ordinal categories affect how well missing data can be handled in registry studies. Severe bias may be introduced when sample sizes are small and the prevalence of some categories is low, which points to finite-sample effects rather than systematic bias of the imputation methods. Among the MI methods applied, MICE with POLYREG performed best; however, further research is needed for time-to-event analyses and multivariate missingness patterns.
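A short calculation illustrates why small samples and low category prevalence interact. Assuming a hypothetical ECOG-PS = 4 prevalence of 3% (the registry's actual distribution is not given in the abstract) and MCAR deletion, the expected number of ECOG-PS = 4 records that remain observed is n × prevalence × (1 − missing proportion):

```python
# Hypothetical prevalence of ECOG-PS = 4; not taken from the registry data.
PREVALENCE_ECOG4 = 0.03


def expected_observed(n, prevalence, missing_proportion):
    """Expected number of records in one category that stay observed under MCAR."""
    return n * prevalence * (1.0 - missing_proportion)


for n in (500, 1_000, 5_000):
    row = [expected_observed(n, PREVALENCE_ECOG4, p) for p in (0.1, 0.3, 0.5)]
    print(n, [f"{x:.1f}" for x in row])
```

Under these assumptions, only about 7.5 ECOG-PS = 4 records remain observed at n = 500 with 50% missing data, compared with about 75 at n = 5,000. Under an MNAR mechanism in which patients with poor performance status are recorded less often, the number could be smaller still.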

[SUPPLEMENTARY INFORMATION] The online version contains supplementary material available at 10.1186/s12874-026-02790-8.

Highly cited papers by the same first author (1)

Concept and feasibility of privacy-preserving record linkage of cancer registry data and claims data in Germany: results from the DigiNet study on stage IV non-small cell lung cancer.
Journal of cancer research and clinical oncology 2025