An integrated random forest- and LASSO-derived nomogram for predicting postoperative nosocomial infections in colorectal cancer patients.

Lu R; Xue X; Chen T; Wang Y

doi:10.1186/s13741-026-00667-4

Perioperative medicine (London, England) 2026 Vol.15(1)

Free full text (PMC): PMC13041146

PMID 41749269

Abstract

[OBJECTIVE] We sought to delineate the independent risk factors underlying postoperative nosocomial infections in colorectal cancer patients and to construct and validate a nomogram for individualized risk prediction, thereby enabling early clinical identification of high-risk individuals and the implementation of targeted preventive strategies.

[METHODS] We conducted a retrospective cohort study including 1,760 patients who underwent colorectal cancer surgery between 2020 and 2024. Postoperative nosocomial infection was defined as any hospital-acquired infection occurring within 30 days after surgery, including lower respiratory tract infection, surgical-site infection, multiple-site infections, and other-site infections. Patients admitted in 2020–2022 comprised the training cohort (n = 1,146), and those admitted in 2023–2024 served as a temporal validation cohort (n = 614). Univariable analyses were performed to screen candidate predictors. Predictor importance was ranked using a random forest model, and least absolute shrinkage and selection operator (LASSO) regression with 10-fold cross-validation was applied to reduce overfitting and multicollinearity. Predictors retained after feature selection were entered into multivariable logistic regression to construct a nomogram. Model performance was evaluated by discrimination (AUC), calibration plots, and decision-curve analysis (DCA).

[RESULTS] Postoperative hospital-acquired infection occurred in 166/1,760 (9.43%) patients. Lower respiratory tract infection was the most common subtype, followed by surgical-site infection. LASSO retained eight predictors with non-zero coefficients, and multivariable logistic regression confirmed that age, coronary heart disease, perioperative blood transfusion, colostomy, surgical approach, ASA grade, persistent fever for 3 consecutive days, and postoperative complications were independently associated with postoperative hospital-acquired infection. The nomogram showed good discrimination, with an AUC of 0.860 in the training cohort and 0.843 in the validation cohort. Calibration was good in the training cohort and acceptable in the validation cohort, with modest deviations at lower predicted probabilities. DCA demonstrated a positive net benefit across a broad range of clinically relevant threshold probabilities in both cohorts.

[CONCLUSIONS] Our nomogram enables precise stratification of colorectal cancer patients by their postoperative infection risk, highlighting perioperative factors—such as operative duration, surgical approach, and ASA grade—that warrant targeted management. Future prospective, multicentre validation will be essential to refine and generalize the model’s applicability.


Introduction
Colorectal cancer remains a leading cause of cancer-related morbidity and mortality worldwide (Zhang et al. 2024; Davidson et al. 2021). Despite advances in multimodal treatment, including surgery, that have raised 5-year survival rates beyond 75% (Cuomo 2024), postoperative infections persist as a formidable complication, with reported incidence rates of 12.3%–19.7% (National Health Commission Medical Administration, Chinese Society of Oncology 2023). These infections not only prolong hospitalization and inflate costs by 40%–60% (Liu et al. 2021) but also nearly triple 30-day postoperative mortality (Deery et al. 2020). Consequently, there is a critical need for reliable predictive and preventive strategies.
Risk prediction models lie at the heart of precision medicine, bridging biomedical engineering and clinical practice. Among these, nomograms have gained traction for their clear visualization and ease of clinical integration, particularly in infection control (Gu et al. 2025; Huang et al. 2022). Yet the vast array of potential predictors, spanning patient demographics, laboratory values, and operative parameters, poses a significant challenge to variable selection. To address this, we present a novel framework that sequentially applies random forest and least absolute shrinkage and selection operator (LASSO) regression for feature selection, coupled with temporal validation. This approach not only aligns with clinical exigencies but also enhances the methodological rigor and applicability of postoperative infection risk models in colorectal cancer.

Subjects and methods

Study population
We conducted a retrospective cohort study of patients who underwent colorectal cancer surgery at a tertiary Grade A (Class 3A) hospital in Shandong Province, China, between 1 January 2020 and 31 December 2024. A total of 1,760 eligible patients were included. Eligibility criteria were: age ≥ 18 years; hospital stay ≥ 48 h; histopathological confirmation of colorectal cancer; and availability of medical records required for analysis. All included patients underwent primary tumor resection with curative intent (colectomy/proctectomy), performed via a laparoscopic or open approach. Patients were excluded if they received palliative surgery, had evidence of infection at admission or before surgery (community-acquired or hospital-acquired), or were discharged against medical advice. A pilot audit estimated a postoperative hospital-acquired infection rate of approximately 10% (φ ≈ 0.10); with an allowable absolute error of 0.03, the minimum required sample size was calculated as 384, and the final sample exceeded this requirement. The study protocol was approved by the institutional ethics committee (Approval No. 2021-R-110).
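The sample-size figure above can be reproduced with the usual single-proportion formula n = z² · p(1 − p) / d². The sketch below is illustrative only: the text does not state the confidence level, so z = 1.96 (two-sided 95%) is an assumption, and the function name is hypothetical.

```python
import math


def sample_size_for_proportion(p, d, z=1.96):
    """Minimum n to estimate a proportion p within absolute error d.

    z = 1.96 (two-sided 95% confidence) is an assumption; the source
    text gives only p (about 0.10) and d (0.03).
    """
    return z ** 2 * p * (1 - p) / d ** 2


n_raw = sample_size_for_proportion(p=0.10, d=0.03)
print(round(n_raw, 2))   # 384.16
print(math.ceil(n_raw))  # 385
```

Under these assumptions the raw value is about 384.16, which matches the reported 384 when rounded to the nearest integer; rounding up, as is often done, gives 385. Either way the 1,760 included patients exceed the requirement.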

Study methods

Study design
This retrospective study used data from 2020 to 2022 for model development and data from 2023 to 2024 as a temporal validation cohort to evaluate model performance under real-world temporal variability. The study was reported in accordance with the TRIPOD guidelines.
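A temporal split of this kind assigns each patient to a cohort by calendar period rather than at random. The following is a minimal sketch, assuming the admission date is the splitting variable (as the abstract describes); the function name and the example dates are hypothetical.

```python
from datetime import date


def assign_cohort(admission_date):
    """Map an admission date to the development or temporal validation cohort."""
    year = admission_date.year
    if 2020 <= year <= 2022:
        return "training"
    if 2023 <= year <= 2024:
        return "validation"
    raise ValueError(f"admission date {admission_date} is outside the study period")


# Hypothetical admission dates, for illustration only.
examples = [date(2020, 1, 1), date(2022, 12, 31), date(2023, 1, 1), date(2024, 12, 31)]
for d in examples:
    print(d.isoformat(), assign_cohort(d))
```

Because the validation patients are all later in time than the training patients, the check reflects how the model would behave on future admissions at the same institution.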

Survey methods
We used a study-specific case report form—the “Targeted Surveillance Form for Postoperative Hospital-Acquired Infections in Colorectal Cancer”—to standardize data capture. This form was a structured data-abstraction instrument developed by the infection-control and surgical teams for routine postoperative infection surveillance. It was used to ensure uniform extraction and coding of perioperative variables and infection outcomes from the hospital infection surveillance platform and electronic health records, thereby improving data completeness, consistency, and auditability. The form captured demographic characteristics, baseline nursing assessments, operative details, perioperative treatments, ancillary test results, and the status of postoperative hospital-acquired infection.

Covariables
Covariables were retrospectively extracted from the electronic health record and the hospital infection surveillance platform using the study-specific surveillance form. Available predictors included demographics and anthropometrics (sex, age, BMI), comorbidities and history (hypertension, diabetes, coronary heart disease, smoking history, alcohol history, prior abdominal surgery, and other cancer history), tumor-related features (pathological type, differentiation degree, and lymph node metastasis), operative/perioperative factors (surgical approach, operative time, ASA grade, NNIS score, colostomy, perioperative blood transfusion, wound drainage, and abdominal drainage duration), and early postoperative indicators (postoperative complications and persistent fever for ≥ 3 consecutive days). Age, BMI, and operative time were analyzed as continuous variables. For categorical analyses, ASA grade was grouped as < III versus ≥ III, NNIS score as < 2 versus ≥ 2, and abdominal drainage duration as < 4 versus ≥ 4 days. Persistent fever for ≥ 3 consecutive days was determined from postoperative temperature records, and postoperative complications were defined as any complication documented during the index hospitalization.

Data processing
To ensure completeness and accuracy, we implemented dual independent data entry with operator blinding. GCP-certified researchers abstracted baseline data strictly according to prespecified inclusion and exclusion criteria. Two investigators entered all variables independently and cross-validated key fields for concordance; discrepancies were adjudicated by a third reviewer against the source medical records. Logical and range checks were conducted in SPSS (version 26.0) to verify data integrity and correctness of variable coding.

Diagnostic criteria for hospital-acquired infections
Hospital-acquired infection was ascertained according to the Diagnostic Criteria for Nosocomial Infections (trial version, 2001) promulgated by the Ministry of Health of the People’s Republic of China (Ministry of Health of the People’s Republic of China 2001). For this study, “postoperative nosocomial infection” referred to the first episode of nosocomial infection occurring after the index colorectal cancer surgery, within 30 days postoperatively (postoperative day 1–30). Infections with evidence at admission or before surgery (community-acquired or preoperative infections) were excluded, ensuring that the outcome represented newly developed infections during the postoperative period. Case adjudication integrated clinical features, microbiological culture results, imaging evidence, and serial laboratory indices. All suspected cases were reviewed by a multidisciplinary panel comprising gastrointestinal clinicians and hospital infection-control specialists. Infections included lower respiratory tract infection, surgical-site infection, multiple-site infection (infections involving ≥ 2 anatomical sites), and other-site nosocomial infections as classified and recorded in the hospital infection surveillance platform according to the national diagnostic criteria.

Statistical analysis
Statistical analyses were performed using SPSS (version 26.0) and R (RStudio). Categorical variables are summarized as counts and percentages. Continuous variables with approximate normality are reported as mean ± s.d.; non-normally distributed data are presented as median (interquartile range). Between-group comparisons used χ² tests for categorical variables and independent-samples t-tests or Mann–Whitney U tests for continuous variables, as appropriate. All tests were two-sided, and P < 0.05 was considered statistically significant.
Candidate predictors were screened using univariable analyses. A random forest model was used to rank variable importance (mean decrease in Gini), and LASSO logistic regression (binomial) with 10-fold cross-validation was applied for feature selection; the optimal penalty parameter (λ) was chosen at the minimum cross-validated error (λ_min). Predictors retained by LASSO were entered into a multivariable logistic regression to develop the nomogram; statistical significance was assessed using two-sided Wald tests (P < 0.05) with odds ratios. Model performance was evaluated in both cohorts by AUC, calibration plots, and decision-curve analysis.
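For reference, LASSO logistic regression is commonly written as the penalized likelihood problem below. This is the standard textbook formulation, not an equation printed in the paper; the symbols (n patients, outcome y_i, predictor vector x_i, intercept β_0, coefficients β_j) are introduced here for illustration.

```latex
\hat{\beta}(\lambda) \;=\; \arg\min_{\beta_0,\,\beta}
\left\{
  -\frac{1}{n}\sum_{i=1}^{n}
  \Bigl[\, y_i\,(\beta_0 + x_i^{\top}\beta)
        - \log\!\bigl(1 + e^{\beta_0 + x_i^{\top}\beta}\bigr) \Bigr]
  \;+\; \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
\right\}
```

Larger values of λ shrink more coefficients exactly to zero, which is what performs the feature selection. λ_min is the value with the lowest cross-validated error, while λ_1se is the largest λ whose error lies within one standard error of that minimum and therefore yields a sparser model.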

Results

Incidence of postoperative nosocomial infection following colorectal cancer surgery
We analyzed 1,760 patients who underwent colorectal cancer surgery, with 1,146 in the training cohort and 614 in the validation cohort. The overall incidence of postoperative hospital-acquired infection was 9.43%. Lower respiratory tract infections were the most frequent, followed by surgical-site infections (Table 1).

Univariable analysis of risk factors for postoperative nosocomial infection in the training cohort
In the training cohort, univariable analyses showed that postoperative hospital-acquired infection was significantly associated with age, hypertension, coronary heart disease, history of other malignancies, surgical approach, NNIS score, ASA class, perioperative blood transfusion, postoperative complications, colostomy, wound drainage, duration of abdominal drainage, persistent fever for ≥ 3 consecutive days, and operative time (Table 2).

Feature selection

Random forest–derived ranking of predictor importance
The 16 candidate predictors identified on univariable analysis were entered into a random forest to quantify variable importance. As shown in Fig. 1, the five most informative predictors were age, operative time, postoperative complications, surgical approach, and persistent fever for ≥ 3 consecutive days.

LASSO-based feature selection
To mitigate multicollinearity and reduce overfitting, the 16 candidate predictors were further subjected to LASSO regression. The LASSO coefficient profiles are shown in Fig. 2, illustrating progressive shrinkage of regression coefficients as the penalty parameter increased. Using the optimal penalty determined by 10-fold cross-validation, eight predictors with non-zero coefficients were retained, namely age, coronary heart disease, perioperative blood transfusion, colostomy, surgical approach, ASA grade, persistent fever for 3 consecutive days, and postoperative complications. The cross-validation performance across log(λ) is presented in Fig. 3, where λ_min (and the more parsimonious λ_1se) are indicated. In line with the random forest–derived importance ranking (Fig. 1), age, operative time, postoperative complications, surgical approach, and persistent fever for 3 consecutive days were among the most informative predictors. The retained predictors were subsequently carried forward for model development.

Independent predictors identified by multivariable analysis
The predictors retained after feature selection were entered into a multivariable logistic regression model. After adjustment, age, coronary heart disease, perioperative blood transfusion, colostomy, surgical approach, ASA grade, persistent fever for 3 consecutive days, and postoperative complications remained independently associated with postoperative hospital-acquired infection. Detailed regression estimates are presented in Table 3.

Development and validation of a nomogram predicting postoperative nosocomial infection after colorectal cancer surgery

Construction of the nomogram prediction model
Figure 4 shows the nomogram derived from the multivariable logistic model to estimate the probability of postoperative hospital-acquired infection after colorectal cancer surgery. For each predictor, locate the patient’s value to assign a score on the “Points” scale; summing these yields a “Total Points” score, which is then mapped—by projecting vertically—to the corresponding predicted probability of infection.
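The arithmetic behind a nomogram of this kind can be sketched in a few lines: each predictor's contribution to the log-odds is rescaled to a points axis, and the summed log-odds is passed through the logistic function. Every number below (intercept, coefficients, value ranges) is hypothetical and chosen only to make the example run; the fitted estimates are those in Table 3, which is not reproduced here. The sketch also assumes all coefficients are positive, so that the minimum value of each predictor scores 0 points.

```python
import math

# HYPOTHETICAL values for illustration only; NOT the fitted estimates.
INTERCEPT = -5.0
# predictor name -> (log-odds coefficient, minimum value, maximum value)
MODEL = {
    "age": (0.03, 18, 90),
    "coronary_heart_disease": (0.7, 0, 1),
    "blood_transfusion": (0.9, 0, 1),
    "colostomy": (0.6, 0, 1),
    "open_surgery": (0.8, 0, 1),
    "asa_ge_3": (0.7, 0, 1),
    "fever_ge_3_days": (1.2, 0, 1),
    "postoperative_complication": (1.5, 0, 1),
}


def predicted_probability(patient):
    """Logistic transform of the linear predictor."""
    lp = INTERCEPT + sum(coef * patient[name] for name, (coef, _lo, _hi) in MODEL.items())
    return 1.0 / (1.0 + math.exp(-lp))


def nomogram_points(patient):
    """Rescale each contribution so the widest log-odds span equals 100 points."""
    max_span = max(coef * (hi - lo) for coef, lo, hi in MODEL.values())
    return {
        name: 100.0 * coef * (patient[name] - lo) / max_span
        for name, (coef, lo, _hi) in MODEL.items()
    }


# Two hypothetical patients: all predictors at their minimum, then a higher-risk profile.
low_risk = {name: lo for name, (_coef, lo, _hi) in MODEL.items()}
high_risk = dict(low_risk, age=75, open_surgery=1, postoperative_complication=1)

for label, patient in (("low", low_risk), ("high", high_risk)):
    total = sum(nomogram_points(patient).values())
    print(label, round(total, 1), round(predicted_probability(patient), 3))
```

Total points and predicted probability are monotonically related, which is why the printed nomogram can map one onto the other with a single projected axis.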

Model validation
We assessed discrimination, calibration, and clinical utility in both the training and validation cohorts using receiver-operating-characteristic (ROC) analysis, calibration curves, and DCA. The model showed good discriminatory performance, with an AUC of 0.860 in the training cohort and 0.843 in the validation cohort (Figs. 5 and 6), indicating effective discrimination between patients with and without postoperative nosocomial infection after colorectal cancer surgery.
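The AUC has a direct interpretation: it is the probability that a randomly chosen infected patient receives a higher predicted risk than a randomly chosen uninfected patient. The sketch below computes it by pairwise comparison on a small made-up data set; the function name and the data are illustrative and unrelated to the study cohorts.

```python
def auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 2 infected and 2 uninfected patients.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Read this way, an AUC of 0.843 in the validation cohort means that in roughly 84% of such pairs the infected patient was assigned the higher predicted risk.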

Calibration plots for the training and validation cohorts are presented in Figs. 7 and 8. In the training cohort, the calibration curve closely approximated the 45° reference line, indicating good agreement between predicted and observed risks. In the validation cohort, calibration remained acceptable overall, with modest departures from the ideal line at the lower range of predicted probabilities, suggesting generally reliable risk estimation across cohorts.
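A simple way to see what a calibration plot conveys is to group patients by predicted risk and compare the mean prediction with the observed event rate in each group. The sketch below uses equal-width bins and made-up data; it is an assumption-laden simplification, since the paper does not say how its calibration curves were constructed (smoothed or bootstrap-corrected curves are common alternatives).

```python
def calibration_table(y_true, y_prob, n_bins=5):
    """Return (bin_low, bin_high, n, mean_predicted, observed_rate) for each non-empty bin."""
    rows = []
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # The last bin is closed on the right so that p == 1.0 is not dropped.
        idx = [
            i for i, p in enumerate(y_prob)
            if lo <= p < hi or (b == n_bins - 1 and p == hi)
        ]
        if not idx:
            continue
        mean_pred = sum(y_prob[i] for i in idx) / len(idx)
        observed = sum(y_true[i] for i in idx) / len(idx)
        rows.append((lo, hi, len(idx), mean_pred, observed))
    return rows


# Made-up predictions and outcomes, for illustration only.
y_prob = [0.05, 0.10, 0.15, 0.55, 0.60, 0.65, 0.90, 0.95]
y_true = [0, 0, 1, 0, 1, 1, 1, 1]
for row in calibration_table(y_true, y_prob, n_bins=2):
    print(row)
```

Points where the observed rate sits close to the mean prediction lie near the 45° line; a gap in the low-risk bins corresponds to the kind of deviation described for the validation cohort.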

Decision-curve analysis for the training and validation cohorts is shown in Figs. 9 and 10. The x-axis represents the threshold probability for postoperative nosocomial infection, and the y-axis indicates the net benefit. In both cohorts, the model yielded a positive net benefit across a broad range of clinically relevant threshold probabilities and consistently outperformed the default “treat-none” strategy (net benefit = 0), supporting its potential clinical utility for risk-stratified decision-making.
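Net benefit at a threshold probability p_t is conventionally defined as TP/n − (FP/n) · p_t / (1 − p_t), where TP and FP count the patients flagged as high risk who did and did not develop infection. The sketch below implements this standard definition together with the "treat-all" reference strategy; the function names and the ten-patient data set are illustrative only.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of treating every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Made-up outcomes and predicted risks for ten patients.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
y_prob = [0.02, 0.03, 0.05, 0.05, 0.08, 0.10, 0.12, 0.30, 0.35, 0.60]

print(net_benefit(y_true, y_prob, 0.2))        # approximately 0.175
print(net_benefit_treat_all(y_true, 0.2))      # approximately 0.0
```

The "treat-none" strategy always has a net benefit of 0, so a model curve above both reference lines over the thresholds of interest is what supports a claim of clinical utility.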

Discussion
We applied a sequential random forest–LASSO feature selection strategy to delineate risk factors for hospital-acquired infection following colorectal cancer surgery. Using these determinants, we constructed a nomogram and performed temporal external validation. Model performance was appraised across discrimination, calibration, and clinical utility.
In our cohort, the incidence of postoperative hospital-acquired infection (HAI) was 9.43% (166/1,760), lower than the 21.0% reported by Xu et al. (2019) and the 15.62% reported by Wang et al. (2021). Several factors may account for these discrepancies: regional differences in pathogen ecology and transmission; variation in institutional surgical techniques and infection-prevention standards; heterogeneity in study design (patient selection, data capture and outcome definitions); temporal changes in medical practice and preventive measures; and differences in sample size affecting precision and representativeness. Notwithstanding the lower point estimate, postoperative HAIs after colorectal cancer surgery remain a substantial clinical burden. Rigorous adherence to prevention and control protocols, together with strengthened postoperative surveillance and risk assessment, is therefore warranted.
Among the independent predictors of postoperative hospital-acquired infection, operative duration was the most influential and a robust predictor of risk. This aligns with prior reports (He et al. 2020; Li and Yan 2024; Li et al. 2020), despite the absence of a standardized definition of operative time across studies. Contemporary evidence indicates that each additional 30 min of surgery increases the odds of postoperative infectious complications by 24% (OR = 1.24, 95% CI 1.17–1.31) (Bludevich et al. 2021). Postoperative complications were likewise independently associated with infection risk, a relationship corroborated in multicentre analyses (Matsuda et al. 2023). In particular, anastomotic leakage and pulmonary complications are strong harbingers of subsequent infection (Gao et al. 2024; Dhanasekara et al. 2023; Jin et al. 2015).
Surgical approach has emerged as an important determinant of postoperative infection risk: laparoscopic procedures, relative to open surgery, are associated with less peritoneal trauma and a blunted systemic inflammatory response (Cuk et al. 2021). The ASA classification is likewise a strong predictor of risk (Tserenpuntsag et al. 2023). In a French multicentre cohort, ASA ≥ 3 independently predicted infection, with patients classified as ASA III having 3.7-fold higher odds of postoperative infection than those classified as ASA I (OR = 3.7, 95% CI 1.5–11.1) (Tresson et al. 2023), consistent with our findings. Mechanistically, coronary artery disease (CAD) plausibly augments infection susceptibility: atherosclerosis represents a chronic inflammatory state, with circulating mediators (e.g., IL-6) that may disrupt the local wound milieu (Libby 2021). Experimental evidence supports this pathway: coronary artery ligation in rats induced myocardial ischemia and delayed cutaneous wound healing by approximately 30%, implicating systemic hypoxia from coronary stenosis and cytokine signalling in impaired tissue repair (Su et al. 2018).
Postoperative wound drainage facilitates evacuation of fluid and blood and may support healing, but it also carries risk. As an indwelling foreign body, a drain can breach local barriers, foster bacterial colonization, and increase the likelihood of nosocomial infection (Pennington et al. 2019). Accordingly, judicious selection of drainage indication and device, minimization of dwell time, and rigorous nursing management are essential to mitigate infection risk and improve recovery.
Sustained fever is a salient sentinel of infection: both its magnitude and duration correlate with severity. Prolonged pyrexia (≥ 38 °C for > 3 days) or high fever (≥ 39 °C) is strongly associated with postoperative infection; patients with prolonged fever have 2.3-fold higher odds of infection than those with isolated febrile episodes (OR = 2.3, 95% CI 1.5–3.5) (Lu et al. 2015). Without timely intervention, infection can disseminate, precipitating serious complications and, in extreme cases, endangering life.
Compared with contemporary studies, our model is distinguished by its two-stage variable selection. The random forest–LASSO cascade, rather than single-step selection, enables comprehensive identification of salient predictors, mitigates collinearity and overfitting, and yields more stable estimates with stronger out-of-sample performance. For deployment, the nomogram is intentionally simple and transparent: clinicians can sum item scores from routine clinical variables to obtain an individualized infection risk, facilitating point-of-care decision-making. Performance was consistently robust in both the derivation and validation cohorts, with satisfactory discrimination, close calibration, and favourable clinical utility as demonstrated by ROC, calibration, and decision curves.
The model demonstrated strong discrimination, with areas under the ROC curve of 0.860 in the training cohort and 0.843 in the validation cohort, comparing favourably with prior reports. Nonetheless, several limitations warrant consideration. First, the retrospective design may introduce selection and information bias. Second, the cohort was drawn from a single regional tertiary hospital, which may constrain generalisability. Prospective, multicentre studies with larger and more diverse populations are needed to confirm external validity and transportability. Finally, although we identified multiple independent predictors, additional unmeasured or unrecorded factors may influence postoperative infection risk in colorectal cancer and merit further study (Wang 2024; Zhang et al. 2024; Chen et al. 2025).
In summary, the nomogram developed in this study demonstrates strong predictive performance for assessing postoperative infection risk in colorectal cancer patients. While this tool has great potential in clinical practice, several limitations must be considered. First, the retrospective design of the study introduces the possibility of selection and information biases. Although stringent inclusion criteria and data validation procedures were implemented to minimize these biases, the design cannot fully eliminate all potential confounders. Second, our cohort was derived from a single regional tertiary hospital, which may limit the external validity of the findings. The patient characteristics and infection patterns observed at this institution could differ from those in other settings, particularly those in different geographic regions or with varying healthcare resources. To enhance the generalizability of our model, future studies should consider multicenter and more diverse patient populations. Third, while we identified several key predictors, other unmeasured or unrecorded factors, such as nutritional status, microbiome composition, and specific pathogen profiles, may also influence infection risk and were not included in the model. Finally, the reliance on electronic health records and infection surveillance systems may introduce issues of data accuracy, particularly in cases of underreporting or misclassification of infections. To refine the model’s accuracy and validate its applicability, prospective, multicenter studies will be essential. Addressing these limitations is crucial to ensuring the effectiveness of this tool across diverse clinical settings. Additionally, the study period spanned the COVID-19 pandemic, which may have impacted hospital practices, infection control measures, and patient outcomes. Changes in hospital operations and staffing during the pandemic could have affected infection rates. 
However, the inclusion of 2023–2024 data allows us to assess trends after the pandemic’s peak and better account for these shifts in hospital practices.
The nomogram developed in this study holds significant potential to enhance clinical practice by offering a reliable, personalized risk stratification model for postoperative infections. In real-world clinical settings, this tool can be used during the perioperative period to identify patients at high risk for nosocomial infections, enabling the implementation of targeted preventive strategies. For example, patients identified as high risk could benefit from closer monitoring, early interventions such as prophylactic antibiotics, or more stringent infection control measures, which could lead to improved patient outcomes and reduced healthcare costs. Moreover, the nomogram’s simplicity makes it easily integrated into routine clinical workflows, allowing clinicians to quickly assess infection risk without the need for complex software or extensive data processing. By predicting infection risk with high accuracy, this tool can help inform clinical decisions, optimize resource allocation, and enhance care in both high- and low-resource settings.
