Multi-Patient Vision Transformer for Markerless Tumor Motion Forecasting.
APA
Rotsart de Hertaing G, Manjah D, Macq B (2026). Multi-Patient Vision Transformer for Markerless Tumor Motion Forecasting. Biomedicines, 14(3). https://doi.org/10.3390/biomedicines14030496
MLA
Rotsart de Hertaing G, et al. "Multi-Patient Vision Transformer for Markerless Tumor Motion Forecasting." Biomedicines, vol. 14, no. 3, 2026.
PMID
41898143
Abstract
Accurate forecasting of lung tumor motion is crucial for precise radiotherapy. Deep-learning-based markerless tracking methods have been explored, but extending them to predict future tumor trajectories remains largely unaddressed. We address this gap by framing markerless lung tumor motion forecasting as a spatio-temporal prediction task, using a vision transformer to estimate three-dimensional tumor positions over short horizons. Digitally reconstructed radiographs (DRRs) generated from four-dimensional computed tomography scans of 12 lung cancer patients were used to train a multi-patient (MP) model. This model was compared with patient-specific (PS) models trained solely on planning data, and it was further fine-tuned using a small number of patient-specific treatment images under realistic clinical constraints. All models processed sequences of 12 DRRs, and performance was evaluated using the root mean square error (RMSE). The results indicate that low-resolution inputs with larger patch sizes outperform higher-resolution configurations by reducing image noise. PS models require extensive data to match MP performance, whereas fine-tuning the MP model with limited patient-specific data achieves comparable or superior forecasting accuracy at a lower cost. These findings demonstrate that vision transformers can extend markerless tracking methods to accurate short-term forecasting and highlight fine-tuning as an efficient strategy for personalized prediction.
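The setup described above (sequences of 12 DRRs, patch-based vision transformer input, short-horizon 3D position targets, RMSE evaluation) can be illustrated with a small sketch. It is not the authors' implementation. The helper names (`num_tokens`, `make_windows`, `rmse_3d`), the image sizes, the one-frame horizon, and the choice to compute RMSE from the per-sample 3D Euclidean distance are all hypothetical assumptions made for illustration; the abstract does not specify them. The token count only shows how resolution and patch size determine the sequence length a ViT-style encoder processes.

```python
import math
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def num_tokens(image_size: int, patch_size: int, seq_len: int = 12) -> int:
    """Patch tokens a ViT-style encoder would see for seq_len square DRRs (hypothetical helper).

    Each frame is split into non-overlapping patch_size x patch_size patches,
    so one frame contributes (image_size // patch_size) ** 2 tokens.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    return (image_size // patch_size) ** 2 * seq_len


def make_windows(n_frames: int, seq_len: int = 12, horizon: int = 1) -> List[Tuple[List[int], int]]:
    """Hypothetical sliding-window split for forecasting.

    Each sample uses seq_len consecutive frame indices as input and targets the
    frame that lies `horizon` steps after the last input frame.
    """
    windows = []
    for start in range(n_frames - seq_len - horizon + 1):
        inputs = list(range(start, start + seq_len))
        target = start + seq_len - 1 + horizon
        windows.append((inputs, target))
    return windows


def rmse_3d(pred: Sequence[Point3D], true: Sequence[Point3D]) -> float:
    """RMSE over 3D positions, ASSUMING per-sample error = Euclidean distance."""
    if len(pred) != len(true) or not pred:
        raise ValueError("pred and true must be non-empty and of equal length")
    squared = [sum((p - t) ** 2 for p, t in zip(a, b)) for a, b in zip(pred, true)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical resolutions: lower resolution means far fewer tokens per 12-frame sequence.
    print(num_tokens(64, 16))   # 192
    print(num_tokens(256, 16))  # 3072
    # 15 frames, 12-frame inputs, 1-step horizon -> 3 training samples.
    print(len(make_windows(15)))  # 3
    # Two samples, each 5 units off in 3D -> RMSE of 5.0.
    print(rmse_3d([(0, 0, 0), (0, 0, 0)], [(3, 4, 0), (0, 3, 4)]))  # 5.0
```

RMSE is sometimes reported per axis rather than over the 3D Euclidean error, and the two definitions give different numbers. Check the paper's exact definition before comparing values against it.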