Pairwise ratio transformation of gene expression data leads to improved checkpoint response prediction in lung cancer patients.

Pfeil J; Ma L; Lo HC; Turan T; McLaughlin RT; Shi X; Villarruel S; Wilson S; Zhao X; Samayoa J; Halliwill K

doi:10.1186/s12859-025-06332-9

BMC bioinformatics 2025 Vol.27(1) p. 15

Free full text in PMC: PMC12809930

Cite this article

APA Pfeil J, Ma L, et al. (2025). Pairwise ratio transformation of gene expression data leads to improved checkpoint response prediction in lung cancer patients. BMC bioinformatics, 27(1), 15. https://doi.org/10.1186/s12859-025-06332-9

MLA Pfeil J, et al. "Pairwise ratio transformation of gene expression data leads to improved checkpoint response prediction in lung cancer patients." BMC bioinformatics, vol. 27, no. 1, 2025, p. 15.

PMID 41388366

Abstract

[BACKGROUND] Machine learning algorithms identify patterns that would otherwise be difficult to observe in high-dimensional molecular and clinical data. For this reason, machine learning has the potential to have a profound impact on clinical decision-making and drug target discovery. However, there are technical challenges in adapting these tools for clinical use, including clinical feature engineering, model selection, and defining optimal strategies for model training. For cancer care, RNA sequencing of patient tumor biopsies has already proven to be a powerful molecular assay to characterize tumor-intrinsic and -extrinsic phenotypes influencing therapeutic response, but an optimal solution for using gene expression data to predict outcome is yet to be established.

[RESULTS] We developed the tauX machine learning framework to refine gene expression features and improve the predictive performance of RNA-sequencing data. The tauX framework uses aggregated ratios of positively and negatively associated predictive genes to simplify the prediction task. We showed a significant improvement in predictive performance using a large database of synthetic gene expression profiles. We also showed how the tauX framework can be used to elucidate the mechanisms of response and resistance to checkpoint blockade therapy using data from the Stand Up to Cancer (SU2C) Lung Response Cohort and The Cancer Genome Atlas (TCGA). The tauX framework achieved superior predictive performance (~30% improvement) compared to models built upon established feature engineering strategies or widely used cancer gene expression signatures. The tauX framework is available as a freely deployable Docker container (https://hub.docker.com/r/pfeiljx/taux).

[CONCLUSION] By simultaneously modeling gene expression signatures associated with response and resistance to drug therapy, the tauX approach revealed expression patterns that can be used to improve genomic medicine strategies in several ways. Significantly, tauX allows the paired response and resistance signatures to be used to design new companion diagnostics. Application of the tauX framework can also be used to identify drug targets by indicating genes that consistently associate with resistance. Improved performance in drug response prediction using the tauX approach can support data-driven decision-making in the precision medicine space that can lead to improved clinical outcomes for patients.

[SUPPLEMENTARY INFORMATION] The online version contains supplementary material available at 10.1186/s12859-025-06332-9.

Background

While recent advancements in nucleotide sequencing technology and analytical techniques have aided in the development of molecularly informed therapeutic strategies, evidence is accumulating that patient-specific effects strongly influence treatment outcomes [1, 2]. Machine learning/artificial intelligence (ML/AI) approaches have been proposed as a solution to this problem, but despite an era of remarkable development in machine learning and artificial intelligence, there are only a few examples of successful applications of ML to precision medicine and clinical drug development tasks [3, 4]. The reasons for the limited applications include a combination of conceptual, technical, and regulatory hurdles. Examples of challenges outside of the regulatory framework surrounding the clinical use of AI technology include the complexity of biological data and the limited number of clinically relevant training datasets. These challenges are unlikely to be solved by increasingly sophisticated ML algorithms, as more complex models perform poorly on high-dimensional data with a small number of samples [5].
While genetic data are commonly used for subtyping cancer patients and matching patients to targeted therapies, only a small proportion of patients carry actionable genetic markers that allow them to benefit from this profiling [6]. In contrast to the specificity of targeted panels of actionable molecular alterations, whole-transcriptome gene expression data may inform treatment decisions for a broader swath of patients regardless of the presence of specifically targetable alterations. The development of superior predictive signatures derived from whole-transcriptome gene expression data paired with flexible modeling solutions may facilitate the adoption of ML and transcriptome data for clinical decision-making [7, 8].
Traditional RNA-seq analysis includes differential expression analysis, where RNA-seq counts are modeled as negative binomial distributions and significant differences are determined using a hypothesis test [9]. Although genes are known to work together to achieve biological functions, most differential gene expression models assume independence, so the interaction across genes is not directly modelled. Differentially expressed genes may be used alone or for gene-set enrichment analysis (GSEA). GSEA is a well-established technique for understanding the coordinated expression of complex biological systems [10–12]. Genes known to contribute to specific biological processes have been assembled into geneset databases, including MSigDB, Reactome, KEGG, and the Gene Ontology Knowledgebase [12–16]. For gene expression analysis of cancer samples, GSEA can measure the relative level of immune infiltration, stromal composition, and tumor intrinsic pathway activation [17]. It has also been shown that the ratio of specific gene sets or other molecular and cellular features can further improve predictions of clinical outcomes, including survival and response to therapies [8, 18]. However, a generalizable framework for the optimization of predictive gene expression ratios has not been thoroughly explored.
Here, we describe a computationally intensive gene expression transformation that internally normalizes gene expression profiles using pairwise gene ratios. Using synthetic and clinical lung cancer data, we showed that superior predictive accuracy can be achieved using gene expression ratios. For example, gene ratio features achieved the highest validation score in the recent anti-PD1 response prediction challenge [19, 20]. Further characterization of engineered gene-set ratios revealed new enriched biological pathways involved in the response to checkpoint blockade therapy that could lead to improved patient subtyping and response prediction. This framework makes no assumptions about the prediction task and can be applied to any precision medicine task involving binary outcomes and gene expression data.

Methods

Traditional gene expression feature engineering approaches
The synthetic gene expression profiles were visualized as a hierarchically clustered heatmap using the Ward algorithm [21, 22]. The tauX framework was compared to two routine preprocessing steps for developing gene expression machine learning algorithms. The first is to rank the log-normalized genes by their variance to obtain the most highly variable genes (HVGs) [23]. The second strategy was to correlate log-normalized gene expression values with the binary response outcome variable using a Z score to obtain the most differentially expressed genes (DEGs) [24]. The HVGs and the DEGs were used as input for training machine learning tools.
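The two baseline selections described above can be sketched in a few lines. This is a minimal illustration only: the function names, the toy data, and the specific two-sample z-like score are assumptions made here, not the authors' implementation.

```python
import numpy as np

def top_variable_genes(log_expr, k):
    """Indices of the k genes with the largest variance (input is samples x genes)."""
    variances = log_expr.var(axis=0, ddof=1)
    return np.argsort(variances)[::-1][:k]

def top_differential_genes(log_expr, response, k):
    """Indices of the k genes with the largest absolute two-sample z-like score.

    Assumption: the score is the difference in group means divided by its
    standard error; the source only says a Z score was used.
    """
    resp = log_expr[response == 1]
    non = log_expr[response == 0]
    diff = resp.mean(axis=0) - non.mean(axis=0)
    se = np.sqrt(resp.var(axis=0, ddof=1) / len(resp)
                 + non.var(axis=0, ddof=1) / len(non))
    return np.argsort(np.abs(diff / se))[::-1][:k]

# Hypothetical toy data: 20 samples, 6 genes, log-normalized expression.
rng = np.random.default_rng(0)
response = np.array([1] * 10 + [0] * 10)
log_expr = rng.normal(5.0, 1.0, size=(20, 6))
log_expr[:, 2] += 5.0 * response                    # gene 2 tracks response
log_expr[:, 4] = rng.normal(5.0, 10.0, size=20)     # gene 4 is noisy but uninformative

hvg = top_variable_genes(log_expr, k=1)
deg = top_differential_genes(log_expr, response, k=1)
print("top HVG:", hvg, "top DEG:", deg)
```

In the toy data the most variable gene is not the most response-associated one, which is the kind of background signal the variance-based ranking cannot filter out.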

tauX response feature engineering approach
The tauX strategy uses gene feature counts normalized to transcripts per million (TPM) values. To ensure numerical stability, the TPM values are rescaled using the min–max scaler to a range between 1 and 100 [25].
Outlier gene expression was also rescaled to 1.5 times the interquartile range value [26]. An all-vs-all approach was used to identify gene ratios that associate with the outcome variable. A user-defined minimum expression filter was used to remove lowly expressed genes and ensure numerical stability. Because this calculation is computationally intensive, the pairwise ratio calculations were implemented in C++ for efficiency.
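A minimal sketch of the rescaling steps follows. The order of operations (outlier clipping first, then min-max scaling), the use of Tukey-style fences at 1.5 times the interquartile range, and all function names are assumptions for illustration; the source does not specify these details.

```python
import numpy as np

def clip_outliers_iqr(tpm, k=1.5):
    """Clip each gene (column) to [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(tpm, [25, 75], axis=0)
    iqr = q3 - q1
    return np.clip(tpm, q1 - k * iqr, q3 + k * iqr)

def rescale_min_max(values, low=1.0, high=100.0):
    """Min-max rescale each gene (column) to the range [low, high]."""
    vmin = values.min(axis=0)
    vmax = values.max(axis=0)
    # Constant genes get a span of 1 so they map to `low` instead of dividing by zero.
    span = np.where(vmax > vmin, vmax - vmin, 1.0)
    return low + (values - vmin) / span * (high - low)

# Hypothetical TPM matrix: 5 samples x 2 genes, with one extreme value in gene 0.
tpm = np.array([[0.0, 10.0],
                [5.0, 20.0],
                [10.0, 30.0],
                [15.0, 40.0],
                [500.0, 50.0]])
scaled = rescale_min_max(clip_outliers_iqr(tpm))
print(scaled)
```

Keeping every value at or above 1 means no ratio can have a zero denominator, which is one plausible reading of the numerical stability argument in the text.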
Gene ratios were then compared between responder and nonresponder samples using Student’s t-statistic to identify differentially expressed gene ratios (DEGRs).
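The all-vs-all ratio test can be illustrated on a toy matrix. This NumPy sketch is not the authors' C++ implementation; the function names and the simulated data are hypothetical, and the dense samples x genes x genes array used here would not scale to a full transcriptome.

```python
import numpy as np

def student_t(a, b):
    """Pooled-variance Student's t-statistic, computed along axis 0."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * a.var(axis=0, ddof=1)
              + (nb - 1) * b.var(axis=0, ddof=1)) / (na + nb - 2)
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(pooled * (1.0 / na + 1.0 / nb))

def ratio_t_statistics(scaled, response):
    """t[i, j] compares the ratio gene_i / gene_j between responders and nonresponders."""
    ratios = scaled[:, :, None] / scaled[:, None, :]
    with np.errstate(invalid="ignore", divide="ignore"):
        t = student_t(ratios[response == 1], ratios[response == 0])
    np.fill_diagonal(t, 0.0)  # a gene divided by itself carries no information
    return t

# Hypothetical rescaled expression: gene 0 is higher and gene 1 lower in responders.
rng = np.random.default_rng(0)
n = 50
response = np.array([1] * n + [0] * n)
responders = np.column_stack([rng.normal(60, 3, n), rng.normal(40, 3, n), rng.normal(50, 3, n)])
nonresponders = np.column_stack([rng.normal(40, 3, n), rng.normal(60, 3, n), rng.normal(50, 3, n)])
scaled = np.vstack([responders, nonresponders])

t = ratio_t_statistics(scaled, response)
best = tuple(int(i) for i in np.unravel_index(np.argmax(t), t.shape))
print("most response-associated ratio (numerator, denominator):", best)
```

The top-ranked ratio pairs the response-associated gene in the numerator with the nonresponse-associated gene in the denominator, matching the interpretation of DEGRs given in the text.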
The resulting DEGRs were subsequently clustered to identify modules of gene ratio expression associated with response. Since the input DEGRs were positively correlated with response, the numerator genes were associated with response, and the denominator genes were associated with nonresponse. Specific biological functions were investigated using gene set overlap analysis [27]. The mathematical structure of the ratios allowed us to detect enrichment of the ratio associated with expression using the SingScore bidirectional gene set enrichment approach [28]. Specifically, the numerator genes were used as the up signatures, and the denominator genes were used as the down signatures.
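To make the up and down signature idea concrete, the sketch below scores a single sample from within-sample ranks. It is a simplified stand-in written for illustration, not the SingScore algorithm itself: the normalization, the handling of ties, and the function name are assumptions.

```python
import numpy as np

def bidirectional_rank_score(sample, up_idx, down_idx):
    """Mean unit rank of the up genes minus mean unit rank of the down genes.

    Ranks are computed within one sample, so no background cohort is needed.
    Ties are broken by position (ordinal ranks), a simplification.
    """
    order = sample.argsort()
    ranks = np.empty(len(sample))
    ranks[order] = np.arange(1, len(sample) + 1)
    unit_ranks = ranks / len(sample)
    return unit_ranks[up_idx].mean() - unit_ranks[down_idx].mean()

up_idx = [0, 1]    # hypothetical numerator (response) genes
down_idx = [2, 3]  # hypothetical denominator (nonresponse) genes

responder_like = np.array([90.0, 80.0, 10.0, 20.0, 50.0, 55.0])
resistant_like = np.array([10.0, 20.0, 90.0, 80.0, 50.0, 55.0])

score_a = bidirectional_rank_score(responder_like, up_idx, down_idx)
score_b = bidirectional_rank_score(resistant_like, up_idx, down_idx)
print(score_a, score_b)
```

A sample with high numerator genes and low denominator genes receives a positive score, and the mirrored profile receives the negated score.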

Automated machine learning model generation
The optuna Bayesian hyperparameter optimization framework was used to train each of the machine learning models [29]. Using the optuna framework ensured that the best performing model was used when evaluating each feature engineering approach. The top 500 input features from the traditional and tauX feature engineering strategies were used to train commonly used machine learning algorithms, specifically elastic net regression, linear support vector machine (SVM), and radial basis function (RBF)-kernel SVM [25, 29]. In every case, the machine learning model was evaluated using a hold-out set of samples.

Generation of synthetic gene expression data
Synthetic gene expression data were generated using the TCGA lung adenocarcinoma (LUAD) study (N = 601) [30, 31]. We varied several important parameters to explore the effect of important gene expression and sample subtype parameters. These parameters included the effect size (Cohen’s d: 3.0, 2.0, 1.0, 0.75, 0.5, 0.25), number of DEGs (100, 50, 30, 20, 10), percentage of synthetic responders (0.25, 0.2, 0.15, 0.10, 0.05), and strength of the correlation between DEGs (Spearman R: 0.9, 0.75, 0.5, 0.25). The parameter sweep yielded 600 unique experiments and generated 240,000 unique synthetic gene expression profiles for training and testing the tauX framework.
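The size of the parameter sweep follows directly from the four parameter lists. The sketch below enumerates the grid with the values quoted above; the dictionary keys are hypothetical names chosen for readability.

```python
from itertools import product

effect_sizes = [3.0, 2.0, 1.0, 0.75, 0.5, 0.25]      # Cohen's d
n_degs = [100, 50, 30, 20, 10]                       # number of DEGs
responder_fractions = [0.25, 0.2, 0.15, 0.10, 0.05]  # fraction of synthetic responders
deg_correlations = [0.9, 0.75, 0.5, 0.25]            # Spearman R between DEGs

experiments = [
    {"cohens_d": d, "n_degs": g, "responder_fraction": f, "deg_correlation": r}
    for d, g, f, r in product(effect_sizes, n_degs, responder_fractions, deg_correlations)
]
print(len(experiments))  # 6 * 5 * 5 * 4 = 600
```

If the 240,000 profiles were divided evenly across experiments, each experiment would contain 400 profiles; the source does not state the per-experiment sample size, so this is an arithmetic inference only.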
The recount3 recomputed counts for the TCGA LUAD study were used to model the lung adenocarcinoma gene expression distribution [32]. Gene expression counts were scaled to transcripts per million (TPM) and normalized using a log2(TPM + 1) transformation [33]. The mean vector and covariance matrix for the background distribution were calculated to model the correlation structure of the TCGA-LUAD cohort. A multivariate normal distribution was estimated to accurately model expression patterns of lung cancer patients. First, pairwise Pearson correlation scores for all gene pairs were calculated. The correlation scores were then converted to a covariance matrix by scaling each correlation by the standard deviations of the two genes involved (Σ_ij = ρ_ij σ_i σ_j).
Synthetic gene expression profiles were then sampled from the resulting multivariate normal distribution. Responder gene expression profiles were sampled from a modified background distribution where the means were shifted to accommodate the statistical properties for each experiment. Synthetic LUAD gene expression profiles were then used for downstream precision medicine subtyping experiments.
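The sampling scheme can be sketched as follows. This is an illustration under stated assumptions: the responder mean shift is taken to be Cohen's d times each gene's standard deviation, the correlation matrix and all parameter values are toy inputs, and the function names are hypothetical.

```python
import numpy as np

def correlation_to_covariance(corr, sd):
    """Standard conversion: cov[i, j] = corr[i, j] * sd[i] * sd[j]."""
    return corr * np.outer(sd, sd)

def sample_cohort(mean, cov, n_samples, responder_fraction, deg_idx, cohens_d, rng):
    """Draw nonresponders from the background and responders from a mean-shifted copy."""
    n_resp = int(round(n_samples * responder_fraction))
    shifted = mean.copy()
    # Assumption: the DEG means move by d standard deviations in responders.
    shifted[deg_idx] += cohens_d * np.sqrt(np.diag(cov))[deg_idx]
    responders = rng.multivariate_normal(shifted, cov, size=n_resp)
    nonresponders = rng.multivariate_normal(mean, cov, size=n_samples - n_resp)
    expr = np.vstack([responders, nonresponders])
    labels = np.array([1] * n_resp + [0] * (n_samples - n_resp))
    return expr, labels

# Hypothetical 3-gene background in log2(TPM + 1) units.
mean = np.array([5.0, 6.0, 7.0])
sd = np.array([1.0, 2.0, 1.0])
corr = np.array([[1.0, 0.5, 0.0],
                 [0.5, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])
cov = correlation_to_covariance(corr, sd)

rng = np.random.default_rng(0)
expr, labels = sample_cohort(mean, cov, n_samples=400, responder_fraction=0.25,
                             deg_idx=[0, 1], cohens_d=2.0, rng=rng)
print(expr.shape, labels.sum())
```

Because both groups share one covariance matrix, the correlation structure of the background cohort is preserved and only the means of the chosen genes differ between responders and nonresponders.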

Generation of Stand Up to Cancer (SU2C) lung ICI response training and testing data
The Stand Up To Cancer-Mark Foundation recently published an integrative analysis of non-small cell lung cancer (NSCLC) that included 152 samples with RNA-seq and immune checkpoint inhibitor (ICI) response data [34]. The SU2C Lung cohort included lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) patient samples. The expression data were then compared to those of patients who received anti-PD(L)1 therapy and experienced progressive disease (PD), partial response (PR), or complete response (CR). Outlier gene expression profiles were identified using isolation forests [35], local outlier factor analysis [36], and one-class SVM [37]. Consensus clustering was used to identify expression subtypes [38–40]. The number of clusters was selected by optimizing the Bayesian information criterion, Davies–Bouldin index, silhouette score, and Calinski–Harabasz index.
The training and testing cohorts were generated by random sampling (80:20 split), and the anti-PD1 therapy response label and consensus cluster assignments were used to stratify the patients (n = 72). The tauX gene expression signatures were learned by applying the tauX approach to the SU2C training data. The resulting ratios were clustered to identify modules of correlated expression and to generate bidirectional gene sets. The SingScore enrichment algorithm was used to stably score single samples without the need for a background cohort [28]. The resulting gene-set enrichment scores were used to train the elastic net, linear SVM, and nonlinear RBF-SVM models.
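A stratified 80:20 split of the kind described above can be sketched with NumPy alone. The strata here combine a response label with a cluster assignment; the function name, the toy labels, and the per-stratum rounding rule are assumptions, so the resulting counts will not necessarily match the cohort sizes reported in the paper.

```python
import numpy as np

def stratified_split(strata, test_fraction, rng):
    """Return (train_idx, test_idx), sampling test_fraction of each stratum for testing."""
    strata = np.asarray(strata)
    test_idx = []
    for label in np.unique(strata):
        members = rng.permutation(np.flatnonzero(strata == label))
        n_test = int(round(len(members) * test_fraction))
        test_idx.extend(members[:n_test])
    test_idx = np.sort(np.array(test_idx, dtype=int))
    train_idx = np.setdiff1d(np.arange(len(strata)), test_idx)
    return train_idx, test_idx

# Hypothetical cohort: 2 response labels x 3 clusters, 10 samples per combination.
response = np.repeat([0, 1], 30)
cluster = np.tile(np.repeat([0, 1, 2], 10), 2)
strata = np.array([f"{r}_{c}" for r, c in zip(response, cluster)])

rng = np.random.default_rng(0)
train_idx, test_idx = stratified_split(strata, test_fraction=0.2, rng=rng)
print(len(train_idx), len(test_idx))
```

Splitting within each stratum keeps the response rate and cluster composition of the two cohorts aligned, which matters more than usual when the total sample size is small.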

Survival analysis
Kaplan-Meier (K-M) plots and log-rank statistics were generated using the R survminer package [41]. Cox proportional hazards models were fit using the R survival package [42, 43]. The plots were generated using ggplot [44], matplotlib, and seaborn [22, 45].

Results

Overview of the tauX gene expression modeling framework
The tauX framework was designed to identify paired gene expression effects that correlate with response to immune checkpoint blockade (Fig. 1). The goal of the tauX framework is to minimize background noise and maximize signal associated with clinical response. The combinatorial complexity of this problem was computationally challenging (i.e., C(10,000, 2) ≈ 5 × 10^7 comparisons). To scale to an entire transcriptome, the computationally efficient C++ programming language was used to accelerate the identification of gene expression ratio features (Supplementary Fig. 1).
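The comparison count quoted above can be checked directly. The 10,000-gene figure is the example from the text; whether the implementation evaluates unordered pairs or both orientations of each ratio is not stated, so both counts are shown.

```python
from math import comb

n_genes = 10_000
unordered_pairs = comb(n_genes, 2)     # each gene pair counted once
ordered_pairs = n_genes * (n_genes - 1)  # i/j and j/i counted separately

print(unordered_pairs)  # 49995000, i.e. about 5 x 10^7
print(ordered_pairs)    # 99990000
```

Since the t-statistics for i/j and j/i are not simple negatives of each other, an implementation may need the larger ordered count; either way the cost grows quadratically with the number of genes retained after filtering.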
Pairwise ratios were calculated across all genes of interest. Because the distribution of gene expression values varies considerably across genes, each gene was rescaled to a range between 1 and 100 (see Methods). Rescaling guarantees numerically stable comparisons across the entire transcriptome. Most pairwise comparisons were not associated with response, so a t-statistic was calculated to identify which ratios were consistently associated with response, accounting for the effect and sample sizes. Consensus clusters were optimized using recommended clustering quality metrics, including the BIC, the Davies–Bouldin index, the silhouette score, and the Calinski–Harabasz index [39]. This resulted in the identification of 7 new response signatures for the Stand Up to Cancer (SU2C) cohort (Supplementary Table 1). Each gene expression ratio module was then characterized using gene set enrichment analysis to identify functionally enriched biological pathways. The gene ratio modules were then deconstructed into up- and downregulated gene sets for bidirectional gene set enrichment using the SingScore approach [28]. The resulting enrichment scores greatly reduced the total number of features, which is beneficial, particularly when the amount of training data is small. The gene ratio enrichment scores (GRESs) were then used as features for training ML algorithms and applied to synthetic data and real-world lung cancer data.

Performance of the tauX approach on synthetic data
Synthetic datasets were constructed to evaluate the performance of the tauX framework compared to other widely used feature engineering strategies. These strategies were also compared across widely used machine learning algorithms, including ElasticNet, linear SVM, and nonlinear RBF SVM (Fig. 2). Each method analyzed the same set of simulated expression data, as described above, spanning 600 experimental parameters. Overall, the tauX framework consistently outperformed the traditional approaches of using differentially expressed genes (DEGs) and highly variable genes (HVGs). The average AUCs for the tauX framework across the ElasticNet, linear SVM and RBF SVM models were 0.81, 0.83, and 0.83, respectively (Fig. 2). The average AUC using the HVGs was 0.58 across all three methods. The average AUCs of the DEGs were 0.63, 0.61, and 0.61, respectively. The tauX framework was able to maintain predictive performance at lower effect sizes, suggesting that the tauX framework identifies subtle changes in expression that may correlate with the response to therapies.
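For readers less familiar with the metric, the AUC values above can be read as the probability that a randomly chosen responder receives a higher score than a randomly chosen nonresponder. The sketch below computes it that way; the function name and the toy scores are hypothetical, and this is not the evaluation code used in the paper.

```python
import numpy as np

def roc_auc(scores, labels):
    """AUC as the fraction of (responder, nonresponder) pairs ranked correctly.

    Ties between a responder and a nonresponder count as half a correct pair.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0]
auc = roc_auc(scores, labels)
print(auc)
```

In this toy example five of the six responder and nonresponder pairs are ordered correctly, so the AUC is 5/6. An AUC of 0.5 corresponds to random ranking, which puts the reported 0.58 for HVGs close to chance.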
Visual inspection of the features using hierarchically clustered heatmaps (Fig. 3A) revealed a considerable amount of signal not associated with response for HVGs and DEGs. Surprisingly, the gene ratio approach showed low amounts of background signal and elevated levels of response-associated signal. The dendrogram for variably expressed genes and differentially expressed genes showed modest separation between responders and non-responders, which is consistent with the observed clustering of other publicly available cancer drug response data [46]. The tauX-engineered features showed clear separation, suggesting that the ML algorithms may more easily identify these patterns (chi-squared p value < 0.05).
We next examined the predictive performance of commonly used ML algorithms as compared to the tauX framework. The ML algorithms were trained using Bayesian hyperparameter optimization on the training data and applied to a hold-out validation cohort (Fig. 3B). The performance was visualized using a receiver operating characteristic (ROC) plot. The tauX engineered gene ratios achieved excellent performance on the training data (AUC > 0.9), whereas the traditional approaches were unable to identify a signal. To determine whether the tauX model overfit the training data, we predicted responses in an out-of-sample cohort of synthetic data generated from the same background cohort but not used for training. The tauX-generated features exhibited excellent predictive performance in the hold-out data (AUC > 0.9), suggesting that this strategy may have a computational advantage over existing gene expression feature engineering approaches.

Statistical properties underlying the tauX framework
Synthetic data were generated to model a pair of inversely expressed genes, such as genes related to response and resistance marker expression (Supplementary Fig. 3). One way to improve the specificity of machine learning and AI applications is to increase the distributional distance between outcome classes, as measured for example by the Kullback–Leibler divergence [47]. Statistical modeling of the tauX framework gene expression ratios revealed that the ratios follow an extreme value distribution instead of the original normal distribution, leading to significant differences between background and predictive response ratio expression. The tauX gene ratio transformation increases the distributional differences between responders and non-responders, which allowed for more accurate and reproducible separation of responders by ML algorithms.
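The intuition that a ratio of inversely expressed genes separates the two groups better than either gene alone can be checked with a small simulation. This sketch is an illustration under assumptions made here (normally distributed expression, an additive sample-level noise term shared by both genes, and Cohen's d as the separation measure); it does not reproduce the authors' extreme value analysis.

```python
import numpy as np

def cohens_d(a, b):
    """Standardized mean difference using the average of the two group variances."""
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2.0)
    return (a.mean() - b.mean()) / pooled_sd

rng = np.random.default_rng(1)
n = 5000

def simulate(mean_a, mean_b):
    """Two genes that share sample-level noise; values kept in the [1, 100] range."""
    shared = rng.normal(0.0, 8.0, n)  # e.g. purity or library effects (assumed)
    gene_a = np.clip(mean_a + shared + rng.normal(0.0, 4.0, n), 1.0, 100.0)
    gene_b = np.clip(mean_b + shared + rng.normal(0.0, 4.0, n), 1.0, 100.0)
    return gene_a, gene_b

resp_a, resp_b = simulate(55.0, 45.0)  # response marker up, resistance marker down
non_a, non_b = simulate(45.0, 55.0)    # the inverse pattern in nonresponders

d_single = cohens_d(resp_a, non_a)
d_ratio = cohens_d(resp_a / resp_b, non_a / non_b)
print(f"single gene d = {d_single:.2f}, ratio d = {d_ratio:.2f}")
```

Under these assumptions the ratio benefits twice: the opposing shifts in the two genes add up, and the shared noise largely cancels between numerator and denominator, so the two groups overlap less than they do for either gene alone.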

Predicting ICI response in the Stand Up to Cancer (SU2C) lung cohort
Preliminary exploration of established biomarkers of ICI response, including tumor mutation burden, neoantigen burden, and PDL1 expression, revealed moderate predictive performance (AUC < 0.8, Supplementary Fig. 4). We sought to improve the predictive performance in this cohort by training ML models using tauX-generated GRES features. As a non-tauX-derived gene set comparator, we also assessed its predictive performance relative to the widely used hallmarks of cancer gene signatures [16].
Unsupervised consensus clustering of the responder gene expression cohort revealed three clear subgroups of patient samples (Supplementary Fig. 2). An investigation of the histology of the clusters revealed that two of the clusters (specifically, clusters 0 and 2) correlated with LUAD histology, whereas cluster 1 was enriched for LUSC (chi-squared test, p value = 0.028). Upon further characterization of the responder clusters, it was found that cluster 0 samples were more likely to have a greater number of previous lines of therapy, whereas cluster 2 samples were more likely to have fewer previous lines of therapy (Mann‒Whitney U test, p value = 0.002). The responder cluster assignments were used to randomly stratify the samples into training and validation datasets. This ensured that the training and validation cohorts had similar compositions of known gene expression covariates, including histological subtype, treatment history, and response to checkpoint blockade therapy.

The tauX approach was then applied to the training cohort (n = 57). This resulted in the identification of 7 GRESs (Supplementary Table 1). An investigation of these GRESs revealed functional enrichment of biological pathways in the numerator and denominator gene sets (Fig. 4). The LUAD patients with fewer previous lines of therapy (response cluster 2) were enriched for classical immune activation-associated gene sets in the numerator. Similar signatures have been shown to predict the response to checkpoint blockade therapy [48]. However, not all patients with inflamed tumors respond to checkpoint blockade, so concomitant downregulation of resistance pathways may be required to achieve a response. The tauX approach also identified paired signatures of nonresponses, including transcriptional regulation, DNA damage repair, and lipid metabolism.
We found substantial enrichment in the more heavily pretreated LUAD patients for TGF-beta signaling, platelet growth factor receptor signaling, smooth muscle differentiation, and endoplasmic reticulum transport signaling (Fig. 4). The cluster 0 LUAD denominator gene set was enriched for protein folding pathways and endoplasmic reticulum stress pathways, which have been shown to be involved in resistance to cancer therapy through adaptation to hypoxia, inflammation, and angiogenesis [49]. Finally, the LUSC cluster showed the inverse trend to that of the LUAD clusters, where cell cycle signaling was a positive predictor of response and immune cell expression was a predictor of resistance. A similar pattern has also been observed by others as a resistance signature in LUSC [50].

As a comparator to the tauX-defined GRESs, we evaluated performance using the cancer hallmark gene set signatures as input to the same set of ML algorithms used above [16]. Hallmark gene sets are widely used to characterize cancer gene expression, as these gene sets were defined to capture important expression patterns associated with cancer biology and the tumor microenvironment [51]. As before, hallmark models were built using Bayesian hyperparameter optimization to determine the best performing models for each of the feature sets. The hallmark linear SVM achieved an AUC comparable to that of existing biomarkers of response, including PDL1 staining and the TMB (AUC ~0.7, Supplementary Fig. 4). The models trained using enrichment scores for the 7 previously defined tauX features achieved superior predictive performance for the training and validation cohorts (AUC > 0.9), suggesting that the tauX approach may isolate more informative features for training clinical ML tools in the SU2C anti-PD1 response cohort (Fig. 5).

tauX framework signatures correlate with patient survival in the SU2C and TCGA LUAD cohorts
The tauX approach for this lung cancer cohort was initially trained on a binary outcome variable associated with response. To evaluate patient survival, which is not necessarily equivalent to response, we assessed whether the tauX predictions also correlated with greater progression-free survival in the SU2C cohort. A significant difference in survival outcomes was observed in the validation cohort for progression-free survival using the elastic net and linear SVM classifiers (log-rank test, p value = 0.0012 and 0.0062, respectively) (Fig. 6) but not the RBF SVM classifier, although the survival curves showed a similar pattern (log-rank test, p value = 0.12). We then investigated whether tauX GRESs also predict patient survival in The Cancer Genome Atlas lung adenocarcinoma (TCGA-LUAD) cohort [31]. A Cox proportional hazards model was fit to the TCGA-LUAD cohort using the corresponding LUAD tauX signatures and the nonsilent mutation rate as covariates (Fig. 6). The response cluster 2 signature was the most relevant signature for the treatment-naïve TCGA-LUAD cohort and achieved the greatest reduction in relative risk (HR 95% CI: 0.13–0.64). Surprisingly, the nonsilent mutation rate was not an independent covariate for decreased risk when the response cluster 2 signature was included as a covariate (HR 95% CI: 0.97–1.01).

Discussion

Gene expression analysis paired with ML tools has transformative potential for patient care but requires substantial optimization to overcome well-documented issues with overfitting and reproducibility. One strategy for ML tool performance optimization may be to relate coregulated gene expression signals, since both response to therapy and gene expression regulatory structure are affected by positive and negative factors [8, 18]. Here, we showed that a transformation of gene expression data into gene set ratios has the potential to amplify subtle changes in expression that correlate with the response to ICI therapy. To our knowledge, this is the first evaluation of pairwise gene expression ratios for ML tool optimization in the context of response prediction. Using these ratios, we identified correlated modules associated with positive and negative predictive factors for ICI response and showed that these gene ratio expression signatures (GRESs) can be used to enhance the predictive performance of ML tools relative to the commonly used hallmarks of cancer gene set collection.
Gene expression is highly correlated because many genes and pathways are coregulated to control gene activity. When one gene or pathway is activated, an opposing gene or pathway is often deactivated (e.g., Ras GEFs vs. Ras GAPs). This pattern can also be seen in the response prediction data, where the response markers are correlated with the nonresponse markers. This makes prediction challenging when a positive response prediction score may be cancelled out by an increase in the resistance prediction score. The tauX approach models the response- and resistance-associated expression simultaneously to improve its predictive accuracy. Overexpressed and underexpressed genes are regularly characterized during routine differential expression analysis. Combining positive and negative predictors via pairwise comparisons may be a new way to generate clinical gene expression signatures.
The SU2C lung cohort included patients with diverse treatment histories, and we were able to leverage this advantage here to isolate a distinct response signature for more heavily treated patients. This is a particularly exciting discovery since heavily pretreated populations of patients tend to be less responsive to checkpoint blockade therapy [52]. The tauX framework revealed elevated TGF-beta signaling, with concurrent downregulation of endoplasmic reticulum stress response pathways being associated with response. The ratios driving TGF-beta enrichment included those of the SKIL gene, which is upregulated during sustained TGF-beta signaling [53]. TGF-beta signaling is currently considered a predictor of nonresponse to checkpoint blockade therapy and predicts worse survival in patients with lung cancer [54, 55]. TGF-beta signaling may maintain an exhausted T-cell state through the inhibition of stem cell-like CD8+ T cells. Simultaneous PDL1 and TGF-beta blockade was shown to overcome these resistance mechanisms and allow antitumor immune responses to eradicate tumors [56]. The unfolded protein response (UPR) has been shown to make tumors more resilient to cellular stress in the tumor microenvironment and facilitate immune evasion through signaling to tumor-promoting monocytes/macrophages [57]. The tauX framework provides more context to the response signature and provides insight into mechanisms that are not as easily identified using traditional gene expression analysis.
The largest cluster of patients in the SU2C cohort consisted of LUAD patients with fewer previous lines of therapy (≤ 2 lines). This cluster was associated with classic predictors of response to checkpoint blockade, including T-cell activation signatures [58]. The tauX framework enhanced the adaptive immune response signature by pairing it with gene expression patterns associated with resistance, which included upregulation of proliferative signatures [59]. This patient cluster most closely resembled the treatment-naïve TCGA LUAD cohort, and indeed, the tauX predictions were associated with better survival outcomes. This finding is consistent with the known observation that the immune involvement associated with checkpoint blockade response also correlates with a survival benefit in pretreated TCGA samples [60].
Many methods have been developed to address heterogeneity in RNA-seq data, but few of these methods are specifically designed for clinical applications. The unique constraints of medical gene expression analysis require a new set of tools to achieve clinical impact. We designed tauX to be applied to any ML task that has a binary outcome variable. This approach may be particularly helpful for relatively small training cohorts (< 100 samples), since tauX transformation improves specificity by increasing the distributional distance between responders and nonresponders. The tauX framework leverages the statistical properties of gene expression data to learn new gene expression signatures associated with response.
Despite significant improvements in predictive accuracy, challenges remain for both implementation and interpretation. Whole-transcriptome gene expression analysis as a clinical biomarker for treatment selection is limited to higher-resource settings and may impose delays in therapy initiation. Due to its unbiased nature, the tauX framework-derived GRESs may involve new expression signatures that are not immediately interpretable, as the expression patterns may not reflect canonical signaling pathways. This is a limitation, but it is also a strength in that this may reveal new therapeutic targets or resistance mechanisms that can be used to stratify patients. Experimental validation of signatures may be required to fully explain the association between the identified ratios and response, and to determine how best to measure them in the clinic.
The tauX framework is a flexible modeling strategy with the potential to improve the performance of ML tools for response prediction using gene expression data. The framework makes no assumptions about the prediction task and therefore can be applied to many different machine learning tasks. We have published the tauX approach as a Docker container to enable further community development of this framework, with the hope that this approach will contribute to the advancement of ML for precision medicine applications.

Conclusion

The tauX framework enhances precision medicine approaches that leverage machine learning and AI. By creating more robust gene expression signatures for precision medicine applications, we optimize the utility of this enormously informative but complex data type. Using this framework, and the framework-derived gene expression signatures, clinical scientists will be better able to stratify patients and anticipate response. Successful application of this approach may yield new companion diagnostics, reveal more effective clinical trial designs, and generate a deeper understanding of treatment resistance mechanisms. Future work will focus on the interpretation and validation of signatures developed by tauX gene expression analysis and how to best apply this framework to improve patient outcomes.

Supplementary Information

Below is the link to the electronic supplementary material.

Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please cite the original article when quoting.
