eNose technologies in the detection of cancer: a systematic review and meta-analysis.
APA
Neugut EJ, Neugut AI (2026). eNose technologies in the detection of cancer: a systematic review and meta-analysis. The Oncologist, 31(3). https://doi.org/10.1093/oncolo/oyag016
MLA
Neugut EJ, et al. "eNose technologies in the detection of cancer: a systematic review and meta-analysis." The Oncologist, vol. 31, no. 3, 2026.
PMID
41604309
Abstract
[BACKGROUND] In 2004, researchers in the United Kingdom were able to train six dogs to distinguish between urine samples from bladder cancer patients and healthy controls using smell, achieving a detection rate of 41% (95% confidence interval [CI]: 23-58%), far above the 14% expected by chance. After numerous subsequent studies validated that the odor of breath and urine samples could be used by dogs to detect cancer, researchers pivoted to electronic noses (eNoses), sensor-based systems that mimic the sense of smell using arrays of chemical detectors. In this study, we review the potential efficacy of eNoses in the detection of selected cancer types in human biological samples.
[METHODS] We identified and performed a meta-analysis on 37 studies of eNose technology, comprising 1 365 cancer patients and 2 249 control subjects. We calculated the pooled sensitivity and specificity stratified by cancer type, sample type, and sensor type. Meta-regressions were conducted on these variables as well as the number of sensors used in the sensor array.
[RESULTS] All six cancer types analyzed (breast, colorectal, gastric, lung, ovarian, and prostate) achieved pooled sensitivities and specificities above 70%, with most around 85%. The overall pooled sensitivity was 85.9% (95% CI: 82.3-88.9%) and specificity was 83.6% (95% CI: 78.6-87.7%). Meta-regression revealed that the number of sensors in the sensor arrays, up to 15 sensors, was predictive of sensitivity with PFDR < 0.001.
[CONCLUSION] This analysis found that eNoses constitute a promising tool in the early detection of cancer. However, more research is necessary before they can be introduced into clinical settings.
Introduction
In this review, we examine the efficacy and potential of electronic nose (eNose) technology in the detection of various types of cancer. In addition, we assess the performance of different sensor types included in the eNose, the number of sensors, and the sample type analyzed.
Origin in canine olfactory studies
The use of olfactory detection for cancer dates back to a 1989 case report when a dog’s unusual interest in a spot on its owner’s hand led to the detection of a melanoma at King’s College Hospital in London.1 Interest grew, and in 2004, researchers in the United Kingdom trained six dogs to distinguish between urine samples from bladder cancer patients and healthy controls, achieving a detection rate of 41% (95% CI: 23-58%), far above the 14% expected by chance.2
Canine experiments were subsequently conducted for various cancer types, including bladder,2,3 breast,4–6 colorectal,7 hepatic,8 lung,4,5,9–18 melanoma,5 osteosarcoma,19 ovarian,20 and prostate,21–24 with many reporting sensitivities and specificities >90%. For example, a notable 2015 study by Taverna et al.25 at the Humanitas Clinical and Research Center in Milan, Italy trained two German Shepherds to detect prostate cancer and reported impressive results: 99.3% sensitivity and 98.2% specificity on urine samples from 162 cancer patients and 310 controls.
While the use of canines faced significant practical barriers, including concerns about the FDA’s willingness to authorize animal-based diagnostic methods, logistical challenges in costs and scaling, and ethical concerns,26,27 the dog studies revealed that the scent profile of breath, urine, and other samples contains chemical patterns that indicate the cancer status of an individual, leading researchers to new approaches for cancer detection.
Sensor arrays for cancer detection
eNoses are biomimetic systems designed to replicate the sense of smell with the use of an array of chemical sensors. Originally developed for applications in the food and beverage industries to assess product quality and aroma profiles, these technologies have expanded into fields such as environmental monitoring and medical diagnostics.28,29 In particular, their ability to detect and analyze complex chemical patterns in biological samples has made them promising tools for non-invasive cancer detection.30
eNoses function by generating a collective response to a sample’s chemical composition, forming a “signature” that can be analyzed to detect specific patterns related to disease.31 Sensor arrays use a combination of chemical sensors to detect volatile organic compounds (VOCs) in biological samples, such as breath, urine, blood, and saliva. Each sensor in the array responds to a different set of compounds based on their chemical properties, producing a collective “fingerprint” that can be analyzed with the use of statistical or machine learning algorithms. The sensor data is then used in classification algorithms to detect patterns of VOCs that correlate with various cancers due to metabolic changes associated with the particular cancer (Figure 1).30,32,33
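The fingerprint-and-classifier pipeline described above can be sketched with a toy nearest-centroid rule. The response vectors below are hypothetical, and the reviewed studies used a variety of statistical and machine learning classifiers rather than this simple rule; the sketch only illustrates the idea of classifying a collective sensor-array "fingerprint."

```python
# Sketch: classifying eNose sensor-array "fingerprints" (hypothetical data).
# Each sample is a vector of responses, one per sensor in the array; a
# nearest-centroid rule stands in for the statistical/ML classifiers
# used in the reviewed studies.

def centroid(vectors):
    """Element-wise mean of a list of equal-length response vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def classify(sample, centroids):
    """Assign the sample to the class whose centroid is nearest (Euclidean)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(centroids, key=lambda label: dist(sample, centroids[label]))

# Hypothetical 4-sensor fingerprints for training.
train = {
    "cancer":  [[0.9, 0.2, 0.7, 0.4], [0.8, 0.3, 0.6, 0.5]],
    "control": [[0.2, 0.8, 0.1, 0.9], [0.3, 0.7, 0.2, 0.8]],
}
centroids = {label: centroid(vs) for label, vs in train.items()}

print(classify([0.85, 0.25, 0.65, 0.45], centroids))  # → cancer
```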
Comparison to chemical analyzers
Many studies have used traditional chemical analyzers, such as gas chromatography-mass spectrometry (GC-MS), proton transfer reaction-mass spectrometry (PTR-MS), field asymmetric ion mobility spectrometry (FAIMS), etc, for the detection and analysis of VOCs in biological samples.34–36 These methods involve a two-step process: first, the separation of compounds in the sample, and second, the identification of each compound based on its chemical properties.37
In contrast, rather than isolating and identifying individual compounds, eNoses respond to VOCs collectively, providing more of a general sense of the “smell” profile of the sample.38 This approach, although less granular, offers advantages in speed, simplicity, affordability, and portability of the device, making them better suited for use in clinical settings.
While studies utilizing chemical analyzers have been able to identify individual compounds that correlate with cancer, there is a lack of consensus about the specific compounds that should be used for cancer diagnosis. Gouzerh et al.33 reviewed 118 studies and found that 458 unique compounds were identified, but only 116 were found in more than one study.
Moreover, studies have reported conflicting associations for the same compounds.39 For example, Gouzerh et al.33 found that, of the five most commonly reported compounds correlated with cancers (hexanal, toluene, styrene, ethylbenzene, acetone), 30 studies reported increased concentrations in cancer patients, while 11 studies showed decreased levels.
These inconsistencies reflect the complexity of cancer-related metabolic changes, which involve broader patterns rather than individual compounds. Thus, eNose technologies, which capture these complex chemical patterns, may offer a more effective approach for cancer screening.
Sensor array technologies
While all studies in this review used eNose systems, the sensor arrays varied substantially in both the numbers and types of sensors. These sensor types differ in detection mechanism and detection limit, range of measurement, stability, and cost. A summary of common sensor types is in Table 1.40–46
These sensor technologies are integrated into both custom-built arrays and commercially available eNose systems, such as the Cyranose 320, aeoNose, PEN3, and SpiroNose (Table S1).
While some sensor arrays include multiple sensor types, most use a single sensor type but incorporate multiple sensor models, each with different chemical selectivities, ie, that respond to different sets of chemicals.
Methods
Search strategy
To compile the studies included in this review, we conducted a search using PubMed and examined the bibliographies of existing literature. We did not limit our search by year.
Inclusion and exclusion criteria
Exclusion criteria were applied as follows:
Studies that did not use a sensor array.
Studies that did not use unaltered and non-invasively extracted human samples, including studies that analyzed cell lines.
Studies that did not classify a cancer group against a non-cancer group.
Studies that employed compound separation techniques or used sensor arrays designed to identify specific compounds.
Studies that lacked sufficient methodological detail.
Studies that did not report empirical results for a final classification model.
Studies that analyzed samples in liquid form, ie, using an eTongue rather than an eNose.
Review methodology
Each study, including its methodology and results, was read by one of the authors. We synthesized each study by describing the cancer type studied; the type of sample analyzed (eg, breath or urine); the number and types of sensors in the eNose; the sizes of the training and test datasets and the type of model validation performed; the modeling techniques employed; and the results metrics.
Treatment of multiple analyses
In cases where the cancer group was classified against different control groups, we report all classifications in separate rows. In cases where separate classifications were performed upon subgroups, such as smokers and non-smokers, we only include the full classification if one exists; otherwise, we report each subclassification in separate rows.
For studies that performed multiclass classification between cancer and two separate control groups, such as cancer patients vs. healthy controls vs. patients with benign diseases, we calculate the metrics after pooling the control groups.
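This pooling step can be sketched as follows; the three-class counts below are hypothetical and serve only to show how the two control groups are merged before binary metrics are computed.

```python
# Sketch: pooling two control groups from a hypothetical three-class
# setting (cancer vs. healthy vs. benign disease) before computing
# binary metrics, as described above. Counts are illustrative.

def pooled_binary_metrics(cancer_correct, cancer_total,
                          controls_correct, controls_total):
    """Treat every non-cancer class as a single 'control' group."""
    sensitivity = cancer_correct / cancer_total
    specificity = controls_correct / controls_total
    return sensitivity, specificity

# Hypothetical: 44/50 cancer cases detected; healthy 35/40 and
# benign 28/35 correctly called non-cancer.
sens, spec = pooled_binary_metrics(44, 50, 35 + 28, 40 + 35)
print(round(sens, 3), round(spec, 3))  # → 0.88 0.84
```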
Training and test sets
The training set refers to any samples used during the model development at any point, including cross-validation sets. The test set refers to external samples not involved in model training or validation and only analyzed when the training process is completed.47 Sample sizes were determined based on these definitions regardless of the terminology employed in the study.
Metrics
We report on three metrics:
Sensitivity, defined as True Positives/Total Positives.
Specificity, defined as True Negatives/Total Negatives.
Accuracy, defined as (True Positive + True Negatives)/(Total Positives + Total Negatives).
If a confusion matrix or misclassifications were listed in the text, we calculated these metrics directly. Otherwise, we used the reported values.
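Under these definitions, the three metrics follow directly from a confusion matrix; the counts in this sketch are illustrative.

```python
# Sketch: computing the three review metrics from a confusion matrix,
# following the definitions above (counts are illustrative).

def metrics(tp, fn, tn, fp):
    sensitivity = tp / (tp + fn)            # True Positives / Total Positives
    specificity = tn / (tn + fp)            # True Negatives / Total Negatives
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

sens, spec, acc = metrics(tp=43, fn=7, tn=38, fp=12)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
# → sensitivity=0.860 specificity=0.760 accuracy=0.810
```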
Model methods
In the Model Methods column, we detail the feature extraction and classification techniques used. If an ensemble was used, all constituent models are listed. When studies tested multiple methods, we report only the method highlighted in the abstract or the one with the highest accuracy.
Sensor array
The Sensor Array column describes the device utilized in the study. We only list sensors that directly detect VOCs and omit temperature and RH sensors, as well as sensors placed outside the main detection chamber to monitor external conditions. For commercial devices, we name the device directly.
Quality assessment
To evaluate the methodological quality of the included studies, we used the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool.48 Sensitivity analyses were conducted by re-running meta-regressions excluding studies with ≥2 domains rated as “High” risk of bias.
Meta-analysis methodology
We conducted a meta-analysis on studies that included an independent test set. In cases where a study reported multiple independent classification analyses (eg, separate cancer types or distinct control groups), we split them into separate data points.
We performed three sets of analyses:
Descriptive visualizations.
Stratified meta-analyses by subgroup.
Bivariate meta-regressions with single moderators.
Both the stratified meta-analyses and meta-regressions utilized a bivariate random-effects model with restricted maximum likelihood (REML) estimation to predict sensitivity and specificity jointly while accounting for correlation.49,50 Proportions were logit-transformed prior to analysis.
For stratified meta-analyses, we grouped studies by categorical variables and estimated pooled sensitivity and specificity within each subgroup.
For meta-regressions, we modeled each moderator (categorical or numeric) as a covariate, including an interaction term with outcome type (sensitivity vs. specificity) to allow for differential effects. Between-study heterogeneity was captured using an unstructured covariance matrix.51 We assessed significance with the QM test52 and applied Benjamini–Hochberg false discovery rate (FDR) correction to adjust for multiple comparisons.53 Due to the limited total sample size, regressions with multiple covariates were not performed.
Categorical moderators with fewer than three data points per subgroup were excluded to reduce instability. For numeric moderators, we also visually inspected regression prediction plots for nonlinear trends or clusters, and excluded outliers based on multiple influence diagnostics: Cook’s distance, studentized residuals, and leverage.54
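As a toy illustration of the logit transform mentioned above, the following sketch pools study-level proportions on the logit scale with simple inverse-variance weights. It deliberately simplifies: the review's actual model is a bivariate random-effects REML fit, which this fixed-effect example does not reproduce, and all counts are hypothetical.

```python
import math

# Sketch: the logit transform applied to study-level proportions before
# pooling. A simple inverse-variance (fixed-effect) pool on the logit
# scale is shown for illustration only; the review's actual model is a
# bivariate random-effects REML fit, which this does not reproduce.

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

def pool_logit(events, totals):
    """Inverse-variance pooling of proportions on the logit scale."""
    ys, ws = [], []
    for k, n in zip(events, totals):
        ys.append(logit(k / n))
        ws.append(1 / (1 / k + 1 / (n - k)))  # var(logit p) ≈ 1/k + 1/(n-k)
    pooled = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
    return inv_logit(pooled)

# Hypothetical per-study true positives and cancer-group sizes.
print(round(pool_logit([40, 25, 52], [48, 30, 60]), 3))  # → 0.847
```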
Reporting standards
This process abided by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.55 This review was registered on the Open Science Framework (OSF) registry (DOI: https://doi.org/10.17605/OSF.IO/MHBWC).
Results
Overview
After removing duplicates, 1 041 studies were initially identified, 825 studies were removed during screening, 3 studies could not be accessed, and 109 studies were excluded based on the eligibility criteria, leaving 104 studies for inclusion in this review. Of those, 37 studies included independent test sets and were thus included in the meta-analysis. (See Figure S1 for the PRISMA flowchart summarizing this process.)
Table 2 provides a summary of the reviewed papers included in the meta-analysis; additional studies included in the systematic review can be found in Table S2. Studies have been organized by cancer type and year.
Several of the 37 studies included more than one independent classification analysis (eg, different cancer types, sample types, or control groups), resulting in 46 total data points used in the meta-analysis. The total sample sizes were 1 365 for the cancer groups and 2 249 for the control groups. Across the 37 studies, there was a wide range of false positive and false negative results (Table S3). Notably, however, the resulting sensitivities and specificities were generally good.
Across all 46 data points, the pooled estimates from the meta-analysis were:
Sensitivity: 85.9% (95% CI: 82.3-88.9%)
Specificity: 83.6% (95% CI: 78.6-87.7%)
Exploratory analyses are shown in Figure S2. Pooled metrics from stratified analyses are presented by cancer type (Figure 2), sample type (Figure S3), and sensor type (Figure S4). The meta-regression results are summarized in Table 3. Results for individual studies are presented in Table S3 and Figure S7.
Cancer type
Lung cancer was the most studied type, represented in 20 of the 37 studies. Studies of lung cancer had a pooled sensitivity of 87.3% (95% CI: 83.1-90.6%) and specificity of 80.4% (95% CI: 72.8-86.3%). The next most commonly studied type, gastric cancer, achieved a sensitivity of 86.9% (95% CI: 75.6-93.4%) and specificity of 91.7% (95% CI: 81.8-96.5%), followed by ovarian cancer with a sensitivity of 86.4% (95% CI: 59.9-96.4%) and specificity of 91.6% (95% CI: 63.8-98.5%). All cancer types studied achieved pooled sensitivities and specificities above 70%, with most around 85% (Figure 2).
Bivariate meta-regression on cancer type (k = 41) found no significant difference in sensitivity or specificity between cancer types (QM(df = 9) = 11.26, PFDR = 0.47).
Sample type
The majority of studies analyzed breath samples (32 studies), and the rest urine (5). Pooled sensitivity and specificity were similar across sample types: 86.0% (95% CI: 82.1-89.2%) and 84.9% (95% CI: 79.5-89.1%) for breath, and 85.7% (95% CI: 78.6-90.7%) and 77.5% (95% CI: 69.1-84.1%) for urine. Bivariate meta-regression (k = 47) found no significant difference by sample type (QM(df = 3) = 1.64, PFDR = 0.73).
Sensor count
The number of sensors per device ranged from 3 to 56, with a mean of 13.5 and standard deviation of 13.1.
Bivariate meta-regression across all studies (k = 46) found no significant association between sensor count and either sensitivity or specificity (QM(df = 3) = 2.79, PFDR = 0.55). However, the prediction plot (Figure S5) revealed two distinct clusters: one with 3-15 sensors showing a visible positive trend, and another starting at 32 sensors (primarily using the Cyranose 320), where no relationship was apparent.
To investigate further, we repeated the meta-regression using only studies with ≤15 sensors. One extreme outlier (Lee et al.72) was excluded based on influence diagnostics. The final model (k = 34) showed a significant association (QM(df = 3) = 23.28, PFDR < 0.001), with improved model fit relative to the full dataset (AIC = 198.8 vs. 288.3; BIC = 213.9 vs. 305.6). Between-study heterogeneity was moderate for sensitivity (τ2 = 0.221) and higher for specificity (τ2 = 0.710), with a moderate negative correlation between them (ρ = -0.469).
Sensor count was significantly associated with sensitivity: Each additional sensor in the sensor array increased the logit-transformed sensitivity by 0.183 (95% CI: 0.065–0.301, P < .01). No significant effect was observed for specificity (between-measure term: P = .50; interaction term: P = .71).
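To illustrate what a logit-scale slope of this size implies on the probability scale, the sketch below back-transforms predictions with the inverse logit. Only the slope of 0.183 comes from the model above; the intercept is hypothetical, chosen so that an 8-sensor array lands near the pooled sensitivity of roughly 86%.

```python
import math

# Sketch: what a logit-scale slope of 0.183 per sensor implies for
# predicted sensitivity. The intercept below is hypothetical (chosen so
# that an 8-sensor array lands near the pooled sensitivity of ~86%);
# the coefficient 0.183 is the one reported in the meta-regression.

def predicted_sensitivity(n_sensors, intercept=0.35, slope=0.183):
    x = intercept + slope * n_sensors
    return 1 / (1 + math.exp(-x))   # inverse logit

for n in (4, 8, 12):
    print(n, round(predicted_sensitivity(n), 3))
# → 4 0.747 / 8 0.86 / 12 0.927
```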
Sensor type
Six sensor types were represented across the dataset. However, only the three most common types—MOS (22 analyses), nanomaterial-based sensors (13), and polymer-based sensors (8)—had sufficient representation to be included in the meta-analysis. Bivariate meta-regression (k = 43) showed no significant association between sensor type and sensitivity or specificity (QM(df = 5) = 4.98, PFDR = 0.55).
Quality assessment
QUADAS-2 assessments are summarized in Table S4 and Figure S6. The most common high-risk domain was Patient Selection, with 63.0% of analyses rated as “High.” Sensitivity analyses of meta-regressions, in which studies with ≥2 “High” risk of bias ratings were excluded, did not yield materially different results.
Overview
After removing duplicates, 1 041 studies were initially identified, 825 studies were removed during screening, 3 studies could not be accessed, and 109 studies were excluded based on the eligibility criteria, leaving 104 studies for inclusion in this review. Of those, 37 studies included independent test sets and were thus included in the meta-analysis. (See Figure S1 for the PRISMA flowchart summarizing this process.)
Table 2 provides a summary of the reviewed papers included in the meta-analysis; additional studies included in the systematic review can be found in Table S2. Studies have been organized by cancer type and year.
Several of the 37 studies included more than one independent classification analysis (eg, different cancer types, sample types, or control groups), resulting in 46 total data points used in the meta-analysis. The total sample sizes were 1 365 for the cancer groups and 2 249 for the control groups. Across the 37 studies, there was a wide range of false positive and false negative results (Table S3). What is to be noted, however, is that the resultant sensitivities and specificities were generally good.
Across all 46 data points, the pooled estimates from the meta-analysis were:
Sensitivity: 85.9% (95% CI: 82.3-88.9%)
Specificity: 83.6% (95% CI: 78.6-87.7%)
Exploratory analyses are shown in Figure S2. Pooled metrics from stratified analyses are presented by cancer type (Figure 2), sample type (Figure S3), and sensor type (Figure S4). The meta-regression results are summarized in Table 3. Results for individual studies are presented in Table S3 and Figure S7.
Cancer type
Lung cancer was the most studied type, represented in 20 of the 37 studies. Studies of lung cancer had a pooled sensitivity of 87.3% (95% CI: 83.1-90.6%) and specificity of 80.4% (95% CI: 72.8-86.3%). The next most commonly studied type, gastric cancer, achieved a sensitivity of 86.9% (95% CI: 75.6-93.4%) and specificity of 91.7% (95% CI: 81.8-96.5%), followed by ovarian cancer with a sensitivity of 86.4% (95% CI: 59.9-96.4%) and specificity of 91.6% (95% CI: 63.8-98.5%). All cancer types studied achieved pooled sensitivities and specificities above 70%, with most around 85% (Figure 2).
Bivariate meta-regression on cancer type (k = 41) found no significant difference in sensitivity or specificity between cancer types (QM(df = 9) = 11.26, PFDR = 0.47).
Sample type
The majority of studies analyzed breath samples (32 studies), and the rest urine (5). Pooled sensitivity and specificity were similar across sample types: 86.0% (95% CI: 82.1-89.2%) and 84.9% (95% CI: 79.5-89.1%) for breath, and 85.7% (95% CI: 78.6-90.7%) and 77.5% (95% CI: 69.1-84.1%) for urine. Bivariate meta-regression (k = 47) found no significant difference by sample type (QM(df = 3) = 1.64, PFDR = 0.73).
Sensor count
The number of sensors per device ranged from 3 to 56, with a mean of 13.5 and standard deviation of 13.1.
Bivariate meta-regression across all studies (k = 46) found no significant association between sensor count and either sensitivity or specificity (QM(df = 3) = 2.79, PFDR = 0.55). However, the prediction plot (Figure S5) revealed two distinct clusters: one with 3-15 sensors showing a visible positive trend, and another starting at 32 sensors (primarily using the Cyranose 320), where no relationship was apparent.
To investigate further, we repeated the meta-regression using only studies with ≤15 sensors. One extreme outlier (Lee et al.72) was excluded based on influence diagnostics. The final model (k = 34) showed a significant association (QM(df = 3) = 23.28, PFDR < 0.001), with improved model fit relative to the full dataset (AIC = 198.8 vs. 288.3; BIC = 213.9 vs. 305.6). Between-study heterogeneity was moderate for sensitivity (τ2 = 0.221) and higher for specificity (τ2 = 0.710), with a moderate negative correlation between them (ρ = -0.469).
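The AIC and BIC figures used above to compare the restricted (≤15 sensors) model with the full-dataset model follow the standard definitions. A sketch, assuming only the textbook formulas (the log-likelihood value in the test is hypothetical):

```python
import math

def aic(log_lik, n_params):
    # Akaike information criterion: parameter penalty is constant in n
    return 2 * n_params - 2 * log_lik

def bic(log_lik, n_params, n_obs):
    # Bayesian information criterion: parameter penalty grows with sample size
    return math.log(n_obs) * n_params - 2 * log_lik
```

Lower values indicate better fit after penalizing model complexity, which is why the drop from 288.3 to 198.8 (AIC) favors the restricted model.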
Sensor count was significantly associated with sensitivity: Each additional sensor in the sensor array increased the logit-transformed sensitivity by 0.183 (95% CI: 0.065–0.301, P < .01). No significant effect was observed for specificity (between-measure term: P = .50; interaction term: P = .71).
Sensor type
Six sensor types were represented across the dataset. However, only the three most common types—MOS (22 analyses), nanomaterial-based sensors (13), and polymer-based sensors (8)—had sufficient representation to be included in the meta-analysis. Bivariate meta-regression (k = 43) showed no significant association between sensor type and sensitivity or specificity (QM(df = 5) = 4.98, PFDR = 0.55).
Quality assessment
QUADAS-2 assessments are summarized in Table S4 and Figure S6. The most common high-risk domain was Patient Selection, with 63.0% of analyses rated as “High.” Sensitivity analyses of meta-regressions, in which studies with ≥2 “High” risk of bias ratings were excluded, did not yield materially different results.
Discussion
Cancer type and sample type
eNose technology proved applicable across a broad range of cancer types, with all types yielding pooled sensitivities and specificities above 70% and most around 85%. This suggests that, despite the nascent state of this area of research, there is strong potential for this technology—and even the same device—to develop into a screening tool for numerous cancer types.
Lung cancer dominates the literature. Several authors cited as motivation the limitations of the current non-invasive diagnostic standard, low-dose computed tomography (LDCT).93,94 While LDCT has relatively high sensitivity (93.5%) and specificity (73.4%), it is costly, and its high false-positive rate (PPV = 3.8%) often leads to unnecessary follow-up procedures.95,96 Given that lung cancer is the leading cause of cancer-related death,97 a low-cost screening test with diagnostic performance similar to or better than that of LDCT could have significant public health impact.
Cancer type often tracks with sample type: all lung cancer studies used breath, and all prostate cancer and bladder cancer studies used urine. This pattern likely reflects the assumption that sample types anatomically closer to the tumor site may yield stronger or more specific VOC signals. That assumption may be reasonable for bladder cancer, where the tumor typically forms on the inner lining in direct contact with stored urine, and possibly for lung cancer, where most tumors are located near the airways.
In general, cancer alters cellular metabolism, producing VOCs that enter the bloodstream and are exhaled or excreted. This systemic mechanism implies that any sample type can carry a cancer-specific VOC fingerprint, regardless of tumor location. In support of this, Mohamed et al.98 analyzed lung cancer using urine, breath, and blood samples and found very similar results across all three sample types. In a similar vein, our meta-analysis found no significant difference in sensitivity or specificity between breath and urine samples. These findings suggest that both sample types should be considered viable for VOC-based cancer detection, regardless of cancer type.
Sensor count
Among studies using sensor arrays with 15 or fewer sensors, our meta-regression found a significant positive relationship between the number of sensors and sensitivity. This suggests that increasing the sensor count enhances detection performance—likely by improving the system’s capability in capturing complex VOC patterns. Because the model was estimated on the logit scale, effect size varies depending on baseline sensitivity. For illustration, increasing from 9 to 10 sensors raises the predicted sensitivity by 1.18 points, and from 10 to 11 by 1.01 points (Figure 2B).
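Because the 0.183 coefficient applies on the logit scale, the percentage-point gain from each added sensor depends on the baseline sensitivity, which is why the illustrative gains above shrink as sensitivity rises. A sketch of the back-transformation (the baseline value passed in is illustrative; the fitted intercept of the meta-regression is not reproduced here):

```python
import math

SLOPE = 0.183  # logit-scale increase in sensitivity per additional sensor

def expit(x):
    # Inverse logit: maps the logit scale back to a probability
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

def predicted_sensitivity(baseline, extra_sensors):
    """Back-transform the logit-scale sensor-count effect.

    baseline is an assumed sensitivity at the current sensor count,
    not a value estimated in this meta-analysis."""
    return expit(logit(baseline) + SLOPE * extra_sensors)
```

For example, from an assumed baseline of 0.85, one extra sensor yields a predicted sensitivity of about 0.87; starting from 0.95, the same logit shift produces a smaller point gain, reflecting the flattening of the logistic curve near its extremes.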
Sensor type
Sensor type was not significantly associated with sensitivity or specificity in the meta-regression. However, this analysis was limited to only three sensor categories—metal-oxide semiconductor (MOS), nanomaterial-based, and polymer-based—due to insufficient representation of other types. Notably, all three fall under the broader class of chemiresistive sensors, which constrains the generalizability of this finding.
Of the studies reviewed, 35 of 37 (94.6%) included chemiresistive sensors in their sensor array, likely due to their low cost and wide availability.99 Because the role of the sensor array is to detect a VOC “fingerprint” rather than to identify individual compounds, more precise but costlier sensors such as QCM or infrared sensors may offer little advantage and may undermine the economic feasibility of deploying eNoses in clinical settings.
A key drawback of chemiresistive sensors, however, is sensor drift: a gradual decline in sensor accuracy over time to which more stable mass-based sensors are less prone.99 Bax et al.,82 who utilized MOS sensors, noted that sensor drift poses the “primary obstacle to the [eNose] diffusion for long-term applications.”
While many studies employed basic processing methods—eg, baseline correction, normalization, scaling, and replicate averaging56,61—only three applied more advanced techniques. Bax et al.82 and Taverna et al.26 applied orthogonal signal correction (OSC) and Lee et al.72 applied semi-supervised domain generalization (SSDG) and noise-shift augmentation (NSA); all three reported improved performance and reduced drift. Broader testing and adoption of these techniques or others, such as periodic recalibration or other domain adaptation algorithms, will be necessary before eNose systems can be considered viable for widespread clinical use.
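The basic processing steps mentioned above can be illustrated with a minimal sketch. This is a generic example of baseline correction followed by unit-norm scaling, not the protocol of any reviewed study; the function name and the choice of Euclidean normalization are our own:

```python
def preprocess(response, baseline):
    """Basic eNose preprocessing sketch.

    response: raw steady-state readings, one per sensor in the array
    baseline: each sensor's pre-exposure (clean-air) reading
    """
    # Baseline correction: subtract each sensor's clean-air reading
    corrected = [r - b for r, b in zip(response, baseline)]
    # Normalization: scale the array response to unit Euclidean norm so that
    # classification depends on the response pattern, not its overall magnitude
    norm = sum(c * c for c in corrected) ** 0.5
    return [c / norm for c in corrected] if norm else corrected
```

Drift-correction methods such as OSC operate on top of steps like these, removing signal components uncorrelated with class labels across measurement sessions.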
Study design and sample sizes
A major constraint in this body of research is the relatively small sample sizes. Across the 46 analyses, the average number of patients used for model training was only 120.7. This limits the application of deep neural networks, which often require hundreds or thousands of samples to perform reliably. Neural networks—which are particularly well-suited for detecting complex VOC patterns—were used in only 11 of the 37 studies (29.7%).
Only 37 of the 104 (35.6%) papers identified tested their models on a held-out test set, with an average test dataset size of just 78.6. These small test sets weaken the strength of validation and are another significant limitation of our meta-analysis.
Moreover, 31 of the 37 studies included in the meta-analysis utilized a case-control study design, with separate, often non-random, recruitment of cancer patients and control subjects. Four studies employed prospective recruitment—ie, the enrollment of study participants before cancer status is known (eg, symptomatic patients or population-based cohorts)—a design which more closely reflects real-world scenarios. Two studies used a mixed approach. For this reason, the majority of analyses were assessed as exhibiting “High” risk of bias in the Patient Selection domain. This bias should be avoided by enrolling a representative clinical population consecutively or randomly at the point of care, prior to diagnostic confirmation.
These considerations reflect the experimental nature of the field: Studies are often early-stage feasibility efforts, piloting custom-built sensor arrays or new analytical pipelines. While these studies help demonstrate proof-of-concept, larger datasets and prospective study designs are needed for more rigorous validation and to advance eNose technology to clinical use.
Methodological variation
Studies exhibited wide methodological variation, much of which was not captured in our summary tables or meta-analyses.
Patient instructions varied. Some studies specified collecting early morning samples, when VOC concentration is generally higher and not influenced by recent food consumption.81,91 For this reason, about half the studies also instructed participants to fast for varying amounts of time before sample collection.59,71 In studies using breath samples, participants were sometimes instructed not to smoke, use mouthwash, wear perfume, drink coffee or alcohol, or take medications before collection.43,58,66
Sample collection protocols differed across multiple dimensions. Some studies analyzed fresh samples;63,92 others froze urine samples88,91 or adsorbed breath samples78,87 for later analysis. Sampling bags and sorbent tubes for breath samples also varied in type and brand.
Analytical procedures for sample preparation and measurement were equally diverse. Urine samples were analyzed at room temperature in some studies,91 while others heated samples to various temperatures.26,92 For breath, all studies analyzed samples at room temperature, except for Mazzone et al.,57 who incubated the samples at body temperature. Some studies also specified maintaining a constant relative humidity in the sensing chamber.73,82
Headspace preparation techniques differed in implementation. Some studies circulated air or an inert gas over the sample to push VOCs into the sensor chamber,57,81 while others had patients breathe continuously into the eNose device.62,86 In certain cases, VOCs were first trapped on adsorbent materials and later released via thermal desorption, allowing pre-concentration of trace compounds.64,78 Specific methods varied in terms of carrier gas, timing, flow rate, desorption temperature, and sensor integration.
Studies varied in methods for data preprocessing, feature extraction, and classification techniques, as noted in the summary table. Furthermore, among the 104 studies reviewed overall, six studies70,75,94,100–102 tested their classification models with and without the inclusion of clinical parameters, five of which found that including these variables increased the performance of their model relative to using eNose data alone. This suggests that other studies could improve their results by incorporating clinical parameters, and that future studies should seek to secure additional data that could be used alongside sensor data for classification.
Evaluation of methodologies
A small number of studies tested multiple methodologies explicitly: Capelli et al.27 compared conditioning temperatures for urine samples at 23 °C, 37 °C, 50 °C, and 60 °C, finding that 60 °C yielded the best classification performance without risking protein denaturation. Asimakopoulos et al.103 and Capelli et al.27 tested multiple portions of the urine stream and found that the initial portion improved detection efficacy. However, neither of these studies employed independent test sets. While these comparative studies are valuable, more studies comparing methodologies are needed to help standardize these approaches.
Few studies have examined which factors affect sensitivity and specificity. Amal et al.84 assessed the effect of overnight fasting on the performance of their model and found no difference, indicating that fasting may not improve results. Research is needed to determine whether abstention from smoking or alcohol, or other test conditions, alters these metrics. In addition, it is unknown how oncologic interventions—eg, surgery or chemotherapy—prior to the eNose test affect results.
Limitations
This meta-analysis had several limitations. First, as discussed above, relatively few studies used independent test sets; those that did typically had small sample sizes, limiting the strength of our meta-analyses. Second, substantial methodological heterogeneity across studies—including differences in sample collection, preparation, and analysis—introduces confounding that could not be controlled given the limited number of studies in each subgroup. Third, in order to preserve statistical power, we included multiple data points from studies that conducted several classification tasks, which may have introduced cross-sample correlation. Fourth, our meta-analysis does not distinguish between (1) different subtypes of a cancer, (2) whether healthy individuals or patients with benign conditions were used as controls, and (3) differences in patient recruitment, which could introduce selection bias.
Conclusion
This review highlights both the promise and limitations of eNose technology in cancer detection. Despite wide methodological variability, high pooled sensitivity and specificity were found across cancer types, suggesting strong potential for future non-invasive diagnostics.
Our meta-analysis found that increasing the number of sensors—up to 15—was associated with improved model sensitivity. However, other meta-regressions yielded non-significant results.
Advancing eNose technology will require several key improvements: standardization of sampling and analysis protocols, robust sensor-drift compensation, integration of clinical variables, larger and more diverse datasets, and prospective study designs. Ultimately, as the field matures, rigorous validation and methodologic standardization will be essential to move eNoses from experimental tools to clinically reliable diagnostics with the potential to transform early cancer detection.
Despite the advantages of this technology that we have highlighted in this review, a lack of widespread familiarity persists. We hope this publication will make the cancer research community more aware of the potential for eNose technology.
Supplementary Material
oyag016_Supplementary_Data