
Label Noise in Pathological Segmentation Is Overlooked, Leading to Potential Overestimation of Artificial Intelligence.

Cancer Science, 2026, Vol. 117(3), pp. 852-863

Harada K, Nomura Y, Komura D, Ishikawa S, Sakashita S


PMID: 41433191
DOI: 10.1111/cas.70288

Abstract

Artificial intelligence (AI) has transformed medical imaging, notably in radiology and endoscopy. Semantic segmentation, a pixel-level technique crucial for delineating pathological features, has become pivotal in digital pathology. Pathology segmentation AI models are often trained using annotations generated by pathologists. Despite the meticulous care typically exercised, pathologist-generated annotations often contain label noise, whose types and effects on model training remain underexplored. This study combined a survey of public datasets with the synthesis of artificial label noise to evaluate its effects on pathology segmentation models. Using publicly available datasets and a breast cancer semantic segmentation dataset, modules were developed to simulate four types of artificial label noise at varying intensity levels. These datasets were used to train deep learning models, and their performance was evaluated. The results indicated that models were highly susceptible to overfitting label noise, particularly boundary-dependent noise such as dilation and shrinkage. Discrepancies were identified between apparent performance scores obtained under real-world conditions and true performance scores derived using clean test data. This overestimation risk was most pronounced for datasets containing boundary-altering noise. Moreover, random combinations of noise types further degraded generalization. This study underscores the critical importance of addressing label noise in pathology datasets. It is proposed that future efforts focus on developing standardized methods for quantifying and mitigating label noise, along with creating robust benchmarks using noise-inclusive datasets. Enhancing annotation quality and addressing label noise can improve the reliability and generalizability of AI in pathology, facilitating broader clinical adoption.


1 Introduction

The advent of artificial intelligence (AI) in healthcare has significantly transformed numerous medical fields. Recent advancements in radiology and endoscopy have notably enabled the integration of AI‐based applications into clinical practice [1]. In pathology, AI development for both clinical and research purposes is rapidly gaining attention [2, 3, 4]. Among the critical AI techniques employed in pathology is semantic segmentation, a pixel‐level method widely utilized to delineate lesions and other pathological features in medical images [5, 6]. Semantic segmentation is pivotal in digital pathology, as it facilitates the precise identification of tissue structures, cellular components, and pathological features in histological images. Detailed segmentation, such as detecting microscopic tumor cells or identifying vascular invasion, is essential for tasks where errors are intolerable. Recent progress in phenotypic and functional research on cellular and tissue structures, including analyses of protein expression, distribution, and complex tumor characteristics, depends on generating precise cell‐level segmentation data [7].
The training of pathology segmentation AI models often relies on annotations made by pathologists and data derived from immunohistochemical staining—both requiring considerable effort to produce [8, 9, 10, 11, 12, 13]. However, despite the meticulous care taken during the annotation process, these annotations frequently contain variability and errors, commonly referred to as label noise [14]. Label noise in pathology arises from several sources. First, the inherent variability in human expert annotations contributes significantly to this noise [15]. Even with common expertise, different pathologists may produce varying segmentations for the same image due to subjective interpretation, fatigue, or differing levels of experience. Second, the complexity of histological images—characterized by heterogeneous textures, overlapping structures, and subtle morphological changes—can create ambiguous regions that are difficult to annotate consistently. Additionally, manual annotation is time-consuming and labor-intensive, often leading to rushed or incomplete annotations that exacerbate label noise. However, little information is currently available on how label noise in pathology segmentation is characterized. In this study, we first evaluated public pathological segmentation databases to assess the presence of label noise.
The impact of label noise on the performance of segmentation models has been recognized across various domains. For example, Rahaman et al. investigated the effects of artificial label noise on AI models designed to identify water areas in satellite images, exploring noise types such as Gaussian, translation/shift, rotation, and mirroring/flipping [16]. Similarly, studies using magnetic resonance images have examined the influence of label noise types, including random warp, constant shift, and random crop [17]. These studies demonstrated that label noise significantly degrades model performance. Additionally, Luo et al. reported that dilation and erosion in fluorescence microscopy binary mask images reduce model segmentation performance [18]. However, these studies lacked detailed analyses of the effects of different noise types and intensities or comparative evaluations of model generalizability. Moreover, limited validation of pathological segmentation data has been conducted, and the specific impact of label noise on pathological AI models remains unclear. Therefore, in addition to investigating label noise in public pathology data, this study aimed to elucidate its impact on the training of pathology segmentation AI and its effect on generalizability. To this end, a novel experimental approach was developed using artificial label noise. Specifically, various levels and types of artificial label noise were introduced into a dataset initially considered to have minimal label noise, and the resultant effects were evaluated by comparing model performance across noise conditions.

2 Materials and Methods

2.1 Datasets
We used four publicly available datasets to evaluate the presence of label noise: WSSS4LUAD [9], HuBMAP + HPA [10], CAMELYON [11], and GlaS [12] (Figure 1). All datasets used in this study were primarily prepared with hematoxylin and eosin (H&E) staining. An exception was the HuBMAP + HPA dataset: HuBMAP images were prepared using periodic acid–Schiff/H&E stains, and HPA samples were stained with antibodies visualized with 3,3′-diaminobenzidine and counterstained with hematoxylin.
The Breast Cancer Semantic Segmentation (BCSS) dataset was used to analyze artificial label noise [13]. This dataset consists of 151 histopathological images of breast cancer at 0.25 μm per pixel, annotated for 21 different tissue components, including tumor regions, stroma, and inflammatory cells. Details of this dataset are provided in Table S1. This dataset was selected specifically for its detailed annotations, which allowed us to introduce controlled levels and types of artificial label noise. Labels for “angioinvasion (GT_code = 19)” and “dcis (GT_code = 20)” were combined with “tumor (GT_code = 1)” due to the presence of tumor cells in both regions. The lesion labels were then binarized for model training and subsequent experiments.
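As a concrete illustration, the label consolidation described above can be sketched in a few lines. The GT codes follow the BCSS convention quoted in the text; the function name and the toy mask are our own, not part of the published pipeline.

```python
import numpy as np

# GT codes from the BCSS annotation scheme cited in the text.
TUMOR, ANGIOINVASION, DCIS = 1, 19, 20

def binarize_bcss_mask(mask: np.ndarray) -> np.ndarray:
    """Merge angioinvasion and DCIS labels into the tumor class, then
    binarize: tumor -> 1, all other tissue components -> 0."""
    merged = mask.copy()
    merged[np.isin(merged, (ANGIOINVASION, DCIS))] = TUMOR
    return (merged == TUMOR).astype(np.uint8)

# Toy 2 x 3 mask mixing tumor (1), angioinvasion (19), dcis (20), stroma (2).
demo = np.array([[0, 1, 19],
                 [20, 2, 0]])
binary = binarize_bcss_mask(demo)
```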

2.2 Artificial Label Noise
To assess the effects of different types of label noise on pathology segmentation, we implemented and applied four types of artificial label noise: dilation, shrinkage, omission, and hypothetical additive. These noise types were designed to simulate common annotation errors in histopathological images and were introduced into the BCSS dataset. Additional methods are available in Appendix S1.
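A minimal sketch of how such noise modules might look, using `scipy.ndimage` morphology on a binary mask. The iteration counts, drop probabilities, and level scaling here are illustrative assumptions, not the paper's exact parameters (those are detailed in Appendix S1).

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)

def add_label_noise(mask: np.ndarray, kind: str, level: int = 1) -> np.ndarray:
    """Apply one of the four noise types from the text to a binary mask.
    Magnitudes are illustrative assumptions, not the published settings."""
    if kind == "dilation":      # boundary pushed outward
        return ndimage.binary_dilation(mask, iterations=3 * level).astype(np.uint8)
    if kind == "shrinkage":     # boundary pulled inward
        return ndimage.binary_erosion(mask, iterations=3 * level).astype(np.uint8)
    if kind == "omission":      # whole annotated components randomly dropped
        labeled, n = ndimage.label(mask)
        keep = rng.random(n + 1) > 0.2 * level   # keep each component w.p. ~0.8
        keep[0] = False                          # background stays background
        return keep[labeled].astype(np.uint8)
    if kind == "additive":      # hypothetical: spurious foreground blobs
        seeds = rng.random(mask.shape) > 1 - 0.001 * level
        blobs = ndimage.binary_dilation(seeds, iterations=2)
        return (mask.astype(bool) | blobs).astype(np.uint8)
    raise ValueError(f"unknown noise type: {kind}")
```

Dilation and shrinkage alter every object boundary, omission removes whole objects, and additive injects false-positive regions, matching the four categories described above.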

2.3 Convolutional Neural Network (CNN)-Based Model
The training model was a neural network with an encoder‐decoder structure. The encoder backbone consisted of pre‐trained convolutional neural networks, such as ResNet50 [19] or EfficientNet‐b4 [20]. The decoder models included U‐Net [21], U‐Net++ [22], FPN [23], PSPNet [24], and DeepLabV3+ [25]. The number of output classes was set to 2 (background and lesion). The final layer used a sigmoid activation function to perform the binary segmentation task.
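For reference, one encoder-decoder combination of this kind could be assembled as below, assuming the `segmentation_models_pytorch` library; the paper does not state which implementation it used, so treat this purely as a configuration sketch.

```python
import segmentation_models_pytorch as smp

# U-Net decoder on a pre-trained EfficientNet-b4 encoder (one of the
# combinations listed in the text). The library choice and exact arguments
# are our assumptions, not the authors' stated implementation.
model = smp.Unet(
    encoder_name="efficientnet-b4",
    encoder_weights="imagenet",   # pre-trained backbone
    in_channels=3,
    classes=2,                    # background and lesion, as in the text
    activation="sigmoid",         # sigmoid output for the binary task
)
```

Swapping `smp.Unet` for `smp.UnetPlusPlus`, `smp.FPN`, `smp.PSPNet`, or `smp.DeepLabV3Plus`, and `encoder_name` for `"resnet50"`, would cover the ten encoder-decoder combinations described above.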

2.4 Statistical Methods
All statistical analyses were performed using Python or R software (version 4.2.1). All tests were two-sided, and statistical significance was set at p < 0.05. One-sample t-test, one-way analysis of variance (ANOVA), Tukey's honestly significant difference (HSD) test, Holm adjustment, chi-square test, Welch's t-test, Kruskal–Wallis test, or pairwise Mann–Whitney U test were used as appropriate. Groups with fewer than three observations were excluded from statistical testing due to insufficient sample size.
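As an illustration of the nonparametric comparisons listed above, the snippet below runs a Kruskal–Wallis test followed by pairwise two-sided Mann–Whitney U tests with Holm adjustment. The IoU scores are synthetic toy numbers, not results from the paper.

```python
import numpy as np
from scipy import stats

# Toy IoU scores for three noise intensity levels (synthetic numbers).
groups = {
    "level_1": [0.81, 0.79, 0.83, 0.80, 0.82],
    "level_2": [0.72, 0.70, 0.74, 0.71, 0.73],
    "level_3": [0.55, 0.58, 0.54, 0.57, 0.56],
}

# Omnibus Kruskal-Wallis test across the three groups.
h_stat, p_kw = stats.kruskal(*groups.values())

# Pairwise two-sided Mann-Whitney U tests.
names = list(groups)
pairs, pvals = [], []
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        _, p = stats.mannwhitneyu(groups[names[i]], groups[names[j]],
                                  alternative="two-sided")
        pairs.append((names[i], names[j]))
        pvals.append(p)

# Holm step-down adjustment: sort p-values ascending, multiply by
# decreasing factors, and enforce monotonicity of the adjusted values.
order = np.argsort(pvals)
adjusted = np.empty(len(pvals))
running_max = 0.0
for rank, idx in enumerate(order):
    running_max = max(running_max, (len(pvals) - rank) * pvals[idx])
    adjusted[idx] = min(1.0, running_max)
```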
We conducted two complementary web‐based surveys on label noise: (i) categorization of noise in public datasets, evaluated by 14 reviewers (seven board‐certified pathologists and seven non‐pathologists); and (ii) perceived realism of our artificial noise, evaluated by seven board‐certified pathologists. For data preparation, augmentation, and AI modeling, the BCSS dataset was tiled into 512 × 512 patches (n = 6207 from 151 cases) and split 80/10/10 (train/validation/test); training used standard augmentations (flips, shift/scale/rotate, brightness/contrast) with ImageNet normalization, and no augmentation on validation/test. We evaluated SegFormer and TransUNet under a shared training protocol (Adam, step‐decayed learning rate, early stopping, model selection by validation IoU). All implementations used Python (SciPy/Pillow/OpenCV) and PyTorch 2.0.1 on Ubuntu with NVIDIA A100 or GTX 1080 Ti. Detailed information can be found in Appendix S1.
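The tiling and case-level splitting described above can be sketched as follows; the edge-cropping rule, the fixed seed, and the helper names are our assumptions, chosen only to show why splitting at the case level prevents patches from one slide leaking across splits.

```python
import numpy as np

def tile_image(img: np.ndarray, size: int = 512):
    """Cut an H x W x C image into non-overlapping size x size patches,
    dropping ragged edges (one plausible tiling scheme; the paper's exact
    cropping rules are not specified here)."""
    h, w = img.shape[:2]
    return [img[y:y + size, x:x + size]
            for y in range(0, h - size + 1, size)
            for x in range(0, w - size + 1, size)]

def split_cases(case_ids, seed=0):
    """80/10/10 train/val/test split at the case level, so patches from
    one slide never leak across splits."""
    rng = np.random.default_rng(seed)
    ids = rng.permutation(case_ids)
    n = len(ids)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

patches = tile_image(np.zeros((1040, 1600, 3)), 512)       # 2 x 3 grid of patches
train, val, test = split_cases([f"case_{i}" for i in range(151)])
```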

3 Results

3.1 Public Pathological Segmentation Data Contain a Variety of Label Noise
A number of pathological datasets with segmented data are currently publicly available, each created with different research objectives that have led to variations in the target cells or tissues. The GlaS dataset, which provides colorectal epithelial segmentation annotated by pathologists, was found to contain lumen and necrotic tissues within the annotations (Figure 1A) [12]. The CAMELYON dataset, which includes detailed tumor cell annotations for lymph node metastases in breast cancer cases, divides the data into exhaustively annotated and nonexhaustively annotated categories based on annotation quality [11]. However, even within the exhaustively annotated cases, nontumor cells such as lymphocytes were observed to be incorrectly included in the positive regions (Figure 1B). The HuBMAP + HPA dataset, used in the “HuBMAP + HPA—Hacking the Human Body” competition on Kaggle, is a large-scale dataset that includes segmentation at the cellular level for five organs: kidney, prostate, colon, spleen, and lung [10]. Label noise was evident across the dataset, with issues such as unannotated structures, lumen contamination, and inconsistent annotation boundaries being identified (Figure 1C). The WSSS4LUAD dataset, focused on lung adenocarcinoma, includes tissue semantic segmentation data [9]. Upon closer inspection, instances of false-positive regions for tumor cells and inconsistencies in positive region annotations within the same image were observed (Figure 1D). We then surveyed seven pathologists and seven non-expert physicians regarding the presence of label noise types in these datasets. Across most of the datasets, the distribution of selected noise types differed significantly by group (Figure 1E), and in per-image analyses, expert within-group agreement was significantly lower than nonexpert agreement in some datasets (Figure 1F).
Additive noise was seldom endorsed: no image reached ≥ 50% expert consensus (Figure S1A); examples recognized only by a minority of pathologists are shown in Figure S1B. These results indicate systematic, dataset‐specific label noise, and divergent perceptions between experts and nonexperts, with substantial variability among experts.
These findings indicated that current publicly available datasets exhibit various types of label noise—including dilation, shrinkage, and omission—with varying degrees of noise present between the datasets and individual cases. Although the creation of segmentation data for pathology is inherently labor‐intensive and subject to certain limitations, the presence of label noise is a critical issue that must be carefully considered, as it can potentially impact AI training outcomes.

3.2 Generation of Artificial Label Noise
We generated artificial label noise to investigate its impact on AI performance. We developed modules to create three types of label noise—shrinkage, dilation, and omission—based on observations from publicly available datasets. In addition to these, we applied hypothetical additive noise, the counterpart of omission noise. Hypothetical additive noise was introduced to examine a wider spectrum of noise types, especially those leading to false positives. These four types of noise were added to the annotated BCSS dataset, which we determined to have one of the lowest levels of label noise among the publicly available datasets we investigated (Figure 2A; Figure S2A,B). Each type of label noise was introduced at three different levels, allowing for a range of noise intensities (Figure 2B,C; Figure S2C,D). In addition, to simulate conditions that more closely mimic real-world scenarios, we created a combined pattern of dilation and omission noise, as commonly observed in the pathological data in Figure 1. When comparing the datasets with and without artificial label noise, distinct patterns in precision, recall, Dice coefficient, IoU, and the ratio of noise area to nontumor (i.e., class 0) areas were observed for each type of label noise (Figure 2D; Figure S3A). In total, we generated 15 dataset variations, encompassing five types of label noise across three levels of intensity. We conducted a survey to assess the realism of these simulated noise types; expert ratings showed that dilation, shrinkage, and omission were generally regarded as realistic (median scores ≥ 4), whereas additive noise received significantly lower scores than shrinkage and omission (Figure S3B). These results support our characterization of additive noise as a hypothetical construct with limited biological plausibility.
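The per-mask comparison metrics mentioned above (precision, recall, Dice coefficient, and IoU, treating the clean annotation as reference) reduce to confusion-matrix counts. A dilated toy mask illustrates the characteristic signature of dilation noise: perfect recall with reduced precision. The function name and toy arrays are ours, not the paper's.

```python
import numpy as np

def mask_metrics(reference: np.ndarray, other: np.ndarray) -> dict:
    """Precision, recall, Dice, and IoU of `other` against `reference`,
    both binary masks with foreground = 1."""
    tp = np.logical_and(reference == 1, other == 1).sum()
    fp = np.logical_and(reference == 0, other == 1).sum()
    fn = np.logical_and(reference == 1, other == 0).sum()
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "dice": 2 * tp / (2 * tp + fp + fn),
        "iou": tp / (tp + fp + fn),
    }

clean = np.zeros((10, 10), np.uint8)
clean[3:7, 3:7] = 1            # 16-pixel clean region
dilated = np.zeros_like(clean)
dilated[2:8, 2:8] = 1          # same region widened by one pixel (36 pixels)
m = mask_metrics(clean, dilated)
```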

3.3 Label Noise Has a Significant Impact on Model Training
To assess the effects of label noise on AI model performance, we conducted training on datasets with clean labels and on 15 artificially generated noisy datasets. We trained 10 CNN-based models on each dataset, using combinations of two encoder architectures and five decoder types (Figure 3A). The performance scores of the models trained on clean labels are detailed in Table S2. We found that the combination of EfficientNet-b4 with either U-Net or DeepLabV3+ consistently yielded the highest accuracy. These two models demonstrated high predictive accuracy for the testing dataset when trained on clean labels (Figure 3B; Figure S4A,B). When the models were trained using datasets with various types of introduced label noise, the predictions of the models generally reflected the same types of noise that were present in the training data (Figure 3C); however, this was not the case for hypothetical additive noise (Figure S5). Additionally, a clear drop in model performance was observed as the severity of the label noise increased (Figure 3D). These results indicate that label noise is likely to have a significant effect on model learning.

3.4 Label Noise Is Often Underestimated in Pathology Segmentation
We evaluated model performance on the testing dataset using two approaches. The first involved testing data that contained noise at levels comparable to the training data, allowing us to calculate apparent real-world performance scores (Figure 4A; Figure S6A). The second approach used clean labels for the testing data to provide true, idealized performance scores (Figure 4B; Figure S6B). Our comparison revealed that, for models trained on datasets with dilation, shrinkage, or combined noise with both dilation and omission, the apparent scores were often inflated relative to the true ones—indicating an overestimation of model performance (Figure 4C,D; Figure S6C,D). By contrast, models trained with hypothetical additive noise showed higher true performance scores than apparent scores. For omission noise, there was little difference between the two scoring methods. These results indicate that the performance of such models may be overestimated if the dataset includes noise that alters the object's boundaries, such as dilation or shrinkage.
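The apparent-versus-true gap can be reproduced on a toy example: if a model has memorized dilation noise, scoring its prediction against equally dilated test labels looks perfect, while scoring against clean labels reveals the loss. All arrays below are hypothetical illustrations, not data from the study.

```python
import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two binary masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter / union) if union else 1.0

clean = np.zeros((20, 20), np.uint8)
clean[5:15, 5:15] = 1                      # clean ground truth (100 px)
noisy_labels = np.zeros_like(clean)
noisy_labels[4:16, 4:16] = 1               # dilated test labels (144 px)
prediction = noisy_labels.copy()           # model that has memorized the noise

apparent_score = iou(prediction, noisy_labels)   # scored against noisy labels
true_score = iou(prediction, clean)              # scored against clean labels
assert apparent_score > true_score               # apparent overestimates
```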
To examine whether these effects generalize across architecture families, we further evaluated two Transformer‐based models: SegFormer (fully Transformer‐based) and TransUNet (CNN encoder with a Transformer module). When trained on clean data, SegFormer achieved the highest performance among all models, whereas TransUNet performed comparably to the CNN‐based baselines (Figure 4E, Figure S6E). Under artificial label noise, however, both SegFormer and TransUNet exhibited apparent–true score discrepancies that mirrored those of CNN‐based models across noise types and intensities, indicating that label noise similarly degrades Transformer‐based architectures (Figure 4E,F, Figure S6E,F).

3.5 Random Label Noise Also Impairs Model Training
In real-world data, label noise is rarely uniform; rather, various types of noise are randomly distributed across the data, as seen in Figure 1. To simulate this, we generated datasets containing random combinations of label noise types and evaluated their impact on model performance (Figure 5A). Similar to the results observed when using uniform noise, model performance was found to decline as the intensity of random noise increased in the data (Figure 5B). Furthermore, a significant gap was observed between the apparent and true scores, indicating that models trained on datasets containing random label noise may also be prone to overestimation (Figure 5C). We observed a substantial drop in performance in another experiment wherein the datasets contained label noise of varying intensity levels, confirming that the presence of random noise severely affected model training (Figure 5D–F).
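One way to simulate such randomly distributed noise is to draw a noise type independently for each patch. The type set, probabilities, and magnitudes below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(42)

def random_patch_noise(mask: np.ndarray, level: int = 1) -> np.ndarray:
    """Corrupt one patch with a randomly drawn noise type ('none' included),
    mimicking noise that varies from patch to patch across a dataset."""
    kind = rng.choice(["dilation", "shrinkage", "omission", "none"])
    if kind == "dilation":
        return ndimage.binary_dilation(mask, iterations=2 * level).astype(np.uint8)
    if kind == "shrinkage":
        return ndimage.binary_erosion(mask, iterations=2 * level).astype(np.uint8)
    if kind == "omission":
        labeled, n = ndimage.label(mask)
        keep = np.ones(n + 1, dtype=bool)
        keep[0] = False                        # background stays background
        if n:
            keep[rng.integers(1, n + 1)] = False   # drop one component
        return keep[labeled].astype(np.uint8)
    return mask.copy()

# Apply independently to a batch of toy patches.
patch = np.zeros((64, 64), np.uint8)
patch[16:48, 16:48] = 1
noisy_batch = [random_patch_noise(patch) for _ in range(8)]
```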

4 Discussion

This study investigated the presence of different types of label noise in publicly available pathology segmentation datasets and demonstrated the significant impact of label noise on the training of AI models for pathology image segmentation. Specifically, random label noise caused by dilation, shrinkage, omission, or combinations thereof was found to significantly decrease model performance. Furthermore, by comparing true scores with apparent ones, this study is, to our knowledge, the first to highlight the potential of label noise to cause overestimation of model performance.
The label noise observed in existing datasets can be categorized into three primary types: dilation, shrinkage, and omission—each of which stems from factors such as the ambiguity of lesion boundaries and subjective biases in pathologist annotations. For instance, dilation may occur because of a tendency to mark boundaries more broadly for safety, whereas shrinkage may result from a cautious approach that focuses only on definitive regions. Omission, on the other hand, may stem from missed annotations of small or subtle lesions, potentially caused by fatigue, inattention, or lack of experience. In addition, we considered that hypothetical additive noise, which involves erroneously labeling nonlesion areas with lesion-like visual characteristics as actual lesions, may also occur. In this study, we developed a module to generate artificial label noise artifacts that mimicked these specific noise types, which are prevalent in pathology but underexplored in other domains [16, 17, 26, 27].
Using the 15 artificially noised datasets, we trained deep-learning models and compared their performance on our testing datasets. Figure 3C,D shows that the models were highly susceptible to overfitting label noise, which is consistent with the results of prior studies concerning label noise in image classification tasks [28]. Notably, Han et al. demonstrated that deep learning models initially prioritize learning from clean labels but eventually memorize noisy labels as training progresses, in a phenomenon known as the “memorization effect”.
We then evaluated the IoU scores of the model predictions on the testing datasets using both clean and noisy labels, referred to as the true and apparent scores, respectively. Note that the BCSS dataset was treated as the “clean” reference to enable comparisons between apparent and true scores; however, we recognize that these annotations are not completely noise-free, particularly in morphologically ambiguous regions, and that no single dataset can serve as an absolute gold standard in pathology. Because it is generally unknown whether annotation data represent the expected ground truth, apparent scores are typically calculated under standard experimental conditions. By contrast, true scores are only available in controlled experiments with synthetic noise. A comparison of the true scores revealed that the models maintained their performance best when exposed to noise types in the following order: hypothetical additive, omission, dilation, and shrinkage. Hypothetical additive noise, characterized by randomly introducing small, irregular class 1 regions into class 0 areas, likely avoided overfitting because the remaining class 0 regions were substantially larger than the introduced class 1 regions (Figure S3A). Another possible explanation for this may be related to label noise location. In our experiments, hypothetical additive artifacts were randomly generated in nontumor regions, which may not reflect clinically realistic false positives. In practice, false positives likely arise in regions morphologically similar to targets—an aspect our random injection does not capture and thus represents a limitation. Future simulations should (i) place artifacts in morphologically confusing regions (e.g., stroma, inflammation, necrosis), (ii) use image-based heuristics (e.g., texture similarity, color variance), and (iii) seed artifacts from pathologist-identified false positives to better approximate real-world behavior.
Accordingly, the apparent stability under hypothetical additive noise may reflect limitations of our simulation design rather than true model robustness and should be reevaluated using more pathology-informed designs. By contrast, our models trained with shrinkage and dilation noise exhibited lower performance, likely owing to the added noise in the indistinct mask boundary areas, which also exacerbated class imbalance. These findings demonstrate the varied characteristics of different types of label noise, as well as their distinct effects on model performance.
Moreover, the differences between the apparent and true scores varied with the noise type and intensity, underscoring the potential for overestimation of model performance when label noise is present. This discrepancy is particularly pronounced for boundary-dependent noise such as dilation, shrinkage, and random variations. Our findings also corroborate certain existing challenges that have been identified when generalizing AI models, wherein models trained on specific datasets often perform poorly when given new data [29, 30]. This lack of generalizability can be attributed to the limited diversity and external validation of datasets. Many studies rely on only one or two data sources, which further exacerbates this issue. Although recent methods have been developed that facilitate the training of AI models using smaller datasets, manual annotation and corrections remain essential—indicating that the challenges related to label noise have not yet been fully addressed. Our present results highlight the potential for overestimation when validation is restricted to limited data sources.
Pathology-specific label noise presents unique challenges that are not typically encountered in other imaging domains. For instance, slides stained with H&E often show ambiguous boundaries, increasing annotation variability and hindering the creation of large, high-quality datasets. Segmentation decisions can vary significantly even among experienced and well-trained experts, making it difficult to establish a single “correct” annotation [31]. Although we simplified the task to binary segmentation, in practice, multi-class delineation is often required and may pose greater challenges. Methods for obtaining more reliable labels from multiple annotations created by pathologists have also been developed, with various proposals including label reweighting and matrix-based approaches [15, 32]. However, the results generated by these methods still depend on the quality of the original annotations and do not necessarily yield labels closer to the true labels. Thus, although most pathologists can still subjectively recognize the presence of label noise, the definition of label noise in pathology is inherently complex.
In light of these challenges, we acknowledge that creating high‐quality, noise‐free annotations is a labor‐intensive process for pathologists, and practical solutions to reduce this burden are actively being explored. For example, interactive foundation‐model approaches, such as the Segment Anything Model and pathology‐adapted variants, enable experts to obtain accurate masks with only a few prompts, shifting their effort from manual drawing to light correction [33]. Likewise, restaining‐and‐registration pipelines that align immunohistochemistry/immunofluorescence signals with H&E slides can automatically generate large numbers of pixel‐level annotations with minimal manual effort [8]. These strategies aim toward “labor‐less” annotation; however, the complete elimination of expert input has not yet been achieved, and annotation remains a substantial challenge.
Weakly supervised learning has also been reported to be useful for slide‐level diagnosis and for detecting certain lesions. Nevertheless, precise spatial localization remains difficult without pixel‐ or region‐level supervision. For instance, Lu et al. noted that detecting micro‐lesions with multiple‐instance learning remains challenging [34], and Safdari et al. demonstrated that although performance improves when a small proportion of supervised annotations is incorporated in a mixed‐supervision setting, accuracy is still inferior to that of fully supervised segmentation trained on expert‐annotated pixel‐level data [35]. Overall, these findings suggest that while annotation‐free or label‐efficient methods are advancing rapidly, tasks requiring reliable detection of micro‐lesions or analyses of the tumor microenvironment continue to depend on high‐quality annotations. We therefore believe that supervised or mixed‐supervision training grounded in expert‐labeled data will remain indispensable, and that the required standard of data quality in pathology will inevitably remain high.
Taken together, these considerations emphasize that pathology-specific label noise continues to hinder the dependable deployment of AI systems. However, as discussed earlier, obtaining completely noise-free ground truth labels is practically difficult. Rather than striving for perfection, a more pragmatic approach is to define what level of annotation variability is acceptable among pathologists. One of our proposals is for pathologists to evaluate annotations created by other pathologists and/or AI, aiming to build an open database that incorporates such data. Standardization of annotation quality metrics and correction methods should be pursued, along with consensus among pathologists regarding acceptable levels of labeling variation. Furthermore, constructing databases annotated at various noise levels, together with such expert annotation evaluations, will aid in the analysis of misdiagnoses and errors, playing a significant role in benchmarking robustness—a critical factor for medical AI. Large-scale, systematically annotated resources will provide the foundation for more resilient and clinically reliable pathology AI systems.
This study characterized the various types of label noise present in pathology datasets, and investigated their impact on model performance by simulating this type of noise. Our results demonstrated the significant influence of boundary‐altering noise on AI model performance in this context. By comparing apparent scores to true ones, we highlighted the risk that the performance metrics of existing models may be overestimated. This study underscores the importance of recognizing and addressing label noise as a critical challenge during the development of pathological AI systems.

Author Contributions


Kenji Harada: conceptualization, data curation, formal analysis, funding acquisition, investigation, methodology, supervision, writing – original draft. Yuichiro Nomura: methodology, writing – original draft. Daisuke Komura: resources, writing – review and editing. Shumpei Ishikawa: resources, funding acquisition, writing – review and editing. Shingo Sakashita: resources, funding acquisition, supervision, writing – review and editing.

Funding

This work was supported by JST SPRING (grant number JPMJSP2132) and the National Cancer Center Research and Development Fund (grant number 2024-A-07).

Ethics Statement

This study did not involve the collection or use of any clinical specimens or patient data from our institution. All of the experiments were conducted exclusively using publicly available datasets. This study included a web‐based questionnaire in which participants evaluated publicly available data and synthetically perturbed data. According to the Ethical Guidelines for Medical and Biological Research Involving Human Subjects in Japan and our institutional policy, this activity does not constitute medical/biological research involving human subjects; therefore, formal ethical approval and informed consent were not considered necessary for this study.

Conflicts of Interest

Shumpei Ishikawa is an editorial board member of Cancer Science. No conflicts of interest were declared by the other authors.

Supporting information


Appendix S1: Supplementary methods.

Figure S1: Survey results.

Figure S2: Artificial label noise.

Figure S3: Artificial label noise evaluation.

Figure S4: Visualization of predictions made by models trained on the level 0 dataset (no noise) for the test dataset.

Figure S5: Visualization of the impact of hypothetical additive label noise on model training.

Figure S6: Data related to Figure 4.

Table S1: Details of BCSS dataset.

Table S2: Performance of models trained on clean labels (Level 0 labels).

Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please cite the original article.