
Explainable AI in nuclear medicine.


Holm S, Ferrara D, Pepponi M, Abenavoli E, Frille A, Duke S


European Journal of Nuclear Medicine and Molecular Imaging, 2026; 53(4): 2648–2651. https://doi.org/10.1007/s00259-025-07675-4
PMID: 41288691

Abstract

[PURPOSE] In this short communication, we consider the need for explainable AI from the perspective of a large multi-disciplinary research project for predicting cachexia in cancer patients.

[MATERIALS AND METHODS] In a series of meetings, comprising expertise from medicine, data science, sociology, and philosophy, project participants discussed the need for explainability.

[RESULTS] We distinguish between contexts in which a black box AI tool undertakes tasks that users can perform or validate themselves and contexts in which this is not the case.

[CONCLUSION] We conclude that explanations are likely required when a black box AI tool undertakes tasks that users cannot perform or validate themselves. If the user can verify outputs manually, documented reliability and accuracy may suffice, but explainability can still add value when outputs are uncertain or errors occur. More generally, close collaboration among physicians, AI developers, and other stakeholders is crucial to ensure that AI tools are trustworthy and useful in clinical practice.


Introduction

The use of AI in clinical settings, particularly for medical imaging and diagnosis, has grown significantly in recent years [1]. Still, it is widely claimed that the black box nature of powerful AI models is an obstacle to their wider adoption in medical practice, because users may not trust the outputs of these models [2]. Furthermore, high predictive accuracy of an AI model does not guarantee that users are willing to deploy a given AI tool [3, 4]. A common response to the challenge of clinical adoption of black box AI tools is to argue that users should be provided with explanations [5] that give insight into “the reasoning behind decisions” [6]. This has led to a burgeoning explainable AI (xAI) literature discussing the need for, and methods of, making AI explainable [6–10].
According to another line of argument [11], the justified clinical use of black box models should rest not on explanations but on “thorough and rigorous validation” in line with the standards used to evaluate the safety and reliability of medical drugs and devices.
Following Holm [8], we can describe these two positions as the Validation View and the Explanation View. According to the former, validation is sufficient for the justified use of black box AI tools in clinical decision-making [7, 11]. On the Explanation View, the application of xAI is mandatory for the justified clinical use of black box AI tools [6, 12].
In this short communication, we consider the need for explainability from the perspective of a large multi-disciplinary research project, comprising medicine, data science, sociology, and philosophy, for predicting cachexia in cancer patients. This work builds on recent studies reflecting a growing interest in positron emission tomography / computed tomography (PET/CT) in cancer-induced cachexia – not only for tumour assessment, but also for capturing systemic metabolic disturbances [13]. We find that reliable performance is sufficient in contexts where the tool performs a task that the user could perform manually, whereas explanations are required when the user lacks the expertise or skills to do so (Fig. 1).

Three AI tools

Subsequent deliberations stem from our insights as active collaborators within the LuCaPET research project, which seeks to determine the role of [18F]fluorodeoxyglucose (FDG)-PET/CT imaging in predicting cancer-induced cachexia in lung cancer patients [14, 15]. Within this project, we used three different AI-based tools for the analysis of [18F]FDG-PET/CT images of lung cancer patients: a tool for automated segmentation of healthy tissues, another tool for automated segmentation of pathological lesions, and a classification tool based on extracted imaging biomarkers. Each tool has a distinct purpose, and as a result, the three tools differ in their need for xAI methods.

MOOSE: automatic segmentation of healthy tissues from CT images
In our project, we assessed the value of metabolic information derived from [18F]FDG-PET/CT images of lung cancer patients to identify potential metabolic abnormalities related to the onset of cancer-associated cachexia. The first step in our analysis involved identifying the organs primarily involved in maintaining body homeostasis. The anatomical information in combined PET/CT imaging is provided by the CT component. Segmenting these anatomical regions is a tedious and labour-intensive task that is often performed manually and is subject to personal interpretation. In our research, we performed image segmentation with our in-house developed software MOOSE [16], a deep learning-based tool designed to automatically segment healthy organs from CT images. It utilizes the nnU-Net framework [17] to perform this task, significantly reducing the time and manual effort required for anatomical segmentation.
We find that using MOOSE to automate a task that can be undertaken manually does not require the application of xAI methods to ensure the tool's clinical acceptance. The primary reason MOOSE does not demand xAI methods is that it is always possible for a qualified human to look at the CT images and ascertain whether the system has made a correct segmentation. In this scenario, we observe that for clinical acceptance users demand overall accuracy and reliability of the system rather than detailed explainability.
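To make the verification step concrete: because a qualified reader can compare the automated contours against the images, or against manual contours, a simple overlap metric is often all the quantitative support needed. The following minimal sketch, assuming the two masks have already been loaded as binary arrays, computes the Dice coefficient between an automated and a manual organ mask; the array names and toy data are illustrative only and not part of the MOOSE software.

```python
import numpy as np


def dice_coefficient(auto_mask: np.ndarray, manual_mask: np.ndarray) -> float:
    """Dice overlap between a binary automated mask and a manual reference mask."""
    auto = auto_mask.astype(bool)
    manual = manual_mask.astype(bool)
    intersection = np.logical_and(auto, manual).sum()
    denominator = auto.sum() + manual.sum()
    if denominator == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * intersection / denominator


# Illustrative toy masks standing in for arrays loaded from, e.g., NIfTI files.
auto_liver = np.zeros((4, 4, 4), dtype=np.uint8)
manual_liver = np.zeros((4, 4, 4), dtype=np.uint8)
auto_liver[1:3, 1:3, 1:3] = 1
manual_liver[1:3, 1:3, 2:4] = 1

print(f"Dice = {dice_coefficient(auto_liver, manual_liver):.2f}")
```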

LION: automatic lesion segmentation in PET images
The second tool, LION [18], also uses the nnU-Net framework [17] but is focused on the automatic segmentation of lesions in PET images. LION is trained with [68Ga]PSMA and [18F]FDG PET images. However, LION faces greater challenges than MOOSE due to the inherent variability of lesions in terms of shape, size, and location/distribution, as well as the need to differentiate lesions from anatomical regions with naturally high tracer uptake.
Lesion segmentation is a critical component of oncological imaging, directly impacting treatment planning and prognosis. However, as with MOOSE, LION’s task is primarily to automate a segmentation task that can be done manually by qualified humans. We therefore find that the requirement for LION’s clinical applicability is, again, reliability and accuracy rather than detailed explainability. Nonetheless, users have voiced that it would be valuable to have a risk assessment tool that highlights difficult cases or regions of lower confidence (e.g., inflamed tissues rather than tumours) in need of human validation.
Although such segmentation tasks can, in principle, be manually performed and verified, explainability may still provide added value when the model’s performance is imperfect. Understanding why the AI system missegments or shows uncertainty can help clinicians interpret outputs, identify systematic errors, and improve the model’s reliability. In this way, even for tasks that are manually verifiable, xAI can contribute to trust and refinement of the tool.
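One way such a risk-assessment or confidence signal could in principle be derived is from the per-voxel class probabilities that a segmentation network already produces. The sketch below is a minimal illustration of this idea, not a feature of LION itself; the probability band used to flag ambiguous voxels is an arbitrary illustrative choice, and the random volume stands in for real network output.

```python
import numpy as np


def flag_uncertain_voxels(foreground_prob: np.ndarray,
                          low: float = 0.35,
                          high: float = 0.65) -> np.ndarray:
    """Return a boolean mask of voxels whose lesion probability is ambiguous.

    `foreground_prob` is assumed to be the network's per-voxel softmax
    probability for the lesion class; the [low, high) band is an arbitrary
    illustrative choice, not a validated threshold.
    """
    return (foreground_prob >= low) & (foreground_prob < high)


# Illustrative use with a random probability volume standing in for real output.
rng = np.random.default_rng(0)
prob = rng.random((8, 8, 8))
uncertain = flag_uncertain_voxels(prob)
print(f"{uncertain.mean():.1%} of voxels flagged for human review")
```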

Cachexia classification tool
The third component of our project is a binary classification model designed to analyse volumetric and metabolic imaging data—specifically, Standardized Uptake Values (SUVs) and volumes from segmented PET/CT regions—in a retrospective cohort of lung cancer patients. This model aims to identify organ-level metabolic signatures associated with cachexia, a severe wasting syndrome marked by the loss of muscle and fat, which correlates with poor prognosis in cancer patients [14].
The model integrates imaging-derived parameters (mean SUV normalized to body weight and aorta uptake, as well as CT-derived tissue volumes) from segmented regions without visible malignant lesions. These data are obtained from non-cachectic patients, cachectic patients, and individuals in early stages of cachexia development. Further, the model incorporates demographic and available blood-based parameters to classify patients into one of three cachexia progression stages. Currently, the model remains under development and does not yet meet the threshold for clinical utility, reflecting limitations inherent to our retrospective dataset [15]. Alongside model refinement, we are actively evaluating the need for explainability in a fully developed cachexia classification tool, and these considerations are discussed in this section.
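For readers less familiar with the imaging inputs, the body-weight-normalized SUV mentioned above is a simple ratio of tissue activity concentration to injected dose per unit body weight. The sketch below implements this standard definition; the numerical values are illustrative only, decay correction is omitted, and the additional normalization to aorta (blood-pool) uptake used in the project is not shown.

```python
def suv_bw(activity_concentration_kbq_per_ml: float,
           injected_dose_mbq: float,
           body_weight_kg: float) -> float:
    """Body-weight-normalized SUV: tissue activity / (injected dose / body weight).

    Uses the usual assumption that 1 g of tissue occupies roughly 1 mL,
    so the result is dimensionless. Decay correction is omitted here.
    """
    injected_dose_kbq = injected_dose_mbq * 1000.0
    body_weight_g = body_weight_kg * 1000.0
    return activity_concentration_kbq_per_ml / (injected_dose_kbq / body_weight_g)


# Illustrative values only: 5 kBq/mL in muscle, 300 MBq injected, 75 kg patient.
print(f"SUVbw = {suv_bw(5.0, 300.0, 75.0):.2f}")
```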
In contrast to MOOSE and LION, the cachexia classification tool’s predictions, once the tool meets the threshold for clinical utility, would be used for direct clinical decision-making, aiming to support diagnosis and treatment planning. Moreover, the prediction of cancer-induced cachexia is based on the assessment of metabolic patterns in multiple organs and founded on similar tracer distribution patterns in several hundred patients with and without proven cachexia (the so-called “training data”), which together represent a data volume that cannot easily be comprehended and analyzed by a single user. Hence, the only way the user can validate the system’s predictions is if an explanation of how the model arrived at its prediction is offered. It therefore seems highly relevant to apply xAI methods to this third tool.
Of the many xAI methods available, we have chosen Shapley Additive Explanations (SHAP) analysis. SHAP identifies the contribution of each input feature to a prediction, offering a quantifiable view of model reasoning. However, its interpretability and clinical relevance depend heavily on the quality and consistency of the underlying data. In our work [14], we found that heterogeneity across the three European study sites in imaging acquisition protocols, patient information reporting, and dietary records limited the ability to identify precise metabolic patterns associated with early cachexia development. Better-curated and standardized datasets, including longitudinal clinical parameters, nutritional assessments, and harmonized imaging features, would allow SHAP and similar xAI methods to generate explanations that are both more robust and clinically meaningful.
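To illustrate the kind of output SHAP provides, the sketch below fits a gradient-boosting classifier on synthetic stand-ins for organ-level SUV and volume features and derives a global feature ranking from the mean absolute SHAP values. The feature names, synthetic data, and classifier choice are illustrative assumptions for this sketch and do not reproduce the project's actual model or data.

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-ins for organ-level imaging features (illustrative names only).
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    "liver_mean_suv": rng.normal(2.2, 0.4, n),
    "muscle_mean_suv": rng.normal(0.8, 0.2, n),
    "muscle_volume_ml": rng.normal(8000, 1200, n),
    "adipose_volume_ml": rng.normal(15000, 4000, n),
    "age_years": rng.normal(65, 8, n),
})
# Synthetic binary label loosely tied to muscle volume, just to create a signal.
y = (X["muscle_volume_ml"] + rng.normal(0, 800, n) < 7500).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# TreeExplainer attributes each prediction to the input features
# (on the log-odds scale for a binary gradient-boosting model).
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Mean absolute SHAP value per feature: a simple global importance ranking.
importance = pd.Series(np.abs(shap_values).mean(axis=0), index=X.columns)
print(importance.sort_values(ascending=False))
```

In practice, such a ranking would be inspected together with per-patient SHAP attributions so that clinicians can see which organ-level signals drove an individual prediction.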
An important question is whether SHAP alone is sufficient to provide clinical users with explanations they can understand and trust. This highlights a broader point regarding xAI methods: if the purpose of explanations is to foster user trust and support deployment, then the explanations must be tailored to the decision context and accepted as relevant by the field of expertise. Individual user preferences alone are insufficient. Without careful assessment of what users require from an explanation, using xAI to promote trust may be ineffective. Therefore, determining whether a use context calls for explainability is only one step; a second crucial step involves assessing how the tool will integrate into clinical workflows and which explanation methods will best communicate meaningful information to users. Without these steps, even reliable and accurate AI tools may fail to translate into patient benefit.

Conclusion

Based on real-life use cases, explanations are likely required when a black box AI tool undertakes tasks that users cannot perform or validate themselves. If the user can verify outputs manually, documented reliability and accuracy may suffice, but explainability can still add value when outputs are uncertain or errors occur. More generally, close collaboration among physicians, AI developers, and other stakeholders is crucial to ensure that AI tools are trustworthy and useful in clinical practice (Fig. 1).
