Artificial Intelligence in Prostate MRI: Addressing Current Limitations Through Emerging Technologies.
Johnson PM, Umapathy L, et al. (2026). Artificial Intelligence in Prostate MRI: Addressing Current Limitations Through Emerging Technologies. Journal of Magnetic Resonance Imaging (JMRI), 63(3), 617-630. https://doi.org/10.1002/jmri.70189
PMID: 41348934
Abstract
Prostate MRI has transformed lesion detection and risk stratification in prostate cancer, but its impact is constrained by the high cost of the exam, variability in interpretation, and limited scalability. False negatives, false positives, and moderate inter-reader agreement undermine reliability, while long acquisition times restrict throughput. Artificial intelligence (AI) offers potential solutions to address many of the limitations of prostate MRI in the clinical management pathway. Machine learning-based triage can refine patient selection to optimize resources. Deep learning reconstruction enables accelerated acquisition while preserving diagnostic quality, with multiple FDA-cleared products now in clinical use. Ongoing development of automated quality assessment and artifact correction aims to improve reliability by reducing nondiagnostic exams. In image interpretation, AI models for lesion detection and clinically significant prostate cancer prediction achieve performance comparable to radiologists, and the PI-CAI international reader study has provided the strongest evidence to date of non-inferiority at scale. More recent work extends MRI-derived features into prognostic modeling of recurrence, metastasis, and functional outcomes. This review synthesizes progress across five domains (triage, accelerated acquisition and reconstruction, image quality assurance, diagnosis, and prognosis), highlighting the level of evidence, validation status, and barriers to adoption. While acquisition and reconstruction are furthest along, with FDA-cleared tools and prospective evaluations, triage, quality control, and prognosis remain earlier in development. Ensuring equitable performance across populations, incorporating uncertainty estimation, and conducting prospective workflow trials will be essential to move from promising prototypes to routine practice.
Ultimately, AI could accelerate the adoption of prostate MRI toward a scalable platform for earlier detection and population-level prostate cancer management.
Evidence Level: N/A. Technical Efficacy: Stage 3.
1 | Introduction
Prostate cancer is the second most commonly diagnosed malignancy in men worldwide, with an estimated 1.4 million new cases annually [1]. Recent population-level data show a rising prevalence of advanced disease at diagnosis, highlighting the need for improved strategies for early detection [2, 3]. Prostate-specific antigen (PSA) blood testing has long been the foundation of early detection but suffers from limited specificity and only moderate sensitivity, leading to both overdiagnosis of indolent cancers and underdiagnosis of clinically significant prostate cancer (csPCa) [4]. Multiparametric MRI (mpMRI), which combines T2-weighted, diffusion-weighted (DWI), and dynamic contrast-enhanced (DCE) imaging, has emerged as a powerful adjunct to PSA screening. mpMRI improves risk stratification and enables targeted biopsies [5], and in recent years biparametric MRI (bpMRI), which omits DCE, has been shown in meta-analyses to provide comparable diagnostic accuracy [6, 7].
Despite its demonstrated value, prostate MRI faces persistent challenges. False negatives and false positives remain common [8], and long acquisition times increase cost and restrict throughput. Standardization with the Prostate Imaging Reporting and Data System (PI-RADS) has improved reporting, but interreader agreement remains only moderate [9-11]. Motion, rectal gas, and other artifacts degrade image quality [12], and uneven access to scanners and expertise contributes to disparities in care. These limitations restrict scalability at a time when the demand for MRI continues to rise, placing increasing strain on radiology services. Addressing these barriers is critical if MRI is to become a widely accessible component of prostate cancer detection and management.
Artificial intelligence (AI) has rapidly emerged as a potential solution. Across the imaging pathway, AI is being applied to optimize patient selection, accelerate acquisition, improve quality assurance, assist with lesion detection and risk stratification, and extend MRI-derived features into prognostic modeling (Figure 1). Deep learning reconstruction (DLR) has already been translated into clinical practice with FDA-cleared products, demonstrating real-world feasibility. The PI-CAI international reader study recently provided the most compelling validation to date for AI-based image interpretation, showing that ensembles of AI models can match or surpass the performance of expert radiologists in diagnosing csPCa [13]. In parallel, recent population-based screening trials such as STHLM3-MRI [14] and ReIMAGINE [15] have renewed interest in the role of MRI as a screening test for early detection. Taken together, advances in AI and emerging evidence supporting MRI for early detection bring the field to a critical juncture, where transitioning prostate MRI from a specialized diagnostic exam to a first-line screening tool should now be seriously explored.
This narrative review synthesizes current progress across five domains: patient triage, accelerated acquisition and reconstruction, image quality assurance, diagnosis, and prognosis. We focus not only on technical performance but also on validation status and barriers to adoption. By mapping where evidence is most mature and where critical gaps remain, we aim to identify the steps needed for AI to move from research prototypes to integrated, valuable tools in routine clinical practice.
2 | AI for Patient Triage
Among the various uses of AI/ML models in medicine, applications concerned with optimizing patient selection, risk stratification, and triage have an opportunity to be particularly impactful in their potential to minimize diagnostic waste, expand access to healthcare, and reduce healthcare costs. Precedent exists in other specialties; for example, FDA-cleared systems for diabetic retinopathy screening demonstrate that automated referral can improve efficiency and equity [16], but prostate-specific applications remain limited.
Screening for prostate cancer has conventionally relied on PSA testing, which is limited by moderate to high sensitivity but poor specificity [4, 17]. This leads to both under- and overdiagnosis, and there remains no consensus on which patients should undergo MRI. Current guidelines reflect this uncertainty: the American Urological Association (AUA) recommends that men with persistently elevated PSA undergo risk assessment with calculators or biomarkers and shared decision-making, while the National Comprehensive Cancer Network (NCCN) more strongly endorses pre-biopsy MRI when available [18]. Despite enthusiasm for MRI, access remains limited, with documented disparities by race, geography, and rurality [19, 20], and both guidelines leave wide discretion to clinicians. Furthermore, prostate MRI requires expert interpretation while the supply of radiologists with adequate training and experience remains limited.
To refine patient selection, several non-imaging approaches have been proposed. Commercial biomarker tests such as ExoDx and 4Kscore can supplement PSA, and the Stockholm3 test—a regression-based risk calculator that combines PSA with plasma protein biomarkers, genetic variants, and clinical factors—has been validated in large cohorts as a way to reduce unnecessary MRI and biopsy [14]. These examples highlight that risk stratification tools can improve on PSA alone, but none are specifically designed to predict MRI outcomes.
To date, only one AI model has been published with the explicit aim of triaging patients for prostate MRI. Persily et al. developed ProMT-ML, a machine learning model designed to triage patients for prostate MRI by predicting the likelihood of an abnormal scan (PI-RADS ≥ 3) from routinely available clinical parameters [21]. Trained on data from approximately 12,000 patients, the model incorporated age, PSA, body mass index, and either prostate volume or systolic blood pressure. Compared with PSA thresholds, ProMT-ML achieved higher specificity while maintaining sensitivity, and in a clinical audit false negatives rarely represented clinically significant cancers when prostate volume was included. A notable strength was that the tool was made publicly accessible online (Figure 2), allowing potential real-time use in clinical workflows. Limitations include the need for external validation across different health systems and demographic groups, as well as uncertainty around how best to incorporate systolic blood pressure, which is influenced by anti-hypertensive therapy and remains difficult to parameterize.
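As a sketch of how such a triage model could look, the example below trains a logistic-regression classifier on synthetic clinical features (age, PSA, BMI, prostate volume) to predict an abnormal scan, then chooses a referral threshold that preserves sensitivity. All data, coefficients, and thresholds here are illustrative assumptions; this is not the published ProMT-ML pipeline.

```python
# Illustrative triage classifier (synthetic data; NOT the ProMT-ML model).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 2000
age = rng.normal(65, 8, n)
psa = rng.lognormal(1.5, 0.6, n)            # ng/mL
bmi = rng.normal(28, 4, n)
volume = rng.normal(45, 15, n).clip(15)     # prostate volume, mL

# Synthetic label: abnormal MRI (PI-RADS >= 3) made more likely by high
# PSA density and age (an assumed relationship, for demonstration only).
psa_density = psa / volume
logit = 2.5 * np.log(psa_density + 1e-6) + 0.03 * (age - 65) + 4.0
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([age, psa, bmi, volume])
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

# Triage rule: refer to MRI only above a probability threshold chosen to
# retain ~95% sensitivity for abnormal scans on the development data.
probs = model.predict_proba(X)[:, 1]
threshold = np.quantile(probs[y == 1], 0.05)
refer = probs >= threshold
sensitivity = refer[y == 1].mean()
print(f"sensitivity={sensitivity:.2f}, referral rate={refer.mean():.2f}")
```

In practice the threshold would be set on a held-out cohort, and any deployed model would need the external validation the authors call for.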
Beyond determining who should undergo MRI, AI can also guide what type of MRI is needed. Dynamic contrast–enhanced (DCE) imaging represents a key target for protocol optimization. Recent studies have demonstrated the feasibility of AI-driven decision support to determine whether DCE is necessary during prostate MRI acquisition, showing that many cases can be accurately characterized using only biparametric sequences [22, 23]. Such real-time protocol adaptation could reduce contrast use and shorten exam times.
Looking ahead, AI-based triage models for MRI referral will need prospective evaluation across diverse populations, with careful attention to integration alongside existing biomarkers and risk calculators. Defining and validating clinically appropriate referral thresholds and demonstrating cost-effectiveness and workflow benefits will also be essential. If these challenges can be addressed, AI-based MRI triage has the potential to ensure prostate MRI is deployed where it provides the greatest clinical value.
3 | Acquisition and Reconstruction
Lengthy acquisition times remain a major barrier to the scalability of prostate MRI. Standard mpMRI protocols require at least 25–30 min of room time. Each sequence in the mpMRI protocol contributes to PI-RADS–based lesion assessment [9] but the combined duration limits throughput and increases costs. Deep learning–based reconstruction (DLR) has emerged as a practical solution, reconstructing high-quality images from undersampled k-space by embedding the MRI forward model in an unrolled network and using learned population-level priors for artifact and noise suppression [24, 25]. Johnson et al. provided the seminal demonstration of DLR for prostate MRI, showing that variational-network DLR can deliver sub-4-min bpMRI with preserved PI-RADS performance [26]. Since then, vendor products have been established for T2 and DWI, providing clinical access to the technology for routine bpMRI of the prostate.
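To make the unrolled, physics-based idea concrete, the toy sketch below alternates a data-consistency step (using the Fourier forward model on undersampled 1D "k-space") with a prior step. An intensity-range projection stands in for the trained CNN regularizer of a real DLR product; everything here is an illustrative assumption, not a vendor implementation.

```python
# Toy physics-based reconstruction: data consistency + stand-in prior.
import numpy as np

rng = np.random.default_rng(1)
n = 256
x_true = np.zeros(n)
x_true[100:140] = 1.0                      # simple piecewise "anatomy"
mask = rng.random(n) < 0.3                 # ~3x random undersampling
y = mask * np.fft.fft(x_true)              # acquired (undersampled) k-space

def A(x):                                  # forward model: sample k-space
    return mask * np.fft.fft(x)

def AH(k):                                 # adjoint: zero-filled inverse FFT
    return np.fft.ifft(mask * k).real

def prior(x):
    # Stand-in for a trained CNN regularizer: project onto the known
    # intensity range, which suppresses undersampling ringing.
    return np.clip(x, 0.0, 1.0)

x = AH(y)                                  # zero-filled starting image
for _ in range(100):                       # unrolled iterations
    x = x - AH(A(x) - y)                   # data-consistency update
    x = prior(x)                           # learned-prior update (stand-in)

err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
print(f"relative reconstruction error: {err:.3f}")
```

The key property shared with clinical DLR is that every iteration is pulled back toward the measured k-space data, so the prior can only fill in what was not acquired.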
Commercial DLR products were first deployed for routine spin-echo/TSE sequences and saw early use in prostate MRI [27-29]. Gassenmaier et al. evaluated vendor DLR and reduced the acquisition time for three-plane T2-TSE sequences from 10:21 to 3:50 min with higher reader-rated quality and unchanged PI-RADS assessment [28]. Tong et al. showed that DL-accelerated axial (3-fold) and coronal (2-fold) T2WI maintained reader AUCs and overall image quality compared to conventional T2WI [29].
Vendor DLR for TSE has had a several-year head start, but similar products for DWI are now available and have also been evaluated for prostate MRI in prospective studies. DWI plays a central role in csPCa detection but is limited by low SNR at high b-values and by geometric distortion inherent to EPI-based acquisitions. Ueda et al. showed that DLR improves both perceived image quality and lesion conspicuity at high b-values, while preserving quantitative ADC measurements [30]. In addition to improving quality, DLR enables faster acquisition: Ursprung et al. reported a 39% reduction in DWI scan time, and Lee et al. achieved a 49% reduction, both without compromising image quality [31, 32]. DLR has also been integrated with advanced DWI sequences designed to reduce distortion. For example, reduced–field-of-view (rFOV) single-shot EPI (commercially ZOOMit/FOCUS) uses 2D spatially selective excitation to restrict the phase-encode FOV to the prostate, shortening the echo train and mitigating distortion and T2* blurring. In a prospective study, combining DLR with rFOV DWI enabled an ultrafast bpMRI (~3.5 min) that achieved non-inferior csPCa detection compared to standard mpMRI [33].
While these results are encouraging, aggressive subsampling poses risks across both T2WI and DWI. A multi-site study found that 4–8× acceleration of prostate MRI led to diminished diagnostic performance despite high image quality metrics such as SSIM and PSNR, emphasizing the need for task-specific and reader-based validation when pursuing higher acceleration factors [34]. When undersampling is extreme, the reconstruction increasingly relies on the learned population-level priors. In such cases, the network may generate anatomically plausible but incorrect structures that reflect an average appearance rather than the true underlying anatomy—effectively hallucinating features consistent with the training distribution rather than the measurement [35].
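A small numerical illustration of why global quality metrics can be misleading for diagnostic tasks: erasing a tiny synthetic "lesion" barely changes PSNR, while a mildly noisy image that preserves the lesion scores worse. The images are synthetic stand-ins, chosen only to make the point about metric sensitivity.

```python
# Why PSNR can mask diagnostically important errors (synthetic example).
import numpy as np

def psnr(ref, img, data_range=1.0):
    mse = np.mean((ref - img) ** 2)
    return 10 * np.log10(data_range**2 / mse)

rng = np.random.default_rng(2)
ref = 0.5 + 0.02 * rng.standard_normal((256, 256))  # background "tissue"
ref[120:126, 120:126] = 1.0                         # small 6x6 focal "lesion"

# Reconstruction A: mild global noise, lesion preserved.
rec_noisy = ref + 0.05 * rng.standard_normal(ref.shape)
# Reconstruction B: nearly perfect globally, but the lesion is erased.
rec_no_lesion = ref.copy()
rec_no_lesion[120:126, 120:126] = 0.5

print(f"PSNR, noisy but lesion intact: {psnr(ref, rec_noisy):.1f} dB")
print(f"PSNR, lesion removed:          {psnr(ref, rec_no_lesion):.1f} dB")
```

The lesion-free image wins by a wide PSNR margin because the error is confined to 36 of 65,536 pixels, which is exactly why task-specific and reader-based validation is needed alongside global metrics.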
It is important to distinguish deep learning–based reconstruction methods, which explicitly incorporate the MRI forward model and enforce data consistency, from purely image postprocessing or generative approaches. The large body of clinical evidence to date applies to physics-based reconstruction methods that are integrated into scanner platforms and operate directly on k-space data. These include vendor solutions such as GE Healthcare (AIR Recon DL), Canon Medical Systems (AiCE), Siemens Healthineers (Deep Resolve), and Philips Healthcare (SmartSpeed). By contrast, algorithms applied only to already reconstructed images—often third-party software designed for denoising or artifact suppression—do not enforce consistency with acquired data. While these generative approaches could prove useful for accelerated imaging, they have not yet undergone the same level of clinical validation as physics-based DLR. As a result, their reliability in accelerated prostate MRI remains uncertain, and the potential for hallucinated features that may arise due to lack of data consistency has not been systematically evaluated. Generative models are also being explored to synthesize prostate MRI series. For example, a GAN-based method to synthesize b = 1500 DWI from lower b-values showed improved perceived quality over computed images [36]. A separate study simulated DCE from non-contrast MRI using a pix2pix model, reporting high similarity and strong PI-RADS agreement across test sets [37]. A recent review highlights growing interest in prostate MRI synthesis, but notes that studies are retrospective, and lack multi-center and clinical validation [38].
Scan time reductions also do not translate directly into proportional room time savings, since setup and positioning time remain fixed. However, reduced scan time may yield greater overall efficiency by reducing motion and lowering the need for repeat acquisitions. The full impact on throughput and cost remains underexplored, and early evidence is mixed. In a multicenter outpatient study of over 7000 MRI exams, including pelvic MRI, the use of a commercial DLR system led to modest scan time reductions but little or no room time savings, limiting its effect on throughput [39]. Conversely, an economic modeling study found that implementing DLR for all applicable protocols on a five-scanner fleet could maintain equivalent service levels with one fewer scanner [40]. This strategy was projected to cost roughly 11% of purchasing an additional scanner and ~20% of extending scanner hours into the weekend. While these findings suggest a potential cost benefit in specific use cases, prostate MRI-specific evaluations are missing from the literature.
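The gap between scan time and room time can be made explicit with back-of-the-envelope arithmetic. All numbers below are hypothetical assumptions, not taken from the cited studies:

```python
# Hypothetical throughput model: room time = fixed setup + scan time,
# so scan time savings translate sub-proportionally into room time savings.
setup_min = 12.0   # assumed per-patient positioning/coil/instruction time
scan_conv = 28.0   # assumed conventional protocol scan time (min)
scan_dlr = 15.0    # assumed DLR-accelerated protocol scan time (min)

scan_reduction = 1 - scan_dlr / scan_conv
room_reduction = 1 - (setup_min + scan_dlr) / (setup_min + scan_conv)
print(f"scan time reduction: {scan_reduction:.1%}")
print(f"room time reduction: {room_reduction:.1%}")
```

Under these assumptions a ~46% scan time cut yields only a ~33% room time cut, which is consistent with the observation that fixed setup time dilutes throughput gains.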
At our institution, DLR is routinely applied for T2-weighted imaging of the prostate, and we are now testing its extension to accelerated DWI. Figure 3 illustrates one such example, where a fully DLR-enabled bpMRI protocol achieves a 5 min scan time while maintaining lesion conspicuity.
While model-based DLR methods have translated rapidly into clinical use, they are typically tailored to specific sequences and sampling patterns, which limits generalizability [41]. Emerging methods aim to overcome the sequence-specific nature of current commercial solutions. Generative priors, learned from large datasets using models such as variational autoencoders (VAEs) [42], generative adversarial networks (GANs) [43], or diffusion models [44-46], represent the manifold of anatomically plausible images. When integrated into iterative reconstruction, these priors constrain the optimization to produce images consistent both with acquired k-space data and with the learned image distribution. Diffusion models have gained particular attention; these methods generate images through iterative denoising and naturally provide uncertainty estimates [44]. Recent work has extended this approach to better capture fine anatomical detail [45], operate in k-space [47], and adapt across sites and scanner protocols [48]. Federated learning of generative priors has been proposed to enhance cross-site generalizability [49]. These reconstruction techniques are still largely in the research phase and have yet to be applied to prostate MRI, but offer a path toward more flexible, generalizable, and interpretable reconstruction.
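The posterior-sampling idea behind generative-prior reconstruction can be sketched with an analytic toy prior. Here a standard normal prior stands in for a learned score network (its score is simply -x), and unadjusted Langevin dynamics draws posterior samples whose spread acts as an uncertainty estimate; the forward model, noise level, and step sizes are all illustrative assumptions.

```python
# Toy Langevin posterior sampling with an analytic prior score
# (a learned diffusion/score model would replace score(x) = -x).
import numpy as np

rng = np.random.default_rng(3)
n = 8
x_true = rng.standard_normal(n)

# Forward model: only the first half of the coordinates is measured.
A = np.zeros((4, n))
A[np.arange(4), np.arange(4)] = 1.0
sigma_n = 0.1
y = A @ x_true + sigma_n * rng.standard_normal(4)

def score_posterior(x):
    # data-consistency gradient + prior score (analytic here; a CNN in practice)
    return A.T @ (y - A @ x) / sigma_n**2 - x

samples = []
for _ in range(200):                       # independent Langevin chains
    x = rng.standard_normal(n)
    eps = 2e-3
    for _ in range(500):                   # unadjusted Langevin dynamics
        x = x + eps * score_posterior(x) + np.sqrt(2 * eps) * rng.standard_normal(n)
    samples.append(x)
samples = np.array(samples)

# Uncertainty map: measured coordinates are pinned down, unmeasured are not.
std = samples.std(axis=0)
print("std (measured):  ", std[:4].round(2))
print("std (unmeasured):", std[4:].round(2))
```

The sample spread is small where data constrain the solution and large where only the prior does, which is the sense in which diffusion-based reconstruction "naturally provides uncertainty estimates."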
An exploratory direction for DLR is its extension to low-field MRI. DLR can enhance the clinical feasibility of low-field imaging by compensating for reduced SNR. A recent study showed that DL-based denoising improved prostate T2-weighted image quality at 0.55 T [50]. Such developments could help expand access to MRI in cost-sensitive settings where high-field scanners remain limited.
A second line of exploration uses patient-specific priors to embed information from prior or concurrently acquired scans to improve fidelity under sparse sampling. For example, NeRP embeds a patient's prior MRI into the reconstruction process to improve fidelity under sparse sampling [51]. Atalik et al. proposed a trust-guided variational network that leverages previously acquired sequences within the same exam to accelerate subsequent acquisitions [52]. Although not yet tested in prostate MRI, these methods suggest potential for progressive acceleration and personalized protocol design, which may be particularly useful for active surveillance.
In summary, deep learning–based reconstruction has rapidly progressed from research to clinical deployment, with FDA-cleared, sequence-specific implementations now routinely used in prostate MRI to reduce scan times. These model-based methods have demonstrated consistent performance, but acceleration beyond 3–4× requires caution. Research-stage approaches—including generative priors and patient-specific priors—offer a path toward more generalizable and adaptive reconstruction, though clinical translation and application to prostate MRI remain to be demonstrated.
Acquisition and Reconstruction
Lengthy acquisition times remain a major barrier to the scalability of prostate MRI. Standard mpMRI protocols require at least 25–30 min of room time. Each sequence in the mpMRI protocol contributes to PI-RADS–based lesion assessment [9] but the combined duration limits throughput and increases costs. Deep learning–based reconstruction (DLR) has emerged as a practical solution, reconstructing high-quality images from under sampled k-space by embedding the MRI forward model in an unrolled network and using learned population-level priors for artifact and noise suppression [24, 25]. Johnson et al. provided the seminal demonstration of DLR for prostate MRI, showing that variational-network DLR can deliver sub-4-min bpMRI with preserved PI-RADS performance [26]. Since then, vendor products have been established for T2 and DWI, providing clinical access to the technology for routine bpMRI of the prostate.
Commercial DLR products were first deployed for routine spinecho/TSE sequences and saw early use in prostate MRI [27-29]. Gassenmaier et al. evaluated vendor DLR and reduced the acquisition time for three-plane T2-TSE sequences from 10:21 to 3:50 min with higher reader-rated quality and unchanged PI-RADS assessment [28]. Tong et al. showed that DL-accelerated axial (3-fold) and coronal (2-fold) T2WI maintained reader AUCs and overall image quality compared to conventional T2WI [29].
Vendor DLR for TSE has had a several-year head start, but similar products for DWI are now available and have also been evaluated for prostate MRI in prospective studies. DWI plays a central role in csPCa detection but is limited by low SNR at high b-values and by geometric distortion inherent to EPI-based acquisitions. Ueda et al. showed that DLR improves both perceived image quality and lesion conspicuity at high b-values, while preserving quantitative ADC measurements [30]. In addition to improving quality, DLR enables faster acquisition: Ursprung et al. reported a 39% reduction in DWI scan time, and Lee et al. achieved a 49% reduction, both without compromising image quality [31, 32]. DLR has also been integrated with advanced DWI sequences designed to reduce distortion. For example, reduced–field-of-view (rFOV) single-shot EPI (commercially ZOOMit/FOCUS) uses 2D spatially selective excitation to restrict the phase-encode FOV to the prostate, shortening the echo train and mitigating distortion and T2* blurring. In a prospective study, combining DLR with rFOV DWI enabled an ultrafast bpMRI (~3.5 min) that achieved non-inferior csPCa detection compared to standard mpMRI [33].
While these results are encouraging, aggressive subsampling poses risks across both T2WI and DWI. A multi-site study found that 4–8× acceleration of prostate MRI led to diminished diagnostic performance despite high image quality metrics such as SSIM and PSNR, emphasizing the need for task-specific and reader-based validation when pursuing higher acceleration factors [34]. When under sampling is extreme the reconstruction increasingly relies on the learned population-level priors. In such cases, the network may generate anatomically plausible but incorrect structures that reflect an average appearance rather than the true underlying anatomy—effectively hallucinating features consistent with the training distribution rather than the measurement [35].
It is important to distinguish deep learning–based reconstruction methods, which explicitly incorporate the MRI forward model and enforce data consistency, from purely image postprocessing or generative approaches. The large body of clinical evidence to date applies to physics-based reconstruction methods that are integrated into scanner platforms and operate directly on k-space data. These include vendor solutions such as GE Healthcare (AIR Recon DL), Canon Medical Systems (AiCE), Siemens Healthineers (Deep Resolve), and Philips Healthcare (SmartSpeed). By contrast, algorithms applied only to already reconstructed images—often third-party software designed for denoising or artifact suppression—do not enforce consistency with acquired data. While these generative approaches could prove useful for accelerated imaging, they have not yet undergone the same level of clinical validation as physics-based DLR. As a result, their reliability in accelerated prostate MRI remains uncertain, and the potential for hallucinated features that may arise due to lack of data consistency has not been systematically evaluated. Generative models are also being explored to synthesize prostate MRI series. For example, a GAN-based method to synthesize b = 1500 DWI from lower b-values showed improved perceived quality over computed images [36]. A separate study simulated DCE from non-contrast MRI using a pix2pix model, reporting high similarity and strong PI-RADS agreement across test sets [37]. A recent review highlights growing interest in prostate MRI synthesis, but notes that studies are retrospective, and lack multi-center and clinical validation [38].
It is also important to note that scan time reductions do not translate directly into proportional room time savings, since setup and positioning time remain fixed. However, reduced scan time may yield greater overall efficiency by reducing motion and lowering the need for repeat acquisitions. The full impact on throughput and cost remains underexplored, and early evidence is mixed. In a multicenter outpatient study of over 7000 MRI exams, including pelvic MRI, the use of a commercial DLR system led to modest scan time reductions but little or no room time savings, limiting its effect on throughput [39]. Conversely, an economic modeling study found that implementing DLR for all applicable protocols on a five-scanner fleet could maintain equivalent service levels with one fewer scanner [40]. This strategy was projected to cost roughly 11% of purchasing an additional scanner and ~20% of extending scanner hours into the weekend. While these findings suggest a potential cost benefit in specific use cases, prostate MRI-specific evaluations are missing from the literature.
At our institution, DLR is routinely applied for T2-weighted imaging of the prostate, and we are now testing its extension to accelerated DWI. Figure 3 illustrates one such example, where a fully DLR-enabled bpMRI protocol achieves a 5 min scan time while maintaining lesion conspicuity.
While model-based DLR methods have translated rapidly into clinical use, they are typically tailored to specific sequences and sampling patterns, which limits generalizability [41]. Emerging methods aim to overcome the sequence-specific nature of current commercial solutions. Generative priors, learned from large datasets using models such as variational autoencoders (VAEs) [42], generative adversarial networks (GANs) [43], or diffusion models [44-46], represent the manifold of anatomically plausible images. When integrated into iterative reconstruction, these priors constrain the optimization to produce images consistent both with acquired k-space data and with the learned image distribution. Diffusion models have gained particular attention; these methods generate images through iterative denoising and naturally provide uncertainty estimates [44]. Recent work has extended this approach to better capture fine anatomical detail [45], operate in k-space [47], and adapt across sites and scanner protocols [48]. Federated learning of generative priors has been proposed to enhance cross-site generalizability [49]. These reconstruction techniques are still largely in the research phase and have yet to be applied to prostate MRI, but offer a path toward more flexible, generalizable, and interpretable reconstruction.
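The way a learned prior is interleaved with data consistency in iterative reconstruction can be illustrated with a plug-and-play-style sketch. Everything here is a toy assumption: a crude box blur stands in for the trained denoiser (a real system would instead apply, e.g., one reverse step of a diffusion model), and sampling is single-coil Cartesian.

```python
import numpy as np

def box_denoise(x, k=3):
    # Stand-in "learned prior": a crude box blur. A deployed method would
    # call a trained denoiser here (e.g., a diffusion-model denoising step).
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    out = np.empty_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = xp[i:i + k, j:j + k].mean()
    return out

def pnp_reconstruct(measured_k, mask, iters=10):
    # Alternate a prior (denoising) step with a data-consistency step.
    x = np.fft.ifft2(np.where(mask, measured_k, 0))   # zero-filled start
    for _ in range(iters):
        x = box_denoise(x)                 # pull toward plausible images
        k = np.fft.fft2(x)
        k[mask] = measured_k[mask]         # keep acquired samples exact
        x = np.fft.ifft2(k)
    return x

rng = np.random.default_rng(1)
truth = box_denoise(rng.standard_normal((16, 16)))    # a smooth toy "anatomy"
mask = rng.random((16, 16)) < 0.6                     # ~60% random sampling
measured_k = np.fft.fft2(truth)
recon = pnp_reconstruct(measured_k, mask)
```

Whatever the prior contributes, the final data-consistency step guarantees the output still agrees with the measurements at every sampled k-space location.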
An exploratory direction for DLR is its extension to low-field MRI. DLR can enhance the clinical feasibility of low-field imaging by compensating for reduced SNR. A recent study showed that DL-based denoising improved prostate T2-weighted image quality at 0.55 T [50]. Such developments could help expand access to MRI in cost-sensitive settings where high-field scanners remain limited.
A second line of exploration uses patient-specific priors to embed information from prior or concurrently acquired scans to improve fidelity under sparse sampling. For example, NeRP embeds a patient's prior MRI into the reconstruction process to improve fidelity under sparse sampling [51]. Atalik et al. proposed a trust-guided variational network that leverages previously acquired sequences within the same exam to accelerate subsequent acquisitions [52]. Although not yet tested in prostate MRI, these methods suggest potential for progressive acceleration and personalized protocol design, which may be particularly useful for active surveillance.
In summary, deep learning–based reconstruction has rapidly progressed from research to clinical deployment, with FDA-cleared, sequence-specific implementations now routinely used in prostate MRI to reduce scan times. These model-based methods have demonstrated consistent performance, but acceleration beyond 3–4× requires caution. Research-stage approaches—including generative priors and patient-specific priors—offer a path toward more generalizable and adaptive reconstruction, though clinical translation and application to prostate MRI remain to be demonstrated.
4 ∣ Image Quality Control
Automated image quality assessment (IQA) offers a path to standardized, real-time quality control in prostate MRI. Suboptimal scans can obscure small or low-contrast lesions, lowering sensitivity for csPCa, and they also reduce the reliability of AI models that are typically trained on curated, high-quality datasets. Automated IQA aims to provide a reproducible, point-of-care assessment that can determine in real time whether a sequence should be repeated or adjusted, thereby improving reliability for both human readers and downstream AI systems. Automated IQA complements technologist oversight by providing objective, sequence-specific feedback on diagnostic adequacy.
Detection of poor-quality scans has been most extensively studied for T2WI. Lin et al. developed a DL classifier that distinguished high- from low-quality scans and showed that higher quality was associated with improved specificity for detecting extracapsular extension (ECE) (72% vs. 63%) [53]. Thijssen et al. trained a radiomics-based model that identified suboptimal T2WI with ~85% accuracy [54] and later validated it across vendors and institutions, confirming generalizability [55]. Other groups have explored prospective applications: Gloe et al. built a model predicting the need for rescanning with > 75% accuracy, while Belue et al. developed voxel-level quality maps from a 3D CNN to localize artifacts and guide targeted reacquisition [56, 57]. Together, these studies demonstrate that automated T2W IQA can move beyond retrospective scoring toward actionable quality feedback during acquisition.
Extension to other sequences remains less mature but is emerging. Alis et al. trained a 3D CNN on bpMRI (T2WI and ADC) and reported that automated 3-point quality scores reached agreement with expert consensus comparable to individual radiologists (κ = 0.42 for T2WI, κ = 0.61 for ADC) [58]. Brender et al. used T2WI to predict ADC map quality, enabling early identification of potential diffusion failures [59]. Kluckert et al. evaluated automated quality scoring on bpMRI to determine whether DCE sequences could be safely omitted, illustrating how IQA could contribute to protocol tailoring [60]. Collectively, these studies suggest that IQA across multiple sequences is feasible and could eventually support both scan-level quality assurance and adaptive scanning.
Artifact correction represents a complementary strategy. Hu et al. developed the TPAS model, a CNN that suppressed rectal gas-related artifacts, improving interpretability and csPCa detection [61]. Pfaff et al. proposed a self-supervised, repetition-aware denoiser for prostate DWI that improved SNR and lesion conspicuity without requiring clean reference data [62]. These approaches demonstrate the potential of AI to not only flag poor quality but also to actively recover image quality from compromised or noisy images. While these models may improve perceived image quality, further studies are required to confirm recovery of diagnostic image quality.
Standardization of quality definitions will be essential for broader adoption. Current studies use inconsistent criteria (radiologist consensus [54, 58], artifact presence [61], or diagnostic confidence [53]), which hinders cross-comparison. Curated, multi-vendor datasets with expert-labeled quality scores would enable more robust model training and benchmarking. The PI-QUAL framework addresses this need by defining whether an exam meets minimum diagnostic standards for csPCa detection based on coverage, resolution, and artifacts [63, 64]. By offering a reproducible scale for diagnostic adequacy, PI-QUAL provides a benchmark against which AI models can be trained and evaluated, facilitating fairer comparisons across algorithms and sites.
In summary, automated IQA has shown feasibility for T2-weighted imaging and early promise for multi-parametric protocols, while artifact correction strategies demonstrate that AI may be able to actively recover compromised image quality. Most published studies remain retrospective and proof-of-concept, with limited prospective evaluation, and commercial translation has lagged behind deep learning reconstruction. One of the few deployed systems for automated IQA is AI-QUAL (Quibim), which integrates PI-QUAL v2–based scoring and artifact detection into the QP-Prostate platform. The critical next step is prospective, multi-site validation to establish whether AI-based IQA tools can meaningfully reduce nondiagnostic scans and improve workflow efficiency and reliability.
5 ∣ AI for PCa Diagnosis
In 2024, 8000 patients underwent MRI for known or suspected prostate cancer in our health system, with over 1900 biopsies performed. The ever-increasing volume of MRI exams performed each year for risk stratification and biopsy guidance places enormous demands on radiologists. Meta-analyses report high pooled sensitivity (0.90) and negative predictive value (0.92) for diagnosing csPCa with MRI [65]. However, MRI still suffers from false positives that lead to unnecessary biopsies, reflected in low and variable biopsy yield [66]. In addition, lesion detection and scoring remain subjective, with substantial intrareader variability [11] and inter-reader variability (κ = 0.42–0.70) [67-70]. AI tools have the potential to address these challenges, serving as radiologist aids that alleviate workload and provide a consistent, objective assessment of prostate MRI.
Recent years have seen the development of several promising DL models that can extract salient diagnostic information from prostate MRI, improving diagnostic consistency and accuracy, and performing on par with radiologists in the prediction of csPCa at the time of biopsy. There are two distinct categories of DL models—those that localize lesion information prior to malignancy prediction, either at lesion level, at patient level, or both, and those that use the full prostate MRI field of view to predict the presence of csPCa at the patient level. The choice of prostate MRI also varies between bpMRI and mpMRI. Schelb et al. trained a DL model with sextant-specific systematic and targeted lesion histopathology as ground truth to detect and segment suspicious findings in mpMRI that demonstrated comparable performance to clinical MRI assessment (sensitivity = 92% vs. 88% with PI-RADS, specificity = 47% vs. 50% with PI-RADS) [71]. Work at the US National Cancer Institute (NCI) pursued a slightly different approach—emulating the diagnostic processes of radiologists by detecting and segmenting lesions of interest, followed by malignancy risk prediction. A cascaded DL model was trained to first detect and segment radiologist-annotated MRI-visible lesions on bi-parametric MRI, followed by predicting PI-RADS categories 2–5 and BPH [72] with a lesion-level sensitivity of 56% and a positive predictive value (PPV) of 62.7%. In a prospective secondary analysis of this model in 658 male participants (csPCa in 45%), Lin et al. reported that the model detected 96% of all patients with csPCa compared to radiologists who detected 98% (p = 0.23) [73]. In participants without any AI lesion predictions, 52% had benign biopsy results and 28% had Gleason score 6. Other retrospective studies for lesion detection and subsequent csPCa prediction have found similar trends [74] (lesion-level accuracy 86%, sensitivity of 90% on an external test set of 330 lesions).
Developing AI models for PCa diagnosis with explicit lesion localization requires annotation effort for the lesions of interest from experts (radiologists and/or urologists). This not only limits the size of training data available for DL models but also introduces another degree of variability. A study by Cai et al. sidestepped this limitation by only requiring patient-level labels for their DL model. Even when explicit lesion masks were not used, the performance of DL models was not different from that of radiologists in detecting csPCa in prostate mpMRI [75]. The model, trained with patient-level labels from 5215 patients (including 1514 with csPCa), achieved AUCs comparable to radiologists: 0.89 versus 0.89 on 400 internal exams and 0.86 versus 0.84 on 204 ProstateX exams. Bosma et al. addressed the lesion-annotation burden with a semi-supervised learning approach, guided by radiology reports, to train their DL model, and showed comparable performance to fully supervised approaches with six times fewer annotations [76].
The publicly hosted PI-CAI challenge with 1500 MRI exams, associated metadata (age, PSA level, prostate volume, scanner name), and biopsy-driven ground truth has facilitated the development of several AI models for prostate MRI [13]. Saha et al. created an ensemble of experts by combining the predictions from five independently retrained AI models with the best diagnostic performance on the PI-CAI challenge. The combined predictions were found to be “statistically superior” in identifying patients with csPCa compared with the mean of 62 radiologists as part of an international reader study [13]. At the routinely used PI-RADS cutoff of three or higher for biopsy guidance, this expert model resulted in 50.4% fewer false positives and detected 20% fewer indolent cancers at the same sensitivity (89.4%) as the pool of radiologists.
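Operationally, an "ensemble of experts" reduces to combining per-exam likelihoods from the member models and then choosing an operating threshold at a target sensitivity. The sketch below assumes simple averaging and a threshold sweep; the actual PI-CAI combination and operating-point selection may differ, and all names and numbers here are illustrative.

```python
import numpy as np

def ensemble_score(model_scores):
    # Combine per-exam csPCa likelihoods from several detectors by averaging
    # (simple mean assumed for illustration only).
    return np.mean(model_scores, axis=0)

def threshold_at_sensitivity(scores, labels, target_sens):
    # Walk candidate thresholds from high to low and return the first one
    # whose sensitivity on the labeled set meets the target.
    scores, labels = np.asarray(scores), np.asarray(labels)
    for t in sorted(scores, reverse=True):
        sens = (scores >= t)[labels == 1].mean()
        if sens >= target_sens:
            return t
    return scores.min()

# Toy example: five "models", six exams, ground-truth csPCa labels.
rng = np.random.default_rng(2)
member_scores = rng.random((5, 6))
combined = ensemble_score(member_scores)
labels = np.array([1, 1, 0, 1, 0, 0])
t = threshold_at_sensitivity(combined, labels, target_sens=1.0)
```

Matching the radiologist pool's sensitivity (89.4% in the PI-CAI study) while counting the resulting false positives is exactly this kind of fixed-sensitivity comparison.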
The potential of AI models to objectively and consistently interpret prostate MRI has also been beneficial in understanding the ambiguous PI-RADS 3 category. Although MRI has a high negative predictive value (NPV) for csPCa when lesions are classified as low-risk and a high PPV when lesions are classified as high-risk, its performance is especially poor when it comes to equivocal assessments (i.e., PI-RADS 3). Literature reports the presence of csPCa in only 16%–20% of the PI-RADS 3 cases, leading to unnecessary biopsies [77]. Esengur et al. evaluated the predictive performance of the bpMRI-based model of Mehralivand et al. [72] when combined with other clinical variables in biopsy decisions for PI-RADS 3 [78]. In particular, they looked at the lesion segmentation model's ability to correctly map a PCa lesion on bpMRI together with PSA density, and observed a sensitivity of 77.8%, a PPV of 17.1%, and a high NPV of 93.1% in a cohort of treatment-naïve patients (n = 140) with highest PI-RADS scores of 3. This approach avoided 38.6% of unnecessary biopsies. Umapathy et al. tackled the ambiguity in PI-RADS 3 differently—using a representation learning approach for DL models that did not require explicit lesion localization to identify no/low-risk cases to avoid benign biopsies [79]. Trained on bpMRI from a large cohort of 21,465 men with PI-RADS-guided contrastive learning, the DL model avoided 41% of benign biopsies while improving the biopsy yield (50% vs. 40% with the biopsy-all approach) in 253 treatment-naïve men with a highest PI-RADS score of 3.
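The sensitivity, PPV, NPV, and biopsies-avoided figures quoted in these triage studies all follow from one confusion matrix under a "biopsy only model-positive patients" rule. The counts below are hypothetical, chosen only to show the arithmetic; they are not taken from any cited cohort.

```python
def biopsy_triage_metrics(tp, fp, fn, tn):
    # tp: model-positive with csPCa,  fp: model-positive but benign,
    # fn: model-negative with csPCa,  tn: model-negative and benign.
    n = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "biopsies_avoided": (tn + fn) / n,  # model-negative patients spared biopsy
    }

# Hypothetical 100-patient PI-RADS 3 cohort.
m = biopsy_triage_metrics(tp=8, fp=32, fn=2, tn=58)
```

Note the trade-off the formulas encode: every avoided biopsy that lands in `fn` lowers sensitivity and NPV, which is why these studies report all four numbers together.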
Table 1 summarizes the key published studies on AI for prostate cancer diagnosis on MRI, including whether the model leverages bpMRI or mpMRI data and whether it performs lesion detection. We also summarize ground-truth label sources, test set sizes, and level of evidence. The majority of studies to date are single-center and retrospective, highlighting the need for prospective, multi-center evaluations to establish generalizability and clinical utility.
Several FDA-cleared commercial AI tools that support PCa diagnosis are available, including AI-Rad Companion (Siemens), Quantib Prostate (DeepHealth), QP-Prostate (Quibim), PROView DL (GE Healthcare) and Pi (Lucida Medical). These systems provide lesion detection and related diagnostic support and can be deployed locally or via the cloud depending on the institution. In these setups, radiologists typically must open a separate viewer to visualize the algorithm output. This separation creates workflow friction, limiting adoption compared to solutions directly embedded within PACS or reporting systems. Regardless of the platform, algorithms should be validated on the institution's own cases, and periodic quality assurance (e.g., drift detection, revalidation) must be put in place.
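Periodic quality assurance for a deployed model can start with something as simple as monitoring the distribution of its output scores over time. One minimal sketch is a two-sample Kolmogorov–Smirnov check against the validation-time baseline; the statistic threshold and windowing here are deployment choices for illustration, not taken from any cited tool.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap between
    the empirical CDFs of the two samples."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.abs(cdf_a - cdf_b).max())

def drift_flagged(baseline, recent, threshold=0.2):
    # Flag the model for revalidation when recent clinical scores diverge
    # from the validation-time score distribution (threshold is illustrative).
    return ks_statistic(baseline, recent) > threshold
```

A flagged shift does not prove the model is wrong, but it is a cheap trigger for the revalidation step the text calls for.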
Automated approaches for PCa diagnosis in prostate MRI have come a long way, from simple rule-based systems to sophisticated representation learning. Although the agreement between radiologist-annotated MRI-visible lesions and DL-model-predicted lesion masks in the literature shows poor to moderate Dice scores (0.4–0.52) [73, 80-82], identifying the existence of a lesion of interest and its position within the prostate may in itself be helpful in supporting diagnostic decision making for a radiologist. Radiologist-defined MRI positive lesions may still be negative at biopsy, and there may be MRI-invisible lesions that the AI model can detect even when not marked by radiologists [73]. The AI-predicted suspicious lesions could be used as triage tools to provide supplementary information to radiologists and guide clinical decision-making [83]. For AI models that do not include such explicit lesion localization, explainable AI approaches such as Grad-CAM saliency maps can add interpretability [75, 79] and increase radiologist confidence in model decisions. A wider adoption of these tools in the clinical setting can be facilitated with external validation [84] and conducting prospective clinical trials that can demonstrate generalizability [85]. Building on these AI diagnostic systems, Fransen et al. proposed an AI-assisted reading workflow in which a csPCa detection model incorporating uncertainty estimates automatically triaged exams—routing confident predictions for autonomous reporting while directing uncertain cases to expert radiologists. In a multicenter simulation, this approach reduced radiologist workload by approximately 20% without compromising diagnostic accuracy [86]. Looking ahead, as automated diagnostic systems continue to mature, generative AI techniques could assist in drafting structured impressions or preliminary report text.
Large language models have already achieved near-expert performance for impression generation in chest radiography [87, 88], though similar applications have not yet been investigated in prostate MRI.
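The Dice scores cited above for lesion-mask agreement are simply twice the overlap between two binary masks divided by their total area; a minimal implementation makes clear why scores of 0.4–0.52 can coexist with clinically useful localization (a one-column shift of a small lesion already halves the score).

```python
import numpy as np

def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary lesion masks."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

# Two 2x2 "lesions" shifted by one column: overlap 2 px, total 8 px.
a = np.zeros((4, 4)); a[:2, :2] = 1
b = np.zeros((4, 4)); b[:2, 1:3] = 1
```

Here `dice(a, b)` is 0.5 even though both masks point a reader to essentially the same location, which is the argument the text makes for position over pixel-perfect overlap.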
AI for PCa Diagnosis
In 2024, 8000 patients underwent MRI for known or suspected prostate cancer in our health system, with over 1900 biopsies performed. The ever-increasing volume of MRI exams performed each year for risk stratification and biopsy guidance places enormous demands on radiologists. Meta-analyses report high pooled sensitivity (0.90) and negative predictive value (0.92) for diagnosing csPCa with MRI [65]. However, MRI still suffers from false positives that lead to unnecessary biopsies, reflected in low and variable biopsy yield [66]. In addition, lesion detection and scoring remain subjective, with substantial intrareader variability [11] and inter-reader variability (κ = 0.42–0.70) [67-70]. AI tools have the potential to address these challenges and be a radiologist-aid to alleviate workload and present a consistent and objective assessment of prostate MRI.
Recent years have seen the development of several promising DL models that can extract salient diagnostic information from prostate MRI, improving diagnostic consistency and accuracy, and performing on par with radiologists in the prediction of csPCa at the time of biopsy. There are two distinct categories of DL models—those that localize lesion information prior to malignancy prediction, either at lesion level, at patient level, or both, and those that use the full prostate MRI field of view to predict the presence of csPCa at the patient level. The choice of prostate MRI also varies between bpMRI and mpMRI. Schelb et al. trained a DL model with sextant-specific systematic and targeted lesion histopathology as ground truth to detect and segment suspicious findings in mpMRI that demonstrated comparable performance to clinical MRI assessment (sensitivity = 92% vs. 88% with PI-RADS, specificity = 47% vs. 50% with PI-RADS) [71]. Work at the US National Cancer Institute (NCI) pursued a slightly different approach—emulating the diagnostic processes of radiologists by detecting and segmenting lesions of interest, followed by malignancy risk prediction. A cascaded DL model was trained to first detect and segment radiologist-annotated MRI-visible lesions on bi-parametric MRI, followed by predicting PI-RADS categories 2–5 and BPH [72] with a lesion-level sensitivity of 56% and a positive predictive value (PPV) of 62.7%. In a prospective secondary analysis of this model in 658 male participants (csPCa in 45%), Lin et al. reported that the model detected 96% of all patients with csPCa compared to radiologists who detected 98% (p = 0.23) [73]. In participants without any AI lesion predictions, 52% had benign biopsy results and 28% had Gleason score 6. Other retrospective studies for lesion detection and subsequent csPCa prediction have found similar trends [74] (lesion-level accuracy 86%, sensitivity of 90% on an external test set of 330 lesions).
Developing AI models for PCa diagnosis with explicit lesion localization requires annotation effort for the lesions of interest from experts (radiologists and/or urologists). This not only limits the size of training data available for DL models but also introduces another degree of variability. A study by Cai et al. sidestepped this limitation by only requiring patient-level labels for their DL model. Even when explicit lesion masks were not used, the performance of DL models was not different from that of radiologists in detecting csPCa in prostate mpMRI [75]. The model, trained with patient-level labels from 5215 patients (including 1514 with csPCa), achieved AUCs comparable to radiologists: 0.89 versus 0.89 on 400 internal exams and 0.86 versus 0.84 on 204 ProstateX exams. Bosma et al. addressed the lesion-annotation burden with a semi-supervised learning approach, guided by radiology reports, to train their DL model, and showed comparable performance to fully supervised approaches with six times fewer annotations [76].
The publicly hosted PI-CAI challenge with 1500 MRI exams, associated metadata (age, PSA level, prostate volume, scanner name), and biopsy-driven ground truth has facilitated the development of several AI models for prostate MRI [13]. Saha et al. created an ensemble of experts by combining the predictions from five independently retrained AI models with the best diagnostic performance on the PI-CAI challenge. The combined predictions were found to be “statistically superior” in identifying patients with csPCa compared with the mean of 62 radiologists as part of an international reader study [13]. At the routinely used PI-RADS cutoff of three or higher for biopsy guidance, this expert model resulted in 50.4% fewer false positives and detected 20% fewer indolent cancers at the same sensitivity (89.4%) as the pool of radiologists.
The potential of AI models to objectively and consistently interpret prostate MRI has also been beneficial in understanding the ambiguous PI-RADS 3 category. Although MRI has a high negative predictive value (NPV) for csPCa when lesions are classified as low-risk and a high PPV when lesions are classified as high-risk, its performance is especially poor when it comes to equivocal assessments (i.e., PI-RADS 3). Literature reports the presence of csPCa in only 16%–20% of the PI-RADS 3 cases, leading to unnecessary biopsies [77]. A work by Esengur et al. evaluated the predictive performance of the bpMRI-based model of Mehralivand et al. [72] when combined with other clinical variables in biopsy decisions for PI-RADS 3 [78] In particular, they looked at the lesion segmentation model's ability to correctly map a PCa lesion on bpMRI together with PSA density, and observed a sensitivity of 77.8% with a PPV of 17.1% and high NPV of 93.1% in a cohort of treatment-naïve patients (n = 140) with highest PI-RADS scores of 3. This approach avoided 38.6% of unnecessary biopsies. Umapathy et al. tackled the ambiguity in PI-RADS 3 differently—using a representation learning approach for DL models that did not require explicit lesion localization to identify no/low-risk cases to avoid benign biopsies [79]. Trained on bpMRI from a large cohort of 21,465 men with PI-RADS-guided contrastive learning, the DL model avoided 41% of benign biopsies while improving the biopsy yield (50% vs. 40% with the biopsy-all approach) in 253 treatment-naïve men with a highest PI-RADS score of 3.
Table 1 summarizes the key published studies on AI for prostate cancer diagnosis on MRI, including whether the model leverages bpMRI or mpMRI data and whether it performs lesion detection. We also summarize ground-truth label sources, test set sizes, and level of evidence. The majority of studies to date are single-center and retrospective, highlighting the need for prospective, multi-center evaluations to establish generalizability and clinical utility.
Several FDA-cleared commercial AI tools that support PCa diagnosis are available, including AI-Rad Companion (Siemens), Quantib Prostate (DeepHealth), QP-Prostate (Quibim), PROView DL (GE Healthcare) and Pi (Lucida Medical). These systems provide lesion detection and related diagnostic support and can be deployed locally or via the cloud depending on the institution. In these setups radiologists typically must open a separate viewer to visualize the algorithm output. This separation creates workflow friction, limiting adoption compared to solutions directly embedded within PACS or reporting systems. Regardless of the platform, algorithms should be validated on the institution's own cases, and periodic quality assurance (e.g., drift detection, revalidation) must be put in place.
Automated approaches for PCa diagnosis in prostate MRI have come a long way from simpler rule-based approaches to more sophisticated representation learning approaches. Although the agreement between the radiologist-annotated MRI-visible lesions and the DL-model predicted lesion masks in the literature shows poor to moderate Dice scores (0.4–0.52) [73, 80-82] identifying the existence of a lesion of interest and its position within the prostate may in itself be helpful in supporting diagnostic decision making for a radiologist. Radiologist-defined MRI positive lesions may still be negative at biopsy, and there may be MRI-invisible lesions that the AI model can detect even when not marked by radiologists [73]. The AI-predicted suspicious lesions could be used as triage tools to provide supplementary information to radiologists and guide clinical decision-making [83]. For AI models that do not include such explicit lesion localization, explainable AI approaches such as Grad-CAM saliency maps can add interpretability [75, 79] and increase radiologist confidence in model decisions. A wider adoption of these tools in the clinical setting can be facilitated with external validation [84] and conducting prospective clinical trials that can demonstrate generalizability [85]. Building on these AI diagnostic systems, Fransen et al. proposed an AI-assisted reading workflow in which a csPCa detection model incorporating uncertainty estimates automatically triaged exams—routing confident predictions for autonomous reporting while directing uncertain cases to expert radiologists. In a multicenter simulation, this approach reduced radiologist workload by approximately 20% without compromising diagnostic accuracy [86]. Looking ahead, as automated diagnostic systems continue to mature, generative AI techniques could assist in drafting structured impressions or preliminary report text. 
Large language models have already achieved near-expert performance for impression generation in chest radiography [87, 88], though similar applications have not yet been investigated in prostate MRI.
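The uncertainty-gated triage described above can be illustrated with a minimal sketch. Thresholds and probability values here are hypothetical, not those of Fransen et al.; the idea is simply that predictions far from the decision boundary are treated as confident and reported autonomously, while the rest are routed to a radiologist.

```python
import numpy as np

def triage(prob_cspca, low=0.10, high=0.90):
    """Route exams by model confidence.

    Predictions near 0 or 1 are treated as confident and reported
    autonomously; intermediate probabilities go to a radiologist.
    Thresholds are illustrative only.
    """
    prob = np.asarray(prob_cspca, dtype=float)
    confident = (prob <= low) | (prob >= high)
    routes = np.where(confident, "autonomous", "radiologist")
    # fraction of exams removed from the radiologist worklist
    return routes, float(confident.mean())

# toy csPCa probabilities for four exams
routes, workload_saved = triage([0.02, 0.05, 0.55, 0.95])
```

In practice the thresholds would be tuned on a calibration set so that autonomous reporting maintains the required sensitivity.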
6 ∣ AI for Prognosis
The use of AI and prostate MRI features to predict various prostate cancer outcomes has only recently begun to be explored. These outcomes include oncologic outcomes such as cancer recurrence, metastasis, and survival, as well as functional outcomes like urinary incontinence and erectile dysfunction. Currently, the vast majority of prognostic models have focused on predicting biochemical recurrence (BCR) following radical prostatectomy [89-93]. This emphasis reflects both the relatively high frequency of BCR compared to metastasis or cancer-specific mortality and the ease of defining BCR using structured data.
Approaches have evolved considerably over time. In a 2004 study, Poulakis et al. (n = 210) developed a model based on information manually extracted from prostate MRI radiology reports (suspicion of extracapsular extension (ECE), seminal vesicle invasion (SVI), and lymph node involvement) combined with clinical features (PSA, TNM stage, and Gleason score) [89]. Whereas the Poulakis et al. study relied on human-extracted MRI features as model input, a more recent model used an autosegmentation algorithm (nnU-Net) to segment prostate tumors on MRI. Tumor volumes from these AI-derived 3D tumor segmentations were then used to predict long-term BCR after radical prostatectomy at 5 years (AUC 0.65) and radiation therapy at 7 years (AUC 0.79) [94], demonstrating the capability of AI-automated MRI analysis to play a significant role in predicting BCR.
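The volume feature in the nnU-Net study above is simple to derive once a 3D segmentation mask exists: multiply the positive-voxel count by the physical voxel size. A minimal sketch, with illustrative (not study-specific) voxel spacing:

```python
import numpy as np

def tumor_volume_ml(mask, voxel_spacing_mm=(0.5, 0.5, 3.0)):
    """Tumor volume in millilitres from a binary 3D segmentation mask.

    `mask` stands in for the output of an autosegmentation model such
    as nnU-Net; the spacing values are illustrative.
    """
    voxel_mm3 = float(np.prod(voxel_spacing_mm))     # mm^3 per voxel
    n_voxels = int(np.count_nonzero(mask))
    return n_voxels * voxel_mm3 / 1000.0             # mm^3 -> mL

# toy mask: 1000 positive voxels at 0.5 x 0.5 x 3.0 mm spacing
mask = np.ones((10, 10, 10), dtype=np.uint8)
vol = tumor_volume_ml(mask)   # 1000 voxels * 0.75 mm^3 = 0.75 mL
```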
Subsequent models have gone beyond human-derived clinical features such as tumor volume or PI-RADS score by using radiomics, which automatically extracts pixel-level quantitative information from images [95]. Several studies have demonstrated the capabilities of radiomics-derived models to predict BCR [90, 92, 93, 96, 97]. However, radiomics still relies on features such as shape and intensity, which are based on human understanding of image characteristics. Conversely, deep learning models are data-driven, extracting features from MRI without human input, and have been shown to outperform radiomics-based models in prostate cancer prognosis prediction. For example, in the direct comparison by Wang et al. (n = 131) of models built using radiomics-extracted features versus a MedicalNet feature encoder, the CNN feature model outperformed the radiomics model with an AUC of 0.954 versus 0.771 [91]. Other deep learning feature encoders that have been used for MRI-derived BCR prediction include NAFNet [98], EfficientNet [99], and ResNet-50 [100].
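To make the radiomics/deep-learning contrast concrete, the sketch below computes a few hand-engineered first-order intensity features from a masked region. It is a minimal stand-in for a full toolkit such as PyRadiomics (which also computes shape and texture classes), for illustration only:

```python
import numpy as np

def first_order_features(image, mask):
    """First-order radiomics-style intensity features from a masked
    region. A minimal illustration, not a PyRadiomics replacement."""
    vals = image[mask.astype(bool)]
    hist, _ = np.histogram(vals, bins=16)
    p = hist[hist > 0] / vals.size            # non-empty bin probabilities
    return {
        "volume_voxels": int(vals.size),
        "mean": float(vals.mean()),
        "std": float(vals.std()),
        "p10": float(np.percentile(vals, 10)),
        "p90": float(np.percentile(vals, 90)),
        "entropy": float(-(p * np.log2(p)).sum()),
    }

# toy T2-like volume with the whole region masked
img = np.arange(27, dtype=float).reshape(3, 3, 3)
feats = first_order_features(img, np.ones_like(img))
```

Deep learning encoders replace this fixed feature dictionary with representations learned directly from the voxels.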
These MRI image-based models can be enhanced when combined with other data modalities like clinical features. Lian et al. (n = 232) demonstrated that a multimodal transformer AI model combining an MRI CNN feature encoder with clinical data like PSA, Gleason score, and pathological outcomes successfully predicted BCR with high performance (AUC = 0.835) [101]. Similarly, Lee et al. (n = 437) were able to predict BCR-free survival with a concordance index (C-index) = 0.89, using a Cox-LASSO model built on EfficientNet CNN-derived MRI features in combination with clinical factors like age, PSA, ECE, SVI, positive surgical margin presence, and ISUP grade [98]. Other models combining multiple data modalities [89, 91, 93, 96-99, 101] suggest that AI-derived MRI features and human-derived clinical features can complement each other to produce more robust predictions of BCR than either alone.
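A common way to combine an image encoder with clinical variables, as in the Lian et al. and Lee et al. models, is late fusion: concatenating the (normalized) image embedding with encoded clinical features before a downstream classifier. A minimal sketch; the variable names, dimensions, and clinical features are illustrative, not those of any cited model:

```python
import numpy as np

def fuse(image_embedding, clinical):
    """Late fusion: z-score the CNN embedding, then concatenate the
    encoded clinical variables. Illustrative only."""
    z = (image_embedding - image_embedding.mean()) / (image_embedding.std() + 1e-8)
    return np.concatenate([z, clinical])

emb = np.random.default_rng(0).normal(size=128)  # stand-in for CNN image features
clin = np.array([62.0, 8.4, 1.0, 3.0])           # e.g., age, PSA, ECE flag, ISUP grade
x = fuse(emb, clin)                              # 132-dimensional fused vector
```

In real pipelines the clinical variables would also be scaled or embedded, and the fused vector feeds a survival or classification head.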
Combining prostate MRI data with pathology images can further enhance the performance of prognostic models. Zhou et al. (n = 201) demonstrated this potential by creating an integrated nomogram (AUC 0.860) combining an MRI radiomics-derived support vector machine with two separate ResNet-50 models built to extract information from MRI and radical prostatectomy histopathology slides [100]. This example demonstrates that increasingly complex multimodal models can provide robust predictive scores for prostate cancer BCR after treatment.
Despite the strong performance of these models, many are limited by their reduction of BCR to a binary outcome instead of modeling the outcome as time-to-BCR. Traditional implementations of machine learning models like random forests, designed to predict binary/categorical outcomes, are unable to model time-to-event outcomes. Few researchers have taken advantage of deep learning methods for survival analysis like Cox-DeepSurv [102], which has been shown to outperform traditional methods of survival analysis in other diseases including cardiovascular disease risk [103] and chondrosarcoma [104]. In prostate cancer, Cox-DeepSurv demonstrated a higher C-index compared to a traditional proportional hazards model (0.8036 vs. 0.7862) [97], although the difference in performance was small.
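The C-index reported by these survival models can be computed directly. The sketch below implements Harrell's concordance index: among comparable pairs (the earlier time is an observed event), the fraction where the earlier-failing patient has the higher predicted risk, with risk ties counting half.

```python
import numpy as np

def concordance_index(times, events, risk):
    """Harrell's C-index for right-censored data (O(n^2) reference
    implementation; survival libraries use faster algorithms)."""
    times, events, risk = map(np.asarray, (times, events, risk))
    num = den = 0.0
    for i in range(len(times)):
        if not events[i]:                 # censored subjects anchor no pairs
            continue
        for j in range(len(times)):
            if times[j] > times[i]:       # j outlived i: comparable pair
                den += 1
                if risk[i] > risk[j]:
                    num += 1              # concordant
                elif risk[i] == risk[j]:
                    num += 0.5            # tied risk
    return num / den

# perfectly ordered toy data: higher risk fails earlier -> C-index = 1.0
c = concordance_index([2, 4, 6, 8], [1, 1, 1, 0], [0.9, 0.6, 0.4, 0.1])
```

Unlike a binary AUC, this metric uses the event times themselves, which is why time-to-event modeling is preferable to reducing BCR to a yes/no label.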
There has been significantly less research on using AI and prostate MRI to predict metastases, despite the greater clinical relevance of metastases as compared to BCR. The creation of models to predict metastases is challenged by the limited number of patients experiencing metastasis after definitive treatment for localized disease and the need for prolonged follow-up and compliance to capture these events. For instance, one study of 1161 patients found that only 38 (3.3%) had bone metastases [105]. Another model had similarly limited sample sizes (44 out of 732 patients) [94]. Although these models have shown promise, with high performance metrics (AUCs ranging from 0.84 to 0.91) [94, 105], these results must be interpreted cautiously due to the small sample sizes of patients with metastases.
Another significant outcome that AI models using MRI can help predict is the restoration of urinary continence after radical prostatectomy, which is a major quality-of-life concern [106]. These models can offer personalized predictions for continence recovery, helping patients make informed decisions regarding their choice of prostate cancer treatment. Sumitomo et al. (n = 400) used a Naïve Bayes classifier built on a VGG-16 CNN feature encoder, MRI-derived anatomical features, and clinical variables—including age, BMI, neoadjuvant androgen deprivation therapy history, preoperative continence status, PSA, Gleason score, clinical stage, and European Association of Urology risk criteria—to predict continence at 3 months after radical prostatectomy (AUC = 0.775) [107]. For longer-term continence, Shahait et al. (2023; n = 140) successfully predicted continence at 12 months (AUC = 0.885) using an XGBoost model built on radiomics-derived features from the levator ani muscle [108].
Despite the progress described above, the existing literature on leveraging prostate MRI for prognosis has important limitations. First, research has focused predominantly on BCR in the setting of radical prostatectomy, which neglects the 40%–50% [109] of patients with localized prostate cancer in the United States who receive alternative interventions such as radiation therapy or focal therapy.
Second, external validation is lacking for several models, and some that have performed this important step in model evaluation have found inferior performance compared to evaluation on internal cohorts [90, 93, 96, 101]. Importantly, model performance must also be carefully assessed in sensitive subgroups to ensure model fairness. For example, a recent study found that a machine learning model predicting survival after radical prostatectomy demonstrated worse performance in Black patients relative to white patients [110]. This disparate model performance is particularly concerning in prostate cancer where Black men in the United States are more likely to be diagnosed with advanced prostate cancer and have a higher mortality than white men [111]. Thus, the application of AI models that have not been assessed for fairness may perpetuate and even drive further disparities in prostate cancer care.
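The subgroup assessment called for above can be operationalized by reporting the discrimination metric per group rather than only overall, so that performance gaps are visible instead of being averaged away. A minimal sketch with a rank-based AUC (values are toy data, not from any cited study):

```python
import numpy as np

def auc(labels, scores):
    """Rank-based AUC, equivalent to the Mann-Whitney U statistic
    divided by n_pos * n_neg."""
    labels, scores = np.asarray(labels), np.asarray(scores)
    pos, neg = scores[labels == 1], scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum() \
         + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

def subgroup_auc(labels, scores, groups):
    """AUC computed separately within each subgroup."""
    labels, scores, groups = map(np.asarray, (labels, scores, groups))
    return {str(g): float(auc(labels[groups == g], scores[groups == g]))
            for g in np.unique(groups)}

labels = [0, 1, 0, 1, 0, 1]
scores = [0.1, 0.9, 0.9, 0.1, 0.2, 0.8]
groups = ["A", "A", "B", "B", "B", "B"]
per_group = subgroup_auc(labels, scores, groups)  # reveals a large gap
```

Calibration and decision-threshold metrics should be stratified the same way, since a model can discriminate equally well in two groups yet be miscalibrated in one.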
Finally, there is a need for greater diversity of patients in training and testing datasets. Most AI models that utilize MRI for prostate cancer prognosis were developed in China. Compared with patients in the United States, Chinese prostate cancer patients present at older ages and with higher tumor grades [112] and have been shown to have reduced 5-year age-adjusted survival [113]. These differences may result in poor performance when models are applied to populations, such as US populations, whose characteristics differ from those of the training set [112].
The ultimate goal of these models is to provide actionable insights for clinicians and patients. Deployment is the process of making a model accessible and usable for end-users in a real-world setting; however, no prognostic model using MRI-derived features has yet been deployed. Future research must prioritize not only improving predictive performance but also developing robust validation pipelines and user-friendly deployment tools so that these advanced AI models can be integrated into routine clinical practice.
7 ∣ Future Directions
Table 2 summarizes the current level of translation for each domain and outlines key next steps. Acquisition and reconstruction are furthest along, with multiple FDA-cleared products and prospective evaluations. Triage, quality control, and prognosis remain earlier in development and will require targeted efforts to demonstrate clinical utility. Artificial intelligence has made inroads across the prostate MRI pathway; the next phase of research must focus on prospective clinical validation and the practical integration of AI into imaging workflows.
Despite promising results in retrospective evaluations, few AI tools for diagnosis have been tested prospectively or implemented in routine practice. A key priority is to assess whether AI systems can improve patient outcomes, reduce unnecessary biopsies, or streamline workflows when used alongside radiologists. This will require prospective studies that evaluate not only diagnostic accuracy but also operational metrics, such as interpretation time, confidence, and the downstream consequences of AI-assisted decisions. The PI-CAI international reader study represents the strongest evidence to date for AI-based interpretation of prostate MRI, showing that an ensemble of leading AI models had non-inferior performance, and in some analyses superior performance, compared with 62 radiologists [13]. This demonstrated that AI can achieve expert-level diagnostic performance at scale, but the study was retrospective and did not address workflow integration or downstream outcomes such as biopsy yield or changes in treatment decisions. Moving forward, prospective trials that embed AI into clinical workflows will be necessary to determine its true impact on patient care. Integration into radiologist environments via PACS, scanners, or reporting platforms will also be essential to enable routine use.
In parallel, it is essential to ensure that AI tools perform reliably across diverse patient populations and imaging settings. Many published models are trained on single-center data and narrow demographics, and subgroup-level performance is rarely reported. As prostate cancer incidence and outcomes vary significantly by race, geography, and health system [111], external validation must become a standard part of AI development. Tools should be evaluated for consistency across age and race, and efforts to build diverse, multi-institutional datasets and/or federated learning strategies [114] should be supported. Fairness is not separate from performance; it is central to safe and responsible deployment.
Another area in need of advancement is confidence calibration and out-of-distribution (OOD) detection. As AI tools begin to influence clinical decisions, it is not enough for them to output a single prediction; they must also convey when that prediction may be unreliable. While uncertainty quantification and OOD detection are increasingly recognized as safeguards in medical AI more broadly [115], applications in prostate MRI remain limited. To date, only one study has directly addressed this, using conformal prediction to estimate error bounds in prostate gland segmentation and showing that uncertainty estimates could reduce clinically significant volume misclassifications [116]. Broader use of such methods in triage and diagnostic models could strengthen clinical trust and provide a critical safety net, but this remains an underexplored area and represents a clear priority for future development.
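The split-conformal idea used in the segmentation study above can be sketched in a few lines: calibration-set errors are sorted, and a finite-sample-corrected order statistic yields a bound that covers at least 1 − α of future errors under exchangeability. The setup below (volume errors in mL) is illustrative, not the cited study's implementation:

```python
import numpy as np

def conformal_bound(cal_pred, cal_true, alpha=0.1):
    """Split conformal prediction for a scalar error.

    Returns the k-th smallest calibration error with
    k = ceil((n + 1) * (1 - alpha)); under exchangeability, future
    absolute errors fall below this bound with probability >= 1 - alpha.
    """
    scores = np.sort(np.abs(np.asarray(cal_pred) - np.asarray(cal_true)))
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return float(scores[min(k, n) - 1])

# toy calibration set: predicted vs reference gland volumes (mL)
cal_true = np.full(10, 40.0)
cal_pred = cal_true + np.arange(1.0, 11.0)   # absolute errors 1..10 mL
bound = conformal_bound(cal_pred, cal_true, alpha=0.2)   # 9th smallest error
```

Attaching such a bound to each automated measurement lets a workflow flag cases whose plausible error range crosses a clinically meaningful threshold.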
A promising direction is to build AI models that integrate multiple data modalities beyond MRI alone: prior imaging, digital pathology, genomics, and comprehensive EHR (e.g., PSA trajectories, lab values, demographics). Multimodal AI has shown benefits in other domains by capturing complementary signals—for example, a recent review highlights how combining imaging, molecular, and clinical data improves diagnostic accuracy and prediction of outcomes in oncology and general medicine [117]. In pathology, multimodal systems that link tissue morphology and genomic profiles are enabling more robust prognostic models [118]. For prostate MRI, this could mean a model that ingests prior MRIs [119], digital pathology, genomic risk scores, PSA kinetics and more to refine risk prediction for csPCa and prognosis of recurrence or metastasis.
Finally, AI may help enable a more ambitious shift in the role of prostate MRI itself. Traditionally limited to men with elevated PSA or abnormal clinical findings, MRI has been too expensive and time-consuming for broader screening. With the advent of faster acquisition protocols, AI-driven triage, and automated interpretation, MRI could become financially and logistically feasible at earlier stages of the care pathway. Population-level trials such as STHLM3-MRI and ReIMAGINE suggest that MRI-based screening can reduce unnecessary biopsies while improving the detection of clinically significant cancers [14, 15]. AI could play a key role in making such protocols scalable and more reliable, transforming prostate MRI into a practical platform for population-level early detection.