Equity and Generalizability of Artificial Intelligence for Skin-Lesion Diagnosis Using Clinical, Dermoscopic, and Smartphone Images: A Systematic Review and Meta-Analysis.

Tjiu JW; Lu CF

doi:10.3390/medicina61122186

Medicina (Kaunas, Lithuania) 2025 Vol.61(12)

📝 One-line summary for patients

Artificial intelligence (AI) has shown promising performance in skin-lesion classification; however, its fairness, external validity, and real-world reliability remain uncertain.

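🔬 Key clinical statistics (automatically extracted from the abstract; verification against the original is recommended)

Pooled sensitivity 0.91 (95% CI 0.74-0.97)
Study design: systematic review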

Cite this article

APA Tjiu JW, Lu CF (2025). Equity and Generalizability of Artificial Intelligence for Skin-Lesion Diagnosis Using Clinical, Dermoscopic, and Smartphone Images: A Systematic Review and Meta-Analysis. Medicina (Kaunas, Lithuania), 61(12). https://doi.org/10.3390/medicina61122186

MLA Tjiu JW, et al. "Equity and Generalizability of Artificial Intelligence for Skin-Lesion Diagnosis Using Clinical, Dermoscopic, and Smartphone Images: A Systematic Review and Meta-Analysis." Medicina (Kaunas, Lithuania), vol. 61, no. 12, 2025.

PMID 41470188

DOI 10.3390/medicina61122186

Abstract

Background: Artificial intelligence (AI) has shown promising performance in skin-lesion classification; however, its fairness, external validity, and real-world reliability remain uncertain. This systematic review and meta-analysis evaluated the diagnostic accuracy, equity, and generalizability of AI-based dermatology systems across diverse imaging modalities and clinical settings.

Methods: A comprehensive search of PubMed, Embase, Web of Science, and ClinicalTrials.gov (from inception to 31 October 2025) identified diagnostic accuracy studies using clinical, dermoscopic, or smartphone images. Eighteen studies (11 melanoma-focused; 7 mixed benign-malignant) met the inclusion criteria. Six studies provided complete 2 × 2 contingency data for bivariate Reitsma HSROC modeling, while seven reported AUROC values with extractable variance. Risk of bias was assessed using QUADAS-2, and the certainty of evidence was graded using GRADE.

Results: Across more than 70,000 test images, pooled sensitivity and specificity were 0.91 (95% CI 0.74-0.97) and 0.64 (95% CI 0.47-0.78), respectively, corresponding to an HSROC AUROC of 0.88 (95% CI 0.84-0.92). The AUROC-only meta-analysis yielded a similar pooled AUROC of 0.88 (95% CI 0.87-0.90). Diagnostic performance was highest in specialist settings (AUROC 0.90), followed by community care (0.85) and smartphone environments (0.81). Notably, performance was lower for darker skin tones (Fitzpatrick IV-VI: AUROC 0.82) than for lighter skin tones (I-III: 0.89), indicating persistent fairness gaps.

Conclusions: AI-based dermatology systems achieve high diagnostic accuracy but show reduced performance for darker skin tones and in non-specialist environments. These findings emphasize the need for diverse training datasets, skin-tone-stratified reporting, and rigorous external validation before broad clinical deployment.
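The abstract pools per-study accuracy estimates (sensitivity from 2 × 2 tables, and AUROC values with their variances) into single summary figures. The sketch below illustrates the general idea of random-effects pooling under stated assumptions. It uses a univariate DerSimonian-Laird model on the logit scale, which is a simpler, swapped-in technique and not the bivariate Reitsma HSROC model the authors fitted, so it ignores the correlation between sensitivity and specificity. It also assumes a 0.5 continuity correction for count data and a delta-method variance for logit(AUROC). All function names and example numbers are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def dersimonian_laird(effects, variances):
    """Random-effects pooling (DerSimonian-Laird).

    Returns (pooled effect, its standard error, between-study variance tau^2),
    all on the scale of the inputs (here: the logit scale).
    """
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures dispersion around the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Re-weight each study by 1 / (within-study variance + tau^2)
    w_star = [1.0 / (v + tau2) for v in variances]
    sws = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sws
    return pooled, math.sqrt(1.0 / sws), tau2


def pool_proportion(events, totals, z=1.96):
    """Pool a proportion such as sensitivity = TP / (TP + FN) across studies."""
    effects, variances = [], []
    for x, n in zip(events, totals):
        # Hypothetical choice: 0.5 continuity correction avoids log(0)
        a, b = x + 0.5, (n - x) + 0.5
        effects.append(math.log(a / b))
        variances.append(1.0 / a + 1.0 / b)
    pooled, se, _ = dersimonian_laird(effects, variances)
    return inv_logit(pooled), inv_logit(pooled - z * se), inv_logit(pooled + z * se)


def pool_auroc(aucs, ses, z=1.96):
    """Pool AUROC values given their standard errors, on the logit scale."""
    effects = [logit(a) for a in aucs]
    # Delta method: SE(logit A) is approximately SE(A) / (A * (1 - A))
    variances = [(s / (a * (1.0 - a))) ** 2 for a, s in zip(aucs, ses)]
    pooled, se, _ = dersimonian_laird(effects, variances)
    return inv_logit(pooled), inv_logit(pooled - z * se), inv_logit(pooled + z * se)


if __name__ == "__main__":
    # Hypothetical per-study data, not taken from the review
    est, lo, hi = pool_proportion(events=[90, 45, 160], totals=[100, 50, 200])
    print(f"pooled sensitivity {est:.2f} (95% CI {lo:.2f}-{hi:.2f})")
    est, lo, hi = pool_auroc(aucs=[0.90, 0.85, 0.81], ses=[0.02, 0.03, 0.04])
    print(f"pooled AUROC {est:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Working on the logit scale keeps pooled estimates and confidence limits inside (0, 1); the bivariate approach named in the abstract additionally models sensitivity and specificity jointly, which this sketch does not attempt.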

MeSH Terms

Humans; Artificial Intelligence; Smartphone; Dermoscopy; Skin Neoplasms; Reproducibility of Results; Sensitivity and Specificity