PIBAdb: a public cohort of multimodal colonoscopy videos and images including polyps with histological information.
TL;DR
PIBAdb is one of the largest and most complete multimodal public datasets for colorectal polyp research and is characterized by its rich per-polyp metadata, inclusion of NBI and WL images, and non-polyp images at multiple levels of cleanness.
OpenAlex Topics
Colorectal Cancer Screening and Detection
AI in cancer detection
COVID-19 diagnosis using AI
APA
Alba Nogueira-Rodríguez, R. Domínguez, et al. (2026). PIBAdb: a public cohort of multimodal colonoscopy videos and images including polyps with histological information. Computer Methods and Programs in Biomedicine, 280, 109315. https://doi.org/10.1016/j.cmpb.2026.109315
MLA
Alba Nogueira-Rodríguez, et al. "PIBAdb: a public cohort of multimodal colonoscopy videos and images including polyps with histological information." Computer Methods and Programs in Biomedicine, vol. 280, 2026, pp. 109315.
PMID
41863886
Abstract
[BACKGROUND AND OBJECTIVE] Colorectal cancer is the third most common cancer worldwide and has a high mortality rate. Colonoscopy is the gold standard for screening, as it can reduce both the incidence and the mortality of the disease. Deep Learning techniques have become state-of-the-art in lesion detection and classification, and several Deep-Learning-based Computer-Aided Diagnosis systems are already undergoing clinical evaluation or commercialization. However, the development of reliable models requires large, high-quality datasets, which are costly and time-consuming to create. Thus, the availability of public datasets is critical for the scientific community to develop artificial intelligence models. This work aims to contribute to the available resources by presenting PIBAdb, a new multimodal public cohort of colorectal videos and images.
[METHODS] The PIBAdb cohort contains polyp data derived from routine colonoscopies conducted between January 2018 and May 2021 at Hospital Universitario de Ourense, under the PolyDeep project. Each polyp was resected, histologically analysed, morphologically classified, and annotated by expert clinicians with bounding boxes in images and temporal segments in videos. The main characteristics of PIBAdb were compared with those of 25 other public datasets. The utility of PIBAdb was evaluated in polyp detection and classification scenarios using Deep Learning models.
[RESULTS] PIBAdb includes detailed clinical and histological metadata from 1,176 polyps, 31,946 manually annotated polyp images, 14,124 non-polyp images, nearly 7 h of annotated video segments showing polyps, and over 4 h of annotated video segments without polyps. It comprises both the raw database and several curated image datasets, each accompanied by metadata and documentation. PIBAdb is publicly available upon request for non-profit purposes.
[CONCLUSIONS] PIBAdb is one of the largest and most complete multimodal public datasets for colorectal polyp research. It is characterized by its rich per-polyp metadata (histology and PARIS/NICE classifications), inclusion of NBI and WL images, and non-polyp images at multiple levels of cleanness. While the image datasets included are practical for developing classification or detection models, the full database enables more complex video-based research and custom dataset creation using the PIBA management tool, supported by a queryable relational database. Its availability is expected to support the development of Deep Learning models and to foster future contributions from the research community.
MeSH Terms
Humans; Colonoscopy; Deep Learning; Colonic Polyps; Colorectal Neoplasms; Video Recording; Cohort Studies; Diagnosis, Computer-Assisted; Databases, Factual