The database contains 959 patches obtained from 115 anonymized Pap smear samples sourced from the permanent repository of the Pathology Department at Hospital Bernardino Rivadavia. Each sample carries an overall diagnosis of ASC-US, LSIL, ASC-H, HSIL, HSIL-LSIL, SCC or NILM. The slides were digitized with a Grundium Ocus 40 scanner at 40× magnification to capture regions of interest. These regions are subdivided into mini-patches of 1024 × 1024 pixels at a resolution of 0.25 µm/pixel, as specified by the scanner manufacturer.
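As a quick check of scale, the stated patch size and resolution can be converted to physical units. The following is a minimal sketch that uses only the two figures given above; the function and variable names are illustrative and are not part of the dataset.

```python
# Values stated in the dataset description.
PATCH_SIZE_PX = 1024          # patch width and height, in pixels
RESOLUTION_UM_PER_PX = 0.25   # scanner resolution, in micrometers per pixel


def pixels_to_micrometers(pixels: float,
                          um_per_px: float = RESOLUTION_UM_PER_PX) -> float:
    """Convert a length in pixels to micrometers."""
    return pixels * um_per_px


# Side length of one patch in micrometers.
patch_side_um = pixels_to_micrometers(PATCH_SIZE_PX)

# Area of one patch in square millimeters (1 mm = 1000 µm).
patch_area_mm2 = (patch_side_um / 1000.0) ** 2

print(patch_side_um)             # 256.0
print(round(patch_area_mm2, 6))  # 0.065536
```

Each patch therefore covers a square field of about 256 µm per side.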
The images in this database include 26158 annotations corresponding to 15019 different cells, classified into five types of (pre)cancerous cells (SCC, HSIL, ASC-H, LSIL, ASC-US) and three categories of non-lesion cells (INFL, ENDO, NILM), where INFL and ENDO denote inflammatory and endocervical cells, respectively. Each annotation contains the (x, y) coordinates of the approximate location of the cell nucleus and the assigned cell type. Each cell was classified by one to four independent annotators.
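Because a cell may be labeled by more than one annotator, a user will often want to group the annotations by cell and derive a single label. The sketch below shows one possible way to do that with a simple majority vote. The record layout (`cell_id`, `annotator`) is hypothetical, since the actual file format is not described here, and majority voting is an assumption for illustration, not a consensus rule stated by the dataset.

```python
from collections import Counter
from dataclasses import dataclass

# Cell classes named in the dataset description.
LESION_CLASSES = {"SCC", "HSIL", "ASC-H", "LSIL", "ASC-US"}
NON_LESION_CLASSES = {"INFL", "ENDO", "NILM"}
ALL_CLASSES = LESION_CLASSES | NON_LESION_CLASSES


@dataclass(frozen=True)
class Annotation:
    cell_id: int     # hypothetical key linking annotations of the same cell
    x: int           # approximate nucleus position in the patch, in pixels
    y: int
    label: str       # one of the eight classes above
    annotator: str   # hypothetical anonymized annotator key


def group_by_cell(annotations):
    """Collect the labels given to each cell, keyed by cell_id."""
    grouped = {}
    for ann in annotations:
        if ann.label not in ALL_CLASSES:
            raise ValueError(f"unknown class: {ann.label!r}")
        grouped.setdefault(ann.cell_id, []).append(ann.label)
    return grouped


def majority_label(labels):
    """Return the most frequent label, or None if first place is tied."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


# Made-up example records, not taken from the dataset.
annotations = [
    Annotation(1, 512, 300, "HSIL", "A"),
    Annotation(1, 514, 298, "HSIL", "B"),
    Annotation(1, 510, 301, "ASC-H", "C"),
    Annotation(2, 100, 900, "LSIL", "A"),
    Annotation(2, 102, 903, "ASC-US", "B"),
]

consensus = {cell: majority_label(labels)
             for cell, labels in group_by_cell(annotations).items()}
print(consensus)  # {1: 'HSIL', 2: None}
```

Returning `None` on a tie keeps disagreements visible instead of resolving them arbitrarily. Such cells can then be reviewed or excluded, depending on the study.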
Data can be downloaded for each individual cell from the CLASSIFICATION screen, or for the entire set as a single zip file from the DOWNLOAD section. The collection is intended as a resource for cytological research and for the development of AI-assisted diagnostics.
The images displayed on this webpage feature various labels corresponding to different types of cellular lesions. Each lesion is represented by a unique color:
Each color-coded label provides a quick visual reference to the type of cellular lesion depicted in the image, facilitating easier interpretation and analysis of the cytological patterns.
Additionally, the annotations made by independent experts are marked with distinct shapes to indicate the annotator responsible for each classification:
This system enables the comparison of annotations and helps maintain the integrity and anonymity of the annotators involved in this project.