Fast algorithms for the optimum-path forest-based classifier
Descripción del Articulo
Pattern Recognition applications deal with ever increasing datasets, both in size and complexity. In this work, we propose and analyze efficient algorithms for the Optimum-Path Forest (OPF) supervised classifier. This classifier has proven to provide results comparable to most well-know pattern reco...
| Autor: | |
|---|---|
| Formato: | tesis de grado |
| Fecha de Publicación: | 2018 |
| Institución: | Universidad Católica San Pablo |
| Repositorio: | UCSP-Institucional |
| Lenguaje: | español |
| OAI Identifier: | oai:repositorio.ucsp.edu.pe:20.500.12590/15589 |
| Enlace del recurso: | https://hdl.handle.net/20.500.12590/15589 |
| Nivel de acceso: | acceso abierto |
| Materia: | Algorithm Optimun Path Forest (OPF) Image Foresting Transform https://purl.org/pe-repo/ocde/ford#1.02.01 |
| Sumario: | Pattern Recognition applications deal with ever increasing datasets, both in size and complexity. In this work, we propose and analyze efficient algorithms for the Optimum-Path Forest (OPF) supervised classifier. This classifier has proven to provide results comparable to most well-know pattern recognition techniques, but with a much faster training phase. However, there is still room for improvement. The contribution of this work is the introduction of spatial indexing and parallel algorithms on the training and classification phases of the OPF supervised classifier. First, we propose a simple parallelization approach for the training phase. Following the traditional sequential training for the OPF, it maintains a priority queue to compute best samples at each iteration. Later on, we replace this priority queue by an array and a linear search, in the aim of using a more parallel-friendly data structure. We show that this approach leads to more temporal and spatial locality than the former, providing better speedups. Additionally, we show how the use of vectorization on distance calculations affects the overall speedup and we provide directions on when to use it. For the classification phase, we first aim to reduce the number of distance calculations against the classifier samples and, then, we also introduce parallelization. For this purpose, we elaborate a novel theory to index the OPF classifier in a metric space. Then, we use it to build an efficient data structure that allows us to reduce the number of comparison with classifier samples. Finally, we propose its parallelization, leading to a very fast classification for new samples. |
|---|
Nota importante:
La información contenida en este registro es de entera responsabilidad de la institución que gestiona el repositorio institucional donde esta contenido este documento o set de datos. El CONCYTEC no se hace responsable por los contenidos (publicaciones y/o datos) accesibles a través del Repositorio Nacional Digital de Ciencia, Tecnología e Innovación de Acceso Abierto (ALICIA).
La información contenida en este registro es de entera responsabilidad de la institución que gestiona el repositorio institucional donde esta contenido este documento o set de datos. El CONCYTEC no se hace responsable por los contenidos (publicaciones y/o datos) accesibles a través del Repositorio Nacional Digital de Ciencia, Tecnología e Innovación de Acceso Abierto (ALICIA).