Comparison of RESNET-50, VGG-16, Vision Transformer, and Swin Transformer for facial recognition with mask occlusion
Article description
| Field | Value |
|---|---|
| Authors: | , , |
| Format: | Article |
| Publication date: | 2023 |
| Institution: | Universidad de Lima |
| Repository: | Revistas - Universidad de Lima |
| Language: | Spanish |
| OAI identifier: | oai:revistas.ulima.edu.pe:article/6361 |
| Resource link: | https://revistas.ulima.edu.pe/index.php/Interfases/article/view/6361 |
| Access level: | Open access |
| Subjects: | face recognition; RESNET-50; VGG-16; Vision Transformer; Swin Transformer; facial recognition |
| Summary: | Face recognition has become relevant in the search for contactless identity-verification solutions in enclosed spaces in the context of the SARS-CoV-2 pandemic. One of the challenges of face recognition is mask occlusion, which hides more than 50 % of the face. This research evaluated four pre-trained models adapted by transfer learning, VGG-16, RESNET-50, Vision Transformer (ViT), and Swin Transformer, training only their upper layers on the authors' own dataset. The analysis obtained an accuracy of 24 % (RESNET-50), 25 % (VGG-16), 96 % (ViT), and 91 % (Swin) with unmasked subjects, while with masks accuracy was 32 % (RESNET-50), 53 % (VGG-16), 87 % (ViT), and 61 % (Swin). These results indicate that modern architectures such as the Transformers perform better at recognizing masked faces than the CNNs (VGG-16 and RESNET-50). The contribution of the research lies in the experimentation with two types of architectures, CNNs and Transformers, as well as the creation of a public dataset shared with the scientific community. This work strengthens the state of the art of computer vision in face recognition under mask occlusion by experimentally illustrating how accuracy varies across scenarios and architectures. |
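As an illustration of the transfer-learning setup described in the summary, the sketch below freezes an ImageNet-pretrained RESNET-50 backbone and trains only a new upper classification head. This is a minimal, hypothetical example in Python/PyTorch, not the authors' code: the number of identities (`NUM_SUBJECTS`), the optimizer, and the learning rate are assumptions, and the same pattern would apply to VGG-16, ViT, or Swin Transformer backbones.

```python
# Illustrative sketch (not the authors' pipeline): transfer learning on RESNET-50,
# freezing the pre-trained backbone and training only the new upper (classifier) layer.
import torch
import torch.nn as nn
from torchvision import models

NUM_SUBJECTS = 10  # hypothetical number of identities in the face dataset

# Load ImageNet-pretrained weights and freeze every backbone parameter.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
for param in backbone.parameters():
    param.requires_grad = False

# Replace the final fully connected layer with a new head for the face-identity classes.
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_SUBJECTS)

# Only the new head's parameters are updated during training.
optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One optimization step on a batch of (masked or unmasked) face images."""
    backbone.train()
    optimizer.zero_grad()
    loss = criterion(backbone(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Evaluating such a model separately on unmasked and masked test images would yield the two accuracy figures per architecture reported in the summary.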
Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository hosting this document or dataset. CONCYTEC is not responsible for the content (publications and/or data) accessible through the National Digital Repository of Open Access Science, Technology and Innovation (ALICIA).