Comparative between RESNET-50, VGG-16, Vision Transformer y Swin Transformer for facial recognition with mask occlusion

Article Description

Face recognition has become relevant in the search for non-physical contact solutions in enclosed spaces for identity verification in the context of the SARS-CoV-2 pandemic. One of the challenges of face recognition is mask occlusion which hides more than 50 % of the face. This research evaluated fo...


Bibliographic Details
Authors: Tafur Acenjo, Brenda Xiomara; Tello Pariona, Martin Alexis; Escobedo Cárdenas, Edwin Jhonatan
Format: article
Publication Date: 2023
Institution: Universidad de Lima
Repository: Revistas - Universidad de Lima
Language: Spanish
OAI Identifier: oai:ojs.pkp.sfu.ca:article/6361
Resource link: https://revistas.ulima.edu.pe/index.php/Interfases/article/view/6361
Access level: open access
Subject: face recognition
RESNET-50
VGG-16
Vision Transformer
Swin Transformer
reconocimiento facial
id REVULIMA_48a8750c0996824e9892fccca54e98e5
oai_identifier_str oai:ojs.pkp.sfu.ca:article/6361
network_acronym_str REVULIMA
network_name_str Revistas - Universidad de Lima
dc.title.none.fl_str_mv Comparative between RESNET-50, VGG-16, Vision Transformer y Swin Transformer for facial recognition with mask occlusion
Comparativa entre RESNET-50, VGG-16, Vision Transformer y Swin Transformer para el reconocimiento facial con oclusión de una mascarilla
dc.creator.none.fl_str_mv Tafur Acenjo, Brenda Xiomara
Tello Pariona, Martin Alexis
Escobedo Cárdenas, Edwin Jhonatan
author Tafur Acenjo, Brenda Xiomara
author_facet Tafur Acenjo, Brenda Xiomara
Tello Pariona, Martin Alexis
Escobedo Cárdenas, Edwin Jhonatan
author_role author
author2 Tello Pariona, Martin Alexis
Escobedo Cárdenas, Edwin Jhonatan
author2_role author
author
dc.subject.none.fl_str_mv face recognition
RESNET-50
VGG-16
Vision Transformer
Swin Transformer
reconocimiento facial
RESNET-50
VGG-16
Vision Transformer
Swin Transformer
description Face recognition has become relevant in the search for contactless identity verification in enclosed spaces in the context of the SARS-CoV-2 pandemic. One of its challenges is mask occlusion, which hides more than 50 % of the face. This research evaluated four pre-trained models adapted through transfer learning: VGG-16, RESNET-50, Vision Transformer (ViT), and Swin Transformer, whose upper layers were trained on the authors' own dataset. With unmasked subjects, accuracy was 24 % (RESNET-50), 25 % (VGG-16), 96 % (ViT), and 91 % (Swin). With masked subjects, accuracy was 32 % (RESNET-50), 53 % (VGG-16), 87 % (ViT), and 61 % (Swin). These test accuracy percentages indicate that modern architectures such as Transformers perform better at recognizing masked faces than the CNNs (VGG-16 and RESNET-50). The contribution of the research lies in its experimentation with two types of architectures, CNNs and Transformers, and in the creation of a public dataset shared with the scientific community. This work strengthens the state of the art in computer vision for face recognition under mask occlusion by illustrating experimentally how accuracy varies across scenarios and architectures.
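The accuracy figures in the abstract above can be tabulated to make the comparison easier to follow. The sketch below is only an illustration: the eight percentages are copied from the abstract, while the per-family mean and the masked-minus-unmasked difference are summary statistics assumed here for convenience and are not reported by the authors. The names `ACCURACY`, `family_mean`, and `masked_delta` are hypothetical.

```python
# Accuracy figures (in percent) copied from the abstract.
# The grouping into "CNN" and "Transformer" follows the abstract's wording.
ACCURACY = {
    "RESNET-50": {"family": "CNN", "unmasked": 24, "masked": 32},
    "VGG-16": {"family": "CNN", "unmasked": 25, "masked": 53},
    "ViT": {"family": "Transformer", "unmasked": 96, "masked": 87},
    "Swin": {"family": "Transformer", "unmasked": 91, "masked": 61},
}


def family_mean(family, condition):
    """Mean accuracy of one architecture family under one condition.

    This average is an illustrative summary, not a figure from the article.
    """
    values = [m[condition] for m in ACCURACY.values() if m["family"] == family]
    return sum(values) / len(values)


def masked_delta(model):
    """Masked accuracy minus unmasked accuracy, in percentage points."""
    return ACCURACY[model]["masked"] - ACCURACY[model]["unmasked"]


if __name__ == "__main__":
    for name in ACCURACY:
        print(f"{name}: {masked_delta(name):+d} points with a mask")
    for family in ("CNN", "Transformer"):
        print(
            f"{family}: mean unmasked {family_mean(family, 'unmasked')}, "
            f"mean masked {family_mean(family, 'masked')}"
        )
```

Under these assumptions, the masked-condition means are 42.5 for the CNNs and 74.0 for the Transformers, which is consistent with the abstract's conclusion about masked recognition.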
publishDate 2023
dc.date.none.fl_str_mv 2023-07-31
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv https://revistas.ulima.edu.pe/index.php/Interfases/article/view/6361
10.26439/interfases2023.n017.6361
url https://revistas.ulima.edu.pe/index.php/Interfases/article/view/6361
identifier_str_mv 10.26439/interfases2023.n017.6361
dc.language.none.fl_str_mv spa
language spa
dc.relation.none.fl_str_mv https://revistas.ulima.edu.pe/index.php/Interfases/article/view/6361/6383
https://revistas.ulima.edu.pe/index.php/Interfases/article/view/6361/6387
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
text/html
dc.publisher.none.fl_str_mv Universidad de Lima
publisher.none.fl_str_mv Universidad de Lima
dc.source.none.fl_str_mv Interfases; No. 017 (2023); 56-78
1993-4912
10.26439/interfases2023.n017
reponame:Revistas - Universidad de Lima
instname:Universidad de Lima
instacron:ULIMA
instname_str Universidad de Lima
instacron_str ULIMA
institution ULIMA
reponame_str Revistas - Universidad de Lima
collection Revistas - Universidad de Lima
Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository in which this document or dataset is held. CONCYTEC is not responsible for the content (publications and/or data) accessible through the Repositorio Nacional Digital de Ciencia, Tecnología e Innovación de Acceso Abierto (ALICIA).