Comparative between RESNET-50, VGG-16, Vision Transformer y Swin Transformer for facial recognition with mask occlusion
Article Description
Face recognition has gained relevance as a contactless means of identity verification in enclosed spaces during the SARS-CoV-2 pandemic. One of its challenges is mask occlusion, which hides more than 50 % of the face. This research evaluated fo...
| Authors: | Tafur Acenjo, Brenda Xiomara; Tello Pariona, Martin Alexis; Escobedo Cárdenas, Edwin Jhonatan |
|---|---|
| Format: | article |
| Publication date: | 2023 |
| Institution: | Universidad de Lima |
| Repository: | Revistas - Universidad de Lima |
| Language: | Spanish |
| OAI Identifier: | oai:ojs.pkp.sfu.ca:article/6361 |
| Resource link: | https://revistas.ulima.edu.pe/index.php/Interfases/article/view/6361 |
| Access level: | open access |
| Subject: | face recognition; RESNET-50; VGG-16; Vision Transformer; Swin Transformer; reconocimiento facial |
| id | REVULIMA_48a8750c0996824e9892fccca54e98e5 |
|---|---|
| oai_identifier_str | oai:ojs.pkp.sfu.ca:article/6361 |
| network_acronym_str | REVULIMA |
| network_name_str | Revistas - Universidad de Lima |
| repository_id_str | |
| dc.title.none.fl_str_mv | Comparative between RESNET-50, VGG-16, Vision Transformer y Swin Transformer for facial recognition with mask occlusion / Comparativa entre RESNET-50, VGG-16, Vision Transformer y Swin Transformer para el reconocimiento facial con oclusión de una mascarilla |
| title | Comparative between RESNET-50, VGG-16, Vision Transformer y Swin Transformer for facial recognition with mask occlusion |
| spellingShingle | Comparative between RESNET-50, VGG-16, Vision Transformer y Swin Transformer for facial recognition with mask occlusion Tafur Acenjo, Brenda Xiomara face recognition RESNET-50 VGG-16 Vision Transformer Swin Transformer reconocimiento facial RESNET-50 VGG-16 Vision Transformer Swin Transformer |
| title_short | Comparative between RESNET-50, VGG-16, Vision Transformer y Swin Transformer for facial recognition with mask occlusion |
| title_full | Comparative between RESNET-50, VGG-16, Vision Transformer y Swin Transformer for facial recognition with mask occlusion |
| title_fullStr | Comparative between RESNET-50, VGG-16, Vision Transformer y Swin Transformer for facial recognition with mask occlusion |
| title_full_unstemmed | Comparative between RESNET-50, VGG-16, Vision Transformer y Swin Transformer for facial recognition with mask occlusion |
| title_sort | Comparative between RESNET-50, VGG-16, Vision Transformer y Swin Transformer for facial recognition with mask occlusion |
| dc.creator.none.fl_str_mv | Tafur Acenjo, Brenda Xiomara; Tello Pariona, Martin Alexis; Escobedo Cárdenas, Edwin Jhonatan |
| author | Tafur Acenjo, Brenda Xiomara |
| author_facet | Tafur Acenjo, Brenda Xiomara; Tello Pariona, Martin Alexis; Escobedo Cárdenas, Edwin Jhonatan |
| author_role | author |
| author2 | Tello Pariona, Martin Alexis; Escobedo Cárdenas, Edwin Jhonatan |
| author2_role | author; author |
| dc.subject.none.fl_str_mv | face recognition; RESNET-50; VGG-16; Vision Transformer; Swin Transformer; reconocimiento facial |
| topic | face recognition; RESNET-50; VGG-16; Vision Transformer; Swin Transformer; reconocimiento facial |
| description | Face recognition has gained relevance as a contactless means of identity verification in enclosed spaces during the SARS-CoV-2 pandemic. One of its challenges is mask occlusion, which hides more than 50 % of the face. This research evaluated four pre-trained models adapted by transfer learning: VGG-16, RESNET-50, Vision Transformer (ViT), and Swin Transformer, whose upper layers were trained on a dataset built by the authors. With unmasked subjects, test accuracy was 24 % (RESNET-50), 25 % (VGG-16), 96 % (ViT), and 91 % (Swin); with masked subjects, it was 32 % (RESNET-50), 53 % (VGG-16), 87 % (ViT), and 61 % (Swin). These results indicate that more recent architectures such as Transformers recognize masked faces better than the CNNs (VGG-16 and RESNET-50). The contribution of the research lies in the experimentation with two types of architectures, CNNs and Transformers, and in the creation of a public dataset shared with the scientific community. This work strengthens the state of the art in face recognition under mask occlusion by showing experimentally how accuracy varies across scenarios and architectures. |
| publishDate | 2023 |
| dc.date.none.fl_str_mv | 2023-07-31 |
| dc.type.none.fl_str_mv | info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion |
| format | article |
| status_str | publishedVersion |
| dc.identifier.none.fl_str_mv | https://revistas.ulima.edu.pe/index.php/Interfases/article/view/6361; 10.26439/interfases2023.n017.6361 |
| url | https://revistas.ulima.edu.pe/index.php/Interfases/article/view/6361 |
| identifier_str_mv | 10.26439/interfases2023.n017.6361 |
| dc.language.none.fl_str_mv | spa |
| language | spa |
| dc.relation.none.fl_str_mv | https://revistas.ulima.edu.pe/index.php/Interfases/article/view/6361/6383; https://revistas.ulima.edu.pe/index.php/Interfases/article/view/6361/6387 |
| dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess |
| eu_rights_str_mv | openAccess |
| dc.format.none.fl_str_mv | application/pdf; text/html |
| dc.publisher.none.fl_str_mv | Universidad de Lima |
| publisher.none.fl_str_mv | Universidad de Lima |
| dc.source.none.fl_str_mv | Interfases; No. 017 (2023); 56-78 Interfases; Núm. 017 (2023); 56-78 Interfases; n. 017 (2023); 56-78 1993-4912 10.26439/interfases2023.n017 reponame:Revistas - Universidad de Lima instname:Universidad de Lima instacron:ULIMA |
| instname_str | Universidad de Lima |
| instacron_str | ULIMA |
| institution | ULIMA |
| reponame_str | Revistas - Universidad de Lima |
| collection | Revistas - Universidad de Lima |
| repository.name.fl_str_mv | |
| repository.mail.fl_str_mv | |
| _version_ | 1846791802481278976 |
| score | 13.924177 |
Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository where this document or data set is held. CONCYTEC is not responsible for the content (publications and/or data) accessible through the Repositorio Nacional Digital de Ciencia, Tecnología e Innovación de Acceso Abierto (ALICIA).