Can artificial intelligence-based large language models pass the National Dentistry Examination in Peru?

Article Description

Bibliographic Details
Authors: Saravia-Rojas, Miguel Ángel; Mendiola-Aquino, Carlos; Orejuela-Ramirez, Francisco; Tunquipa-Chacón, Wanderley; Geng-Vivanco, Rocio
Format: article
Publication Date: 2025
Institution: Universidad Peruana Cayetano Heredia
Repository: Revistas - Universidad Peruana Cayetano Heredia
Language: English
OAI Identifier: oai:revistas.upch.edu.pe:article/6253
Resource Link: https://revistas.upch.edu.pe/index.php/REH/article/view/6253
Access Level: open access
Subjects: inteligencia artificial
educación odontológica
evaluación educativa
modelos de lenguaje de gran tamaño
artificial intelligence
dental education
educational assessment
large language models
inteligência artificial
educação odontológica
avaliação educacional
modelos de linguagem de grande porte
Description
Summary: Objective: To determine which artificial intelligence (AI) large language model demonstrates the highest accuracy in answering the 2023 National Dentistry Examination (ENAO, by its acronym in Spanish) in Peru, compared with the official answer key. Material and methods: The 100 multiple-choice questions from the 2023 ENAO were tested using ChatGPT-3.5, ChatGPT-4, Gemini, and Copilot. Responses were categorized by subject area and scored as correct or incorrect. Data were analyzed using the chi-square test (α = 0.05). Results: ChatGPT-4 achieved the highest overall accuracy (90.00%), followed by Gemini (82.00%), Copilot (79.00%), and ChatGPT-3.5 (76.00%). Across most models, the highest accuracy was observed in Public Health, Research, Health Services Management, and Ethics, whereas lower performance was observed in Anatomy and in Oral Medicine and Pathology. Pairwise comparisons revealed that ChatGPT-4 performed significantly better than ChatGPT-3.5 (difference: 14 percentage points; p = 0.0084) and Copilot (difference: 11 percentage points; p = 0.0316); no significant differences were found in the remaining pairwise comparisons (p > 0.05). Conclusion: All AI language models demonstrated effectiveness in answering the 2023 ENAO questions, with ChatGPT-4 achieving the highest accuracy.
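The pairwise comparisons reported in the summary can be illustrated with a short sketch. It assumes each comparison was a Pearson chi-square test, without continuity correction, on a 2x2 table of correct and incorrect answers out of 100 questions per model. The record does not state this setup. The function name `chi_square_2x2` and the `scores` dictionary are hypothetical, introduced only for illustration.

```python
import math
from itertools import combinations


def chi_square_2x2(correct_a, correct_b, total=100):
    """Pearson chi-square test (no continuity correction) for two models.

    Assumption: each model answered the same `total` questions, and
    neither column of the 2x2 table is empty (otherwise the denominator
    below would be zero).
    """
    a, b = correct_a, total - correct_a  # model A: correct, incorrect
    c, d = correct_b, total - correct_b  # model B: correct, incorrect
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Correct answers out of 100, taken from the accuracies in the summary.
scores = {"ChatGPT-4": 90, "Gemini": 82, "Copilot": 79, "ChatGPT-3.5": 76}

for (name_a, score_a), (name_b, score_b) in combinations(scores.items(), 2):
    chi2, p = chi_square_2x2(score_a, score_b)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name_a} vs {name_b}: chi2 = {chi2:.3f}, p = {p:.4f} ({verdict})")
```

Under these assumptions, the sketch gives p-values that round to the reported 0.0084 (ChatGPT-4 vs ChatGPT-3.5) and 0.0316 (ChatGPT-4 vs Copilot), and the other four comparisons fall above 0.05.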
Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository where this document or data set is held. CONCYTEC is not responsible for the content (publications and/or data) accessible through the National Digital Repository of Open Access Science, Technology and Innovation (ALICIA).