Can artificial intelligence-based large language models pass the National Dentistry Examination in Peru?
Article Description
| Authors: | , , , , |
|---|---|
| Format: | article |
| Publication date: | 2025 |
| Institution: | Universidad Peruana Cayetano Heredia |
| Repository: | Revistas - Universidad Peruana Cayetano Heredia |
| Language: | English |
| OAI Identifier: | oai:revistas.upch.edu.pe:article/6253 |
| Resource link: | https://revistas.upch.edu.pe/index.php/REH/article/view/6253 |
| Access level: | open access |
| Subjects: | artificial intelligence; dental education; educational assessment; large language models |
| Summary: | Objective: To determine which artificial intelligence (AI) large language model demonstrates the highest accuracy in answering the 2023 National Dentistry Examination (ENAO, by its acronym in Spanish) in Peru, compared with the official answer key. Material and methods: The 100 multiple-choice questions from the 2023 ENAO were tested using ChatGPT-3.5, ChatGPT-4, Gemini, and Copilot. Responses were categorized by subject area and scored as correct or incorrect. Data were analyzed using the chi-square test (α = 0.05). Results: ChatGPT-4 achieved the highest overall accuracy (90.00%), followed by Gemini (82.00%), Copilot (79.00%), and ChatGPT-3.5 (76.00%). Across most models, the highest accuracy was observed in Public Health, Research, Health Services Management, and Ethics, whereas lower performance was observed in Anatomy and in Oral Medicine and Pathology. Pairwise comparisons revealed that ChatGPT-4 performed significantly better than ChatGPT-3.5 (difference: 14 percentage points; p = 0.0084) and Copilot (difference: 11 percentage points; p = 0.0316); no significant differences were found among the remaining model comparisons (p > 0.05). Conclusion: All AI language models demonstrated effectiveness in answering the 2023 ENAO questions, with ChatGPT-4 achieving the highest accuracy. |
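The pairwise comparisons in the summary can be illustrated with a short Python sketch. It assumes each comparison was a Pearson chi-square test without Yates' continuity correction on a 2×2 table of correct and incorrect answers (100 questions per model). The summary states only that "the chi-square test" was used, so this exact setup is an assumption, and the names `CORRECT` and `chi_square_2x2` are hypothetical. Under these assumptions the sketch gives p ≈ 0.0084 for ChatGPT-4 vs ChatGPT-3.5 and p ≈ 0.0316 for ChatGPT-4 vs Copilot, consistent with the reported values, and p > 0.05 for the other four pairs.

```python
import math
from itertools import combinations

# Correct answers out of the 100 ENAO 2023 questions, as reported in the summary.
N_QUESTIONS = 100
CORRECT = {  # hypothetical name for illustration
    "ChatGPT-4": 90,
    "Gemini": 82,
    "Copilot": 79,
    "ChatGPT-3.5": 76,
}


def chi_square_2x2(correct_a: int, correct_b: int, n: int = N_QUESTIONS) -> tuple[float, float]:
    """Pearson chi-square test (assumed: no Yates correction) on a 2x2 table.

    Rows are the two models; columns are correct / incorrect answers.
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    observed = [
        [correct_a, n - correct_a],
        [correct_b, n - correct_b],
    ]
    total = 2 * n
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    chi2 = 0.0
    for row in observed:
        row_total = sum(row)
        for j, obs in enumerate(row):
            # Expected count under the null hypothesis of equal accuracy.
            expected = row_total * col_totals[j] / total
            chi2 += (obs - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    for a, b in combinations(CORRECT, 2):
        chi2, p = chi_square_2x2(CORRECT[a], CORRECT[b])
        # Difference in accuracy, in percentage points.
        print(f"{a} vs {b}: diff = {CORRECT[a] - CORRECT[b]} pp, chi2 = {chi2:.3f}, p = {p:.4f}")
```

Yates' continuity correction, which some statistical software applies to 2×2 tables by default, lowers the statistic and raises the p-values, so the choice matters for borderline comparisons such as ChatGPT-4 vs Copilot.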
Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository where this document or dataset is held. CONCYTEC is not responsible for the content (publications and/or data) accessible through the National Open Access Digital Repository of Science, Technology and Innovation (ALICIA).