Search Results by Author

Article

Performance of ChatGPT, Bard, Claude, and Bing on the Peruvian National Licensing Medical Examination: a cross-sectional study.

Published by
Torres-Zegarra, B.C., Rios-Garcia, W., Ñaña-Cordova, A.M., Arteaga-Cisneros, K.F., Benavente-Chalco, X.C., Bustamante-Ordoñez, M.A., Gutierrez-Rios, C.J., Ramos-Godoy, C.A., Teresa Panta Quezada, K.L., Gutiérrez-Arratia, J.D., Flores-Cohaila, J.A.

Published 2023

Purpose: We aimed to describe the performance and evaluate the educational value of justifications provided by artificial intelligence chatbots, including GPT-3.5, GPT-4, Bard, Claude, and Bing, on the Peruvian National Medical Licensing Examination (P-NLME).

Methods: This was a cross-sectional analytical study. On July 25, 2023, each multiple-choice question (MCQ) from the P-NLME was entered into each chatbot (GPT-3.5, GPT-4, Bing, Bard, and Claude) 3 times. Then, 4 medical educators categorized the MCQs in terms of medical area, item type, and whether the MCQ required Peru-specific knowledge. They assessed the educational value of the justifications from the 2 top performers (GPT-4 and Bing).

Results: GPT-4 scored 86.7% and Bing scored 82.2%, followed by Bard and Claude; the historical performance of Peruvian examinees was 55%. Among the factors associated with correct answers, only MCQ...