ChatGPT as a global doctor: a rapid review of its performance on national licensing medical examination
Article description
Objective: To evaluate ChatGPT's performance on national licensing medical examinations (NLMEs) worldwide and determine whether it could achieve licensure to practice medicine across different countries. Methods: We searched PubMed, Scopus, and Google Scholar for studies evaluating ChatGPT's performance on NLMEs. Reference lists of inc...
| Authors: | Flores-Cohaila, Javier A.; Miranda Chavez, Brayan; Mayta-Tristán, Percy |
|---|---|
| Format: | article |
| Publication date: | 2025 |
| Institution: | Colegio Médico del Perú |
| Repository: | Acta Médica Peruana |
| Language: | English |
| OAI identifier: | oai:amp.cmp.org.pe:article/3706 |
| Resource link: | https://amp.cmp.org.pe/index.php/AMP/article/view/3706 |
| Access level: | open access |
| Subjects: | Medical education; Artificial Intelligence; ChatGPT; Generative Artificial Intelligence |
| id | REVCMP_3b134ef1af9b41a05685edb74875c5aa |
|---|---|
| oai_identifier_str | oai:amp.cmp.org.pe:article/3706 |
| network_acronym_str | REVCMP |
| network_name_str | Acta Médica Peruana |
| dc.title.none.fl_str_mv | ChatGPT as a global doctor: a rapid review of its performance on national licensing medical examination |
| dc.creator.none.fl_str_mv | Flores-Cohaila, Javier A.; Miranda Chavez, Brayan; Mayta-Tristán, Percy |
| author | Flores-Cohaila, Javier A. |
| author2 | Miranda Chavez, Brayan; Mayta-Tristán, Percy |
| dc.subject.none.fl_str_mv | Medical education; Artificial Intelligence; ChatGPT; Generative Artificial Intelligence |
| description | Objective: To evaluate ChatGPT's performance on national licensing medical examinations (NLMEs) worldwide and determine whether it could achieve licensure to practice medicine across different countries. Methods: We searched PubMed, Scopus, and Google Scholar for studies evaluating ChatGPT's performance on NLMEs. Reference lists of included studies were also reviewed. Two reviewers independently screened studies and extracted the accuracy rates (performance) of GPT-3.5 and GPT-4, including those that passed thresholds, human examinee scores, and other study characteristics. The risk of bias was assessed using the JBI Critical Appraisal Checklist for Prevalence Studies. Results: We identified 37 studies evaluating ChatGPT's performance across 18 NLMEs. Most studies assessed the United States, Chinese, and Japanese examinations. While most studies used official datasets, others relied on unofficial third-party sources, and few employed advanced prompting techniques. GPT-4 was superior to GPT-3.5 in all NLMEs, with accuracy rates ranging from 67% to 89%. GPT-4 passed all 18 NLMEs (100%), while GPT-3.5 passed 10 of 15 (67%). Compared to human examinees, GPT-4 outperformed the average score in 6 of 7 NLMEs (86%); the sole exception was Japan, where examinees achieved 84.9% versus 81.5% for GPT-4. Conclusion: Current evidence demonstrates that GPT-4 can pass all 18 NLMEs evaluated, surpassing human examinees in most cases. However, this finding likely reflects low passing thresholds rather than AI superiority over physicians. |
| publishDate | 2025 |
| dc.date.none.fl_str_mv | 2025-12-30 |
| dc.type.none.fl_str_mv | info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion |
| format | article |
| status_str | publishedVersion |
| dc.identifier.none.fl_str_mv | https://amp.cmp.org.pe/index.php/AMP/article/view/3706; 10.35663/amp.2025.424.3706 |
| url | https://amp.cmp.org.pe/index.php/AMP/article/view/3706 |
| identifier_str_mv (DOI) | 10.35663/amp.2025.424.3706 |
| dc.language.none.fl_str_mv | eng |
| dc.relation.none.fl_str_mv (full text) | https://amp.cmp.org.pe/index.php/AMP/article/view/3706/2040 |
| dc.rights.none.fl_str_mv | Copyright (c) 2025 Javier A. Flores-Cohaila, Brayan Miranda Chavez, Percy Mayta-Tristán; https://creativecommons.org/licenses/by/4.0; info:eu-repo/semantics/openAccess |
| eu_rights_str_mv | openAccess |
| dc.format.none.fl_str_mv | application/pdf |
| dc.publisher.none.fl_str_mv | Colegio Médico del Perú |
| dc.source.none.fl_str_mv | ACTA MEDICA PERUANA; Vol. 42 No. 4 (2025): October - December; 284-293; ISSN 1728-5917, 1018-8800; reponame: Acta Médica Peruana; instname: Colegio Médico del Perú; instacron: CMP |
| _version_ | 1864906918668009472 |
| score | 13.069414 |
Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository hosting this document or dataset. CONCYTEC is not responsible for the contents (publications and/or data) accessible through the National Digital Repository of Open Access Science, Technology and Innovation (ALICIA).