ChatGPT as a global doctor: a rapid review of its performance on national licensing medical examination
Article description
Objective: To evaluate ChatGPT's performance on national licensing medical examinations (NLMEs) worldwide and determine whether it could achieve licensure to practice medicine across different countries. Methods: We searched PubMed, Scopus, and Google Scholar for studies evaluating ChatGPT's performance on NLMEs. Reference lists of inc...
| Authors: | Flores-Cohaila, Javier A.; Miranda Chavez, Brayan; Mayta-Tristán, Percy |
|---|---|
| Format: | article |
| Publication date: | 2025 |
| Institution: | Colegio Médico del Perú |
| Repository: | Acta Médica Peruana |
| Language: | English |
| OAI identifier: | oai:amp.cmp.org.pe:article/3706 |
| Resource link: | https://amp.cmp.org.pe/index.php/AMP/article/view/3706 |
| Access level: | open access |
| Subjects: | Medical education; Artificial Intelligence; ChatGPT; Generative Artificial Intelligence |
| id | REVCMP_3b134ef1af9b41a05685edb74875c5aa |
|---|---|
| oai_identifier_str | oai:amp.cmp.org.pe:article/3706 |
| network_acronym_str | REVCMP |
| network_name_str | Acta Médica Peruana |
| dc.title.none.fl_str_mv | ChatGPT as a global doctor: a rapid review of its performance on national licensing medical examination |
| dc.creator.none.fl_str_mv | Flores-Cohaila, Javier A.; Miranda Chavez, Brayan; Mayta-Tristán, Percy |
| author | Flores-Cohaila, Javier A. |
| author2 | Miranda Chavez, Brayan; Mayta-Tristán, Percy |
| dc.subject.none.fl_str_mv | Medical education; Artificial Intelligence; ChatGPT; Generative Artificial Intelligence |
| description | Objective: To evaluate ChatGPT's performance on national licensing medical examinations (NLMEs) worldwide and determine whether it could achieve licensure to practice medicine across different countries. Methods: We searched PubMed, Scopus, and Google Scholar for studies evaluating ChatGPT's performance on NLMEs. Reference lists of included studies were also reviewed. Two reviewers independently screened studies and extracted the accuracy rates (performance) of GPT-3.5 and GPT-4, including those that passed thresholds, human examinee scores, and other study characteristics. The risk of bias was assessed using the JBI Critical Appraisal Checklist for Prevalence Studies. Results: We identified 37 studies evaluating ChatGPT's performance across 18 NLMEs. Most studies assessed the United States, Chinese, and Japanese examinations. While most studies used official datasets, others relied on unofficial third-party sources, and few employed advanced prompting techniques. GPT-4 was superior to GPT-3.5 in all NLMEs, with accuracy rates ranging from 67% to 89%. GPT-4 passed all 18 NLMEs (100%), while GPT-3.5 passed 10 of 15 (67%). Compared to human examinees, GPT-4 outperformed the average score in 6 of 7 NLMEs (86%); the sole exception was Japan, where examinees achieved 84.9% versus 81.5% for GPT-4. Conclusion: Current evidence demonstrates that GPT-4 can pass all 18 NLMEs evaluated, surpassing human examinees in most cases. However, this finding likely reflects low passing thresholds rather than AI superiority over physicians. |
| publishDate | 2025 |
| dc.date.none.fl_str_mv | 2025-12-30 |
| dc.type.none.fl_str_mv | info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion |
| format | article |
| status_str | publishedVersion |
| dc.identifier.none.fl_str_mv | https://amp.cmp.org.pe/index.php/AMP/article/view/3706; 10.35663/amp.2025.424.3706 |
| url | https://amp.cmp.org.pe/index.php/AMP/article/view/3706 |
| identifier_str_mv (DOI) | 10.35663/amp.2025.424.3706 |
| dc.language.none.fl_str_mv | eng |
| dc.relation.none.fl_str_mv (full text) | https://amp.cmp.org.pe/index.php/AMP/article/view/3706/2040 |
| dc.rights.none.fl_str_mv | Copyright (c) 2025 Javier A. Flores-Cohaila, Brayan Miranda Chavez, Percy Mayta-Tristán; https://creativecommons.org/licenses/by/4.0; info:eu-repo/semantics/openAccess |
| eu_rights_str_mv | openAccess |
| dc.format.none.fl_str_mv | application/pdf |
| dc.publisher.none.fl_str_mv | Colegio Médico del Perú |
| dc.source.none.fl_str_mv | ACTA MEDICA PERUANA; Vol. 42 No. 4 (2025): October - December; 284-293; ISSN 1728-5917, 1018-8800; reponame: Acta Médica Peruana; instname: Colegio Médico del Perú; instacron: CMP |
| _version_ | 1864906918668009472 |
| score | 13.069414 |
Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository hosting this document or dataset. CONCYTEC is not responsible for the contents (publications and/or data) accessible through the National Digital Repository of Open Access Science, Technology and Innovation (ALICIA).