ChatGPT as a global doctor: a rapid review of its performance on national licensing medical examinations
Article Description
| Authors: | , , |
|---|---|
| Format: | article |
| Publication Date: | 2025 |
| Institution: | Colegio Médico del Perú |
| Repository: | Acta Médica Peruana |
| Language: | English |
| OAI Identifier: | oai:amp.cmp.org.pe:article/3706 |
| Resource Link: | https://amp.cmp.org.pe/index.php/AMP/article/view/3706 |
| Access Level: | open access |
| Subject: | Medical education; Artificial Intelligence; ChatGPT; Generative Artificial Intelligence |
| Summary: | Objective: To evaluate ChatGPT's performance on NLMEs worldwide and determine whether it could achieve licensure to practice medicine across different countries. Methods: We searched PubMed, Scopus, and Google Scholar for studies evaluating ChatGPT's performance on NLMEs. Reference lists of included studies were also reviewed. Two reviewers independently screened studies and extracted the accuracy rates (performance) of GPT-3.5 and GPT-4, including those that passed thresholds, human examinee scores, and other study characteristics. The risk of bias was assessed using the JBI Critical Appraisal Checklist for Prevalence Studies. Results: We identified 37 studies evaluating ChatGPT's performance across 18 NLMEs. Most studies assessed the United States, Chinese, and Japanese examinations. While most studies used official datasets, others relied on unofficial third-party sources, and few employed advanced prompting techniques. GPT-4 was superior to GPT-3.5 in all NLMEs, with accuracy rates ranging from 67% to 89%. GPT-4 passed all 18 NLMEs (100%), while GPT-3.5 passed 10 of 15 (67%). Compared to human examinees, GPT-4 outperformed the average score in 6 of 7 NLMEs (86%); the sole exception was Japan, where examinees achieved 84.9% versus 81.5% for GPT-4. Conclusion: Current evidence demonstrates that GPT-4 can pass all 18 NLMEs evaluated, surpassing human examinees in most cases. However, this finding likely reflects low passing thresholds rather than AI superiority over physicians. |
Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository hosting this document or dataset. CONCYTEC is not responsible for the contents (publications and/or data) accessible through the Repositorio Nacional Digital de Ciencia, Tecnología e Innovación de Acceso Abierto (ALICIA).