Optimizing Credit Risk Prediction in the Financial Sector Using Boosting Algorithms: A Comparative Study with Financial Datasets

Bibliographic Details
Authors: Villanueva Mora, Renzo Orlando; Escobedo Cárdenas, Edwin Jhonatan
Format: article
Publication Date: 2025
Institution: Universidad de Lima
Repository: ULIMA-Institucional
Language: English
OAI Identifier: oai:repositorio.ulima.edu.pe:20.500.12724/24465
Resource Link: https://hdl.handle.net/20.500.12724/24465
https://doi.org/10.13053/CyS-29-2-5173
Access Level: open access
Subject: Pending
https://purl.org/pe-repo/ocde/ford#2.02.04
Description
Summary: Credit risk is a significant concern for financial institutions. Despite advances in predictive models, there is still room for improvement in assessing credit risk accurately. This study develops a methodological process for predicting credit risk in the financial sector using boosting-based algorithms such as XGBoost, LightGBM, and Boosted Random Forest. We found that the UCI Machine Learning Repository contains datasets that are readily accessible and have an appropriate variable distribution, and that these datasets have the potential to yield better results on metrics such as the F-score and the Area Under the Curve (AUC). The datasets used are Statlog German Credit Data, Statlog Australian Credit Approval, Bank Marketing, Credit Approval, and South German Credit Data. The approach involves exploratory data analysis, feature engineering, and hyperparameter tuning. In addition, we propose a new strategy that adds a column derived from an unsupervised algorithm such as K-means. Our results indicate that XGBoost outperforms LightGBM and Boosted Random Forest across different scenarios. Finally, these boosting-based models outperform the Boosted Decision Trees and Factorization Machine models reported in previous studies. These findings are relevant for financial institutions seeking an effective methodology to improve their credit risk prediction.
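The summary does not specify how the K-means column is constructed. The following is a minimal sketch under stated assumptions: features are numeric and already scaled, the new column is the index of each applicant's nearest centroid, and the centroids are fitted on the training split only so that test rows do not leak into the clustering. The names `fit_kmeans` and `add_cluster_column` are hypothetical, and plain Lloyd's iterations stand in for a library implementation such as scikit-learn's `KMeans`.

```python
import random
from typing import List, Sequence

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(point: Sequence[float], centroids: List[Vector]) -> int:
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: _sq_dist(point, centroids[j]))


def fit_kmeans(rows: List[Vector], k: int, iters: int = 100, seed: int = 0) -> List[Vector]:
    """Lloyd's k-means on numeric rows; returns the k fitted centroids.

    Hypothetical helper: initialization picks k random rows (seeded for reproducibility).
    """
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    for _ in range(iters):
        # Assignment step: group every row under its nearest centroid.
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for r in rows:
            clusters[_nearest(r, centroids)].append(r)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        new = [
            [sum(col) / len(c) for col in zip(*c)] if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if new == centroids:  # converged: no centroid moved
            break
        centroids = new
    return centroids


def add_cluster_column(rows: List[Vector], centroids: List[Vector]) -> List[Vector]:
    """Hypothetical helper: append the nearest-centroid index to each row as an extra feature."""
    return [list(r) + [float(_nearest(r, centroids))] for r in rows]


# Example usage with toy, made-up data (two well-separated groups of applicants).
train = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
centroids = fit_kmeans(train, k=2)           # fit on the training split only
train_aug = add_cluster_column(train, centroids)
test_aug = add_cluster_column([[0.05, 0.05], [5.1, 5.0]], centroids)
```

The augmented rows could then be passed to a boosting classifier such as XGBoost or LightGBM. The choice of `k`, the nearest-centroid assignment rule, and the train-only fitting are assumptions of this sketch rather than details confirmed by the article.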