Predictive analysis of vector-borne diseases through tabular classification of epidemiological data

Bibliographic Details
Authors: Iparraguirre-Villanueva, Orlando; Cabanillas-Carbonell, Michael
Format: article
Publication date: 2024
Institution: Universidad Tecnológica del Perú
Repository: UTP-Institucional
Language: English
OAI Identifier: oai:repositorio.utp.edu.pe:20.500.12867/14482
Resource link: https://hdl.handle.net/20.500.12867/14482
https://doi.org/10.3991/ijoe.v20i13.50437
Access level: open access
Subject: Prediction
Machine learning
Epidemiological data
Models
https://purl.org/pe-repo/ocde/ford#2.02.04
Description
Abstract: Vector-borne diseases (VBDs) are major threats to human health. They are estimated to cause more than 700,000 deaths each year. This presents serious health problems for CBD. In recent years, the incidence of VBDs has increased globally, affecting approximately one billion people and accounting for 17% of all infectious diseases. Disease rates have risen alarmingly worldwide, and more than 3.9 billion people are at risk of infection. It is therefore essential to find approaches for detecting these diseases, and this is where machine learning (ML) models come into play. The purpose of this study was to predict VBDs using tabular epidemiological data. To this end, six ML models were used: support vector classifier (SVC), extreme gradient boosting (XGBoost), LightGBM, CatBoost, random forest (RF), and balanced random forest (BRF). A dataset of 1,262 records and 65 features was used during the training stage. The results highlighted the successful integration of the different models, with SVC, XGBoost, LightGBM, CatBoost, BRF, and RF obtaining scores (mean ± standard deviation) of 0.49959 ± 0.27112, 0.58496 ± 0.22619, 0.48482 ± 0.29971, 0.54992 ± 0.27982, 0.24924 ± 0.22654, and 0.45592 ± 0.25849, respectively. In addition, the BRF model stood out for having the lowest log loss, evaluated through the ensemble log-loss metric, with a mean of 0.24924 and a standard deviation of 0.22654.
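The abstract ranks the models by log loss reported as a mean ± standard deviation. The following is a minimal sketch of how such a comparison can be computed. It assumes standard multiclass cross-entropy averaged over samples and a sample standard deviation taken across cross-validation folds; the abstract does not define its "ensemble log-loss" metric, its number of classes, or its resampling scheme. The helper names `log_loss` and `summarize` are hypothetical, and only the numbers in `reported` come from the abstract.

```python
import math
from statistics import mean, stdev


def log_loss(y_true, y_prob, eps=1e-15):
    """Multiclass cross-entropy: average of -log p(true class) over samples.

    Hypothetical helper (assumed definition, not taken from the paper).
    y_true: integer class labels in 0..K-1
    y_prob: one list of K predicted probabilities per sample
    eps:    clip the true-class probability into [eps, 1 - eps] to avoid log(0)
    """
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("y_true and y_prob must be non-empty and equal in length")
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        p = min(max(probs[label], eps), 1.0 - eps)
        total += -math.log(p)
    return total / len(y_true)


def summarize(fold_losses):
    """Return (mean, sample standard deviation) of per-fold log losses.

    Assumes at least two folds; statistics.stdev needs two or more values.
    """
    return mean(fold_losses), stdev(fold_losses)


# Mean ± standard deviation values as reported in the abstract.
reported = {
    "SVC": (0.49959, 0.27112),
    "XGBoost": (0.58496, 0.22619),
    "LightGBM": (0.48482, 0.29971),
    "CatBoost": (0.54992, 0.27982),
    "BRF": (0.24924, 0.22654),
    "RF": (0.45592, 0.25849),
}

# Lower log loss is better, so the best model has the smallest mean.
best = min(reported, key=lambda name: reported[name][0])

if __name__ == "__main__":
    print(best)  # BRF
```

Ranking by the mean alone matches the abstract's claim that BRF has the lowest log loss. However, the reported standard deviations (about 0.23 to 0.30) are large relative to the gaps between most models. A careful comparison would therefore also examine per-fold results rather than the means only.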
Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository where this document or dataset is held. CONCYTEC is not responsible for the contents (publications and/or data) accessible through the National Digital Repository of Open Access Science, Technology and Innovation (ALICIA).