Prediction of PM2.5 and PM10 Concentrations Using XGBoost and LightGBM Algorithms: A Case Study in Lima, Peru

Bibliographic Details
Authors: Oblitas Mantilla, Johan Andrés; Escobedo Cárdenas, Edwin Jhonatan
Format: article
Publication Date: 2024
Institution: Universidad de Lima
Repository: Revistas - Universidad de Lima
Language: English
OAI Identifier: oai:revistas.ulima.edu.pe:article/7417
Resource Link: https://revistas.ulima.edu.pe/index.php/Interfases/article/view/7417
Access Level: open access
Subjects: air pollution
air quality
meteorological data
machine learning
XGBoost
LightGBM
Title (Spanish): Predicción de concentraciones de PM2.5 y PM10 utilizando los algoritmos XGboost y LightGBM: un estudio de caso en Lima, Perú
Abstract: Air pollution is a major problem that affects both human health and the environment, causing millions of premature deaths worldwide each year and severely degrading the state of the planet. Exposure to fine particulate matter is especially hazardous because these particles penetrate deep into the lungs and can cause serious health problems, including a reduction in life expectancy of more than two years. In response, it is crucial to identify effective ways to monitor the levels of these pollutants in our daily surroundings. This article presents a case study conducted in the district of San Borja, Lima, Peru, where prediction models for PM2.5 and PM10 were implemented using the XGBoost and LightGBM algorithms. Using data from the SENAMHI portal and a correlation analysis of the variables, two scenarios were defined for training the models. In scenario 1, the PM2.5 and PM10 models were trained using all available meteorological and pollution variables. In scenario 2, the PM2.5 models were trained without the PM10 variable, and the PM10 models without the PM2.5 variable. The XGBoost and LightGBM models both achieved high accuracy, as measured by the coefficient of determination (R²), and no statistically significant difference was found that would indicate the superiority of either one. Furthermore, the comparison of the two scenarios showed that excluding key variables can produce significantly less accurate predictions, which could undermine the effectiveness of environmental management strategies.
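The record does not include the authors' feature lists or evaluation code, so the following is only a minimal, standard-library Python sketch under stated assumptions: the column names are hypothetical placeholders, Pearson's coefficient is assumed for the correlation analysis, and the XGBoost/LightGBM training step itself is omitted. It illustrates how the two training scenarios differ and how the coefficient of determination (R²) used to compare the models is computed.

```python
"""Illustrative sketch (not the authors' code): scenario feature sets and R².

All column names are hypothetical placeholders; the actual SENAMHI
variables used in the study are not listed in this record.
"""
import math
from statistics import mean

# Hypothetical meteorological predictors.
METEO_VARS = ["temperature", "relative_humidity", "wind_speed", "wind_direction"]
# Hypothetical pollution variables; only the two prediction targets are listed.
POLLUTION_VARS = ["PM2.5", "PM10"]


def scenario_features(target: str, scenario: int) -> list:
    """Return the predictor columns for one target under one scenario.

    Scenario 1: every available meteorological and pollution variable
    except the target itself.
    Scenario 2: as scenario 1, but also drop the other particulate
    fraction (PM10 when predicting PM2.5, and vice versa).
    """
    if target not in ("PM2.5", "PM10"):
        raise ValueError("target must be 'PM2.5' or 'PM10'")
    if scenario not in (1, 2):
        raise ValueError("scenario must be 1 or 2")
    features = METEO_VARS + [v for v in POLLUTION_VARS if v != target]
    if scenario == 2:
        other = "PM10" if target == "PM2.5" else "PM2.5"
        features = [v for v in features if v != other]
    return features


def pearson(x, y) -> float:
    """Pearson correlation coefficient (assumed method for the correlation analysis)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def r2_score(y_true, y_pred) -> float:
    """Coefficient of determination: R² = 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    print(scenario_features("PM2.5", 1))  # meteorological variables + PM10
    print(scenario_features("PM2.5", 2))  # meteorological variables only
    print(r2_score([10, 20, 30, 40], [12, 18, 33, 37]))  # ≈ 0.948
```

In a full pipeline, the lists returned by `scenario_features` would select the input columns passed to an XGBoost or LightGBM regressor, and `r2_score` would be applied to each model's held-out predictions so that the two scenarios can be compared on the same metric.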
Date: 2024-12-26
Version: published version
DOI: 10.26439/interfases2024.n020.7417
Full-text files: https://revistas.ulima.edu.pe/index.php/Interfases/article/view/7417/7473
https://revistas.ulima.edu.pe/index.php/Interfases/article/view/7417/7474
File formats: application/pdf, text/html
License: Creative Commons Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0
Access: open access
Publisher: Universidad de Lima
Source: Interfases, No. 020 (2024), pp. 185-208
ISSN: 1993-4912
Issue identifier: 10.26439/interfases2024.n020
Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository in which this document or dataset is held. CONCYTEC is not responsible for the content (publications and/or data) accessible through the National Open Access Digital Repository of Science, Technology and Innovation (ALICIA).