Prediction of PM2.5 and PM10 Concentrations Using XGBoost and LightGBM Algorithms: A Case Study in Lima, Peru
Article Description
Air pollution is a major problem that affects both human health and the environment, causing millions of premature deaths annually worldwide and severely degrading the state of the planet. Exposure to fine particulate matter, which is highly hazardous, enables these particles to penetrate deeply int...
Authors: | Oblitas Mantilla, Johan Andrés; Escobedo Cárdenas, Edwin Jhonatan |
---|---|
Format: | article |
Publication Date: | 2024 |
Institution: | Universidad de Lima |
Repository: | Revistas - Universidad de Lima |
Language: | English |
OAI Identifier: | oai:revistas.ulima.edu.pe:article/7417 |
Resource link: | https://revistas.ulima.edu.pe/index.php/Interfases/article/view/7417 |
Access level: | open access |
Subject: | air pollution; air quality; meteorological data; machine learning; XGBoost; LightGBM; contaminación del aire; calidad del aire; datos meteorológicos; aprendizaje automático |
id |
REVULIMA_5a808105d4f00e3a35c89d2219ce2472 |
---|---|
oai_identifier_str |
oai:revistas.ulima.edu.pe:article/7417 |
network_acronym_str |
REVULIMA |
network_name_str |
Revistas - Universidad de Lima |
repository_id_str |
|
dc.title.none.fl_str_mv |
Prediction of PM2.5 and PM10 Concentrations Using XGBoost and LightGBM Algorithms: A Case Study in Lima, Peru; Predicción de concentraciones de PM2.5 y PM10 utilizando los algoritmos XGboost y LightGBM: un estudio de caso en Lima, Perú |
title |
Prediction of PM2.5 and PM10 Concentrations Using XGBoost and LightGBM Algorithms: A Case Study in Lima, Peru |
dc.creator.none.fl_str_mv |
Oblitas Mantilla, Johan Andrés; Escobedo Cárdenas, Edwin Jhonatan |
author |
Oblitas Mantilla, Johan Andrés |
author_facet |
Oblitas Mantilla, Johan Andrés; Escobedo Cárdenas, Edwin Jhonatan |
author_role |
author |
author2 |
Escobedo Cárdenas, Edwin Jhonatan |
author2_role |
author |
dc.subject.none.fl_str_mv |
air pollution; air quality; meteorological data; machine learning; XGBoost; LightGBM; contaminación del aire; calidad del aire; datos meteorológicos; aprendizaje automático; XGBoost; LightGBM |
topic |
air pollution; air quality; meteorological data; machine learning; XGBoost; LightGBM; contaminación del aire; calidad del aire; datos meteorológicos; aprendizaje automático; XGBoost; LightGBM |
description |
Air pollution is a major problem that affects both human health and the environment, causing millions of premature deaths annually worldwide and severely degrading the state of the planet. Exposure to fine particulate matter, which is highly hazardous, enables these particles to penetrate deeply into the lungs and lead to serious health issues, including a reduction in life expectancy by more than two years. In response to this problem, it is crucial to identify effective ways to monitor the levels of these pollutants in our daily surroundings. This article presents a case study conducted in the district of San Borja, Lima, Peru, where prediction models for PM2.5 and PM10 were implemented using the XGBoost and LightGBM algorithms. Employing data from the SENAMHI portal and a correlation analysis of variables, two different scenarios were developed for training the models. In scenario 1, prediction models for PM2.5 and PM10 were trained using all available meteorological and pollution variables. In scenario 2, the models were trained for PM2.5 excluding the PM10 variable, and vice versa. The results showed that both models achieved high accuracy, measured by the coefficient of determination, with no statistically significant difference indicating the superiority of either model. Furthermore, the analysis of the proposed scenarios revealed that excluding key variables can result in significantly less accurate predictions, potentially undermining the effectiveness of environmental management strategies. |
publishDate |
2024 |
dc.date.none.fl_str_mv |
2024-12-26 |
dc.type.none.fl_str_mv |
info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion |
format |
article |
status_str |
publishedVersion |
dc.identifier.none.fl_str_mv |
https://revistas.ulima.edu.pe/index.php/Interfases/article/view/7417 10.26439/interfases2024.n020.7417 |
url |
https://revistas.ulima.edu.pe/index.php/Interfases/article/view/7417 |
identifier_str_mv |
10.26439/interfases2024.n020.7417 |
dc.language.none.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
https://revistas.ulima.edu.pe/index.php/Interfases/article/view/7417/7473 https://revistas.ulima.edu.pe/index.php/Interfases/article/view/7417/7474 |
dc.rights.none.fl_str_mv |
https://creativecommons.org/licenses/by/4.0 info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
https://creativecommons.org/licenses/by/4.0 |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf text/html |
dc.publisher.none.fl_str_mv |
Universidad de Lima |
publisher.none.fl_str_mv |
Universidad de Lima |
dc.source.none.fl_str_mv |
Interfases; No. 020 (2024); 185-208 Interfases; Núm. 020 (2024); 185-208 Interfases; n. 020 (2024); 185-208 1993-4912 10.26439/interfases2024.n020 reponame:Revistas - Universidad de Lima instname:Universidad de Lima instacron:ULIMA |
instname_str |
Universidad de Lima |
instacron_str |
ULIMA |
institution |
ULIMA |
reponame_str |
Revistas - Universidad de Lima |
collection |
Revistas - Universidad de Lima |
repository.name.fl_str_mv |
|
repository.mail.fl_str_mv |
|
_version_ |
1844893192387821568 |
score |
12.615219 |
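The description field above outlines two training scenarios and reports accuracy through the coefficient of determination. The following is a minimal sketch of those two ideas only, not the authors' implementation: the column names in METEO and POLLUTANTS are hypothetical (the record does not list the actual SENAMHI variables), and the predictions passed to r2_score are assumed to come from an XGBoost or LightGBM regressor trained elsewhere.

```python
from statistics import fmean

# Hypothetical column names for illustration; the record does not
# enumerate the variables downloaded from the SENAMHI portal.
METEO = ["temperature", "humidity", "wind_speed"]
POLLUTANTS = ["pm25", "pm10", "no2"]


def scenario_features(target, scenario):
    """Return the input columns used to predict `target`.

    Scenario 1 keeps every available variable except the target itself.
    Scenario 2 additionally drops the other particulate variable
    (PM10 when predicting PM2.5, and vice versa).
    """
    other = {"pm25": "pm10", "pm10": "pm25"}[target]
    features = [c for c in METEO + POLLUTANTS if c != target]
    if scenario == 2:
        features = [c for c in features if c != other]
    elif scenario != 1:
        raise ValueError("scenario must be 1 or 2")
    return features


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Scenario 2 for PM2.5 excludes PM10 from the inputs.
print(scenario_features("pm25", 2))  # ['temperature', 'humidity', 'wind_speed', 'no2']
```

Comparing r2_score for the same target under scenario 1 and scenario 2 is one way to see the effect the description reports, namely that excluding a key variable can lower predictive accuracy.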
Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository in which this document or dataset is held. CONCYTEC is not responsible for the content (publications and/or data) accessible through the National Digital Repository of Science, Technology and Innovation of Open Access (ALICIA).