Classification of Tweets Related to Natural Disasters Using Machine Learning Algorithms

Article Description

Abstract—In recent years, computer science has advanced exponentially, helping significantly to identify and classify text extracted from social networks, specifically Twitter. This work identifies, classifies, and analyzes tweets related to real natural disasters through tweets with the hashtag #Na...


Bibliographic Details
Authors: Iparraguirre-Villanueva, Orlando; Melgarejo-Graciano, Melquiades; Castro-Leon, Gloria; Olaya-Cotera, Sandro; Ruiz-Alvarado, John; Epifanía-Huerta, Andrés; Cabanillas-Carbonell, Michael; Zapata-Paulini, Joselyn
Format: article
Publication Date: 2023
Institution: Universidad Autónoma del Perú
Repository: AUTONOMA-Institucional
Language: English
OAI Identifier: oai:repositorio.autonoma.edu.pe:20.500.13067/2875
Resource Link: https://hdl.handle.net/20.500.13067/2875
https://doi.org/10.3991/ijim.v17i14.39907
Access Level: open access
Subject: Classification
Tweets
Disasters
Machine learning
Natural
https://purl.org/pe-repo/ocde/ford#2.02.04
id AUTO_9d998482f225b9265dd9036e0821e758
oai_identifier_str oai:repositorio.autonoma.edu.pe:20.500.13067/2875
network_acronym_str AUTO
network_name_str AUTONOMA-Institucional
repository_id_str 4774
dc.title.es_PE.fl_str_mv Classification of Tweets Related to Natural Disasters Using Machine Learning Algorithms
dc.contributor.author.fl_str_mv Iparraguirre-Villanueva, Orlando
Melgarejo-Graciano, Melquiades
Castro-Leon, Gloria
Olaya-Cotera, Sandro
Ruiz-Alvarado, John
Epifanía-Huerta, Andrés
Cabanillas-Carbonell, Michael
Zapata-Paulini, Joselyn
dc.subject.es_PE.fl_str_mv Classification
Tweets
Disasters
Machine learning
Natural
dc.subject.ocde.es_PE.fl_str_mv https://purl.org/pe-repo/ocde/ford#2.02.04
description Abstract: In recent years, computer science has advanced rapidly, making it considerably easier to identify and classify text extracted from social networks, specifically Twitter. This work identifies, classifies, and analyzes tweets related to real natural disasters, collected through the hashtag #NaturalDisasters, using machine learning (ML) algorithms: Bernoulli Naive Bayes (BNB), Multinomial Naive Bayes (MNB), Logistic Regression (LR), K-Nearest Neighbors (KNN), Decision Tree (DT), and Random Forest (RF). First, tweets related to natural disasters were identified, creating a dataset of 122k geo-located tweets for training. Second, the data were cleaned by applying stemming and lemmatization techniques. Third, exploratory data analysis (EDA) was performed to gain an initial understanding of the data. Fourth, the BNB, MNB, LR, KNN, DT, and RF models were trained and tested using tools and libraries for this type of task. The trained models performed well: the BNB, MNB, and LR models each achieved 87% accuracy, while the KNN, DT, and RF models achieved 82%, 75%, and 86%, respectively. The BNB, MNB, and LR models also performed better on the other metrics considered, such as processing time, test accuracy, precision, and F1 score. This indicates that, for this context and this dataset, they are the best-performing text classifiers.
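The abstract names Multinomial Naive Bayes among the compared classifiers and reports accuracy, precision, and F1 score. As an illustration only, and not the authors' implementation (this record does not say which libraries or settings were used), the sketch below trains a small Multinomial Naive Bayes classifier with Laplace smoothing in pure Python and computes those metrics. The toy tweets, their labels, the names `tokenize`, `MultinomialNB`, and `binary_metrics`, and the smoothing value `alpha=1.0` are all assumptions made for this example; stemming and lemmatization are omitted.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only (no stemming here)."""
    return re.findall(r"[a-z]+", text.lower())


class MultinomialNB:
    """Word-count Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # assumed smoothing value, not taken from the paper

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        self.n_docs = len(labels)
        return self

    def predict_one(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            denom = total + self.alpha * len(self.vocab)
            score = math.log(self.doc_counts[c] / self.n_docs)  # log prior
            for t in tokens:
                score += math.log((self.word_counts[c][t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy data: 1 = about a real disaster, 0 = not.
train_texts = [
    "earthquake destroys homes in the city",
    "flood waters rising evacuate now",
    "wildfire spreading near the highway",
    "this new album is a disaster",
    "flood of memes on my timeline",
    "great pizza in the city tonight",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = ["earthquake destroys homes after flood", "great new pizza album"]
test_labels = [1, 0]

model = MultinomialNB(alpha=1.0).fit(train_texts, train_labels)
predictions = model.predict(test_texts)
metrics = binary_metrics(test_labels, predictions)
print(predictions, metrics)
```

On a dataset of realistic size, the same comparison would normally be run with a library implementation of each of the six classifiers and a held-out test split; the sketch only shows the mechanics behind one model and the reported metrics.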
publishDate 2023
dc.date.accessioned.none.fl_str_mv 2023-12-20T15:11:40Z
dc.date.available.none.fl_str_mv 2023-12-20T15:11:40Z
dc.date.issued.fl_str_mv 2023
dc.type.es_PE.fl_str_mv info:eu-repo/semantics/article
format article
dc.identifier.uri.none.fl_str_mv https://hdl.handle.net/20.500.13067/2875
dc.identifier.journal.es_PE.fl_str_mv International Journal of Interactive Mobile Technologies (iJIM)
dc.identifier.doi.none.fl_str_mv https://doi.org/10.3991/ijim.v17i14.39907
dc.language.iso.es_PE.fl_str_mv eng
language eng
dc.relation.url.es_PE.fl_str_mv https://online-journals.org/index.php/i-jim/article/view/39907
dc.rights.es_PE.fl_str_mv info:eu-repo/semantics/openAccess
dc.rights.uri.es_PE.fl_str_mv https://creativecommons.org/licenses/by/4.0/
eu_rights_str_mv openAccess
rights_invalid_str_mv https://creativecommons.org/licenses/by/4.0/
dc.format.es_PE.fl_str_mv application/pdf
dc.publisher.es_PE.fl_str_mv International Journal of Interactive Mobile Technologies (iJIM)
dc.source.none.fl_str_mv reponame:AUTONOMA-Institucional
instname:Universidad Autónoma del Perú
instacron:AUTONOMA
dc.source.volume.es_PE.fl_str_mv 17
dc.source.issue.es_PE.fl_str_mv 14
dc.source.beginpage.es_PE.fl_str_mv 144
dc.source.endpage.es_PE.fl_str_mv 162
bitstream.url.fl_str_mv http://repositorio.autonoma.edu.pe/bitstream/20.500.13067/2875/1/42_2023.pdf
http://repositorio.autonoma.edu.pe/bitstream/20.500.13067/2875/2/license.txt
http://repositorio.autonoma.edu.pe/bitstream/20.500.13067/2875/3/42_2023.pdf.txt
http://repositorio.autonoma.edu.pe/bitstream/20.500.13067/2875/4/42_2023.pdf.jpg
bitstream.checksum.fl_str_mv e0d55dbf66537ed3fa182725eead7f86
9243398ff393db1861c890baeaeee5f9
248445ed94dba32b2fdd5dc977ca46d8
4e69e60a267125ad0087445a6f9b1575
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
repository.name.fl_str_mv Repositorio de la Universidad Autonoma del Perú
repository.mail.fl_str_mv repositorio@autonoma.pe
Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository holding this document or dataset. CONCYTEC is not responsible for the content (publications and/or data) accessible through the Repositorio Nacional Digital de Ciencia, Tecnología e Innovación de Acceso Abierto (ALICIA).