Classification of tweets related to natural disasters using machine learning algorithms

Article Description

Identifying and classifying text extracted from social networks with traditional methods is very complex. In recent years, computer science has advanced exponentially, helping significantly to identify and classify text extracted from social networks, specifically Twitter. This work aims t...

Bibliographic Details
Authors: Ruíz Alvarado, John Fernando; Iparraguirre-Villanueva, Orlando; Melgarejo-Graciano, Melquiades; Castro-Leon, Gloria; Olaya-Cotera, Sandro; Ruiz-Alvarado, John; Epifanía-Huerta, Andrés; Cabanillas-Carbonell, Michael; Zapata-Paulini, Joselyn
Format: article
Publication Date: 2023
Institution: Universidad Tecnológica del Perú
Repository: UTP-Institucional
Language: English
OAI Identifier: oai:repositorio.utp.edu.pe:20.500.12867/7928
Resource link: https://hdl.handle.net/20.500.12867/7928
https://doi.org/10.3991/ijim.v17i14.39907
Access level: open access
Subject: Machine learning
Social network
Natural language processing
Natural disasters
https://purl.org/pe-repo/ocde/ford#1.02.00
id UTPD_9dc09f7924e1ece2c44429d08865ce5c
oai_identifier_str oai:repositorio.utp.edu.pe:20.500.12867/7928
network_acronym_str UTPD
network_name_str UTP-Institucional
repository_id_str 4782
description Identifying and classifying text extracted from social networks with traditional methods is very complex. In recent years, computer science has advanced exponentially, helping significantly to identify and classify text extracted from social networks, specifically Twitter. This work aims to identify, classify, and analyze tweets related to real natural disasters, collected through the hashtag #NaturalDisasters, using machine learning (ML) algorithms: Bernoulli Naive Bayes (BNB), Multinomial Naive Bayes (MNB), Logistic Regression (LR), K-Nearest Neighbors (KNN), Decision Tree (DT), and Random Forest (RF). First, tweets related to natural disasters were identified, creating a dataset of 122k geolocated tweets for training. Second, the data were cleaned by applying stemming and lemmatization techniques. Third, exploratory data analysis (EDA) was performed to gain an initial understanding of the data. Fourth, the BNB, MNB, LR, KNN, DT, and RF models were trained and tested using tools and libraries for this type of task. The trained models performed well: the BNB, MNB, and LR models achieved 87% accuracy, while the KNN, DT, and RF models achieved 82%, 75%, and 86%, respectively. The BNB, MNB, and LR models also performed better on the remaining metrics, such as processing time, test accuracy, precision, and F1 score, demonstrating that, for this context and with this dataset, they are the best text classifiers.
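The record does not name the tools or libraries the authors used, so the following is only a minimal sketch of one of the listed classifiers, Bernoulli Naive Bayes, together with the accuracy, precision, and F1 metrics mentioned in the abstract. It uses the Python standard library only. The class name `BernoulliNaiveBayes`, the helper `evaluate`, and the toy tweets are hypothetical and are not taken from the article; the stemming and lemmatization step is omitted.

```python
import math
from collections import Counter

# Invented toy data for illustration only (1 = real disaster, 0 = not).
TOY_TEXTS = [
    "earthquake destroyed homes",
    "flood destroyed bridge",
    "wildfire near homes evacuate",
    "great movie tonight",
    "love this great song",
    "coffee tonight with friends",
]
TOY_LABELS = [1, 1, 1, 0, 0, 0]


def tokenize(text):
    """Lowercase a tweet and return its set of whitespace-separated tokens."""
    return set(text.lower().split())


class BernoulliNaiveBayes:
    """Binary-feature Naive Bayes with Laplace smoothing (hypothetical helper)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = sorted(set().union(*(tokenize(d) for d in docs)))
        n_docs = Counter(labels)
        doc_freq = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            doc_freq[label].update(tokenize(doc))
        self.log_prior = {c: math.log(n_docs[c] / len(docs)) for c in self.classes}
        # P(word present | class), smoothed so it is never exactly 0 or 1.
        self.p_word = {
            c: {
                w: (doc_freq[c][w] + self.alpha) / (n_docs[c] + 2 * self.alpha)
                for w in self.vocab
            }
            for c in self.classes
        }
        return self

    def predict(self, doc):
        present = tokenize(doc)

        def log_score(c):
            score = self.log_prior[c]
            # Bernoulli NB scores absent vocabulary words as well as present ones.
            for w in self.vocab:
                p = self.p_word[c][w]
                score += math.log(p if w in present else 1.0 - p)
            return score

        return max(self.classes, key=log_score)


def evaluate(y_true, y_pred, positive=1):
    """Return (accuracy, precision, F1) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, f1


model = BernoulliNaiveBayes().fit(TOY_TEXTS, TOY_LABELS)
print(model.predict("flood destroyed homes"))
```

Words in a new tweet that never appeared in training are ignored, because scoring iterates over the training vocabulary only. A real replication would need the article's own preprocessing and dataset, neither of which is reproduced here.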
publishDate 2023
dc.date.accessioned.none.fl_str_mv 2023-11-23T20:56:55Z
dc.date.available.none.fl_str_mv 2023-11-23T20:56:55Z
dc.date.issued.fl_str_mv 2023
dc.type.es_PE.fl_str_mv info:eu-repo/semantics/article
dc.type.version.es_PE.fl_str_mv info:eu-repo/semantics/publishedVersion
format article
status_str publishedVersion
dc.identifier.issn.none.fl_str_mv 1865-7923
dc.identifier.uri.none.fl_str_mv https://hdl.handle.net/20.500.12867/7928
dc.identifier.journal.es_PE.fl_str_mv International Journal of Interactive Mobile Technologies
dc.identifier.doi.none.fl_str_mv https://doi.org/10.3991/ijim.v17i14.39907
dc.language.iso.es_PE.fl_str_mv eng
language eng
dc.relation.ispartofseries.none.fl_str_mv International Journal of Interactive Mobile Technologies;vol. 17, n° 14
dc.rights.es_PE.fl_str_mv info:eu-repo/semantics/openAccess
dc.rights.uri.es_PE.fl_str_mv http://creativecommons.org/licenses/by/4.0/
dc.format.es_PE.fl_str_mv application/pdf
dc.publisher.es_PE.fl_str_mv International Association of Online Engineering
dc.publisher.country.es_PE.fl_str_mv AT
dc.source.es_PE.fl_str_mv Repositorio Institucional - UTP
Universidad Tecnológica del Perú
dc.source.none.fl_str_mv reponame:UTP-Institucional
instname:Universidad Tecnológica del Perú
instacron:UTP
bitstream.url.fl_str_mv http://repositorio.utp.edu.pe/bitstream/20.500.12867/7928/1/J.Ruiz_Articulo_2023.pdf
http://repositorio.utp.edu.pe/bitstream/20.500.12867/7928/2/license.txt
http://repositorio.utp.edu.pe/bitstream/20.500.12867/7928/3/J.Ruiz_Articulo_2023.pdf.txt
http://repositorio.utp.edu.pe/bitstream/20.500.12867/7928/4/J.Ruiz_Articulo_2023.pdf.jpg
bitstream.checksum.fl_str_mv e0d55dbf66537ed3fa182725eead7f86
8a4605be74aa9ea9d79846c1fba20a33
248445ed94dba32b2fdd5dc977ca46d8
efdd971bdc7cf851b84da33197d826cd
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
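The MD5 values listed above can be used to check that a downloaded bitstream arrived intact. The following is a minimal sketch using Python's standard `hashlib`; the function names `md5_hex` and `matches_listed_checksum` are hypothetical, and the file's bytes are assumed to have been read beforehand (for example with `open(path, "rb").read()`).

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of the given bytes as a lowercase hex string."""
    return hashlib.md5(data).hexdigest()


def matches_listed_checksum(data: bytes, listed_md5: str) -> bool:
    """Compare the digest of downloaded bytes with a checksum from the record."""
    return md5_hex(data) == listed_md5.strip().lower()
```

MD5 is adequate for detecting accidental corruption during transfer, but it should not be relied on as protection against deliberate tampering.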
repository.name.fl_str_mv Repositorio Institucional de la Universidad Tecnológica del Perú
repository.mail.fl_str_mv repositorio@utp.edu.pe
_version_ 1817984884428242944
Campus Chimbote
bitstream details (bundle; file name; MIME type; size in bytes)
ORIGINAL; J.Ruiz_Articulo_2023.pdf; application/pdf; 2220811
LICENSE; license.txt; text/plain; charset=utf-8; 1748
TEXT; J.Ruiz_Articulo_2023.pdf.txt (Extracted text); text/plain; 48952
THUMBNAIL; J.Ruiz_Articulo_2023.pdf.jpg (Generated Thumbnail); image/jpeg; 12902
record timestamp 2023-11-23 17:04:22.61
Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository holding this document or dataset. CONCYTEC is not responsible for the content (publications and/or data) accessible through the Repositorio Nacional Digital de Ciencia, Tecnología e Innovación de Acceso Abierto (ALICIA).