Classification of tweets related to natural disasters using machine learning algorithms
Article description
Identifying and classifying text extracted from social networks, following the traditional method, is very complex. In recent years, computer science has advanced exponentially, helping significantly to identify and classify text extracted from social networks, specifically Twitter. This work aims t...
| Authors: | Ruíz Alvarado, John Fernando; Iparraguirre-Villanueva, Orlando; Melgarejo-Graciano, Melquiades; Castro-Leon, Gloria; Olaya-Cotera, Sandro; Ruiz-Alvarado, John; Epifanía-Huerta, Andrés; Cabanillas-Carbonell, Michael; Zapata-Paulini, Joselyn |
|---|---|
| Format: | article |
| Publication date: | 2023 |
| Institution: | Universidad Tecnológica del Perú |
| Repository: | UTP-Institucional |
| Language: | English |
| OAI Identifier: | oai:repositorio.utp.edu.pe:20.500.12867/7928 |
| Resource link: | https://hdl.handle.net/20.500.12867/7928 https://doi.org/10.3991/ijim.v17i14.39907 |
| Access level: | open access |
| Subject: | Machine learning; Social network; Natural language processing; Natural disasters; https://purl.org/pe-repo/ocde/ford#1.02.00 |
| id | UTPD_9dc09f7924e1ece2c44429d08865ce5c |
|---|---|
| oai_identifier_str | oai:repositorio.utp.edu.pe:20.500.12867/7928 |
| network_acronym_str | UTPD |
| network_name_str | UTP-Institucional |
| repository_id_str | 4782 |
| dc.title.es_PE.fl_str_mv | Classification of tweets related to natural disasters using machine learning algorithms |
| title | Classification of tweets related to natural disasters using machine learning algorithms |
| spellingShingle | Classification of tweets related to natural disasters using machine learning algorithms Ruíz Alvarado, John Fernando Machine learning Social network Natural language processing Natural disasters https://purl.org/pe-repo/ocde/ford#1.02.00 |
| title_short | Classification of tweets related to natural disasters using machine learning algorithms |
| title_full | Classification of tweets related to natural disasters using machine learning algorithms |
| title_fullStr | Classification of tweets related to natural disasters using machine learning algorithms |
| title_full_unstemmed | Classification of tweets related to natural disasters using machine learning algorithms |
| title_sort | Classification of tweets related to natural disasters using machine learning algorithms |
| author | Ruíz Alvarado, John Fernando |
| author_facet | Ruíz Alvarado, John Fernando; Iparraguirre-Villanueva, Orlando; Melgarejo-Graciano, Melquiades; Castro-Leon, Gloria; Olaya-Cotera, Sandro; Ruiz-Alvarado, John; Epifanía-Huerta, Andrés; Cabanillas-Carbonell, Michael; Zapata-Paulini, Joselyn |
| author_role | author |
| author2 | Iparraguirre-Villanueva, Orlando; Melgarejo-Graciano, Melquiades; Castro-Leon, Gloria; Olaya-Cotera, Sandro; Ruiz-Alvarado, John; Epifanía-Huerta, Andrés; Cabanillas-Carbonell, Michael; Zapata-Paulini, Joselyn |
| author2_role | author author author author author author author author |
| dc.contributor.author.fl_str_mv | Ruíz Alvarado, John Fernando; Iparraguirre-Villanueva, Orlando; Melgarejo-Graciano, Melquiades; Castro-Leon, Gloria; Olaya-Cotera, Sandro; Ruiz-Alvarado, John; Epifanía-Huerta, Andrés; Cabanillas-Carbonell, Michael; Zapata-Paulini, Joselyn |
| dc.subject.es_PE.fl_str_mv | Machine learning; Social network; Natural language processing; Natural disasters |
| topic | Machine learning; Social network; Natural language processing; Natural disasters; https://purl.org/pe-repo/ocde/ford#1.02.00 |
| dc.subject.ocde.es_PE.fl_str_mv | https://purl.org/pe-repo/ocde/ford#1.02.00 |
| description | Identifying and classifying text extracted from social networks by traditional methods is very complex. In recent years, advances in computer science have helped significantly to identify and classify text extracted from social networks, specifically Twitter. This work aims to identify, classify, and analyze tweets about real natural disasters, collected with the hashtag #NaturalDisasters, using machine learning (ML) algorithms such as Bernoulli Naive Bayes (BNB), Multinomial Naive Bayes (MNB), Logistic Regression (LR), K-Nearest Neighbors (KNN), Decision Tree (DT), and Random Forest (RF). First, tweets related to natural disasters were identified, creating a dataset of 122k geolocated tweets for training. Second, the data were cleaned by applying stemming and lemmatization techniques. Third, exploratory data analysis (EDA) was performed to gain an initial understanding of the data. Fourth, the BNB, MNB, LR, KNN, DT, and RF models were trained and tested using tools and libraries for this type of task. The BNB, MNB, and LR models each achieved 87% accuracy, while the KNN, DT, and RF models achieved 82%, 75%, and 86%, respectively. The BNB, MNB, and LR models also performed better on their respective metrics, such as processing time, test accuracy, precision, and F1 score. This indicates that, for this context and the dataset used for training, they are the best text classifiers among those evaluated. |
| publishDate | 2023 |
| dc.date.accessioned.none.fl_str_mv | 2023-11-23T20:56:55Z |
| dc.date.available.none.fl_str_mv | 2023-11-23T20:56:55Z |
| dc.date.issued.fl_str_mv | 2023 |
| dc.type.es_PE.fl_str_mv | info:eu-repo/semantics/article |
| dc.type.version.es_PE.fl_str_mv | info:eu-repo/semantics/publishedVersion |
| format | article |
| status_str | publishedVersion |
| dc.identifier.issn.none.fl_str_mv | 1865-7923 |
| dc.identifier.uri.none.fl_str_mv | https://hdl.handle.net/20.500.12867/7928 |
| dc.identifier.journal.es_PE.fl_str_mv | International Journal of Interactive Mobile Technologies |
| dc.identifier.doi.none.fl_str_mv | https://doi.org/10.3991/ijim.v17i14.39907 |
| identifier_str_mv | 1865-7923 International Journal of Interactive Mobile Technologies |
| url | https://hdl.handle.net/20.500.12867/7928 https://doi.org/10.3991/ijim.v17i14.39907 |
| dc.language.iso.es_PE.fl_str_mv | eng |
| language | eng |
| dc.relation.ispartofseries.none.fl_str_mv | International Journal of Interactive Mobile Technologies;vol. 17, n° 4 |
| dc.rights.es_PE.fl_str_mv | info:eu-repo/semantics/openAccess |
| dc.rights.uri.es_PE.fl_str_mv | http://creativecommons.org/licenses/by/4.0/ |
| eu_rights_str_mv | openAccess |
| rights_invalid_str_mv | http://creativecommons.org/licenses/by/4.0/ |
| dc.format.es_PE.fl_str_mv | application/pdf |
| dc.publisher.es_PE.fl_str_mv | International Association of Online Engineering |
| dc.publisher.country.es_PE.fl_str_mv | AT |
| dc.source.es_PE.fl_str_mv | Repositorio Institucional - UTP Universidad Tecnológica del Perú |
| dc.source.none.fl_str_mv | reponame:UTP-Institucional instname:Universidad Tecnológica del Perú instacron:UTP |
| instname_str | Universidad Tecnológica del Perú |
| instacron_str | UTP |
| institution | UTP |
| reponame_str | UTP-Institucional |
| collection | UTP-Institucional |
| bitstream.url.fl_str_mv | http://repositorio.utp.edu.pe/bitstream/20.500.12867/7928/1/J.Ruiz_Articulo_2023.pdf http://repositorio.utp.edu.pe/bitstream/20.500.12867/7928/2/license.txt http://repositorio.utp.edu.pe/bitstream/20.500.12867/7928/3/J.Ruiz_Articulo_2023.pdf.txt http://repositorio.utp.edu.pe/bitstream/20.500.12867/7928/4/J.Ruiz_Articulo_2023.pdf.jpg |
| bitstream.checksum.fl_str_mv | e0d55dbf66537ed3fa182725eead7f86 8a4605be74aa9ea9d79846c1fba20a33 248445ed94dba32b2fdd5dc977ca46d8 efdd971bdc7cf851b84da33197d826cd |
| bitstream.checksumAlgorithm.fl_str_mv | MD5 MD5 MD5 MD5 |
| repository.name.fl_str_mv | Repositorio Institucional de la Universidad Tecnológica del Perú |
| repository.mail.fl_str_mv | repositorio@utp.edu.pe |
| _version_ | 1817984884428242944 |
| score | 13.924177 |
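
As a rough illustration of one of the classifiers named in the description field above, the following Python sketch implements a small Bernoulli Naive Bayes text classifier from scratch, using only the standard library. It is not the authors' code: the toy tweets, the `disaster`/`other` labels, the `tokenize` helper, and the `BernoulliNB` class are all hypothetical, and this record does not state which libraries or preprocessing settings the study used.

```python
import math
import re


def tokenize(text):
    """Lowercase a tweet and return the set of alphabetic tokens it contains."""
    return set(re.findall(r"[a-z]+", text.lower()))


class BernoulliNB:
    """Toy Bernoulli Naive Bayes over binary word-presence features."""

    def fit(self, texts, labels):
        docs = [tokenize(t) for t in texts]
        self.vocab = sorted(set().union(*docs))
        self.classes = sorted(set(labels))
        self.log_prior = {}
        self.p_word = {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(labels))
            # Laplace-smoothed P(word present | class):
            # (documents containing the word + 1) / (documents in class + 2)
            self.p_word[c] = {
                w: (sum(w in d for d in class_docs) + 1) / (len(class_docs) + 2)
                for w in self.vocab
            }
        return self

    def predict(self, text):
        tokens = tokenize(text)
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for w in self.vocab:
                p = self.p_word[c][w]
                # The Bernoulli model scores absent vocabulary words as well.
                score += math.log(p if w in tokens else 1.0 - p)
            scores[c] = score
        return max(scores, key=scores.get)


# Hypothetical toy data, invented for illustration only (not the 122k-tweet dataset).
train_texts = [
    "Earthquake destroys homes in the city",
    "Flood waters rising and families evacuated",
    "Wildfire spreads and homes evacuated",
    "My exam today was a total disaster",
    "Great concert in the city tonight",
    "New phone arrived today and I love it",
]
train_labels = ["disaster", "disaster", "disaster", "other", "other", "other"]

model = BernoulliNB().fit(train_texts, train_labels)
print(model.predict("Homes evacuated after the earthquake"))
print(model.predict("I love the concert"))
```

Words that never appeared in training (such as "after" here) are simply ignored, which is one common convention for this model. On a real corpus, the stemming and lemmatization step mentioned in the description would typically run before `tokenize`.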
Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository where this document or dataset is held. CONCYTEC is not responsible for the content (publications and/or data) accessible through the Repositorio Nacional Digital de Ciencia, Tecnología e Innovación de Acceso Abierto (ALICIA).