“Search and classify topics in a corpus of text using the latent Dirichlet allocation model”

Article Description

“This work aims at discovering topics in a text corpus and classifying the most relevant terms for each of the discovered topics. The process was performed in four steps: first, document extraction and data processing; second, labeling and training of the data; third, labeling of the unseen data; an...

Bibliographic Details
Authors: Iparraguirre-Villanueva, Orlando; Sierra-Liñan, Fernando; Herrera Salazar, Jose Luis; Beltozar-Clemente, Saul; Pucuhuayla-Revatta, Félix; Zapata-Paulin, Joselyn; Cabanillas-Carbonell, Michael
Format: article
Publication Date: 2022
Institution: Universidad Privada Norbert Wiener
Repository: UWIENER-Institucional
Language: English
OAI Identifier: oai:repositorio.uwiener.edu.pe:20.500.13053/8119
Resource link: https://hdl.handle.net/20.500.13053/8119
Access level: open access
Subject: Classify; Discovering; Latent Dirichlet allocation; Text corpus; Topics
http://purl.org/pe-repo/ocde/ford#1.02.01
id UWIE_521177af4fb1c4437013ccb2d3423e40
oai_identifier_str oai:repositorio.uwiener.edu.pe:20.500.13053/8119
network_acronym_str UWIE
network_name_str UWIENER-Institucional
repository_id_str 9398
dc.title.es_ES.fl_str_mv “Search and classify topics in a corpus of text using the latent Dirichlet allocation model”
author Iparraguirre-Villanueva, Orlando
author_facet Iparraguirre-Villanueva, Orlando
Sierra-Liñan, Fernando
Herrera Salazar, Jose Luis
Beltozar-Clemente, Saul
Pucuhuayla-Revatta, Félix
Zapata-Paulin, Joselyn
Cabanillas-Carbonell, Michael
author_role author
author2 Sierra-Liñan, Fernando
Herrera Salazar, Jose Luis
Beltozar-Clemente, Saul
Pucuhuayla-Revatta, Félix
Zapata-Paulin, Joselyn
Cabanillas-Carbonell, Michael
author2_role author
author
author
author
author
author
dc.contributor.author.fl_str_mv Iparraguirre-Villanueva, Orlando
Sierra-Liñan, Fernando
Herrera Salazar, Jose Luis
Beltozar-Clemente, Saul
Pucuhuayla-Revatta, Félix
Zapata-Paulin, Joselyn
Cabanillas-Carbonell, Michael
dc.subject.es_ES.fl_str_mv "Classify Discovering Latent dirichlet allocation Text corpus Topics"
topic "Classify Discovering Latent dirichlet allocation Text corpus Topics"
http://purl.org/pe-repo/ocde/ford#1.02.01
dc.subject.ocde.es_ES.fl_str_mv http://purl.org/pe-repo/ocde/ford#1.02.01
description “This work aims at discovering topics in a text corpus and classifying the most relevant terms for each of the discovered topics. The process was performed in four steps: first, document extraction and data processing; second, labeling and training of the data; third, labeling of the unseen data; and fourth, evaluation of the model performance. For processing, a total of 10,322 "curriculum" documents related to data science were collected from the web during 2018-2022. The latent Dirichlet allocation (LDA) model was used for the analysis and structure of the subjects. After processing, 12 themes were generated, which allowed ranking the most relevant terms to identify the skills of each of the candidates. This work concludes that candidates interested in data science must have technical skills, in particular mastery of structured query language (SQL), of programming languages such as R, Python, and Java, and of data management, among other tools associated with the technology.”
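The topic-discovery step named in the description can be illustrated with a minimal sketch. The record does not say how the authors fitted their LDA model, so everything here is an assumption: the sketch uses collapsed Gibbs sampling (one standard inference method for LDA), a made-up four-document toy corpus in place of the 10,322 curriculum documents, 2 topics instead of 12, and hypothetical names (fit_lda, top_terms) and hyperparameters (alpha, beta).

```python
import random


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns the vocabulary, per-document topic counts, and per-topic word counts.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]                  # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]    # tokens of word v in topic k
    topic_total = [0] * num_topics                                # tokens in topic k
    assignments = []                                              # topic of every token

    # Start from a random topic assignment for each token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for w in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution
                # (unnormalized; random.choices normalizes the weights).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    return vocab, doc_topic, topic_word


def top_terms(vocab, topic_word, n=3):
    """Rank the n most frequent terms of each topic (ties broken alphabetically)."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: (-row[v], vocab[v]))[:n]]
        for row in topic_word
    ]


# Toy corpus (invented for illustration, not the paper's data).
docs = [
    "python sql data pipeline python".split(),
    "sql database query data".split(),
    "statistics regression model r".split(),
    "model regression statistics python".split(),
]
vocab, doc_topic, topic_word = fit_lda(docs, num_topics=2)
terms = top_terms(vocab, topic_word)
for k, words in enumerate(terms):
    print(f"topic {k}: {', '.join(words)}")
```

Ranking terms by their per-topic counts mirrors the "most relevant terms" idea in the description only loosely; the published article may use a different relevance measure.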
publishDate 2022
dc.date.accessioned.none.fl_str_mv 2023-03-16T16:48:29Z
dc.date.available.none.fl_str_mv 2023-03-16T16:48:29Z
dc.date.issued.fl_str_mv 2022-11-18
dc.type.es_ES.fl_str_mv info:eu-repo/semantics/article
dc.type.version.es_ES.fl_str_mv info:eu-repo/semantics/publishedVersion
format article
status_str publishedVersion
dc.identifier.uri.none.fl_str_mv https://hdl.handle.net/20.500.13053/8119
dc.identifier.doi.es_ES.fl_str_mv 10.11591/ijeecs.v30.i1.pp246-256
url https://hdl.handle.net/20.500.13053/8119
identifier_str_mv 10.11591/ijeecs.v30.i1.pp246-256
dc.language.iso.es_ES.fl_str_mv eng
language eng
dc.rights.es_ES.fl_str_mv info:eu-repo/semantics/openAccess
dc.rights.uri.es_ES.fl_str_mv https://creativecommons.org/licenses/by/4.0/
eu_rights_str_mv openAccess
rights_invalid_str_mv https://creativecommons.org/licenses/by/4.0/
dc.format.es_ES.fl_str_mv application/pdf
dc.publisher.es_ES.fl_str_mv Institute of Advanced Engineering and Science
dc.publisher.country.es_ES.fl_str_mv ID
dc.source.none.fl_str_mv reponame:UWIENER-Institucional
instname:Universidad Privada Norbert Wiener
instacron:UWIENER
instname_str Universidad Privada Norbert Wiener
instacron_str UWIENER
institution UWIENER
reponame_str UWIENER-Institucional
collection UWIENER-Institucional
bitstream.url.fl_str_mv https://dspace-uwiener.metabuscador.org/bitstreams/31faeacc-958f-421c-88d4-84a038952952/download
https://dspace-uwiener.metabuscador.org/bitstreams/abc10387-768a-446e-9cf1-526ce3d6f27d/download
https://dspace-uwiener.metabuscador.org/bitstreams/7c70ce4c-6205-4651-b9e9-4f45fe583322/download
https://dspace-uwiener.metabuscador.org/bitstreams/6d6003c9-eaa9-4a84-9db8-58b1ca53e647/download
bitstream.checksum.fl_str_mv 9612a19922a6b02e74c30e5467962abb
8a4605be74aa9ea9d79846c1fba20a33
02004f08cbd72736cb797cbbaff03b61
45eb29d54253f8b2a1083763de964a5e
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
repository.name.fl_str_mv Repositorio Institucional de la Universidad de Wiener
repository.mail.fl_str_mv bdigital@metabiblioteca.com
Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository where this document or dataset is held. CONCYTEC is not responsible for the content (publications and/or data) accessible through the Repositorio Nacional Digital de Ciencia, Tecnología e Innovación de Acceso Abierto (ALICIA).