“Search and classify topics in a corpus of text using the latent dirichlet allocation model”
Article Description
“This work aims at discovering topics in a text corpus and classifying the most relevant terms for each of the discovered topics. The process was performed in four steps: first, document extraction and data processing; second, labeling and training of the data; third, labeling of the unseen data; an...
| Authors: | Iparraguirre-Villanueva, Orlando; Sierra-Liñan, Fernando; Herrera Salazar, Jose Luis; Beltozar-Clemente, Saul; Pucuhuayla-Revatta, Félix; Zapata-Paulin, Joselyn; Cabanillas-Carbonell, Michael |
|---|---|
| Format: | article |
| Publication date: | 2022 |
| Institution: | Universidad Privada Norbert Wiener |
| Repository: | UWIENER-Institucional |
| Language: | English |
| OAI Identifier: | oai:repositorio.uwiener.edu.pe:20.500.13053/8119 |
| Resource link: | https://hdl.handle.net/20.500.13053/8119 |
| Access level: | open access |
| Subject: | Classify; Discovering; Latent dirichlet allocation; Text corpus; Topics; http://purl.org/pe-repo/ocde/ford#1.02.01 |
| id | UWIE_521177af4fb1c4437013ccb2d3423e40 |
|---|---|
| oai_identifier_str | oai:repositorio.uwiener.edu.pe:20.500.13053/8119 |
| network_acronym_str | UWIE |
| network_name_str | UWIENER-Institucional |
| repository_id_str | 9398 |
| dc.title.es_ES.fl_str_mv | “Search and classify topics in a corpus of text using the latent dirichlet allocation model” |
| title | “Search and classify topics in a corpus of text using the latent dirichlet allocation model” |
| spellingShingle | “Search and classify topics in a corpus of text using the latent dirichlet allocation model” Iparraguirre-Villanueva, Orlando "Classify Discovering Latent dirichlet allocation Text corpus Topics" http://purl.org/pe-repo/ocde/ford#1.02.01 |
| title_short | “Search and classify topics in a corpus of text using the latent dirichlet allocation model” |
| title_full | “Search and classify topics in a corpus of text using the latent dirichlet allocation model” |
| title_fullStr | “Search and classify topics in a corpus of text using the latent dirichlet allocation model” |
| title_full_unstemmed | “Search and classify topics in a corpus of text using the latent dirichlet allocation model” |
| title_sort | “Search and classify topics in a corpus of text using the latent dirichlet allocation model” |
| author | Iparraguirre-Villanueva, Orlando |
| author_facet | Iparraguirre-Villanueva, Orlando; Sierra-Liñan, Fernando; Herrera Salazar, Jose Luis; Beltozar-Clemente, Saul; Pucuhuayla-Revatta, Félix; Zapata-Paulin, Joselyn; Cabanillas-Carbonell, Michael |
| author_role | author |
| author2 | Sierra-Liñan, Fernando; Herrera Salazar, Jose Luis; Beltozar-Clemente, Saul; Pucuhuayla-Revatta, Félix; Zapata-Paulin, Joselyn; Cabanillas-Carbonell, Michael |
| author2_role | author author author author author author |
| dc.contributor.author.fl_str_mv | Iparraguirre-Villanueva, Orlando; Sierra-Liñan, Fernando; Herrera Salazar, Jose Luis; Beltozar-Clemente, Saul; Pucuhuayla-Revatta, Félix; Zapata-Paulin, Joselyn; Cabanillas-Carbonell, Michael |
| dc.subject.es_ES.fl_str_mv | "Classify Discovering Latent dirichlet allocation Text corpus Topics" |
| topic | "Classify Discovering Latent dirichlet allocation Text corpus Topics" http://purl.org/pe-repo/ocde/ford#1.02.01 |
| dc.subject.ocde.es_ES.fl_str_mv | http://purl.org/pe-repo/ocde/ford#1.02.01 |
| description | “This work aims at discovering topics in a text corpus and classifying the most relevant terms for each of the discovered topics. The process was performed in four steps: first, document extraction and data processing; second, labeling and training of the data; third, labeling of the unseen data; and fourth, evaluation of the model performance. For processing, a total of 10,322 “curriculum” documents related to data science were collected from the web during 2018-2022. The latent dirichlet allocation (LDA) model was used for the analysis and structure of the subjects. After processing, 12 themes were generated, which allowed ranking the most relevant terms to identify the skills of each of the candidates. This work concludes that candidates interested in data science must have skills in the following topics: first, they must be technical, they must have mastery of structured query language, mastery of programming languages such as R, Python, java, and data management, among other tools associated with the technology.” |
| publishDate | 2022 |
| dc.date.accessioned.none.fl_str_mv | 2023-03-16T16:48:29Z |
| dc.date.available.none.fl_str_mv | 2023-03-16T16:48:29Z |
| dc.date.issued.fl_str_mv | 2022-11-18 |
| dc.type.es_ES.fl_str_mv | info:eu-repo/semantics/article |
| dc.type.version.es_ES.fl_str_mv | info:eu-repo/semantics/publishedVersion |
| format | article |
| status_str | publishedVersion |
| dc.identifier.uri.none.fl_str_mv | https://hdl.handle.net/20.500.13053/8119 |
| dc.identifier.doi.es_ES.fl_str_mv | 10.11591/ijeecs.v30.i1.pp246-256 |
| url | https://hdl.handle.net/20.500.13053/8119 |
| identifier_str_mv | 10.11591/ijeecs.v30.i1.pp246-256 |
| dc.language.iso.es_ES.fl_str_mv | eng |
| language | eng |
| dc.rights.es_ES.fl_str_mv | info:eu-repo/semantics/openAccess |
| dc.rights.uri.es_ES.fl_str_mv | https://creativecommons.org/licenses/by/4.0/ |
| eu_rights_str_mv | openAccess |
| rights_invalid_str_mv | https://creativecommons.org/licenses/by/4.0/ |
| dc.format.es_ES.fl_str_mv | application/pdf |
| dc.publisher.es_ES.fl_str_mv | Institute of Advanced Engineering and Science |
| dc.publisher.country.es_ES.fl_str_mv | ID |
| dc.source.none.fl_str_mv | reponame:UWIENER-Institucional instname:Universidad Privada Norbert Wiener instacron:UWIENER |
| instname_str | Universidad Privada Norbert Wiener |
| instacron_str | UWIENER |
| institution | UWIENER |
| reponame_str | UWIENER-Institucional |
| collection | UWIENER-Institucional |
| bitstream.url.fl_str_mv | https://dspace-uwiener.metabuscador.org/bitstreams/31faeacc-958f-421c-88d4-84a038952952/download https://dspace-uwiener.metabuscador.org/bitstreams/abc10387-768a-446e-9cf1-526ce3d6f27d/download https://dspace-uwiener.metabuscador.org/bitstreams/7c70ce4c-6205-4651-b9e9-4f45fe583322/download https://dspace-uwiener.metabuscador.org/bitstreams/6d6003c9-eaa9-4a84-9db8-58b1ca53e647/download |
| bitstream.checksum.fl_str_mv | 9612a19922a6b02e74c30e5467962abb 8a4605be74aa9ea9d79846c1fba20a33 02004f08cbd72736cb797cbbaff03b61 45eb29d54253f8b2a1083763de964a5e |
| bitstream.checksumAlgorithm.fl_str_mv | MD5 MD5 MD5 MD5 |
| repository.name.fl_str_mv | Repositorio Institucional de la Universidad de Wiener |
| repository.mail.fl_str_mv | bdigital@metabiblioteca.com |
| _version_ | 1835828883497156608 |
| score | 13.983407 |
Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository where this document or dataset is held. CONCYTEC is not responsible for the content (publications and/or data) accessible through the National Open Access Digital Repository of Science, Technology and Innovation (ALICIA).