A Low-Resourced Peruvian Language Identification Model

Article Description

Due to the linguistic revitalization in Peru over recent years, there is growing interest in reinforcing bilingual education in the country and in increasing research focused on its native languages. From a computer science perspective, one of the first steps to support the languages s...


Bibliographic Details
Authors: Linares A.E., Oncevay-Marcos A.
Format: conference object
Publication Date: 2017
Institution: Consejo Nacional de Ciencia Tecnología e Innovación
Repository: CONCYTEC-Institucional
Language: English
OAI Identifier: oai:repositorio.concytec.gob.pe:20.500.12390/488
Resource link: https://hdl.handle.net/20.500.12390/488
Access level: open access
Subject: Learning systems
Big data
Education
Information management
Automatic language identification
Bilingual education
Complex task
https://purl.org/pe-repo/ocde/ford#6.02.00
id CONC_b6a754588cce53eb7617aed537425073
oai_identifier_str oai:repositorio.concytec.gob.pe:20.500.12390/488
network_acronym_str CONC
network_name_str CONCYTEC-Institucional
repository_id_str 4689
dc.title.none.fl_str_mv A Low-Resourced Peruvian Language Identification Model
title A Low-Resourced Peruvian Language Identification Model
spellingShingle A Low-Resourced Peruvian Language Identification Model
Linares A.E.
Learning systems
Big data
Education
Information management
Automatic language identification
Bilingual education
Complex task
https://purl.org/pe-repo/ocde/ford#6.02.00
author Linares A.E.
author_facet Linares A.E.
Oncevay-Marcos A.
author_role author
author2 Oncevay-Marcos A.
author2_role author
dc.contributor.author.fl_str_mv Linares A.E.
Oncevay-Marcos A.
dc.subject.none.fl_str_mv Learning systems
topic Learning systems
Big data
Education
Information management
Automatic language identification
Bilingual education
Complex task
https://purl.org/pe-repo/ocde/ford#6.02.00
dc.subject.es_PE.fl_str_mv Big data
Education
Information management
Automatic language identification
Bilingual education
Complex task
dc.subject.ocde.none.fl_str_mv https://purl.org/pe-repo/ocde/ford#6.02.00
description Due to the linguistic revitalization in Peru over recent years, there is growing interest in reinforcing bilingual education in the country and in increasing research focused on its native languages. From a computer science perspective, one of the first steps to support the study of these languages is the implementation of an automatic language identification tool using machine learning methods. Therefore, this work focuses on two steps: (1) building a digital, annotated corpus for 16 Peruvian native languages extracted from documents in web repositories, and (2) fitting a supervised learning model for the language identification task using features identified in related state-of-the-art studies, such as n-grams. The results obtained were promising (97% average precision), and the corpus and the model are expected to be used for more complex tasks in the future.
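The two-step approach in the description (an annotated corpus, then a supervised model over n-gram features) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the record does not state which classifier the authors used, so a multinomial Naive Bayes over character n-grams stands in for it; the Spanish and English training sentences are toy stand-ins for the 16-language corpus, which is not part of this record; and `macro_precision` assumes that "average precision" means precision averaged across languages, which the description does not specify. The names `char_ngrams`, `NgramNaiveBayes`, and `macro_precision` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=3):
    """Return all character n-grams of length n_min..n_max from padded, lowercased text."""
    padded = f" {text.lower().strip()} "
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-gram counts with add-one smoothing.

    Assumption: a stand-in classifier, not the model from the paper.
    """

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.doc_counts = Counter(labels)   # label -> number of training texts
        for text, label in zip(texts, labels):
            self.counts[label].update(char_ngrams(text))
        self.vocab = {g for c in self.counts.values() for g in c}
        self.totals = {label: sum(c.values()) for label, c in self.counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus smoothed log likelihood of every n-gram in the text.
            score = math.log(self.doc_counts[label] / n_docs)
            for gram in char_ngrams(text):
                count = self.counts[label][gram]
                score += math.log((count + 1) / (self.totals[label] + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def macro_precision(y_true, y_pred):
    """Average per-label precision over the labels that were predicted at least once."""
    predicted = Counter(y_pred)
    correct = Counter(p for t, p in zip(y_true, y_pred) if t == p)
    return sum(correct[label] / predicted[label] for label in predicted) / len(predicted)


# Toy stand-in corpus (assumption): two languages instead of the 16 in the paper.
train_texts = [
    "el perro corre por la calle",
    "la casa es grande y bonita",
    "the dog runs down the street",
    "the house is big and pretty",
]
train_labels = ["es", "es", "en", "en"]

model = NgramNaiveBayes().fit(train_texts, train_labels)
test_texts = ["la calle es bonita", "the street is pretty"]
test_labels = ["es", "en"]
predictions = [model.predict(t) for t in test_texts]
print(predictions, macro_precision(test_labels, predictions))
```

Character n-grams are a common choice for this task because they need no tokenizer or dictionary, which matters for languages with few digital resources. With a real corpus, the evaluation would be run on held-out documents rather than on two hand-written sentences.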
publishDate 2017
dc.date.accessioned.none.fl_str_mv 2024-05-30T23:13:38Z
dc.date.available.none.fl_str_mv 2024-05-30T23:13:38Z
dc.date.issued.fl_str_mv 2017
dc.type.none.fl_str_mv info:eu-repo/semantics/conferenceObject
format conferenceObject
dc.identifier.uri.none.fl_str_mv https://hdl.handle.net/20.500.12390/488
dc.identifier.scopus.none.fl_str_mv 2-s2.0-85040614941
url https://hdl.handle.net/20.500.12390/488
identifier_str_mv 2-s2.0-85040614941
dc.language.iso.none.fl_str_mv eng
language eng
dc.relation.ispartof.none.fl_str_mv CEUR Workshop Proceedings
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
dc.rights.uri.none.fl_str_mv https://creativecommons.org/licenses/by-nc-nd/4.0/
eu_rights_str_mv openAccess
rights_invalid_str_mv https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.publisher.none.fl_str_mv CEUR-WS
publisher.none.fl_str_mv CEUR-WS
dc.source.none.fl_str_mv reponame:CONCYTEC-Institucional
instname:Consejo Nacional de Ciencia Tecnología e Innovación
instacron:CONCYTEC
instname_str Consejo Nacional de Ciencia Tecnología e Innovación
instacron_str CONCYTEC
institution CONCYTEC
reponame_str CONCYTEC-Institucional
collection CONCYTEC-Institucional
repository.name.fl_str_mv Repositorio Institucional CONCYTEC
repository.mail.fl_str_mv repositorio@concytec.gob.pe
_version_ 1844882999807574016
Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository where this document or dataset is held. CONCYTEC is not responsible for the content (publications and/or data) accessible through the Repositorio Nacional Digital de Ciencia, Tecnología e Innovación de Acceso Abierto (ALICIA), the national open-access digital repository for science, technology and innovation.