A Low-Resourced Peruvian Language Identification Model
Article Description
Due to the linguistic revitalization in Peru in recent years, there is growing interest in strengthening bilingual education in the country and in increasing research on its native languages. From the computer science perspective, one of the first steps to support the languages s...
| Authors: | Linares A.E., Oncevay-Marcos A. |
|---|---|
| Format: | conference object |
| Publication date: | 2017 |
| Institution: | Consejo Nacional de Ciencia Tecnología e Innovación |
| Repository: | CONCYTEC-Institucional |
| Language: | English |
| OAI Identifier: | oai:repositorio.concytec.gob.pe:20.500.12390/488 |
| Resource link: | https://hdl.handle.net/20.500.12390/488 |
| Access level: | open access |
| Subject: | Learning systems; Big data; Education; Information management; Automatic language identification; Bilingual education; Complex task; https://purl.org/pe-repo/ocde/ford#6.02.00 |
| id | CONC_b6a754588cce53eb7617aed537425073 |
|---|---|
| oai_identifier_str | oai:repositorio.concytec.gob.pe:20.500.12390/488 |
| network_acronym_str | CONC |
| network_name_str | CONCYTEC-Institucional |
| repository_id_str | 4689 |
| dc.title.none.fl_str_mv | A Low-Resourced Peruvian Language Identification Model |
| title | A Low-Resourced Peruvian Language Identification Model |
| spellingShingle | A Low-Resourced Peruvian Language Identification Model Linares A.E. Learning systems Big data Education Information management Automatic language identification Bilingual education Complex task https://purl.org/pe-repo/ocde/ford#6.02.00 |
| title_short | A Low-Resourced Peruvian Language Identification Model |
| title_full | A Low-Resourced Peruvian Language Identification Model |
| title_fullStr | A Low-Resourced Peruvian Language Identification Model |
| title_full_unstemmed | A Low-Resourced Peruvian Language Identification Model |
| title_sort | A Low-Resourced Peruvian Language Identification Model |
| author | Linares A.E. |
| author_facet | Linares A.E.; Oncevay-Marcos A. |
| author_role | author |
| author2 | Oncevay-Marcos A. |
| author2_role | author |
| dc.contributor.author.fl_str_mv | Linares A.E.; Oncevay-Marcos A. |
| dc.subject.none.fl_str_mv | Learning systems |
| topic | Learning systems; Big data; Education; Information management; Automatic language identification; Bilingual education; Complex task; https://purl.org/pe-repo/ocde/ford#6.02.00 |
| dc.subject.es_PE.fl_str_mv | Big data; Education; Information management; Automatic language identification; Bilingual education; Complex task |
| dc.subject.ocde.none.fl_str_mv | https://purl.org/pe-repo/ocde/ford#6.02.00 |
| description | Due to the linguistic revitalization in Peru in recent years, there is growing interest in strengthening bilingual education in the country and in increasing research on its native languages. From the computer science perspective, one of the first steps in supporting the study of these languages is implementing an automatic language identification tool using machine learning methods. Therefore, this work focuses on two steps: (1) building a digital, annotated corpus for 16 Peruvian native languages extracted from documents in web repositories, and (2) fitting a supervised learning model for the language identification task using features identified in related state-of-the-art studies, such as n-grams. The results obtained were promising (97% average precision), and the corpus and the model are expected to be used for more complex tasks in the future. |
| publishDate | 2017 |
| dc.date.accessioned.none.fl_str_mv | 2024-05-30T23:13:38Z |
| dc.date.available.none.fl_str_mv | 2024-05-30T23:13:38Z |
| dc.date.issued.fl_str_mv | 2017 |
| dc.type.none.fl_str_mv | info:eu-repo/semantics/conferenceObject |
| format | conferenceObject |
| dc.identifier.uri.none.fl_str_mv | https://hdl.handle.net/20.500.12390/488 |
| dc.identifier.scopus.none.fl_str_mv | 2-s2.0-85040614941 |
| url | https://hdl.handle.net/20.500.12390/488 |
| identifier_str_mv | 2-s2.0-85040614941 |
| dc.language.iso.none.fl_str_mv | eng |
| language | eng |
| dc.relation.ispartof.none.fl_str_mv | CEUR Workshop Proceedings |
| dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess |
| dc.rights.uri.none.fl_str_mv | https://creativecommons.org/licenses/by-nc-nd/4.0/ |
| eu_rights_str_mv | openAccess |
| rights_invalid_str_mv | https://creativecommons.org/licenses/by-nc-nd/4.0/ |
| dc.publisher.none.fl_str_mv | CEUR-WS |
| publisher.none.fl_str_mv | CEUR-WS |
| dc.source.none.fl_str_mv | reponame:CONCYTEC-Institucional instname:Consejo Nacional de Ciencia Tecnología e Innovación instacron:CONCYTEC |
| instname_str | Consejo Nacional de Ciencia Tecnología e Innovación |
| instacron_str | CONCYTEC |
| institution | CONCYTEC |
| reponame_str | CONCYTEC-Institucional |
| collection | CONCYTEC-Institucional |
| repository.name.fl_str_mv | Repositorio Institucional CONCYTEC |
| repository.mail.fl_str_mv | repositorio@concytec.gob.pe |
| _version_ | 1844882999807574016 |
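The abstract describes fitting a supervised classifier on features such as n-grams and reporting average precision. The record does not say which classifier, which n-gram orders, or which averaging scheme the authors used, so the following is only a minimal sketch under stated assumptions: character 1- to 3-grams, a multinomial Naive Bayes classifier with add-one smoothing (a common language-identification baseline swapped in here for illustration, not necessarily the authors' model), and "average precision" read as the macro-average of per-language precision. The names `char_ngrams`, `NGramNaiveBayes`, `macro_precision`, and the `lang_a`/`lang_b` toy data are hypothetical and invented for this example.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=3):
    """Count character n-grams of orders n_min..n_max.

    The text is lower-cased and padded with one space on each side so that
    word-initial and word-final n-grams are distinguishable.
    """
    padded = f" {text.lower()} "
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts


class NGramNaiveBayes:
    """Multinomial Naive Bayes over character n-gram counts, add-one smoothing.

    Assumed baseline for illustration only; not necessarily the paper's model.
    """

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)         # documents per language
        self.feature_counts = defaultdict(Counter)  # n-gram counts per language
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        # Total n-gram tokens per language, used as the likelihood denominator.
        self.totals = {c: sum(fc.values()) for c, fc in self.feature_counts.items()}
        return self

    def predict(self, text):
        grams = char_ngrams(text)
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each known n-gram.
            score = math.log(doc_count / n_docs)
            fc, total = self.feature_counts[label], self.totals[label]
            for gram, k in grams.items():
                if gram in self.vocab:  # skip n-grams never seen in training
                    score += k * math.log((fc[gram] + 1) / (total + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def macro_precision(y_true, y_pred):
    """Unweighted mean of per-class precision TP / (TP + FP).

    Classes that are never predicted contribute 0.0 (an assumed convention).
    """
    classes = sorted(set(y_true) | set(y_pred))
    per_class = []
    for c in classes:
        predicted = sum(1 for p in y_pred if p == c)
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        per_class.append(tp / predicted if predicted else 0.0)
    return sum(per_class) / len(per_class)


# Synthetic toy data: invented strings and labels, NOT from the paper's corpus.
TRAIN_TEXTS = ["kaka kuna", "kana kuka", "akun kana",
               "tsiri tsi", "itsi siri", "tsiti riri"]
TRAIN_LABELS = ["lang_a"] * 3 + ["lang_b"] * 3

if __name__ == "__main__":
    model = NGramNaiveBayes().fit(TRAIN_TEXTS, TRAIN_LABELS)
    gold = ["lang_a", "lang_b"]
    preds = [model.predict(t) for t in ["kuna kaka", "tsiri siri"]]
    print(preds, macro_precision(gold, preds))
```

With a real corpus of 16 languages, the same pipeline could be expressed with scikit-learn's `CountVectorizer(analyzer="char", ngram_range=(1, 3))`, `MultinomialNB`, and `precision_score(..., average="macro")`; the stdlib version above only keeps the example self-contained.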
Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository where this document or dataset is held. CONCYTEC is not responsible for the content (publications and/or data) accessible through the Open Access National Digital Repository of Science, Technology and Innovation (ALICIA).