WordNet-SHP: Towards the building of a lexical database for a Peruvian minority language

Article Description

WordNet-like resources are lexical databases containing highly relevant information that can be exploited in more complex computational linguistics research and applications. Building them requires both manual and automatic tasks, which can be more arduous when the language is a minority one...


Bibliographic Details
Authors: Maguiño-Valencia D., Oncevay-Marcos A., Sobrevilla Cabezudo M.A.
Format: conference object
Publication date: 2019
Institution: Consejo Nacional de Ciencia Tecnología e Innovación
Repository: CONCYTEC-Institucional
Language: English
OAI Identifier: oai:repositorio.concytec.gob.pe:20.500.12390/819
Resource link: https://hdl.handle.net/20.500.12390/819
Access level: open access
Subject: Wordnet
Computational linguistics
Database systems
Natural language processing systems
Ships
Bilingual dictionary
Digital resources
Lexical database
Machine translations
Minority languages
Research and application
Word Sense Disambiguation
Ontology
https://purl.org/pe-repo/ocde/ford#6.02.06
id CONC_8e5176666738abfc24e2c0d247c4afce
oai_identifier_str oai:repositorio.concytec.gob.pe:20.500.12390/819
network_acronym_str CONC
network_name_str CONCYTEC-Institucional
repository_id_str 4689
dc.title.none.fl_str_mv WordNet-SHP: Towards the building of a lexical database for a Peruvian minority language
dc.contributor.author.fl_str_mv Maguiño-Valencia D.
Oncevay-Marcos A.
Sobrevilla Cabezudo M.A.
dc.subject.none.fl_str_mv Wordnet
dc.subject.ocde.none.fl_str_mv https://purl.org/pe-repo/ocde/ford#6.02.06
description WordNet-like resources are lexical databases containing highly relevant information that can be exploited in more complex computational linguistics research and applications. Building them requires both manual and automatic tasks, which can be more arduous when the language is a minority one with fewer digital resources. This study focuses on the construction of an initial WordNet database for a low-resource indigenous language of Peru: Shipibo-Konibo (shp). First, the stages of development from a resource-scarce starting point (a bilingual shp-es dictionary) are described. Then, a synset alignment method is proposed that compares the definition glosses in the dictionary (written in Spanish) with the content of a Spanish WordNet, using word2vec similarity as the proximity measure. Finally, the synsets are evaluated against a manually annotated Gold Standard in Shipibo-Konibo. The results obtained are promising, and this resource is expected to serve well in further applications, such as word sense disambiguation and even machine translation for the shp-es language pair.
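The alignment idea in the description (matching a Spanish dictionary gloss to the closest Spanish WordNet synset by embedding similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not say how glosses are turned into vectors, so averaging word vectors and taking cosine similarity is an assumption here. The names `TOY_VECTORS`, `CANDIDATES`, `gloss_vector`, and `align`, the synset identifiers, and all numeric values are hypothetical stand-ins for a trained Spanish word2vec model and real WordNet data.

```python
import math

# Hypothetical toy embeddings standing in for a trained Spanish word2vec model.
TOY_VECTORS = {
    "perro": [0.9, 0.1, 0.0],
    "animal": [0.8, 0.2, 0.1],
    "doméstico": [0.7, 0.3, 0.0],
    "canino": [0.85, 0.15, 0.05],
    "río": [0.0, 0.9, 0.2],
    "agua": [0.1, 0.8, 0.3],
    "corriente": [0.05, 0.85, 0.25],
}


def gloss_vector(gloss, vectors):
    """Average the vectors of in-vocabulary tokens; None if no token is known."""
    known = [vectors[tok] for tok in gloss.lower().split() if tok in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def align(dictionary_gloss, candidate_synsets, vectors):
    """Return (synset_id, score) for the most similar candidate gloss, or None."""
    query = gloss_vector(dictionary_gloss, vectors)
    if query is None:
        return None
    best = None
    for synset_id, synset_gloss in candidate_synsets.items():
        candidate = gloss_vector(synset_gloss, vectors)
        if candidate is None:
            continue
        score = cosine(query, candidate)
        if best is None or score > best[1]:
            best = (synset_id, score)
    return best


# Hypothetical candidate synsets with Spanish glosses.
CANDIDATES = {
    "syn-dog": "animal canino doméstico",
    "syn-river": "corriente de agua",
}

result = align("perro animal doméstico", CANDIDATES, TOY_VECTORS)
print(result)
```

With real data, the toy dictionary would be replaced by vectors from a trained model, and the candidate set would presumably be restricted to synsets plausibly related to the dictionary entry; both choices are outside what this record specifies.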
publishDate 2019
dc.date.accessioned.none.fl_str_mv 2024-05-30T23:13:38Z
dc.date.available.none.fl_str_mv 2024-05-30T23:13:38Z
dc.date.issued.fl_str_mv 2019
dc.type.none.fl_str_mv info:eu-repo/semantics/conferenceObject
format conferenceObject
dc.identifier.isbn.none.fl_str_mv urn:isbn:9791095546009
dc.identifier.uri.none.fl_str_mv https://hdl.handle.net/20.500.12390/819
dc.identifier.scopus.none.fl_str_mv 2-s2.0-85059915834
dc.language.iso.none.fl_str_mv eng
language eng
dc.relation.ispartof.none.fl_str_mv LREC 2018 - 11th International Conference on Language Resources and Evaluation
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv European Language Resources Association (ELRA)
publisher.none.fl_str_mv European Language Resources Association (ELRA)
dc.source.none.fl_str_mv reponame:CONCYTEC-Institucional
instname:Consejo Nacional de Ciencia Tecnología e Innovación
instacron:CONCYTEC
instname_str Consejo Nacional de Ciencia Tecnología e Innovación
instacron_str CONCYTEC
institution CONCYTEC
reponame_str CONCYTEC-Institucional
collection CONCYTEC-Institucional
repository.name.fl_str_mv Repositorio Institucional CONCYTEC
repository.mail.fl_str_mv repositorio@concytec.gob.pe
Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository where this document or dataset is held. CONCYTEC is not responsible for the contents (publications and/or data) accessible through the Repositorio Nacional Digital de Ciencia, Tecnología e Innovación de Acceso Abierto (ALICIA).