Data mining of DNA sequences submitted by Peruvian institutions to public genetic databases

Bibliographic details
Authors: Romero, Pedro Eduardo; Castillo-Vilcahuaman, Camila
Format: article
Publication date: 2021
Institution: Universidad Nacional Mayor de San Marcos
Repository: Revista UNMSM - Revista Peruana de Biología
Language: English
OAI Identifier: oai:ojs.csi.unmsm:article/17867
Resource link: https://revistasinvestigacion.unmsm.edu.pe/index.php/rpb/article/view/17867
Access level: open access
Subject: Genetic diversity
public databases
biodiversity
Peru
data mining
Description
Summary: Genetic diversity is an important component of biodiversity, and it is crucial for current efforts to protect and sustainably manage several organisms and habitats. As far as we know, only one previous work describes Peruvian genetic information stored in public databases. We aimed to update that work by searching four public databases that store digital sequence information: Nucleotide, BioProject, PATRIC, and BOLD. With this information, we comment on the contribution of Peruvian institutions during recent years. In Nucleotide, the largest database, Bacteria are the organisms most sequenced by Peruvian institutions (70.60%); pathogenic bacteria such as Pasteurella multocida, Neisseria meningitidis, and Vibrio parahaemolyticus were the most abundant. We found no sequence records from the Archaea domain. In BioProject, the most common sequence belongs to Salmonella enterica subsp. enterica serovar Infantis. In PATRIC, a database of pathogenic agents, Mycobacterium tuberculosis and Yersinia pestis had the highest number of entries. Finally, in BOLD, an exclusively eukaryotic database, Chordata (Aves and Actinopterygii), Angiospermae, and Arthropoda (Insecta and Arachnida) were the most frequent records. Our results suggest the research preferences of Peruvian institutions, which focus on infectious diseases and some eukaryotic phyla. Although there has been a significant increase in DNA information submitted by Peruvian institutions since the last report, the genetic diversity reflected in these databases remains inconsistent with the diversity in the country. More effort is needed to obtain genetic information from underestimated taxonomic groups and to promote genetic research in regional Peruvian institutions.
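The summary reports taxon shares (for example, 70.60% Bacteria in Nucleotide) obtained by searching public databases. The following is a minimal sketch of how such a tally might be set up, not the authors' actual procedure. The endpoint is NCBI's public E-utilities ESearch service; the query term `Peru[All Fields]`, the helper names `build_esearch_url` and `taxon_shares`, and the example labels are all assumptions made for illustration.

```python
from collections import Counter
from urllib.parse import urlencode

# NCBI E-utilities search endpoint. The query term used further down is an
# assumption; the record does not state the authors' search strategy.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="nucleotide", retmax=0):
    """Return an ESearch URL asking NCBI for records in `db` matching `term`."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def taxon_shares(labels):
    """Map each taxon label to its percentage of all records, largest first."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: round(100 * n / total, 2) for taxon, n in counts.most_common()}


# Hypothetical query: records whose text fields mention Peru.
url = build_esearch_url("Peru[All Fields]")

# Made-up labels for illustration only; these are not data from the study.
example_labels = ["Bacteria"] * 7 + ["Eukaryota"] * 2 + ["Viruses"]
shares = taxon_shares(example_labels)
```

The sketch only builds the request URL and does not contact NCBI, so it runs offline. In practice the taxon labels would come from the downloaded records, and the same `taxon_shares` helper could be applied per database.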
Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository where this document or data set is held. CONCYTEC is not responsible for the content (publications and/or data) accessible through the Repositorio Nacional Digital de Ciencia, Tecnología e Innovación de Acceso Abierto (ALICIA).