Classification of Organisms into Kingdoms using DNA Codon Frequency

Descripción del Articulo

This study aims to use machine learning classifiers to predict the kingdom to which an organism belongs by the frequency of use of DNA codons. The study used 13,028 data from GenBank organisms distributed in eleven kingdoms and reduced them to six kingdoms (archaea, bacteria, invertebrates, plants,...

Descripción completa

Detalles Bibliográficos
Autor: Palma Ttito, Luis Beltrán
Formato: artículo
Fecha de Publicación:2022
Institución:Universidad de Lima
Repositorio:Revistas - Universidad de Lima
Lenguaje:español
OAI Identifier:oai:ojs.pkp.sfu.ca:article/5896
Enlace del recurso:https://revistas.ulima.edu.pe/index.php/Interfases/article/view/5896
Nivel de acceso:acceso abierto
Materia:machine learning
Ensembles
DNA codon frequency
kingdom
ensambles
frecuencia de codones ADN
reino
Descripción
Sumario:This study aims to use machine learning classifiers to predict the kingdom to which an organism belongs by the frequency of use of DNA codons. The study used 13,028 data from GenBank organisms distributed in eleven kingdoms and reduced them to six kingdoms (archaea, bacteria, invertebrates, plants, viruses, and vertebrates) with 9,027 regrouped data. The process required cleaning irrelevant attributes, using measurement metrics of accuracy, precision, sensitivity, and score classifiers, and the adjustment of hyperparameters of the models. The classification algorithms were voting, bagging, boosting, and stacking, using KNN, AD, MLP, SVC, and RF. Random forest was used in selecting the attributes. The stacking ensemble, with its models, better predicts the classification of organisms in the present study.
Nota importante:
La información contenida en este registro es de entera responsabilidad de la institución que gestiona el repositorio institucional donde esta contenido este documento o set de datos. El CONCYTEC no se hace responsable por los contenidos (publicaciones y/o datos) accesibles a través del Repositorio Nacional Digital de Ciencia, Tecnología e Innovación de Acceso Abierto (ALICIA).