Classification of Organisms into Kingdoms using DNA Codon Frequency

Article Description

This study uses machine learning classifiers to predict the kingdom to which an organism belongs from its DNA codon usage frequencies. The study took 13,028 GenBank organism records distributed across eleven kingdoms and regrouped them into six kingdoms (archaea, bacteria, invertebrates, plants,...
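
To make the input features concrete, here is a minimal sketch of how codon usage frequencies could be computed from a coding sequence. It is not the author's preprocessing: the function name `codon_frequencies`, the DNA alphabet (A, C, G, T), and the rule for skipping incomplete or ambiguous codons are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# All 64 DNA codons in a fixed order, so every organism yields the same
# feature layout (assumed alphabet: A, C, G, T).
CODONS = ["".join(bases) for bases in product("ACGT", repeat=3)]
CODON_SET = frozenset(CODONS)


def codon_frequencies(sequence):
    """Return the relative frequency of each of the 64 codons.

    Hypothetical helper: assumes the sequence starts in frame. Trailing
    bases that do not fill a codon, and triplets containing characters
    other than A, C, G, T, are ignored.
    """
    seq = sequence.upper()
    usable = len(seq) - len(seq) % 3  # drop an incomplete final codon
    triplets = (seq[i:i + 3] for i in range(0, usable, 3))
    counts = Counter(t for t in triplets if t in CODON_SET)
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in CODONS}


# Usage: ATG, AAA, ATG are counted; the trailing "TT" is dropped.
freqs = codon_frequencies("ATGAAAATGTT")
```

Each organism then becomes a 64-value vector, which is the kind of input a kingdom classifier could be trained on.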

Bibliographic Details
Author: Palma Ttito, Luis Beltrán
Format: article
Publication Date: 2022
Institution: Universidad de Lima
Repository: Revistas - Universidad de Lima
Language: Spanish
OAI Identifier: oai:ojs.pkp.sfu.ca:article/5896
Resource Link: https://revistas.ulima.edu.pe/index.php/Interfases/article/view/5896
Access Level: open access
Subject: machine learning
Ensembles
DNA codon frequency
kingdom
id REVULIMA_e46bfe66cc9b329f2bcb6d8bd4230ff9
oai_identifier_str oai:ojs.pkp.sfu.ca:article/5896
network_acronym_str REVULIMA
network_name_str Revistas - Universidad de Lima
repository_id_str
dc.title.none.fl_str_mv Classification of Organisms into Kingdoms using DNA Codon Frequency
Clasificación de Organismos en Reinos utilizando Frecuencia de Codones de ADN
title Classification of Organisms into Kingdoms using DNA Codon Frequency
dc.creator.none.fl_str_mv Palma Ttito, Luis Beltrán
author Palma Ttito, Luis Beltrán
author_role author
dc.subject.none.fl_str_mv machine learning
Ensembles
DNA codon frequency
kingdom
description This study uses machine learning classifiers to predict the kingdom to which an organism belongs from its DNA codon usage frequencies. The study took 13,028 GenBank organism records distributed across eleven kingdoms and regrouped them into six kingdoms (archaea, bacteria, invertebrates, plants, viruses, and vertebrates), leaving 9,027 records. The process required removing irrelevant attributes, evaluating the classifiers with accuracy, precision, sensitivity, and score metrics, and tuning the models' hyperparameters. The ensemble methods were voting, bagging, boosting, and stacking, built on KNN, AD (decision tree), MLP, SVC, and RF models. Random forest was used for attribute selection. In this study, the stacking ensemble, with its models, predicted the kingdom of organisms better than the other ensembles.
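
The evaluation metrics named in the description can be sketched as follows for a multi-class problem such as kingdom prediction. This is an illustration only, not the study's evaluation code: the function name `macro_metrics` is hypothetical, macro-averaging across classes is an assumption, and reading the unspecified "score" metric as the F1 score is also an assumption.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, sensitivity (recall), and F1.

    Hypothetical helper; macro-averaging and the F1 reading of "score"
    are assumptions, not details taken from the record.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def ratio(num, den):
        # Define 0/0 as 0.0 so classes that never appear do not crash.
        return num / den if den else 0.0

    precision = {k: ratio(tp[k], tp[k] + fp[k]) for k in labels}
    recall = {k: ratio(tp[k], tp[k] + fn[k]) for k in labels}
    f1 = {
        k: ratio(2 * precision[k] * recall[k], precision[k] + recall[k])
        for k in labels
    }
    n = len(labels)
    return {
        "accuracy": ratio(sum(tp.values()), len(y_true)),
        "precision": ratio(sum(precision.values()), n),
        "sensitivity": ratio(sum(recall.values()), n),
        "f1": ratio(sum(f1.values()), n),
    }


# Usage with made-up labels (not data from the study):
metrics = macro_metrics(
    ["bacteria", "bacteria", "virus", "plant"],
    ["bacteria", "virus", "virus", "plant"],
)
```

Macro-averaging weights every kingdom equally, which matters when classes are unbalanced; a weighted average would be an equally reasonable choice.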
publishDate 2022
dc.date.none.fl_str_mv 2022-07-29
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv https://revistas.ulima.edu.pe/index.php/Interfases/article/view/5896
10.26439/interfases2022.n015.5896
url https://revistas.ulima.edu.pe/index.php/Interfases/article/view/5896
identifier_str_mv 10.26439/interfases2022.n015.5896
dc.language.none.fl_str_mv spa
language spa
dc.relation.none.fl_str_mv https://revistas.ulima.edu.pe/index.php/Interfases/article/view/5896/5790
https://revistas.ulima.edu.pe/index.php/Interfases/article/view/5896/5798
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
text/html
dc.publisher.none.fl_str_mv Universidad de Lima
publisher.none.fl_str_mv Universidad de Lima
dc.source.none.fl_str_mv Interfases; No. 015 (2022); 131-143
1993-4912
10.26439/interfases2022.n015
reponame:Revistas - Universidad de Lima
instname:Universidad de Lima
instacron:ULIMA
instname_str Universidad de Lima
instacron_str ULIMA
institution ULIMA
reponame_str Revistas - Universidad de Lima
collection Revistas - Universidad de Lima
repository.name.fl_str_mv
repository.mail.fl_str_mv
Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository where this document or data set is held. CONCYTEC is not responsible for the content (publications and/or data) accessible through the Repositorio Nacional Digital de Ciencia, Tecnología e Innovación de Acceso Abierto (ALICIA).