Classification of Organisms into Kingdoms using DNA Codon Frequency
Article Description
This study uses machine learning classifiers to predict the kingdom to which an organism belongs from its DNA codon usage frequencies. The study used 13,028 records of GenBank organisms distributed across eleven kingdoms and reduced them to six kingdoms (archaea, bacteria, invertebrates, plants,...
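As an illustration of the feature this abstract relies on, the sketch below computes relative codon frequencies from a coding sequence. The record does not say how the frequencies in the study's dataset were obtained, so the function name `codon_frequencies`, the in-frame reading from position 0, and the handling of ambiguous symbols are all assumptions made for the example.

```python
from collections import Counter
from itertools import product
from typing import Dict

# All 64 codons in a fixed order, so every sequence yields the same keys.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)


def codon_frequencies(cds: str) -> Dict[str, float]:
    """Return the relative frequency of each of the 64 codons.

    Assumptions (not taken from the article): the sequence is read in
    frame starting at position 0, RNA 'U' is treated as 'T', and a
    trailing partial codon or a codon with non-ACGT symbols is skipped.
    """
    seq = cds.upper().replace("U", "T")
    # Split into consecutive, non-overlapping triplets.
    triplets = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    counts = Counter(t for t in triplets if t in CODON_SET)
    total = sum(counts.values())
    if total == 0:
        return {codon: 0.0 for codon in CODONS}
    return {codon: counts[codon] / total for codon in CODONS}


if __name__ == "__main__":
    freqs = codon_frequencies("ATGGCCATGTAA")
    # Show only the codons that actually occur in this toy sequence.
    print({codon: f for codon, f in freqs.items() if f > 0})
```

Each organism then becomes a 64-value vector that a classifier can consume; a real pipeline would aggregate counts over all coding sequences of a genome rather than a single toy string.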
| Author: | Palma Ttito, Luis Beltrán |
|---|---|
| Format: | article |
| Publication date: | 2022 |
| Institution: | Universidad de Lima |
| Repository: | Revistas - Universidad de Lima |
| Language: | Spanish |
| OAI Identifier: | oai:ojs.pkp.sfu.ca:article/5896 |
| Resource link: | https://revistas.ulima.edu.pe/index.php/Interfases/article/view/5896 |
| Access level: | open access |
| Subject: | machine learning; ensembles; DNA codon frequency; kingdom |
| id | REVULIMA_e46bfe66cc9b329f2bcb6d8bd4230ff9 |
|---|---|
| oai_identifier_str | oai:ojs.pkp.sfu.ca:article/5896 |
| network_acronym_str | REVULIMA |
| network_name_str | Revistas - Universidad de Lima |
| repository_id_str | |
| dc.title.none.fl_str_mv | Classification of Organisms into Kingdoms using DNA Codon Frequency; Clasificación de Organismos en Reinos utilizando Frecuencia de Codones de ADN |
| title | Classification of Organisms into Kingdoms using DNA Codon Frequency |
| spellingShingle | Classification of Organisms into Kingdoms using DNA Codon Frequency; Palma Ttito, Luis Beltrán; machine learning; ensembles; DNA codon frequency; kingdom |
| title_short | Classification of Organisms into Kingdoms using DNA Codon Frequency |
| title_full | Classification of Organisms into Kingdoms using DNA Codon Frequency |
| title_fullStr | Classification of Organisms into Kingdoms using DNA Codon Frequency |
| title_full_unstemmed | Classification of Organisms into Kingdoms using DNA Codon Frequency |
| title_sort | Classification of Organisms into Kingdoms using DNA Codon Frequency |
| dc.creator.none.fl_str_mv | Palma Ttito, Luis Beltrán |
| author | Palma Ttito, Luis Beltrán |
| author_facet | Palma Ttito, Luis Beltrán |
| author_role | author |
| dc.subject.none.fl_str_mv | machine learning; ensembles; DNA codon frequency; kingdom |
| topic | machine learning; ensembles; DNA codon frequency; kingdom |
| description | This study uses machine learning classifiers to predict the kingdom to which an organism belongs from its DNA codon usage frequencies. It started from 13,028 records of GenBank organisms distributed across eleven kingdoms, which were regrouped into six kingdoms (archaea, bacteria, invertebrates, plants, viruses, and vertebrates) comprising 9,027 records. The process involved removing irrelevant attributes, evaluating the classifiers with accuracy, precision, sensitivity, and score metrics, and tuning the models' hyperparameters. The classification algorithms were voting, bagging, boosting, and stacking, using KNN, AD, MLP, SVC, and RF. Random forest was used for attribute selection. In this study, the stacking ensemble, with its models, gave the best classification of organisms. |
| publishDate | 2022 |
| dc.date.none.fl_str_mv | 2022-07-29 |
| dc.type.none.fl_str_mv | info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion |
| format | article |
| status_str | publishedVersion |
| dc.identifier.none.fl_str_mv | https://revistas.ulima.edu.pe/index.php/Interfases/article/view/5896; 10.26439/interfases2022.n015.5896 |
| url | https://revistas.ulima.edu.pe/index.php/Interfases/article/view/5896 |
| identifier_str_mv | 10.26439/interfases2022.n015.5896 |
| dc.language.none.fl_str_mv | spa |
| language | spa |
| dc.relation.none.fl_str_mv | https://revistas.ulima.edu.pe/index.php/Interfases/article/view/5896/5790; https://revistas.ulima.edu.pe/index.php/Interfases/article/view/5896/5798 |
| dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess |
| eu_rights_str_mv | openAccess |
| dc.format.none.fl_str_mv | application/pdf; text/html |
| dc.publisher.none.fl_str_mv | Universidad de Lima |
| publisher.none.fl_str_mv | Universidad de Lima |
| dc.source.none.fl_str_mv | Interfases; No. 015 (2022); 131-143; 1993-4912; 10.26439/interfases2022.n015; reponame:Revistas - Universidad de Lima; instname:Universidad de Lima; instacron:ULIMA |
| instname_str | Universidad de Lima |
| instacron_str | ULIMA |
| institution | ULIMA |
| reponame_str | Revistas - Universidad de Lima |
| collection | Revistas - Universidad de Lima |
| repository.name.fl_str_mv | |
| repository.mail.fl_str_mv | |
| _version_ | 1849962721376731136 |
| score | 13.090503 |
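The description field in this record names accuracy, precision, sensitivity, and a score as the metrics used to compare the classifiers. As a rough sketch of how such metrics are computed for a multi-class problem like the six kingdoms, the helper below derives them from true and predicted labels. The function name `macro_metrics`, the choice of macro averaging, and the reading of "score" as the F1 score are assumptions for illustration; the record does not state which averaging or score the author used.

```python
from collections import Counter
from typing import Dict, Sequence


def macro_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, sensitivity (recall) and F1.

    Macro averaging (an assumption here) gives every class the same
    weight regardless of how many samples it has.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def ratio(num: float, den: float) -> float:
        # Convention: an undefined ratio (zero denominator) counts as 0.
        return num / den if den else 0.0

    precisions = [ratio(tp[c], tp[c] + fp[c]) for c in labels]
    recalls = [ratio(tp[c], tp[c] + fn[c]) for c in labels]
    f1s = [ratio(2 * p * r, p + r) for p, r in zip(precisions, recalls)]

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "sensitivity": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    truth = ["bacteria", "bacteria", "virus", "plant"]
    predicted = ["bacteria", "virus", "virus", "plant"]
    print(macro_metrics(truth, predicted))
```

With imbalanced classes, accuracy alone can hide poor performance on small classes, which is one reason to report per-class precision and sensitivity alongside it.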
Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository where this document or dataset is held. CONCYTEC is not responsible for the content (publications and/or data) accessible through the Repositorio Nacional Digital de Ciencia, Tecnología e Innovación de Acceso Abierto (ALICIA), the National Open Access Digital Repository of Science, Technology and Innovation.