1
doctoral thesis
Published 2023
Breast cancer accounts for approximately 2.26 million new cases worldwide each year, according to the World Health Organization. Diagnosing the disease at an early stage is important because it allows treatment that eliminates and/or alleviates its consequences. Offering a wider range of techniques for detecting breast cancer gives patients more diagnostic options and can help reduce costs. It is therefore necessary to determine which heterogeneous machine learning ensembles best predict breast cancer from microarray gene expression data. In this research, four ensembles of heterogeneous algorithms were designed and implemented (voting, bagging, boosting, and stacking) and trained on a dataset of 4,113 miRNA samples, ca...
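Of the four ensembles named above, voting is the simplest to illustrate. The sketch below shows hard (majority) voting over heterogeneous base classifiers in plain Python. It is a minimal sketch, not the thesis's implementation: the three threshold rules, the two-feature input, and the labels "tumor"/"normal" are hypothetical stand-ins for trained models such as KNN or SVM applied to miRNA expression profiles.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A base classifier is any function mapping a feature vector to a class label.
Classifier = Callable[[Sequence[float]], str]


def hard_vote(classifiers: List[Classifier], x: Sequence[float]) -> str:
    """Return the label predicted by the most base classifiers.

    Ties are broken in favor of the tied label whose first vote came earliest
    in classifier order.
    """
    if not classifiers:
        raise ValueError("at least one classifier is required")
    votes = [clf(x) for clf in classifiers]
    counts = Counter(votes)
    best = max(counts.values())
    # Walk the votes in order so tie-breaking is deterministic.
    return next(label for label in votes if counts[label] == best)


# Hypothetical base classifiers: toy threshold rules on two expression features,
# standing in for heterogeneous trained models (e.g. KNN, SVM, MLP).
def rule_a(x: Sequence[float]) -> str:
    return "tumor" if x[0] > 5.0 else "normal"


def rule_b(x: Sequence[float]) -> str:
    return "tumor" if x[1] > 3.0 else "normal"


def rule_c(x: Sequence[float]) -> str:
    return "tumor" if x[0] + x[1] > 9.0 else "normal"


ensemble = [rule_a, rule_b, rule_c]
print(hard_vote(ensemble, [6.0, 2.0]))  # → normal (1 vote tumor, 2 votes normal)
print(hard_vote(ensemble, [6.0, 3.5]))  # → tumor (all three vote tumor)
```

In practice, scikit-learn's `VotingClassifier` implements the same idea (with `voting="hard"` or `voting="soft"`). Bagging, boosting, and stacking differ mainly in how the base models are trained and how their outputs are combined; stacking, for example, replaces the fixed majority rule with a trained meta-model.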
2
article
This study uses machine learning classifiers to predict the kingdom to which an organism belongs from the frequency with which it uses DNA codons. The study used 13,028 records of GenBank organisms spanning eleven kingdoms, which were regrouped into six kingdoms (archaea, bacteria, invertebrates, plants, viruses, and vertebrates) comprising 9,027 records. The process involved removing irrelevant attributes, evaluating the classifiers with accuracy, precision, sensitivity (recall), and F1 score, and tuning the models' hyperparameters. The ensemble methods were voting, bagging, boosting, and stacking, built from KNN, DT (decision tree), MLP, SVC, and RF base classifiers. Random forest was also used for attribute selection. Of the ensembles tested, stacking gave the best predictions of organism kingdom in this study.
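The features in this study are codon usage frequencies. As a minimal sketch, assuming a coding DNA sequence read in frame 0, the relative frequency of each of the 64 codons can be computed as below. The function name `codon_usage` and the handling of ambiguous bases are illustrative choices, not taken from the study, whose frequencies may have been supplied precomputed.

```python
from collections import Counter
from itertools import product
from typing import Dict

# The 64 DNA codons in a fixed order, usable as feature columns.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)


def codon_usage(seq: str) -> Dict[str, float]:
    """Relative frequency of each codon in `seq`, reading in frame 0.

    A trailing partial codon is ignored, and codons containing symbols other
    than A, C, G, T (e.g. N) are skipped and do not count toward the total.
    """
    seq = seq.upper()
    # Non-overlapping triplets starting at position 0.
    triplets = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    counts = Counter(t for t in triplets if t in CODON_SET)
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in CODONS}


freqs = codon_usage("ATGGCCATGTAA")
print(freqs["ATG"], freqs["GCC"], freqs["TAA"])  # → 0.5 0.25 0.25
```

Each organism then becomes a 64-dimensional vector, which is the kind of input the KNN, decision tree, MLP, SVC, and random forest base models would receive.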