Ideal step size estimation for the multinomial logistic regression

Article Description

At the core of deep learning optimization problems reside algorithms such as the Stochastic Gradient Descent (SGD), which employs a subset of the data per iteration to estimate the gradient in order to minimize a cost function. Adaptive algorithms, based on SGD, are well known for being effective in...


Bibliographic Details
Author: Ramirez Orihuela, Gabriel
Format: master's thesis
Publication date: 2024
Institution: Pontificia Universidad Católica del Perú
Repository: PUCP-Tesis
Language: English
OAI Identifier: oai:tesis.pucp.edu.pe:20.500.12404/29791
Resource link: http://hdl.handle.net/20.500.12404/29791
Access level: open access
Subject: Machine learning (Artificial intelligence)
Deep learning (Machine learning)
Mathematical optimization
Regression analysis
https://purl.org/pe-repo/ocde/ford#2.00.00
id PUCP_3204342643e33b1e9bf5956ced6d9efd
oai_identifier_str oai:tesis.pucp.edu.pe:20.500.12404/29791
network_acronym_str PUCP
network_name_str PUCP-Tesis
repository_id_str .
dc.title.none.fl_str_mv Ideal step size estimation for the multinomial logistic regression
dc.contributor.advisor.fl_str_mv Rodríguez Valderrama, Paul Antonio
dc.contributor.author.fl_str_mv Ramirez Orihuela, Gabriel
dc.subject.none.fl_str_mv Machine learning (Artificial intelligence)
Deep learning (Machine learning)
Mathematical optimization
Regression analysis
dc.subject.ocde.none.fl_str_mv https://purl.org/pe-repo/ocde/ford#2.00.00
description At the core of deep learning optimization problems reside algorithms such as Stochastic Gradient Descent (SGD), which employs a subset of the data per iteration to estimate the gradient in order to minimize a cost function. Adaptive algorithms, based on SGD, are well known for being effective in using gradient information from past iterations, generating momentum or memory that enables a more accurate prediction of the true gradient slope in future iterations, thus accelerating convergence. Nevertheless, these algorithms still need an initial (scalar) learning rate (LR) as well as an LR scheduler. In this work we propose a new SGD algorithm that estimates the initial (scalar) LR via an adaptation of the ideal Cauchy step size for the multinomial logistic regression; furthermore, the LR is recursively updated up to a given number of epochs, after which a decaying LR scheduler is used. The proposed method is assessed for several well-known multiclass classification architectures and compares favorably against other well-tuned (scalar and spatially) adaptive alternatives, including the Adam algorithm.
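The abstract does not state the step-size formula, so the following is only a minimal sketch of the general idea and not the thesis's algorithm. It assumes the classical Cauchy step alpha = <G, G> / <G, H G>, applied to the local quadratic model of the mean softmax cross-entropy, on the full batch rather than on mini-batches. All function names, and the parameters `estimate_steps`, `decay` and `fallback`, are hypothetical and chosen for illustration.

```python
import math

def softmax(z):
    """Numerically stable softmax of a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def matvec(M, x):
    """Product of a K x D matrix (list of rows) with a length-D vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def loss_and_grad(W, X, y):
    """Mean cross-entropy, its gradient w.r.t. W, and per-sample probabilities."""
    n, k, d = len(X), len(W), len(X[0])
    G = [[0.0] * d for _ in range(k)]
    loss, probs = 0.0, []
    for x, label in zip(X, y):
        p = softmax(matvec(W, x))
        probs.append(p)
        loss -= math.log(p[label]) / n
        for c in range(k):
            r = (p[c] - (1.0 if c == label else 0.0)) / n
            for j in range(d):
                G[c][j] += r * x[j]
    return loss, G, probs

def cauchy_step(G, X, probs, fallback=1.0):
    """alpha = <G, G> / <G, H G>, with H the Hessian of the mean cross-entropy.

    For softmax regression, <G, H G> equals the mean over samples of the
    variance of u = G x under that sample's class probabilities.
    """
    gg = sum(v * v for row in G for v in row)
    curv = 0.0
    for x, p in zip(X, probs):
        u = matvec(G, x)
        mean = sum(pc * uc for pc, uc in zip(p, u))
        curv += sum(pc * (uc - mean) ** 2 for pc, uc in zip(p, u))
    curv /= len(X)
    # Assumption: use a fixed fallback step when the curvature vanishes.
    return gg / curv if curv > 0.0 else fallback

def train(X, y, num_classes, steps=6, estimate_steps=3, decay=0.5):
    """Gradient descent: re-estimated Cauchy-style step, then geometric decay."""
    W = [[0.0] * len(X[0]) for _ in range(num_classes)]
    alpha, history = 0.0, []
    for t in range(steps):
        loss, G, probs = loss_and_grad(W, X, y)
        history.append(loss)
        if t < estimate_steps:
            alpha = cauchy_step(G, X, probs)  # re-estimate the step size
        else:
            alpha *= decay                    # hypothetical decaying schedule
        W = [[w - alpha * g for w, g in zip(rw, rg)] for rw, rg in zip(W, G)]
    return W, history

# Toy data: three one-hot samples, one per class.
X_toy = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y_toy = [0, 1, 2]
W_fit, losses = train(X_toy, y_toy, num_classes=3)
print(losses)
```

The two phases in `train` mirror the structure the abstract describes (a step size that is re-estimated for a while, followed by a decaying schedule), but the switch point and the geometric decay used here are placeholders rather than values taken from the thesis.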
publishDate 2024
dc.date.created.none.fl_str_mv 2024
dc.date.accessioned.none.fl_str_mv 2025-01-22T20:44:12Z
dc.date.issued.fl_str_mv 2025-01-22
dc.type.none.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
dc.identifier.uri.none.fl_str_mv http://hdl.handle.net/20.500.12404/29791
url http://hdl.handle.net/20.500.12404/29791
dc.language.iso.none.fl_str_mv eng
language eng
dc.relation.ispartof.fl_str_mv SUNEDU
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
dc.rights.uri.none.fl_str_mv https://creativecommons.org/licenses/by-sa/2.5/pe/
eu_rights_str_mv openAccess
rights_invalid_str_mv https://creativecommons.org/licenses/by-sa/2.5/pe/
dc.publisher.es_ES.fl_str_mv Pontificia Universidad Católica del Perú
dc.publisher.country.none.fl_str_mv PE
dc.source.none.fl_str_mv reponame:PUCP-Tesis
instname:Pontificia Universidad Católica del Perú
instacron:PUCP
instname_str Pontificia Universidad Católica del Perú
instacron_str PUCP
institution PUCP
reponame_str PUCP-Tesis
collection PUCP-Tesis
bitstream.url.fl_str_mv https://tesis.pucp.edu.pe/bitstreams/d4e02a30-6cb2-4434-a639-cc7723625093/download
https://tesis.pucp.edu.pe/bitstreams/dc5d2346-fd67-4c7e-a773-5b9059fadfc7/download
https://tesis.pucp.edu.pe/bitstreams/8dd44a19-1df1-422c-a631-35b6dbe79d3c/download
https://tesis.pucp.edu.pe/bitstreams/5c11a02e-f11e-4b30-b720-57a643fef94d/download
https://tesis.pucp.edu.pe/bitstreams/b96320d5-6463-4e47-bbd7-8acf62c82a73/download
https://tesis.pucp.edu.pe/bitstreams/c2435dd0-4cbb-45c6-bfa2-ab26d0b8a42c/download
https://tesis.pucp.edu.pe/bitstreams/3683c1a9-5299-44dc-a479-143434fa425e/download
https://tesis.pucp.edu.pe/bitstreams/cd4961e0-19b0-4217-87c4-aa688ee3dd3f/download
https://tesis.pucp.edu.pe/bitstreams/4cbef631-8407-4562-9405-a4245055ade9/download
https://tesis.pucp.edu.pe/bitstreams/6f90787c-8043-4031-8655-3f205a862f97/download
https://tesis.pucp.edu.pe/bitstreams/7b8c683e-18cc-4282-9369-6b88dd858035/download
https://tesis.pucp.edu.pe/bitstreams/d318dd4c-18df-4b3b-be46-69900fe3e4aa/download
bitstream.checksum.fl_str_mv 2984aa5f080882c38e2488377770c36c
01238100d1195e8832f496054ef6e468
85e50b88013d0c13f136d8bfc3dd4616
bb9bdc0b3349e4284e09149f943790b4
c7c286929505be80c301967fe66cf6fc
3d56063918c7c2a6b58747fadfa5df99
8037eb65b3fc4b5d336f0fede7bb3100
9b1227e39c770c027bdf40e3a0294ca2
c7c286929505be80c301967fe66cf6fc
3d56063918c7c2a6b58747fadfa5df99
8037eb65b3fc4b5d336f0fede7bb3100
9b1227e39c770c027bdf40e3a0294ca2
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
MD5
MD5
MD5
MD5
MD5
MD5
MD5
MD5
repository.name.fl_str_mv Repositorio de Tesis PUCP
repository.mail.fl_str_mv raul.sifuentes@pucp.pe
_version_ 1834736922378895360
spelling Maestro en Procesamiento de Señales e Imágenes Digitales.
Master's degree
Pontificia Universidad Católica del Perú. Escuela de Posgrado
Procesamiento de Señales e Imágenes Digitales
07754238
https://orcid.org/0000-0002-8501-0907
70352996
613077
Silva Obregón, Gustavo Manuel
Rodríguez Valderrama, Paul Antonio
Beltrán Castañón, César Armando
https://purl.org/pe-repo/renati/level#maestro
https://purl.org/pe-repo/renati/type#tesis
ORIGINAL: RAMIREZ_ORIHUELA_GABRIEL.pdf (Full text, application/pdf, 983596 bytes, Anonymous READ)
ORIGINAL: RAMIREZ_ORIHUELA_GABRIEL_T.pdf (Originality report, application/pdf, 5321068 bytes, Administrator READ)
CC-LICENSE: license_rdf (application/rdf+xml; charset=utf-8, 1160 bytes, Anonymous READ)
LICENSE: license.txt (text/plain; charset=utf-8, 1748 bytes, Anonymous READ)
TEXT: RAMIREZ_ORIHUELA_GABRIEL.pdf.txt (Extracted text, text/plain, 57668 bytes, Anonymous READ)
TEXT: RAMIREZ_ORIHUELA_GABRIEL_T.pdf.txt (Extracted text, text/plain, 6455 bytes, Administrator READ)
THUMBNAIL: RAMIREZ_ORIHUELA_GABRIEL.pdf.jpg (Generated Thumbnail, image/jpeg, 10339 bytes, Anonymous READ)
THUMBNAIL: RAMIREZ_ORIHUELA_GABRIEL_T.pdf.jpg (Generated Thumbnail, image/jpeg, 8666 bytes, Administrator READ)
20.500.12404/29791
2025-04-22 11:57:12.52
https://tesis.pucp.edu.pe
Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository where this document or data set is held. CONCYTEC is not responsible for the content (publications and/or data) accessible through the Repositorio Nacional Digital de Ciencia, Tecnología e Innovación de Acceso Abierto (ALICIA).