Ideal step size estimation for the multinomial logistic regression
Article description
At the core of deep learning optimization problems reside algorithms such as the Stochastic Gradient Descent (SGD), which employs a subset of the data per iteration to estimate the gradient in order to minimize a cost function. Adaptive algorithms, based on SGD, are well known for being effective in...
| Author: | Ramirez Orihuela, Gabriel |
|---|---|
| Format: | master's thesis |
| Publication date: | 2024 |
| Institution: | Pontificia Universidad Católica del Perú |
| Repository: | PUCP-Tesis |
| Language: | English |
| OAI Identifier: | oai:tesis.pucp.edu.pe:20.500.12404/29791 |
| Resource link: | http://hdl.handle.net/20.500.12404/29791 |
| Access level: | open access |
| Subject: | Machine learning (Artificial intelligence); Deep learning (Machine learning); Mathematical optimization; Regression analysis; https://purl.org/pe-repo/ocde/ford#2.00.00 |
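
The description above mentions estimating the learning rate from the ideal Cauchy step size for multinomial logistic regression, updating it for a given number of epochs, and then switching to a decaying schedule. This record does not give the thesis's estimator, so the following is only a minimal sketch under stated assumptions: a plain linear softmax classifier, full-batch gradients instead of mini-batches, and a step size taken as the minimizer of the local second-order model of the loss along the negative gradient. The names `cauchy_step`, `train`, `estimate_epochs`, and `decay` are hypothetical and chosen for illustration.

```python
import math


def softmax(z):
    """Numerically stable softmax of a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def logits(W, x):
    """Class scores W x for a K x D weight matrix W and a length-D sample x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def loss_and_grad(W, X, y):
    """Mean cross-entropy of the softmax model and its gradient w.r.t. W."""
    n = len(X)
    G = [[0.0] * len(W[0]) for _ in W]
    loss = 0.0
    for x, label in zip(X, y):
        p = softmax(logits(W, x))
        loss -= math.log(p[label]) / n
        for k, g_row in enumerate(G):
            r = p[k] - (1.0 if k == label else 0.0)  # residual p_k - 1[k == label]
            for d, v in enumerate(x):
                g_row[d] += r * v / n
    return loss, G


def cauchy_step(W, G, X):
    """Minimizer of the local quadratic model along -G: ||G||^2 / (G^T H G)."""
    n = len(X)
    gg = sum(g * g for row in G for g in row)
    curv = 0.0
    for x in X:
        p = softmax(logits(W, x))
        u = logits(G, x)  # change in the logits per unit step along G
        mean_u = sum(pk * uk for pk, uk in zip(p, u))
        # u^T (diag(p) - p p^T) u, i.e. the variance of u under p
        curv += sum(pk * (uk - mean_u) ** 2 for pk, uk in zip(p, u)) / n
    return gg / curv if curv > 0.0 else 0.0  # assumption: take no step if flat


def train(X, y, num_classes, epochs=10, estimate_epochs=3, decay=0.9):
    """Full-batch descent: re-estimate the LR early on, then decay it."""
    W = [[0.0] * len(X[0]) for _ in range(num_classes)]
    lr, history = 0.0, []
    for epoch in range(epochs):
        loss, G = loss_and_grad(W, X, y)
        history.append(loss)
        if epoch < estimate_epochs:
            lr = cauchy_step(W, G, X)  # hypothetical re-estimation phase
        else:
            lr *= decay  # hypothetical geometric decay schedule
        W = [[w - lr * g for w, g in zip(w_row, g_row)]
             for w_row, g_row in zip(W, G)]
    return W, history
```

On the two-sample toy set `X = [[1.0], [-1.0]]`, `y = [0, 1]` with zero initial weights, `cauchy_step` returns 2.0, and `train(X, y, num_classes=2)` yields a loss history that decreases at every epoch. The curvature term is computed sample by sample so the Hessian is never formed explicitly.
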
| id |
PUCP_3204342643e33b1e9bf5956ced6d9efd |
|---|---|
| oai_identifier_str |
oai:tesis.pucp.edu.pe:20.500.12404/29791 |
| network_acronym_str |
PUCP |
| network_name_str |
PUCP-Tesis |
| repository_id_str |
. |
| dc.title.none.fl_str_mv |
Ideal step size estimation for the multinomial logistic regression |
| author |
Ramirez Orihuela, Gabriel |
| author_facet |
Ramirez Orihuela, Gabriel |
| author_role |
author |
| dc.contributor.advisor.fl_str_mv |
Rodríguez Valderrama, Paul Antonio |
| dc.contributor.author.fl_str_mv |
Ramirez Orihuela, Gabriel |
| dc.subject.none.fl_str_mv |
Machine learning (Artificial intelligence); Deep learning (Machine learning); Mathematical optimization; Regression analysis |
| topic |
Machine learning (Artificial intelligence); Deep learning (Machine learning); Mathematical optimization; Regression analysis; https://purl.org/pe-repo/ocde/ford#2.00.00 |
| dc.subject.ocde.none.fl_str_mv |
https://purl.org/pe-repo/ocde/ford#2.00.00 |
| description |
At the core of deep learning optimization problems reside algorithms such as Stochastic Gradient Descent (SGD), which employs a subset of the data per iteration to estimate the gradient in order to minimize a cost function. Adaptive algorithms based on SGD are well known for effectively using gradient information from past iterations, generating momentum or memory that enables a more accurate prediction of the true gradient slope in future iterations, thus accelerating convergence. Nevertheless, these algorithms still need an initial (scalar) learning rate (LR) as well as an LR scheduler. In this work we propose a new SGD algorithm that estimates the initial (scalar) LR via an adaptation of the ideal Cauchy step size for the multinomial logistic regression; furthermore, the LR is recursively updated up to a given number of epochs, after which a decaying LR scheduler is used. The proposed method is assessed for several well-known multiclass classification architectures and compares favorably against other well-tuned (scalar and spatially) adaptive alternatives, including the Adam algorithm. |
| publishDate |
2024 |
| dc.date.created.none.fl_str_mv |
2024 |
| dc.date.accessioned.none.fl_str_mv |
2025-01-22T20:44:12Z |
| dc.date.issued.fl_str_mv |
2025-01-22 |
| dc.type.none.fl_str_mv |
info:eu-repo/semantics/masterThesis |
| format |
masterThesis |
| dc.identifier.uri.none.fl_str_mv |
http://hdl.handle.net/20.500.12404/29791 |
| url |
http://hdl.handle.net/20.500.12404/29791 |
| dc.language.iso.none.fl_str_mv |
eng |
| language |
eng |
| dc.relation.ispartof.fl_str_mv |
SUNEDU |
| dc.rights.none.fl_str_mv |
info:eu-repo/semantics/openAccess |
| dc.rights.uri.none.fl_str_mv |
https://creativecommons.org/licenses/by-sa/2.5/pe/ |
| eu_rights_str_mv |
openAccess |
| rights_invalid_str_mv |
https://creativecommons.org/licenses/by-sa/2.5/pe/ |
| dc.publisher.es_ES.fl_str_mv |
Pontificia Universidad Católica del Perú |
| dc.publisher.country.none.fl_str_mv |
PE |
| dc.source.none.fl_str_mv |
reponame:PUCP-Tesis instname:Pontificia Universidad Católica del Perú instacron:PUCP |
| instname_str |
Pontificia Universidad Católica del Perú |
| instacron_str |
PUCP |
| institution |
PUCP |
| reponame_str |
PUCP-Tesis |
| collection |
PUCP-Tesis |
| bitstream.url.fl_str_mv |
https://tesis.pucp.edu.pe/bitstreams/d4e02a30-6cb2-4434-a639-cc7723625093/download https://tesis.pucp.edu.pe/bitstreams/dc5d2346-fd67-4c7e-a773-5b9059fadfc7/download https://tesis.pucp.edu.pe/bitstreams/8dd44a19-1df1-422c-a631-35b6dbe79d3c/download https://tesis.pucp.edu.pe/bitstreams/5c11a02e-f11e-4b30-b720-57a643fef94d/download https://tesis.pucp.edu.pe/bitstreams/b96320d5-6463-4e47-bbd7-8acf62c82a73/download https://tesis.pucp.edu.pe/bitstreams/c2435dd0-4cbb-45c6-bfa2-ab26d0b8a42c/download https://tesis.pucp.edu.pe/bitstreams/3683c1a9-5299-44dc-a479-143434fa425e/download https://tesis.pucp.edu.pe/bitstreams/cd4961e0-19b0-4217-87c4-aa688ee3dd3f/download https://tesis.pucp.edu.pe/bitstreams/4cbef631-8407-4562-9405-a4245055ade9/download https://tesis.pucp.edu.pe/bitstreams/6f90787c-8043-4031-8655-3f205a862f97/download https://tesis.pucp.edu.pe/bitstreams/7b8c683e-18cc-4282-9369-6b88dd858035/download https://tesis.pucp.edu.pe/bitstreams/d318dd4c-18df-4b3b-be46-69900fe3e4aa/download |
| bitstream.checksum.fl_str_mv |
2984aa5f080882c38e2488377770c36c 01238100d1195e8832f496054ef6e468 85e50b88013d0c13f136d8bfc3dd4616 bb9bdc0b3349e4284e09149f943790b4 c7c286929505be80c301967fe66cf6fc 3d56063918c7c2a6b58747fadfa5df99 8037eb65b3fc4b5d336f0fede7bb3100 9b1227e39c770c027bdf40e3a0294ca2 c7c286929505be80c301967fe66cf6fc 3d56063918c7c2a6b58747fadfa5df99 8037eb65b3fc4b5d336f0fede7bb3100 9b1227e39c770c027bdf40e3a0294ca2 |
| bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 MD5 MD5 MD5 MD5 MD5 MD5 MD5 MD5 MD5 |
| repository.name.fl_str_mv |
Repositorio de Tesis PUCP |
| repository.mail.fl_str_mv |
raul.sifuentes@pucp.pe |
| _version_ |
1834736922378895360 |
| spelling |
Maestro en Procesamiento de Señales e Imágenes Digitales (Master in Digital Signal and Image Processing); Maestría; Pontificia Universidad Católica del Perú. Escuela de Posgrado; Procesamiento de Señales e Imágenes Digitales; 07754238; https://orcid.org/0000-0002-8501-0907; 70352996; 613077; Silva Obregón, Gustavo Manuel; Rodríguez Valderrama, Paul Antonio; Beltrán Castañón, César Armando; https://purl.org/pe-repo/renati/level#maestro; https://purl.org/pe-repo/renati/type#tesis; ORIGINAL: RAMIREZ_ORIHUELA_GABRIEL.pdf (full text, application/pdf, 983596 bytes), RAMIREZ_ORIHUELA_GABRIEL_T.pdf (originality report, application/pdf, 5321068 bytes); CC-LICENSE: license_rdf (application/rdf+xml; charset=utf-8, 1160 bytes); LICENSE: license.txt (text/plain; charset=utf-8, 1748 bytes); TEXT: RAMIREZ_ORIHUELA_GABRIEL.pdf.txt (extracted text, text/plain, 57668 bytes), RAMIREZ_ORIHUELA_GABRIEL_T.pdf.txt (extracted text, text/plain, 6455 bytes); THUMBNAIL: RAMIREZ_ORIHUELA_GABRIEL.pdf.jpg (generated thumbnail, image/jpeg, 10339 bytes), RAMIREZ_ORIHUELA_GABRIEL_T.pdf.jpg (generated thumbnail, image/jpeg, 8666 bytes); 2025-04-22 11:57:12.52; https://tesis.pucp.edu.pe |
| score |
13.936249 |
Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository where this document or data set is held. CONCYTEC is not responsible for the contents (publications and/or data) accessible through the Repositorio Nacional Digital de Ciencia, Tecnología e Innovación de Acceso Abierto (ALICIA).