On Semantic Solutions for Efficient Approximate Similarity Search on Large-Scale Datasets
Article Description
| Authors: | Ocsa, Alexander; Huillca, Jose Luis; López Del Alamo, Cristian |
|---|---|
| Format: | article |
| Publication Date: | 2018 |
| Institution: | Universidad La Salle |
| Repository: | ULASALLE-Institucional |
| Language: | English |
| OAI Identifier: | oai:repositorio.ulasalle.edu.pe:20.500.12953/30 |
| Resource link: | http://repositorio.ulasalle.edu.pe/handle/20.500.12953/30 https://doi.org/10.1007/978-3-319-75193-1 |
| Access level: | restricted access |
| Subject: | Research Subject Categories::TECHNOLOGY |
| id | ULSA_b85651bec2f318c8222bb02d5e69a074 |
|---|---|
| oai_identifier_str | oai:repositorio.ulasalle.edu.pe:20.500.12953/30 |
| network_acronym_str | ULSA |
| network_name_str | ULASALLE-Institucional |
| repository_id_str | 3920 |
| dc.title.es_ES.fl_str_mv | On Semantic Solutions for Efficient Approximate Similarity Search on Large-Scale Datasets |
| dc.contributor.author.fl_str_mv | Ocsa, Alexander; Huillca, Jose Luis; López Del Alamo, Cristian |
| dc.subject.es_ES.fl_str_mv | Research Subject Categories::TECHNOLOGY |
| dc.subject.ocde.es_ES.fl_str_mv | Research Subject Categories::TECHNOLOGY |
| description | Approximate similarity search algorithms based on hashing were proposed to query high-dimensional datasets because of their fast retrieval speed and low storage cost. Recent studies promote the use of Convolutional Neural Networks (CNNs) with hashing techniques to improve search accuracy. However, several challenges remain before CNN features can be indexed practically and efficiently, such as the heavy training process needed to achieve accurate query results and the critical dependency on data parameters. To overcome these issues, we propose a new method for scalable similarity search, Deep frActal based Hashing (DAsH), which computes the best data-parameter values for an optimal subspace projection by exploring the correlations among CNN feature attributes using fractal theory. Moreover, inspired by recent advances in CNNs, we use not only the activations of lower layers, which are more general-purpose, but also prior semantic knowledge of the data from the last CNN layer to improve search accuracy. As a result, our method produces a better representation of the data space at a lower computational cost and with better accuracy. This significant gain in speed and accuracy allows us to evaluate the framework on a large, realistic, and challenging set of datasets. |
| publishDate | 2018 |
| dc.date.accessioned.none.fl_str_mv | 2018-11-21T17:14:44Z |
| dc.date.available.none.fl_str_mv | 2018-11-21T17:14:44Z |
| dc.date.issued.fl_str_mv | 2018-07-04 |
| dc.type.es_ES.fl_str_mv | info:eu-repo/semantics/article |
| dc.identifier.citation.es_ES.fl_str_mv | Ocsa A., Huillca J.L., Lopez del Alamo C. (2018) On Semantic Solutions for Efficient Approximate Similarity Search on Large-Scale Datasets. In: Mendoza M., Velastín S. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2017. Lecture Notes in Computer Science, vol 10657. Springer, Cham |
| dc.identifier.isbn.none.fl_str_mv | 978-3-319-75193-1 |
| dc.identifier.uri.none.fl_str_mv | http://repositorio.ulasalle.edu.pe/handle/20.500.12953/30 |
| dc.identifier.journal.es_ES.fl_str_mv | Iberoamerican Congress on Pattern Recognition |
| dc.identifier.doi.es_ES.fl_str_mv | https://doi.org/10.1007/978-3-319-75193-1 |
| dc.language.iso.eng_US.fl_str_mv | eng |
| dc.rights.es_ES.fl_str_mv | info:eu-repo/semantics/restrictedAccess |
| dc.publisher.es_ES.fl_str_mv | Universidad La Salle |
| dc.source.es_ES.fl_str_mv | Universidad La Salle; Repositorio institucional - ULASALLE |
| dc.source.none.fl_str_mv | reponame:ULASALLE-Institucional; instname:Universidad La Salle; instacron:ULASALLE |
| bitstream.url.fl_str_mv | http://repositorio.ulasalle.edu.pe/bitstream/20.500.12953/30/1/link_articulo.txt http://repositorio.ulasalle.edu.pe/bitstream/20.500.12953/30/2/license.txt http://repositorio.ulasalle.edu.pe/bitstream/20.500.12953/30/3/link_articulo.txt.txt |
| bitstream.checksum.fl_str_mv | 0db83502828a9ee71f838dabf78ef098 8a4605be74aa9ea9d79846c1fba20a33 b5390a0d10c3af67678d607f261ad5ad |
| bitstream.checksumAlgorithm.fl_str_mv | MD5 MD5 MD5 |
| repository.name.fl_str_mv | Repositorio Institucional de la Universidad La Salle |
| repository.mail.fl_str_mv | repositorio@ulasalle.edu.pe |
| Type of work | Research work |
| Peer review | Double-blind |
| Record datestamp | 2021-06-11 14:39:34.116 |
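
The description above names two ingredients without detailing them: hashing-based approximate search, whose compact binary codes are what make retrieval fast and storage cheap, and fractal theory as a way to choose data parameters. The Python sketch below illustrates each with a generic textbook technique; it is not DAsH and does not reproduce the authors' method. It assumes random-hyperplane LSH as the hashing family (the record does not say which family DAsH uses) and estimates the correlation fractal dimension D2 by box counting, i.e. the slope of log S(r) against log r, where S(r) is the sum of squared point counts over grid cells of side r. All function names are hypothetical, and how DAsH turns an estimate like D2 into projection parameters is not stated in this record.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


# Hypothetical helpers. Random-hyperplane LSH is an assumed, generic hashing
# family used only for illustration; it is not taken from the DAsH paper.
def random_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> List[List[float]]:
    """Draw n_bits random Gaussian hyperplanes through the origin."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(v: Vector, planes: List[List[float]]) -> int:
    """Pack one bit per hyperplane: 1 if v lies on its non-negative side."""
    code = 0
    for bit, w in enumerate(planes):
        if sum(a * b for a, b in zip(v, w)) >= 0.0:
            code |= 1 << bit
    return code


def hamming_search(query: Vector, codes: List[int],
                   planes: List[List[float]], k: int) -> List[int]:
    """Return indices of the k stored codes closest to the query's code in
    Hamming distance (linear scan, ties broken by index)."""
    q = hash_code(query, planes)
    ranked = sorted(range(len(codes)),
                    key=lambda i: (bin(q ^ codes[i]).count("1"), i))
    return ranked[:k]


# Correlation fractal dimension D2 by box counting (generic estimator,
# assumed here as an example of a "fractal theory" data parameter).
def correlation_fractal_dimension(points: List[Vector],
                                  radii: List[float]) -> float:
    """Estimate D2 as the slope of log S(r) versus log r, where S(r) is the
    sum of squared point counts over grid cells of side r."""
    log_r, log_s = [], []
    for r in radii:
        counts: Dict[Tuple[int, ...], int] = {}
        for p in points:
            cell = tuple(math.floor(c / r) for c in p)
            counts[cell] = counts.get(cell, 0) + 1
        log_r.append(math.log(r))
        log_s.append(math.log(sum(n * n for n in counts.values())))
    # Ordinary least-squares slope of the log-log points.
    mean_r = sum(log_r) / len(log_r)
    mean_s = sum(log_s) / len(log_s)
    cov = sum((x - mean_r) * (y - mean_s) for x, y in zip(log_r, log_s))
    var = sum((x - mean_r) ** 2 for x in log_r)
    return cov / var


if __name__ == "__main__":
    radii = [2.0 ** -k for k in range(2, 6)]  # cell sides 1/4 ... 1/32
    line = [(i / 2048, i / 2048) for i in range(2048)]  # points on a diagonal
    square = [(i / 64, j / 64) for i in range(64) for j in range(64)]  # filled grid
    print(round(correlation_fractal_dimension(line, radii), 6))    # 1.0
    print(round(correlation_fractal_dimension(square, radii), 6))  # 2.0

    data = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [-1.0, 0.0, 0.0]]
    planes = random_hyperplanes(dim=3, n_bits=16, seed=42)
    codes = [hash_code(v, planes) for v in data]
    ranked = hamming_search([1.0, 0.0, 0.0], codes, planes, k=4)
    print(ranked[0], ranked[-1])  # 0 3: exact match first, opposite vector last
```

Comparing two codes costs one XOR and a bit count, which is where the speed and storage savings of hashing come from; a real index would bucket codes in hash tables rather than scan every code as `hamming_search` does. The line and filled-grid inputs are chosen so the box-counting estimate recovers their known dimensions of 1 and 2.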
Important note:
The information in this record is the sole responsibility of the institution that manages the institutional repository where this document or dataset is held. CONCYTEC is not responsible for the content (publications and/or data) accessible through the Repositorio Nacional Digital de Ciencia, Tecnología e Innovación de Acceso Abierto (ALICIA).