MPEG-1 psychoacoustic model emulation using multiscale convolutional neural networks
Article description
The Moving Picture Experts Group - 1 (MPEG-1) perceptual audio compression scheme is a successful family of audio codecs described in standard ISO/IEC 11172–3. Currently, there is no general framework to emulate the MPEG-1 psychoacoustic model or any other psychoacoustic model, which is a core piece of many percept...
| Authors: | Kemper, Guillermo; Sanchez, Alonso; Serpa, Sergio |
|---|---|
| Format: | article |
| Publication date: | 2023 |
| Institution: | Universidad Peruana de Ciencias Aplicadas |
| Repository: | UPC-Institucional |
| Language: | English |
| OAI Identifier: | oai:repositorioacademico.upc.edu.pe:10757/668741 |
| Resource link: | http://hdl.handle.net/10757/668741 |
| Access level: | embargoed access |
| Subject: | audio coding; MPEG; neural networks; perceptual coding; psychoacoustic model; https://purl.org/pe-repo/ocde/ford#2.02.01 |
| id | UUPC_0fa235aca981cedb49306a76dbd628c1 |
|---|---|
| oai_identifier_str | oai:repositorioacademico.upc.edu.pe:10757/668741 |
| network_acronym_str | UUPC |
| network_name_str | UPC-Institucional |
| repository_id_str | 2670 |
| dc.title.es_PE.fl_str_mv | MPEG-1 psychoacoustic model emulation using multiscale convolutional neural networks |
| title | MPEG-1 psychoacoustic model emulation using multiscale convolutional neural networks |
| spellingShingle | MPEG-1 psychoacoustic model emulation using multiscale convolutional neural networks; Kemper, Guillermo; audio coding; MPEG; neural networks; perceptual coding; psychoacoustic model; https://purl.org/pe-repo/ocde/ford#2.02.01 |
| title_short | MPEG-1 psychoacoustic model emulation using multiscale convolutional neural networks |
| title_full | MPEG-1 psychoacoustic model emulation using multiscale convolutional neural networks |
| title_fullStr | MPEG-1 psychoacoustic model emulation using multiscale convolutional neural networks |
| title_full_unstemmed | MPEG-1 psychoacoustic model emulation using multiscale convolutional neural networks |
| title_sort | MPEG-1 psychoacoustic model emulation using multiscale convolutional neural networks |
| author | Kemper, Guillermo |
| author_facet | Kemper, Guillermo; Sanchez, Alonso; Serpa, Sergio |
| author_role | author |
| author2 | Sanchez, Alonso; Serpa, Sergio |
| author2_role | author; author |
| dc.contributor.author.fl_str_mv | Kemper, Guillermo; Sanchez, Alonso; Serpa, Sergio |
| dc.subject.es_PE.fl_str_mv | audio coding; MPEG; neural networks; perceptual coding; psychoacoustic model |
| topic | audio coding; MPEG; neural networks; perceptual coding; psychoacoustic model; https://purl.org/pe-repo/ocde/ford#2.02.01 |
| dc.subject.ocde.none.fl_str_mv | https://purl.org/pe-repo/ocde/ford#2.02.01 |
| description | The Moving Picture Experts Group - 1 (MPEG-1) perceptual audio compression scheme is a successful family of audio codecs described in standard ISO/IEC 11172–3. Currently, there is no general framework to emulate the MPEG-1 psychoacoustic model or any other psychoacoustic model, which is a core piece of many perceptual codecs. This work presents a successful implementation of a convolutional neural network that emulates psychoacoustic model 1 from the MPEG-1 standard, termed “MCNN-PM” (Multiscale Convolutional Neural Network – Psychoacoustic Model). It is then implemented as part of the MPEG-1, Layer I codec. Using the objective difference grade (ODG) to evaluate audio quality, the MCNN-PM MPEG-1, Layer I codec outperforms the original MPEG-1, Layer I codec by up to 17% at 96 kbps and 14% at 128 kbps, and performs almost equally at 192 kbps. This work shows that convolutional neural networks are a viable alternative to standard psychoacoustic models and can be used successfully as part of perceptual audio codecs. |
| publishDate | 2023 |
| dc.date.accessioned.none.fl_str_mv | 2023-09-25T15:02:11Z |
| dc.date.available.none.fl_str_mv | 2023-09-25T15:02:11Z |
| dc.date.issued.fl_str_mv | 2023-01-01 |
| dc.type.es_PE.fl_str_mv | info:eu-repo/semantics/article |
| format | article |
| dc.identifier.issn.none.fl_str_mv | 13807501 |
| dc.identifier.doi.none.fl_str_mv | 10.1007/s11042-023-15949-y |
| dc.identifier.uri.none.fl_str_mv | http://hdl.handle.net/10757/668741 |
| dc.identifier.eissn.none.fl_str_mv | 15737721 |
| dc.identifier.journal.es_PE.fl_str_mv | Multimedia Tools and Applications |
| dc.identifier.eid.none.fl_str_mv | 2-s2.0-85160962748 |
| dc.identifier.scopusid.none.fl_str_mv | SCOPUS_ID:85160962748 |
| dc.identifier.isni.none.fl_str_mv | 0000 0001 2196 144X |
| identifier_str_mv | 13807501; 10.1007/s11042-023-15949-y; 15737721; Multimedia Tools and Applications; 2-s2.0-85160962748; SCOPUS_ID:85160962748; 0000 0001 2196 144X |
| url | http://hdl.handle.net/10757/668741 |
| dc.language.iso.es_PE.fl_str_mv | eng |
| language | eng |
| dc.relation.url.es_PE.fl_str_mv | https://link.springer.com/article/10.1007/s11042-023-15949-y |
| dc.rights.es_PE.fl_str_mv | info:eu-repo/semantics/embargoedAccess |
| eu_rights_str_mv | embargoedAccess |
| dc.format.es_PE.fl_str_mv | application/html |
| dc.publisher.es_PE.fl_str_mv | Springer |
| dc.source.none.fl_str_mv | reponame:UPC-Institucional; instname:Universidad Peruana de Ciencias Aplicadas; instacron:UPC |
| instname_str | Universidad Peruana de Ciencias Aplicadas |
| instacron_str | UPC |
| institution | UPC |
| reponame_str | UPC-Institucional |
| collection | UPC-Institucional |
| dc.source.journaltitle.none.fl_str_mv | Multimedia Tools and Applications |
| bitstream.url.fl_str_mv | https://repositorioacademico.upc.edu.pe/bitstream/10757/668741/1/license.txt |
| bitstream.checksum.fl_str_mv | 8a4605be74aa9ea9d79846c1fba20a33 |
| bitstream.checksumAlgorithm.fl_str_mv | MD5 |
| repository.name.fl_str_mv | Repositorio Académico UPC |
| repository.mail.fl_str_mv | upc@openrepository.com |
| _version_ | 1851775229648437248 |
| score | 13.394499 |
Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository where this document or dataset is held. CONCYTEC is not responsible for the content (publications and/or data) accessible through the National Digital Repository of Science, Technology and Innovation of Open Access (ALICIA).