Generación automática de resúmenes abstractivos mono documento utilizando análisis semántico y del discurso

Valderrama Vilca, Gregory Cesar

Article Description

The web is a vast source of data and information on security, health, education, and other topics of great use to people, but producing a synthesis or abstract of one or many documents is expensive work, and doing it manually may be impossible given the huge amount of...

Bibliographic Details
Author:	Valderrama Vilca, Gregory Cesar
Format:	master's thesis
Publication Date:	2017
Institution:	Pontificia Universidad Católica del Perú
Repository:	PUCP-Tesis
Language:	English
OAI Identifier:	oai:tesis.pucp.edu.pe:20.500.12404/9361
Resource Link:	http://hdl.handle.net/20.500.12404/9361
Access Level:	open access
Subject:	Semantic computing; Summaries; Semantics; https://purl.org/pe-repo/ocde/ford#1.02.00

id	PUCP_c8886a2d0840bab4755487da0220243d
oai_identifier_str	oai:tesis.pucp.edu.pe:20.500.12404/9361
network_acronym_str	PUCP
network_name_str	PUCP-Tesis
repository_id_str	.
dc.title.es_ES.fl_str_mv	Generación automática de resúmenes abstractivos mono documento utilizando análisis semántico y del discurso
title	Generación automática de resúmenes abstractivos mono documento utilizando análisis semántico y del discurso
author	Valderrama Vilca, Gregory Cesar
author_facet	Valderrama Vilca, Gregory Cesar
author_role	author
dc.contributor.advisor.fl_str_mv	Sobrevilla Cabezudo, Marco Antonio
dc.contributor.author.fl_str_mv	Valderrama Vilca, Gregory Cesar
dc.subject.es_ES.fl_str_mv	Computación semántica Resúmenes Semántica
topic	Computación semántica Resúmenes Semántica https://purl.org/pe-repo/ocde/ford#1.02.00
dc.subject.ocde.es_ES.fl_str_mv	https://purl.org/pe-repo/ocde/ford#1.02.00
description	The web is a vast source of data and information on security, health, education, and other topics of great use to people, but producing a synthesis or abstract of one or many documents is expensive work, and doing it manually may be impossible given the huge amount of data. Abstract generation is a challenging task because it involves analyzing and understanding text written in unstructured, context-dependent natural language, and it must describe a synthesis of events or knowledge in a simple form that reads naturally to any reader. There are diverse approaches to summarization, categorized as extractive or abstractive. In the extractive technique, summaries are generated by selecting salient sentences from the source text. Abstractive summaries are created by regenerating the content extracted from the source text, reformulating phrases through term fusion, compression, or suppression. In this way, paraphrased sentences are obtained, or even sentences that were not in the original text. This type of summary is more likely to reach the coherence and fluency of one written by a human. The present work implements a method that integrates syntactic, semantic (AMR annotator), and discourse (RST) information into a conceptual graph. This graph is summarized using a new measure of concept similarity on WordNet. To find the most relevant concepts we use PageRank, taking into account the discourse information given by applying O'Donnell's method. With the most important concepts and the semantic role information obtained from PropBank, a natural language generation method was implemented with the SimpleNLG tool. This work reports the results of applying the method to the Document Understanding Conference 2002 corpus, evaluated with the ROUGE metric, which is widely used in automatic summarization. Our method reaches an F1 of 24% on ROUGE-1 for the single-document abstract generation task. This shows that using these techniques is workable and even advantageous, and that it yields recommended configurations and useful tools for this task.
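For readers unfamiliar with the reported score, the following is a minimal sketch of how a ROUGE-1 F1 value can be computed from unigram overlap between a generated summary and a reference summary. It is an illustration only, not the evaluation script used in the thesis: the function name `rouge1` is hypothetical, and the lowercase whitespace tokenization (no stemming, no stop-word handling) is an assumption made here for brevity.

```python
from collections import Counter


def rouge1(candidate: str, reference: str) -> dict:
    """Unigram-overlap precision, recall and F1 (simplified ROUGE-1).

    Assumption: tokens are lowercase whitespace-separated words.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: each word counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand_counts & ref_counts).values())

    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    scores = rouge1("the cat sat on the mat", "the cat lay on the mat")
    print(scores)
```

In this sketch, an F1 of 0.24 would mean that the harmonic mean of unigram precision and unigram recall against the reference summary is 24%.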
publishDate	2017
dc.date.accessioned.es_ES.fl_str_mv	2017-09-20T23:47:13Z
dc.date.available.es_ES.fl_str_mv	2017-09-20T23:47:13Z
dc.date.created.es_ES.fl_str_mv	2017
dc.date.issued.fl_str_mv	2017-09-20
dc.type.es_ES.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
dc.identifier.uri.none.fl_str_mv	http://hdl.handle.net/20.500.12404/9361
url	http://hdl.handle.net/20.500.12404/9361
dc.language.iso.es_ES.fl_str_mv	eng
language	eng
dc.relation.ispartof.fl_str_mv	SUNEDU
dc.rights.es_ES.fl_str_mv	info:eu-repo/semantics/openAccess
dc.rights.uri.*.fl_str_mv	http://creativecommons.org/licenses/by-nc-nd/2.5/pe/
eu_rights_str_mv	openAccess
rights_invalid_str_mv	http://creativecommons.org/licenses/by-nc-nd/2.5/pe/
dc.publisher.es_ES.fl_str_mv	Pontificia Universidad Católica del Perú
dc.publisher.country.es_ES.fl_str_mv	PE
dc.source.none.fl_str_mv	reponame:PUCP-Tesis instname:Pontificia Universidad Católica del Perú instacron:PUCP
instname_str	Pontificia Universidad Católica del Perú
instacron_str	PUCP
institution	PUCP
reponame_str	PUCP-Tesis
collection	PUCP-Tesis
bitstream.url.fl_str_mv	https://tesis.pucp.edu.pe/bitstreams/f2aaf613-0d3a-489d-9c9f-edd7b6623c77/download https://tesis.pucp.edu.pe/bitstreams/cf32a879-771d-43a2-8d84-465f5daae73d/download https://tesis.pucp.edu.pe/bitstreams/346d92ee-a81b-42b3-8643-65c7e96ff175/download https://tesis.pucp.edu.pe/bitstreams/aa1bd33b-d5ab-4b02-a4e1-7735bf281355/download
bitstream.checksum.fl_str_mv	8f44c33eb8d64f8fa5f07ea840a851d3 8a4605be74aa9ea9d79846c1fba20a33 ba81d02bf6748a4ca0bf38468a7df370 bbfa0117cc49b0a20896a0ca29d100f9
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5 MD5 MD5
repository.name.fl_str_mv	Repositorio de Tesis PUCP
repository.mail.fl_str_mv	raul.sifuentes@pucp.pe
_version_	1839177145766641664
spelling	Tesis; Maestro en Informática con mención en Ciencias de la Computación (Maestría); Pontificia Universidad Católica del Perú. Escuela de Posgrado; full text: VALDERRAMA_GREGORY_RESUMENES_ABSTRACTIVOS_ANALISIS_SEMANTICO.pdf (application/pdf)

Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository where this document or dataset is held. CONCYTEC is not responsible for the content (publications and/or data) accessible through the Repositorio Nacional Digital de Ciencia, Tecnología e Innovación de Acceso Abierto (ALICIA).