Web Scraping versus Twitter API: A Comparison for a Credibility Analysis

Dongo I.; Cadinale Y.; Aguilera A.; Martínez F.; Quintero Y.; Barrios S.

Article Description

Twitter is one of the most popular information sources available on the Web. Thus, there exist many studies focused on analyzing the credibility of the shared information. Most proposals use either Twitter API or web scraping to extract the data to perform such analysis. Both extraction techniques ha...

Bibliographic Details
Authors:	Dongo I., Cadinale Y., Aguilera A., Martínez F., Quintero Y., Barrios S.
Format:	article
Publication Date:	2020
Institution:	Consejo Nacional de Ciencia Tecnología e Innovación
Repository:	CONCYTEC-Institucional
Language:	English
OAI Identifier:	oai:repositorio.concytec.gob.pe:20.500.12390/2460
Resource link:	https://hdl.handle.net/20.500.12390/2460 https://doi.org/10.1145/3428757.3429104
Access level:	open access
Subject:	Web Scraping API Credibility Twitter http://purl.org/pe-repo/ocde/ford#2.02.04

id	CONC_a28686e174c47510cc37b0d26d538053
oai_identifier_str	oai:repositorio.concytec.gob.pe:20.500.12390/2460
network_acronym_str	CONC
network_name_str	CONCYTEC-Institucional
repository_id_str	4689
dc.title.none.fl_str_mv	Web Scraping versus Twitter API: A Comparison for a Credibility Analysis
title	Web Scraping versus Twitter API: A Comparison for a Credibility Analysis
author	Dongo I.
author_facet	Dongo I. Cadinale Y. Aguilera A. Martínez F. Quintero Y. Barrios S.
author_role	author
author2	Cadinale Y. Aguilera A. Martínez F. Quintero Y. Barrios S.
author2_role	author author author author author
dc.contributor.author.fl_str_mv	Dongo I. Cadinale Y. Aguilera A. Martínez F. Quintero Y. Barrios S.
dc.subject.none.fl_str_mv	Web Scraping
topic	Web Scraping API Credibility Twitter http://purl.org/pe-repo/ocde/ford#2.02.04
dc.subject.es_PE.fl_str_mv	API Credibility Twitter
dc.subject.ocde.none.fl_str_mv	http://purl.org/pe-repo/ocde/ford#2.02.04
description	Twitter is one of the most popular information sources available on the Web. Thus, there exist many studies focused on analyzing the credibility of the shared information. Most proposals use either Twitter API or web scraping to extract the data to perform such analysis. Both extraction techniques have advantages and disadvantages. In this work, we present a study to evaluate their performance and behavior. The motivation for this research comes from the necessity to know ways to extract online information in order to analyze in real time the credibility of the content posted on the Web. To do so, we develop a framework which offers both alternatives of data extraction and implements a previously proposed credibility model. Our framework is implemented as a Google Chrome extension able to analyze tweets in real time. Results report that both methods produce identical credibility values when a robust normalization process is applied to the text (i.e., tweet). Moreover, concerning the time performance, web scraping is faster than Twitter API, and it is more flexible in terms of obtaining data; however, web scraping is very sensitive to website changes. © 2020 ACM.
publishDate	2020
dc.date.accessioned.none.fl_str_mv	2024-05-30T23:13:38Z
dc.date.available.none.fl_str_mv	2024-05-30T23:13:38Z
dc.date.issued.fl_str_mv	2020
dc.type.none.fl_str_mv	info:eu-repo/semantics/article
format	article
dc.identifier.uri.none.fl_str_mv	https://hdl.handle.net/20.500.12390/2460
dc.identifier.doi.none.fl_str_mv	https://doi.org/10.1145/3428757.3429104
dc.identifier.scopus.none.fl_str_mv	2-s2.0-85100336680
url	https://hdl.handle.net/20.500.12390/2460 https://doi.org/10.1145/3428757.3429104
identifier_str_mv	2-s2.0-85100336680
dc.language.iso.none.fl_str_mv	eng
language	eng
dc.relation.ispartof.none.fl_str_mv	ACM International Conference Proceeding Series
dc.rights.none.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.publisher.none.fl_str_mv	Association for Computing Machinery
publisher.none.fl_str_mv	Association for Computing Machinery
dc.source.none.fl_str_mv	reponame:CONCYTEC-Institucional instname:Consejo Nacional de Ciencia Tecnología e Innovación instacron:CONCYTEC
instname_str	Consejo Nacional de Ciencia Tecnología e Innovación
instacron_str	CONCYTEC
institution	CONCYTEC
reponame_str	CONCYTEC-Institucional
collection	CONCYTEC-Institucional
repository.name.fl_str_mv	Repositorio Institucional CONCYTEC
repository.mail.fl_str_mv	repositorio@concytec.gob.pe
_version_	1844883121607016448

Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository where this document or data set is held. CONCYTEC is not responsible for the content (publications and/or data) accessible through the Repositorio Nacional Digital de Ciencia, Tecnología e Innovación de Acceso Abierto (ALICIA).