Web Scraping versus Twitter API: A Comparison for a Credibility Analysis
Article Description
Twitter is one of the most popular information sources available on the Web. Thus, many studies have focused on analyzing the credibility of the information shared there. Most proposals use either the Twitter API or web scraping to extract the data for such analysis. Both extraction techniques ha...
Authors: | Dongo I., Cadinale Y., Aguilera A., Martínez F., Quintero Y., Barrios S. |
---|---|
Format: | article |
Publication Date: | 2020 |
Institution: | Consejo Nacional de Ciencia Tecnología e Innovación |
Repository: | CONCYTEC-Institucional |
Language: | English |
OAI Identifier: | oai:repositorio.concytec.gob.pe:20.500.12390/2460 |
Resource link: | https://hdl.handle.net/20.500.12390/2460, https://doi.org/10.1145/3428757.3429104 |
Access level: | open access |
Subject: | Web Scraping, API, Credibility, http://purl.org/pe-repo/ocde/ford#2.02.04 |
id |
CONC_a28686e174c47510cc37b0d26d538053 |
---|---|
oai_identifier_str |
oai:repositorio.concytec.gob.pe:20.500.12390/2460 |
network_acronym_str |
CONC |
network_name_str |
CONCYTEC-Institucional |
repository_id_str |
4689 |
dc.title.none.fl_str_mv |
Web Scraping versus Twitter API: A Comparison for a Credibility Analysis |
title |
Web Scraping versus Twitter API: A Comparison for a Credibility Analysis |
author |
Dongo I. |
author_facet |
Dongo I. Cadinale Y. Aguilera A. Martínez F. Quintero Y. Barrios S. |
author_role |
author |
author2 |
Cadinale Y. Aguilera A. Martínez F. Quintero Y. Barrios S. |
author2_role |
author author author author author |
dc.contributor.author.fl_str_mv |
Dongo I. Cadinale Y. Aguilera A. Martínez F. Quintero Y. Barrios S. |
dc.subject.none.fl_str_mv |
Web Scraping |
topic |
Web Scraping, API, Credibility, http://purl.org/pe-repo/ocde/ford#2.02.04 |
dc.subject.es_PE.fl_str_mv |
API, Credibility |
dc.subject.ocde.none.fl_str_mv |
http://purl.org/pe-repo/ocde/ford#2.02.04 |
description |
Twitter is one of the most popular information sources available on the Web. Thus, many studies have focused on analyzing the credibility of the information shared there. Most proposals use either the Twitter API or web scraping to extract the data for such analysis. Both extraction techniques have advantages and disadvantages. In this work, we present a study to evaluate their performance and behavior. The motivation for this research comes from the need to know how to extract online information in order to analyze, in real time, the credibility of content posted on the Web. To do so, we develop a framework that offers both data extraction alternatives and implements a previously proposed credibility model. Our framework is implemented as a Google Chrome extension able to analyze tweets in real time. Results show that both methods produce identical credibility values when a robust normalization process is applied to the text (i.e., the tweet). Moreover, in terms of time performance, web scraping is faster than the Twitter API and more flexible in the data it can obtain; however, web scraping is very sensitive to website changes. © 2020 ACM. |
publishDate |
2020 |
dc.date.accessioned.none.fl_str_mv |
2024-05-30T23:13:38Z |
dc.date.available.none.fl_str_mv |
2024-05-30T23:13:38Z |
dc.date.issued.fl_str_mv |
2020 |
dc.type.none.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
dc.identifier.uri.none.fl_str_mv |
https://hdl.handle.net/20.500.12390/2460 |
dc.identifier.doi.none.fl_str_mv |
https://doi.org/10.1145/3428757.3429104 |
dc.identifier.scopus.none.fl_str_mv |
2-s2.0-85100336680 |
url |
https://hdl.handle.net/20.500.12390/2460, https://doi.org/10.1145/3428757.3429104 |
identifier_str_mv |
2-s2.0-85100336680 |
dc.language.iso.none.fl_str_mv |
eng |
language |
eng |
dc.relation.ispartof.none.fl_str_mv |
ACM International Conference Proceeding Series |
dc.rights.none.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.publisher.none.fl_str_mv |
Association for Computing Machinery |
publisher.none.fl_str_mv |
Association for Computing Machinery |
dc.source.none.fl_str_mv |
reponame:CONCYTEC-Institucional instname:Consejo Nacional de Ciencia Tecnología e Innovación instacron:CONCYTEC |
instname_str |
Consejo Nacional de Ciencia Tecnología e Innovación |
instacron_str |
CONCYTEC |
institution |
CONCYTEC |
reponame_str |
CONCYTEC-Institucional |
collection |
CONCYTEC-Institucional |
repository.name.fl_str_mv |
Repositorio Institucional CONCYTEC |
repository.mail.fl_str_mv |
repositorio@concytec.gob.pe |
_version_ |
1844883121607016448 |
score |
13.476506 |
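The description above reports that the Twitter API and web scraping yield identical credibility values once the tweet text is robustly normalized. This record does not say what that normalization involves. The Python sketch below is a hypothetical illustration only. It assumes the two sources differ mainly in HTML entities (e.g. `&amp;`), Unicode compatibility characters such as non-breaking spaces, embedded links, and whitespace. `normalize_tweet` and the sample strings are invented for the example and are not the authors' procedure.

```python
import html
import re
import unicodedata

# Hypothetical normalization for illustration only; the paper's actual
# procedure is not described in this record.
_URL = re.compile(r"https?://\S+")
_WS = re.compile(r"\s+")


def normalize_tweet(text: str) -> str:
    """Map two renderings of the same tweet text to one canonical string."""
    text = html.unescape(text)                  # "&amp;" -> "&"
    text = unicodedata.normalize("NFKC", text)  # fold compatibility forms, e.g. U+00A0 -> space
    text = _URL.sub("", text)                   # drop links such as shortened t.co URLs
    text = _WS.sub(" ", text)                   # collapse runs of spaces and newlines
    return text.strip()


# Invented samples standing in for API output and scraped page text.
api_text = "Rain &amp; wind tonight\nhttps://t.co/abc123"
scraped_text = "Rain & wind   tonight "
print(normalize_tweet(api_text) == normalize_tweet(scraped_text))
```

Running it prints `True`, because both strings reduce to `Rain & wind tonight`. A real comparison would also have to decide how to treat mentions, hashtags, emoji, and truncated display URLs, which this sketch leaves untouched.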
Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository where this document or dataset is held. CONCYTEC is not responsible for the content (publications and/or data) accessible through the Repositorio Nacional Digital de Ciencia, Tecnología e Innovación de Acceso Abierto (ALICIA).