Web Scraping versus Twitter API: A Comparison for a Credibility Analysis

Article Description

Twitter is one of the most popular information sources available on the Web. Thus, many studies focus on analyzing the credibility of the information shared there. Most proposals use either the Twitter API or web scraping to extract the data for such analysis. Both extraction techniques ha...


Bibliographic Details
Authors: Dongo I., Cadinale Y., Aguilera A., Martínez F., Quintero Y., Barrios S.
Format: article
Publication Date: 2020
Institution: Consejo Nacional de Ciencia Tecnología e Innovación
Repository: CONCYTEC-Institucional
Language: English
OAI Identifier: oai:repositorio.concytec.gob.pe:20.500.12390/2460
Resource link: https://hdl.handle.net/20.500.12390/2460
https://doi.org/10.1145/3428757.3429104
Access level: open access
Subject: Web Scraping
API
Credibility
Twitter
http://purl.org/pe-repo/ocde/ford#2.02.04
id CONC_a28686e174c47510cc37b0d26d538053
oai_identifier_str oai:repositorio.concytec.gob.pe:20.500.12390/2460
network_acronym_str CONC
network_name_str CONCYTEC-Institucional
repository_id_str 4689
dc.title.none.fl_str_mv Web Scraping versus Twitter API: A Comparison for a Credibility Analysis
author Dongo I.
author_facet Dongo I.
Cadinale Y.
Aguilera A.
Martínez F.
Quintero Y.
Barrios S.
author_role author
author2 Cadinale Y.
Aguilera A.
Martínez F.
Quintero Y.
Barrios S.
author2_role author
author
author
author
author
dc.contributor.author.fl_str_mv Dongo I.
Cadinale Y.
Aguilera A.
Martínez F.
Quintero Y.
Barrios S.
dc.subject.none.fl_str_mv Web Scraping
topic Web Scraping
API
Credibility
Twitter
http://purl.org/pe-repo/ocde/ford#2.02.04
dc.subject.es_PE.fl_str_mv API
Credibility
Twitter
dc.subject.ocde.none.fl_str_mv http://purl.org/pe-repo/ocde/ford#2.02.04
description Twitter is one of the most popular information sources available on the Web. Thus, many studies focus on analyzing the credibility of the information shared there. Most proposals use either the Twitter API or web scraping to extract the data for such analysis. Both extraction techniques have advantages and disadvantages. In this work, we present a study to evaluate their performance and behavior. The motivation for this research is the need to know how to extract online information in order to analyze, in real time, the credibility of content posted on the Web. To do so, we develop a framework that offers both data extraction alternatives and implements a previously proposed credibility model. Our framework is implemented as a Google Chrome extension able to analyze tweets in real time. Results show that both methods produce identical credibility values when a robust normalization process is applied to the text (i.e., the tweet). Moreover, in terms of time performance, web scraping is faster than the Twitter API and is more flexible in obtaining data; however, web scraping is very sensitive to website changes. © 2020 ACM.
publishDate 2020
dc.date.accessioned.none.fl_str_mv 2024-05-30T23:13:38Z
dc.date.available.none.fl_str_mv 2024-05-30T23:13:38Z
dc.date.issued.fl_str_mv 2020
dc.type.none.fl_str_mv info:eu-repo/semantics/article
format article
dc.identifier.uri.none.fl_str_mv https://hdl.handle.net/20.500.12390/2460
dc.identifier.doi.none.fl_str_mv https://doi.org/10.1145/3428757.3429104
dc.identifier.scopus.none.fl_str_mv 2-s2.0-85100336680
url https://hdl.handle.net/20.500.12390/2460
https://doi.org/10.1145/3428757.3429104
identifier_str_mv 2-s2.0-85100336680
dc.language.iso.none.fl_str_mv eng
language eng
dc.relation.ispartof.none.fl_str_mv ACM International Conference Proceeding Series
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Association for Computing Machinery
publisher.none.fl_str_mv Association for Computing Machinery
dc.source.none.fl_str_mv reponame:CONCYTEC-Institucional
instname:Consejo Nacional de Ciencia Tecnología e Innovación
instacron:CONCYTEC
instname_str Consejo Nacional de Ciencia Tecnología e Innovación
instacron_str CONCYTEC
institution CONCYTEC
reponame_str CONCYTEC-Institucional
collection CONCYTEC-Institucional
repository.name.fl_str_mv Repositorio Institucional CONCYTEC
repository.mail.fl_str_mv repositorio@concytec.gob.pe
_version_ 1844883121607016448
Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository where this document or data set is held. CONCYTEC is not responsible for the content (publications and/or data) accessible through the National Digital Repository of Science, Technology and Innovation of Open Access (ALICIA).