A qualitative and quantitative comparison between Web scraping and API methods for Twitter credibility analysis

Article Description

This research was supported by the FONDO NACIONAL DE DESARROLLO CIENTÍFICO, TECNOLÓGICO Y DE INNOVACIÓN TECNOLÓGICA – FONDECYT as executing entity of CONCYTEC under grant agreement no. 01–2019-FONDECYT-BM-INC.INV in the project RUTAS: Robots para centros Urbanos Turísticos Autónomos y basados en Sem...


Bibliographic Details
Authors: Dongo I., Cardinale Y., Aguilera A., Martinez F., Quintero Y., Robayo G., Cabeza D.
Format: article
Publication Date: 2021
Institution: Consejo Nacional de Ciencia Tecnología e Innovación
Repository: CONCYTEC-Institucional
Language: English
OAI Identifier: oai:repositorio.concytec.gob.pe:20.500.12390/2961
Resource link: https://hdl.handle.net/20.500.12390/2961
https://doi.org/10.1108/IJWIS-03-2021-0037
Access level: open access
Subject: Web scraping
API
Credibility
Qualitative analysis
Twitter
https://purl.org/pe-repo/ocde/ford#2.02.04
id CONC_5d764fcd42988a8ecb18b73950b65d53
oai_identifier_str oai:repositorio.concytec.gob.pe:20.500.12390/2961
network_acronym_str CONC
network_name_str CONCYTEC-Institucional
repository_id_str 4689
dc.title.none.fl_str_mv A qualitative and quantitative comparison between Web scraping and API methods for Twitter credibility analysis
title A qualitative and quantitative comparison between Web scraping and API methods for Twitter credibility analysis
spellingShingle A qualitative and quantitative comparison between Web scraping and API methods for Twitter credibility analysis
Dongo I.
Web scraping
API
Credibility
Qualitative analysis
Twitter
https://purl.org/pe-repo/ocde/ford#2.02.04
author Dongo I.
author_facet Dongo I.
Cardinale Y.
Aguilera A.
Martinez F.
Quintero Y.
Robayo G.
Cabeza D.
author_role author
author2 Cardinale Y.
Aguilera A.
Martinez F.
Quintero Y.
Robayo G.
Cabeza D.
author2_role author
author
author
author
author
author
dc.contributor.author.fl_str_mv Dongo I.
Cardinale Y.
Aguilera A.
Martinez F.
Quintero Y.
Robayo G.
Cabeza D.
dc.subject.none.fl_str_mv Web scraping
topic Web scraping
API
Credibility
Qualitative analysis
Twitter
https://purl.org/pe-repo/ocde/ford#2.02.04
dc.subject.es_PE.fl_str_mv API
Credibility
Qualitative analysis
Twitter
dc.subject.ocde.none.fl_str_mv https://purl.org/pe-repo/ocde/ford#2.02.04
description This research was supported by the FONDO NACIONAL DE DESARROLLO CIENTÍFICO, TECNOLÓGICO Y DE INNOVACIÓN TECNOLÓGICA – FONDECYT as executing entity of CONCYTEC under grant agreement no. 01–2019-FONDECYT-BM-INC.INV in the project RUTAS: Robots para centros Urbanos Turísticos Autónomos y basados en Semántica.
publishDate 2021
dc.date.accessioned.none.fl_str_mv 2024-05-30T23:13:38Z
dc.date.available.none.fl_str_mv 2024-05-30T23:13:38Z
dc.date.issued.fl_str_mv 2021
dc.type.none.fl_str_mv info:eu-repo/semantics/article
format article
dc.identifier.uri.none.fl_str_mv https://hdl.handle.net/20.500.12390/2961
dc.identifier.doi.none.fl_str_mv https://doi.org/10.1108/IJWIS-03-2021-0037
dc.identifier.scopus.none.fl_str_mv 2-s2.0-85111661872
url https://hdl.handle.net/20.500.12390/2961
https://doi.org/10.1108/IJWIS-03-2021-0037
identifier_str_mv 2-s2.0-85111661872
dc.language.iso.none.fl_str_mv eng
language eng
dc.relation.ispartof.none.fl_str_mv International Journal of Web Information Systems
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Emerald Group Holdings Ltd.
publisher.none.fl_str_mv Emerald Group Holdings Ltd.
dc.source.none.fl_str_mv reponame:CONCYTEC-Institucional
instname:Consejo Nacional de Ciencia Tecnología e Innovación
instacron:CONCYTEC
instname_str Consejo Nacional de Ciencia Tecnología e Innovación
instacron_str CONCYTEC
institution CONCYTEC
reponame_str CONCYTEC-Institucional
collection CONCYTEC-Institucional
repository.name.fl_str_mv Repositorio Institucional CONCYTEC
repository.mail.fl_str_mv repositorio@concytec.gob.pe
_version_ 1839175700645412864
spelling Dongo I.; Cardinale Y.; Aguilera A.; Martinez F.; Quintero Y.; Robayo G.; Cabeza D.; 2024-05-30T23:13:38Z; 2021; https://hdl.handle.net/20.500.12390/2961; https://doi.org/10.1108/IJWIS-03-2021-0037; 2-s2.0-85111661872; Consejo Nacional de Ciencia, Tecnología e Innovación Tecnológica - Concytec; eng; Emerald Group Holdings Ltd.; International Journal of Web Information Systems; info:eu-repo/semantics/openAccess; Web scraping; API; Credibility; Qualitative analysis; Twitter; https://purl.org/pe-repo/ocde/ford#2.02.04; info:eu-repo/semantics/article; reponame:CONCYTEC-Institucional; instname:Consejo Nacional de Ciencia Tecnología e Innovación; instacron:CONCYTEC; 20.500.12390/2961; oai:repositorio.concytec.gob.pe:20.500.12390/2961; 2024-05-30 16:12:31.545; http://purl.org/coar/access_right/c_14cb; info:eu-repo/semantics/closedAccess; metadata only access; https://repositorio.concytec.gob.pe; Repositorio Institucional CONCYTEC; repositorio@concytec.gob.pe
<Publication xmlns="https://www.openaire.eu/cerif-profile/1.1/" id="aec1045d-aee5-4279-bf8e-6894bd533f30">
<Type xmlns="https://www.openaire.eu/cerif-profile/vocab/COAR_Publication_Types">http://purl.org/coar/resource_type/c_1843</Type>
<Language>eng</Language>
<Title>A qualitative and quantitative comparison between Web scraping and API methods for Twitter credibility analysis</Title>
<PublishedIn> <Publication> <Title>International Journal of Web Information Systems</Title> </Publication> </PublishedIn>
<PublicationDate>2021</PublicationDate>
<DOI>https://doi.org/10.1108/IJWIS-03-2021-0037</DOI>
<SCP-Number>2-s2.0-85111661872</SCP-Number>
<Authors>
<Author> <DisplayName>Dongo I.</DisplayName> <Person id="rp05705" /> <Affiliation> <OrgUnit> </OrgUnit> </Affiliation> </Author>
<Author> <DisplayName>Cardinale Y.</DisplayName> <Person id="rp05703" /> <Affiliation> <OrgUnit> </OrgUnit> </Affiliation> </Author>
<Author> <DisplayName>Aguilera A.</DisplayName> <Person id="rp06233" /> <Affiliation> <OrgUnit> </OrgUnit> </Affiliation> </Author>
<Author> <DisplayName>Martinez F.</DisplayName> <Person id="rp08386" /> <Affiliation> <OrgUnit> </OrgUnit> </Affiliation> </Author>
<Author> <DisplayName>Quintero Y.</DisplayName> <Person id="rp06234" /> <Affiliation> <OrgUnit> </OrgUnit> </Affiliation> </Author>
<Author> <DisplayName>Robayo G.</DisplayName> <Person id="rp08387" /> <Affiliation> <OrgUnit> </OrgUnit> </Affiliation> </Author>
<Author> <DisplayName>Cabeza D.</DisplayName> <Person id="rp08385" /> <Affiliation> <OrgUnit> </OrgUnit> </Affiliation> </Author>
</Authors>
<Editors> </Editors>
<Publishers> <Publisher> <DisplayName>Emerald Group Holdings Ltd.</DisplayName> <OrgUnit /> </Publisher> </Publishers>
<Keyword>Web scraping</Keyword>
<Keyword>API</Keyword>
<Keyword>Credibility</Keyword>
<Keyword>Qualitative analysis</Keyword>
<Keyword>Twitter</Keyword>
<Abstract>Purpose: This paper aims to perform an exhaustive review of relevant and recent related studies, which reveals that both extraction methods are currently used to analyze credibility on Twitter. Thus, there is clear evidence of the need for different options to extract different data for this purpose. Nevertheless, none of these studies performs a comparative evaluation of both extraction techniques. Moreover, the authors extend a previous comparison, which uses a recently developed framework that offers both alternatives for data extraction and implements a previously proposed credibility model, by adding a qualitative evaluation and a Twitter Application Programming Interface (API) performance analysis from different locations.
Design/methodology/approach: As one of the most popular social platforms, Twitter has been the focus of recent research aimed at analyzing the credibility of the shared information. To do so, several proposals use either the Twitter API or Web scraping to extract the data to perform the analysis. Qualitative and quantitative evaluations are performed to discover the advantages and disadvantages of both extraction methods.
Findings: The study demonstrates the differences in terms of accuracy and efficiency of both extraction methods and highlights many more problems related to this area that are relevant to pursuing true transparency and legitimacy of information on the Web.
Originality/value: Results report that some Twitter attributes cannot be retrieved by Web scraping. Both methods produce identical credibility values when a robust normalization process is applied to the text (i.e. the tweet). Moreover, concerning time performance, Web scraping is faster than the Twitter API and is more flexible in terms of obtaining data; however, Web scraping is very sensitive to website changes. Additionally, the response time of the Twitter API is proportional to the distance from the central server in San Francisco. © 2021, Emerald Publishing Limited.</Abstract>
<Access xmlns="http://purl.org/coar/access_right" > </Access>
</Publication>
Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository where this document or dataset is held. CONCYTEC is not responsible for the content (publications and/or data) accessible through the Repositorio Nacional Digital de Ciencia, Tecnología e Innovación de Acceso Abierto (ALICIA).