A qualitative and quantitative comparison between Web scraping and API methods for Twitter credibility analysis
Article Description
This research was supported by the FONDO NACIONAL DE DESARROLLO CIENTÍFICO, TECNOLÓGICO Y DE INNOVACIÓN TECNOLÓGICA – FONDECYT as executing entity of CONCYTEC under grant agreement no. 01–2019-FONDECYT-BM-INC.INV in the project RUTAS: Robots para centros Urbanos Turísticos Autónomos y basados en Sem...
Authors: | Dongo I., Cardinale Y., Aguilera A., Martinez F., Quintero Y., Robayo G., Cabeza D. |
---|---|
Format: | article |
Publication Date: | 2021 |
Institution: | Consejo Nacional de Ciencia Tecnología e Innovación |
Repository: | CONCYTEC-Institucional |
Language: | English |
OAI Identifier: | oai:repositorio.concytec.gob.pe:20.500.12390/2961 |
Resource link: | https://hdl.handle.net/20.500.12390/2961; https://doi.org/10.1108/IJWIS-03-2021-0037 |
Access level: | open access |
Subject: | Web scraping; API; Credibility; Qualitative analysis; https://purl.org/pe-repo/ocde/ford#2.02.04 |
id |
CONC_5d764fcd42988a8ecb18b73950b65d53 |
---|---|
dc.title.none.fl_str_mv |
A qualitative and quantitative comparison between Web scraping and API methods for Twitter credibility analysis |
dc.contributor.author.fl_str_mv |
Dongo I. Cardinale Y. Aguilera A. Martinez F. Quintero Y. Robayo G. Cabeza D. |
dc.subject.none.fl_str_mv |
Web scraping |
dc.subject.es_PE.fl_str_mv |
API; Credibility; Qualitative analysis |
dc.subject.ocde.none.fl_str_mv |
https://purl.org/pe-repo/ocde/ford#2.02.04 |
description |
This research was supported by the FONDO NACIONAL DE DESARROLLO CIENTÍFICO, TECNOLÓGICO Y DE INNOVACIÓN TECNOLÓGICA – FONDECYT as executing entity of CONCYTEC under grant agreement no. 01–2019-FONDECYT-BM-INC.INV in the project RUTAS: Robots para centros Urbanos Turísticos Autónomos y basados en Semántica. |
dc.date.accessioned.none.fl_str_mv |
2024-05-30T23:13:38Z |
dc.date.available.none.fl_str_mv |
2024-05-30T23:13:38Z |
dc.date.issued.fl_str_mv |
2021 |
dc.type.none.fl_str_mv |
info:eu-repo/semantics/article |
dc.identifier.uri.none.fl_str_mv |
https://hdl.handle.net/20.500.12390/2961 |
dc.identifier.doi.none.fl_str_mv |
https://doi.org/10.1108/IJWIS-03-2021-0037 |
dc.identifier.scopus.none.fl_str_mv |
2-s2.0-85111661872 |
dc.language.iso.none.fl_str_mv |
eng |
dc.relation.ispartof.none.fl_str_mv |
International Journal of Web Information Systems |
dc.rights.none.fl_str_mv |
info:eu-repo/semantics/openAccess |
dc.publisher.none.fl_str_mv |
Emerald Group Holdings Ltd. |
dc.source.none.fl_str_mv |
reponame:CONCYTEC-Institucional instname:Consejo Nacional de Ciencia Tecnología e Innovación instacron:CONCYTEC |
repository.name.fl_str_mv |
Repositorio Institucional CONCYTEC |
repository.mail.fl_str_mv |
repositorio@concytec.gob.pe |
abstract |
Purpose: This paper aims to perform an exhaustive review of relevant and recent related studies, which reveals that both extraction methods are currently used to analyze credibility on Twitter. Thus, there is clear evidence of the need for different options to extract different data for this purpose. Nevertheless, none of these studies performs a comparative evaluation of both extraction techniques. Moreover, the authors extend a previous comparison, which uses a recently developed framework that offers both data extraction alternatives and implements a previously proposed credibility model, by adding a qualitative evaluation and an analysis of Twitter Application Programming Interface (API) performance from different locations.
Design/methodology/approach: As one of the most popular social platforms, Twitter has been the focus of recent research aimed at analyzing the credibility of the information shared on it. To do so, several proposals use either the Twitter API or Web scraping to extract the data for the analysis. Qualitative and quantitative evaluations are performed to identify the advantages and disadvantages of both extraction methods.
Findings: The study demonstrates the differences between both extraction methods in terms of accuracy and efficiency, and draws attention to further problems in this area that must be addressed in the pursuit of true transparency and legitimacy of information on the Web.
Originality/value: Results show that some Twitter attributes cannot be retrieved by Web scraping. Both methods produce identical credibility values when a robust normalization process is applied to the text (i.e. the tweet). Moreover, in terms of time performance, Web scraping is faster than the Twitter API and more flexible in the data it can obtain; however, Web scraping is very sensitive to website changes. Additionally, the response time of the Twitter API is proportional to the distance from the central server in San Francisco. © 2021, Emerald Publishing Limited. |
keywords |
Web scraping; API; Credibility; Qualitative analysis; Twitter |
Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository where this document or dataset is held. CONCYTEC is not responsible for the contents (publications and/or data) accessible through the Repositorio Nacional Digital de Ciencia, Tecnología e Innovación de Acceso Abierto (ALICIA).