Natural Language Processing and BERT for Author Profiling on the Social Network X

Bibliographic Details
Authors: Petrlik Azabache, Ivan; Rodríguez Rodríguez, Ciro; Lezama Gonzales, Pedro; Torres-Talaverano, Luz; Vásquez Hurtado, Enma Graciela; Hinojosa Pedraza, Karina Inés
Format: article
Publication date: 2025
Institution: Universidad de San Martín de Porres
Repository: Revistas - Universidad de San Martín de Porres
Language: Spanish
OAI Identifier: oai:revistas.usmp.edu.pe:article/3222
Resource link: https://portalrevistas.aulavirtualusmp.pe/index.php/rc/article/view/3222
Access level: open access
Subject: Natural language, BERT, Profiling, Social Network X
Description
Summary: Today X has become one of the most important social networks for expressing opinions and interests on the web. The large amount of data it generates allows automated systems to profile users by gender, nationality, and thematic interests. This process is difficult not only because the content is short, but also because of its ambiguity and the use of several languages. The goal of this proposal is to build a deep learning model based on BERT that can identify demographic and thematic attributes from tweets. Pre-trained BERT and Multilingual BERT models will be used, applied to the PAN Author Profiling Task (CLEF 2019) corpora in English and Spanish. The proposed work will deepen the analysis using supervised classification for gender and nationality, and unsupervised techniques such as LDA and BERTopic for topic extraction. The approach includes preprocessing techniques, dimensionality reduction (UMAP), and evaluation using metrics such as precision and accuracy. The results are expected to demonstrate the applicability of BERT for automatic profiling in marketing, socio-political analysis, and content personalization.
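The summary names precision and accuracy as evaluation metrics for the gender and nationality classifiers. As an illustration only, the following is a minimal sketch of how those two metrics could be computed from predicted and reference labels. It is not the authors' code: the function names and the toy labels are hypothetical, and no model is trained here.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision(y_true, y_pred, positive):
    """TP / (TP + FP) for the class `positive`.

    Returns 0.0 when the class is never predicted, to avoid dividing by zero.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must be of equal length")
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    predicted_pos = sum(1 for p in y_pred if p == positive)
    return true_pos / predicted_pos if predicted_pos else 0.0


def macro_precision(y_true, y_pred):
    """Unweighted mean of per-class precision over all reference classes."""
    classes = sorted(set(y_true))
    return sum(precision(y_true, y_pred, c) for c in classes) / len(classes)


# Hypothetical toy labels standing in for a gender classifier's output.
y_true = ["female", "male", "female", "male", "female"]
y_pred = ["female", "female", "female", "male", "male"]

print("accuracy:", accuracy(y_true, y_pred))
print("precision (female):", precision(y_true, y_pred, "female"))
print("precision (male):", precision(y_true, y_pred, "male"))
print("macro precision:", macro_precision(y_true, y_pred))
```

Reporting precision per class alongside accuracy helps when the classes are imbalanced, since accuracy alone can hide poor performance on the minority class.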