Search results - data from processing

1

article

Application of the KDD Process for the Visualization of Integrated Geo-Referenced Textual Data from the Pre-processing Phase

Published by
Gomez, Flavio, Iquira, Diego, Cuadros Valdivia, Ana María

Published 2018

Link

Geo-referenced textual data has been the subject of multiple investigations because it provides opportunities to better understand certain phenomena according to the content that is shared, either online (social networks, blogs, and news) or through repositories (scientific research articles, geo-referenced virtual books, among others). However, the characteristics of this information are studied, analyzed and processed separately, either through its textual components or its geo-spatial components, which offers a separate understanding of the results. In this paper, we propose an integration of textual and geo-spatial components from the pre-processing phase to the visualization stage, as part of the Document Mapping process based on the phases of Knowledge Discovery in Databases (KDD). This achieves two main results: (1) minimizing the problems that arise in the visual phase, su...

2

conference object

Application of the KDD process for the visualization of integrated geo-referenced textual data from the pre-processing phase

Published by
Gomez F., Iquira D., Cuadros A.M.

Published 2018

Link

The present work was achieved thanks to joint work with my advisor, whom I thank for her persistence and tenacity in sharing her teachings with me, and to my distinguished teachers, who have forged knowledge from the first day of classes and who, with nobility and enthusiasm, served as an example to me and my colleagues in the master’s degree in computer science. Thanks also to CONCYTEC, FONDECYT and Cienciactiva for the support and opportunities that made this work possible.

3

article

Manipulation, analysis, and visualization of data from the demographic and family health survey with the R program

Published by
Hernández Vásquez, Akram, Chacón Torrico, Horacio

Published 2019

Link

The Demographic and Family Health Survey (ENDES, in Spanish) is a national population-based survey with representation at the departmental level and area of residence, constituting a source of information on the health status of the Peruvian population. In order to standardize its processing and subsequent reuse by the academic community and other stakeholders, we documented the code for the manipulation, analysis, and visualization of data from the ENDES 2017 health questionnaire, through an example on the prevalence of hypertension and obesity, using the R statistical programming environment and language. The R code is presented and detailed sequentially, as well as the theoretical support of the survey structure for the manipulation of databases, considering that the complex structure of the ENDES could be a potential barrier faced by researchers. Finally, this example can serve as a ...

4

conference object

Language identification with scarce data: A case study from Peru

Published by
Espichán-Linares A., Oncevay-Marcos A.

Published 2018

Link

Language identification is an elemental task in natural language processing, where corpus-based methods achieve the state-of-the-art results in multi-lingual setups. However, there is a need to extend this application to other scenarios with scarce data and multiple classes, analyzing which of the most well-known methods is the best fit. In this regard, Peru offers a great challenge as a multi-cultural and multi-lingual country. Therefore, this study focuses on three steps: (1) to build from scratch a digital annotated corpus for 49 Peruvian indigenous languages and dialects, (2) to fit both standard and deep learning approaches for language identification, and (3) to statistically compare the results obtained. The standard model outperforms the deep learning one, as expected, with 95.9% average precision, and both corpus and model will be advantageous inputs for more complex ta...

5

book chapter

Sparkmach: A Distributed Data Processing System Based on Automated Machine Learning for Big Data

Published by
Bravo-Rocca, Gusseppe, Torres-Robatty, Piero, Fiestas-Iquira, Jose

Published 2019

Link

This work proposes a semi-automated analysis and modeling package for Machine Learning related problems. The library's goal is to reduce the steps involved in a traditional data science roadmap. To do so, Sparkmach takes advantage of Machine Learning techniques to build base models for both classification and regression problems. These models include exploratory data analysis, data preprocessing, feature engineering and modeling. The project has its basis in Pymach, a similar library that handles those steps for small and medium-sized datasets (about ten million rows and a few columns). Sparkmach's central task is to scale Pymach to handle big datasets by using Apache Spark, a distributed engine for large-scale data processing that tackles several data science related problems in a cluster environment. Despite the software nature, Sparkmach can be of use for local ...

6

article

No data to crawl? Monolingual corpus creation from PDF files of truly low-resource languages in Peru

Published by
Bustamante G., Oncevay A., Zariquiey R.

Published 2020

Link

We introduce new monolingual corpora for four indigenous and endangered languages from Peru: Shipibo-konibo, Ashaninka, Yanesha and Yine. Given the total absence of these languages on the web, the extraction and processing of texts from PDF files is relevant in a truly low-resource language scenario. Our procedure for monolingual corpus creation considers language-specific and language-agnostic steps, and focuses on educational PDF files with multilingual sentences, noisy pages and low-structured content. Through an evaluation based on language modelling and character-level perplexity on a subset of manually extracted sentences, we determine that our method allows the creation of clean corpora for the four languages, a key resource for natural language processing tasks nowadays.
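The evaluation idea mentioned in this abstract, character-level perplexity under a language model, can be illustrated with a minimal sketch. It assumes a character bigram model with add-one smoothing, a simplification chosen for brevity and not necessarily the model the authors used. The function names and the boundary markers `^` and `$` are hypothetical.

```python
import math
from collections import Counter


def train_char_bigram(sentences):
    """Count character bigrams; '^' and '$' are assumed boundary markers."""
    pair_counts = Counter()
    context_counts = Counter()
    vocab = {"$"}
    for sentence in sentences:
        chars = ["^"] + list(sentence) + ["$"]
        vocab.update(chars[1:])
        for prev, cur in zip(chars, chars[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    return pair_counts, context_counts, vocab


def char_perplexity(sentence, model):
    """Per-character perplexity of one sentence under the bigram model."""
    pair_counts, context_counts, vocab = model
    chars = ["^"] + list(sentence) + ["$"]
    # Add-one smoothing; one extra slot is reserved for unseen characters.
    size = len(vocab) + 1
    log_prob = 0.0
    for prev, cur in zip(chars, chars[1:]):
        prob = (pair_counts[(prev, cur)] + 1) / (context_counts[prev] + size)
        log_prob += math.log(prob)
    predicted = len(chars) - 1  # number of predicted symbols, including '$'
    return math.exp(-log_prob / predicted)
```

Under this sketch, a model trained on manually extracted sentences would assign lower perplexity to text that resembles them and higher perplexity to noisy or foreign-language text.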

7

article

Some difficulties and risks derived from the application of new technologies in the Colombian judicial process

Published by
Guayacán Ortiz, Juan Carlos

Published 2024

Link

The application of new technologies in the judicial process is a practice that became more common after the 2020 pandemic. This article analyzes three of the main difficulties that the use of these technologies has generated in Colombian judicial processes: the sending of memorials through data messages, the so-called automatic notice of service, and the problems arising in virtual or digital hearings.

8

article

Speaker Identification from Forensic Phonetics: Application of SplitsTree4 Software for Schematic Organization of Linguistic Data

Published by
Jimenez Peña, Jhon, Torres Castillo, Fernando Aaron, Cueva Sanchez, Oscar Esaul

Published 2022

Link

This article presents a proposal for the interpretation and organization of the phonetic characteristics of indubitable and dubitable samples using SplitsTree4 software, with the purpose of clarifying an alleged crime of bribery in the exercise of police functions to the detriment of the State. The dubitable samples were provided by the Public Prosecutor's Office and the indubitable samples were obtained by means of voice sampling; the data were kept anonymous. First, the relevant phonetic features of the samples were categorized; then, they were assigned a binary value of existence or non-existence; finally, the binary information was processed by SplitsTree4 software to regroup the features according to the universe of speakers and show the compatibility between the indicated samples. The results indicate that the SplitsTree4 software complies with the ordering of phoneti...
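The binary coding step described in this abstract can be sketched as follows. This is only an illustration under assumptions: the sample labels and feature names are hypothetical, and the normalized Hamming distance is one common way to compare binary profiles, not a measure confirmed by the source. Exporting the result to a file that SplitsTree4 can read is not shown.

```python
from itertools import combinations


def to_binary_matrix(samples, features):
    """Map each sample to a 0/1 vector: 1 if the feature is present."""
    return {
        name: [1 if feature in present else 0 for feature in features]
        for name, present in samples.items()
    }


def hamming(u, v):
    """Fraction of features on which two binary vectors disagree."""
    return sum(a != b for a, b in zip(u, v)) / len(u)


def pairwise_distances(matrix):
    """Distance for every unordered pair of samples, keyed by sorted names."""
    return {
        (a, b): hamming(matrix[a], matrix[b])
        for a, b in combinations(sorted(matrix), 2)
    }


# Hypothetical data: feature labels f1..f3 stand in for phonetic features.
features = ["f1", "f2", "f3"]
samples = {
    "questioned": {"f1", "f2"},
    "known_A": {"f1", "f2"},
    "known_B": {"f3"},
}
distances = pairwise_distances(to_binary_matrix(samples, features))
```

In this toy example the questioned sample shares every coded feature with `known_A` and none with `known_B`, so a smaller distance indicates greater compatibility between two samples.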

9

article

Extracting and Retargeting Color Mappings from Bitmap Images of Visualizations

Published by
Poco J., Mayhua A., Heer J.

Published 2018

Link

This work was supported by a Paul G. Allen Family Foundation Distinguished Investigator Award and the Moore Foundation Data-Driven Discovery Investigator program. The second author gratefully acknowledges CONCYTEC for a scholarship in support of graduate studies.

10

article

ELLAS Architecture and Process: Collecting and Curating Data on Women’s Presence in STEM

Published by
Berardi, Rita Cristina Galarraga, Auceli, Pedro Henrique Stolarski, Maciel, Cristiano, Fritoli, Rodgers, Dávila Calle, Guillermo Antonio, Guzman, Indira, Mendes, Luana

Published 2024

Link

The underrepresentation of women in STEM fields needs to be highlighted through data to assist decisionmakers and public policy creators in addressing the issue effectively. However, the lack of structured, organized data published openly in this domain is still a reality. To address this problem, a Latin American research network called ELLAS was created. The project’s goal is to develop a platform with Semantic Web-based technologies to structure and concentrate data from Brazil, Peru, and Bolivia, initially. This paper presents the processes defined for the collection and curation of both unstructured and structured data, sourced from scientific articles, social networks, and existing open data. We explore the architecture design in a way that facilitates understanding of the details of the processes and the actors involved for each data source. We present the preliminary results fr...

11

article

Optimization of 2D resistivity data image processing for modeling anomalous zones of disturbed phosphates

Published by
Bakkali, Saad, Amrani, Mahacine

Published 2005

Link

A Schlumberger resistivity record was made over an area of 50 hectares. A new field of processes based on the analytical signal response of the resistivity data was tested in the presence of disturbed phosphate deposits. Geology models were successively obtained from a peak model of the 2D resistivity data. The optimization of the imaging process was based on the optimization of surface tools. The descending analytical extension of the modeled surface over a depth of 30 meters was used for optimization of the modeling. The analytical processes found were consistently useful. The optimization of the phosphate reserve was improved and better constructed.

12

article

A data mining approach to guide students through the enrollment process based on academic performance

Published by
Vialardi Sacín, César, Chue Gallardo, Jorge, Peche, Juan Pablo, Alvarado, Gustavo, Vinatea, Bruno, Estrella, Jhonny, Ortigosa, Álvaro

Published 2011

Link

Student academic performance at universities is crucial for education management systems. Many actions and decisions are made based on it, specifically the enrollment process. During enrollment, students have to decide which courses to sign up for. This research presents the rationale behind the design of a recommender system to support the enrollment process using the students’ academic performance record. To build this system, the CRISP-DM methodology was applied to data from students of the Computer Science Department at University of Lima, Perú. One of the main contributions of this work is the use of two synthetic attributes to improve the relevance of the recommendations made. The first attribute estimates the inherent difficulty of a given course. The second attribute, named potential, is a measure of the competence of a student for a given course based on the grades obtained i...

13

article

Seismic Source of the Chile 2010 Earthquake from Inversion of Geodetic Data and Observations

Published by
Jiménez, César, Saavedra, Miguel, Moggiano, Nabilt, Moreno, Nick

Published 2018

Link

On February 27, 2010, an earthquake of magnitude 8.8 Mw (according to USGS) shook the center-southern region of Chile, with a toll of more than 500 deaths. As coseismic effects, a tsunami was generated that destroyed many coastal villages, and permanent crustal deformation occurred. This coseismic deformation can be quantified by geodetic measurements: GPS, field observations at the littoral and InSAR satellite interferometry data. From the analysis and processing of the geodetic data it is possible to obtain the parameters that characterize the distribution of the seismic source through an inversion process, in which the simulated data is compared with the observed data using the non-negative least squares method. The results show the existence of two main asperities located to the north and south of the epicenter. The maximum slip or dislocation was 17.3 m located in the northern ...
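The non-negative least squares step named in this abstract can be sketched in a few lines. The sketch assumes a linear forward model in which a matrix `A` (hypothetical here) maps the slip on each fault patch to the simulated displacement at each observation point, and it swaps in a plain projected-gradient iteration for whatever solver the authors actually used. It minimizes the squared misfit between `A x` and the observations `b` subject to `x >= 0`.

```python
def nnls_projected_gradient(A, b, iterations=2000):
    """Approximate argmin ||A x - b||^2 subject to x >= 0.

    A is a list of rows and b a list of observations. Assumes A has at
    least one non-zero entry (otherwise the step size is undefined).
    """
    rows, cols = len(A), len(A[0])
    # 1 / ||A||_F^2 never exceeds 1 / lambda_max(A^T A), so the step is safe.
    step = 1.0 / sum(value * value for row in A for value in row)
    x = [0.0] * cols
    for _ in range(iterations):
        residual = [
            sum(A[i][j] * x[j] for j in range(cols)) - b[i]
            for i in range(rows)
        ]
        gradient = [
            sum(A[i][j] * residual[i] for i in range(rows))
            for j in range(cols)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, x[j] - step * gradient[j]) for j in range(cols)]
    return x
```

The non-negativity constraint keeps every estimated slip value from changing sign, which is the usual reason for preferring this formulation over unconstrained least squares in source inversions.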

14

article

Anticipating Subsequent Waves from First Wave Parameters in the Ongoing Covid-19 Pandemic

Published by
Nieto-Chaupis, Huber

Published 2021

Link

This paper focuses on the mathematical construction of a model that describes the statistical properties of a second wave of infections by Coronavirus Disease 2019 (Covid-19 for short) from the information of a first one. The study is grounded in a topological model based on rectangles. Thus, perimeters and distances between rectangles can be related to real data through valid approximations. A full trapezoid model is also proposed. The two-rectangle model appears to fit the Philippines Covid-19 data well. It is seen that when the two rectangles are well separated, the peak of the second wave turns out to be high. From this an exponential formulation is derived, which fits well the exponential morphology seen in Covid-19 data from France.

15

article

Predictive machine learning applying cross industry standard process for data mining for the diagnosis of diabetes mellitus type 2

Published by
Garcia-Rios, Victor, Marres-Salhuana, Marieta, Sierra-Liñan, Fernando, Cabanillas-Carbonell, Michael

Published 2023

Link

Currently, type 2 diabetes mellitus is one of the world's most prevalent diseases and has claimed millions of people's lives. The present research aims to determine the impact of the use of machine learning in the diagnostic process of type 2 diabetes mellitus and to offer a tool that facilitates the diagnosis of the disease quickly and easily. Different machine learning models were designed and compared; random forest was the algorithm that generated the model with the best performance (90.43% accuracy), which was integrated into a web platform, working with the PIMA dataset, which was validated by specialists from the Peruvian League for the Fight against Diabetes organization. The result was a decrease of (A) 88.28% in the information collection time, (B) 99.99% in the diagnosis time, (C) 44.42% in the diagnosis cost, and (D) 100% in the level of difficulty, concluding that the appl...

17

article

Occurrence of Salmonella spp in pork processing plants from Colombia: Systematic review and meta-analysis

Published by
Carrascal-Camacho, Ana Karina, Barrientos-Anzola, Irina, Sampedro, Fernando, Rojas, Fernando, Pérez, Mónica, Dalsgaard, Anders, Pulido-Villamarín A., Adriana, Camacho-Carrillo, María Alejandra

Published 2023

Link

A systematic literature review was conducted in the databases Science Direct, PubMed, SciELO, EBSCO host, Redalyc, ProQuest and Google Scholar, followed by a meta-analysis to estimate the pooled prevalence of Salmonella spp in pig carcasses at slaughter plants in the country. A total of 3007 scientific articles, technical reports, degree theses and technical meeting presentations published between 2009 and 2020 were collected. Those that did not meet the inclusion criteria were removed. Fifty-one studies were reviewed in detail, and 11 documents were selected for use in the meta-analysis. The pooled prevalence of Salmonella spp in pig carcasses was 9.7% (4.0-16.2%). The meta-analysis showed high heterogeneity in the prevalences. The low number of reports related to the prevalence of Salmonella in pork slaughter plants brought to light...

18

article

Estimation of nitrogen content in sugarcane based on vegetation indices derived from Sentinel-2 data

Published by
Filho, Jose Neto Soares, Pereira, Douglas Endrigo Perez, Noronha, Amanda Soares Regis

Published 2025

Link

Sugarcane is cultivated on a large territorial scale worldwide, and there is a constant search for mechanisms to monitor nutrients in the crop production cycle using non-destructive methods. The study, which aimed to estimate the nitrogen content in the sugarcane leaf, was conducted in the 2021/2022 harvest on two commercial fields of dryland cultivars (RB867515 = 50.75 ha) and (CVSP7870 = 48.56 ha) at the Serranópolis-Goiás mill. It evaluated the efficiency of the biochemical vegetation indices Fraction of Absorbed Photosynthetically Active Radiation (fAPAR) and Canopy Chlorophyll Content (CCC), processed using the radiative transfer model (RTM) PROSAIL, compared to the Normalized Difference Vegetation Index (NDVI) and Green Normalized Difference Vegetation Index (GNDVI), processed using mathematical band ratio models. Both were based on a time series of Sentinel-2 data as input variables. The validation ...
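The two band ratio indices named in this abstract have standard definitions, NDVI = (NIR - Red) / (NIR + Red) and GNDVI = (NIR - Green) / (NIR + Green), which the minimal sketch below computes from surface reflectance values. On Sentinel-2 the near-infrared, red and green bands are B8, B4 and B3 respectively; that the study used exactly these bands is an assumption, and the function names are hypothetical.

```python
def normalized_difference(band_a, band_b):
    """Return (a - b) / (a + b), or NaN when the denominator is zero."""
    total = band_a + band_b
    if total == 0:
        return float("nan")
    return (band_a - band_b) / total


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return normalized_difference(nir, red)


def gndvi(nir, green):
    """Green NDVI: the red band is replaced by the green band."""
    return normalized_difference(nir, green)


def index_series(pixels, index):
    """Apply an index to a time series of (nir, other_band) pairs."""
    return [index(nir, other) for nir, other in pixels]
```

Both indices fall between -1 and 1 for non-negative reflectances, with dense green canopy giving values closer to 1.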

19

article

Training processes throughout teaching performance: a study from the beliefs of university professors

Published by
Bailey Moreno, Josefina, Flores Fahara, Manuel

Published 2020

Link

This study aims to understand how the training processes of practicing teachers contribute to the construction of beliefs about teaching at the university. Using a qualitative methodology with a grounded theory approach, professors from public and private universities were selected through theoretical sampling. In-depth interviews were conducted to collect data, which were analytically coded. Among the results, it was found that the teachers' beliefs come from institutional training of a normative type, which emphasizes a functional teaching in didactic techniques, use of technology and innovation, excluding training in knowledge of the discipline they teach. It was also found that teachers carry out self-training processes on their own initiative to update scientific knowledge and develop their teaching practices with strategies that they themselves use to learn, because they believe in their o...

20

article

TRAINING DECISION AS AN ELEMENT OF SCIENTIFIC RESEARCH FROM THE CONCEPTUALIZATION OF STATISTICS AND DATA SCIENCE: THE OBVIOUS, NOT SO OBVIOUS

Published by
Argota-Pérez, George, Argota-Pérez, Yadira, Álvarez-Becerra, Rina María, Reyes-Diaz, María Gilda

Published 2023

Link

The purpose of the study was to describe the need for decision-making from the conceptualized training between Statistics and Data Science. Four elements are key in science: theory, data, methodology, and problem. If data is part of science, then it seems wrong that there is a Data Science, since no methodology from Data Science can decide the “ideal or correct” pattern, given that there are multiple patterns to be understood. On the other hand, if the statistical programs are incapable of analyzing hundreds of thousands of data (it makes no sense when decisions are recognized from a random probabilistic sample and, on the contrary, not considered makes it impossible to make inferences), then the possibility of representing diversity from Statistics is limited since there is centralization in minimizing the sums of the deviations to the mean square and not understanding the diver...
