Sparkmach: A Distributed Data Processing System Based on Automated Machine Learning for Big Data

Article Description

This work proposes a semi-automated analysis and modeling package for Machine Learning problems. The library's goal is to reduce the number of steps involved in a traditional data science roadmap. To do so, Sparkmach uses Machine Learning techniques to build base models for both classifica...


Bibliographic Details
Authors: Bravo-Rocca, Gusseppe; Torres-Robatty, Piero; Fiestas-Iquira, Jose
Format: book chapter
Publication Date: 2019
Institution: Consejo Nacional de Ciencia Tecnología e Innovación
Repository: CONCYTEC-Institucional
Language: English
OAI Identifier: oai:repositorio.concytec.gob.pe:20.500.12390/1325
Resource Link: https://hdl.handle.net/20.500.12390/1325
https://doi.org/10.1007/978-3-030-11680-4_13
Access Level: open access
Subject: Statistics
Semi-automated machine learning
Data Science
Data mining
Data engineering
Big data
https://purl.org/pe-repo/ocde/ford#5.08.02
id CONC_9cbf96e4e71670d4e16698b51cfd84f6
oai_identifier_str oai:repositorio.concytec.gob.pe:20.500.12390/1325
network_acronym_str CONC
network_name_str CONCYTEC-Institucional
repository_id_str 4689
dc.title.none.fl_str_mv Sparkmach: A Distributed Data Processing System Based on Automated Machine Learning for Big Data
dc.contributor.author.fl_str_mv Bravo-Rocca, Gusseppe
Torres-Robatty, Piero
Fiestas-Iquira, Jose
dc.subject.none.fl_str_mv Statistics
dc.subject.es_PE.fl_str_mv Semi-automated machine learning
Data Science
Data mining
Data engineering
Big data
dc.subject.ocde.none.fl_str_mv https://purl.org/pe-repo/ocde/ford#5.08.02
description This work proposes a semi-automated analysis and modeling package for Machine Learning problems. The library's goal is to reduce the number of steps involved in a traditional data science roadmap. To do so, Sparkmach uses Machine Learning techniques to build base models for both classification and regression problems. Building these models covers exploratory data analysis, data preprocessing, feature engineering, and modeling. The project is based on Pymach, a similar library that handles those steps for small and medium-sized datasets (about ten million rows and a few columns). Sparkmach's central task is to scale Pymach to big datasets by using Apache Spark, a distributed engine for large-scale data processing that tackles several data science problems in a cluster environment. Although the software is designed for distributed execution, Sparkmach can also be used in local environments, where it still benefits from the distributed processing tools.
publishDate 2019
dc.date.accessioned.none.fl_str_mv 2024-05-30T23:13:38Z
dc.date.available.none.fl_str_mv 2024-05-30T23:13:38Z
dc.date.issued.fl_str_mv 2019
dc.type.none.fl_str_mv info:eu-repo/semantics/bookPart
format bookPart
dc.identifier.uri.none.fl_str_mv https://hdl.handle.net/20.500.12390/1325
dc.identifier.doi.none.fl_str_mv https://doi.org/10.1007/978-3-030-11680-4_13
url https://hdl.handle.net/20.500.12390/1325
https://doi.org/10.1007/978-3-030-11680-4_13
dc.language.iso.none.fl_str_mv eng
language eng
dc.relation.ispartof.none.fl_str_mv Communications in Computer and Information Science
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Springer International Publishing
publisher.none.fl_str_mv Springer International Publishing
dc.source.none.fl_str_mv reponame:CONCYTEC-Institucional
instname:Consejo Nacional de Ciencia Tecnología e Innovación
instacron:CONCYTEC
instname_str Consejo Nacional de Ciencia Tecnología e Innovación
instacron_str CONCYTEC
institution CONCYTEC
reponame_str CONCYTEC-Institucional
collection CONCYTEC-Institucional
repository.name.fl_str_mv Repositorio Institucional CONCYTEC
repository.mail.fl_str_mv repositorio@concytec.gob.pe
_version_ 1844883038120443904
Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository where this document or dataset is held. CONCYTEC is not responsible for the content (publications and/or data) accessible through the Repositorio Nacional Digital de Ciencia, Tecnología e Innovación de Acceso Abierto (ALICIA), the national open-access digital repository for science, technology, and innovation.