Sparkmach: A Distributed Data Processing System Based on Automated Machine Learning for Big Data
Article Description
This work proposes a semi-automated analysis and modeling package for machine learning problems. The library's goal is to reduce the number of steps involved in a traditional data science roadmap. To do so, Sparkmach uses machine learning techniques to build base models for both classifica...
Authors: | Bravo-Rocca, Gusseppe; Torres-Robatty, Piero; Fiestas-Iquira, Jose |
---|---|
Format: | book chapter |
Publication date: | 2019 |
Institution: | Consejo Nacional de Ciencia Tecnología e Innovación |
Repository: | CONCYTEC-Institucional |
Language: | English |
OAI Identifier: | oai:repositorio.concytec.gob.pe:20.500.12390/1325 |
Resource link: | https://hdl.handle.net/20.500.12390/1325 https://doi.org/10.1007/978-3-030-11680-4_13 |
Access level: | open access |
Subject: | Statistics; Semi-automated machine learning; Data Science; Data mining; Data engineering; Big data; https://purl.org/pe-repo/ocde/ford#5.08.02 |
id |
CONC_9cbf96e4e71670d4e16698b51cfd84f6 |
---|---|
oai_identifier_str |
oai:repositorio.concytec.gob.pe:20.500.12390/1325 |
network_acronym_str |
CONC |
network_name_str |
CONCYTEC-Institucional |
repository_id_str |
4689 |
dc.title.none.fl_str_mv |
Sparkmach: A Distributed Data Processing System Based on Automated Machine Learning for Big Data |
title |
Sparkmach: A Distributed Data Processing System Based on Automated Machine Learning for Big Data |
spellingShingle |
Sparkmach: A Distributed Data Processing System Based on Automated Machine Learning for Big Data Bravo-Rocca, Gusseppe Statistics Semi-automated machine learning Data Science Data mining Data engineering Big data https://purl.org/pe-repo/ocde/ford#5.08.02 |
title_short |
Sparkmach: A Distributed Data Processing System Based on Automated Machine Learning for Big Data |
title_full |
Sparkmach: A Distributed Data Processing System Based on Automated Machine Learning for Big Data |
title_fullStr |
Sparkmach: A Distributed Data Processing System Based on Automated Machine Learning for Big Data |
title_full_unstemmed |
Sparkmach: A Distributed Data Processing System Based on Automated Machine Learning for Big Data |
title_sort |
Sparkmach: A Distributed Data Processing System Based on Automated Machine Learning for Big Data |
author |
Bravo-Rocca, Gusseppe |
author_facet |
Bravo-Rocca, Gusseppe; Torres-Robatty, Piero; Fiestas-Iquira, Jose |
author_role |
author |
author2 |
Torres-Robatty, Piero; Fiestas-Iquira, Jose |
author2_role |
author author |
dc.contributor.author.fl_str_mv |
Bravo-Rocca, Gusseppe; Torres-Robatty, Piero; Fiestas-Iquira, Jose |
dc.subject.none.fl_str_mv |
Statistics |
topic |
Statistics; Semi-automated machine learning; Data Science; Data mining; Data engineering; Big data; https://purl.org/pe-repo/ocde/ford#5.08.02 |
dc.subject.es_PE.fl_str_mv |
Semi-automated machine learning; Data Science; Data mining; Data engineering; Big data |
dc.subject.ocde.none.fl_str_mv |
https://purl.org/pe-repo/ocde/ford#5.08.02 |
description |
This work proposes a semi-automated analysis and modeling package for machine learning problems. The library's goal is to reduce the number of steps involved in a traditional data science roadmap. To do so, Sparkmach uses machine learning techniques to build base models for both classification and regression problems. Building these models involves exploratory data analysis, data preprocessing, feature engineering, and modeling. The project is based on Pymach, a similar library that handles those steps for small and medium-sized datasets (about ten million rows and a few columns). Sparkmach's central task is to scale Pymach to big datasets by using Apache Spark, a distributed engine for large-scale data processing that tackles several data science problems in a cluster environment. Although the software is designed for distributed use, Sparkmach can also be used in local environments, where it still benefits from the distributed processing tools. |
publishDate |
2019 |
dc.date.accessioned.none.fl_str_mv |
2024-05-30T23:13:38Z |
dc.date.available.none.fl_str_mv |
2024-05-30T23:13:38Z |
dc.date.issued.fl_str_mv |
2019 |
dc.type.none.fl_str_mv |
info:eu-repo/semantics/bookPart |
format |
bookPart |
dc.identifier.uri.none.fl_str_mv |
https://hdl.handle.net/20.500.12390/1325 |
dc.identifier.doi.none.fl_str_mv |
https://doi.org/10.1007/978-3-030-11680-4_13 |
url |
https://hdl.handle.net/20.500.12390/1325 https://doi.org/10.1007/978-3-030-11680-4_13 |
dc.language.iso.none.fl_str_mv |
eng |
language |
eng |
dc.relation.ispartof.none.fl_str_mv |
Communications in Computer and Information Science |
dc.rights.none.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.publisher.none.fl_str_mv |
Springer International Publishing |
publisher.none.fl_str_mv |
Springer International Publishing |
dc.source.none.fl_str_mv |
reponame:CONCYTEC-Institucional instname:Consejo Nacional de Ciencia Tecnología e Innovación instacron:CONCYTEC |
instname_str |
Consejo Nacional de Ciencia Tecnología e Innovación |
instacron_str |
CONCYTEC |
institution |
CONCYTEC |
reponame_str |
CONCYTEC-Institucional |
collection |
CONCYTEC-Institucional |
repository.name.fl_str_mv |
Repositorio Institucional CONCYTEC |
repository.mail.fl_str_mv |
repositorio@concytec.gob.pe |
_version_ |
1844883038120443904 |
Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository where this document or dataset is held. CONCYTEC is not responsible for the content (publications and/or data) accessible through the Repositorio Nacional Digital de Ciencia, Tecnología e Innovación de Acceso Abierto (ALICIA).