Sparkmach: A Distributed Data Processing System Based on Automated Machine Learning for Big Data

Bravo-Rocca, Gusseppe; Torres-Robatty, Piero; Fiestas-Iquira, Jose

Article Description

This work proposes a semi-automated analysis and modeling package for machine learning problems. The library's goal is to reduce the number of steps involved in a traditional data science roadmap. To do so, Sparkmach uses machine learning techniques to build base models for both classifica...

Bibliographic Details
Authors:	Bravo-Rocca, Gusseppe; Torres-Robatty, Piero; Fiestas-Iquira, Jose
Format:	book chapter
Publication date:	2019
Institution:	Consejo Nacional de Ciencia Tecnología e Innovación
Repository:	CONCYTEC-Institucional
Language:	English
OAI Identifier:	oai:repositorio.concytec.gob.pe:20.500.12390/1325
Resource link:	https://hdl.handle.net/20.500.12390/1325 https://doi.org/10.1007/978-3-030-11680-4_13
Access level:	open access
Subject:	Statistics; Semi-automated machine learning; Data Science; Data mining; Data engineering; Big data; https://purl.org/pe-repo/ocde/ford#5.08.02

id	CONC_9cbf96e4e71670d4e16698b51cfd84f6
oai_identifier_str	oai:repositorio.concytec.gob.pe:20.500.12390/1325
network_acronym_str	CONC
network_name_str	CONCYTEC-Institucional
repository_id_str	4689
dc.title.none.fl_str_mv	Sparkmach: A Distributed Data Processing System Based on Automated Machine Learning for Big Data
author	Bravo-Rocca, Gusseppe
author_facet	Bravo-Rocca, Gusseppe Torres-Robatty, Piero Fiestas-Iquira, Jose
author_role	author
author2	Torres-Robatty, Piero Fiestas-Iquira, Jose
author2_role	author author
dc.contributor.author.fl_str_mv	Bravo-Rocca, Gusseppe Torres-Robatty, Piero Fiestas-Iquira, Jose
dc.subject.none.fl_str_mv	Statistics
topic	Statistics Semi-automated machine learning Data Science Data mining Data engineering Big data https://purl.org/pe-repo/ocde/ford#5.08.02
dc.subject.es_PE.fl_str_mv	Semi-automated machine learning Data Science Data mining Data engineering Big data
dc.subject.ocde.none.fl_str_mv	https://purl.org/pe-repo/ocde/ford#5.08.02
description	This work proposes a semi-automated analysis and modeling package for machine learning problems. The library's goal is to reduce the number of steps involved in a traditional data science roadmap. To do so, Sparkmach uses machine learning techniques to build base models for both classification and regression problems. Building these models covers exploratory data analysis, data preprocessing, feature engineering, and modeling. The project is based on Pymach, a similar library that handles those steps for small and medium-sized datasets (about ten million rows and a few columns). Sparkmach's central task is to scale Pymach to big datasets by using Apache Spark, a distributed engine for large-scale data processing that tackles several data science problems in a cluster environment. Although the software is designed for distributed use, Sparkmach can also be used in local environments, where it still benefits from the distributed processing tools.
publishDate	2019
dc.date.accessioned.none.fl_str_mv	2024-05-30T23:13:38Z
dc.date.available.none.fl_str_mv	2024-05-30T23:13:38Z
dc.date.issued.fl_str_mv	2019
dc.type.none.fl_str_mv	info:eu-repo/semantics/bookPart
format	bookPart
dc.identifier.uri.none.fl_str_mv	https://hdl.handle.net/20.500.12390/1325
dc.identifier.doi.none.fl_str_mv	https://doi.org/10.1007/978-3-030-11680-4_13
url	https://hdl.handle.net/20.500.12390/1325 https://doi.org/10.1007/978-3-030-11680-4_13
dc.language.iso.none.fl_str_mv	eng
language	eng
dc.relation.ispartof.none.fl_str_mv	Communications in Computer and Information Science
dc.rights.none.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.publisher.none.fl_str_mv	Springer International Publishing
publisher.none.fl_str_mv	Springer International Publishing
dc.source.none.fl_str_mv	reponame:CONCYTEC-Institucional instname:Consejo Nacional de Ciencia Tecnología e Innovación instacron:CONCYTEC
instname_str	Consejo Nacional de Ciencia Tecnología e Innovación
instacron_str	CONCYTEC
institution	CONCYTEC
reponame_str	CONCYTEC-Institucional
collection	CONCYTEC-Institucional
repository.name.fl_str_mv	Repositorio Institucional CONCYTEC
repository.mail.fl_str_mv	repositorio@concytec.gob.pe
_version_	1870084317933207552

Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository where this document or dataset is held. CONCYTEC is not responsible for the content (publications and/or data) accessible through the Repositorio Nacional Digital de Ciencia, Tecnología e Innovación de Acceso Abierto (ALICIA).
