Machine Learning: Comparison of algorithms for determining water quality in the Rímac river
Article description
Evaluating river water quality is necessary to manage its use efficiently. Determining whether the water is healthy requires physicochemical and biological analyses, which involve measuring a series of parameters with various analytical methods...
| Author: | Marroquín Peralta, Juan Miguel |
|---|---|
| Format: | bachelor's thesis |
| Publication date: | 2021 |
| Institution: | Universidad de Lima |
| Repository: | ULIMA-Institucional |
| Language: | Spanish |
| OAI Identifier: | oai:repositorio.ulima.edu.pe:20.500.12724/14791 |
| Resource link: | https://hdl.handle.net/20.500.12724/14791 |
| Access level: | open access |
| Subject: | Aprendizaje automático (Inteligencia artificial); Calidad del agua; Ríos; Lima (Perú); Machine learning; Water quality; Rivers; https://purl.org/pe-repo/ocde/ford#2.02.04 |
| id |
RULI_826ba46034858b5ce08add5c65b5398f |
|---|---|
| oai_identifier_str |
oai:repositorio.ulima.edu.pe:20.500.12724/14791 |
| network_acronym_str |
RULI |
| network_name_str |
ULIMA-Institucional |
| repository_id_str |
3883 |
| dc.title.es_PE.fl_str_mv |
Machine Learning: Comparison of algorithms for determining water quality in the Rímac river |
| author |
Marroquín Peralta, Juan Miguel |
| author_facet |
Marroquín Peralta, Juan Miguel |
| author_role |
author |
| dc.contributor.advisor.fl_str_mv |
García López, Yván Jesús |
| dc.contributor.author.fl_str_mv |
Marroquín Peralta, Juan Miguel |
| dc.subject.es_PE.fl_str_mv |
Aprendizaje automático (Inteligencia artificial) Calidad del agua Ríos Lima (Perú) |
| topic |
Aprendizaje automático (Inteligencia artificial) Calidad del agua Ríos Lima (Perú) Machine learning Water quality Rivers https://purl.org/pe-repo/ocde/ford#2.02.04 |
| dc.subject.en_EN.fl_str_mv |
Machine learning Water quality Rivers |
| dc.subject.ocde.none.fl_str_mv |
https://purl.org/pe-repo/ocde/ford#2.02.04 |
| description |
Evaluating river water quality is necessary to manage its use efficiently. Determining whether the water is healthy requires physicochemical and biological analyses, which involve measuring a series of parameters with various analytical methods that are often tedious and time-consuming. This study compares three machine learning models, Multiple Linear Regression (MLR), Backpropagation Neural Network (BPNN) and Support Vector Regression (SVR), for estimating Dissolved Oxygen (DO) and Biochemical Oxygen Demand (BOD) in order to determine the water quality of the Rímac river. Water samples were collected from 26 stations and non-point sources of contamination along the Rímac river, yielding 624 records taken from 2010 to 2012. The physical and chemical parameters fed into the models include pH, turbidity, total dissolved solids, temperature, electrical conductivity, dissolved oxygen, biochemical oxygen demand, chemical oxygen demand, hardness, chloride, sulfate, calcium, magnesium, and nitrate. The dependent (output) variables are BOD and DO. The independent variables selected for BOD were pH, EC, turbidity, nitrites, TOC, COD, iron, and chlorides. For DO, they were temperature, nitrites, COD, nitrates, STD, chlorides and total solids. Both dependent parameters have 8 independent variables, chosen for having the highest correlation coefficient values. The models were trained on 70% of the data set and validated on the remaining 30%. For BOD estimation, the BPNN with 16 hidden nodes reached R² = 0.857 in training and 0.481 in the test phase; for DO estimation, with 8 hidden nodes, it reached R² = 0.768 in training and 0.605 in the test phase. These values were higher than those of the MLR and SVR models, which indicated that the BPNN was the best choice.
Finally, the classification of water quality as Good, Fair or Poor obtained a precision of 0.88, a sensitivity of 0.86 and an F1-score of 85%, which evidenced its effectiveness in carrying out this process. |
| publishDate |
2021 |
| dc.date.accessioned.none.fl_str_mv |
2021-12-15T16:24:30Z |
| dc.date.available.none.fl_str_mv |
2021-12-15T16:24:30Z |
| dc.date.issued.fl_str_mv |
2021 |
| dc.type.none.fl_str_mv |
info:eu-repo/semantics/bachelorThesis |
| dc.type.other.none.fl_str_mv |
Tesis |
| format |
bachelorThesis |
| dc.identifier.citation.es_PE.fl_str_mv |
Marroquin Peralta, J. M. (2021). Machine Learning: Comparison of algorithms for determining water quality in the Rímac river [Tesis para optar el Título Profesional de Ingeniero de Sistemas, Universidad de Lima]. Repositorio institucional de la Universidad de Lima. https://hdl.handle.net/20.500.12724/14791 |
| dc.identifier.uri.none.fl_str_mv |
https://hdl.handle.net/20.500.12724/14791 |
| identifier_str_mv |
Marroquin Peralta, J. M. (2021). Machine Learning: Comparison of algorithms for determining water quality in the Rímac river [Tesis para optar el Título Profesional de Ingeniero de Sistemas, Universidad de Lima]. Repositorio institucional de la Universidad de Lima. https://hdl.handle.net/20.500.12724/14791 |
| url |
https://hdl.handle.net/20.500.12724/14791 |
| dc.language.iso.none.fl_str_mv |
spa |
| language |
spa |
| dc.relation.ispartof.fl_str_mv |
SUNEDU |
| dc.rights.*.fl_str_mv |
info:eu-repo/semantics/openAccess |
| dc.rights.uri.*.fl_str_mv |
https://creativecommons.org/licenses/by-nc-sa/4.0/ |
| eu_rights_str_mv |
openAccess |
| rights_invalid_str_mv |
https://creativecommons.org/licenses/by-nc-sa/4.0/ |
| dc.format.none.fl_str_mv |
application/pdf |
| dc.publisher.none.fl_str_mv |
Universidad de Lima |
| dc.publisher.country.none.fl_str_mv |
PE |
| publisher.none.fl_str_mv |
Universidad de Lima |
| dc.source.es_PE.fl_str_mv |
Repositorio Institucional - Ulima Universidad de Lima |
| dc.source.none.fl_str_mv |
reponame:ULIMA-Institucional instname:Universidad de Lima instacron:ULIMA |
| instname_str |
Universidad de Lima |
| instacron_str |
ULIMA |
| institution |
ULIMA |
| reponame_str |
ULIMA-Institucional |
| collection |
ULIMA-Institucional |
| bitstream.url.fl_str_mv |
https://repositorio.ulima.edu.pe/bitstream/20.500.12724/14791/3/license_rdf https://repositorio.ulima.edu.pe/bitstream/20.500.12724/14791/5/Juan_Miguel_Marroquin_Peralta.pdf.txt https://repositorio.ulima.edu.pe/bitstream/20.500.12724/14791/7/Tesis.pdf.txt https://repositorio.ulima.edu.pe/bitstream/20.500.12724/14791/2/Tesis.pdf https://repositorio.ulima.edu.pe/bitstream/20.500.12724/14791/4/license.txt https://repositorio.ulima.edu.pe/bitstream/20.500.12724/14791/6/Juan_Miguel_Marroquin_Peralta.pdf.jpg https://repositorio.ulima.edu.pe/bitstream/20.500.12724/14791/8/Tesis.pdf.jpg |
| bitstream.checksum.fl_str_mv |
5a4ffbc01f1b5eb70a835dac0d501661 9449d86616e4bf58ec26010cc0b9c852 9449d86616e4bf58ec26010cc0b9c852 90ef22e3b1254dac121e0a6ab0c1a736 8a4605be74aa9ea9d79846c1fba20a33 4716c22acafc30f2d4c8fffde167d977 4716c22acafc30f2d4c8fffde167d977 |
| bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 MD5 MD5 MD5 MD5 |
| repository.name.fl_str_mv |
Repositorio Universidad de Lima |
| repository.mail.fl_str_mv |
repositorio@ulima.edu.pe |
| _version_ |
1846611986825084928 |
| spelling |
Título Profesional; Ingeniería de sistemas; Universidad de Lima. Facultad de Ingeniería y Arquitectura; Ingeniero de sistemas; https://orcid.org/0000-0001-9577-4188; 61207672087095; https://purl.org/pe-repo/renati/level#tituloProfesional; Ramos Ponce, Oscar Efrain; Quiroz Villalobos, Lennin Paul; Garcia Lopez, Yvan Jesus; https://purl.org/pe-repo/renati/type#tesis; CC-LICENSE: license_rdf (application/rdf+xml; charset=utf-8, 914 bytes); TEXT: Juan_Miguel_Marroquin_Peralta.pdf.txt and Tesis.pdf.txt (Extracted text, text/plain, 64674 bytes each); ORIGINAL: Tesis.pdf (Tesis, application/pdf, 1096172 bytes); LICENSE: license.txt (text/plain; charset=utf-8, 1748 bytes); THUMBNAIL: Juan_Miguel_Marroquin_Peralta.pdf.jpg and Tesis.pdf.jpg (Generated Thumbnail, image/jpeg, 10077 bytes each); 20.500.12724/14791; 2025-09-25 12:03:00.146 |
| score |
13.056711 |
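
The description field above reports a 70%/30% train/validation split of 624 records and compares models by R². The following is a minimal sketch of those two steps in plain Python, not the thesis's actual code. The function names `train_test_split` and `r_squared` are hypothetical, the random shuffle is an assumption (the record does not say how the split was drawn), and the record indices are placeholders for the real samples. The only figures taken from the record are the 624 records, the 70/30 ratio, and the reported BPNN R² values.

```python
import random


def train_test_split(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of the records and split it into train and test parts.

    Assumption: a random split; the source only gives the 70%/30% ratio.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# 624 placeholder indices stand in for the water-quality records.
train, test = train_test_split(range(624))

# BPNN R² values as reported in the record: (training, test).
reported_bpnn = {"BOD": (0.857, 0.481), "DO": (0.768, 0.605)}

# Gap between training and test R² for each target variable.
gaps = {name: round(tr - te, 3) for name, (tr, te) in reported_bpnn.items()}
```

With these inputs the split yields 437 training and 187 test records, and the training-to-test R² gap is larger for BOD than for DO, which is worth keeping in mind when reading the reported training scores.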
Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository holding this document or data set. CONCYTEC is not responsible for the content (publications and/or data) accessible through the Repositorio Nacional Digital de Ciencia, Tecnología e Innovación de Acceso Abierto (ALICIA).