Soil organic carbon content mapping along the coast of northern Peru: an ensemble machine learning approach

Article Description


Bibliographic Details
Authors: Salazar Coronel, Wilian; Carbajal Llosa, Carlos Miguel; Chuchon Remon, Rodolfo Juan
Format: article
Publication Date: 2026
Institution: Instituto Nacional de Innovación Agraria
Repository: INIA-Institucional
Language: English
OAI Identifier: oai:repositorio.inia.gob.pe:20.500.12955/3082
Resource Link: http://hdl.handle.net/20.500.12955/3082
http://doi.org/10.3389/fsoil.2026.1745154
Access Level: open access
Subject: Machine learning
Soil organic carbon
Topographic indices
Vegetation indices
Digital soil mapping
Ensemble modeling
https://purl.org/pe-repo/ocde/ford#4.01.04
Soil fertility; Arid zones; Watersheds
Description
Abstract:
Introduction: Soil organic carbon (SOC) content plays a fundamental role in regulating the global carbon cycle and mitigating climate change. It is also a key marker of soil health and a vital plant component. Its spatial distribution varies in dry ecosystems, where it is affected by climate and land use. This study aimed to estimate and map SOC in the Motupe River Basin, northern Peru, by applying machine learning algorithms and ensemble methods.
Methods: Four predictive models were evaluated: Support Vector Regression (SVR), Random Forest (RF), Artificial Neural Network (ANN), and Extreme Gradient Boosting (XGBoost), together with two ensemble approaches (simple averaging and weighted averaging), integrating topographic, climatic, edaphic, and vegetation-index variables. The effect of spatial autocorrelation was minimized using spatial block cross-validation. Uncertainty was measured with bootstrapping and the Prediction Interval Ratio (PIR) derived from 90% prediction intervals.
Results and discussion: The best performance was achieved by XGBoost (R² = 0.83), the weighted ensemble (R² = 0.70), and RF (R² = 0.63). The most influential predictors were EVI, GNDVI, temperature, TRI, and pH. SOC contents showed relatively higher concentrations (>0.7%) in areas with greater vegetation density, within a semi-arid context where SOC levels are generally low. In contrast, lower areas exhibited reduced SOC contents (<0.6%). The uncertainty analysis indicated that SOC predictions had high to moderate confidence (PIR < 0.2) in the middle and upper zones of the basin, and moderate confidence (0.1–0.2) in the lower areas. The results suggest that machine learning and ensemble methods improve SOC prediction, benefiting the sustainable management of soil fertility and quality in arid and semi-arid ecosystems of northern Peru.
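The abstract names three techniques without giving their formulas: spatial block cross-validation, simple versus weighted ensemble averaging, and a Prediction Interval Ratio (PIR) from bootstrapped 90% prediction intervals. The Python sketch below illustrates one plausible reading of each and is not the authors' implementation. Several things are assumptions: blocks are square grid cells assigned to folds round-robin, ensemble weights are proportional to a per-model skill score such as a validation R², and PIR is the width of the central 90% interval divided by the median bootstrap prediction. All function names and sample values are hypothetical.

```python
import math
import statistics


def spatial_block_folds(coords, block_size, n_folds):
    """Assign samples to folds so that samples in the same square block share a fold.

    Assumption: blocks are grid cells of side `block_size`, assigned round-robin.
    """
    blocks = [(math.floor(x / block_size), math.floor(y / block_size))
              for x, y in coords]
    fold_of_block = {b: i % n_folds for i, b in enumerate(sorted(set(blocks)))}
    return [fold_of_block[b] for b in blocks]


def simple_average(preds):
    """Unweighted mean of the per-model predictions for one location."""
    return sum(preds.values()) / len(preds)


def weighted_average(preds, scores):
    """Mean of per-model predictions weighted by a non-negative skill score.

    Assumption: weights are proportional to the score; negatives are clipped to 0.
    """
    weights = {m: max(scores[m], 0.0) for m in preds}
    total = sum(weights.values())
    if total == 0:
        return simple_average(preds)
    return sum(weights[m] * preds[m] for m in preds) / total


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 1]) of a pre-sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def prediction_interval_ratio(boot_preds, level=0.90):
    """Width of the central `level` interval divided by the median prediction.

    Assumption: this normalisation is illustrative; the record does not define PIR.
    """
    vals = sorted(boot_preds)
    alpha = (1 - level) / 2
    width = percentile(vals, 1 - alpha) - percentile(vals, alpha)
    return width / statistics.median(vals)


# Hypothetical inputs, for illustration only (not data from the study).
coords = [(10, 10), (20, 30), (150, 10), (160, 20), (10, 150)]
folds = spatial_block_folds(coords, block_size=100, n_folds=2)

preds = {"model_a": 0.6, "model_b": 0.8}   # SOC % predicted at one pixel
scores = {"model_a": 1.0, "model_b": 3.0}  # skill scores used as weights
avg_simple = simple_average(preds)
avg_weighted = weighted_average(preds, scores)

boot_preds = [0.60 + 0.02 * i for i in range(11)]  # bootstrap predictions
pir = prediction_interval_ratio(boot_preds)

print(folds, avg_simple, avg_weighted, pir)
```

In a full workflow, each bootstrap prediction would come from refitting the models on a resampled training set, and the fold labels would drive the train/test splits. Both steps are left out here to keep the sketch dependency-free.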