Soil organic carbon content mapping along the coast of northern Peru: an ensemble machine learning approach
Article Description
| Authors: | , , |
|---|---|
| Format: | article |
| Publication date: | 2026 |
| Institution: | Instituto Nacional de Innovación Agraria |
| Repository: | INIA-Institucional |
| Language: | English |
| OAI Identifier: | oai:repositorio.inia.gob.pe:20.500.12955/3082 |
| Resource link: | http://hdl.handle.net/20.500.12955/3082 http://doi.org/10.3389/fsoil.2026.1745154 |
| Access level: | open access |
| Subject: | Machine learning; Soil organic carbon; Topographic indices; Vegetation indices; Digital soil mapping; Ensemble modeling; Soil fertility; Arid zones; Watersheds; https://purl.org/pe-repo/ocde/ford#4.01.04 |
| Summary: | Introduction: Soil organic carbon (SOC) content plays a fundamental role in regulating the global carbon cycle and mitigating climate change. It is also a key marker of soil health and a vital plant component. Its spatial distribution varies across dry ecosystems, where it is shaped by climate and land use. This study aimed to estimate and map SOC in the Motupe River Basin, northern Peru, by applying machine learning algorithms and ensemble methods. Methods: Four predictive models were evaluated: Support Vector Regression (SVR), Random Forest (RF), Artificial Neural Network (ANN), and Extreme Gradient Boosting (XGBoost), together with two ensemble approaches (simple averaging and weighted averaging), integrating topographic, climatic, edaphic, and vegetation-index variables. The effect of spatial autocorrelation was minimized with spatial block cross-validation. Uncertainty was measured with bootstrapping and the Prediction Interval Ratio (PIR) derived from 90% prediction intervals. Results and discussion: The best performance was achieved by XGBoost (R² = 0.83), followed by the weighted ensemble (R² = 0.70) and RF (R² = 0.63). The most influential predictors were EVI, GNDVI, temperature, TRI, and pH. SOC contents showed relatively higher concentrations (> 0.7%) in areas with greater vegetation density, within a semi-arid context where SOC levels are generally low. In contrast, lower areas exhibited reduced SOC contents (< 0.6%). The uncertainty analysis indicated that SOC predictions had high to moderate confidence (PIR < 0.2) in the middle and upper zones of the basin, and moderate confidence (0.1–0.2) in the lower areas. The results suggest that machine learning and ensemble methods improve SOC prediction, benefiting the sustainable management of soil fertility and quality in arid and semi-arid ecosystems of northern Peru. |
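The summary names two ensemble schemes (simple and weighted averaging) and an uncertainty measure (PIR from 90% prediction intervals) but does not give formulas. The following is a minimal sketch of how such quantities could be computed, not the authors' implementation. All function names, the site predictions, and the model scores are hypothetical. Two assumptions are made here that the record does not confirm: the ensemble weights are taken as proportional to a per-model skill score, and PIR is taken as the width of the central 90% bootstrap interval divided by the bootstrap mean.

```python
from statistics import mean


def simple_average(predictions):
    """Equal-weight average of per-site predictions across models."""
    return [mean(site) for site in zip(*predictions.values())]


def weighted_average(predictions, scores):
    """Average per-site predictions with weights proportional to each
    model's score (assumed weighting rule, not taken from the record)."""
    total = sum(scores[name] for name in predictions)
    weights = {name: scores[name] / total for name in predictions}
    n_sites = len(next(iter(predictions.values())))
    return [
        sum(weights[name] * predictions[name][i] for name in predictions)
        for i in range(n_sites)
    ]


def percentile(values, q):
    """Percentile of a list by linear interpolation, with q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def prediction_interval_ratio(bootstrap_preds, level=90):
    """Width of the central `level`% interval of the bootstrap predictions
    for one site, divided by their mean (assumed PIR definition)."""
    tail = (100 - level) / 2
    lower = percentile(bootstrap_preds, tail)
    upper = percentile(bootstrap_preds, 100 - tail)
    return (upper - lower) / mean(bootstrap_preds)


# Hypothetical SOC predictions (%) for three sites from the four model types.
predictions = {
    "svr": [0.50, 0.60, 0.80],
    "rf": [0.55, 0.65, 0.75],
    "ann": [0.45, 0.70, 0.85],
    "xgb": [0.60, 0.62, 0.78],
}
# Hypothetical skill scores used only to illustrate the weighting.
scores = {"svr": 0.4, "rf": 0.6, "ann": 0.3, "xgb": 0.8}

print(simple_average(predictions))
print(weighted_average(predictions, scores))
```

Under the assumed definition, a smaller ratio means a narrower interval relative to the predicted value, which matches the summary's reading of low PIR as higher confidence.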
Important note:
The information contained in this record is the sole responsibility of the institution that manages the institutional repository where this document or dataset is hosted. CONCYTEC is not responsible for the content (publications and/or data) accessible through the Repositorio Nacional Digital de Ciencia, Tecnología e Innovación de Acceso Abierto (ALICIA).