Soil Organic Carbon Modelling with Different Input Variables: The Case of the Western Lowlands of Eritrea
In Eritrea, efforts are under way to tackle widespread land degradation and to strengthen natural resource management and the agricultural sector. However, these efforts lack digital tools for resource assessment, mapping, planning, and monitoring. We therefore developed soil organic carbon (SOC) prediction models for the Western Lowlands of the country, employing six machine learning models with four sets of input variables (36, 27, 15, and 8 variables) obtained with the following variable selection strategies: (1) all proposed SOC predictor variables; (2) reduction of very high multicollinearity (correlation ≥ 0.900); (3) reduction of high multicollinearity (correlation ≥ 0.700); and (4) the Boruta feature selection algorithm. The results revealed that SOC levels were generally low (mean = 0.43%). Grazing lands, rainfed croplands, and irrigated farmlands all exhibited similarly low SOC values, which we attribute to unsustainable land management practices that deplete soil nutrients. In contrast, natural forestlands exhibited significantly higher SOC concentrations, highlighting their potential for soil carbon sequestration. Among the tested models, the XGBoost algorithm using 27 covariates achieved the highest predictive performance (RMSE = 0.118, R² = 0.758, RPD = 2.252), whereas the multiple linear regression (MLR) model with 8 variables yielded the lowest performance (RMSE = 0.141, R² = 0.742, RPD = 1.883). Relative to Boruta-based feature selection, the MLR, PLS, XGBoost, Cubist, and GB models showed performance improvements of 10.41%, 10.06%, 6.72%, 6.50%, and 3.15%, respectively. Rainfall emerged as the most influential predictor of SOC spatial variability in the study area. Other important predictors included temperature, soil taxonomy, the SWIR2 and NIR bands from Landsat 8 imagery, and sand and clay contents. We conclude that reducing very high multicollinearity is essential for improving model performance across all tested algorithms, whereas reducing multicollinearity at the lower 0.700 threshold is not consistently necessary.
The developed SOC prediction models demonstrate robust predictive capabilities and can serve as effective tools for supporting soil fertility management, land restoration planning, and climate change mitigation strategies in the Western Lowlands of Eritrea.
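The correlation-threshold selection and the reported accuracy metrics can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`drop_collinear`, `evaluate`), the greedy keep-first rule, the use of absolute Pearson correlation, and the synthetic data are all assumptions made for illustration. RPD is computed here under the common convention of the standard deviation of the observed values divided by the RMSE.

```python
import numpy as np


def drop_collinear(X, names, threshold=0.9):
    """Greedy filter (assumed rule): keep a variable only if its absolute
    Pearson correlation with every variable already kept is below threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return [names[j] for j in keep]


def evaluate(y_true, y_pred):
    """Return RMSE, R2, and RPD (sample SD of observations / RMSE)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    rmse = np.sqrt(np.mean(residuals ** 2))
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "rmse": rmse,
        "r2": 1.0 - ss_res / ss_tot,
        "rpd": np.std(y_true, ddof=1) / rmse,
    }


# Synthetic covariates (hypothetical): the second column nearly duplicates the first.
rng = np.random.default_rng(0)
rainfall = rng.normal(size=200)
rainfall_copy = rainfall + 0.01 * rng.normal(size=200)
sand = rng.normal(size=200)
X = np.column_stack([rainfall, rainfall_copy, sand])

selected = drop_collinear(X, ["rainfall", "rainfall_copy", "sand"], threshold=0.9)

# Toy SOC values (%) with a constant prediction offset of 0.1.
y_obs = np.array([0.2, 0.4, 0.6, 0.8])
metrics = evaluate(y_obs, y_obs + 0.1)
```

Lowering `threshold` to 0.7 corresponds to the stricter reduction strategy. Which member of a correlated pair is retained depends on column order in this sketch, so a real workflow would likely rank variables by relevance to SOC first.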