Dissertation (May 15, 2026): Edson Landim de Almeida

Student: Edson Landim de Almeida

Title: Estimation of Soil Organic Carbon Using Quantile Approaches and Explainable Artificial Intelligence

Advisors: Diego Brandão (advisor) and Jorge Soares (co-advisor)

Committee: Diego Brandão (Cefet/RJ), Jorge Soares (Cefet/RJ), Kele Belloze (Cefet/RJ) and Marcos Ceddia (UFRRJ)

Day/hour: May 15, 2026, 9 a.m.

Room: https://teams.microsoft.com/meet/272684253086940?p=V1Gq8L6jg3r6sPZIuq

Abstract: Soil organic carbon (SOC) plays a central role in biogeochemical cycles, soil fertility, and climate change mitigation, constituting one of the largest carbon reservoirs in the terrestrial system. Although direct quantification of SOC using laboratory methods is accurate, it is costly and time-consuming, which limits sampling density in large-scale surveys. In this context, statistical and machine learning techniques have been used to estimate carbon content from more easily obtained edaphic and environmental attributes. This study investigates the estimation of soil organic carbon using pedological data provided by the Brazilian Agricultural Research Corporation (Embrapa), integrating multiple linear regression, Random Forest, Quantile Regression Forests (QRF), and Explainable Artificial Intelligence techniques based on SHAP. Multiple linear regression was used as an interpretable baseline, while Random Forest was employed to capture nonlinear relationships and interactions among soil attributes. The QRF approach enabled the analysis of different regions of the conditional distribution of SOC, considering the 0.1, 0.5, and 0.9 quantiles. Results showed that SOC estimation is characterized by high variability, asymmetry, and heteroscedasticity, with relevant differences among the evaluated land uses. The conventional Random Forest model captured general trends in the data but exhibited greater error dispersion in the higher-carbon ranges. The QRF model expanded the analysis by enabling the representation of lower and upper bounds on the response, providing a more informative interpretation of the uncertainty associated with the predictions. The SHAP analysis indicated the relevance of attributes related to soil color, depth, and texture, in agreement with pedological knowledge.
Overall, the results indicate that integrating tree-based models, quantile approaches, and explainability techniques constitutes a promising strategy for estimating and interpreting soil organic carbon, thereby advancing methodologies for digital soil mapping and environmental monitoring.