Document Type

Journal Article

Publication Title

Artificial Intelligence in Geosciences

Volume

5

Publisher

Elsevier

School

School of Science

RAS ID

71520

Comments

Fouedjio, F., & Arya, E. (2024). Locally varying geostatistical machine learning for spatial prediction. Artificial Intelligence in Geosciences, 5, 100081. https://doi.org/10.1016/j.aiig.2024.100081

Abstract

Machine learning methods dealing with the spatial auto-correlation of the response variable have garnered significant attention in the context of spatial prediction. Nonetheless, under these methods, the relationship between the response variable and explanatory variables is assumed to be homogeneous throughout the entire study area. This assumption, known as spatial stationarity, is very questionable in real-world situations due to the influence of contextual factors. Therefore, allowing the relationship between the target variable and predictor variables to vary spatially within the study region is more reasonable. However, existing machine learning techniques accounting for the spatially varying relationship between the dependent variable and the predictor variables do not capture the spatial auto-correlation of the dependent variable itself. Moreover, under these techniques, local machine learning models are effectively built using fewer observations, which can lead to well-known issues such as over-fitting and the curse of dimensionality. This paper introduces a novel geostatistical machine learning approach where both the spatial auto-correlation of the response variable and the spatial non-stationarity of the regression relationship between the response and predictor variables are explicitly considered. The basic idea consists of relying on the local stationarity assumption to build a collection of local machine learning models while leveraging the local spatial auto-correlation of the response variable to locally augment the training dataset. The proposed method's effectiveness is showcased via experiments conducted on synthetic spatial data with known characteristics as well as real-world spatial data. In the synthetic (resp. real) case study, the proposed method's predictive accuracy, as indicated by the Root Mean Square Error (RMSE) on the test set, is 17% (resp. 7%) better than that of popular machine learning methods dealing with the response variable's spatial auto-correlation. Additionally, this method is not only valuable for spatial prediction but also offers a deeper understanding of how the relationship between the target and predictor variables varies across space, and it can even be used to investigate the local significance of predictor variables.

DOI

10.1016/j.aiig.2024.100081

Creative Commons License

Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License
