
Multi-scale process modelling and distributed computation for spatial data

Journal Article


Abstract


  • Recent years have seen a huge development in spatial modelling and prediction methodology, driven by the increased availability of remote-sensing data and the reduced cost of distributed-processing technology. It is well known that modelling and prediction using infinite-dimensional process models is not possible with large data sets, and that both approximate models and, often, approximate-inference methods, are needed. The problem of fitting simple global spatial models to large data sets has been solved through the likes of multi-resolution approximations and nearest-neighbour techniques. Here we tackle the next challenge, that of fitting complex, nonstationary, multi-scale models to large data sets. We propose doing this through the use of superpositions of spatial processes with increasing spatial scale and increasing degrees of nonstationarity. Computation is facilitated through the use of Gaussian Markov random fields and parallel Markov chain Monte Carlo based on graph colouring. The resulting model allows for both distributed computing and distributed data. Importantly, it provides opportunities for genuine model and data scalability and yet is still able to borrow strength across large spatial scales. We illustrate a two-scale version on a data set of sea-surface temperature containing on the order of one million observations, and compare our approach to state-of-the-art spatial modelling and prediction methods.
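The idea of parallel Markov chain Monte Carlo based on graph colouring can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy zero-mean Gaussian Markov random field on a 4-neighbour lattice with precision entries Q_ii = kappa + deg(i) and Q_ij = -1 for neighbours, and all function names and the parameter `kappa` are hypothetical. Nodes that share a colour have no edges between them, so their full conditionals depend only on nodes of other colours and each colour class could be updated simultaneously; here the loop over a class runs serially for simplicity.

```python
import random


def grid_neighbours(nrow, ncol):
    """4-neighbour lattice graph: node index -> list of neighbour indices."""
    nbrs = {}
    for r in range(nrow):
        for c in range(ncol):
            candidates = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            nbrs[r * ncol + c] = [
                rr * ncol + cc
                for rr, cc in candidates
                if 0 <= rr < nrow and 0 <= cc < ncol
            ]
    return nbrs


def greedy_colouring(nbrs):
    """Give each node the smallest colour not used by an already-coloured neighbour."""
    colour = {}
    for i in sorted(nbrs):
        used = {colour[j] for j in nbrs[i] if j in colour}
        c = 0
        while c in used:
            c += 1
        colour[i] = c
    return colour


def colour_classes(colour):
    """Group node indices by colour, ordered by colour label."""
    classes = {}
    for i, c in colour.items():
        classes.setdefault(c, []).append(i)
    return [classes[c] for c in sorted(classes)]


def gibbs_sweep(x, nbrs, classes, kappa=0.1, rng=random):
    """One Gibbs sweep over the toy GMRF (assumed precision: see lead-in).

    The full conditional of x[i] is Gaussian with precision Q_ii and mean
    -(1/Q_ii) * sum_j Q_ij * x[j], which here is sum of neighbours / Q_ii.
    """
    for block in classes:
        # Nodes in `block` are mutually non-adjacent, so these draws are
        # conditionally independent and could be farmed out to workers.
        draws = {}
        for i in block:
            q_ii = kappa + len(nbrs[i])
            mean = sum(x[j] for j in nbrs[i]) / q_ii
            draws[i] = rng.gauss(mean, q_ii ** -0.5)
        for i, value in draws.items():
            x[i] = value
    return x


nbrs = grid_neighbours(4, 5)
colour = greedy_colouring(nbrs)
classes = colour_classes(colour)
x = gibbs_sweep([0.0] * len(nbrs), nbrs, classes, rng=random.Random(0))
```

On a regular lattice the greedy colouring yields a checkerboard, so a sweep consists of two blocks; irregular neighbourhood graphs generally need more colours, and the number of colours sets the number of sequential steps per sweep.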

Publication Date


  • 2020

Citation


  • Zammit-Mangion, A., & Rougier, J. (2020). Multi-scale process modelling and distributed computation for spatial data. Statistics and Computing, 30(6), 1609-1627. doi:10.1007/s11222-020-09962-6

Scopus Eid


  • 2-s2.0-85088044305

Start Page


  • 1609

End Page


  • 1627

Volume


  • 30

Issue


  • 6
