A review of regularised estimation methods and cross-validation in spatiotemporal statistics

Otto, P., Fassò, A., Maranzano, P. (2024): A review of regularised estimation methods and cross-validation in spatiotemporal statistics. Statistics Surveys, 18, 299–340. (DOI)

Abstract

This review surveys regularised (penalised) estimation for geostatistical and spatial autoregressive models, and cross-validation (CV) strategies that remain valid under spatial/temporal dependence. Topics include variable selection in fixed effects (LASSO, adaptive LASSO, SCAD, MCP), shrinkage of covariance and precision structures (tapering, graphical LASSO, low-rank/Vecchia/Cholesky factorisations), data-driven learning of spatial interaction matrices, and target-oriented CV schemes (blocked and buffered LTO/LLO/LLTO). The paper emphasises scalability for big geospatial data and software considerations.

Background

Big geospatial datasets require models that are interpretable and computationally feasible. Regularisation provides both: it controls complexity, selects relevant structure, and enables scalable inference in mixed-effects geostatistics and spatial autoregression. The survey unifies these ideas and connects them to CV procedures tailored for dependence.

Key takeaways

  • General objective: \[ \hat{\theta}=\arg\min_{\theta}\;\big\{L(\theta;X,y)+P(\theta,\theta_0,\lambda)\big\}, \] with loss \(L\) (e.g., negative log-likelihood) and penalty \(P\) toward target \(\theta_0\) (e.g., zero for sparsity).
  • Fixed effects: LASSO/adaptive-LASSO/SCAD/MCP for variable selection; group penalties for functional and additive components (a LASSO sketch follows this list).
  • Random effects (geostatistics): covariance tapering for sparsity; graphical LASSO for sparse precision; low-rank and Vecchia approximations for scalability.
  • Spatial autoregression: penalised estimation can select among multiple weight matrices or estimate \(W\) under structural constraints.
  • Cross-validation under dependence: use blocked/buffered schemes—Leave-Time-Out (LTO), Leave-Location-Out (LLO), and Leave-Location-and-Time-Out (LLTO)—to avoid leakage.
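
To make the generic objective concrete, the sketch below specialises the loss to squared error and the penalty to an L1 (LASSO) term shrinking toward \(\theta_0 = 0\), with the penalty weight \(\lambda\) chosen by cross-validation. The synthetic data, variable names, and use of scikit-learn are illustrative assumptions, not code from the paper.

```python
# Minimal sketch: the generic objective L(theta; X, y) + P(theta, theta_0, lambda)
# specialised to squared-error loss with an L1 (LASSO) penalty toward theta_0 = 0.
# Synthetic data and all names are illustrative, not from the paper.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p = 300, 20                      # observations, candidate fixed-effect covariates
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]    # only three covariates are truly relevant
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# LassoCV picks the penalty weight lambda (called alpha) by cross-validation;
# under spatial/temporal dependence this internal CV should itself be blocked
# (see the buffered CV sketch further below).
fit = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(fit.coef_ != 0)
print("selected covariates:", selected, "alpha:", fit.alpha_)
```

Adaptive LASSO, SCAD, or MCP would replace the L1 term with a reweighted or non-convex penalty; the structure of the objective stays the same.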

Intuition

Regularisation accepts a small amount of bias in exchange for a large reduction in variance and, frequently, for computational tractability. In space–time, this often means (i) selecting a small set of informative covariates or basis functions, (ii) enforcing sparsity in precision matrices to reflect conditional independences, or (iii) constraining spatial interactions to interpretable structures. CV must mirror the data’s dependence to yield honest out-of-sample performance.

Regularisation highlights

  • Fixed-effects selection: high-dimensional regressors via (adaptive) LASSO; extensions to functional predictors with penalised splines.
  • Covariance/precision structure: tapering (distance-based zeros), sparse precision via graphical LASSO, and low-rank Cholesky/Vecchia for large N (a tapering sketch follows this list).
  • Spatial weights: boosting or LASSO to choose among candidate \(W_k\); constrained procedures to estimate \(W\) (identifiability via structure or sparsity).
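
The sketch below illustrates covariance tapering: a dense exponential covariance is multiplied elementwise by a compactly supported Wendland taper, so entries beyond the taper range become exact zeros and sparse solvers can be used. The coordinates, parameter values, and names are illustrative assumptions.

```python
# Minimal sketch of covariance tapering: a dense exponential covariance is multiplied
# elementwise by a compactly supported Wendland taper, so entries beyond the taper
# range are exactly zero and sparse linear algebra becomes possible.
# Assumptions: 2-D coordinates, exponential covariance; all values are illustrative.
import numpy as np
from scipy.spatial.distance import cdist
from scipy import sparse

rng = np.random.default_rng(1)
coords = rng.uniform(0, 10, size=(500, 2))          # 500 spatial locations
D = cdist(coords, coords)                           # pairwise distances

sigma2, phi = 1.0, 2.0                              # variance and range parameters
C = sigma2 * np.exp(-D / phi)                       # dense exponential covariance

taper_range = 3.0
h = np.minimum(D / taper_range, 1.0)
taper = (1.0 - h) ** 4 * (4.0 * h + 1.0)            # Wendland-1 taper, zero beyond range

# The elementwise (Schur) product of two valid covariances stays positive definite.
C_tapered = sparse.csr_matrix(C * taper)
print(f"nonzero fraction after tapering: {C_tapered.nnz / D.size:.3f}")
```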

Cross-validation that respects dependence

  • Avoid random k-fold: it overestimates skill under spatial/temporal autocorrelation.
  • LTO/LLO/LLTO: blocked and buffered partitions in time and/or space; ensure (approximate) independence between training and test sets (e.g., hv-blocks, spatial buffers). A buffered LLO sketch follows this list.
  • Diagnostics: compare how prediction error decays with distance from the training data (spatial prediction error profiles, SPEPs) and check the “area of applicability” to avoid extrapolation far from the training support.
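
A minimal sketch of buffered Leave-Location-Out CV, assuming a simple regression model and synthetic space–time data: whole locations are held out in each fold, and training observations at locations within a buffer distance of any test location are dropped to limit leakage through spatial autocorrelation. All names and settings are illustrative.

```python
# Minimal sketch of Leave-Location-Out (LLO) cross-validation with a spatial buffer.
# Data, model choice, and names are illustrative, not code from the paper.
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(2)
n_loc, n_time, p = 60, 10, 5
coords = rng.uniform(0, 10, size=(n_loc, 2))
loc_id = np.repeat(np.arange(n_loc), n_time)        # each location observed n_time times
X = rng.normal(size=(n_loc * n_time, p))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=n_loc * n_time)

k_folds, buffer = 5, 2.0
fold_of_loc = rng.integers(0, k_folds, size=n_loc)  # assign whole locations to folds
scores = []
for k in range(k_folds):
    test_locs = np.flatnonzero(fold_of_loc == k)
    test_mask = np.isin(loc_id, test_locs)
    # drop training observations whose location lies within the buffer of any test location
    dist_to_test = cdist(coords, coords[test_locs]).min(axis=1)
    buffered_locs = np.flatnonzero(dist_to_test < buffer)
    train_mask = ~np.isin(loc_id, buffered_locs)    # also excludes test locations (distance 0)
    model = Ridge().fit(X[train_mask], y[train_mask])
    scores.append(mean_squared_error(y[test_mask], model.predict(X[test_mask])))
print("buffered LLO mean MSE:", np.mean(scores))
```

Leave-Time-Out (LTO) and Leave-Location-and-Time-Out (LLTO) follow the same pattern, blocking on time indices or on both dimensions at once.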

Practical advice

  • Start simple: penalise fixed effects first; add covariance/precision regularisation if residual dependence persists.
  • For big N: prefer tapering or low-rank/Vecchia approximations; for multivariate data, consider sparse precision structures.
  • When interactions matter: define a plausible set of weight matrices \( \{W_k\} \) (distance, contiguity, directionality, networks) and select via penalisation (see the sketch after this list).
  • Always validate with LTO/LLO/LLTO (with buffers). If random k-fold looks “too good”, re-do CV with target-oriented blocking.
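
The sketch below is a deliberately simplified, SLX-style illustration of selecting among candidate weight matrices with a penalty: spatial lags \(W_k x\) of a covariate are built for two candidate \(W_k\) (k-nearest-neighbour and inverse distance) and offered to a LASSO, which keeps the lag structure carrying signal. The penalised SAR/CAR estimators reviewed in the paper are more involved; the data, candidates, and names here are illustrative assumptions.

```python
# Simplified sketch of choosing among candidate spatial weight matrices via penalisation:
# spatial lags W_k x are built for several candidate W_k and offered to a LASSO.
# This is an SLX-style illustration, not the paper's SAR estimators; names are illustrative.
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.linear_model import LassoCV

def row_standardise(W):
    """Row-standardise a nonnegative weight matrix, leaving zero rows untouched."""
    rs = W.sum(axis=1, keepdims=True)
    return np.divide(W, rs, out=np.zeros_like(W), where=rs > 0)

rng = np.random.default_rng(3)
n = 400
coords = rng.uniform(0, 10, size=(n, 2))
D = cdist(coords, coords)
np.fill_diagonal(D, np.inf)                          # no self-neighbours

# Candidate weight matrices: 5-nearest-neighbours and inverse distance with a cutoff.
W_knn = np.zeros((n, n))
nearest = np.argsort(D, axis=1)[:, :5]
np.put_along_axis(W_knn, nearest, 1.0, axis=1)
W_inv = np.where(D < 2.0, 1.0 / D, 0.0)
candidates = {"knn5": row_standardise(W_knn), "invdist": row_standardise(W_inv)}

x = rng.normal(size=n)
y = 1.5 * x + 0.8 * candidates["knn5"] @ x + rng.normal(scale=0.3, size=n)  # true lag: knn5

Z = np.column_stack([x] + [W @ x for W in candidates.values()])
fit = LassoCV(cv=5).fit(Z, y)
print(dict(zip(["x"] + list(candidates), fit.coef_.round(2))))
```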

Applications

Environmental monitoring (geostatistical interpolation and forecasting), regional economics (panel SAR/CAR with variable selection), health and epidemiology (areal models with sparse precision), transportation and networks (directional weights), and remote sensing (functional predictors and basis selection).
