--- title: "An introduction to the rSPDE package" author: "David Bolin and Alexandre B. Simas" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{An introduction to the rSPDE package} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} bibliography: references.bib --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) set.seed(1) ``` ```{r inla_link, include = FALSE} inla_link <- function() { sprintf("[%s](%s)", "`R-INLA`", "https://www.r-inla.org") } ``` ## Introduction In this vignette we provide a brief introduction to the `rSPDE` package. The main approach for constructing the rational approximations is the covariance-based rational SPDE approach of [@xiong22](https://doi.org/10.1080/10618600.2023.2231051). The package contains three main "families" of functions that implement the approach: * an interface to `r inla_link()`; * an interface to [`inlabru`](http://inlabru.org/); * a stand-alone implementation of the approach. To illustrate these different functions, we begin by using the package to generate a simple data set, which then will be analyzed using the different approaches. Further details on each family of functions is given in the following additional vignettes: * [R-INLA implementation of the rational SPDE approach](rspde_inla.html) * [inlabru implementation of the rational SPDE approach](rspde_inlabru.html) * [Rational approximation with the rSPDE package](rspde_cov.html) The `rSPDE` package also has a separate group of functions for performing the operator-based rational approximations introduced in [@bolin19](https://www.tandfonline.com/doi/full/10.1080/10618600.2019.1665537). These are especially useful when performing rational approximations for fractional SPDE models with non-Gaussian noise. An example in which such approximation is suitable is when one has the so-called type-G Lévy noises. We refer the reader to [@wallin15](https://onlinelibrary.wiley.com/doi/full/10.1111/sjos.12141), [@bolin13](https://onlinelibrary.wiley.com/doi/abs/10.1111/sjos.12046) and [@asar20](https://rss.onlinelibrary.wiley.com/doi/pdf/10.1111/rssc.12405) for examples of models driven by type-G Lévy noises. We also refer the reader to the [`ngme` package](https://github.com/davidbolin/ngme) where one can fit such models. We explore the functions for performing the operator-based rational approximation on the vignette: * [Operator-based rational approximation with the rSPDE package](rspde_base.html) ## A toy data set We begin by generating a toy data set. For this illustration, we will simulate a data set on a two-dimensional spatial domain. To this end, we need to construct a mesh over the domain of interest and then compute the matrices needed to define the operator. We will use the `r inla_link()` package to create the mesh and obtain the matrices of interest. We will begin by defining a mesh over $[0,1]\times [0, 1]$: ```{r, message=FALSE, warning=FALSE, fig.align='center'} library(fmesher) n_loc <- 1000 loc_2d_mesh <- matrix(runif(n_loc * 2), n_loc, 2) mesh_2d <- fm_mesh_2d( loc = loc_2d_mesh, cutoff = 0.05, max.edge = c(0.1, 0.5) ) plot(mesh_2d, main = "") ``` We now use the `matern.operators()` function to construct a rational SPDE approximation of order $m=2$ for a Gaussian random field with a Matérn covariance function on $[0,1]\times [0, 1]$. We choose $\nu=0.5$ which corresponds to exponential covariance. We also set $\sigma=1$ and the range as $0.2$. ```{r, message=FALSE} library(rSPDE) sigma <- 2 range <- 0.25 nu <- 1.3 kappa <- sqrt(8 * nu) / range op <- matern.operators( mesh = mesh_2d, nu = nu, range = range, sigma = sigma, m = 2, parameterization = "matern" ) tau <- op$tau ``` We can now use the `simulate` function to simulate a realization of the field $u$: ```{r} u <- simulate(op) ``` Let us now consider a simple Gaussian linear model where the spatial field $u(\mathbf{s})$ is observed at $m$ locations, $\{\mathbf{s}_1 , \ldots , \mathbf{s}_m \}$ under Gaussian measurement noise. For each $i = 1,\ldots,m,$ we have $$ \begin{align} y_i &= u(\mathbf{s}_i)+\varepsilon_i\\ \end{align}, $$ where $\varepsilon_1,\ldots,\varepsilon_{m}$ are iid normally distributed with mean 0 and standard deviation 0.1. To generate a data set `y` from this model, we first draw some observation locations at random in the domain and then use the `spde.make.A()` functions (that wraps the functions `fm_basis()`, `fm_block()` and `fm_row_kron()` of the `fmesher` package) to construct the observation matrix which can be used to evaluate the simulated field $u$ at the observation locations. After this we simply add the measurment noise. ```{r} A <- spde.make.A( mesh = mesh_2d, loc = loc_2d_mesh ) sigma.e <- 0.1 y <- A %*% u + rnorm(n_loc) * sigma.e ``` The generated data can be seen in the following image. ```{r,fig.align = "center", echo=TRUE} library(ggplot2) library(viridis) df <- data.frame(x1 = as.double(loc_2d_mesh[, 1]), x2 = as.double(loc_2d_mesh[, 2]), y = as.double(y)) ggplot(df, aes(x = x1, y = x2, col = y)) + geom_point() + scale_color_viridis() ``` The simulated random field is shown in the following figure. ```{r, fig.show='hold', fig.align = "center",echo=TRUE, message=FALSE,warning=FALSE} proj <- fm_evaluator(mesh_2d, dims = c(100, 100)) field <- fm_evaluate(proj, field = as.vector(u)) field.df <- data.frame(x1 = proj$lattice$loc[,1], x2 = proj$lattice$loc[,2], y = as.vector(field)) ggplot(field.df, aes(x = x1, y = x2, fill = y)) + geom_raster() + xlim(0,1) + ylim(0,1) + scale_fill_viridis() ``` ## Fitting the model with `R-INLA` implementation of the rational SPDE approach We will now fit the model of the toy data set using our `r inla_link()` implementation of the rational SPDE approach. Further details on this implementation can be found in [R-INLA implementation of the rational SPDE approach](rspde_inla.html). We begin by loading the `INLA` package and creating the $A$ matrix, the index, and the `inla.stack` object. ```{r} library(INLA) Abar <- rspde.make.A(mesh = mesh_2d, loc = loc_2d_mesh) mesh.index <- rspde.make.index(name = "field", mesh = mesh_2d) st.dat <- inla.stack( data = list(y = as.vector(y)), A = Abar, effects = mesh.index ) ``` We now create the model object. We need to set an upper bound for the smoothness parameter $\nu$. The default value for this is $4$. If we increase the upper bound for $\nu$ we also increase the computational cost, and if we decrease the upper bound we also decrease the computatoinal cost. For this example we set `nu.upper.bound=2`. See the [R-INLA implementation of the rational SPDE approach](rspde_inla.html) for further details. ```{r} rspde_model <- rspde.matern( mesh = mesh_2d, nu.upper.bound = 2, parameterization = "spde" ) ``` Finally, we create the formula and fit the model to the data: ```{r message=FALSE, warning=FALSE} f <- y ~ -1 + f(field, model = rspde_model) rspde_fit <- inla(f, data = inla.stack.data(st.dat), family = "gaussian", control.predictor = list(A = inla.stack.A(st.dat)), num.threads = "1:1" ) ``` We can get a summary of the fit: ```{r} summary(rspde_fit) ``` To get a summary of the fit of the random field only, we can do the following: ```{r} result_fit <- rspde.result(rspde_fit, "field", rspde_model) summary(result_fit) tau <- op$tau result_df <- data.frame( parameter = c("tau", "kappa", "nu"), true = c(tau, kappa, nu), mean = c( result_fit$summary.tau$mean, result_fit$summary.kappa$mean, result_fit$summary.nu$mean ), mode = c( result_fit$summary.tau$mode, result_fit$summary.kappa$mode, result_fit$summary.nu$mode ) ) print(result_df) ``` We can also obtain the summary in the `matern` parameterization by setting the `parameterization` argument to `matern`: ```{r} result_fit_matern <- rspde.result(rspde_fit, "field", rspde_model, parameterization = "matern") summary(result_fit_matern) result_df_matern <- data.frame( parameter = c("sigma", "range", "nu"), true = c(sigma, range, nu), mean = c( result_fit_matern$summary.std.dev$mean, result_fit_matern$summary.range$mean, result_fit_matern$summary.nu$mean ), mode = c( result_fit_matern$summary.std.dev$mode, result_fit_matern$summary.range$mode, result_fit_matern$summary.nu$mode ) ) print(result_df_matern) ``` ## Kriging with `R-INLA` implementation of the rational SPDE approach Let us now obtain predictions (i.e., do kriging) of the latent field on a dense grid in the region. We begin by creating the grid of locations where we want to compute the predictions. To this end, we can use the `rspde.mesh.projector()` function. This function has the same arguments as the function `inla.mesh.projector()` the only difference being that the rSPDE version also has an argument `nu` and an argument `rspde.order`. Thus, we proceed in the same fashion as we would in `r inla_link()`'s standard SPDE implementation: ```{r create_proj_grid} projgrid <- rspde.mesh.projector(mesh_2d, xlim = c(0, 1), ylim = c(0, 1) ) ``` This lattice contains 100 × 100 locations (the default) which are shown in the following figure: ```{r plot_prd, fig.align='center'} coord.prd <- projgrid$lattice$loc plot(coord.prd, type = "p", cex = 0.1) ``` Let us now calculate the predictions jointly with the estimation. To this end, first, we begin by linking the prediction coordinates to the mesh nodes through an $A$ matrix ```{r A_prd} A.prd <- projgrid$proj$A ``` We now make a stack for the prediction locations. We have no data at the prediction locations, so we set `y= NA`. We then join this stack with the estimation stack. ```{r stk.prd} ef.prd <- list(c(mesh.index)) st.prd <- inla.stack( data = list(y = NA), A = list(A.prd), tag = "prd", effects = ef.prd ) st.all <- inla.stack(st.dat, st.prd) ``` Doing the joint estimation takes a while, and we therefore turn off the computation of certain things that we are not interested in, such as the marginals for the random effect. We will also use a simplified integration strategy (actually only using the posterior mode of the hyper-parameters) through the command `control.inla = list(int.strategy = "eb")`, i.e. empirical Bayes: ```{r fit_prd, message=FALSE, warning=FALSE} rspde_fitprd <- inla(f, family = "Gaussian", data = inla.stack.data(st.all), control.predictor = list( A = inla.stack.A(st.all) ), control.inla = list(int.strategy = "eb"), num.threads = "1:1" ) ``` We then extract the indices to the prediction nodes and then extract the mean and the standard deviation of the response: ```{r stk.mean.sd} id.prd <- inla.stack.index(st.all, "prd")$data m.prd <- matrix(rspde_fitprd$summary.fitted.values$mean[id.prd], 100, 100) sd.prd <- matrix(rspde_fitprd$summary.fitted.values$sd[id.prd], 100, 100) ``` Finally, we plot the results. First the mean: ```{r plot_pred, echo=TRUE, fig.align='center'} field.pred.df <- data.frame(x1 = projgrid$lattice$loc[,1], x2 = projgrid$lattice$loc[,2], y = as.vector(m.prd)) ggplot(field.pred.df, aes(x = x1, y = x2, fill = y)) + geom_raster() + xlim(0,1) + ylim(0,1) + scale_fill_viridis() ``` Then, the marginal standard deviations: ```{r plot_pred_sd, fig.align='center', echo=TRUE} field.pred.sd.df <- data.frame(x1 = proj$lattice$loc[,1], x2 = proj$lattice$loc[,2], sd = as.vector(sd.prd)) ggplot(field.pred.sd.df, aes(x = x1, y = x2, fill = sd)) + geom_raster() + xlim(0,1) + ylim(0,1) + geom_raster() + scale_fill_viridis() ``` ## Fitting the model with `inlabru` implementation of the rational SPDE approach We will now fit the same model of the toy data set using our [`inlabru`](http://inlabru.org/) implementation of the rational SPDE approach. Further details on this implementation can be found in [`inlabru` implementation of the rational SPDE approach](rspde_inlabru.html). We begin by loading the `inlabru` package: ```{r message=FALSE} library(inlabru) ``` The creation of the model object is the same as in `R-INLA`'s case: ```{r} rspde_model <- rspde.matern( mesh = mesh_2d, nu.upper.bound = 2, parameterization = "spde" ) ``` The advantage with `inlabru` is that we do not need to form the stack manually, but can simply collect the required data in a `data.frame()`. Further, we can turn the data into an `sf` object. ```{r} library(sf) toy_df <- data.frame(coord1 = loc_2d_mesh[,1], coord2 = loc_2d_mesh[,2], y = as.vector(y)) toy_df <- st_as_sf(toy_df, coords = c("coord1", "coord2")) ``` Finally, we create the component and fit: ```{r message=FALSE, warning=FALSE} cmp <- y ~ -1 + field(geometry, model = rspde_model) rspde_bru_fit <- bru(cmp, data=toy_df, options=list( family = "gaussian", num.threads = "1:1") ) ``` At this stage, we can get a summary of the fit just as in the `R-INLA` case: ```{r} summary(rspde_bru_fit) ``` and also obtain a summary of the field only: ```{r} result_fit <- rspde.result(rspde_bru_fit, "field", rspde_model) summary(result_fit) tau <- op$tau result_df <- data.frame( parameter = c("tau", "kappa", "nu"), true = c(tau, kappa, nu), mean = c( result_fit$summary.tau$mean, result_fit$summary.kappa$mean, result_fit$summary.nu$mean ), mode = c( result_fit$summary.tau$mode, result_fit$summary.kappa$mode, result_fit$summary.nu$mode ) ) print(result_df) ``` Let us obtain a summary in the `matern` parameterization by setting the `parameterization` argument to `matern`: ```{r} result_fit_matern <- rspde.result(rspde_bru_fit, "field", rspde_model, parameterization = "matern") summary(result_fit_matern) result_df_matern <- data.frame( parameter = c("sigma", "range", "nu"), true = c(sigma, range, nu), mean = c( result_fit_matern$summary.std.dev$mean, result_fit_matern$summary.range$mean, result_fit_matern$summary.nu$mean ), mode = c( result_fit_matern$summary.std.dev$mode, result_fit_matern$summary.range$mode, result_fit_matern$summary.nu$mode ) ) print(result_df_matern) ``` ## Kriging with `inlabru` implementation of the rational SPDE approach Let us now obtain predictions (i.e., do kriging) of the latent field on a dense grid in the region. We begin by creating the grid of the locations where we want to evaluate the predictions. We begin by creating a regular grid in and then extract the coorinates: ```{r} pred_coords <- data.frame(coord1 = projgrid$lattice$loc[,1], coord2 = projgrid$lattice$loc[,2]) pred_coords <- st_as_sf(pred_coords, coords = c("coord1", "coord2")) ``` Let us now compute the predictions. An advantage with `inlabru` is that we can do this after fitting the model to the data: ```{r} field_pred <- predict(rspde_bru_fit, pred_coords, ~ data.frame(field)) ``` The following figure shows the mean of these predictions: ```{r, echo=TRUE, fig.align='center', message=FALSE, warning=FALSE} ggplot() + gg(field_pred, geom = "tile", aes(fill = mean)) + xlim(0,1) + ylim(0,1) + scale_fill_viridis() ``` The following figure shows the marginal standard deviations of the predictions: ```{r plot_pred_sd_bru, fig.align='center', echo=TRUE, message=FALSE, warning=FALSE} ggplot() + gg(field_pred, geom = "tile", aes(fill = sd)) + xlim(0,1) + ylim(0,1) + scale_fill_viridis() ``` An alternative and very simple approach is to use the `fm_pixels()` function: ```{r plot_pred_pxl, fig.align='center', warning=FALSE} pxl <- fm_pixels(mesh_2d) field_pred <- predict(rspde_bru_fit, pxl, ~field) ggplot() + gg(field_pred, geom = "tile", aes(fill = mean)) + scale_fill_viridis() + xlim(0,1) + ylim(0,1) ``` ## Fitting the model with `rSPDE` We will now fit the model of the toy data set without using `r inla_link()` or `inlabru`. To this end we will use the rational approximation functions from `rSPDE` package. Further details can be found in the vignette [Rational approximation with the rSPDE package](rspde_cov.html). We use the function `rSPDE.construct.matern.loglike()` to define the likelihood. This function is object-based, in the sense that it obtains several of the quantities it needs from the `rSPDE` model object. Notice that we already created a `rSPDE` model object to simulate the data. We will, then, use the same model object. Recall that the `rSPDE` model object we created is `op`. Let us create an object for estimation, a `data.frame` with the data and then fit the model using the `rspde_lme()` function. ```{r} op_est <- matern.operators( mesh = mesh_2d, m = 2 ) toy_df_rspde <- data.frame(coord1 = loc_2d_mesh[,1], coord2 = loc_2d_mesh[,2], y = as.vector(y)) ``` ```{r} fit_rspde <- rspde_lme(y ~ -1, data = toy_df_rspde, loc = c("coord1", "coord2"), model = op_est, parallel = TRUE) ``` We can obtain the summary: ```{r} summary(fit_rspde) ``` Let us compare with the true values: ```{r} print(data.frame( sigma = c(sigma, fit_rspde$alt_par_coeff$coeff["sigma"]), range = c(range, fit_rspde$alt_par_coeff$coeff["range"]), nu = c(nu, fit_rspde$alt_par_coeff$coeff["nu"]), row.names = c("Truth", "Estimates") )) # Time to fit print(fit_rspde$fitting_tim) ``` ## Kriging with `rSPDE` We will now do kriging on the same dense grid we did for the `r inla_link()`-based rational SPDE approach, but now using the `rSPDE` functions. To this end we will use the `predict` method on the `rSPDE` model object. Observe that we need an $A$ matrix connecting the mesh to the prediction locations. Let us now create the `data.frame` with the prediction locations: ```{r create_proj_grid_prd2} predgrid <- fm_evaluator(mesh_2d, xlim = c(0, 1), ylim = c(0, 1) ) pred_coords <- data.frame(coord1 = predgrid$lattice$loc[,1], coord2 = predgrid$lattice$loc[,2]) ``` We will now use the `predict()` method on the `rSPDE` model object with the argument `compute.variances` set to `TRUE` so that we can plot the standard deviations. Let us also update the values of the `rSPDE` model object to the fitted ones, and also save the estimated value of `sigma.e`. ```{r} pred.rspde <- predict(fit_rspde, data = pred_coords, loc = c("coord1", "coord2"), compute_variances = TRUE ) ``` Finally, we plot the results. First the mean: ```{r plot_pred2, fig.align='center', echo=TRUE} field.pred2.df <- data.frame(x1 = predgrid$lattice$loc[,1], x2 = predgrid$lattice$loc[,2], y = as.vector(pred.rspde$mean)) ggplot(field.pred2.df, aes(x = x1, y = x2, fill = y)) + geom_raster() + xlim(0,1) + ylim(0,1) + scale_fill_viridis() ``` Then, the standard deviations: ```{r plot_pred_sd2, fig.align='center', echo=TRUE} field.pred2.sd.df <-field.pred2.df <- data.frame(x1 = predgrid$lattice$loc[,1], x2 = predgrid$lattice$loc[,2], sd = as.vector(sqrt(pred.rspde$variance))) ggplot(field.pred2.sd.df, aes(x = x1, y = x2, fill = sd)) + geom_raster() + scale_fill_viridis() ``` # References