---
title: "Rational approximation with the rSPDE package"
author: "David Bolin, Alexandre B. Simas, Zhen Xiong"
date: "Created: 2021-12-04. Last modified: `r Sys.Date()`."
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Rational approximation with the rSPDE package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
set.seed(1)
```

```{r inla_link, include = FALSE}
inla_link <- function() {
  sprintf("[%s](%s)", "`R-INLA`", "https://www.r-inla.org")
}
```

## Introduction

In this vignette we will introduce the covariance-based rational SPDE approach and illustrate how to perform statistical inference with it. 

The covariance-based approach is an efficient alternative 
to the operator-based rational SPDE approach by [@bolin19](https://www.tandfonline.com/doi/full/10.1080/10618600.2019.1665537)
which works when one has SPDE driven
by Gaussian white noise. We refer the reader to [@xiong22](https://doi.org/10.1080/10618600.2023.2231051)
for the theoretical details of the approach.

Details about the operator-based rational SPDE approach are given in the [Operator-based rational approximation](rspde_base.html) vignette. 
For the `r inla_link()` and `inlabru` implementations of the covariance-based rational SPDE approach we refer the reader to
the vignettes [R-INLA implementation of the rational SPDE approach](rspde_inla.html) and [inlabru implementation of the rational SPDE approach](rspde_inlabru.html) respectively.

## Covariance-based rational SPDE approach

Let us first present the idea behind the approach. In the SPDE approach, introduced in [@lindgren11](https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1467-9868.2011.00777.x) we model $u$ as the solution of the following SPDE: 
$$
L^{\alpha/2}(\tau u) = W,
$$ 
where 
$L  = -\Delta +\kappa^2 I$ and $W$ is the standard Gaussian white noise. 
Here, $\alpha$, $\kappa$ and $\tau$ are the parameters of the model. In
the standard SPDE approach, $\alpha = \nu + d/2$ needs to be fixed to an integer value, where $\alpha = 2$ is the usual default value. 
In the rational SPDE approach we can use any value of $\nu>0$ and also estimate it from data.

The main idea of the covariance-based rational SPDE approach is to perform the rational approximation of the covariance operator $L^{-\alpha}$.
To this end, we begin by obtaining an approximation of the random field $u$, which is the solution of the
SPDE above, by using the finite element method (FEM): 
$$
u_h(\mathbf{s}_i)=\sum_{j=1}^{n_h} \hat{u}_j \varphi_j(\mathbf{s}_i),
$$ 
where $\{\hat{u}_j\}_{j = 1}^{n_h}$ are stochastic weights and $\{\varphi_j(\mathbf{s}_i)\}_{j = 1}^{n_h}$ are fixed piecewise linear and continuous basis functions obtained from a triangulation of the spatial domain.
We then obtain a FEM approximation of the operator $L$, which is given by $L_h$, and the covariance operator of $u_h$ is
given by $L_h^{-\alpha}$. 

Now, by using the rational approximation on $L_h$, we can approximate covariance operator 
$L_h^{-\alpha}$ as 
$$L_{h,m}^{-\alpha} = L_h^{-\lfloor\alpha\rfloor} p(L_h^{-1})q(L_h^{-1})^{-1},$$
where $\lfloor\alpha\rfloor$ denotes the integer part of $\alpha$, $m$ is the order of rational approximation, 
$p(L_h^{-1}) = \sum_{i=0}^m a_i L_h^{m-i}$ and 
$q(L_h^{-1}) = \sum_{j=0}^m b_j L_h^{m-i}$, with 
$\{a_i\}_{i = 0}^m$ and $\{b_j\}_{j = 0}^m$ being known coefficients 
obtained from a rational approximation of the function $x^{\alpha - \lfloor\alpha\rfloor}$. 

The next step is to perform a partial fraction decomposition of the rational function $p(L_h^{-1})q(L_h^{-1})^{-1}$, which yields the 
representation 
$$L_{h,m}^{-\alpha} =L_h^{-\lfloor\alpha\rfloor} \left(\sum_{i=1}^{m}  r_i  (L_h-p_i I)^{-1} +k\right).$$
Based on the above operator equation, we can write the covariance matrix of the stochastic weights $\hat{\textbf{u}}$, where $\hat{\textbf{u}}=[\hat{u}_1,...,\hat{u}_{n_h}]^\top$, 
as 
$$\mathbf{\Sigma}_{\hat{\textbf{u}}} = (\textbf{L}^{-1}\textbf{C})^{\lfloor\alpha\rfloor} \sum_{i=1}^{m}r_i(\textbf{L}-p_i\textbf{C})^{-1}+\textbf{K}, $$
where $\textbf{C} = \{C_{ij}\}_{i,j=1}^{n_h}$, $C_{ij} = (\varphi_i,\varphi_j)_{L_2(\mathcal{D})}$, is the mass matrix, $\textbf{L} = \kappa^2\textbf{C}+\textbf{G}$, $\textbf{G} = \{G_{ij}\}_{i,j=1}^{n_h}$, $G_{ij}=(\nabla\varphi_i,\nabla\varphi_j)_{L_2(\mathcal{D})}$, is the stiffness matrix, and
$$
\textbf{K}=\left\{
	\begin{array}{lcl}
		k\textbf{C}      &      & {\lfloor\alpha\rfloor=0}\\
		k\textbf{L}^{-1}(\textbf{C}\textbf{L}^{-1})^{\lfloor\alpha\rfloor-1}    &      & {\lfloor\alpha\rfloor\geq 1}\\
	\end{array} \right. .
	$$
	

The above representation shows that we can express $\hat{\textbf{u}}$ as 
$$
\hat{\textbf{u}}=\sum_{i=1}^{m+1}\textbf{x}_i,
$$
where $\textbf{x}_i = (x_{i,1}, \ldots, x_{i,n_h})$,
$$\textbf{x}_i \sim N(\textbf{0},\textbf{Q}_i^{-1}),$$
and $\textbf{Q}_i$ is the precision matrix of $\textbf{x}_i$, which is given by 
$$
\textbf{Q}_i=\left \{
	\begin{array}{lcl}
		(\textbf{L}-p_i\textbf{C})(\textbf{C}^{-1}\textbf{L})^{\lfloor\alpha\rfloor}/r_i,      &      & {i = 1,...,m}\\
		 \textbf{K}^{-1},   &      & {i = m+1}\\
	\end{array}. \right.
	$$

We, then, replace the Matérn latent field by the latent vector given above, which
has precision matrix given by
$$\textbf{Q}=\begin{bmatrix}\textbf{Q}_1& &\\&\ddots&\\& &\textbf{Q}_{m+1}\end{bmatrix}.$$
We thus have a Markov approximation which can be used for computationally efficient inference. For example, assume that we observe 
$$y_j = u_h(\mathbf{s}_j) + \varepsilon_j,\quad j=1,\ldots, N,$$
where $\varepsilon_j\sim N(0,\sigma_\varepsilon^2)$ are iid measurement noise. Then, we have that
$$
y_j = u_h(\mathbf{s}_j) + \varepsilon_j = \sum_{l=1}^{n_h} \hat{u}_l \varphi_l(\mathbf{s}_j) + \varepsilon_j = \sum_{i=1}^{m+1} \sum_{l=1}^{n_h} x_{i,l} \varphi_l(\mathbf{s}_j) + \varepsilon_j.
$$
This can be written in a matrix form as
$$\textbf{y} = \overline{\textbf{A}} \textbf{X} + \boldsymbol{\varepsilon},$$
where $\textbf{y} = [y_1,\ldots,y_N]^\top, \textbf{X} = [\textbf{x}_1^\top,\ldots,\textbf{x}_{m+1}^\top]^\top$, $\boldsymbol{\varepsilon} = [\varepsilon_1,\ldots,\varepsilon_N]^\top$,
$$\overline{\textbf{A}}=\begin{bmatrix}\textbf{A}&\cdots&\textbf{A}\end{bmatrix}_{n\times n_h(m+1)},$$ 
and
$$\textbf{A}=\begin{bmatrix}\varphi_1(s_1)&\cdots&\varphi_{n_h}(s_1)\\\vdots&\vdots&\vdots\\\varphi_1(s_n)&\cdots&\varphi_{n_h}(s_n)\end{bmatrix}.$$
We then arrive at the following hierarchical model:
$$\begin{align} \textbf{y}\mid \textbf{X} &\sim N(0,\sigma_\varepsilon\textbf{I})\\ \textbf{X}&\sim N(0,\textbf{Q}^{-1}) \end{align}.$$

With these elements, we can, for example, use `r inla_link()` to compute the posterior distribution of the three parameters we want to estimate.


## Constructing the approximation

In this section, we explain how to to use the function `matern.operators()`
with the default argument `type`, that is, `type="covariance"`, which is
constructs the covariance-based rational approximation. We will
also illustrate the usage of several methods and functions related to
the covariance-based rational approximation. We will use functions to sample from Gaussian fields
with stationary Matérn covariance function, compute the log-likelihood function, and do spatial prediction.

The first step for performing the covariance-based rational SPDE approximation is to define the FEM mesh. 
We will also illustrate how spatial models 
can be constructed if the FEM implementation of the `fmesher` package is used. 
When using the `r inla_link()` package, we also recommend the usage of our `r inla_link()` implementation
of the rational SPDE approach. For more details, see the [R-INLA implementation of the rational SPDE approach](rspde_inla.html) vignette.

We begin by loading the `rSPDE` package:

```{r, message=FALSE, warning=FALSE}
library(rSPDE)
```

Assume that we want to define a model on the interval $[0,1]$. We then start by defining a vector with mesh nodes $s_i$ where the basis functions $\varphi_i$ are centered. 
```{r}
s <- seq(from = 0, to = 1, length.out = 101)
```

We can now use `matern.operators()` to construct a rational SPDE approximation of order $m=2$ for a Gaussian random field with a Matérn covariance function on the interval. We also refer the reader to the
[Operator-based rational approximation](rspde_base.html) for a similar comparison made for the operator-based rational approximation.

```{r}
kappa <- 20
sigma <- 2
nu <- 0.8
r <- sqrt(8*nu)/kappa #range parameter
op_cov <- matern.operators(loc_mesh = s, nu = nu,
  range = r, sigma = sigma, d = 1, m = 2, parameterization = "matern"
)
```
The object `op_cov` contains the matrices needed for evaluating the distribution of the stochastic weights $\boldsymbol{\mathrm{u}}$. If we want to evaluate $u_h(s)$ at some locations $s_1,\ldots, s_n$, we need to multiply the weights with the basis functions $\varphi_i(s)$ evaluated at the locations. For this, we can construct the observation matrix $\boldsymbol{\mathrm{A}}$, with elements $A_{ij} = \varphi_j(s_i)$, which links the FEM basis functions to the locations. 
This matrix can be constructed using the function `fm_basis()` from the `fmesher` package. However, as observed in the introduction of this vignette, we have decomposed the stochastic weights $\boldsymbol{\mathrm{u}}$ into a vector of latent variables.
Thus, the $A$ matrix for the covariance-based rational approximation, which we will denote by $\overline{A}$, is actually given by the $m+1$-fold horizontal concatenation of these $A$ matrices, where $m$ is the order
of the rational approximation.
<!-- 
To compute the precision matrix from the covariance-based rational approximation one
can use the `precision()` method on the `CBrSPDEobj` object (the object returned by
the `matern.operators()` function with the default type, which is
`type="covariance"`):

# ```{r}
# Q <- precision(op_cov)
# ``` 

-->

To evaluate the accuracy of the approximation, let us compute the covariance function between the process at $s=0.5$ and all other locations in `s` and compare with the true Matérn covariance function. The covariances can be calculated by using the `cov_function_mesh()` method.
<!-- as 
$$
\overline{\boldsymbol{\mathrm{A}}} \boldsymbol{\mathrm{Q}}^{-1}\overline{\boldsymbol{\mathrm{v}}}.
$$
Here, $\boldsymbol{\mathrm{Q}}$ is the precision matrix obtained from the covariance-based rational approximation, $\boldsymbol{\mathrm{A}}$ is an identity matrix since we are evaluating the approximation in the nodes of the FEM mesh, $\overline{\boldsymbol{\mathrm{v}}}$ is the $(m+1)$-fold vertical concatenation of the vector $\boldsymbol{\mathrm{v}}$, where $\boldsymbol{\mathrm{v}}$ is a vector with all basis functions evaluated in $s=0.5$.
-->

<!-- 
# ```{r}
#library(fmesher)
#mesh_1d <- fm_mesh_1d(s)
#v <- t(fm_basis(x = mesh_1d, loc = 0.5))
#v_bar <- kronecker(matrix(1, nrow = 3), v)
#A <- Diagonal(101)
#A_bar <- kronecker(matrix(1, ncol = 3), A)
#c_cov.approx <- (A_bar) %*% solve(Q, v_bar)
# ``` 
-->

```{r}
c_cov.approx <- cov_function_mesh(op_cov, 0.5)
```

Let us now compute the true Matérn covariance function on the interval $(0,1)$,
which is the folded Matérn, see Theorem 1 in [An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach](https://www.jstor.org/stable/41262260) for further details.

```{r}
c.true <- folded.matern.covariance.1d(rep(0.5, length(s)), 
                                      abs(s), kappa, nu, sigma)
```

The covariance function and the error compared with the Matérn covariance are shown in the following figure.
```{r, fig.show='hold', fig.align = "center",echo=TRUE}
opar <- par(
  mfrow = c(1, 2), mgp = c(1.3, 0.5, 0),
  mar = c(2, 2, 0.5, 0.5) + 0.1
)
plot(s, c.true,
  type = "l", ylab = "C(|s-0.5|)", xlab = "s", ylim = c(0, 5),
  cex.main = 0.8, cex.axis = 0.8, cex.lab = 0.8
)
lines(s, c_cov.approx, col = 2)
legend("topright",
  bty = "n",
  legend = c("Matérn", "Rational"),
  col = c("black", "red"),
  lty = rep(1, 2), ncol = 1,
  cex = 0.8
)

plot(s, c.true - c_cov.approx,
  type = "l", ylab = "Error", xlab = "s",
  cex.main = 0.8, cex.axis = 0.8, cex.lab = 0.8
)
par(opar)
```

To improve the approximation we can increase the degree of the polynomials, by increasing $m$, and/or increase the number of basis functions used for the FEM approximation. Let us, for example, compute the approximation with $m=4$ using the same mesh, as well as the approximation when we increase the number of basis functions and use $m=2$ and $m=4$. We will also load the `fmesher` package to use the `fm_basis()` and `fm_mesh_1d()` functions to map between the meshes.

```{r}
library(fmesher)

op_cov2 <- matern.operators(
  range = r, sigma = sigma, nu = nu,
  loc_mesh = s, d = 1, m = 4,
  parameterization = "matern"
)

c_cov.approx2 <- cov_function_mesh(op_cov2, 0.5)

s2 <- seq(from = 0, to = 1, length.out = 501)

op_cov <- matern.operators(
  range = r, sigma = sigma, nu = nu,
  loc_mesh = s2, d = 1, m = 2,
  parameterization = "matern"
)

mesh_s2 <- fm_mesh_1d(s2)

# Map the mesh s2 to s
A2 <- fm_basis(mesh_s2, s)

c_cov.approx3 <- A2 %*% cov_function_mesh(op_cov, 0.5)

op_cov <- matern.operators(
  range = r, sigma = sigma, nu = nu,
  loc_mesh = s2, d = 1, m = 4,
  parameterization = "matern"
)

c_cov.approx4 <- A2 %*% cov_function_mesh(op_cov, 0.5)
```

The resulting errors are shown in the following figure. 

```{r, fig.show='hold',fig.align = "center",echo=TRUE}
opar <- par(mgp = c(1.3, 0.5, 0), mar = c(2, 2, 0.5, 0.5) + 0.1)
plot(s, c.true - c_cov.approx,
  type = "l", ylab = "Error", xlab = "s", col = 1,
  cex.main = 0.8, cex.axis = 0.8, cex.lab = 0.8
)
lines(s, c.true - c_cov.approx2, col = 2)
lines(s, c.true - c_cov.approx3, col = 3)
lines(s, c.true - c_cov.approx4, col = 4)
legend("bottomright",
  bty = "n",
  legend = c("m=2 coarse mesh", "m=4 coarse mesh", 
             "m=2 fine mesh", "m=4 fine mesh"),
  col = c(1, 2, 3, 4),
  lty = rep(1, 2), ncol = 1,
  cex = 0.8
)
par(opar)
```

Since the error induced by the rational approximation decreases exponentially in $m$, there is in general rarely a need for an approximation with a large value of $m$. This is good because the size of $\boldsymbol{\mathrm{Q}}$ increases with $m$, which makes the approximation more computationally costly to use. To illustrate this, let us compute the norm of the approximation error for different $m$.
```{r}
# Mapping s2 to s
A2 <- fm_basis(mesh_s2, s)

errors <- rep(0, 4)
for (i in 1:4) {
  op_cov <- matern.operators(
    range = r, sigma = sigma, nu = nu,
    loc_mesh = s2, d = 1, m = i,
    parameterization = "matern"
  )
  c_cov.approx <- A2 %*% cov_function_mesh(op_cov, 0.5)
  errors[i] <- norm(c.true - c_cov.approx)
}
print(errors)
```
We see that the error decreases very fast when we increase $m$ from $1$ to $4$,
without any numerical instability. This is an advantage of the covariance-based
rational approximation when compared to the operator-based rational approximation. 
See [Operator-based rational approximation](rspde_base.html)
for details on the numerical instability of the operator-based rational approximation.

## Using the approximation 

When we use the function `matern.operators()`, we can simulate from the model using the
`simulate()` method. To such an end we simply apply the `simulate()` method to the
object returned by the `matern.operators()` function:

```{r}
u <- simulate(op_cov)
```

If we want replicates, we simply set the argument `nsim` to the desired number 
of replicates. For instance, to generate two replicates of the model, we
simply do:

```{r}
u.rep <- simulate(op_cov, nsim = 2)
```

### Fitting a model

There is built-in support for computing log-likelihood functions and performing kriging prediction in the `rSPDE` package. To illustrate this, we use the simulation to create some noisy observations of the process. For this, we first construct the observation matrix linking the FEM basis functions to the locations where we want to simulate. We first randomly generate some observation locations and then construct the matrix.
```{r}
set.seed(1)
s <- seq(from = 0, to = 1, length.out = 501)
n.obs <- 200
obs.loc <- runif(n.obs)
mesh_s <- fm_mesh_1d(s)
A <- fm_basis(x = mesh_s, loc = obs.loc)
```

We now generate the observations as $Y_i = 2 - x1 + u(s_i) + \varepsilon_i$, where $\varepsilon_i \sim N(0,\sigma_e^2)$ is Gaussian measurement noise, $x1$ is a covariate giving the observation location. We will assume that the latent process has a Matérn covariance
with $\kappa=20, \sigma=1.3$ and $\nu=0.8$:
```{r, message=FALSE}
kappa <- 20
sigma <- 1.3
nu <- 0.8
r <- sqrt(8*nu)/kappa
op_cov <- matern.operators(
  loc_mesh = s, nu = nu,
  range = r, sigma = sigma, d = 1, m = 2,
  parameterization = "matern"
)

u <- simulate(op_cov)

sigma.e <- 0.3

x1 <- obs.loc

Y <- 2 - x1 + as.vector(A %*% u + sigma.e * rnorm(n.obs))

Y <- as.vector(A%*% u + sigma.e * rnorm(n.obs))

df_data <- data.frame(y = Y, loc = obs.loc, x1 = x1)
```

Let us create a new object to fit the model:

```{r}
op_cov_est <- matern.operators(
  loc_mesh = s, d = 1, m = 2,
  parameterization = "matern"
)
```


Let us now fit the model. To this end we will use the `rspde_lme()` function:

```{r}
fit <- rspde_lme(y~x1, model = op_cov_est,
                    data = df_data, loc = "loc")
```

We can get a summary of the fit with the `summary()` method:

```{r}
summary(fit)
```

Let us compare the parameters of the latent model:
```{r}
print(data.frame(
  nu = c(nu, fit$coeff$random_effects["nu"]),
  sigma = c(sigma, fit$coeff$random_effects["sigma"]), 
  range = c(r, fit$coeff$random_effects["range"]),
  row.names = c("Truth", "Estimates")
))

# Total time
print(fit$fitting_time)
```

Let us take a glance at the fit:

```{r}
glance(fit)
```


We can also speed up the optimization by setting `parallel=TRUE` (which uses implicitly the `optimParallel` function):

```{r, message=FALSE, warning=FALSE}
fit_par <- rspde_lme(y~x1, model = op_cov_est,
                    data = df_data, loc = "loc", parallel = TRUE)
```

Here is the summary:

```{r}
summary(fit_par)
```

Let us compare with the true values and compare the time:

```{r}
print(data.frame(
  sigma = c(sigma, fit_par$coeff$random_effects["sigma"]), 
  range = c(r, fit_par$coeff$random_effects["range"]),
  nu = c(nu, fit_par$coeff$random_effects["nu"]),
  row.names = c("Truth", "Estimates")
))

# Total time (time to fit plus time to set up the parallelization)
total_time <- fit_par$fitting_time + fit_par$time_par
print(total_time)
```

### Kriging

Finally, we compute the kriging prediction of the process $u$ at the locations in `s` based on these observations. 

Let us create the `data.frame` with locations in which we want to obtain the predictions. Observe that we also must provide the values of the covariates.

```{r}
df_pred <- data.frame(loc = s, x1 = s)
```

We can now perform kriging with the `predict()` method:

```{r}
u.krig <- predict(fit, newdata = df_pred, loc = "loc")
```


The simulated process, the observed data, and the kriging prediction are shown in the following figure.

```{r, fig.show='hold',fig.align = "center",echo=TRUE}
opar <- par(mgp = c(1.3, 0.5, 0), mar = c(2, 2, 0.5, 0.5) + 0.1)
plot(obs.loc, Y,
  ylab = "u(s)", xlab = "s",
  ylim = c(min(c(min(u), min(Y))), max(c(max(u), max(Y)))),
  cex.main = 0.8, cex.axis = 0.8, cex.lab = 0.8
)
lines(s, u.krig$mean, col = 2)
par(opar)
```

We can also use the `augment()` function and pipe the results into a plot:

```{r, message=FALSE, warning=FALSE}
library(ggplot2)
library(dplyr)

augment(fit, newdata = df_pred, loc = "loc") %>% ggplot() + 
                aes(x = loc, y = .fitted) +
                geom_line(col="red") + 
                geom_point(data = df_data, aes(x = loc, y=Y))
```

## Fitting a model with replicates

Let us illustrate how to simulate a dataset with replicates and then fit a model to such data.
Recall that to simulate a latent model with replicates, all we do is set
the `nsim` argument to the number of replicates.

We will use the `CBrSPDEobj` object (returned
from the `matern.operators()` function) from
the previous example, namely `op_cov`.

```{r}
set.seed(123)
n.rep <- 20
u.rep <- simulate(op_cov, nsim = n.rep)
```

Now, let us generate the observed values $Y$:

```{r}
sigma.e <- 0.3
Y.rep <- A %*% u.rep + sigma.e * matrix(rnorm(n.obs * n.rep), ncol = n.rep)
```

Note that $Y$ is a matrix with 20 columns, each column containing 
one replicate. We need to turn `y` into a vector and create an auxiliary vector `repl` indexing the replicates of `y`:

```{r, message=FALSE, warning=FALSE}
y_vec <- as.vector(Y.rep)
repl <- rep(1:n.rep, each = n.obs)

df_data_repl  <- data.frame(y = y_vec, loc = rep(obs.loc, n.rep))
```

We can now fit the model in the same way as before by using the `rspde_lme()` function:

```{r}
fit_repl <- rspde_lme(y_vec ~ -1, model = op_cov_est, repl = repl, 
      data = df_data_repl, loc = "loc", parallel = TRUE)
```

Let us see a summary of the fit:

```{r}
summary(fit_repl)
```

and glance:

```{r}
glance(fit_repl)
```

Let us compare with the true values:

```{r}
  print(data.frame(
    sigma = c(sigma, fit_repl$coeff$random_effects["sigma"]), 
    range = c(r, fit_repl$coeff$random_effects["range"]),
    nu = c(nu, fit_repl$coeff$random_effects["nu"]),
    row.names = c("Truth", "Estimates")
  ))

# Total time
print(fit_repl$fitting_time)
```

We can obtain better estimates of the Hessian by setting `improve_hessian` to `TRUE`, however this might make the process take longer:

```{r}
fit_repl2 <- rspde_lme(y_vec ~ -1, model = op_cov_est, repl = repl, 
      data = df_data_repl, loc = "loc", parallel = TRUE, 
      improve_hessian = TRUE)
```

Let us get a summary:
```{r}
summary(fit_repl2)
```

## Spatial data and parameter estimation

The functions used in the previous examples also work for spatial models. 
We then need to construct a mesh over the domain of interest and then compute the 
matrices needed to define the operator. These tasks can be performed, for example, 
using the `fmesher` package. Let us start by defining a mesh over $[0,1]\times [0, 1]$ 
and compute the mass and stiffness matrices for that mesh. 

We consider a simple Gaussian linear model with 30 independent replicates 
of a latent spatial field $u(\mathbf{s})$, observed at the same 
$m$ locations, $\{\mathbf{s}_1 , \ldots , \mathbf{s}_m \}$, for each replicate.
For each $i = 1,\ldots,m,$ we have

\begin{align} 
y_i &= u_1(\mathbf{s}_i)+\varepsilon_i,\\
\vdots &= \vdots\\

y_{i+29m} &= u_{30}(\mathbf{s}_i) + \varepsilon_{i+29m},
\end{align}

where $\varepsilon_1,\ldots,\varepsilon_{30m}$ are iid normally distributed
with mean 0 and standard deviation 0.1.

Let us create the FEM mesh:

```{r, message=FALSE, warning=FALSE,fig.align = "center"}
n_loc <- 500
loc_2d_mesh <- matrix(runif(n_loc * 2), n_loc, 2)
mesh_2d <- fm_mesh_2d(
  loc = loc_2d_mesh,
  cutoff = 0.05,
  offset = c(0.1, 0.4),
  max.edge = c(0.05, 0.5)
)
plot(mesh_2d, main = "")
points(loc_2d_mesh[, 1], loc_2d_mesh[, 2])
```

We can now use this mesh to define a rational SPDE approximation of order
$m=2$ for a Matérn model in the same fashion as we did above in
the one-dimensional case.
We now simulate a latent process with standard deviation $\sigma=1$ and 
range $0.1$. We will use $\nu=0.5$ so that the model has an exponential covariance function.
To this end we create a model object with the `matern.operators()` function:

```{r}
nu <- 0.7
sigma <- 1.3
range <- 0.15
d <- 2
op_cov_2d <- matern.operators(
  mesh = mesh_2d,
  nu = nu,
  range = range,
  sigma = sigma,
  m = 2,
  parameterization = "matern"
)
tau <- op_cov_2d$tau
```

Now let us simulate some noisy data that we will use to estimate the parameters of the model. To construct the observation matrix, we use the function `fm_basis()` from the `fmesher` package.
Recall that we will simulate the data
with 30 replicates. 

```{r}
n.rep <- 30
u <- simulate(op_cov_2d, nsim = n.rep)
A <- fm_basis(
  x = mesh_2d,
  loc = loc_2d_mesh
)
sigma.e <- 0.1
Y <- A %*% u + matrix(rnorm(n_loc * n.rep), ncol = n.rep) * sigma.e
```

The first replicate of the simulated random field as well as the observation locations are shown in the following figure.

```{r, fig.show='hold',fig.align = "center",echo=TRUE, warning=FALSE, message=FALSE}
library(viridis)
library(ggplot2)
proj <- fm_evaluator(mesh_2d, dims = c(70, 70))

df_field <- data.frame(x = proj$lattice$loc[,1],
                        y = proj$lattice$loc[,2],
                        field = as.vector(fm_evaluate(proj, 
                        field = as.vector(u[, 1]))),
                        type = "field")

df_loc <- data.frame(x = loc_2d_mesh[, 1],
                      y = loc_2d_mesh[, 2],
                      field = as.vector(Y[,1]),
                      type = "locations")
df_plot <- rbind(df_field, df_loc)

ggplot(df_plot) + aes(x = x, y = y, fill = field) +
        facet_wrap(~type) + xlim(0,1) + ylim(0,1) + 
        geom_raster(data = df_field) +
        geom_point(data = df_loc, aes(colour = field),
        show.legend = FALSE) + 
        scale_fill_viridis() + scale_colour_viridis()
```

Let us now create a new object to fit the model:

```{r}
op_cov_2d_est <- matern.operators(
  mesh = mesh_2d,
  m = 2
)
```

We can now proceed as in the previous cases. We set up a vector with the response variables and create an auxiliary replicates vector, `repl`, that contains the indexes of the replicates of each observation, and then we fit the model:

```{r}
y_vec <- as.vector(Y)
repl <- rep(1:n.rep, each = n_loc)
df_data_2d <- data.frame(y = y_vec, x_coord = loc_2d_mesh[,1],
                      y_coord = loc_2d_mesh[,2])

fit_2d <- rspde_lme(y ~ -1, model = op_cov_2d_est, 
          data = df_data_2d, repl = repl,
          loc = c("x_coord", "y_coord"), parallel = TRUE)
```

Let us get a summary:

```{r}
summary(fit_2d)
```

and glance:

```{r}
glance(fit_2d)
```

Let us compare the estimated results with the true values:
```{r}
print(data.frame(
  sigma = c(sigma, fit_2d$alt_par_coeff$coeff["sigma"]), 
  range = c(range, fit_2d$alt_par_coeff$coeff["range"]),
  nu = c(nu, fit_2d$alt_par_coeff$coeff["nu"]),
  row.names = c("Truth", "Estimates")
))

# Total time
print(fit_2d$fitting_time)
```

Let us now plot the prediction for replicate 3 by using the `augment` function. We begin by creating the `data.frame` we want to do prediction:

```{r}
df_pred <- data.frame(x = proj$lattice$loc[,1],
                        y = proj$lattice$loc[,2])
```

```{r}
augment(fit_2d, newdata = df_pred, loc = c("x","y"), which_repl = 3) %>% ggplot() +
              geom_raster(aes(x=x, y=y, fill = .fitted)) + xlim(0,1) + ylim(0,1) + 
              scale_fill_viridis()
```

## Further details on rspde_lme

The `rspde_lme` function provides flexibility in model fitting by allowing users to fix certain parameters at specific values or set custom starting values for the optimization process. This can be useful when you have prior knowledge about some parameters or when you want to improve convergence by providing better starting points.

### Fixing parameters
Parameters can be fixed by using the `model_options` argument with elements of the form `fix_parname = value`, where `parname` is the name of the parameter you want to fix. For stationary models, the parameters that can be fixed in the `model_options` list are:

- `fix_sigma_e`: Fix the standard deviation of the noise parameter $\sigma_\varepsilon$
- `fix_nu`: Fix the smoothness parameter ν
- `fix_sigma`: Fix the standard deviation parameter σ
- `fix_range`: Fix the range parameter
- `fix_tau`: Fix the precision parameter τ (when using the SPDE parameterization)
- `fix_kappa`: Fix the scale parameter κ (when using the SPDE parameterization)
- `fix_alpha`: Fix the fractional power α (when using the SPDE parameterization)

### Setting starting values

Similarly, you can set starting values for parameters using elements of the form `start_parname = value` in the `model_options` list. This is particularly useful when the default starting values might be far from the optimal values, which could lead to slow convergence or convergence to a local minimum. For example, to set a starting value for the range parameter, you would use `model_options = list(start_range = 0.15)`.

- `start_sigma_e`: Starting value for the standard deviation of the noise parameter $\sigma_\varepsilon$
- `start_nu`: Starting value for the smoothness parameter ν
- `start_sigma`: Starting value for the standard deviation parameter σ
- `start_range`: Starting value for the range parameter
- `start_tau`: Starting value for the precision parameter τ (when using the SPDE parameterization)
- `start_kappa`: Starting value for the scale parameter κ (when using the SPDE parameterization)
- `start_alpha`: Starting value for the fractional power α (when using the SPDE parameterization)

### Example: Fixing and setting starting values

Let us demonstrate how to use these parameters with our 2D spatial example. We will fix the smoothness parameter ν to 0.8, fix the standard deviation σ to 1, and set the starting value for the range parameter to 0.15.

```{r}
# Create a new model fit with fixed ν and σ, and a starting value for range
fit_2d_fixed <- rspde_lme(y ~ -1, model = op_cov_2d_est, 
                         data = df_data_2d, repl = repl,
                         loc = c("x_coord", "y_coord"),, 
                         model_options = list(
                           fix_nu = 0.8,      # Fix ν to 0.8
                           fix_sigma = 1,     # Fix σ to 1
                           start_range = 0.15 # Set starting value for range
                         ),
                         parallel = TRUE)
```

Let us get a summary of the fit:

```{r}
summary(fit_2d_fixed)
```

Notice in the summary that ν and σ are fixed at the specified values, and only the range parameter is estimated. When parameters are fixed, they do not appear in the parameter estimates section, but rather are treated as constants.

Let us compare this fit with our previous fit where all parameters were estimated:

```{r}
# Compare the log-likelihoods
logLik_full <- logLik(fit_2d)
logLik_fixed <- logLik(fit_2d_fixed)

cat("Log-likelihood with all parameters estimated:", logLik_full, "\n")
cat("Log-likelihood with fixed nu and sigma:", logLik_fixed, "\n")
```

### Using a previous fit to initialize the optimization

You can use a previous fit to initialize the optimization by using the `start_from_previous` argument. For example, to use the previous fit to initialize the optimization, you would use `start_from_previous = fit_2d`.

```{r}
fit_2d_start <- rspde_lme(y ~ -1, model = op_cov_2d_est, 
                         data = df_data_2d, repl = repl,
                         loc = c("x_coord", "y_coord"),, 
                         previous_fit = fit_2d)
```

Let us get a summary of the fit:

```{r}
summary(fit_2d_start)
```

## An example with a non-stationary model

Our goal now is to show how one can fit model with non-stationary $\sigma$ (std. deviation) and non-stationary $\rho$ (a range parameter).
One can also use the parameterization in terms of non-stationary SPDE parameters $\kappa$ and $\tau$. 

For this example we will consider simulated data. 

### Simulating the data

Let us consider a simple Gaussian linear model with a latent spatial field $x(\mathbf{s})$,
defined on the rectangle $(0,10) \times (0,5)$, 
where the std. deviation and range parameter satisfy the following log-linear regressions:
\begin{align}
\log(\sigma(\mathbf{s})) &= \theta_1 + \theta_3 b(\mathbf{s}),\\
\log(\rho(\mathbf{s})) &= \theta_2 + \theta_3 b(\mathbf{s}),
\end{align}
where $b(\mathbf{s}) = (s_1-5)/10$. We assume the data is observed at 
$m$ locations, $\{\mathbf{s}_1 , \ldots , \mathbf{s}_m \}$.
For each $i = 1,\ldots,m,$ we have

$$y_i = x_1(\mathbf{s}_i)+\varepsilon_i,$$

where $\varepsilon_1,\ldots,\varepsilon_{m}$ are iid normally distributed
with mean 0 and standard deviation 0.1.

We begin by defining the domain and creating the mesh:

```{r fig.align = "center"}
rec_domain <- cbind(c(0, 1, 1, 0, 0) * 10, c(0, 0, 1, 1, 0) * 5)

mesh <- fm_mesh_2d(loc.domain = rec_domain, cutoff = 0.1, 
  max.edge = c(0.5, 1.5), offset = c(0.5, 1.5))
```

We follow the same structure as `INLA`. However, `INLA` only allows one to specify 
`B.tau` and `B.kappa` matrices, and, in `INLA`, if one wants to parameterize in terms of range and
standard deviation one needs to do it manually. Here we provide the option to directly provide
the matrices `B.sigma` and `B.range`. 

The usage of the matrices `B.tau` and `B.kappa` are identical to the corresponding ones in `inla.spde2.matern()` function.
The matrices `B.sigma` and `B.range` work in the same way, but they parameterize the stardard deviation
and range, respectively.

The columns of the `B` matrices correspond to the same parameter. The first column does not
have any parameter to be estimated, it is a constant column. 

So, for instance, if one wants to share a parameter with both `sigma` and `range` (or with both `tau` and `kappa`),
one simply let the corresponding column to be nonzero on both `B.sigma` and `B.range` (or on `B.tau` and `B.kappa`).

We will assume $\nu = 0.8$, $\theta_1 = 0, \theta_2 = 1$ and $\theta_3=1$. Let us now build the model with the `spde.matern.operators()` function:


```{r}
nu <- 0.8
true_theta <- c(0,1, 1)
B.sigma = cbind(0, 1, 0, (mesh$loc[,1] - 5) / 10)
B.range = cbind(0, 0, 1, (mesh$loc[,1] - 5) / 10)
alpha <- nu + 1 # nu + d/2 ; d = 2

# SPDE model
op_cov_ns <- spde.matern.operators(mesh = mesh, 
  theta = true_theta,
  nu = nu,
  B.sigma = B.sigma, 
  B.range = B.range,
  parameterization = "matern")
```

Let us now sample the data with the `simulate()` method:

```{r, warning=FALSE}
u <- as.vector(simulate(op_cov_ns, seed = 123))
```

Let us now obtain 600 random locations on the rectangle and 
compute the $A$ matrix:

```{r}
m <-600
loc_mesh <- cbind(runif(m) * 10, runif(m) * 5)

A <- fm_basis(
  x = mesh,
  loc = loc_mesh
)
```

We can now generate the response vector `y`:
```{r}
y <- as.vector(A %*% as.vector(u)) + rnorm(m) * 0.3
```

Let us now create the object to fit the data:

```{r}
op_cov_ns_est <- op_cov_ns <- spde.matern.operators(mesh = mesh, 
  B.sigma = B.sigma, 
  B.range = B.range,
  parameterization = "matern")
```

Let us also create the `data.frame()` that contains the data and the locations:

```{r}
df_data_ns <- data.frame(y= y, x_coord = loc_mesh[,1], y_coord = loc_mesh[,2])
```

### Fitting the non-stationary rSPDE model

```{r}
fit_ns <- rspde_lme(y ~ -1, model = op_cov_ns_est, 
          data = df_data_ns, loc = c("x_coord", "y_coord"), 
          parallel = TRUE)
```

Let us get the summary:

```{r}
summary(fit_ns)
```

Let us now compare with the true values:

```{r}
print(data.frame(
  theta1 = c(true_theta[1], fit_ns$coeff$random_effects[2]), 
  theta2 = c(true_theta[2], fit_ns$coeff$random_effects[3]),
  theta3 = c(true_theta[3], fit_ns$coeff$random_effects[4]), 
  nu = c(alpha-1, fit_ns$coeff$random_effects[1])),
  row.names = c("Truth", "Estimates"))
```

### Fixing parameters in non-stationary models

In non-stationary models, the parameters are labeled as theta1, theta2, etc., corresponding to the coefficients in the B matrices. Similar to the stationary case, we can fix individual parameters or set starting values, but these options must be set in the `model_options` list argument using `fix_theta1`, `fix_theta2`, etc., or `start_theta1`, `start_theta2`, etc.

For example, if we want to fix the first coefficient theta1 to 0:

```{r}
fit_ns_fixed_theta1 <- rspde_lme(y ~ -1, model = op_cov_ns_est, 
          data = df_data_ns, loc = c("x_coord", "y_coord"), 
          model_options = list(fix_theta1 = 0),  # Fix theta1 to 0
          parallel = TRUE)
```

```{r}
summary(fit_ns_fixed_theta1)
```

We can also fix the entire theta vector at once using the `fix_theta` parameter. This is particularly useful when we have strong prior knowledge about all the parameters:

```{r}
fit_ns_fixed_all <- rspde_lme(y ~ -1, model = op_cov_ns_est, 
          data = df_data_ns, loc = c("x_coord", "y_coord"), 
          model_options = list(fix_theta = c(0, 1, 1)),  # Fix the entire theta vector
          parallel = TRUE)
```

```{r}
summary(fit_ns_fixed_all)
```

Similarly, we can provide starting values for the entire theta vector with `start_theta`:

```{r}
fit_ns_start <- rspde_lme(y ~ -1, model = op_cov_ns_est, 
          data = df_data_ns, loc = c("x_coord", "y_coord"), 
          model_options = list(start_theta = c(0, 1, 1)),  # Starting values for theta vector
          parallel = TRUE)
```

```{r}
summary(fit_ns_start)
```

## Changing the type and the order of the rational approximation

We have three rational approximations available. The BRASIL algorithm 
[@Hofreither21](https://doi.org/10.1007/s11075-020-01042-0), and two "versions"
of the Clenshaw-Lord Chebyshev-Pade algorithm, one with lower bound zero
and another with the lower bound given in [@xiong22](https://doi.org/10.1080/10618600.2023.2231051).

The type of rational approximation can be chosen by setting the `type_rational_approximation`
argument in the `matern.operators` function. The BRASIL algorithm corresponds to the choice `brasil`,
the Clenshaw-Lord Chebyshev pade with zero lower bound and non-zero lower bounds
are given, respectively, by the choices `chebfun` and `chebfunLB`.

For instance, we can create an `rSPDE` object with a `chebfunLB` rational
approximation by

```{r}
op_cov_2d_type <- matern.operators(
  mesh = mesh_2d,
  m = 2,
  type_rational_approximation = "chebfunLB"
)
tau <- op_cov_2d_type$tau
```

We can check the order of the rational approximation with the `rational.order()`
function and assign a new order with the `rational.order<-()` function:

```{r}
rational.order(op_cov_2d_type)

rational.order(op_cov_2d_type) <- 1
```

Let us fit a model using the data from the previous example:

```{r}
fit_order1 <- rspde_lme(y ~ -1, model = op_cov_2d_type, 
          data = df_data_2d,repl = repl,
          loc = c("x_coord", "y_coord"), parallel = TRUE)
```

```{r}
summary(fit_order1)
```

Let us compare with the true values:

```{r}
print(data.frame(
  sigma = c(sigma, fit_order1$alt_par_coeff$coeff["sigma"]), 
  range = c(range, fit_order1$alt_par_coeff$coeff["range"]),
  nu = c(nu, fit_order1$alt_par_coeff$coeff["nu"]),
  row.names = c("Truth", "Estimates")
))
```

Finally, we can check the type of rational approximation with the `rational.type()`
function and assign a new type by using the `rational.type<-()` function:

```{r}
rational.type(op_cov_2d_type)

rational.type(op_cov_2d_type) <- "brasil"
```

Let us now fit this model, with the data from the previous example, with `brasil` rational approximation:

```{r}
fit_brasil <- rspde_lme(y ~ -1, model = op_cov_2d_type, 
          data = df_data_2d,repl = repl,
          loc = c("x_coord", "y_coord"), parallel = TRUE)
```

```{r}
summary(fit_brasil)
```

Let us compare with the true values:

```{r}
print(data.frame(
  sigma = c(sigma, fit_brasil$alt_par_coeff$coeff["sigma"]), 
  range = c(range, fit_brasil$alt_par_coeff$coeff["range"]),
  nu = c(nu, fit_brasil$alt_par_coeff$coeff["nu"]),
  row.names = c("Truth", "Estimates")
))
```

## References