Appendix

8.1 SPDE and Triangulation

In order to fit LGM-type models with a spatial component, INLA uses the SPDE (Stochastic Partial Differential Equation) approach. Given a continuous Gaussian random process (a general continuous surface), SPDE approximates the continuous process by a discrete Gaussian process defined on a triangulation of the study region. Assume a set of points defined in a CRS (Latitude and Longitude, Eastings and Northings), and assume that the object of the analysis is to estimate a spatially continuous process. Instead of estimating the process everywhere, it is estimated only at the vertices of the triangulation. This requires laying a triangulation of the study region on top of the process; the software then predicts the value of the process at the triangulation vertices and interpolates the remaining points, obtaining a "scattered" surface.

Figure 8.3: Triangulation weights and associated process value, Moraga (2020) source

Imagine a point at location X lying inside a triangle whose vertices are S1, S2 and S3. SPDE sets the value of the process at location X equal to a weighted combination of the values of the process at the triangle vertices, where the weights are given by the barycentric coordinates (BC). The BC are proportional to the areas of the sub-triangles formed by the point and the vertices. Assume a triangle of the triangulation with total area A, and assume the goal is to compute the value at location X. Then X is equal to S1 multiplied by the area A1 divided by the whole triangle area A, plus S2 multiplied by A2 divided by A, plus S3 multiplied by A3 divided by A. This is the value of the process at location X given the triangulation (its vertices). SPDE is therefore approximating the value of the process by a weighted average of the values of the process at the triangle vertices, with weights proportional to the areas of the corresponding sub-triangles.

In order to do this within INLA, a projection matrix is also needed, figure 8.4. The projection matrix maps the GMRF (when a GP is assumed) between the observation locations and the triangulation: it essentially assigns to each observation the weights of the triangulation vertices. The matrix A has dimension n × g, i.e. n rows and g columns, where n is the number of observations and g is the number of vertices of the triangulation. Each row has at most three non-zero values (right matrix in figure 8.4), in the columns corresponding to the vertices of the triangle that contains the point. Assume the first observation coincides with vertex S1 of the triangle in figure 8.3: since the point lies exactly on the vertex (not inside the triangle), there is no weighting (A1 = A), so A(1,1) = 1 and the rest of the values in that row are 0. Now assume the second observation coincides with S3 (the vertex in position 3): then A(2,3) = 1 and the rest of the row is 0. When instead the observation is at a location X that lies within one of the triangles, all the elements of the row are 0 except the three in the positions of the triangle's vertices, e.g. 0.2, 0.2 and 0.6; as a result X is weighted according to the areas.
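As a concrete sketch, the barycentric weights and the corresponding row of the projection matrix can be computed as follows (a minimal Python illustration with made-up coordinates and a hypothetical 5-vertex mesh, not the INLA implementation):

```python
import numpy as np

def barycentric_weights(p, s1, s2, s3):
    """Barycentric coordinates of point p w.r.t. triangle (s1, s2, s3).

    Each weight is the area of the sub-triangle opposite a vertex
    divided by the total triangle area A."""
    def area(a, b, c):
        # Triangle area via the absolute value of the cross product.
        return 0.5 * abs((b[0] - a[0]) * (c[1] - a[1])
                         - (c[0] - a[0]) * (b[1] - a[1]))
    total = area(s1, s2, s3)
    w1 = area(p, s2, s3) / total   # A1 / A
    w2 = area(s1, p, s3) / total   # A2 / A
    w3 = area(s1, s2, p) / total   # A3 / A
    return np.array([w1, w2, w3])

# Triangle vertices and an interior point X (hypothetical coordinates).
s1, s2, s3 = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
x = (0.2, 0.3)
w = barycentric_weights(x, s1, s2, s3)   # the three weights sum to 1

# Row of the projection matrix A for this observation: g columns
# (one per mesh vertex), non-zero only at the containing triangle's vertices.
g = 5                    # pretend the mesh has 5 vertices
row = np.zeros(g)
row[[0, 1, 2]] = w       # vertices s1, s2, s3 are mesh nodes 0, 1, 2

# The process value at X is the weighted average of the vertex values.
vertex_values = np.array([1.0, 2.0, 3.0, 0.0, 0.0])
value_at_x = row @ vertex_values
```

If the observation coincided with a vertex, one weight would be 1 and the other two 0, reproducing the A(1,1) = 1 case described above.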

Figure 8.4: Projection matrix to map values from the triangulation back to the GP, Moraga (2020) source

8.2 Laplace Approximation

Blangiardo and Cameletti (2015) offer an INLA-focused intuition of how the Laplace approximation works for integrals. Assume the interest is in computing the following integral, keeping the notation followed throughout the analysis:

$$\int \pi(x)\,dx = \int \exp\left(\log \pi(x)\right) dx$$

where $x$ is a random variable for which a distribution function $\pi$ is specified. Then, by a Taylor series expansion (Fosso-Tande 2008), it is possible to represent $\log \pi(x)$ around an exact point $x = x_0$, so that:

$$\log \pi(x) \approx \log \pi(x_0) + (x - x_0)\,\frac{\partial \log \pi(x)}{\partial x}\bigg|_{x=x_0} + \frac{(x - x_0)^2}{2}\,\frac{\partial^2 \log \pi(x)}{\partial x^2}\bigg|_{x=x_0} \tag{8.1}$$

Then, if $x_0$ is set equal to the mode $x^{*}$ of the distribution (its highest point), for which $x^{*} = \arg\max_x \log \pi(x)$, the first-order derivative with respect to $x$ is 0, i.e. $\frac{\partial \log \pi(x)}{\partial x}\big|_{x=x^{*}} = 0$. This comes naturally: once the function reaches its peak, i.e. the maximum, the derivative at that point is 0. Then, by leaving out the first-derivative term in eq. (8.1), it is obtained:

$$\log \pi(x) \approx \log \pi(x^{*}) + \frac{(x - x^{*})^2}{2}\,\frac{\partial^2 \log \pi(x)}{\partial x^2}\bigg|_{x=x^{*}}$$

Then, by exponentiating what remains, integrating, and taking non-integrable terms out of the integral,

$$\int \pi(x)\,dx \approx \exp\left(\log \pi(x^{*})\right) \int \exp\left(\frac{(x - x^{*})^2}{2}\,\frac{\partial^2 \log \pi(x)}{\partial x^2}\bigg|_{x=x^{*}}\right) dx \tag{8.2}$$

At this point it might already be intuited that expression (8.2) is actually the density of a Normal. As a matter of fact, by imposing $\sigma^2 = -1 \Big/ \frac{\partial^2 \log \pi(x)}{\partial x^2}\Big|_{x=x^{*}}$, expression (8.2) can be rewritten as:

$$\int \pi(x)\,dx \approx \exp\left(\log \pi(x^{*})\right) \int \exp\left(-\frac{(x - x^{*})^2}{2\sigma^2}\right) dx \tag{8.3}$$

Furthermore, the integrand is the kernel of a Normal distribution with mean equal to the mode $x^{*}$ and variance $\sigma^2$. By computing the finite integral of (8.3) on the closed interval $[\alpha, \beta]$, the approximation becomes:

$$\int_{\alpha}^{\beta} \pi(x)\,dx \approx \pi(x^{*})\,\sqrt{2\pi\sigma^2}\,\big(\Phi(\beta) - \Phi(\alpha)\big)$$

where $\Phi(\cdot)$ is the cumulative distribution function of the Normal distribution in eq. (8.3).
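The recipe above can be sketched numerically. The following Python snippet is a minimal illustration (not the INLA implementation; the Gamma target density, the grid, and the interval are arbitrary choices for the example): it locates the mode by grid search, estimates the curvature at the mode by central finite differences, and evaluates the Normal mass on a closed interval.

```python
import math
import numpy as np

def laplace_approx(log_pi, grid, h=1e-4):
    """Laplace approximation of a (possibly unnormalised) log-density:
    locate the mode x* on a dense grid, estimate the second derivative
    there by a central finite difference, and set sigma^2 = -1 / d2."""
    x_star = grid[np.argmax(log_pi(grid))]
    d2 = (log_pi(x_star + h) - 2.0 * log_pi(x_star) + log_pi(x_star - h)) / h**2
    return float(x_star), float(-1.0 / d2)

def normal_cdf(x, mu, sigma2):
    # Phi for a Normal(mu, sigma^2), written via the error function.
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0 * sigma2)))

# Illustrative target: a Gamma(a, 1) density, log pi(x) = (a-1) log x - x
# up to a constant; its mode is a - 1 and the Laplace variance is a - 1.
a = 5
log_pi = lambda x: (a - 1) * np.log(x) - x

x_star, sigma2 = laplace_approx(log_pi, np.linspace(0.1, 30.0, 299001))

# Normal mass on the closed interval [alpha, beta]: Phi(beta) - Phi(alpha).
alpha, beta = 3.0, 5.0
mass = normal_cdf(beta, x_star, sigma2) - normal_cdf(alpha, x_star, sigma2)
```

For this target the sketch recovers the analytic mode $x^{*} = a - 1 = 4$ and variance $\sigma^2 = a - 1 = 4$ up to grid and finite-difference error.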

As an example, consider the Chi-squared $\chi^2$ density function (chosen since it is easily differentiable and the Gamma term $\Gamma$ in the denominator is constant). The quantities of interest are $\log \pi(x)$; then $\frac{\partial \log \pi(x)}{\partial x}$, which has to be set equal to 0 to find the mode; and finally $\frac{\partial^2 \log \pi(x)}{\partial x^2}$. The $\chi^2$ pdf, whose support is $x \in \mathbb{R}^{+}$ and whose degrees of freedom are $k$, is:

$$\chi^2(x; k) = \frac{x^{k/2 - 1}\, e^{-x/2}}{2^{k/2}\, \Gamma\!\left(\frac{k}{2}\right)}, \qquad x \geq 0$$

for which the following quantities are computed. First the log-density (dropping the constant term):

$$\log \pi(x) = \left(\frac{k}{2} - 1\right) \log x - \frac{x}{2}$$

Then the single-variable score function, set equal to 0 to find $x^{*} = \arg\max_x \log \pi(x)$:

$$\frac{\partial \log \pi(x)}{\partial x} = \frac{k/2 - 1}{x} - \frac{1}{2} = 0 \;\underset{\text{solving for } 0}{\Longrightarrow}\; \frac{k}{2} - 1 = \frac{x}{2} \;\underset{\text{moving addends}}{\Longrightarrow}\; x^{*} = k - 2$$

And the curvature (the observed Fisher information) at $x^{*}$, for which it is known that $\sigma^2 = -1 \big/ \frac{\partial^2 \log \pi(x)}{\partial x^2}\big|_{x=x^{*}}$:

$$\frac{\partial^2 \log \pi(x)}{\partial x^2} = -\frac{k/2 - 1}{x^2} \;\underset{\text{substituting } x^{*}}{=}\; -\frac{k/2 - 1}{(k-2)^2} \;\underset{\text{multiplying by } 2/2}{=}\; -\frac{k - 2}{2(k-2)^2} = -\frac{1}{2(k-2)}$$

so that, by changing sign and taking the inverse, $\sigma^2 = 2(k-2)$.
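The two derivatives above can be sanity-checked numerically. A small Python check for $k = 8$ (so the claimed values are $x^{*} = 6$ and $\sigma^2 = 12$), using central finite differences on the log-density:

```python
import math

# Chi-squared log-density for k = 8, constants dropped.
k = 8
log_pi = lambda x: (k / 2 - 1) * math.log(x) - x / 2

x_star = k - 2          # claimed mode, x* = 6
h = 1e-4

# Score at x*: should vanish at the mode.
score = (log_pi(x_star + h) - log_pi(x_star - h)) / (2 * h)

# Curvature at x*: should equal -1 / (2(k - 2)), giving sigma^2 = 2(k - 2).
curv = (log_pi(x_star + h) - 2 * log_pi(x_star) + log_pi(x_star - h)) / h**2
sigma2 = -1.0 / curv
```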

finally,

$$\chi^2 \overset{\text{LA}}{\approx} N\!\left(x^{*} = k - 2,\; \sigma^2 = 2(k-2)\right)$$

Assuming $k = 8, 16$ degrees of freedom, figure 8.5 displays the $\chi^2$ densities against their Laplace approximations, showing how well the approximation fits the real density. In the $k = 8$ case the integrals are computed on the interval $(5, 7)$, leading to a very good Normal approximation that differs only slightly from the original Chi-squared. The same is done for the $k = 16$ case on the interval $(12, 17)$, again showing a very good approximation. Note that the greater the degrees of freedom, the closer the Chi-squared gets to the Normal, leading to better approximations.
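The interval integrals just described can be reproduced numerically. The following rough sketch (plain Python with a simple trapezoidal rule, not the code behind figure 8.5) compares the exact $\chi^2$ mass with the mass of its Laplace Normal on the same intervals:

```python
import math

def chi2_pdf(x, k):
    """Chi-squared density with k degrees of freedom."""
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

def laplace_pdf(x, k):
    """Normal density from the Laplace approximation: N(k - 2, 2(k - 2))."""
    mu, s2 = k - 2, 2 * (k - 2)
    return math.exp(-(x - mu) ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)

def trapz_integral(f, a, b, n=10000):
    # Simple trapezoidal rule on [a, b] with n subintervals.
    h = (b - a) / n
    return h * (f(a) / 2 + f(b) / 2 + sum(f(a + i * h) for i in range(1, n)))

# k = 8: mass of each density on (5, 7).
exact = trapz_integral(lambda x: chi2_pdf(x, 8), 5, 7)
approx = trapz_integral(lambda x: laplace_pdf(x, 8), 5, 7)

# k = 16: mass of each density on (12, 17). The agreement improves with k,
# since the chi-squared itself approaches a Normal as k grows.
exact16 = trapz_integral(lambda x: chi2_pdf(x, 16), 12, 17)
approx16 = trapz_integral(lambda x: laplace_pdf(x, 16), 12, 17)
```

For both choices of $k$ the two masses agree to roughly two decimal places, consistent with the close overlap of the solid and dotted curves in figure 8.5.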


Figure 8.5: Chi-squared density function with parameter k=8 (top) and k=16 (bottom), solid line. The dotted line refers to the corresponding Normal approximation obtained using the Laplace method
