Let us set up the notations first. Suppose a there exists a partition of a region D∈R2 (e.g., a city). This partition is denoted by Ai, i=1,…,n. Moreover, there exists another partition of the same city, denoted Bj, where j=1,…,m. These partitions can be seen as two different administrative divisions within the same city. It is common for different government agencies to release data for different divisions of a same city, country, or state.
Assume we observe a random variable Y(⋅) at each region Ai and we are interested in predict/estimate this variable in each of the regions Bj. Now suppose the random variable Y(⋅) varies continuously over D and is defined as follows Y(s)=μ+S(s)+ε(s),s∈D⊂R2. where S(⋅)∼GP(0,σ2ρ(⋅;ϕ,κ)) and ε(⋅)i.i.d.∼N(0,σ2ρ(⋅;ϕ,κ)), with S and ε independent of each other. For now, let’s make the unrealistic assumption that all those parameters are known. Then, our assumption is that the observed data is as follows Y(Ai)=1|Ai|∫AiY(s)ds=1|Ai|∫Ai[μ+S(s)+ε(s)]ds=μ+1|Ai|∫AiS(s)ds+1|Ai|∫Aiε(s)ds, where |⋅| returns the area of a polygon. Furthermore, it can be shown that (using Fubini’s Theorem and some algebraic manipulation) Cov(Y(Ai),Y(Aj))=σ2|Ai||Aj|∫Ai×Ajρ(‖ where \rho(\cdot ; \, \phi, \kappa) is a positive definite correlation function. Now, let {\rm R}_{\kappa}(\phi) be a correlation matrix such that {\rm R}_{\kappa}(\phi)_{ij} = \frac{1}{\lvert A_i \rvert \lvert A_j \rvert} \int_{A_i \times A_j} \rho( \lVert \mathbf{s} - \mathbf{s}' \rVert; \, \phi, \kappa ) \, {\rm d} \mathbf{s} \, {\rm d} \mathbf{s}', thus, Y(A_1, \cdots, A_n) \sim {\rm N}( \mu \mathbf{1}_n, \sigma^2 {\rm R}_{\kappa}(\phi) + \tau {\rm diag}(\lvert A_1 \rvert^{-1}, \ldots, \lvert A_1 \rvert^{-1})). Then, if we assume (Y^{\top}(A_1, \cdots, A_n), Y^{\top}(B_1, \cdots, A_m)^{\top}) to be jointly normal, we use can the conditional mean of Y^{\top}(B_1, \cdots, A_m)^{\top} given Y^{\top}(A_1, \cdots, A_n) to estimate the observed random variable in the partition B_1, \ldots, B_m.
Now, suppose the parameters \boldsymbol{\theta} = (\mu, \sigma^2, \phi, \tau) are unknown. The Likelihood of Y(A_1, \ldots, A_n) can still be computed.
In particular, if we use the parametrization \nu = \tau / \sigma^2, we have closed form for the Maximum Likelihood estimators both for \mu and \sigma^2. Thus, we can optimize the profile likelihood for \phi and \nu numerically. Then, we resort on conditional Normal properties again to compute the predictions in a new different set of regions.
Areal interpolation is a nonparametric approach that interpolates Y(A_i)’s to construct Y(B_j)’s. Define an m \times n matrix \mathbf{W} = \{ w_{ij} \}, where w_{ij} is the weight associated with the polygon A_i in constructing Y(B_j). The weights are w_{ij} = \lvert A_i \cap B_j \rvert / \lvert B_j \rvert (Goodchild and Lam 1980; Gotway and Young 2002). The interpolation for \hat Y(B_1, \ldots, B_m) is constructed as \begin{equation} \label{eq:np-est} \hat{Y}(B_1, \ldots, B_m) = \mathbf{W} Y(A_1, \ldots, A_n). \end{equation} The expectation and variance of the predictor are, respectively, {\rm E}[\hat{Y}(B_1, \ldots, B_m)] = \mathbf{W} {\rm E}[Y(A_1, \ldots, A_n)] and \begin{equation} \label{eq:np-matcov} \textrm{Var}[\hat{Y}(B_1, \ldots, B_m)] = \mathbf{W} \textrm{Var}[Y(A_1, \ldots, A_n)] \mathbf{W}^{\top}. \end{equation} In practice, the covariance matrix \textrm{Var}[Y(A_1, \ldots, A_n)] is unknown and, consequently needs to be estimated.
The variance each predictor \text{Var}[\hat Y(B_i)] is needed as an uncertainty measure. It relies on both the variances of Y(A_j)’s and their covariances: \begin{align} \label{eq:np-single-var} \textrm{Var}[\hat{Y}(B_i)] = \sum_{i = 1}^n w^2_{ij} \textrm{Var} \left [ Y(A_i) \right ] + 2 \sum_{l \neq i} w_{ij} w_{il} \textrm{Cov} \left[ Y(A_i), Y(A_l) \right]. \end{align} The variances are often observed in survey data, but the covariances are not. For practical purpose, we propose an approximation for \textrm{Cov}[ Y(A_i), Y(A_l)] based on Moran’s I, a global spatial autocorrelation. Specifically, let \rho_I be the Moran’s I calculated with a weight matrix constructed with first-degree neighbors. That is, \rho_I is the average of the pairwise correlation for all neighboring pairs. For regions A_i and A_l, if they are neighbors of each other, our approximation is \begin{align} \label{eq:cova} \textrm{Cov} \left[ Y(A_i), Y(A_l) \right] = \rho_I \sqrt{\text{Var}[Y(A_i)] \text{Var}[Y(A_l)]}. \end{align} The covariance between non-neighboring A_i and A_l is discarded. The final uncertainty approximation for \textrm{Var}[\hat{Y}(B_i)] will be an underestimate. Alternatively, we can derive, at least, an upper bound for the variance of the estimates by using a simple application from the Cauchy–Schwartz inequality, in which case, \rho_I is replaced with~1.