Geostatistics: Past, Present and Future

4.26.2.1

October 15, 2002

Mark D. Ecker, Department of Mathematics, University of Northern Iowa, Cedar Falls, IA, 50614-0506

Keywords: Bayesian Methodology, Distribution-free techniques, Kriging, Likelihood-based analyses, Spatial Correlation, Spatial Prediction, Variogram

Contents

1. Introduction

2. Distribution-Free Methodology

3. Likelihood-Based Modeling

4. Model Based Prediction

5. Discussion and Future Directions

Glossary

Bayesian Methodology – A branch of likelihood-based analyses whereby

the model parameters are assumed random, unknown quantities.

Change of Support – Predicting responses at areal units when samples

are taken at (essentially) point support. More generally, when

prediction at one sampling unit is desired and sampling was

conducted at another.

Distribution-Free Techniques – Geostatistical analyses conducted

without assuming a (sampling) distribution for the observed

spatial data.

Ergodicity – Using spatial averages (those formed from averaging

responses over many locations) in lieu of probabilistic averages (those formed from averaging over multiple replicates of the spatial process).

Empirical Variogram – The sample or data-based estimate of the strength of

the association (variability) between responses at pairs of sites

as a function of their separation distance.

Geometric Anisotropy – The simplest departure from isotropy whereby

the correlation structure for sites in the plane is modeled as an ellipse, in contrast to circular, isotropic correlation functions.

Geostatistics – The study of spatial data where the spatial correlation is

modeled through the variogram. Focus of geostatistical analyses is

on predicting responses at unsampled sites and for areal units.

Isotropy – The spatial process has constant variability in any direction

examined given a fixed distance.

Kriging – Spatial prediction at an unsampled site using the distribution-

free methodology.

Likelihood-Based Analyses – Geostatistical analyses assuming the

observed spatial data follow a specific sampling distribution (for

example, normality is the most common).

Nugget – The extrapolated value of the variogram from the minimum

observed distance in the sample to zero.

Posterior Distribution – Under the Bayesian paradigm, the distribution

of the model parameters integrating their prior distribution with the

sampling distribution of the observed data using Bayes Rule.

Prior Distribution – Under the Bayesian paradigm, the distribution of

the model parameters before any data have been collected.

Range – The distance at which a variogram reaches its sill (or effectively

reaches its sill).

Sill – The variability of the spatial process.

Spatial Correlation – The tendency for responses at pairs of locations that are closer in distance (and possibly direction) to be more similar than responses at pairs further apart.

Spatial Trend – The mean structure of the spatial process expressed as a

polynomial in the spatial coordinates.

Stationarity – The spatial process has a correlation or variability

structure that does not depend upon location, but rather, upon relative distance and direction of the sites.

Variogram – The variability of the spatial process expressed as a function

of distance.

Summary

Geostatistical analyses were first developed in the 1950s as a result of interest in areal or block averages for ore reserves in the mining industry. Today, variogram estimation and spatial prediction (kriging) span all sciences where data exhibit spatial correlation. Theoretical properties of the spatial process are addressed under the distribution-free and likelihood-based perspectives. Strengths and weaknesses of these two dominant methodologies for modeling spatial correlation and predicting responses at unsampled sites and areal units are explored. Examples of hot spot detection and areal prediction show the flexibility of the Bayesian paradigm. Current bottlenecks and future directions in the field of geostatistics are discussed.

1. Introduction

The science of geostatistics has grown tremendously from its mining roots approximately 50 years ago to encompass a wide range of disciplines. Olea (1991, p. 31) defines geostatistics as “the application of statistical methods … for use in the earth sciences, particularly in geology”. Geostatistical analyses are employed in fields where the data are viewed as (essentially) point observations and prediction (at unsampled sites or at areal units) is desired. Today, geostatistical studies are conducted with hydrological data (Rouhani and Myers, 1990; Kitanidis, 1996), in mining applications (Kelker and Langenburg, 1997; Schuenemeyer and Power, 2000), air quality studies (Guttorp, Meiring and Sampson, 1994; Krajewski, Molinska and Molinska, 1996), soil science data (Webster and Oliver, 1992; Raspa and Bruno, 1993), biological applications (Ecker and Gelfand, 1999; Ritvo, Sherman, Lawrence and Samocha, 2000), economic housing data (Basu and Thibodeau, 1998; Gelfand, Ecker, Knight and Sirmans, 2002), and constructing environmental monitoring networks (Wikle and Royle, 1999).

One purpose of this chapter is to acknowledge the fundamental contributions of two geostatistical pioneers, D. G. Krige and G. Matheron, who formulated the underpinnings of geostatistics and kriging approximately 40 years ago. This author will not focus much energy on the history of geostatistics; other authors, in particular Cressie (1990), have already retraced its history quite well. Rather, this author presents his perspective on the current status of the science of geostatistics. What are the strengths and weaknesses of the dominant methodologies used in undertaking such an analysis? What are the bottlenecks that current researchers are encountering? What is the future for the field of geostatistics?

Mining variables such as deposits of ore are often highly concentrated in veins. When these data are collected at locations or sites at a fixed time, a purely spatial analysis revolves around the tendency for pairs of sites closer in space (distance and possibly orientation) to have more similar responses than pairs further apart, i.e., the data exhibit spatial correlation. Predicting block ore grades (prediction of an areal block average) from a sample taken at point support was of primary interest to D. G. Krige in the 1950s. Krige predicted areal gold concentrations in South Africa based upon large amounts of data which exhibited strong positive correlation (Cressie, 1990, p. 246). In honor of D. G. Krige's contribution, G. Matheron coined the term kriging to refer to (optimal) prediction of responses at unsampled locations. Matheron (1963) was the first to publish a detailed exposition on geostatistics and kriging. He recognized that prediction is only one component of a geostatistical analysis and should not be viewed as its sole goal. As with any statistical analysis, the quality of the geostatistical prediction depends upon how well the postulated model explains the observed data.

In this spirit, two fundamental goals exist in the analysis of clustering or spatially correlated data. First, accurately explain the clustering mechanism, i.e., develop a model for the correlation structure. Second, use this proposed model to predict responses at unsampled locations or at areal units. Standard statistical techniques (such as ordinary least squares regression) fail to incorporate the clustering effect that spatial data tend to exhibit. In fact, many statistical analyses ignore the geographic coordinates entirely in treating the data as an independent and identically distributed sample rather than one (spatially correlated) vector of observations.

At the explanation stage, one needs to accurately model the spatial correlation structure. The primary method of extracting this spatial clustering is through the variogram. The variogram is at the heart of geostatistics. It expresses the variability of pairs of spatially indexed observations as a function of their separation vector. For a particular spatial region $D \subset \Re^2$, one needs to address two basic questions: does the spatial process underlying the observed data in the region exhibit spatial correlation and, if so, is this correlation equally strong (at a fixed distance) in every direction? If the answer is yes to both questions, then the spatial process is isotropic. There is an enormous body of literature whose focus is variogram modeling for isotropic spatial processes. Cressie (1993), Wackernagel (1998) and Stein (1999) are excellent current references. Under isotropy, the variogram or the spatial correlation is only a function of the Euclidean distance between sites. In addition, techniques for studying the most common departure from isotropy shall be discussed.

At the prediction stage, the standard technique of kriging incorporates the association structure (the variogram) to optimally solve a system of linear equations which predict the value at an unsampled location. Kriging produces the best linear unbiased predictor (BLUP) for the unsampled site. It maintains two distinct advantages over other interpolation techniques such as inverse distance weighting. First, standard interpolation techniques do not include any information about the spatial correlation structure. Second, kriging provides an estimate of prediction variability with which to assess, under certain conditions, its prediction accuracy. Methodologies for predicting responses at unsampled sites and areal units shall be explored.

The primary modeling decision for geostatistical analyses involves choosing either the parametric family of distributions from which the responses arise or a distribution-free approach. While not choosing a sampling distribution for the observed data has certain advantages, it does induce limitations in the resulting statistical inference. Under the parametric or likelihood-based approach, a Gaussian random field is traditionally assumed, partly for convenience and partly for the flexibility that this family offers; however, many environmental responses are binary or positive with right-skewed distributions (such as water pollution concentrations). In the latter case, lognormality is often assumed as a default option, but other transformations to normality can be employed.

This chapter is organized as follows. Section 1 is the introduction. Section 2 outlines the distribution-free geostatistical methodology and examines its strengths and weaknesses while section 3 does the same for the parametric or likelihood-based techniques. Section 4 explores model based prediction of unsampled locations and of areal blocks. Finally, section 5 outlines some future directions in the field of geostatistics.


2. Distribution-Free Methodology

The observed data, $\mathbf{Y} = (Y(\mathbf{s}_1), \ldots, Y(\mathbf{s}_N))'$, are collected at N locations or sites $\mathbf{s}_1, \ldots, \mathbf{s}_N$ in the region D. Theoretical assumptions such as ergodicity, stationarity, isotropy, continuity and differentiability of the random field, the population (sampling) distribution, and prior distributions for various components are customarily developed for features of the underlying spatial process, $\{Y(\mathbf{s}) : \mathbf{s} \in D\}$. These assumptions are needed to validate statistical inference; however, they tend to be difficult to verify given only the observed data.

The first assumption is that of ergodicity, which allows inference to proceed using only one (vector) replicate. By making this ergodicity assumption (Zhan, 1999), one uses spatial averages (those formed by averaging responses over sites in the region, D) in lieu of probabilistic or ensemble averages (those formed by averaging over multiple (vector) realizations of the spatial process). Unfortunately, the observed data cannot be used to check the validity of the ergodicity assumption (see Myers, 1989, pp. 351-352).

Much of the focus of geostatistics centers on developing models for the correlation structure of the observed data. The concepts of stationarity (both intrinsic and second-order stationarity) and isotropy provide theoretical underpinnings for modeling the local source of variability. Intrinsic stationarity assumes that for arbitrary locations $\mathbf{s}_i$ and $\mathbf{s}_j$ in D,

$\mathrm{E}[Y(\mathbf{s}_i) - Y(\mathbf{s}_j)] = 0 \quad \mathrm{and} \quad \mathrm{Var}[Y(\mathbf{s}_i) - Y(\mathbf{s}_j)] = 2\gamma(\mathbf{s}_i - \mathbf{s}_j)$   (1)

where $2\gamma(\cdot)$ is the variogram and $\gamma(\cdot)$ is the semivariogram. Stationarity assumes that the mean and variance of the spatial process, $Y(\mathbf{s})$, do not depend on the exact locations $\mathbf{s}_i$ and $\mathbf{s}_j$; the mean is assumed constant and the variability only depends on the separation vector, $\mathbf{s}_i - \mathbf{s}_j$ (i.e., the spatial process $Y(\mathbf{s})$ is translation invariant). If the spatial variability for a fixed distance in all directions is the same (i.e., the spatial process has rotational invariance), then the spatial process is isotropic and $2\gamma(\mathbf{s}_i - \mathbf{s}_j) = 2\gamma(d_{ij})$, where $d_{ij} = \|\mathbf{s}_i - \mathbf{s}_j\|$ is the Euclidean distance between sites $\mathbf{s}_i$ and $\mathbf{s}_j$. A stronger form of stationarity (second-order or weak) assumes that (Matheron, 1963),

$\mathrm{E}[Y(\mathbf{s}_i)] = \mu \quad \mathrm{and} \quad \mathrm{Cov}[Y(\mathbf{s}_i), Y(\mathbf{s}_j)] = C(\mathbf{s}_i - \mathbf{s}_j)$   (2)

where, under isotropy, $C(\mathbf{s}_i - \mathbf{s}_j) = C(d_{ij})$. Smith (1996) defines a process to be homogeneous if it satisfies both isotropy and second-order stationarity.

Cressie (1988) demonstrates that the class of intrinsically stationary spatial processes strictly contains those of second order, i.e., C determines the variogram through $2\gamma(\mathbf{h}) = 2[C(\mathbf{0}) - C(\mathbf{h})]$. However, $\gamma$ does not determine C, since stationary increments for $Y(\mathbf{s})$ do not imply that $Y(\mathbf{s})$ is stationary. In such a case, C(0) need not exist; hence the variogram is theoretically unbounded. Under homogeneity, the relationship between the variogram and the correlation structure is given by $\gamma(d) = C(0)[1 - \rho(d)]$, where $\rho(d) = C(d)/C(0)$.

In practice, the assumption of stationarity is difficult to verify since it is placed on the stochastic process, not on any realization of the process, i.e., the data. For example, the sample or data-based (empirical) variogram is always bounded in practice, since it is a function of real data; however, if these data are realizations of a Brownian motion (see Cressie, 1993, p. 68), then the underlying theoretical variogram is linear, which is unbounded. Furthermore, one might potentially want to model the spatial locations in the mean and covariance structures of (2); however, identifying which component (or both) exhibits nonstationarity is not easily determined. "One person's deterministic mean structure is another person's correlated error structure" (Cressie, 1993, p. 114).

A convenient parameterization of the correlation structure in (2) is $C(d) = \sigma^2 \rho(d; \boldsymbol{\phi}) + \tau^2 1\{d = 0\}$, where $\rho(\cdot; \boldsymbol{\phi})$ is a valid correlation function and $\sigma^2$, $\tau^2$ and $\boldsymbol{\phi}$ are model parameters. Note that $\boldsymbol{\phi}$ is a vector of model parameters that control the strength of spatial correlation and/or the smoothness of the resulting prediction of the spatial process (continuity and differentiability of the random field). The class of all valid correlation functions in $\Re^d$ may be expressed through Bochner's Theorem (see Cressie, 1993, p. 84). Ecker and Gelfand (1997, p. 351) provide several one parameter correlation functional forms, $\rho(d; \phi)$, commonly found in geostatistical analyses. Figure 1 displays their corresponding semivariograms. The exponential, Gaussian, rational quadratic (Cauchy) and spherical are monotonic correlation functions while the Bessel is a dampened sinusoidal correlation structure.

Figure 1: Five commonly used semivariograms in geostatistical analyses.


Common two parameter monotonic correlation structures include the Matern and general exponential. The Matern correlation function (Matern, 1986) is defined for $d > 0$ as

$\rho(d; \phi, \nu) = \frac{1}{2^{\nu - 1}\,\Gamma(\nu)}\,(\phi d)^{\nu}\,K_{\nu}(\phi d)$   (4)

where $\phi > 0$, $\nu > 0$, $\Gamma(\cdot)$ is the standard gamma function and $K_{\nu}(\cdot)$ is a modified Bessel function of the third kind of order $\nu$ (Abramowitz and Stegun, 1965). The shape of the postulated variogram near the origin controls the differentiability of the random field, which dictates the smoothness of the resulting spatial prediction. For the Matern model, the parameter $\nu$ controls the smoothness of the associated random field; $\nu > 0$ ensures continuous realizations and realizations are $\lceil \nu \rceil - 1$ times (mean square) differentiable, where $\lceil \cdot \rceil$ is the integer ceiling function (Handcock and Stein, 1993, p. 406). Special cases of the Matern correlation function include the exponential ($\nu = 1/2$) and the Gaussian ($\nu \rightarrow \infty$). Fortran code that evaluates the Matern correlation function (for use with, for example, Compaq Visual Fortran Professional version 6.5, including IMSL) is included in the appendix. The general exponential correlation function is defined for $d > 0$ as

$\rho(d; \phi, \kappa) = \exp\{-(\phi d)^{\kappa}\}$   (5)

where $\phi > 0$ and $0 < \kappa \leq 2$. Special cases of the general exponential correlation structure include the exponential ($\kappa = 1$) and the Gaussian ($\kappa = 2$). Though (5) may be computationally easier to work with than (4), less appealing sample function behavior results. For $0 < \kappa < 2$, continuous, but nondifferentiable, realizations are obtained and when $\kappa = 2$, analytic realizations are observed. So, excluding $\kappa = 2$, the general exponential correlation structure places no mass on differentiable realizations, which induces fairly rough prediction over the region.

Adopting the Matern correlation structure in preference to the general exponential because of increased differentiability is a mechanistic decision. While Stein (1987) shows that the shape of the variogram near the origin (the derivative of the variogram approaching the origin) can be consistently estimated, the strong correlation between $\nu$ and $\phi$ for the Matern model would require an enormous amount of data near the variogram origin, i.e., at close spatial resolution, to discern the true underlying smoothness of the spatial process. As an illustration, Figure 2 provides a fairly smooth, twice differentiable Matern model ($\nu > 2$) which is nearly indistinguishable, in particular near the variogram origin, from a non-differentiable general exponential model.

Under homogeneity, the theoretical variogram is bounded and the sill of the variogram is $\sigma^2 + \tau^2$, which represents the theoretical variance of the spatial process. Some monotonically increasing variogram forms, such as the spherical, achieve their sill exactly, i.e., for finite d, while others (exponential, Gaussian and Cauchy or rational quadratic) reach their sills asymptotically. For monotonic variograms that reach their sill exactly, the range is defined to be the distance at which the variogram reaches its sill or, equivalently, the distance at which $\rho(d; \phi)$ becomes zero. Hence, responses at sites separated by distances greater than the range are spatially uncorrelated. For asymptotically silled variograms, two points will only be spatially uncorrelated in the limit as $d \rightarrow \infty$. In such a case, one identifies the effective range, a concept that is not uniquely defined. Two possible definitions are noted; one facilitates interpretation in the correlation space of the process and the other in the variogram space. The first (McBratney and Webster, 1986, p. 623) defines the effective range, $R_c$, as the distance where the variogram reaches $\tau^2 + 0.95\sigma^2$. Thus, the effective range $R_c$ is achieved when the spatial correlation of the process diminishes to 5%. Mathematically, $R_c$ solves $\rho(R_c; \phi) = 0.05$. A second definition (Cressie, 1993, pp. 67-68) of the effective range, $R_v$, is the distance at which the variogram reaches 95% of its sill. Then, $R_v$ solves $\rho(R_v; \phi) = 0.05(\sigma^2 + \tau^2)/\sigma^2$. Note that $R_v$ would not exist if $\tau^2 > 19\sigma^2$, but this would be unlikely in practice. For the one parameter, asymptotically silled variograms given in Figure 1, the relationships between the scalar correlation parameter $\phi$ and the effective ranges $R_c$ and $R_v$ are presented in Table 1. It is obvious that $R_v \leq R_c$ with equality if $\tau^2 = 0$. Many authors adopt $R_c$ and parameterize $\rho(d; \phi)$ so that $\phi$ is the effective range $R_c$. Henceforth, the term range shall refer to either the effective range or the exact range. For the dampened sinusoidal Bessel variogram and other non monotonic variograms, the range is not defined. The nugget of the variogram is $\tau^2$, which need not be zero due to measurement error and/or a microscale effect resulting from extrapolation of the variogram from the minimum sampled distance, $d_{\min}$, to the origin.
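As a worked illustration of these two definitions, suppose the exponential correlation takes the form $\rho(d; \phi) = \exp(-\phi d)$ (the $\nu = 1/2$ special case of (4)); this particular parameterization is assumed here only for concreteness. Then

$\rho(R_c; \phi) = e^{-\phi R_c} = 0.05 \;\Rightarrow\; R_c = \frac{\ln 20}{\phi} \approx \frac{3}{\phi}, \qquad \rho(R_v; \phi) = e^{-\phi R_v} = \frac{0.05(\sigma^2 + \tau^2)}{\sigma^2} \;\Rightarrow\; R_v = \frac{\ln 20 - \ln(1 + \tau^2/\sigma^2)}{\phi},$

so that $R_v = R_c$ when the nugget $\tau^2$ is zero and $R_v < R_c$ otherwise.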

Figure 2: Illustrative comparison of Matern and general exponential variograms.


Table 1: Range definitions for monotone asymptotic correlation functions

|Correlation Structure |$\rho(d; \phi)$ |$R_c$ (correlation = 0.05) |$R_v$ (95% of sill) |

|Exponential |[pic] |[pic] |[pic] |

|Gaussian |[pic] |[pic] |[pic] |

|Cauchy |[pic] |[pic] |[pic] |

Fitting semivariogram models under the distribution-free methodology can be viewed as a curve fitting exercise. For each of the $N(N-1)/2$ pairs of observed locations $\mathbf{s}_i$ and $\mathbf{s}_j$ in D, a collective plot of each Euclidean distance, $d_{ij}$, versus $\frac{1}{2}[Y(\mathbf{s}_i) - Y(\mathbf{s}_j)]^2$, a point estimate of the variability for the pair, is the semivariogram cloud. The Matheron (1963) estimator of the semivariogram,

$\hat{\gamma}(d_k) = \frac{1}{2\,|N(d_k)|} \sum_{(\mathbf{s}_i, \mathbf{s}_j) \in N(d_k)} [Y(\mathbf{s}_i) - Y(\mathbf{s}_j)]^2$   (3)

smoothes the semivariogram cloud by aggregating distances into K sets $N(d_k) = \{(\mathbf{s}_i, \mathbf{s}_j) : c_{k-1} \leq d_{ij} < c_k\}$, $k = 1, \ldots, K$, where $|N(d_k)|$ is the number of pairs in $N(d_k)$, for constants $c_{k-1}$ and $c_k$ with $c_0 = 0$. The constants $c_k$ are typically chosen to be $c_k = k\Delta$, where $\Delta$ is the lag or width of each distance bin $N(d_k)$. A plot of $\hat{\gamma}(d_k)$ versus the midpoint of the interval $[c_{k-1}, c_k)$ or the average distance of all $d_{ij}$ for $(\mathbf{s}_i, \mathbf{s}_j) \in N(d_k)$ is called the Matheron empirical semivariogram. Figure 1 displays the five aforementioned correlation functions overlaid on the Matheron empirical semivariogram for a dataset of scallop catches. Other variogram estimators exist (see Genton, 1998, pp. 328-330; Ecker and Gelfand, 1997, p. 361); however, use of the Matheron estimator is ubiquitous.
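To make the binning underlying (3) concrete, the following free-form Fortran sketch computes the Matheron empirical semivariogram for responses z observed at planar coordinates (x, y). The subroutine name, the fixed lag width delta and the number of bins k are illustrative choices rather than part of any standard, and none of the ad hoc lag rules discussed below are imposed.

subroutine matheron_semivariogram(n, x, y, z, k, delta, gammahat, npairs)
! Matheron empirical semivariogram, equation (3): all n(n-1)/2 pairwise
! squared differences are aggregated into k distance bins of width delta,
! bin m collecting pairs with (m-1)*delta <= d_ij < m*delta.
implicit none
integer, intent(in) :: n, k
double precision, intent(in) :: x(n), y(n), z(n), delta
double precision, intent(out) :: gammahat(k)
integer, intent(out) :: npairs(k)
integer :: i, j, m
double precision :: d, sumsq(k)

sumsq = 0.0d0
npairs = 0
do i = 1, n - 1
   do j = i + 1, n
      d = sqrt((x(i) - x(j))**2 + (y(i) - y(j))**2)
      m = int(d/delta) + 1
      if (m <= k) then
         sumsq(m) = sumsq(m) + (z(i) - z(j))**2
         npairs(m) = npairs(m) + 1
      end if
   end do
end do

! Semivariogram estimate: half the average squared difference per bin.
do m = 1, k
   if (npairs(m) > 0) then
      gammahat(m) = sumsq(m)/(2.0d0*dble(npairs(m)))
   else
      gammahat(m) = -1.0d0   ! flag an empty bin
   end if
end do
end subroutine matheron_semivariogram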

Since a correlation matrix must be positive definite (equivalently, the associated variogram must be conditionally negative definite), only certain functions are allowable as variogram models. Hence, a valid theoretical variogram model (such as those from Figure 1) is typically fitted to the Matheron semivariogram by techniques (Cressie, 1993, section 2.6) such as least squares, generalized least squares or fitting by inspection. The standard variogram fitting algorithm of Cressie (1985) uses a weighted least squares procedure. However, it fails to account for the correlation of the empirical variogram values at two distinct lags, since the same observation $Y(\mathbf{s}_i)$ would typically appear multiple times both within the same variogram lag and across several different lags. Genton (1998) has proposed an algorithm that closely approximates the correlation structure of the variogram at two distinct lags. In addition, by using the variogram cloud, Müller (1999) employs the exact variogram correlation structure to develop an iterative fitting algorithm.
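For reference, the weighted least squares criterion of Cressie (1985), in its commonly quoted form, chooses the variogram parameters (collected here, as a notational convenience, into $\boldsymbol{\eta} = (\sigma^2, \tau^2, \boldsymbol{\phi})$) to minimize

$\sum_{k=1}^{K} |N(d_k)| \left\{ \frac{\hat{\gamma}(d_k)}{\gamma(d_k; \boldsymbol{\eta})} - 1 \right\}^2,$

where $\hat{\gamma}(d_k)$ is the Matheron estimate (3) and $\gamma(d_k; \boldsymbol{\eta})$ is the postulated theoretical semivariogram; the weights $|N(d_k)|$ downweight sparsely populated lags but, as noted above, ignore the correlation among the $\hat{\gamma}(d_k)$'s.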

The distribution-free approach to variogram fitting is widely used. Its principal advantage over parametric or likelihood-based methods, including a fully Bayesian approach, is that for variogram parameter estimation and subsequent spatial prediction, one only needs intrinsic stationarity, not a distributional assumption for $\mathbf{Y}$. Not assuming a distribution for $\mathbf{Y}$ is appealing; however, it does introduce some limitations. First, parameter estimation is not invariant to the lag choice $\Delta$ or, equivalently, to the choice of sets $N(d_k)$ (see Curriero and Lele, 1999). Thus, it is possible to arrive at quite different parameter estimates for the same model using different lag values. To mitigate the lag invariance criticism, Webster and Oliver (1992) suggest gathering over 150 samples. For an isotropic spatial process, McBratney and Webster (1986, p. 630) claim that if the lag is small then the smoothing or regularization of the variogram "is small enough to be ignored". Many ad hoc rules exist in the distribution-free methodology (see Lamorey and Jacobson, 1995, pp. 331-332). For example, Journel and Huijbregts (1978) suggest only using sets $N(d_k)$ where $|N(d_k)|$ is greater than 30 and discarding sets $N(d_k)$ whose distances exceed half the maximum distance in the sample, because the variogram tends to be erratic for higher lags.

3. Likelihood-Based Modeling

In the likelihood-based framework, the variogram only specifies a single feature of the spatial process $Y(\mathbf{s})$. Likelihood-based inference (including Bayesian) regarding arbitrary features of $Y(\mathbf{s})$ requires specification of the joint distribution of the data $\mathbf{Y} = (Y(\mathbf{s}_1), \ldots, Y(\mathbf{s}_N))'$. Typically, $\mathbf{Y}$ is modeled (possibly after transformation) as multivariate normal with mean and covariance given by (2). More generally,

$\mathbf{Y} \sim N\!\left(\mu(\boldsymbol{\beta}),\, \Sigma(\boldsymbol{\theta})\right)$   (6)

where the mean structure, $\mu(\boldsymbol{\beta})$, has ith component $\mu(\mathbf{s}_i; \boldsymbol{\beta})$ and $\boldsymbol{\theta} = (\sigma^2, \tau^2, \boldsymbol{\phi})$ represents all covariance model parameters. When $\mu(\mathbf{s}_i; \boldsymbol{\beta}) = \mu$ for all sites, the spatial process, under normality, has strict stationarity, namely $(Y(\mathbf{s}_1), \ldots, Y(\mathbf{s}_N))$ and $(Y(\mathbf{s}_1 + \mathbf{h}), \ldots, Y(\mathbf{s}_N + \mathbf{h}))$ have the same joint distribution for any arbitrary vector h. The covariance matrix is $\Sigma(\boldsymbol{\theta}) = \sigma^2 H(\boldsymbol{\phi}) + \tau^2 I$, where $(H(\boldsymbol{\phi}))_{ij} = \rho(d_{ij}; \boldsymbol{\phi})$ and $\rho$ is a valid correlation function as seen in section 2. Then likelihood-based techniques (Cressie, 1993, section 2.6) including maximum likelihood (ML), restricted maximum likelihood (REML) or minimum norm quadratic (MINQ) estimation proceed to estimate all model parameters. The Bayesian methodology treats the unknown model parameters as random variables; it assigns prior distributions to the model parameters and updates these priors given the observed data to the posterior distribution using Bayes rule.
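As a rough sketch of the kind of computation that ML (and, with modification, REML) estimation must repeatedly perform, the following free-form Fortran function evaluates the multivariate normal log-likelihood of (6) for a constant mean mu and an exponential correlation exp(-phi*d); the exponential form and the variable names are illustrative assumptions, and the LAPACK routines dpotrf and dpotrs are assumed to be available for the Cholesky factorization.

double precision function gauss_loglik(n, dmat, y, mu, sigma2, tau2, phi)
! Log-likelihood of (6) with constant mean mu and covariance
! Sigma(i,j) = sigma2*exp(-phi*dmat(i,j)) + tau2*1{i=j},
! computed via the Cholesky factorization Sigma = U'U.
implicit none
integer, intent(in) :: n
double precision, intent(in) :: dmat(n,n), y(n), mu, sigma2, tau2, phi
double precision :: a(n,n), r(n,1), logdet, quadform, pi
integer :: i, j, info

pi = 4.0d0*atan(1.0d0)

! Build the covariance matrix and the residual vector y - mu.
do j = 1, n
   do i = 1, n
      a(i,j) = sigma2*exp(-phi*dmat(i,j))
   end do
   a(j,j) = a(j,j) + tau2
   r(j,1) = y(j) - mu
end do

! Cholesky factor in the upper triangle; info /= 0 flags a non p.d. matrix.
call dpotrf('U', n, a, n, info)
if (info /= 0) then
   gauss_loglik = -huge(1.0d0)
   return
end if

! log|Sigma| = 2*sum(log U_ii), then the quadratic form r' Sigma^{-1} r.
logdet = 0.0d0
do i = 1, n
   logdet = logdet + 2.0d0*log(a(i,i))
end do
call dpotrs('U', n, 1, a, n, r, n, info)
quadform = 0.0d0
do i = 1, n
   quadform = quadform + (y(i) - mu)*r(i,1)
end do

gauss_loglik = -0.5d0*(dble(n)*log(2.0d0*pi) + logdet + quadform)
end function gauss_loglik

Each evaluation of this function costs on the order of $N^3$ operations, which foreshadows the computational bottleneck discussed in section 5.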

The multivariate normal model (6) can be expressed as a spatial random effects model (Diggle, Liang and Zeger, 1994)

$Y(\mathbf{s}_i) = \mu(\mathbf{s}_i; \boldsymbol{\beta}) + W(\mathbf{s}_i) + \epsilon(\mathbf{s}_i)$   (7)

where $\mathbf{W} = (W(\mathbf{s}_1), \ldots, W(\mathbf{s}_N))' \sim N(\mathbf{0}, \sigma^2 H(\boldsymbol{\phi}))$ is a vector of spatial effects independent of $\epsilon(\mathbf{s}_i) \sim N(0, \tau^2)$, which corresponds to measurement error and/or the microscale effect. Note that the marginal model (6) can be viewed in a hierarchical setting by conditioning on $\mathbf{W}$ in (7). Then, the first stage is $Y(\mathbf{s}_i) \mid W(\mathbf{s}_i) \sim N(\mu(\mathbf{s}_i; \boldsymbol{\beta}) + W(\mathbf{s}_i), \tau^2)$, i.e., conditionally independent $Y(\mathbf{s}_i)$'s given $\mathbf{W}$, while the second stage, $\mathbf{W} \sim N(\mathbf{0}, \sigma^2 H(\boldsymbol{\phi}))$, describes the spatial association. Hence, even with the "constant mean" marginal model (with $\mu(\mathbf{s}_i; \boldsymbol{\beta}) = \mu$), $\mu + W(\mathbf{s}_i)$ still provides distinct site level means for the $Y(\mathbf{s}_i)$'s. An obvious extension would have, as in Diggle, Tawn and Moyeed (1998), an exponential family member, such as the binomial or Poisson distribution, at the first stage. The hierarchical interpretation of (7) is useful for fitting models from a Bayesian perspective (see Gelfand, Ecker, Knight and Sirmans, 2002; Ecker and Gelfand, 2002).

The model (6) allows for nonstationarity of the mean structure, i.e., $\mu(\mathbf{s}_i; \boldsymbol{\beta})$ may be a function of the geographical locations in addition to, or in lieu of, non geographic site level covariates. From the form of (6), there is a trade-off between the spatial modeling of the mean structure and of the correlation matrix. "Often this problem is resolved in a substantive application by relying on scientific or habitual reasons for determining the mean structure" (Cressie, 1993, p. 219). But what if this is not the case? Traditionally, one uses random effect models to "mop up" variability not captured by the mean structure, which suggests making the mean structure as rich as possible. A geographic mean structure in (6), i.e., a trend surface, is traditionally modeled as a polynomial in the spatial coordinates. But what order should the polynomial be? Indeed, why need it be a polynomial? "The choice of a particular trend model is most often arbitrary" (Journel and Rossi, 1989, p. 716).

The primary advantage of likelihood-based methodologies over distribution-free techniques is the routine fitting of complex models. A standard geostatistical model with a constant mean ($\mu(\mathbf{s}_i; \boldsymbol{\beta}) = \mu$) and a one parameter correlation function from Figure 1 has only four parameters. In contrast, hierarchical Bayesian models, such as those developed by Wikle, Berliner and Cressie (1998), may contain hundreds. Complex spatial phenomena may not be explained well with only a four parameter model. Also, from a likelihood-based perspective, one need not be concerned about lag invariance issues, nor about which estimator of the semivariogram to use; the Matheron estimator in (3) might be used only to display fitted results.

The likelihood-based methodology also allows straightforward extension to a model with nonstationary mean structure. Standard distribution-free variogram modeling assumes a constant mean either using (1) or (2). Often site level covariates are available and should be incorporated into the mean structure, which from a likelihood-based perspective is straightforward. However, from the distribution-free perspective, the entire theoretical framework upon which inference is based needs to be modified (see the topics of Universal Kriging and Intrinsic Random functions of order k in Cressie (1993)). One distribution-free proposal is to model in stages; assume stationarity of the residuals from an ordinary least squares (OLS) or median polish fit of the mean structure (Cressie, 1993, pp. 46-51) and treat these residuals as a fresh dataset to explore for spatial correlation.

If the Bayesian methodology is adopted, then prior distributions are assigned to each model parameter and are subsequently updated with the observed data to the posterior distribution. Since these posterior distributions are typically analytically intractable, simulation based techniques, such as Markov Chain Monte Carlo (MCMC) or Importance Sampling Densities (ISD), are used to draw samples from the posterior distribution of each model parameter. One fundamental criticism of the Bayesian formulation over the traditional likelihood-based methodology is the inclusion of prior information. Why should prior information be included and how, exactly, does one do so? Lele and Das (2000) provide guidance on translating expert opinion into prior distributions. As a rule of thumb, one should perform a sensitivity analysis to assess the level of dependence of the overall inference on the assumed priors. Berger, De Oliveira and Sansó (2001) discuss methodologies for prior construction and provide an automatic or noninformative prior for spatial problems. Nevertheless, some practitioners find deriving the priors required for Bayesian inference unappealing and arbitrary. However, the scope of Bayesian inference (as shall be demonstrated in Section 4) is enticing.

An advantage of the Bayesian paradigm is that the variance estimates for all model parameters avoid the possibly inappropriate asymptotic inference that is associated with other likelihood-based approaches. For example, the asymptotic properties of the maximum likelihood (ML) estimators are not well understood for dependent data under infill asymptotics. Briefly, infill asymptotics (Cressie, 1993, p. 101) involves saturating a fixed geographical region D with samples and exploring the asymptotic properties of the likelihood-based estimators. The traditional properties of the ML estimator such as consistency and asymptotic efficiency (see Cressie, 1993, pp. 481-486) occur under increasing domain asymptotics, where the sampling region D grows as $N \rightarrow \infty$. As an example with data in [pic] under infill asymptotics, Stein (1987) shows that the derivative of the semivariogram approaching the origin ($\gamma'(0^+)$) is the only quantity that can be consistently estimated.

An additional criticism of likelihood-based methods involves the fundamental assumption of normality for the spatial process. It can be argued that, in the presence of spatial correlation and with only one (vector) observation, normality can be neither proven nor disproven. Nevertheless, binary data and positive, right-skewed environmental data are examples that make the model (6) untenable. Two extensions to non-Gaussian data are available. As described earlier, Diggle, Tawn and Moyeed (1998) allow for an exponential family member, such as the binomial, for the distribution of the response. An alternative modeling strategy developed by De Oliveira, Kedem and Short (1997) and De Oliveira and Ecker (2002) assumes Gaussianity only after a suitable member from a parametric family of transformations is applied. Specifically, for some transformation $g_{\lambda}(\cdot)$, (6) is extended to

$g_{\lambda}(\mathbf{Y}) \sim N\!\left(\mu(\boldsymbol{\beta}),\, \Sigma(\boldsymbol{\theta})\right)$   (8)

where $g_{\lambda}$ is applied componentwise to $\mathbf{Y}$. An illustrative choice for $g_{\lambda}$ is the Box-Cox family of transformations, namely $g_{\lambda}(y) = (y^{\lambda} - 1)/\lambda$ for $\lambda \neq 0$ and $g_{\lambda}(y) = \log y$ for $\lambda = 0$, where $\lambda = 0$ represents the lognormal model and $\lambda = 1$ is the Gaussian. Attractively, under (8), the only distributional assumption needed is that the original data are transformable to normality; yet the full power of the likelihood-based methodology is available. Furthermore, from the likelihood-based perspective, one need not know the transformation parameter $\lambda$ in (8) a priori; it is simply another model parameter to be estimated. Even the Diggle, Tawn and Moyeed (1998) extension of (6) must still decide the response distribution of the original data before inference proceeds.

Another strength of the likelihood-based approach is simultaneous estimation of all model parameters. The distribution-free technique of modeling the spatial correlation of residuals from an OLS regression fit, as described earlier, is an example of fitting in stages, rather than simultaneously estimating all model parameters. As another example, the most common departure from isotropy under stationarity is that of geometric anisotropy (Zimmerman, 1993). Specifically, the isotropic correlation $\rho\!\left(\sqrt{\sum_k h_k^2}\,\right)$, where $h_k$ is the distance from $\mathbf{s}_i$ to $\mathbf{s}_j$ in the kth spatial dimension, is extended to $\rho\!\left(\sqrt{\mathbf{h}'B\mathbf{h}}\right)$, where B is a positive definite matrix. For sites in the plane, $\Re^2$, geometric anisotropy involves modeling elliptical correlation structures in contrast to the circular, isotropic correlation contours. Under geometric anisotropy, for example, the distribution-free approach of Borgman and Chao (1994) estimates the matrix B first and subsequently the variogram parameters. From the Bayesian perspective, Ecker and Gelfand (1999) simultaneously fit all the geometrically anisotropic model parameters.
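One common way to parameterize B for sites in the plane (an illustrative choice rather than the only one) is through an anisotropy angle $\psi$ and an anisotropy ratio $\lambda \geq 1$,

$B = R(\psi)' \begin{pmatrix} 1 & 0 \\ 0 & 1/\lambda^{2} \end{pmatrix} R(\psi), \qquad R(\psi) = \begin{pmatrix} \cos\psi & \sin\psi \\ -\sin\psi & \cos\psi \end{pmatrix},$

so that the correlation is evaluated at the transformed distance $\sqrt{\mathbf{h}'B\mathbf{h}}$; contours of equal correlation are then ellipses whose axes are rotated by $\psi$ from the coordinate axes and whose axis lengths differ by the factor $\lambda$.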

4. Model Based Prediction

Let $\mathbf{s}_{01}, \ldots, \mathbf{s}_{0m}$ denote unsampled sites where prediction is desired and $\mathbf{Y}_0 = (Y(\mathbf{s}_{01}), \ldots, Y(\mathbf{s}_{0m}))'$ is the response vector to be predicted. Then under the likelihood-based methodology, form

$\begin{pmatrix} \mathbf{Y} \\ \mathbf{Y}_0 \end{pmatrix} \sim N\!\left( \begin{pmatrix} \mu(\boldsymbol{\beta}) \\ \mu_0(\boldsymbol{\beta}) \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right)$   (9)

with $(\Sigma_{11})_{ij} = \sigma^2 \rho(d_{ij}; \boldsymbol{\phi}) + \tau^2 1\{i = j\}$ and $(\Sigma_{12})_{ij} = (\Sigma_{21}')_{ij} = \sigma^2 \rho(\|\mathbf{s}_i - \mathbf{s}_{0j}\|; \boldsymbol{\phi})$, where $1\{x\}$ is the indicator function, i.e., $1\{x\}$ equals one when x is true, otherwise $1\{x\}$ is zero, and $(\Sigma_{22})_{ij} = \sigma^2 \rho(\|\mathbf{s}_{0i} - \mathbf{s}_{0j}\|; \boldsymbol{\phi}) + \tau^2 1\{i = j\}$. Then

$\mathbf{Y}_0 \mid \mathbf{Y}, \boldsymbol{\beta}, \boldsymbol{\theta} \sim N\!\left(\boldsymbol{\mu}^{*},\, \Sigma^{*}\right)$   (10)

where $\boldsymbol{\mu}^{*} = \mu_0(\boldsymbol{\beta}) + \Sigma_{21}\Sigma_{11}^{-1}(\mathbf{Y} - \mu(\boldsymbol{\beta}))$ and $\Sigma^{*} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$. If $\boldsymbol{\beta}$ and $\boldsymbol{\theta}$ are known, the optimal predictor is the conditional expectation $E(\mathbf{Y}_0 \mid \mathbf{Y}) = \boldsymbol{\mu}^{*}$. When, as is the case in practical problems, $\boldsymbol{\beta}$ and $\boldsymbol{\theta}$ are unknown, the distribution-free technique of kriging plugs in estimates $\hat{\boldsymbol{\beta}}$ and $\hat{\boldsymbol{\theta}}$ to calculate $\boldsymbol{\mu}^{*}$ (Kitanidis, 1986). Essentially, kriging at an unsampled site, $\mathbf{s}_0$, produces the best linear unbiased predictor (BLUP), $\hat{Y}(\mathbf{s}_0) = \sum_{i=1}^{N} \lambda_i Y(\mathbf{s}_i)$, where the weights $\lambda_i$ are chosen so that $E[\hat{Y}(\mathbf{s}_0)] = E[Y(\mathbf{s}_0)]$. The weights are arrived at by solving a system of linear kriging equations assuming the estimated variogram parameters are known (see Wackernagel, 1998, pp. 83-85).
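As a minimal computational sketch of the plug-in predictor just described, the following free-form Fortran routine performs simple kriging at a single unsampled site, i.e., it evaluates the conditional mean and variance of (10) under an assumed known constant mean mu, an exponential correlation exp(-phi*d) and plugged-in values of sigma2, tau2 and phi; the routine name and the exponential choice are illustrative assumptions, and the LAPACK solver dposv is assumed to be available. Ordinary or universal kriging would instead estimate the mean subject to the unbiasedness constraint above.

subroutine simple_krige(n, x, y, z, x0, y0, mu, sigma2, tau2, phi, pred, predvar)
! Plug-in conditional mean and variance of (10) for one site s0 with known
! constant mean: pred = mu + c0' Sigma^{-1} (z - mu),
!                predvar = sigma2 + tau2 - c0' Sigma^{-1} c0,
! where Sigma(i,j) = sigma2*exp(-phi*d_ij) + tau2*1{i=j} and
!       c0(i)     = sigma2*exp(-phi*||s_i - s0||).
implicit none
integer, intent(in) :: n
double precision, intent(in) :: x(n), y(n), z(n), x0, y0
double precision, intent(in) :: mu, sigma2, tau2, phi
double precision, intent(out) :: pred, predvar
double precision :: sig(n,n), w(n,1), c0(n), d
integer :: i, j, info

! Covariance matrix among the data sites and covariances with s0.
do j = 1, n
   do i = 1, n
      d = sqrt((x(i) - x(j))**2 + (y(i) - y(j))**2)
      sig(i,j) = sigma2*exp(-phi*d)
   end do
   sig(j,j) = sig(j,j) + tau2
   d = sqrt((x(j) - x0)**2 + (y(j) - y0)**2)
   c0(j) = sigma2*exp(-phi*d)
   w(j,1) = c0(j)
end do

! Solve Sigma * w = c0; on exit w holds the kriging weights.
call dposv('U', n, 1, sig, n, w, n, info)
if (info /= 0) stop 'covariance matrix is not positive definite'

pred = mu
predvar = sigma2 + tau2
do i = 1, n
   pred = pred + w(i,1)*(z(i) - mu)
   predvar = predvar - w(i,1)*c0(i)
end do
end subroutine simple_krige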

From the Bayesian perspective, the predictive distribution averages over the (posterior) distribution of the model parameters

$p(\mathbf{Y}_0 \mid \mathbf{Y}) = \int p(\mathbf{Y}_0 \mid \mathbf{Y}, \boldsymbol{\beta}, \boldsymbol{\theta})\; p(\boldsymbol{\beta}, \boldsymbol{\theta} \mid \mathbf{Y})\; d\boldsymbol{\beta}\, d\boldsymbol{\theta}$   (11)

where $p(\boldsymbol{\beta}, \boldsymbol{\theta} \mid \mathbf{Y})$ is the posterior distribution of the model parameters. Adopting the Bayesian perspective, Le and Zidek (1992), Handcock and Stein (1993), Handcock and Wallis (1994), De Oliveira, Kedem and Short (1997) and Gaudard, Karson, Linder and Sinha (1999) each focus on prediction under isotropy. If $\boldsymbol{\theta}$ is known, or if $\Sigma(\boldsymbol{\theta}) = \sigma^2 V$ where V is known and $\sigma^2$ is unknown, then Kitanidis (1986) provides analytical results for (11). Otherwise, numerical evaluation of (11) is required, whereby the mean of $p(\mathbf{Y}_0 \mid \mathbf{Y})$ is typically used as the point estimate for $\mathbf{Y}_0$.

One of the primary strengths of the Bayesian methodology is the flexibility of the predictive inference. As an example, De Oliveira and Ecker (2002) consider the problem of hot spot detection for data that contain a spatial trend. A hot spot is a generic term referring to locations where the spatial process takes values that are, in some way, extreme. Their goals are to detect areas of extreme nitrogen concentration in the Chesapeake Bay. The first goal is to identify absolute hot spots. If $P(Y(\mathbf{s}_0) > c \mid \mathbf{Y})$, for a specific threshold value, c, is too high (greater than some threshold probability, $p^{*}$), then $\mathbf{s}_0$ is declared an absolute hot spot. Alternative definitions of absolute hot spots depend upon how one might summarize the predictive distribution for $Y(\mathbf{s}_0)$; one could declare $\mathbf{s}_0$ an absolute hot spot if the median of $p(Y(\mathbf{s}_0) \mid \mathbf{Y})$ exceeds c, or if the upper $100(1-\alpha)\%$ prediction interval limit for $Y(\mathbf{s}_0)$ is larger than c.

In addition, because some parts of the region may be more ecologically fragile than others, a site might be a hot spot relative to its neighbors, but have a response much smaller than c. In that case, the threshold c and/or the threshold probability $p^{*}$ might depend upon the location $\mathbf{s}_0$. Thus, one can define $\mathbf{s}_0$ as a relative hot spot if $P(Y(\mathbf{s}_0) > c(\mathbf{s}_0) \mid \mathbf{Y})$, for a specific value of a threshold function, $c(\mathbf{s}_0)$, is too high (greater than $p^{*}(\mathbf{s}_0)$). If $c(\mathbf{s}_0)$ is not known a priori, then one needs to decide upon a "reference condition". "By examining the data from the analysis of samples collected within the study area, an envelope of reference conditions would emerge for comparisons with candidate hot spots" (Long and Wilson, 1997). De Oliveira and Ecker (2002) use the mean function of the original process (the spatial trend) as the reference condition, i.e., $c(\mathbf{s}_0) = \mu(\mathbf{s}_0; \boldsymbol{\beta}) + d$ for some $d > 0$, where $\mu(\mathbf{s}_0; \boldsymbol{\beta})$ is the mean structure in (6) with only geographic covariates. Again, depending on how one summarizes a posterior distribution, alternative definitions for a relative hot spot may include declaring $\mathbf{s}_0$ a relative hot spot if the upper $100(1-\alpha)\%$ prediction interval limit for $Y(\mathbf{s}_0)$ exceeds $c(\mathbf{s}_0)$.

The statistical motivation for relative hot spots is derived from (7) under normality of $W(\mathbf{s}_0) + \epsilon(\mathbf{s}_0)$; $Y(\mathbf{s}_0) - \mu(\mathbf{s}_0; \boldsymbol{\beta})$ being larger than d is equivalent to $W(\mathbf{s}_0) + \epsilon(\mathbf{s}_0)$ being larger than d. Thus, from the Bayesian perspective, evaluating such predictive probabilities, given posterior distributions of the model parameters, is straightforward. Such inference under other likelihood-based techniques and traditional distribution-free methods is, at best, challenging.
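As a small illustration of how such predictive probabilities are summarized in practice, the following free-form Fortran sketch estimates $P(Y(\mathbf{s}_0) > c \mid \mathbf{Y})$ from a set of posterior predictive draws at $\mathbf{s}_0$ (for example, MCMC output) and flags an absolute hot spot when the estimate exceeds the threshold probability; the routine and variable names are illustrative, and the same device applies to relative hot spots with c replaced by $c(\mathbf{s}_0)$.

subroutine hot_spot_flag(m, ypred, c, pstar, prob, ishot)
! Monte Carlo estimate of P(Y(s0) > c | Y) from m posterior predictive
! draws ypred(1..m); s0 is flagged as an absolute hot spot when the
! estimated probability exceeds the threshold pstar.
implicit none
integer, intent(in) :: m
double precision, intent(in) :: ypred(m), c, pstar
double precision, intent(out) :: prob
logical, intent(out) :: ishot
integer :: i, nexceed

nexceed = 0
do i = 1, m
   if (ypred(i) > c) nexceed = nexceed + 1
end do
prob = dble(nexceed)/dble(m)
ishot = (prob > pstar)
end subroutine hot_spot_flag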

Another illustration of the predictive power of Bayesian inference involves the change of support issue, which was D. G. Krige's original problem of using (essentially) point samples to infer about the global mean for some specified region. In moving from point support to areal support, the variability of the global mean decreases as the support size increases, i.e., averaging over larger areas decreases variability (Cressie, 1993, section 5.2). One might be interested in block kriging or detecting areal hot spots, i.e., given a fixed region $B \subset D$, one may desire inference of the form

$Y(B) = \frac{1}{|B|} \int_{B} Y(\mathbf{s})\, d\mathbf{s}$   (12)

where $|B|$ represents the volume of the region B. Note that block kriging would involve numerous $Y(B)$'s. Wackernagel (1998, pp. 86-87) demonstrates how to modify the distribution-free kriging equations to predict $Y(B)$. Lahiri, Kaiser, Cressie and Hsu (1999) use a distribution-free subsampling procedure to evaluate functionals of which (12) is a special case. From the likelihood-based perspective, one can form, analogous to (9), (Haylock and O'Hagan, 1996; Gelfand, Zhu and Carlin, 2001)

$\begin{pmatrix} \mathbf{Y} \\ Y(B) \end{pmatrix} \sim N\!\left( \begin{pmatrix} \mu(\boldsymbol{\beta}) \\ \mu_B(\boldsymbol{\beta}) \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & M \\ M' & \sigma_B^2 \end{pmatrix} \right)$   (13)

where $\mu_B(\boldsymbol{\beta}) = E[Y(B)] = \frac{1}{|B|}\int_{B} \mu(\mathbf{s}; \boldsymbol{\beta})\, d\mathbf{s}$ and $\mathrm{Var}[Y(B)] = \sigma_B^2 = \frac{1}{|B|^2}\int_{B}\int_{B} \mathrm{Cov}[Y(\mathbf{s}), Y(\mathbf{s}')]\, d\mathbf{s}\, d\mathbf{s}'$. Also, M is a column vector with ith element $M_i = \mathrm{Cov}[Y(\mathbf{s}_i), Y(B)] = \frac{1}{|B|}\int_{B} \mathrm{Cov}[Y(\mathbf{s}_i), Y(\mathbf{s})]\, d\mathbf{s}$. Analytic results for these integrals are available for judicious choices of the functional form of $\rho(\cdot; \boldsymbol{\phi})$; otherwise numerical approximations are needed.

Analogous to (10), form the conditional distribution $Y(B) \mid \mathbf{Y}, \boldsymbol{\beta}, \boldsymbol{\theta} \sim N(\mu_B^{*}, \sigma_B^{2*})$, where $\mu_B^{*} = \mu_B(\boldsymbol{\beta}) + M'\Sigma_{11}^{-1}(\mathbf{Y} - \mu(\boldsymbol{\beta}))$ and $\sigma_B^{2*} = \sigma_B^2 - M'\Sigma_{11}^{-1}M$. Then, analogous to (11), the Bayesian predictive distribution for $Y(B)$ is

$p(Y(B) \mid \mathbf{Y}) = \int p(Y(B) \mid \mathbf{Y}, \boldsymbol{\beta}, \boldsymbol{\theta})\; p(\boldsymbol{\beta}, \boldsymbol{\theta} \mid \mathbf{Y})\; d\boldsymbol{\beta}\, d\boldsymbol{\theta}$   (14)

where $p(\boldsymbol{\beta}, \boldsymbol{\theta} \mid \mathbf{Y})$ is the posterior distribution. As before, given draws of $\boldsymbol{\beta}$ and $\boldsymbol{\theta}$ from the posterior distribution, evaluation of (14) is conceptually straightforward.
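When the integrals in (12)-(14) are not analytically tractable, one simple (illustrative, not the only) numerical strategy is to approximate the block value by an average over L points $\mathbf{u}_1, \ldots, \mathbf{u}_L$ placed on a fine grid or at random within B,

$Y(B) \approx \frac{1}{L} \sum_{l=1}^{L} Y(\mathbf{u}_l),$

and then to treat $(Y(\mathbf{u}_1), \ldots, Y(\mathbf{u}_L))'$ as an ordinary vector of prediction sites: for each posterior draw of $(\boldsymbol{\beta}, \boldsymbol{\theta})$, this vector is sampled from its conditional distribution as in (10) and averaged, so that the resulting draws approximate the predictive distribution (14).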

5. Discussion and Future Directions

Until very recently, one of the major limitations in geostatistical analyses has been the availability of mainstream statistical computer packages to conduct such analyses. The S-Plus SpatialStats module (Kaluzny, Vega, Cardoso and Shelly, 1998) and SAS Proc Mixed (Littell, Milliken, Stroup and Wolfinger, 1996, chapter 9) can routinely analyze geostatistical data from a distribution-free perspective. For likelihood-based inference, an add-on to the software package BUGS (Bayesian inference Using Gibbs Sampling, see mrc-bsu.cam.ac.uk/bugs) called GeoBUGS is currently being developed to fit spatial models; however, most likelihood-based modelers write their own computer programs.

One bottleneck in likelihood-based analyses is the sheer amount of available data. Few likelihood-based analyses can handle more than several hundred observations; a likelihood-based analysis with even 1000 observations is not practical today. Essentially, in evaluating the likelihood, one needs to invert the $N \times N$ covariance matrix numerous times, an order $N^3$ operation in terms of available computing resources and time. The problem is even more acute for iterative Bayesian MCMC techniques such as Gibbs sampling, which might require hundreds of thousands of such matrix inversions. Recently, Gelfand and Ravishanker (1998) have proposed replacing matrix inversion with simulation using an importance sample as a first step in addressing this issue. However, more research and better "number crunching" computing power is required. For very large data sets, such as those analyzed by Huang, Cressie and Gabrosek (2000), the distribution-free methodology is the only option for geostatistical modeling.

While improved geostatistical models, for example for nonstationary (see Sampson and Guttorp, 1992) and anisotropic (Ecker and Gelfand, 2002) spatial processes, are certainly needed, the future of geostatistics lies in spatio-temporal modeling. Today, spatio-temporal models are in their infancy; see Kyriakidis and Journel (1999) for a review. Spatial models provide a "snapshot in time"; observing and modeling the process over time is in concordance with the fundamental goals of environmental monitoring. For example, one might be concerned with the potential for deterioration of water quality in the future, in addition to what the concentration is now. Seasonality, temporal trends and the evolution of the spatial process cannot possibly be captured in a purely spatial analysis. Numerous spatial analyses over time could be entertained, but one might expect the spatial process to be somewhat similar today to what it was just a short time ago. Hence, unified spatio-temporal models can be used to efficiently explain observed data and predict at unsampled sites or areal units in the future. Indeed, considerable resources are already moving in the direction of spatio-temporal geostatistical modeling (see Christakos, 2000).

6. Appendix

Fortran subroutine using IMSL to evaluate the Matern correlation function, (4).

Main program call (nu = Smoothness Parameter, phi=Range Parameter):

corr = matern(nu, phi, dist)

Fortran Subroutine:

double precision function matern(nu, phi, dist)

integer nin, max, k

parameter(max=100)

double precision bs(max), x, xnu

double precision nu, phi, dist

double precision two, const, bsk, gam

xnu = nu - int(nu)

nin = int(nu) + 1

x = dist*phi

two = 2

c IMSL routine DBSKS returns the sequence K(xnu), K(xnu+1), ...,
c K(xnu+nin-1) evaluated at x, so bs(nin) holds the required K(nu)
call dbsks(xnu, x, nin, bs)

c ------------------------------------------------------------

c THIS CODE EVALUATES THE ITERATIVE DBSKS IMSL SUBROUTINE

c do 10 k=1,nin

c write(7,99) xnu+k-1, x, bs(k)

c99 format(' K sub ', f6.3, ' (', f6.3, ') = ', f15.10)

c10 continue

c ------------------------------------------------------------

bsk = bs(nin)

gam = dgamma(nu)

const = two**(nu-1)

c Matern correlation (4): rho(d) = (phi*d)**nu * K_nu(phi*d) / (2**(nu-1)*Gamma(nu))
matern = 1/const * 1/gam * (x)**nu * bsk

return

end

7. References

Abramowitz, M and Stegun, I.A. (1965). Handbook of Mathematical

Functions. New York: Dover.

Basu, S. and Thibodeau, T.G. (1998). Analysis of Spatial Correlation in House

Prices. Journal of Real Estate Finance and Economics. 17(1),

pp. 61-86.

Berger, J.O., De Oliveira, V. and Sansó, B. (2001). Objective Bayesian Analysis

of Spatially Correlated Data. Journal of the American Statistical

Association. 96(456), pp. 1361-1374.

Borgman, L. and Chao, L. (1994). Estimation of the Multidimensional

Covariance Function in Case of Anisotropy. Mathematical Geology. 26(2),

pp. 161-179.

Christakos, G. (2000). Modern Spatiotemporal Geostatistics. New York: Oxford

University Press.

Cressie, N. (1985). Fitting Variogram Models by Weighted Least Squares.

Mathematical Geology. 17, pp. 563-586.

Cressie, N. (1988). Spatial Prediction and Ordinary Kriging. Mathematical

Geology. 20(4), pp. 405-421. Erratum, (1989). Mathematical

Geology. 21(4), pp. 493-494.

Cressie, N. (1990). The Origins of Kriging. Mathematical Geology. 22(3),

pp. 239-252.

Cressie, N. (1993). Statistics for Spatial Data. New York: John Wiley and

Sons.

Curriero, F. C. and Lele, S. (1999). A Composite Likelihood Approach to

Semivariogram Estimation. Journal of Agricultural, Biological and

Environmental Statistics. 4(1), pp. 9-28.

De Oliveira, V. and Ecker, M.D. (2002). Bayesian Hot Spot Detection in the

Presence of Spatial Trend: Application to Total Nitrogen Concentrations in

the Chesapeake Bay. Environmetrics. 13, pp. 85-101.

De Oliveira, V., Kedem, B. and Short, D. (1997). Bayesian Prediction of

Transformed Gaussian Random Fields. Journal of the American Statistical Association 92, pp. 1422-1433.

Diggle, P.J., Liang, K-Y. and Zeger, S.L. (1994). Analysis of Longitudinal Data. New York: Oxford Science Publications.

Diggle, P.J., Tawn, J.A. and Moyeed, R.A. (1998). Model-based Geostatistics

(with discussion). Applied Statistics. 47, pp. 299-350.

Ecker, M.D. and Gelfand, A.E. (1997). Bayesian Variogram Modeling for an

Isotropic Spatial Process. Journal of Agricultural, Biological and Environmental Statistics. 2, pp. 347-369.

Ecker, M.D. and Gelfand, A.E. (1999). Bayesian Modeling and Inference for

Geometrically Anisotropic Spatial Data. Mathematical Geology. 31(1), pp. 67-83.

Ecker, M.D. and Gelfand, A.E. (2002). Spatial Modeling and Prediction under

Range Anisotropy. Environmental and Ecological Statistics. (to appear)

Gaudard, M., Karson, M., Linder, E. and Sinha, D. (1999). Bayesian

Spatial Prediction. Environmental and Ecological Statistics. 6(2),

pp. 147-171.

Gelfand, A.E., Ecker, M.D., Knight, J. and Sirmans, C.F. (2002). The

Dynamics of Location in Home Prices. Journal of Real Estate Finance and Economics. (to appear).

Gelfand, A.E. and Ravishanker, N. (1998). Inference for Bayesian Variogram

Models with Large Data Sets. Technical Report, Department of Statistics,

University of Connecticut.

Gelfand, A.E., Zhu, L. and Carlin, B.P. (2001). On the Change of Support

Problem for Spatio-Temporal Data. Biostatistics. 2, pp. 31-45.

Genton, M.G. (1998). Variogram Fitting by Generalized Least Squares

Using an Explicit Formula for the Covariance Matrix. Mathematical

Geology. 30, pp. 323-345.

Guttorp, P., Meiring, W. and Sampson, P. (1994). A Space-Time Analysis of

Ground-Level Ozone Data. Environmetrics. 5, pp. 241-254.

Handcock, M. and Stein, M. (1993). A Bayesian Analysis of Kriging.

Technometrics. 35(4), pp. 403-410.

Handcock, M. and Wallis, J. (1994). An Approach to Statistical Spatial-

Temporal Modeling of Meteorological Fields (with discussion).

Journal of the American Statistical Association. 89(426), pp. 368-390.

Haylock, R.G. and O'Hagan, A. (1996). On Inference for Outputs of

Computationally Expensive Algorithms with Uncertainty on the Inputs.

In J.M. Bernardo, J.O. Berger, A.P. Dawid and A.F.M. Smith (eds.). Bayesian Statistics 5. Oxford: Clarendon Press, pp. 629-637.

Huang, H-C., Cressie, N. and Gabrosek, J. (2000). Fast Spatial Prediction of

Global Processes from Satellite Data. Technical Report, Department of

Statistics, The Ohio State University.

Journel, A. and Huijbregts, C. (1978). Mining Geostatistics. New York:

Academic Press.

Journel, A. and Rossi, M. (1989). When Do We Need a Trend Model in Kriging?

Mathematical Geology. 21(7), pp. 715-739.

Kaluzny, S.P., Vega, S.C., Cardoso, T.P. and Shelly, A.A. (1998). S+SpatialStats

User’s Manual. New York: Springer-Verlag.

Kelker, D. and Langenburg, W. (1997). Ellipsoid Estimation in Coal Reflectance

Anisotropy. Mathematical Geology. 29(2), pp. 185-198.

Kitanidis, P.K. (1986). Parameter Uncertainty in Estimation of Spatial

Functions: Bayesian Analysis. Water Resources Research. 22(4),

pp. 499-507.

Krajewski, P., Molinska, A. and Molinska, K. (1996). Elliptical Anisotropy

in Practice - A Study of Air Monitoring Data. Environmetrics. 7,

pp. 291-298.

Kyriakidis, P.C. and Journel, A.G. (1999). Geostatistical Space-Time Models: A

Review. Mathematical Geology. 31(6), pp. 651-684.

Lahiri, S., Kaiser, M., Cressie, N. and Hsu, N-J. (1999). Prediction of Spatial

Cumulative Distribution Functions Using Subsampling (with discussion). Journal of the American Statistical Association 94, pp. 86-110.

Lamorey, G. and Jacobson, E. (1995). Estimation of Semivariogram Parameters

and Evaluation of the Effects of Data Sparsity. Mathematical Geology.

27(3), pp. 327-358.

Le, N. and Zidek, J. (1992). Interpolation with Uncertain Spatial Covariances: A

Bayesian Alternative to Kriging. Journal of Multivariate Analysis. 43,

pp. 351-374.

Lele, S. and Das, A. (2000). Elicited Data and Incorporation of Expert Opinion

for Statistical Inference in Spatial Studies. Mathematical Geology. 32(4),

pp. 465-487.

Littell, R.C., Milliken, G.A., Stroup, W.W. and Wolfinger, R.D. (1996). SAS

System for Mixed Models. Cary, NC: The SAS Institute.

Long, E.R. and Wilson, C. J. (1997). On the Identification of Toxic Hot Spots

using Measures of Sediment Quality Triad. Marine Pollution Bulletin. 34,

pp. 373-374.

Matern, B. (1986). Spatial Variation. New York: Springer.

Matheron, G. (1963). Principles of Geostatistics. Economic Geology. 58,

pp. 1246-1266.

McBratney, A. and Webster, R. (1986). Choosing Functions for Semi-

variograms of Soil Properties and Fitting them to Sampling

Estimates. Journal of Soil Science. 37, pp. 617-639.

Müller, W. (1999). Least-squares Fitting from the Variogram Cloud.

Statistics and Probability Letters. 43, pp. 93-98.

Myers, D. (1989). To Be or Not to Be … Stationary? That is the Question.

Mathematical Geology. 21(3), pp. 347-362.

Olea, R.A. (1991). Geostatistical Glossary and Multilingual Dictionary.

New York: Oxford University Press

Raspa, G. and Bruno, R. (1993). Multivariate Geostatistics for Soil

Classification. In A. Soares, editor, Geostatistics Troia, `92. Boston:

Kluwer Academic Publishers. pp. 793-804.

Ritvo, G., Sherman, M., Lawrence, A. and Samocha, T. (2000). Estimation of

Soil Phosphorus Levels in Shrimp Ponds. Journal of Agricultural,

Biological and Environmental Statistics. 5(1), pp. 115-129.

Rouhani, S. and Myers, D. (1990). Problems in Space-Time Kriging of

Geohydrological Data. Mathematical Geology. 22(5), pp. 611-623.

Sampson, P. and Guttorp, P. (1992). Nonparametric Estimation of

Nonstationary Spatial Covariance Structure. Journal of the American

Statistical Association. 87(417), pp. 108-119.

Schuenemeyer, J.H. and Power, H.C. (2000). Uncertainty Estimation for

Resource Assessment – An Application to Coal. Mathematical Geology. 32(5), pp. 521-541.

Smith, R. (1996). Estimating Nonstationary Spatial Correlations. Technical

Report, Department of Statistics. University of North Carolina.

Stein, M. (1987). Minimum Norm Quadratic Estimation of Spatial Variograms.

Journal of the American Statistical Association. 82(399), pp. 765-772.

Stein, M. (1999). Interpolation of Spatial Data. New York: Springer-Verlag.

Wackernagel, H. (1998). Multivariate Geostatistics, 2nd Edition. New York:

Springer-Verlag.

Webster, R. and Oliver, M. (1992). Sample Adequately to Estimate Variograms

of Soil Properties. Journal of Soil Science. 43, pp. 177-192.

Wikle, C., Berliner, M. and Cressie, N. (1998). Hierarchical Bayesian Space-

Time Models. Environmental and Ecological Statistics. 5, pp. 117-154.

Wikle, C. and Royle, A. (1999). Space-Time Dynamic Designs of Environmental

Monitoring Networks. Journal of Agricultural, Biological and

Environmental Statistics. 4(4), pp. 489-507.

Zhan, H. (1999). On the Ergodicity Hypothesis in Heterogeneous Formations.

Mathematical Geology, 31(1), pp. 113-134.

Zimmerman, D. (1993). Another Look at Anisotropy in Geostatistics.

Mathematical Geology, 25(4), pp. 453-470.
