

A New Approach to Specify and Estimate Non-Normally Mixed Multinomial Probit Models

Chandra R. Bhat*

The University of Texas at Austin

Dept of Civil, Architectural and Environmental Engineering

1 University Station C1761, Austin TX 78712-0278

Phone: 512-471-4535, Fax: 512-475-8744

E-mail: bhat@mail.utexas.edu

and

Raghuprasad Sidharthan

The University of Texas at Austin

Dept of Civil, Architectural and Environmental Engineering

1 University Station C1761, Austin TX 78712-0278

Phone: 512-471-4535, Fax: 512-475-8744

E-mail: raghu@mail.utexas.edu

*corresponding author

Original version: July 4, 2011

Revised version: February 11, 2012

ABSTRACT

The current paper proposes the use of the multivariate skew-normal distribution function to accommodate non-normal mixing in cross-sectional and panel multinomial probit (MNP) models. The combination of skew-normal mixing and the MNP kernel lends itself nicely to estimation using Bhat’s (2011) maximum approximate composite marginal likelihood (MACML) approach. Simulation results for the cross-sectional case show that our proposed approach does well in recovering the underlying parameters, and also highlight the pitfalls of ignoring non-normality of the continuous mixing distribution when such non-normality is present. At the same time, the proposed model obviates the need to assume a pre-specified parametric distribution for the mixing, and allows the estimation of a very flexible, but still parsimonious, mixing distribution form.

Keywords: multinomial probit, mixed models, maximum approximate composite marginal likelihood, maximum simulated likelihood, multivariate skew-normal distribution

1. Introduction

Econometric discrete choice analysis is an essential component of studying individual choice behavior and is used in many diverse fields to model consumer demand for commodities and services. The decision principle used in almost all discrete choice models corresponds to utility maximization, which is based on the Lancastrian (1971) notion of the assignment of a composite utility to each alternative in the choice set (based on alternative and individual attributes) followed by the choice of the alternative with the highest utility. Further, since the analyst does not observe all individual and context-related factors that contribute to choice decisions, one or more stochastic elements (or random error terms) are introduced in the utility of alternatives. Different ways of introducing the stochastic elements lead to different discrete choice model structures. Thus, consider a cross-sectional choice situation with a single choice occasion per individual, and assume independence among the choice behaviors of individuals.[1] Then, the simplest model form, corresponding to the multinomial logit (MNL) model introduced by Luce and Suppes (1965) and McFadden (1974), assumes a single composite independently and identically distributed or IID (across alternatives) random utility error term with a Gumbel (or Type I extreme-value) distribution. This leads to the simple and elegant MNL model form, but also leaves the model form saddled with the familiar independence from irrelevant alternatives (IIA) property. Maintaining a single composite Gumbel error term in utilities, while relaxing the independence assumption (across alternatives), moves the model form from the multinomial logit to the generalized extreme-value (GEV) class of models proposed by McFadden (1978). On the other hand, relaxing the identically distributed assumption (across alternatives) while retaining the Gumbel distribution assumption leads to the Heteroscedastic Extreme Value (HEV) model form proposed by Bhat (1995). 
Finally, still maintaining a single composite error term but now with a normal distribution, when combined with relaxation of the independence and/or identical distribution assumptions, generates the multinomial probit (MNP) model form originally proposed by Hausman and Wise (1978) and Daganzo (1979). Of these model forms, the MNP form allows the most flexible error covariance structures (up to certain limits of identifiability; see Train, 2009, Chapter 5), though it also entails more estimation effort since it requires the evaluation of a multidimensional normal orthant probability function with an ([pic]) dimensional integral in the general case (where I is the number of alternatives).

A substantial amount of the early theoretical developments in discrete choice modeling was focused on a single composite error term. Over the past decade and a half, attention has shifted more toward the use of multiple error terms through the introduction of a mixing random distribution structure in the utility function of alternatives that is independent of the kernel error term. Essentially, the mixing structure superimposes additional stochastic terms over the “kernel” error term discussed in the previous paragraph. There are several reasons for this shift toward mixing structures. First, in a cross-sectional context, it is very plausible that there are unobserved variations across individuals in the sensitivity to relevant exogenous attributes (such as differential sensitivity due to unobserved factors to travel time and travel cost in a travel mode choice model). Ignoring these variable-specific stochasticity effects and instead using a single composite error term in the utility function will, in general, lead to inconsistent coefficient estimates and trade-off estimates, as well as incorrect substitution patterns across alternatives (see Bhat, 1997a).[2] A second reason for the increasing use of mixing structures is that they provide the ability to introduce heteroscedasticity across utilities in the closed-form GEV models through an error-components specification, as discussed in Train (2009). They also provide the ability to generate correlation across alternatives through an error-components specification. The use of a mixing structure over the closed-form GEV kernel-based model can then essentially achieve any desired covariance pattern. 
At the same time, and especially when the number of alternatives far exceeds the number of mixing random terms needed to capture the “true” covariance pattern, the maximum simulated likelihood (MSL) estimation of the mixed GEV model is generally much easier and faster than a non-mixed MNP model (see Bhat et al., 2008 and Train, 2009 for detailed discussions). A third reason for using mixing structures is that, when using GEV-based kernels, mixing structures enable the introduction of error dependencies across the choice occasions of the same decision-maker in panel or repeated choice contexts (see Li et al., 2010). Even when using an MNP kernel, the mixing structure can provide substantial econometric and computational efficiency to capture panel effects. Further, the mixing approach is almost identical when dealing with cross-sectional choice data or panel data, and poses no conceptual and likelihood estimation coding differences.

There is yet another reason to consider a mixing approach in discrete choice modeling. This has to do with explicitly specifying the random mixing distribution on variables in a way that is consistent with theoretical notions. In fact, the ability to do so is critical to the observation made by McFadden and Train (2000) that the mixed multinomial logit model is capable of approximating any random utility maximization model. Thus, for example, one may want to consider bounded distributions (such as a log-normal distribution or a Rayleigh distribution) for cost and time coefficients in a travel mode choice model, so that the coefficients on these variables are bounded at the upper end. On the other hand, the coefficients on some other variables may be appropriately considered as being unbounded. Further, there are several types of continuous distributions that may be used to capture the profile of population sensitivity to variables.[3] In the context of continuous mixing distributions, the normal distribution has been used quite extensively in the past. However, several studies (see, for example, Amador et al., 2005, Train and Sonnier, 2005, Hensher et al., 2005, Fosgerau, 2005, Greene et al., 2006, Balcombe et al., 2009, and Torres et al., 2011) have underscored the potentially serious mis-specification consequences (in terms of theoretical considerations, data fit, as well as trade-off evaluations) of using the normal distribution. In particular, the symmetric nature of the normal distribution, when combined with mean values that may not be too far away from zero, implies that a significant fraction of individuals may have an unexpected sign on variables (such as a positive coefficient on cost or time). 
For instance, Train and Sonnier (2005), in their analysis of vehicle choice, found that 22% of the population preferred vehicles with a higher purchase price, and 37% of the population preferred vehicles with a higher operating cost, when they used a normal distribution for the cost coefficients. On the other hand, when Train and Sonnier used a log-normal distribution and a bounded Johnson’s SB distribution for the cost coefficients, such results were avoided and they also obtained better data fits. Finally, another issue with using normally distributed cost and other coefficients is that this leads to a breakdown of the WTP calculation because the moments of the ratio of two normally distributed random terms do not exist (see Cedilnik et al., 2006, Daly et al., 2011).
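The sign problem described above follows directly from the symmetry of the normal distribution: if a coefficient is normally distributed with mean mu and standard deviation sigma, the population share with a positive coefficient is Phi(mu/sigma). A minimal sketch of this arithmetic is given below; the function names and the numerical values in the usage note are hypothetical illustrations and are not taken from Train and Sonnier (2005).

```python
import math


def std_normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def share_with_positive_sign(mean, std_dev):
    """Share of the population with a positive coefficient when the
    coefficient is N(mean, std_dev**2): P(beta > 0) = Phi(mean / std_dev)."""
    if std_dev <= 0.0:
        raise ValueError("std_dev must be positive")
    return std_normal_cdf(mean / std_dev)
```

For example, a hypothetical cost coefficient with mean -1.0 and standard deviation 1.3 places a little over one-fifth of the population on the "wrong" (positive) side of zero, even though the mean is clearly negative.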

As indicated already, there have been several earlier studies that have successfully estimated non-normal distributions for the mixing distribution. All of these studies use a multinomial logit model kernel over which mixing is specified. However, the general experience has been that, even when successful, such estimations take a longer time for convergence (relative to normal distributions). This is particularly so for asymmetric distributions with long tails, such as the log-normal distribution. Further, in some cases, the maximum simulated likelihood (MSL) estimation of models with non-normal mixing fails due to numeric/computational problems. It is not uncommon to see researchers consider non-normal distributions only to eventually revert to the use of a normal distribution (see, for example, Bartels et al., 2006 and Small et al., 2005). In addition to these problems specific to the use of non-normal distributions, MSL inference techniques can have other limitations, including a rapid degradation in accuracy as the number of dimensions of mixing increases, and problems with the accuracy (or lack thereof) of the covariance matrix of the estimator. These issues may be traced back to the use of a simulation approach to evaluate the log-likelihood function, which leads to a highly nonlinear and non-smooth second derivatives surface of the log-simulated likelihood function.

Recently, Bhat (2011) proposed an alternative maximum approximate composite marginal likelihood (MACML) inference approach to estimate the multinomial probit (MNP) model. His basis for preferring an MNP kernel rather than a multinomial logit or GEV kernel originates from several considerations. First, in cases such as a spatial analysis where the utilities of spatial alternatives are correlated based on proximity, or in situations where the utilities of individuals for alternatives have a spatial dependency component based on the usual spatial error/lag formulations used in spatial econometrics (see Anselin, 1988), the resulting parametric covariance structure across alternatives or across decision-makers is simply infeasible or extremely inefficient to incorporate with a mixing approach over a restrictive Gumbel kernel covariance surface. Second, when a normal mixing distribution is used, the resulting “mixed MNP” model collapses back to an MNP model due to the closure property of the normal distribution under affine transformations. This, along with the MACML inference procedure, implies the need to evaluate only univariate and bivariate cumulative normal distribution functions, regardless of the number of alternatives or the number of choice occasions per individual or the nature of social/spatial dependence structures. Further, the MACML procedure uses an analytic approximation method rather than a simulation evaluation method to evaluate the multivariate normal cumulative distribution function, which improves the ability to accurately and precisely recover the parameters and their covariance matrix estimates (because of the smooth nature of the first and second derivatives of the approximated analytic log-likelihood function). 
The net result is that the MNP kernel with the MACML inference approach leads to substantial computational gains compared to the MSL estimation of normally-mixed MNL and GEV models, and also enables estimation in cases where the MSL estimation of mixed MNL and GEV models is simply infeasible.

One problem, however, with Bhat’s MACML approach as it stands is that it is only applicable to the normally-mixed case. As discussed earlier, a normal mixing distribution may not be appropriate in several cases. What is needed then is a model that is able to include both a general covariance kernel structure as well as non-normal mixing, while also still being able to be estimated using the MACML approach. This is the objective of the current paper. Specifically, we introduce the use of a multivariate skew-normal distribution function for mixing with an MNP kernel model. The skew-normal distribution, considered by O’Hagan and Leonard (1976) and formalized by Azzalini (1985) for the univariate case, has been extended to the multivariate case by Azzalini and Dalla Valle (1996) and Azzalini and Capitanio (1999). Since these initial contributions, more research on different types of multivariate generalizations of the skew-normal distribution and their properties has been undertaken (see Gonzalez-Farias et al., 2004, Arellano-Valle and Genton, 2005, Gupta et al., 2004, Arellano-Valle and Azzalini 2006, 2008, Azzalini, 2011). As discussed later, the multivariate skew-normal (MVSN) distribution retains several attractive properties of the multivariate normal distribution, and an MNP kernel model mixed with this distribution also lends itself nicely to estimation using the MACML approach. At the same time, the MVSN distribution is tractable, parsimonious in parameters that regulate the distribution and its skewness, and includes the normal distribution as a special interior point case. It also is a very flexible unimodal density structure that allows a “seamless” and “continuous” variation from normality to non-normality, and can replicate a variety of smooth unimodal density shapes with tails to the left or right as well as with a high modal value (sharp peaking) or low modal value (flat plateau). 
The asymmetry accruing from the skewness of the distribution also can allow the density to be pretty much confined to the positive (or negative) half-line. In this sense, it includes a likeness of the log-normal density function as a special case, but with tails that are thin as in the normal density function (which makes estimation easier than in the log-normal case). Despite these desirable properties, there has been little explicit consideration of the skew normal distribution for random terms even in the linear regression field with continuous observations (but see Jara et al., 2008, Meintanis and Hlavka, 2010, and Molenaar et al., 2010), and there has been no consideration whatsoever of this distribution in the discrete choice field.[4]

The rest of this paper is structured as follows. Section 2 discusses the fundamental structure and properties of the univariate and multivariate skew-normal distributions. Section 3 presents the model framework and estimation procedure for the proposed skew-normally mixed MNP model. Section 4 undertakes a simulation exercise to assess the ability of the proposed model to recover underlying parameters. Finally, Section 5 summarizes the key findings of the paper.

2. THE SKEW-NORMAL DISTRIBUTION

The literature on the skew-normal distribution is vast, but also scattered. In this section, we compile and present the properties of the distribution that are most relevant to its application in mixed MNP models. The section begins with a characterization of the univariate skew-normal distribution and then proceeds to the more relevant case of the multivariate skew-normal distribution.

2.1. The Univariate Skew-Normal Distribution

A random variable Y is labeled as being skew-normally distributed with a location parameter [pic] [pic] a scale parameter [pic] [pic] and a shape parameter [pic][pic] if its probability density function is as follows:

[pic], (1)

where [pic] and [pic] represent the standard normal density and cumulative distribution function, respectively. When [pic]the density collapses to that of a normal distribution with mean and variance parameters of [pic] and [pic], respectively. Setting Y = [pic]we obtain a standardized version of the probability density function of the skew-normal distribution (corresponding to the density function of Z that has a location parameter of 0 and scale parameter of 1) given by [pic]. The density function for Y in Equation (1) may be written in terms of the standard density function as [pic] where [pic] Appendix A.1 presents the moment generating function and the moments of the standardized skew-normal distribution (SSN).
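For readers who wish to experiment with the density, a minimal sketch is given below. It assumes the standard Azzalini (1985) form, (2/scale) * phi(z) * Phi(shape * z) with z = (y - loc)/scale; the argument names `loc`, `scale`, and `shape` are labels chosen here for the location, scale, and shape parameters and are not the paper's notation. The helper `trapezoid_integral` is included only to check numerically that the density integrates to one.

```python
import math


def std_normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def skew_normal_pdf(y, loc=0.0, scale=1.0, shape=0.0):
    """Skew-normal density in the standard Azzalini form (assumed here):
    (2 / scale) * phi(z) * Phi(shape * z), with z = (y - loc) / scale."""
    if scale <= 0.0:
        raise ValueError("scale must be positive")
    z = (y - loc) / scale
    return (2.0 / scale) * std_normal_pdf(z) * std_normal_cdf(shape * z)


def trapezoid_integral(func, lower, upper, steps=20000):
    """Composite trapezoid rule, used only as a numerical sanity check."""
    width = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for k in range(1, steps):
        total += func(lower + k * width)
    return total * width
```

With `shape=0.0` the function returns the normal density with mean `loc` and standard deviation `scale`, which mirrors the collapse to normality noted in the text.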

An important stochastic representation for Z that is useful for random generation from the SSN distribution is obtained using a conditioning mechanism. Specifically, consider two bivariate normally distributed variables [pic] and [pic]:

[pic] (2)

Then, [pic] has the SSN density function [pic], where the relationship between [pic] and [pic] is as follows: [pic] (see Appendix A.2 for a derivation). Using this conditioning mechanism, the cumulative distribution function for Z may be obtained as follows:

[pic] (3)

Thus, the cumulative SSN distribution function may be written in terms of a bivariate cumulative standard normal distribution function, and the cumulative distribution function for the non-standardized skew-normally distributed variable Y may be obtained as:

[pic] (4)
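The conditioning mechanism and the resulting cumulative distribution function can be illustrated with a short simulation. The sketch below assumes that the two latent variables are standard bivariate normal with correlation `delta` (a name chosen here), that Z is the second variable conditional on the first being positive, and that the shape parameter equals delta / sqrt(1 - delta^2), which is the standard relationship in the skew-normal literature. Note that `ssn_cdf_numeric` integrates the density numerically rather than calling a bivariate normal CDF routine; it is a stand-in for checking the draws, not the evaluation method of Equations (3) and (4).

```python
import math
import random


def shape_from_delta(delta):
    """Standard mapping from the latent correlation to the shape parameter."""
    return delta / math.sqrt(1.0 - delta * delta)


def draw_ssn(delta, rng):
    """One standardized skew-normal draw via conditioning: generate a
    correlated standard normal pair (c0, c1) and keep c1 only if c0 > 0."""
    if not -1.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between -1 and 1")
    while True:
        c0 = rng.gauss(0.0, 1.0)
        c1 = delta * c0 + math.sqrt(1.0 - delta * delta) * rng.gauss(0.0, 1.0)
        if c0 > 0.0:
            return c1


def ssn_cdf_numeric(z, delta, lower=-10.0, steps=4000):
    """CDF of the standardized skew-normal by trapezoid integration of
    2 * phi(u) * Phi(shape * u) from `lower` to z (numerical check only)."""
    if z <= lower:
        return 0.0
    shape = shape_from_delta(delta)

    def density(u):
        phi = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
        big_phi = 0.5 * (1.0 + math.erf(shape * u / math.sqrt(2.0)))
        return 2.0 * phi * big_phi

    width = (z - lower) / steps
    total = 0.5 * (density(lower) + density(z))
    for k in range(1, steps):
        total += density(lower + k * width)
    return total * width
```

Comparing the empirical distribution of a large number of `draw_ssn` values with `ssn_cdf_numeric` at a few points gives a quick check that the conditioning representation and the density describe the same distribution.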

For the extension to the multivariate skew-normal distribution, and especially for use with the multinomial probit model, an alternate parameterization of Z (referred to by Arellano-Valle and Azzalini, 2006 as the unified skew-normal variable) will be helpful. This is based on the conditioning mechanism discussed above. In this alternate parameterization, the univariate SSN density function is written as [pic] and the univariate cumulative distribution function is written as [pic], with [pic] replacing [pic].

Figure 1 shows the shapes of the normal density function (solid line) and the SSN density functions for three positive values of [pic] (the plots are mirrored across the y-axis for negative values of [pic]). As the value of the shape parameter [pic]increases, the skewness of the distribution increases and the density shows sharper peaking. As [pic] the SSN density tends toward a half-normal density function. Note also that, as the shape parameter increases, the right skewness increases not because the extreme right tail gets longer but because the left tail becomes shorter and shorter (relative to the normal distribution). This is a desirable property in the likelihood convergence of mixed models, and is unlike the log-normal distribution whose right tail gets very long rapidly as the variance of the distribution increases.
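The bounded growth of skewness with the shape parameter can be verified from the well-known closed-form moments of the standardized skew-normal distribution. The sketch below uses the standard results (mean delta * sqrt(2/pi) and variance 1 - 2 * delta^2 / pi, with delta = shape / sqrt(1 + shape^2)); the function and variable names are chosen here for illustration and may differ from the notation of Appendix A.1.

```python
import math


def delta_from_shape(shape):
    """Map the shape parameter to the equivalent correlation-type parameter."""
    return shape / math.sqrt(1.0 + shape * shape)


def ssn_mean_var_skew(shape):
    """Mean, variance, and skewness coefficient of the standardized
    skew-normal distribution, using the standard closed-form moments."""
    mean = delta_from_shape(shape) * math.sqrt(2.0 / math.pi)
    variance = 1.0 - mean * mean
    skewness = 0.5 * (4.0 - math.pi) * mean ** 3 / variance ** 1.5
    return mean, variance, skewness
```

Evaluating `ssn_mean_var_skew` over increasing shape values shows the skewness rising monotonically while the variance shrinks, with the skewness remaining below roughly 0.995 (the half-normal limit). This is consistent with the observation that the skew arises from a shortening left tail rather than a lengthening right tail.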

2.2. The Multivariate Skew-Normal Distribution Function

There are several multivariate versions of the skew-normal distribution in the literature (see Arellano-Valle and Azzalini, 2006 for a discussion of these many variants, and a unified treatment of these). All of these share several properties similar to the multivariate normal distribution. In this paper, we select the multivariate skew-normal distribution version originally proposed by Azzalini and Dalla Valle (1996) for a number of reasons. This version is efficient in the number of additional parameters to be estimated, allows independence between skew-normally distributed and normally-distributed elements in a multivariate vector (useful in selectively imposing skew-normality only on certain coefficients), is closed under any affine transformation of the skew-normally distributed vector (which is the key to the MACML estimation of the MNP model), and is closed under the sum of independent skew-normally distributed and normally distributed vectors of the same dimensions (which is the key to superimposing non-normal mixing distributions on an MNP kernel). Equally importantly, the cumulative distribution function of a D-variate skew normally distributed variable of the Azzalini and Dalla Valle type requires only the evaluation of a [pic]-dimensional multivariate cumulative normal distribution function.

Consider a multivariate skew-normally (MVSN) distributed random variable vector [pic] with a [pic]-location parameter vector [pic] ([pic] and a [pic]-symmetric positive-definite covariance matrix [pic]. Let the correlation matrix corresponding to [pic] be [pic], and let [pic] be a [pic]-diagonal matrix formed by the standard deviations of [pic] ([pic] is the jth diagonal element of the matrix [pic]). Then, we may write: [pic] Setting Y = [pic] we obtain a standardized version of the multivariate probability density function of the skew-normal distribution (corresponding to the density function of Z that has a location parameter of 0 and a correlation matrix [pic]). As in the univariate case, it can be shown that the random variable Z is obtained through a latent conditioning mechanism on a [pic]-variate normally distributed vector [pic] where [pic] is a latent [pic]-vector and [pic] is a [pic]-vector:

[pic] (5)

[pic]is a [pic]-vector, each of whose elements must lie between –1 and +1. The matrix [pic]is also a positive-definite correlation matrix. Then, [pic] has the standard multivariate skew-normal (SMVSN) density function shown below:

[pic]. (6)

where [pic] and [pic] represent the standard multivariate normal density function of D dimensions and the standard univariate cumulative distribution function, respectively. We write [pic] The probability density function of the random variable Y [pic] may be written in terms of the SMVSN density function above as:

[pic] (7)

The moment generating function of Z and its first three moments are presented in Appendix A.3.
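As a point of reference, the Azzalini and Dalla Valle (1996) density is commonly written in the form sketched below. The symbols are assumptions made for this sketch, since the original notation is not recoverable here: $\delta$ denotes the $D$-vector of skew parameters (each element in $(-1, 1)$), $\Omega^{*}$ the correlation matrix of $Z$, $\omega$ the diagonal matrix of standard deviations with diagonal elements $\omega_j$, and $\xi$ the location vector.

```latex
% Sketch in assumed notation; compare with Equations (6) and (7).
\begin{align*}
f_Z(z) &= 2\,\phi_D(z;\Omega^{*})\,\Phi(\alpha^{\top} z),
\qquad
\alpha = \frac{(\Omega^{*})^{-1}\delta}
              {\sqrt{1-\delta^{\top}(\Omega^{*})^{-1}\delta}}, \\
f_Y(y) &= \Bigl(\prod_{j=1}^{D}\omega_j\Bigr)^{-1}
          f_Z\bigl(\omega^{-1}(y-\xi)\bigr).
\end{align*}
```

The second line is the usual change of variables from $Z$ to $Y = \xi + \omega Z$. Setting $\delta = 0$ gives $\alpha = 0$ and $\Phi(0) = 1/2$, so the density reduces to the multivariate normal density.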

The cumulative distribution function for Z may be obtained as:

[pic] (8)

The corresponding cumulative distribution function for Y is:

[pic] (9)
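The reason why a $(D+1)$-dimensional normal cumulative distribution function suffices can be seen from the conditioning mechanism. The derivation below is a sketch in assumed notation: $C_0$ is the scalar latent variable, $C_1$ the latent $D$-vector, $\delta$ the vector of covariances between them, and $\Omega^{*}$ the correlation matrix of $C_1$.

```latex
% Sketch in assumed notation; compare with Equations (8) and (9).
\begin{align*}
P(Z \le z) &= P(C_1 \le z \mid C_0 > 0)
            = \frac{P(-C_0 < 0,\; C_1 \le z)}{P(C_0 > 0)} \\
           &= 2\,\Phi_{D+1}\!\left(
              \begin{pmatrix} 0 \\ z \end{pmatrix};\; \Omega_{-}\right),
\qquad
\Omega_{-} =
\begin{pmatrix} 1 & -\delta^{\top} \\ -\delta & \Omega^{*} \end{pmatrix}, \\
P(Y \le y) &= 2\,\Phi_{D+1}\!\left(
              \begin{pmatrix} 0 \\ \omega^{-1}(y-\xi) \end{pmatrix};\;
              \Omega_{-}\right).
\end{align*}
```

The factor of 2 arises because $P(C_0 > 0) = 1/2$, and the sign change on $\delta$ arises because the event $C_0 > 0$ is rewritten as $-C_0 < 0$, which reverses the sign of the covariance between the first element and $C_1$.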

The close correspondence with the normal distribution leads to several desirable properties of the multivariate skew-normal (MVSN) distribution. The ones that are key to the proposal in this paper to use the MVSN distribution for mixing in MNP models are listed and discussed below.

Property 1:

The sum of a MVSN distributed vector [pic](of dimension [pic]) [pic] and an independently distributed multivariate normally (MVN) distributed vector [pic] (also of dimension [pic]) [pic] is still MVSN distributed:

[pic] where [pic] [pic], and [pic] is the diagonal matrix of standard deviations of [pic].

Proof: There are several ways to prove this property, but perhaps the easiest is to use the moment generating functions of [pic]and [pic]. Specifically, we have (from Appendix A.3):

[pic] (10)

The above expression is once again in the MVSN moment generating form in Appendix (A.3), from which the property is proved.
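Property 1 can be checked numerically in the univariate case by comparing cumulants. The sketch below assumes the standard parameter update for the sum of a skew-normal variable (location `loc`, scale `scale`, skew parameter `delta`) and an independent normal variable (mean `mu`, standard deviation `sigma`): the locations add, the squared scales add, and the skew parameter is rescaled by the ratio of the old scale to the new scale. All names are chosen here for illustration. Matching the first three cumulants is a consistency check only, not a proof.

```python
import math

ROOT_TWO_OVER_PI = math.sqrt(2.0 / math.pi)


def sn_cumulants(loc, scale, delta):
    """Mean, variance, and third central moment of a univariate skew-normal
    variable in the (location, scale, delta) parameterization."""
    shift = scale * delta * ROOT_TWO_OVER_PI
    mean = loc + shift
    variance = scale * scale - shift * shift
    third = 0.5 * (4.0 - math.pi) * shift ** 3
    return mean, variance, third


def sum_with_normal(loc, scale, delta, mu, sigma):
    """Parameters of (skew-normal + independent normal), assuming the
    standard closure result: the sum is again skew-normal."""
    new_scale = math.sqrt(scale * scale + sigma * sigma)
    new_delta = scale * delta / new_scale
    return loc + mu, new_scale, new_delta


def cumulants_of_sum(loc, scale, delta, mu, sigma):
    """Cumulants of the sum computed directly: cumulants of independent
    variables add, and the normal has a zero third central moment."""
    mean, variance, third = sn_cumulants(loc, scale, delta)
    return mean + mu, variance + sigma * sigma, third
```

Because adding normal noise inflates the scale but leaves the third central moment unchanged, the skew parameter of the sum is always smaller in magnitude than that of the original skew-normal variable.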

Property 2:

The affine transformation of the MVSN distributed vector [pic](of dimension [pic]) [pic] as [pic], where [pic] is a [pic] matrix is also a MVSN distributed vector of dimension [pic]:

[pic] where [pic]

[pic] and [pic] is the diagonal matrix of standard deviations of [pic].

Proof: The moment generating function of [pic]may be written as:

[pic] (11)

This proves the result. The two properties above provide the marginal distribution of the utilities under a MNP kernel mixed with skew normally distributed and normally distributed random coefficients, which is critical to the MACML estimation of the resulting model, as discussed next.
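A similar moment-based check can be made for Property 2. The sketch below assumes the standard closure result for an affine map a + B Y of an MVSN vector with location `xi`, covariance `Omega`, and skew vector `delta`: the new location is a + B xi, the new covariance is B Omega B', and the new skew vector is B omega delta rescaled by the new standard deviations. The names, the inclusion of the offset `a`, and the plain-list matrix helpers are choices made here for a self-contained illustration.

```python
import math

ROOT_TWO_OVER_PI = math.sqrt(2.0 / math.pi)


def matmul(left, right):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*right)]
            for row in left]


def transpose(matrix):
    return [list(row) for row in zip(*matrix)]


def mvsn_mean_cov(xi, Omega, delta):
    """Mean vector and covariance matrix of an MVSN vector (standard results)."""
    dim = len(xi)
    shift = [math.sqrt(Omega[j][j]) * delta[j] * ROOT_TWO_OVER_PI
             for j in range(dim)]
    mean = [xi[j] + shift[j] for j in range(dim)]
    cov = [[Omega[i][j] - shift[i] * shift[j] for j in range(dim)]
           for i in range(dim)]
    return mean, cov


def affine_mvsn(a, B, xi, Omega, delta):
    """Parameters of a + B Y under the assumed closure result."""
    dim = len(xi)
    omega = [math.sqrt(Omega[j][j]) for j in range(dim)]
    new_xi = [a[i] + sum(B[i][j] * xi[j] for j in range(dim))
              for i in range(len(B))]
    new_Omega = matmul(matmul(B, Omega), transpose(B))
    new_delta = [sum(B[i][j] * omega[j] * delta[j] for j in range(dim))
                 / math.sqrt(new_Omega[i][i]) for i in range(len(B))]
    return new_xi, new_Omega, new_delta
```

The mean and covariance implied by `affine_mvsn` should agree with those obtained by transforming the mean and covariance of the original vector directly, which is what makes the utility differences in the MNP model tractable.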

3. THE MODEL FRAMEWORK

We develop the model framework first in the context of a cross-sectional MNP model and then discuss the panel formulation. However, the skew-normal mixing can also be imposed on any other form of the MNP model, including settings with spatial dependencies and social dependencies across decision units, and combinations of temporal, spatial, and social dependencies.

3.1. Cross-Sectional MNP Formulation and Estimation

Consider a random-coefficients formulation in which the utility that an individual q [pic]associates with alternative i [pic] is written as:

[pic] (12)

where [pic] is a [pic]-column vector of exogenous attributes, [pic] is another [pic]-column vector of exogenous attributes (including dummy variables for constants, except in one of the I alternative utilities), [pic] is an individual-specific [pic]-column vector of MVSN-distributed coefficients that varies across individuals based on unobserved individual attributes, [pic] is another individual-specific [pic]-column vector of MVN-distributed coefficients that varies across individuals based on unobserved individual attributes (but with the coefficients on the dummy variables for the constants maintained as fixed coefficients in the vector [pic]), and [pic] is assumed to have a general covariance structure subject to identifiability considerations (let [pic] In many situations, such as in a path choice model (see Yai et al., 1997) or a model with spatial location alternatives (see Bhat and Guo, 2007), a specific parametric structure, based on theoretical considerations appropriate to the context, can be placed on [pic] Similarly, in a pure random coefficients specification (as in Hausman and Wise, 1978), one may consider [pic] to be an identity matrix (or an identity matrix scaled by 0.5 or any other constant). Such specifications help in econometric identification as well as econometric efficiency. If a general covariance structure is adopted, there are many ways to ensure identification. An appealing approach is to take the differences of the error terms with respect to the first error term. Let [pic] and let [pic]. Then, up to a scaling factor, the covariance matrix of [pic] [pic] is identifiable. Next, scale the top left diagonal element of this error-differenced covariance matrix to 1. Thus, there are [pic] free covariance terms in the [pic] matrix [pic]. 
Finally, to ensure that whenever differences are taken with respect to the chosen alternative during the maximum approximate composite marginal likelihood (MACML) estimation, these differences are consistent with the same error covariance matrix [pic] for the undifferenced error term vector [pic], [pic] is constructed from [pic] by adding a top row of zeros and a first column of zeros (see Train, 2003; page 134). During the MACML estimation, then, we can obtain the [pic]covariance matrix of the error differences taken with respect to the mth alternative as [pic] where [pic] is a [pic] matrix which corresponds to the identity matrix of size [pic] with an extra column of –1’s added as the mth column.
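The construction of the differencing matrix and of the padded covariance matrix can be sketched as follows. The function names are hypothetical, alternatives are indexed from 1 as in the text, and matrices are stored as plain lists of rows to keep the example self-contained.

```python
def matmul(left, right):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*right)]
            for row in left]


def transpose(matrix):
    return [list(row) for row in zip(*matrix)]


def differencing_matrix(num_alts, m):
    """(I-1) x I matrix: the identity of size I-1 with a column of -1's
    inserted as the m-th column (m is 1-indexed)."""
    if not 1 <= m <= num_alts:
        raise ValueError("m must be between 1 and the number of alternatives")
    rows = []
    for r in range(num_alts - 1):
        row = [1.0 if c == r else 0.0 for c in range(num_alts - 1)]
        row.insert(m - 1, -1.0)
        rows.append(row)
    return rows


def pad_with_zero_row_and_column(diff_cov):
    """Build the I x I covariance matrix from the (I-1) x (I-1) matrix of
    differences with respect to the first alternative by adding a top row
    and a first column of zeros."""
    size = len(diff_cov)
    return [[0.0] * (size + 1)] + [[0.0] + list(row) for row in diff_cov]


def differenced_covariance(full_cov, m):
    """Covariance of the error differences taken with respect to alternative m."""
    M = differencing_matrix(len(full_cov), m)
    return matmul(matmul(M, full_cov), transpose(M))
```

Differencing the padded matrix with respect to the first alternative returns the original error-differenced covariance matrix, which is the consistency requirement described above; differencing with respect to any other alternative yields the covariance matrix needed when that alternative is the chosen one.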

In Equation (12), we will assume that the random vectors [pic], [pic], and [pic]are independent of each other for each individual, as well as that these vectors are independent of the corresponding coefficients of other individuals (this latter assumption can be relaxed within our modeling framework, as will be needed for accommodating spatial or social dependency effects). From the earlier definitions, we can write [pic] with [pic], and [pic] with [pic]. Also let [pic] ([pic] vector), [pic] ([pic]matrix), and [pic] ([pic]matrix). Then, we can write:

[pic] (13)

Let [pic] indicate the eth element of the column vector [.]. Equation (12) can equivalently be written using Equation (13) as:

[pic] (14)

Define [pic] and [pic] Also, assume that individual q chooses alternative mq. In the utility differential form, we may write Equation (14) as:

[pic] (15)

Then stack the utility differentials [pic][pic]in the following order: [pic] an [pic] vector. Correspondingly, let [pic] an [pic] vector, and define [pic]matrix), [pic]matrix) and [pic]Based on properties 1 and 2 earlier in the paper, we can derive the location and other parameters of the vector [pic], which is also skew-normally distributed. Specifically, by successive applications of property 2 and then property 1, we obtain the following important result:

[pic] (16)

[pic] (17)

[pic] is the diagonal matrix of standard deviations of [pic]. The parameters to be estimated include the b and c vectors, the elements of the covariance matrices [pic], and the [pic] parameter vector. Collect all these elements into a single vector [pic]. Then, one can use the result above to obtain the likelihood contribution of individual q choosing alternative m, which takes the I-dimensional integral form below:

[pic] (18)

It is straightforward to see that if all the elements of [pic] are zero, then the likelihood function above collapses to that of an MNP model. If not, the likelihood corresponds to a skew-normally mixed MNP model.

The I-dimensional integral in the likelihood contribution of each individual corresponds to the multivariate normal cumulative distribution (MVNCD) function. The evaluation of such a function cannot be pursued using quadrature techniques due to the curse of dimensionality when the dimension of integration exceeds two (see Bhat, 2003). Consequently, the probability expression is typically approximated using Geweke-Hajivassiliou-Keane (GHK) simulator-based or Genz-Bretz (GB) simulator-based techniques in the classical maximum simulated likelihood (MSL) inference approach (see Bhat et al., 2010 for a detailed description of these simulators) or using Markov Chain Monte Carlo (MCMC) techniques in the Bayesian inference approach (see Albert and Chib, 1993, McCulloch and Rossi, 2000, and Train, 2009). However, these MSL and Bayesian techniques can require extensive simulation, can be time-consuming, are not always very straightforward to implement, and can create convergence assessment problems as the number of dimensions of integration increases. On the other hand, the maximum approximate composite marginal likelihood (MACML) approach for estimation of MNP models, in which the MVNCD function is evaluated using an analytic approximation method, is quite accurate and very fast.

There is, however, one very important issue that still needs to be dealt with. This concerns the positive definiteness of several matrices in Equation (12). Specifically, for the estimation to work, we need to ensure the positive definiteness of the following matrices: [pic](note that the positive definiteness of [pic] ensures the positive definiteness of [pic] and therefore [pic]; this holds because of the property that any principal square sub-matrix of a positive definite matrix is also positive definite). Of these, one can guarantee the positive-definiteness of [pic] in a straightforward fashion using a Cholesky decomposition approach (by parameterizing the likelihood function in terms of the Cholesky-decomposed parameters). To guarantee the positive definiteness of the correlation matrix [pic]we use the approach of Bhat and Srinivasan (2005). Specifically, let L be the Cholesky decomposition matrix for [pic] We need to guarantee that the parameters embedded within L are such that [pic] is a correlation matrix. This is done by parameterizing the diagonal terms of L as follows:

[pic] (19)

In the estimation, the Cholesky elements in the matrix L are estimated, guaranteeing that [pic]is indeed a correlation matrix.
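One way to parameterize the diagonal of the Cholesky factor so that the implied matrix has a unit diagonal is sketched below. It assumes a lower-triangular factor L with the product L L' forming the correlation matrix, and sets each diagonal element to the square root of one minus the sum of squared off-diagonal elements in the same row. This is a natural reading of the approach described above, but the triangular convention and the function names are assumptions made here.

```python
import math


def correlation_cholesky(free_rows):
    """Build a lower-triangular factor L from free off-diagonal elements.
    free_rows[d] holds the d off-diagonal elements of row d (so the first
    entry is an empty list). Each diagonal element is chosen so that the
    row has unit Euclidean norm, which forces diag(L L') = 1."""
    dim = len(free_rows)
    factor = []
    for d, row in enumerate(free_rows):
        if len(row) != d:
            raise ValueError("row %d must contain %d free elements" % (d, d))
        remainder = 1.0 - sum(x * x for x in row)
        if remainder <= 0.0:
            raise ValueError("squared off-diagonal elements must sum to less than 1")
        factor.append(list(row) + [math.sqrt(remainder)] + [0.0] * (dim - d - 1))
    return factor


def implied_correlation(factor):
    """Return L L' for a factor stored as a list of rows."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in factor]
            for row_i in factor]
```

In an estimation routine, only the off-diagonal elements would be treated as free parameters. The constraint that each row's squared off-diagonal elements sum to less than one would still need to be respected, for instance through a further transformation of the free parameters.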

3.2. Panel (or Repeated-Choice) MNP Formulation and Estimation

For the panel formulation, we introduce the index ‘t’ for choice occasion. For ease in presentation, we will use the same number of choice occasions for each individual. Extension to the case of varying number of choice occasions per individual is straightforward.

Consider the random-coefficients formulation in which the utility that an individual q [pic]associates at time period t [pic] with alternative i [pic] is written as:

[pic] (20)

where all notations are as earlier except for the introduction of the index ‘t’. However, note that the vector [pic] is now a [pic]-column vector of exogenous attributes without including a constant. [pic] is a normal random-effect term capturing time-stationary preference effects of individual q for alternative i. Also, as earlier, consider the (I×1)-vector [pic], and assume that [pic] with the same normalizations on [pic] as in the cross-sectional case (note that the [pic] error terms are considered independent across individuals and choice occasions, and [pic], [pic], and [pic] are also assumed independent for each individual q; [pic] and [pic] are also independent across individuals). Next, stack the error terms [pic] into an (I×1)-vector [pic] and let [pic]. However, since only utility differentials matter, take the differentials of these random effects with respect to the first alternative [pic]. Then, only the mean vector [pic] and covariance matrix [pic] of [pic] are identified. At the same time, whenever utility differences are taken with respect to the chosen alternative during the MACML estimation, these utility differences should be consistent with the same mean vector [pic] and error covariance matrix [pic] for the undifferenced error term vector [pic]. To achieve this, we set [pic] (that is, the first element of the vector [pic] is set to zero), and construct [pic] from [pic] by adding a first row of zeros and a first column of zeros.

We now set out some additional notation. Write [pic], [pic] ([pic] vector), [pic] ([pic] vector) so that [pic]. Define [pic] ([pic] vector), [pic] ([pic] vector), [pic] ([pic] vector), [pic] ([pic] vector), [pic] ([pic] vector), [pic] ([pic] matrix), [pic] ([pic] matrix), [pic] ([pic] matrix), [pic] ([pic] matrix). Let [pic] be a column vector of ones of dimension T, and let [pic] be a matrix of ones of dimension T×T. Then, we can write:

[pic] (21)

Let [pic] indicate the e-th element of the column vector [.], and let [pic]. Equation (20) can be equivalently written using Equation (21) as:

[pic] (22)

Define [pic] and [pic]. Also, assume that individual q chooses alternative mqt at the t-th choice instance. In the utility differential form, we may write Equation (22) as:

[pic] (23)

Then stack the utility differentials [pic][pic] in the following order: [pic], an [pic] vector, and [pic], an [pic] vector. Correspondingly, let [pic], an [pic] vector; [pic], an [pic] vector. It is easy to see that [pic] has a mean vector Hq. To determine the covariance matrix of [pic], a few additional matrix definitions are needed. Define [pic]matrix), [pic]matrix), [pic]matrix), and [pic]matrix). Let [pic] and define Mq as an [pic] block diagonal matrix, with each diagonal block having [pic] rows and I columns corresponding to the q-th individual’s t-th choice instance. This [pic] matrix for individual q and observation time period t corresponds to an [pic] identity matrix with an extra column of [pic]’s added as the [pic]th column. For instance, consider the case of T = 2, and I = 4. Let the q-th individual be observed to choose alternative 2 in time period 1 and alternative 1 in time period 2. Then Mq takes the form below.

[pic]. (24)
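As an illustration of the structure of Mq, the following sketch builds the matrix for the T = 2, I = 4 example. It assumes, as is standard for utility differencing against the chosen alternative, that each diagonal block is an identity matrix of size I−1 with an extra column of −1's inserted at the position of the chosen alternative; the function names are hypothetical.

```python
def differencing_block(chosen, n_alts):
    """(I-1) x I block for one choice occasion.

    Each row takes the utility of a non-chosen alternative minus the
    utility of the chosen one (alternatives are numbered from 1).
    """
    rows = []
    for alt in range(1, n_alts + 1):
        if alt == chosen:
            continue
        row = [0] * n_alts
        row[alt - 1] = 1
        row[chosen - 1] = -1
        rows.append(row)
    return rows

def block_diagonal(blocks):
    """Place the given blocks along the diagonal of a zero matrix."""
    total_cols = sum(len(b[0]) for b in blocks)
    out, offset = [], 0
    for b in blocks:
        width = len(b[0])
        for row in b:
            out.append([0] * offset + list(row)
                       + [0] * (total_cols - offset - width))
        offset += width
    return out

# T = 2, I = 4: alternative 2 chosen in period 1, alternative 1 in period 2.
M_q = block_diagonal([differencing_block(m, 4) for m in (2, 1)])
for row in M_q:
    print(row)
```

The result has T(I−1) = 6 rows and TI = 8 columns, with the column of −1's in the second position of the first block and in the first position of the second block.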

Finally, we obtain the results below:

[pic] (25)

[pic] (26)

[pic] is the diagonal matrix of standard deviations of [pic]. The parameters to be estimated include the A, b, and c vectors, the elements of the covariance matrices [pic], and the [pic] parameter vector. Collect all these elements into a single vector [pic]. Then, one can use the result above to obtain the likelihood contribution of individual q choosing alternative m, which takes the [pic]-dimensional integral form below:

[pic] (27)

In this panel setting, the parameter vector [pic] is estimated by defining “events” in the MACML procedure as the pairs of choice observations across the choice occasions of the individual. Letting the individual’s choice at time t be denoted by the index [pic], the CML function for individual q is:

[pic] (28)

where [pic]. The computational effort is reduced in the CML above because only pairwise marginal multivariate probabilities are being considered across choice occasions. However, each multivariate orthant probability above still has a dimension equal to [pic]:

[pic] (29)

where [pic], [pic] and [pic] are appropriate sub-matrices of [pic] and [pic], respectively (that is, they include elements corresponding to the tth and wth choice occasions of the individual). But such an orthant probability is conveniently computed using the approximation part of the MACML, leading to solely bivariate and univariate cumulative normals.
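The bookkeeping behind the pairwise CML can be sketched as follows. The sketch assumes that the stacked utility differences are ordered by choice occasion, with I−1 elements per occasion, so that the sub-vector and sub-matrix for a pair of occasions (t, w) are obtained by selecting the corresponding index blocks; the function names are hypothetical and occasions are numbered from zero.

```python
from itertools import combinations

def occasion_pairs(n_occasions):
    """All unordered pairs (t, w) with t < w, occasions numbered from 0."""
    return list(combinations(range(n_occasions), 2))

def pair_indices(t, w, n_alts):
    """Positions of occasions t and w in the stacked T(I-1) vector."""
    k = n_alts - 1
    return list(range(t * k, (t + 1) * k)) + list(range(w * k, (w + 1) * k))

def take(vec, idx):
    """Sub-vector for the selected positions."""
    return [vec[i] for i in idx]

def take_block(mat, idx):
    """Square sub-matrix for the selected rows and columns."""
    return [[mat[i][j] for j in idx] for i in idx]

# Example with T = 3 occasions and I = 4 alternatives.
T, I = 3, 4
for t, w in occasion_pairs(T):
    idx = pair_indices(t, w, I)
    print((t, w), idx, "dimension:", len(idx))
```

With T occasions there are T(T−1)/2 pairs, and under the assumed stacking each pairwise probability involves 2(I−1) utility differences regardless of T.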

4. SIMULATION ANALYSIS

In this section, we undertake a simulation experiment with two objectives in mind. The first objective is to examine the ability of the MACML estimation method to recover parameters in the MNP model with skew-normally distributed coefficients. The second objective is to illustrate the problems that may arise from ignoring the skewness in the random coefficient distribution, which is equivalent to assuming that the coefficients are normally distributed when they actually are not.

4.1. Experimental Set-Up

A cross-sectional formulation is used for the simulation experiments. Two cases are considered: (1) a three alternative case with three exogenous variables and (2) a five alternative case with five exogenous variables. In both cases, the values of each of the exogenous variables for the alternatives are drawn from a standard univariate normal distribution. In particular, a sample of 5000 realizations of the exogenous variables is generated corresponding to 5000 individuals. The first case specifies a skew-normally distributed random coefficient vector [pic] on all the three exogenous variables, and the second case specifies a skew-normally distributed random coefficient vector [pic] on the first three exogenous variables and a normally distributed random coefficient vector [pic] on the remaining two exogenous variables. For the five-dimensional simulation case, the coefficient vector [pic] is assumed to be a realization from [pic] with:

[pic] and [pic] (30)

In the simulation experiments, the coefficient vector [pic] is assumed to be a realization from [pic], where

[pic][pic]and [pic]. (31)

The correlation matrix [pic] above is constructed in a specific manner so that the off-diagonal elements of the corresponding Cholesky matrix are all zero, except for the first column which now contains the skew parameters (= –0.7) as its elements.[5] Essentially, this way of constructing the correlation matrix assumes that all the correlations in the augmented four-dimensional correlation matrix (corresponding to the three-dimensional skew-normally distributed random coefficient vector) originate in the skew distribution of the coefficients, with no residual correlation beyond that generated by the skew. Such a specification is parsimonious, and can be used to reduce the number of parameters to be estimated in the skew-normal MNP model. For instance, in the MNP with three skew-normal coefficients, there is a reduction from nine correlation parameters to just three. More generally, in a model with D skew-normal coefficients, there is a reduction from [pic] to D parameters in the augmented correlation matrix. Clearly, this can be an effective way to allow a large number of skew-normally distributed coefficients without an explosion in the number of model parameters to be estimated. The other benefit of such a specification is that the skew parameter vector [pic] is directly estimated because it “sits” as the first column of the Cholesky matrix (minus the first row element).
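The construction described above can be made concrete with a short sketch. It assumes that the diagonal elements of the Cholesky matrix are chosen so that each row has unit length (which is what makes the product a correlation matrix); the function names are hypothetical, and the matrix in Equation (31) itself is not reproduced.

```python
import math

def skew_only_cholesky(rho):
    """(D+1) x (D+1) lower-triangular matrix whose only non-zero
    off-diagonal elements are the skew parameters in the first column."""
    d = len(rho)
    L = [[0.0] * (d + 1) for _ in range(d + 1)]
    L[0][0] = 1.0
    for i, r in enumerate(rho, start=1):
        L[i][0] = r
        L[i][i] = math.sqrt(1.0 - r * r)  # unit row norm (assumed)
    return L

def times_transpose(L):
    """Return L L' as a list of lists."""
    n = len(L)
    return [[sum(L[i][k] * L[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Three skew-normal coefficients, each with skew parameter -0.7.
omega_aug = times_transpose(skew_only_cholesky([-0.7, -0.7, -0.7]))
for row in omega_aug:
    print([round(v, 2) for v in row])
```

Under this construction, the first column of the augmented correlation matrix reproduces the skew parameters, and the implied correlation between any two coefficient elements i and j equals the product of their skew parameters (0.49 in this example), which is the sense in which all correlation originates in the skew.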

Another point to note about our skew specification for the [pic] vector is that the negative values for b and [pic] provide a negative location parameter and leftward skew for the marginal distributions of each of the [pic] coefficients that is similar to a (negative) log-normal distribution. Such a specification may be considered for cost and other coefficients. Of course, in reality, the skew-normal distribution can be used for all parameters to allow a range of “seamless” and “continuous” marginal distribution possibilities that range from normality to non-normality.

The method to generate realizations from the MVSN distribution for [pic] is based on first drawing a multivariate standard normal vector with correlation matrix [pic] in the usual way. This constitutes a draw for the latent underlying [pic]-variate normally distributed vector [pic] where [pic] is a latent [pic]-vector and [pic] is a [pic]-vector (see Equation (5); D = 3 in the current case). From this multivariate standard normal draw, a D-variate vector from the multivariate standard skew normal distribution is generated as follows:

[pic] (32)
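A minimal sketch of this generation step is given below. Equation (32) is not reproduced, so the sketch assumes the usual conditioning device: the draw of the D-vector is kept as is when the latent scalar is positive and is sign-reversed otherwise, which by symmetry has the same distribution as the D-vector conditional on a positive latent scalar. It also assumes the skew-only Cholesky structure used in the simulation design; the function name is hypothetical.

```python
import math
import random

def draw_standard_skew_normal(rho, rng):
    """One draw from a D-variate standard skew-normal distribution.

    rho holds the first-column Cholesky (skew) elements; all other
    off-diagonal Cholesky elements are taken to be zero (assumption).
    """
    c1 = rng.gauss(0.0, 1.0)              # latent scalar
    draw = []
    for r in rho:
        z = rng.gauss(0.0, 1.0)
        c2 = r * c1 + math.sqrt(1.0 - r * r) * z   # correlated normal
        draw.append(c2 if c1 > 0.0 else -c2)       # sign-flip device
    return draw

rng = random.Random(12345)
rho = [-0.7, -0.7, -0.7]
draws = [draw_standard_skew_normal(rho, rng) for _ in range(100000)]
sample_mean = sum(d[0] for d in draws) / len(draws)
theory_mean = rho[0] * math.sqrt(2.0 / math.pi)
print(round(sample_mean, 3), round(theory_mean, 3))
```

The sample mean of each element should be close to the theoretical value of the skew parameter times the square root of 2/π, which is negative here and reflects the leftward skew of the marginal distributions.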

Finally, the error term vector [pic] is drawn from [pic] where [pic] is the identity matrix of dimension I (in the notation of Equation (19), [pic]). Thus, we assume and maintain the IID normal assumption for [pic] in the current simulation experiment. The alternative with the highest utility for each individual q is then identified as the chosen alternative.

The above data generation process is undertaken 40 times with different realizations of the [pic],[pic], and [pic] vectors [pic] to generate 40 different data sets. The MACML estimator is applied ten times to each dataset, with different sets of permutations (across the ten runs on the same dataset) to decompose the multivariate normal cumulative distribution or MVNCD function into a product of marginal and conditional probabilities (see Bhat, 2011). In each of the ten runs on the same dataset, ten different random permutations are generated and used for each individual (the random permutation varies across individuals) to approximate the MVNCD function for that individual. The approximation error for each parameter (due to using the analytic approximation to the MVNCD function) is obtained by computing the standard deviation of estimated parameters among the 10 different parameter estimates on the same data set.

A number of performance measures are identified to assess the performance of the MACML approach in being able to recover the underlying “true” parameters (which is the first objective of our simulation exercise). The performance measures, and the various steps to compute these measures, are described below:

1) Estimate the MACML parameters for each data set s and for each of 10 independent sets of permutations for computing the MVNCD function.

2) For each data set s, estimate the standard errors (s.e.) (using the sandwich covariance matrix estimator; see McFadden and Train, 2000).

3) For each data set s, compute the mean estimate for each model parameter across the 10 random permutations used. Label this as MED, and then take the mean of the MED values across the data sets to obtain a mean estimate. Compute the absolute percentage bias (APB) as:

[pic].

4) For each data set s, compute the median s.e. for each model parameter across the 10 sets of permutations. Call this MSED, and then take the mean of the MSED values across the 40 data sets and label this as the asymptotic standard error.

5) Next, compute the standard deviation of the MED values across the 40 data sets to obtain the finite sample standard error for each parameter, and label this as the empirical standard error. Note that the asymptotic standard error is essentially an approximation to this empirical standard error, and the consistency of the estimator for the asymptotic standard error implies that the asymptotic and empirical standard error estimates should be close to one another.

6) Next, for each data set s, compute the approximation standard deviation for each parameter as the standard deviation in the estimated parameter values across the 10 independent permutations (about the MED value). Call this standard deviation APPMED. For each parameter, take the mean of APPMED across the different data sets. Label this as the approximation standard error for each parameter.

7) For each parameter, compute an approximation adjusted asymptotic standard error as follows: [pic]. Similarly, compute an approximation adjusted empirical standard error as follows: [pic].
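The steps above can be sketched for a single parameter as follows. Because the formulas themselves are not reproduced here, the sketch assumes that the APB is the absolute deviation of the mean estimate from the true value, expressed as a percentage of the true value, and that the approximation adjustment combines the two standard errors as the square root of the sum of their squares. Sample (n−1) standard deviations are used, which is also an assumption; the function name and input layout are hypothetical.

```python
import math
from statistics import mean, median, stdev

def performance_measures(estimates, std_errors, true_value):
    """estimates[s][r] and std_errors[s][r] refer to data set s and
    permutation run r, for one model parameter."""
    med = [mean(runs) for runs in estimates]                 # step 3
    mean_estimate = mean(med)
    apb = abs((mean_estimate - true_value) / true_value) * 100.0
    asymptotic_se = mean([median(runs) for runs in std_errors])  # step 4
    empirical_se = stdev(med)                                # step 5
    approximation_se = mean([stdev(runs) for runs in estimates])  # step 6
    return {                                                 # step 7
        "mean_estimate": mean_estimate,
        "apb": apb,
        "asymptotic_se": asymptotic_se,
        "empirical_se": empirical_se,
        "approximation_se": approximation_se,
        "adjusted_asymptotic_se": math.hypot(asymptotic_se, approximation_se),
        "adjusted_empirical_se": math.hypot(empirical_se, approximation_se),
    }

# Made-up numbers: 2 data sets, 2 permutation runs each.
result = performance_measures(
    estimates=[[1.0, 1.2], [0.8, 1.0]],
    std_errors=[[0.1, 0.1], [0.2, 0.2]],
    true_value=1.0,
)
for name, value in result.items():
    print(name, round(value, 4))
```

In the actual experiment the inputs would have 40 data sets and 10 permutation runs per data set; the toy values only illustrate the order of the computations.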

The second objective of examining the implications of ignoring skewness when actually present is achieved by generating data exactly as discussed above. Once generated, we estimate a simple normally-mixed MNP model on the data, assuming (incorrectly) that [pic]= 0 (using ten random permutations per individual in the computation of the MVNCD function, exactly as earlier). We will refer to this model as the MNP-normal (or MNP-N) model. We compare this MNP-N model with the skew normally-mixed (or MNP-SN) model. For this comparison, we ignore approximation error issues and undertake a single MNP-N estimation on each of the 40 datasets generated. We then randomly pick one of the MNP-SN model estimates for each of the 40 datasets (as already estimated earlier), and use that to compare with the MNP-N model. The performances of the two models are evaluated by (1) comparing the mean APB values across parameters and (2) undertaking a likelihood ratio test (LRT) for each of the 40 datasets. For the mean APB computation, the APB in the skewness parameters is not included in the MNP-SN model because the MNP-N implicitly assumes that [pic]= 0 (this allows an “apples to apples” comparison between the MNP-N and MNP-SN models). For the likelihood ratio test, we compare the test statistic for each data set with the chi-squared table value with three degrees of freedom (corresponding to each of the three skew parameters in the [pic] vector being zero). The number of times out of the 40 data sets that the MNP-SN model rejects the MNP-N model is then obtained, along with the mean value of the LRT statistic across the 40 data sets.
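The likelihood ratio comparison can be sketched as below. The sketch uses the closed-form upper-tail probability of a chi-squared distribution with three degrees of freedom, and it assumes a 5% significance level, which is not stated in the text. The log-likelihood values in the example are made up for illustration, and the function names are hypothetical.

```python
import math

def chi2_sf_3df(x):
    """Upper-tail probability of a chi-squared variable with 3 d.o.f."""
    if x <= 0.0:
        return 1.0
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))

def lrt_statistic(ll_unrestricted, ll_restricted):
    """Twice the log-likelihood difference (MNP-SN versus MNP-N)."""
    return 2.0 * (ll_unrestricted - ll_restricted)

def rejection_summary(ll_sn, ll_n, alpha=0.05):
    """Number of data sets in which MNP-N is rejected, and the mean LRT."""
    stats = [lrt_statistic(a, b) for a, b in zip(ll_sn, ll_n)]
    rejections = sum(1 for s in stats if chi2_sf_3df(s) < alpha)
    return rejections, sum(stats) / len(stats)

# Made-up convergence log-likelihoods for two data sets.
print(rejection_summary([-100.0, -100.0], [-110.0, -101.0]))
print(round(chi2_sf_3df(7.8147), 4))  # close to 0.05 at the usual table value
```

Comparing the p-value with the significance level is equivalent to comparing the statistic with the table value (about 7.81 at the 5% level with three degrees of freedom).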

4.2. Simulation Results

4.2.1 Ability of MACML to Recover Model Parameters

The results for the first objective of evaluating the ability to recover model parameters are summarized in Table 1 for the three alternative case with three exogenous variables, and in Table 2 for the five alternative case with five exogenous variables.

4.2.1.1 The Three Alternative Case with Three Exogenous Variables

The results in Table 1 for the three alternative case indicate that the MACML method does reasonably well in recovering the true parameters. The absolute percentage bias (APB) ranges from 7.1% to 11.2% across the parameters, with a mean value of 9.2% (see the last row of the table under the “absolute percentage bias” column). The APB values are generally somewhat smaller and more stable (across parameters) for the location parameters of the distributions of the [pic] parameter vector (i.e., the b parameter estimates in the table) than for the skew parameter estimates (i.e., the [pic] values) or the scale parameter estimates of the distribution of the [pic] parameter vector (i.e., the [pic] parameters in the table). This is not surprising, because the b parameters enter more linearly in the likelihood function of Equation (18) (through the mean of the MVNCD function) than do the skew and scale parameters (that enter more non-linearly and in a complex manner through the covariance matrix of the MVNCD function). One can also observe that all the parameters associated with the third variable are recovered better than the first two variables, perhaps because of the higher standard deviation of this coefficient (=1.25) relative to the other two coefficients. When there is higher variation in a coefficient, it provides more information in the data to pin down the moments of its distribution.

The asymptotic and empirical standard error values (reflecting sampling standard error) are quite close to one another, reflecting the consistency of the MACML estimator of the asymptotic covariance matrix. These sampling standard error estimates of the parameters indicate good efficiency of the MACML estimator, with the standard errors being between 8% and 15% of the mean values of the estimator. Also, the approximation standard error estimates are smaller than the sampling standard errors. On average, the approximation standard error is about 60% of the corresponding asymptotic and empirical standard error values. On the other hand, in a similar simulation setting, the approximation standard error of the MACML estimator with just one permutation per individual (as opposed to ten used here) was found to be only of the order of 13% of the sampling standard errors when the MACML approach was applied to a strictly normally-distributed coefficients model (see Bhat and Sidharthan, 2011). Clearly, even though the skew-normally distributed coefficients can be viewed as originating from an augmented and truncated multivariate normal distribution, and the cumulative distribution function of the skew-normal distribution may be written as that of a normal distribution function with an added dimension, the introduction of asymmetry does appear to introduce more approximation error in the MACML approach. This is an issue that needs further examination in the future. Nonetheless, this should not detract from the fact that the MACML estimator still does very well. In fact, the final column provides the approximation-adjusted asymptotic and empirical standard errors for the MACML estimator, which are only 13-25% higher than the corresponding unadjusted standard errors. Also, the approximation-adjusted standard errors are still only 10-17% of the corresponding mean values of the estimators, indicating that the approximation standard errors introduced by the MACML approach are small in the larger inference context.

4.2.1.2 The Five Alternative Case with Five Exogenous Variables

The results for the five alternative case with five variables are summarized in Table 2. The APB is of the same order as that in the case with three skew-normal coefficients, and ranges from 3% to 18.5% with a mean of 9.4%. As in the previous section, the APB values are smaller and more stable for the b parameter estimates than for the [pic] and [pic] parameter estimates. Further, there is a clear increase in the APB values for the [pic] and [pic] parameter estimates compared to the case with three coefficients. However, the parameters characterizing the normally distributed coefficients (see the c and the [pic] parameters in the fourth and fifth row panels of Table 2, respectively) are estimated very well, with the APBs ranging from 3% to 6.5% (the mean of the APBs for these parameters is 4.5%, which is less than half of the overall mean APB of 9.4%).

The sampling (asymptotic and empirical) standard error values of the parameters continue to indicate good efficiency of the MACML estimator, with the sampling standard errors ranging between 5% and 14% of the mean values of the estimator. Also, the approximation standard error estimates continue to be smaller than the sampling standard error estimates. On average, the approximation standard errors are about 45% of the corresponding asymptotic standard error estimates and 40% of the corresponding empirical standard error values, which is even better than the three-dimensional case. While the approximation errors are close to the sampling standard errors for the skewness elements [pic], this is because the standard errors are extremely small for these elements in the first place. In the end, the approximation-adjusted asymptotic and empirical standard errors are only 5-16% of the mean values of the estimator, which is about the same range as the unadjusted standard errors as a percentage of the mean values.

To summarize, the MACML inference approach does very well in recovering the parameters in a skew-normally mixed MNP model (with or without normally mixed coefficients). However, there is also evidence of some relative degradation of performance when skew-normally distributed coefficients are introduced (relative to the case when there are only normally-distributed coefficients, in which case the MACML approach does extremely well). Some of this degradation is surely attributable to the more difficult asymmetric shapes that need to be characterized with skew-normal distributions. More explorations are needed to examine such behavior. However, despite the relative degradation, the MACML approach is able to recover all parameters well, with the approximation errors being quite inconsequential in the larger sampling inference context.

4.2.2 Effects of Ignoring Skewness in the Coefficient Distribution

This section focuses on the implications of ignoring skewness when actually present. The results are presented in Table 3 for both the three dimensional case (three alternatives-three variable case) and the five dimensional case (five alternatives-five variable case). The results clearly show the poor performance of the MNP-N model (which assumes away any skewness) relative to the MNP-SN model (which explicitly accommodates skewness). The mean APB value across the location parameters is of the order of 60% in the MNP-N model compared to the corresponding mean APB value of 6-8% from the MNP-SN model. The scale parameters also have a larger mean APB in the MNP-N model compared to the MNP-SN model. Overall, the use of a normal distribution when there is skew in the random parameters can lead to seriously mis-estimated distributions for the random parameters. This, in turn, will then lead to mis-estimated willingness to pay and welfare measures. An interesting observation from the five-dimensional analysis, though, is that if there are truly normally distributed coefficients in the model, these do not appear to be substantially affected by mis-specifications on the other coefficients (as can be noticed from the similar mean APB values for the mean elements of the [pic] vector and the covariance elements of the [pic] vector).

The log-likelihood value at convergence from the MNP-SN model is always better than that from the MNP-N model in all the 40 generated data sets. The mean value of the log-likelihood ratio statistic across all the 40 data sets for each of the three-dimensional and five-dimensional cases is provided in Table 3. Also, for each and every data set, the log-likelihood ratio statistic is higher than the corresponding chi-squared table value (see the last row of Table 3).

Overall, the results clearly highlight the bias in characterizing the distribution of random coefficients if skewness effects in the coefficients are ignored when actually present.

5. CONCLUSION

In the current paper, we propose the use of the multivariate skew-normal distribution function to accommodate non-normal mixing in MNP models. The multivariate skew normal (MSN) distribution retains several attractive properties of the multivariate normal distribution. It is tractable, parsimonious in parameters that regulate the distribution and its skewness, and includes the normal distribution as a special interior point case. It also is a very flexible unimodal density structure that allows a “seamless” and “continuous” variation from normality to non-normality, and can replicate a variety of smooth unimodal density shapes. At the same time, we propose the use of an MNP kernel because the combination of skew-normal mixing over the MNP kernel lends itself perfectly to estimation using the maximum approximate composite marginal likelihood (MACML) approach. This is because of two properties of the skew distribution. The first is that it is closed under any affine transformation of the skew-normally distributed vector, and the second is that it is closed under the sum of a skew-normally distributed vector and a normally distributed vector of the same dimensions. As importantly, the cumulative distribution function of the D-variate skew normally distributed variable requires only the evaluation of a [pic]-dimensional multivariate cumulative normal distribution function. All of these properties are gainfully exploited in the paper to formulate an MNP model with non-normal mixing, while also being able to estimate the model in a simple and computationally efficient MACML approach. To our knowledge, this is the first paper to propose and formulate a skew-normally mixed MNP model.

A simulation exercise is undertaken to evaluate the ability of the proposed approach to recover parameters in the skew-normally mixed MNP model. Two cases are considered: (1) a three alternative case with three exogenous variables and (2) a five alternative case with five exogenous variables. The first case considers a three-variate skew normal distribution for the coefficients on the three exogenous variables, while the second case considers a three-variate skew normal distribution for three variables and a bivariate normal for two variables. The results show that our proposed approach does very well in recovering the parameters in a skew-normally mixed MNP model. In addition, the simulation results clearly highlight the bias in characterizing the distribution of random coefficients as well as the poor data fit if skewness, when actually present, is ignored. Ongoing efforts are focused on additional simulation experiments to examine the effectiveness of the approach in settings with spatial dependencies and social dependencies across decision units, and combinations of temporal, spatial, and social dependencies.

ACKNOWLEDGEMENTS

The authors would like to acknowledge support from the Sustainable Cities Doctoral Research Initiative at the Center for Sustainable Development at The University of Texas at Austin. The authors are grateful to Lisa Macias for her help in formatting this document, and to an anonymous referee and Fred Mannering for helpful comments on an earlier version of this document.

REFERENCES

Aigner, D.J., Lovell, C.A.K., Schmidt, P., 1977. Formulation and estimation of stochastic frontier production function model. Journal of Econometrics 6(1), 21-37.

Albert, J.H., Chib, S., 1993. Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association 88(422), 669-679.

Amador, F.J., Gonzales, R., Ortuzar, J., 2005. Preference heterogeneity and willingness to pay for travel time savings. Transportation 32(6), 627-647.

Anselin, L., 1988. Spatial econometrics: Methods and models. Kluwer Academic Publishers, Dordrecht, The Netherlands.

Arellano-Valle, R.B., Azzalini, A., 2006. On the unification of families of skew-normal distributions. Scandinavian Journal of Statistics 33(3), 561-574.

Arellano-Valle, R.B., Azzalini, A., 2008. The centred parametrization for the multivariate skew-normal distribution. Journal of Multivariate Analysis 99(7), 1362-1382.

Arellano-Valle, R.B., Genton, M.G., 2005. On fundamental skew distributions. Journal of Multivariate Analysis 96(1), 93-116.

Azzalini, A., 1985. A class of distributions which includes the normal ones. Scandinavian Journal of Statistics 12(2), 171-178.

Azzalini, A., 2011. Selection models under generalized symmetry settings. Annals of the Institute of Statistical Mathematics, forthcoming.

Azzalini, A., Capitanio, A., 1999. Statistical applications of the multivariate skew normal distribution. Journal of the Royal Statistical Society: Series B 61(3), 579-602.

Azzalini, A., Dalla Valle, A., 1996. The multivariate skew-normal distribution. Biometrika 83(4), 715-726.

Balcombe, K., Chalak, A., Fraser, I.M., 2009. Model selection for the mixed logit with Bayesian estimation. Journal of Environmental Economics and Management 57(2), 226-237.

Bartels, R., Fiebig, D.G., van Soest, A., 2006. Consumers and experts: an econometric analysis of the demand for water heaters. Empirical Economics 31(2), 369-391.

Bastin, F., Cirillo, C., Toint, P.L., 2010. Estimating non-parametric random utility models, with an application to the value of time in heterogeneous populations. Transportation Science 44(4), 537-549.

Bhat, C.R., 1995. A heteroscedastic extreme-value model of intercity mode choice. Transportation Research Part B 29(6), 471-483.

Bhat, C.R., 1997a. Work travel mode choice and number of nonwork commute stops. Transportation Research Part B 31(1), 41-54.

Bhat, C.R., 1997b. An endogenous segmentation mode choice model with an application to intercity travel. Transportation Science 31(1), 34-48.

Bhat, C.R., 2003. Simulation estimation of mixed discrete choice models using randomized and scrambled Halton sequences. Transportation Research Part B 37(9), 837-855. 

Bhat, C.R., 2011. The maximum approximate composite marginal likelihood (MACML) estimation of multinomial probit-based unordered response choice models. Transportation Research Part B 45(7), 923-939.

Bhat, C.R., Guo, J.Y., 2007. A comprehensive analysis of built environment characteristics on household residential choice and auto ownership levels. Transportation Research Part B 41(5), 506-526.

Bhat, C.R., Sidharthan, R., 2011. A simulation evaluation of the maximum approximate composite marginal likelihood (MACML) estimator for mixed multinomial probit models. Transportation Research Part B 45(7), 940-953.

Bhat, C.R., Srinivasan, S., 2005. A multidimensional mixed ordered-response model for analyzing weekend activity participation. Transportation Research Part B 39(3), 255-278.

Bhat, C.R., Eluru, N., Copperman, R.B., 2008. Flexible model structures for discrete choice analysis. In Handbook of Transport Modelling, 2nd edition, Hensher, D.A., Button, K.J. (eds.), Ch. 5, pp. 75-104, Elsevier Science.

Bhat, C.R., Varin, C., Ferdous, N., 2010. A comparison of the maximum simulated likelihood and composite marginal likelihood estimation approaches in the context of the multivariate ordered response model. In Advances in Econometrics: Maximum Simulated Likelihood Methods and Applications, Greene W., Hill R.C. (eds.), Vol. 26, pp. 65-106, Emerald Group Publishing Limited.

Birnbaum, Z.W., 1950. Effect of linear truncation on a multinormal population. Annals of Mathematical Statistics 21(2), 272-279.

Birol, E., Karousakis, K., Koundouri, P., 2006. Using economic valuation techniques to inform water resources management: A survey and critical appraisal of available techniques and an application. Science of the Total Environment 365(1-3), 105-122.

Campbell, D., Doherty, E., Hynes, S., Van Rensburg, T., 2010. Combining discrete and continuous mixing approaches to accommodate heterogeneity in price sensitivities in environmental choice analysis. 84th Agricultural Economics Society Annual Conference, March 29-31, Edinburgh, Scotland.

Cedilnik, A., Kosmelj, K., Blejec, A., 2006. Ratio of two random variables: a note on the existence of its moments. Metodološki Zvezki - Advances in Methodology and Statistics 3(1), 1-7.

Cherchi, E., Cirillo, C., Polak, J., 2009. User benefit assessment in presence of random taste heterogeneity: comparison between parametric and nonparametric models. Transportation Research Record 2132, 78-86.

Chintagunta, P.K., Jain, D.C., Vilcassim, N.J., 1991. Investigating heterogeneity in brand preferences in logit models for panel data. Journal of Marketing Research 28(4), 417-428.

Daganzo, C., 1979. Multinomial Probit: The Theory and its Application to Demand Forecasting. Academic Press, New York.

Daly, A., Hess, S., Train, K., 2011. Assuring finite moments for willingness to pay in random coefficient models. Transportation 39(1), 19-31.

Ellison, B.E., 1964. Two theorems for inferences about the normal distribution with applications in acceptance sampling. Journal of the American Statistical Association 59(305), 89-95.

Fosgerau, M., 2005. Unit income elasticity of the value of travel time savings. Presented at 8th NECTAR Conference, Las Palmas G.C., June 2-4.

Fosgerau, M., 2006. Investigating the distribution of the value of travel time savings. Transportation Research Part B 40(8), 688-707.

González-Farías, G., Domínguez-Molina, A., Gupta, A.K., 2004. Additive properties of skew normal random vectors. Journal of Statistical Planning and Inference 126(2), 521-534.

Greene, W.H., Hensher, D.A., 2003. A latent class model for discrete choice analysis: contrasts with mixed logit. Transportation Research Part B 37(8), 681-698.

Greene, W.H., Hensher, D.A., Rose, J.M., 2006. Accounting for heterogeneity in the variance of the unobserved effects in mixed logit models (NW transport study data). Transportation Research Part B 40(1), 75-92.

Gupta, A.K., González-Farías, G., Domínguez-Molina, A., 2004. A multivariate skew normal distribution. Journal of Multivariate Analysis 89(1), 181-190.

Hausman, J.A., Wise, D.A., 1978. A conditional probit model for qualitative choice: discrete decisions recognizing interdependence and heterogeneous preferences. Econometrica 46(2), 403-426.

Hess, S., Bierlaire, M., Polak, J.W., 2007. A systematic comparison of continuous and discrete mixture models. European Transport 37, 35-61.

Hensher, D.A., Rose, J.M., Greene, W.H., 2005. Applied Choice Analysis: A Primer. Cambridge University Press, Cambridge, U.K.

Hynes, S., Hanley, N., Scarpa, R., 2008. Effects on welfare measures of alternative means of accounting for preference heterogeneity in recreational demand models. American Journal of Agricultural Economics 90(4), 1011-1027.

Jara, A., Quintana, F., San Martín, E., 2008. Linear mixed models with skew-elliptical distributions: a Bayesian approach. Computational Statistics and Data Analysis 52(11), 5033-5045.

Kamakura, W.A., Russell, G.J., 1989. A probabilistic choice model for market segmentation and elasticity structure. Journal of Marketing Research 26(4), 379-390.

Lancaster, K., 1971. Consumer Demand: A New Approach. Columbia University Press, New York.

Li, Z., Hensher, D.A., Rose, J.M., 2010. Willingness to pay for reliability in passenger transport: a review and some new empirical evidence. Transportation Research Part E 46(3), 384-403.

Luce, R., Suppes, P., 1965. Preference, utility, and subjective probability. In Handbook of Mathematical Psychology, Volume III, Luce, R., Bush, R., Galanter, E. (eds.), John Wiley & Sons, New York.

McCulloch, R.E., Rossi, P.E., 2000. Bayesian analysis of the multinomial probit model. In Simulation-Based Inference in Econometrics, Mariano, R., Schuermann, T., Weeks, M.J., (eds.), 158-178, Cambridge University Press, New York.

McFadden, D., 1974. The measurement of urban travel demand. Journal of Public Economics 3(4), 303-328.

McFadden, D., 1978. Modeling the choice of residential location. Transportation Research Record 672, 72-77.

McFadden, D., Train, K., 2000. Mixed MNL models for discrete response. Journal of Applied Econometrics 15(5), 447-470.

Meintanis, S.G., Hlávka, Z., 2010. Goodness-of-fit tests for bivariate and multivariate skew-normal distributions. Scandinavian Journal of Statistics 37(4), 701-714.

Molenaar, D., Dolan, C.V., Verhelst, N.D., 2010. Testing and modeling non-normality with the one-factor model. British Journal of Mathematical and Statistical Psychology 63(2), 293-317.

O’Hagan, A., Leonard, T., 1976. Bayes estimation subject to uncertainty about parameter constraints. Biometrika 63(1), 201-203.

Small, K.A., Winston, C., Yan, J., 2005. Uncovering the distribution of motorists’ preferences for travel time and reliability. Econometrica 73(4), 1367-1382.

Torres, C., Hanley, N., Riera, A., 2011. How wrong can you be? Implications of incorrect utility function specification for welfare measurement in choice experiments. Journal of Environmental Economics and Management 62(1), 111-121.

Train, K., 2003. Discrete Choice Methods with Simulation, 1st ed. Cambridge University Press, Cambridge.

Train, K.E., 2008. EM algorithms for nonparametric estimation of mixing distributions. Journal of Choice Modelling 1(1), 40-69.

Train, K., 2009. Discrete Choice Methods with Simulation, 2nd ed. Cambridge University Press, Cambridge.

Train, K., Sonnier, G., 2005. Mixed logit with bounded distributions of correlated partworths. In Applications of Simulation Methods in Environmental and Resource Economics, Scarpa, R., Alberini, A., (eds.), Ch. 7, pp. 117-134, Springer, Dordrecht, The Netherlands.

Weinstein, M.A., 1964. The sum of values from a normal and a truncated normal distribution. Technometrics 6(1), 104-105.

Yai, T., Iwakura, S., Morichi, S., 1997. Multinomial probit with structured covariance for route choice behavior. Transportation Research Part B 31(3), 195-207.

Appendix A.1

The moments of the SSN distribution are most easily obtained from the moment generating function of Z, which is given by:

[pic]

[pic]

[pic]

[pic]

[pic]

[pic]

[pic]

[pic]

[pic] In the above expression, [pic]. From above, the first three moments of the distribution may be written as follows, with [pic]:

[pic]

[pic],

where [pic] is the Pearson index of skewness, a measure of asymmetry. When [pic], [pic] = 0, as should be the case for the normal distribution. The moments for the variable [pic], which is non-standard skew-normally distributed, may be obtained as [pic], [pic], and [pic].
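As a numerical cross-check on moment expressions of this kind, the sketch below assumes that the SSN density takes the standard Azzalini form 2φ(z)Φ(ρz/√(1−ρ²)), with ρ playing the role usually denoted δ. That parameterization is an assumption made here for illustration, and the function names are hypothetical. Under this assumption the familiar closed forms are mean ρ√(2/π), variance 1 − 2ρ²/π, and Pearson skewness ((4 − π)/2)·mean³/variance^(3/2); the code compares them with moments obtained by direct numerical integration of the density.

```python
import math


def ssn_pdf(z, rho):
    """Assumed SSN density: 2*phi(z)*Phi(alpha*z), alpha = rho/sqrt(1-rho^2)."""
    alpha = rho / math.sqrt(1.0 - rho * rho)
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    big_phi = 0.5 * (1.0 + math.erf(alpha * z / math.sqrt(2.0)))
    return 2.0 * phi * big_phi


def ssn_moments_closed_form(rho):
    """Mean, variance and Pearson skewness under the assumed parameterization."""
    mean = rho * math.sqrt(2.0 / math.pi)
    var = 1.0 - mean * mean
    skew = 0.5 * (4.0 - math.pi) * mean ** 3 / var ** 1.5
    return mean, var, skew


def ssn_moments_numeric(rho, lo=-10.0, hi=10.0, n=20001):
    """Same three quantities by trapezoidal integration of the density."""
    h = (hi - lo) / (n - 1)
    m1 = m2 = m3 = 0.0
    for i in range(n):
        z = lo + i * h
        weight = h * (0.5 if i in (0, n - 1) else 1.0)
        f = ssn_pdf(z, rho) * weight
        m1 += z * f
        m2 += z * z * f
        m3 += z ** 3 * f
    var = m2 - m1 * m1
    # Third central moment expressed through raw moments.
    skew = (m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3) / var ** 1.5
    return m1, var, skew


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9):
        print(rho, ssn_moments_closed_form(rho), ssn_moments_numeric(rho))
```

Setting ρ = 0 returns mean 0, variance 1 and skewness 0, which is the normal-distribution special case noted in the text. The sketch requires |ρ| < 1.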

Appendix A.2

From Equation (5),

[pic]

Then,

[pic]

where [pic] is the standard bivariate normal density function.

[pic]

[pic]

[pic]

[pic]

Appendix A.3

The moment generating function of Z is:

[pic].

The first three moments of the distribution may subsequently be obtained from the function above in a straightforward fashion with [pic]:

[pic]

[pic],

The moments for the variable [pic], which is non-standard skew-normally distributed, may be obtained as [pic], [pic], and [pic]. For future reference, we will also write the moment generating function of Y (obtained from Equation (11)) as follows:

[pic]

LIST OF FIGURES

Figure 1. Shape of the SSN density function for a number of positive values of ρ

LIST OF TABLES

Table 1. Simulation Results for the Three Alternative-Three Variable Case

Table 2. Simulation Results for the Five Alternative-Five Variable Case

Table 3. Effects of Ignoring Skewness in the Mixing Distribution (when present)

[pic]

Figure 1. Shape of the SSN density function for a number of positive values of ρ

Table 1. Simulation Results for the Three Alternative-Three Variable Case

|Parameter |True Value |Parameter Estimates |Standard Error (SE) Estimates |
| |
|b1 |
|ρ1 |
|ω1 |1.000 |1.112 |11.2% |
| |
|b1 |
|ρ1 |
|ω1 |
|c1 |
|σ1 |1.000 |1.065 |

| |Skew Normal |Normal |Skew Normal |Normal |
|Mean APB | | | | |
|Location parameters of the βq vector |7.7% |58.8% |5.8% |60.4% |
|Scale parameters of the βq vector |8.8% |18.3% |15.4% |18.3% |
|Mean values of the γq vector |- |- |4.1% |3.4% |
|Covariance elements of the γq vector |- |- |5.1% |4.3% |
|Across all parameters βq and γq vector |8.3% |38.6% |7.9% |23.3% |
|Mean log-likelihood value at convergence |-2056.6 |-2095.0 |-4132.3 |-4219.7 |
|Mean value of the log-likelihood ratio statistic across datasets |76.9 |174.8 |
|Number of times the likelihood ratio test (LRT) favors the skew normal model |Every time when compared to [pic] |Every time when compared to [pic] |
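The likelihood ratio statistics reported in the rows above can be related to the mean log-likelihood values by simple arithmetic. The following is a minimal sketch, assuming the usual definition LR = 2 × (log-likelihood of the skew-normal model − log-likelihood of the normal model); because averaging is linear, the mean LR statistic across datasets equals twice the difference of the mean log-likelihoods. The labels for the two column pairs and the function name are illustrative only, and the critical value used in the comparison is not reproduced here.

```python
def lr_statistic(ll_unrestricted, ll_restricted):
    """Likelihood ratio statistic: 2 * (unrestricted LL - restricted LL)."""
    return 2.0 * (ll_unrestricted - ll_restricted)


# Mean log-likelihoods at convergence (skew normal, normal), as tabulated.
mean_ll = {
    "first column pair": (-2056.6, -2095.0),
    "second column pair": (-4132.3, -4219.7),
}

lr = {label: lr_statistic(sn, nrm) for label, (sn, nrm) in mean_ll.items()}

if __name__ == "__main__":
    for label, value in lr.items():
        print(f"{label}: LR = {value:.1f}")
```

The computed values are approximately 76.8 and 174.8, which agree with the tabulated 76.9 and 174.8 up to rounding of the reported mean log-likelihoods.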

-----------------------

[1] The use of a cross-sectional choice situation with independence across individual decision-maker choices is simply for exposition convenience in this introduction section.

[2] There are a few exceptions to this rule, one of which is when an MNP kernel error term is mixed with normally distributed random coefficients. Assuming the usual linear-in-parameters utility functional form, the net effect is that the combination of variable-specific random terms and the kernel error term can be recast back into an MNP utility form with a single composite error term (due to the closure property of the normal distribution under affine transformations -- a linear transformation followed by a translation). That is, the marginal distribution of utility obtained by integrating out the normal mixing distribution puts the utility back into a normal distribution form. In fact, this was the genesis of Hausman and Wise’s MNP model formulation, in which the “composite” error terms of the alternatives have a covariance matrix that is parameterized based on the mixing structure. However, this kind of affine closure is not achieved with GEV or HEV kernel models. Further, closure is also not generally achieved with a non-normal mixing distribution with the MNP “kernel”, except in a special case which is exploited in this paper.

[3] Note here that discrete distributions may also be used for the mixing. If the mixing vector is assumed to take M possible value states with state-specific probabilities, this leads to the familiar latent class model used in marketing (see Kamakura and Russell, 1989, Chintagunta et al., 1991) and transportation (see Bhat, 1997b, Greene and Hensher, 2003, Hess et al., 2007, and Train, 2008). On the other hand, if a discrete distribution is considered separately for each individual random coefficient, this is essentially a non-parametric distribution (see Bastin et al., 2010, Cherchi et al., 2009, Fosgerau, 2006). However, the use of a continuous distribution dominates the literature, at least in part because it offers efficiency in the number of mixing distribution parameters to be estimated. Further, several studies that have compared discrete distribution methods with continuous distributions have not found a clear pattern of which of the two approaches is superior (see, for instance, Greene and Hensher, 2003, Birol et al., 2006, and Hynes et al., 2008). Some recent studies have also considered a combination of discrete and continuous distributions for the mixture in the form of a mixture of normal distributions (see Campbell et al., 2010), though such mixtures of normal distributions have some of the same problems as the simple normal distribution (as discussed subsequently).

[4] However, it should be noted that the skew normal distribution has appeared implicitly in the context of such models as the stochastic frontier model (see Aigner et al., 1977) and in other studies involving the study of truncated normal variables (for example, Birnbaum, 1950 and Weinstein, 1964). This is because one of the stochastic representations of a skew-normally distributed variable happens to be as the convolution of a normal variable and a half-normal variable. However, the explicit use of the skew-normal as a distributional assumption for one or more random terms, as in the current paper, has seen little consideration in the econometric field.

[5] The Cholesky matrix of [pic] is [pic]
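The stochastic representation mentioned in footnote [4], in which a skew-normal variable arises as the convolution of a normal variable and a half-normal variable, can be illustrated by simulation. The sketch below assumes the standard Azzalini weighting Z = ρ|U0| + √(1−ρ²)·U1 with U0 and U1 independent standard normal; these weights, the seed, and the function names are assumptions made for illustration and are not taken from the paper. For this construction the mean is ρ√(2/π) and the variance is 1 − 2ρ²/π, which the simulated draws should approximate.

```python
import math
import random


def draw_ssn(rho, rng):
    """One draw of rho*|U0| + sqrt(1-rho^2)*U1 (assumed representation)."""
    half_normal = abs(rng.gauss(0.0, 1.0))  # |U0| is half-normal
    normal = rng.gauss(0.0, 1.0)            # independent standard normal U1
    return rho * half_normal + math.sqrt(1.0 - rho * rho) * normal


def simulated_mean_var(rho, n=200_000, seed=12345):
    """Sample mean and (population-form) variance of n simulated draws."""
    rng = random.Random(seed)
    total = total_sq = 0.0
    for _ in range(n):
        z = draw_ssn(rho, rng)
        total += z
        total_sq += z * z
    mean = total / n
    return mean, total_sq / n - mean * mean


if __name__ == "__main__":
    rho = 0.8
    mean, var = simulated_mean_var(rho)
    print("simulated:", mean, var)
    print("theoretical:", rho * math.sqrt(2.0 / math.pi),
          1.0 - 2.0 * rho * rho / math.pi)
```

A positive ρ shifts mass to the right through the half-normal component, while ρ = 0 leaves only the symmetric normal term.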
