
To cite this article: Nathaniel E. Helwig & Ping Ma (2014). Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples. Journal of Computational and Graphical Statistics. DOI: 10.1080/10618600.2014.926819


Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples

Nathaniel E. Helwig and Ping Ma
Department of Statistics, University of Illinois

Abstract The current parameterization and algorithm used to fit a smoothing spline analysis of variance (SSANOVA) model are computationally expensive, making a generalized additive model (GAM) the preferred method for multivariate smoothing. In this paper, we propose an efficient reparameterization of the smoothing parameters in SSANOVA models, and a scalable algorithm for estimating multiple smoothing parameters in SSANOVAs. To validate our approach, we present two simulation studies comparing our reparameterization and algorithm to implementations of SSANOVAs and GAMs that are currently available in R. Our simulation results demonstrate that (a) our scalable SSANOVA algorithm outperforms the currently used SSANOVA algorithm, and (b) SSANOVAs can be a fast and reliable alternative to GAMs. We also provide an example with oceanographic data that demonstrates the practical advantage of our SSANOVA framework. Supplementary materials that are available online can be used to replicate the analyses in this paper.

Keywords: Algorithms, Multivariate analysis, Nonparametric methods, Smoothing and nonparametric regression, Smoothing splines

1 Introduction

A typical nonparametric regression model can be written as $y_i = \eta(x_i) + e_i$ for $i \in \{1, \ldots, n\}$, where $y_i \in \mathbb{R}$ is the response variable for the $i$-th observation, $x_i \equiv (x_{i1}, \ldots, x_{ip})'$ is the $p \times 1$ predictor vector for the $i$-th observation, $\eta$ is an unknown smooth function relating the response and predictor variables, and $e_i \sim N(0, \sigma^2)$ is independent, normally distributed measurement error. Given a set of values $(x_i, y_i)$ for $i \in \{1, \ldots, n\}$, a popular approach for estimating $\eta$ is the minimization of a penalized least-squares functional of the form

$$(1/n) \sum_{i=1}^{n} (y_i - \eta(x_i))^2 + \lambda J(\eta) \tag{1}$$

where $J$ is a quadratic penalty functional that quantifies the roughness of $\eta$, and $\lambda \in (0, \infty)$ is a global smoothing parameter that controls the trade-off between the goodness-of-fit of the data and the smoothness of $\eta$.

A shortened version of this paper was selected as a winner of the 2013 Student Paper Competition sponsored by the Statistical Computing and Statistical Graphics Sections of the American Statistical Association.

This work was funded by NSF grants DMS-1055815, DMS-0800631, and DMS-1228288.
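For illustration, the following base R snippet fits a univariate cubic smoothing spline of this form; `stats::smooth.spline()` minimizes a penalized least-squares criterion like Equation (1) and, with `cv = FALSE`, selects the global $\lambda$ by GCV (see Section 3). The simulated data are purely illustrative.

```r
## Univariate illustration of Equation (1): a cubic smoothing spline
## with the global lambda selected by GCV.
set.seed(1)
n <- 1000
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)
fit <- smooth.spline(x, y, cv = FALSE)  # cv = FALSE requests GCV selection
fit$lambda                              # the selected global smoothing parameter
```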


Two possible approaches for estimating the $\eta_\lambda$ that minimizes Equation (1) are smoothing spline analysis of variance models (SSANOVAs; see Gu, 2013b; Wahba, 1990) and generalized additive models (GAMs; see Wood, 2004, 2006); note that $\eta_\lambda$ is subscripted because the optimal function depends on the chosen smoothing parameters. In general, SSANOVAs benefit from a solid theoretical foundation, but are computationally expensive to fit. In contrast, GAMs are fast to compute, but are not built upon the same solid theoretical framework as SSANOVAs. Due to their fast performance, GAMs are typically preferred over SSANOVAs in practice, despite the theoretical advantages of the SSANOVA framework.

Kim and Gu (2004) give an SSANOVA approximation using $q \ll n$ selected knots, and their algorithm requires $O(nq^2)$ flops to estimate $\eta$ for each choice of smoothing parameters. In contrast, when using Wood's (2004) approach to fit a GAM with $q$ selected knots, the algorithm requires $O(nq^2)$ flops for the initialization, followed by $O(q^3)$ flops for each choice of smoothing parameters. Furthermore, using the conventional SSANOVA parameterization, there are often more smoothing parameters than predictors, whereas a GAM uses one smoothing parameter for each predictor. As a result, fitting an SSANOVA (using Kim & Gu's, 2004, algorithm) typically takes substantially longer than fitting the corresponding GAM (using Wood's, 2004, algorithm), and the timing difference increases with $n$.
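To make the comparison concrete, here is a hypothetical sketch (not from this paper's simulations) of fitting both kinds of models in R, assuming the gss and mgcv packages are installed; `gss::ssanova()` implements Kim and Gu's (2004) algorithm, and `mgcv::gam()` implements Wood's (2004) approach.

```r
## Hypothetical timing sketch for the two frameworks; data are simulated.
library(gss)    # SSANOVA via Kim & Gu's (2004) algorithm
library(mgcv)   # GAMs via Wood's (2004) approach

set.seed(1)
n <- 5000
x1 <- runif(n); x2 <- runif(n)
y <- sin(2 * pi * x1) + x2^2 + x1 * x2 + rnorm(n, sd = 0.5)
dat <- data.frame(y, x1, x2)

## SSANOVA with q = 100 randomly selected knots: O(nq^2) flops
## for every candidate set of smoothing parameters.
system.time(ssfit <- ssanova(y ~ x1 * x2, data = dat, nbasis = 100, seed = 1))

## GAM with a comparable basis size: O(q^3) flops per candidate
## after a one-time O(nq^2) setup.
system.time(gamfit <- gam(y ~ s(x1, bs = "cr") + s(x2, bs = "cr") + ti(x1, x2),
                          data = dat, method = "GCV.Cp"))
```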

In this paper, we propose an efficient reparameterization of the smoothing parameters in SSANOVA models, such that there is one smoothing parameter for each predictor. We also propose a new algorithm for fitting an SSANOVA using $q$ selected knots. This new algorithm only requires $O(nq^2)$ flops for the initialization; then, after the initialization, the estimation of $\eta$ only requires $O(q^3)$ flops for each choice of smoothing parameters. In the following sections, we present our proposed SSANOVA reparameterization (Section 2), describe a new algorithm for estimating multiple smoothing parameters in SSANOVA models (Section 3), present two simulation studies comparing our approach to other approaches (Section 4), and demonstrate the benefits of our approach using an example (Section 5).


2 An Efficient Reparameterization

2.1 Conventional SSANOVA Parameterization

The $\eta_\lambda$ minimizing Equation (1) can be estimated via a tensor product reproducing kernel Hilbert space (RKHS) of functions $\mathcal{H} \equiv \otimes_{j=1}^{p} \mathcal{H}_{X_j}$ on the domain $\mathcal{X} \equiv \prod_{j=1}^{p} \mathcal{X}_j$, where $\mathcal{H}_{X_j}$ is a RKHS of functions on $\mathcal{X}_j$, which denotes the domain of the $j$-th predictor. In most cases, $\mathcal{H}_{X_j}$ can be decomposed into three orthogonal subspaces, such as $\mathcal{H}_{X_j} = \mathcal{H}_{0j} \oplus \mathcal{H}_{\tilde{n}j} \oplus \mathcal{H}_{cj}$, where $\mathcal{H}_{nj} \equiv \mathcal{H}_{0j} \oplus \mathcal{H}_{\tilde{n}j}$ is the null space of the $j$-th predictor (i.e., $\mathcal{H}_{nj} \equiv \{\eta : J(\eta) = 0\}$) with $\mathcal{H}_{0j}$ denoting a space of constant functions (i.e., $\mathcal{H}_{0j} \equiv \{\eta : \eta \propto 1\}$), and $\mathcal{H}_{cj}$ is the contrast space of the $j$-th predictor (i.e., $\mathcal{H}_{cj} \equiv \{\eta : 0 < J(\eta) < \infty\}$). The reproducing kernel (RK) of $\mathcal{H}$ has the form $\rho_X = \prod_{j=1}^{p} \rho_{X_j}$, where $\rho_{X_j}$ is the RK of $\mathcal{H}_{X_j}$, which has the form $\rho_{X_j} \equiv \rho_{0j} + \rho_{\tilde{n}j} + \rho_{cj}$ with $\rho_{0j}$, $\rho_{\tilde{n}j}$, and $\rho_{cj}$ denoting the RKs of $\mathcal{H}_{0j}$, $\mathcal{H}_{\tilde{n}j}$, and $\mathcal{H}_{cj}$, respectively (see Gu, 2013b; Wahba, 1990).
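For concreteness, the following sketch computes these marginal RKs for a cubic smoothing spline on $[0, 1]$ using scaled Bernoulli polynomials (see Gu, 2013b, Chapter 2); the helper names `k1`, `k2`, `k4`, `rk.null`, and `rk.contr` are hypothetical, not from the paper.

```r
## Cubic smoothing spline RKs on [0, 1], via scaled Bernoulli polynomials
## k_r(x) = B_r(x)/r! (Gu, 2013b, Chapter 2).
k1 <- function(x) x - 0.5
k2 <- function(x) (k1(x)^2 - 1/12) / 2
k4 <- function(x) (k1(x)^4 - k1(x)^2 / 2 + 7/240) / 24

## RK of the non-constant null space H_n~j (linear contrasts):
rk.null <- function(x, z) outer(k1(x), k1(z))

## RK of the contrast space H_cj:
rk.contr <- function(x, z) outer(k2(x), k2(z)) - k4(abs(outer(x, z, "-")))

## Marginal RK rho_Xj = rho_0j + rho_n~j + rho_cj with equal weights:
rk.marginal <- function(x, z) 1 + rk.null(x, z) + rk.contr(x, z)
```

The parameterizations discussed next differ only in how they weight these marginal pieces when forming the tensor product RK.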

The marginal space decompositions $\mathcal{H}_{X_j} = \mathcal{H}_{0j} \oplus \mathcal{H}_{\tilde{n}j} \oplus \mathcal{H}_{cj}$ imply that the tensor product space $\mathcal{H}$ can be decomposed into the summation of $3^p$ orthogonal subspaces. Typically, the tensor product space is written as $\mathcal{H} = \mathcal{H}_n \oplus \mathcal{H}_c$, where $\mathcal{H}_n$ and $\mathcal{H}_c$ are the tensor product null and contrast spaces, respectively. Using the conventional SSANOVA parameterization, the contrast space of $\mathcal{H}$ is written as $\mathcal{H}_c = \oplus_{k=1}^{s} \mathcal{H}_k$, where the $\mathcal{H}_k$ are orthogonal subspaces with corresponding inner products $\langle \cdot, \cdot \rangle_k$. Then, the inner product of $\mathcal{H}_c$ is defined as $\sum_{k=1}^{s} \theta_k^{-1} \langle \cdot, \cdot \rangle_k$, where $\theta = (\theta_1, \ldots, \theta_s)$ are local smoothing parameters with $\theta_k \in (0, \infty)$. Using this definition of the inner product for $\mathcal{H}_c$, it can be shown that the RK of $\mathcal{H}_c$ is given by $\rho_c = \sum_{k=1}^{s} \theta_k \rho_k$, where $\rho_k$ denotes the RK of $\mathcal{H}_k$.


For example, with $p = 2$ predictors using cubic marginals, $\mathcal{H} = \mathcal{H}_n \oplus \mathcal{H}_c$ where

$$\begin{aligned} \mathcal{H}_n &\equiv \mathcal{H}_{0} \oplus \mathcal{H}_{\tilde{n}1} \oplus \mathcal{H}_{\tilde{n}2} \oplus (\mathcal{H}_{\tilde{n}1} \otimes \mathcal{H}_{\tilde{n}2}) \\ \mathcal{H}_c &\equiv \mathcal{H}_{c1} \oplus \mathcal{H}_{c2} \oplus (\mathcal{H}_{c1} \otimes \mathcal{H}_{\tilde{n}2}) \oplus (\mathcal{H}_{\tilde{n}1} \otimes \mathcal{H}_{c2}) \oplus (\mathcal{H}_{c1} \otimes \mathcal{H}_{c2}) \end{aligned} \tag{2}$$

are the null and contrast spaces of $\mathcal{H}$. The RK has the form $\rho_X = \rho_n + \rho_c$, where

$$\begin{aligned} \rho_n &\equiv 1 + \rho_{\tilde{n}1} + \rho_{\tilde{n}2} + \rho_{\tilde{n}1}\rho_{\tilde{n}2} \\ \rho_c &\equiv \theta_1 \rho_{c1} + \theta_2 \rho_{c2} + \theta_3\, \rho_{c1}\rho_{\tilde{n}2} + \theta_4\, \rho_{\tilde{n}1}\rho_{c2} + \theta_5\, \rho_{c1}\rho_{c2} \end{aligned} \tag{3}$$

are the RKs of $\mathcal{H}_n$ and $\mathcal{H}_c$, respectively, and $\theta = (\theta_1, \ldots, \theta_5)$ are the additional smoothing parameters. Note that there are $s = 5$ unique $\theta_k$ parameters for only $p = 2$ predictors; this allows for flexible smoothing, but creates a difficult multivariate optimization problem.
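As a sketch of Equation (3), assuming the hypothetical cubic spline helpers defined above, the conventional RKs for $p = 2$ can be assembled as follows.

```r
## Conventional null and contrast RKs for p = 2 cubic marginals (Equation (3)),
## reusing the hypothetical rk.null() and rk.contr() helpers above.
## x1, x2 are data values (length n); z1, z2 are knot values (length q);
## theta = (theta_1, ..., theta_5); products below are elementwise.
rho.conventional <- function(x1, x2, z1, z2, theta) {
  rn1 <- rk.null(x1, z1);  rn2 <- rk.null(x2, z2)
  rc1 <- rk.contr(x1, z1); rc2 <- rk.contr(x2, z2)
  rho.n <- 1 + rn1 + rn2 + rn1 * rn2
  rho.c <- theta[1] * rc1 + theta[2] * rc2 + theta[3] * rc1 * rn2 +
           theta[4] * rn1 * rc2 + theta[5] * rc1 * rc2
  list(null = rho.n, contrast = rho.c)
}
```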

2.2 Proposed SSANOVA Reparameterization

To improve the efficiency of the SSANOVA framework, we define the marginal RKs as

$$\rho_{X_j} = \rho_{nj} + \gamma_j\, \rho_{cj} \tag{4}$$

where $\rho_{nj} = \rho_{0j} + \rho_{\tilde{n}j}$ is the RK of the $j$-th predictor's null space, $\rho_{cj}$ is the RK of the $j$-th predictor's contrast space, and $\gamma_j \in (0, \infty)$ is the $j$-th predictor's smoothing parameter. Note that this definition of the marginal RKs corresponds to the marginal inner products $\langle \cdot, \cdot \rangle_{X_j} = \langle \cdot, \cdot \rangle_{nj} + \gamma_j^{-1} \langle \cdot, \cdot \rangle_{cj}$, which implies that $\gamma_j$ rescales the marginal penalty of the $j$-th predictor. Also, note that this reparameterization imposes a structure on the smoothing parameters in the tensor product RKHS, which can result in substantial computational savings.
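A minimal sketch of Equation (4), again under the assumptions of the hypothetical helpers above: $\gamma_j$ rescales only the contrast piece of the marginal RK.

```r
## Reparameterized marginal RK from Equation (4) for a cubic marginal,
## reusing the hypothetical rk.null() and rk.contr() helpers above.
rk.marginal.g <- function(x, z, gamma) {
  (1 + rk.null(x, z)) + gamma * rk.contr(x, z)   # rho_nj + gamma_j * rho_cj
}
```

Multiplying `rk.marginal.g(x1, z1, g1)` by `rk.marginal.g(x2, z2, g2)` elementwise yields $\rho_n$ from Equation (3) plus the reparameterized contrast RK given next in Equation (5).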

For example, with $p = 2$ predictors, the efficient SSANOVA reparameterization uses

$$\rho_c = \gamma_1\, \rho_{c1}\rho_{n2} + \gamma_2\, \rho_{n1}\rho_{c2} + \gamma_1\gamma_2\, \rho_{c1}\rho_{c2} \tag{5}$$

as the contrast space RK. In contrast to the conventional parameterization (see Equation (3)), the reparameterized RK in Equation (5) only uses two unique smoothing parameters. Thus, the reparameterization requires a bivariate optimization, whereas the traditional parameterization involves a five-variable optimization problem. The efficiency of the reparameterization is even more pronounced in higher dimensions: with $p = 3$ predictors, we have a trivariate optimization problem using the reparameterization, whereas the traditional parameterization would require the optimization of a function with respect to 19 $\theta_k$ parameters.
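These counts follow from the marginal decompositions above: each of the $3^p$ tensor product subspaces is a product of one marginal subspace per predictor, and the $2^p$ products built entirely from null space pieces ($\mathcal{H}_{0j}$ or $\mathcal{H}_{\tilde{n}j}$) require no smoothing parameter, leaving

$$s = 3^p - 2^p, \qquad \text{so } s = 9 - 4 = 5 \text{ for } p = 2 \quad \text{and} \quad s = 27 - 8 = 19 \text{ for } p = 3.$$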

2.3 Properties of Proposed Reparameterization

First, note that for $p = 1$ predictor, the efficient reparameterization is identical to the conventional parameterization because there is only one $\theta$ (or $\gamma$) parameter, which can be absorbed into the $c$ coefficients. Second, note that for $p > 1$ predictors, the efficient reparameterization is identical to the conventional parameterization whenever the SSANOVA model contains strictly additive effects of the predictors; note that this differs from other SSANOVA reparameterizations (e.g., COSSO; Lin & Zhang, 2006). So, the efficient and conventional parameterizations only differ with respect to their treatment of two-way (and higher-way) interactions between predictors. When one or more interaction effects are present, the efficient reparameterization will require fewer unique smoothing parameters than the conventional parameterization.

To better understand the constraints of the efficient reparameterization, it is helpful to consider the case of $p = 2$ predictors. Comparing the efficient reparameterization in Equation (5) to the conventional parameterization in Equation (3), note that the reparameterization assumes that $\theta_1 = \theta_3$, $\theta_2 = \theta_4$, and $\theta_5 = \theta_1\theta_2$ when defining the contrast space RK. Although seemingly limited, this reparameterization is actually rather flexible. For example, setting $\gamma_1, \gamma_2 \gg 1$ allows the nonparametric interaction space ($\mathcal{H}_{c1} \otimes \mathcal{H}_{c2}$) to dominate the contrast RK; in contrast, setting $\gamma_1, \gamma_2 \ll 1$ allows the nonparametric main effect spaces ($\mathcal{H}_{c1}$ and $\mathcal{H}_{c2}$) and parametric-nonparametric interaction effect spaces ($\mathcal{H}_{c1} \otimes \mathcal{H}_{\tilde{n}2}$ and $\mathcal{H}_{\tilde{n}1} \otimes \mathcal{H}_{c2}$) to dominate the contrast RK. The relative influence of each predictor on the contrast RK can be controlled by adjusting the ratio $\gamma_1/\gamma_2$; for example, setting $\gamma_1 > \gamma_2$ allows the first predictor more influence on the contrast RK (assuming the same spline type for each predictor).
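The stated constraints follow from expanding Equation (5) with $\rho_{nj} = 1 + \rho_{\tilde{n}j}$:

$$\rho_c = \gamma_1 \rho_{c1} + \gamma_2 \rho_{c2} + \gamma_1\, \rho_{c1}\rho_{\tilde{n}2} + \gamma_2\, \rho_{\tilde{n}1}\rho_{c2} + \gamma_1\gamma_2\, \rho_{c1}\rho_{c2},$$

and matching terms against Equation (3) gives $\theta_1 = \theta_3 = \gamma_1$, $\theta_2 = \theta_4 = \gamma_2$, and $\theta_5 = \gamma_1\gamma_2 = \theta_1\theta_2$.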


The main limitation of the efficient reparameterization is its lack of flexibility (compared to the conventional parameterization) when the SSANOVA model is misspecified. Continuing with the case of $p = 2$ predictors, suppose that the true data-generating model is of the form $\eta(x) = \eta_1(x_1) + \eta_2(x_2)$, i.e., there is no interaction effect. In theory, using the conventional SSANOVA parameterization it would be possible to obtain the correct (additive) model by estimating $\hat{\theta}_k = 0$ for $k \in \{3, 4, 5\}$. In contrast, using the efficient reparameterization the solution would contain a small bias, given that the parametric-nonparametric interaction effect spaces cannot be fully separated from the nonparametric main effect spaces. However, with large $n$ and $\gamma_j$ parameters selected using the GCV score (see Section 3), we have found that the bias introduced by the efficient reparameterization is negligible in most cases (see Section 4.2 & Helwig, 2013).


3 Fast and Stable SSANOVA Algorithm

Overview. The scalable algorithm proposed in this section is unrelated to the efficient reparameterization given in the previous section. Throughout this section, we write the algorithm in terms of the conventional parameterization with $\theta_k$ parameters. However, remembering that the efficient reparameterization imposes a certain structure on the $\theta_k$ parameters, this scalable algorithm could be easily applied to the efficient reparameterization using the $\gamma_j$ parameters.

3.1 SSANOVA Computation

Given a set of smoothing parameters $\lambda \equiv (\lambda/\theta_1, \ldots, \lambda/\theta_s)$ and a set of selected knots $\{\check{x}_t\}_{t=1}^{q}$, it is well-known that the $\eta$ minimizing Equation (1) can be approximated as

$$\eta(x) = \sum_{v=1}^{u} d_v \phi_v(x) + \sum_{t=1}^{q} c_t\, \rho_c(x, \check{x}_t)$$

where $\{\phi_v\}_{v=1}^{u}$ are basis functions spanning $\mathcal{H}_n$, $\rho_c$ denotes the RK of $\mathcal{H}_c$, and $d = \{d_v\}_{u \times 1}$ and $c = \{c_t\}_{q \times 1}$ are the unknown function coefficient vectors (see Kim & Gu, 2004; Gu & Wahba, 1991). Using this representation, Equation (1) can be approximated as

$$(1/n)\, \|y - Kd - Jc\|^2 + \lambda\, c' Q c \tag{6}$$

where $\|\cdot\|^2$ denotes the squared Frobenius norm, $y \equiv \{y_i\}_{n \times 1}$, $K \equiv \{\phi_v(x_i)\}_{n \times u}$ for $i \in \{1, \ldots, n\}$ and $v \in \{1, \ldots, u\}$, $J = \sum_{k=1}^{s} \theta_k J_k$ where $J_k \equiv \{\rho_k(x_i, \check{x}_t)\}_{n \times q}$ for $i \in \{1, \ldots, n\}$ and $t \in \{1, \ldots, q\}$, and $Q = \sum_{k=1}^{s} \theta_k Q_k$ where $Q_k \equiv \{\rho_k(\check{x}_t, \check{x}_w)\}_{q \times q}$ for $t, w \in \{1, \ldots, q\}$.
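For illustration, here is a sketch of how $K$, $J$, and $Q$ could be formed for the $p = 2$ example of Equation (3), reusing the hypothetical RK helpers from Section 2; `build.matrices()` is a hypothetical name, not from the paper.

```r
## Sketch: forming K, J, and Q for the p = 2 example of Equation (3),
## reusing the hypothetical k1(), rk.null(), and rk.contr() helpers.
## Knots are a random subsample of the data; theta weighs the 5 contrast RKs.
build.matrices <- function(y, x1, x2, q = 50, theta = rep(1, 5)) {
  idx <- sample(length(y), q)
  z1 <- x1[idx]; z2 <- x2[idx]
  ## Null space basis (u = 4): constant, two linear terms, linear-linear product
  K <- cbind(1, k1(x1), k1(x2), k1(x1) * k1(x2))
  ## The five contrast RKs, evaluated at (a1, a2) against knots (b1, b2):
  rk5 <- function(a1, a2, b1, b2) {
    list(rk.contr(a1, b1), rk.contr(a2, b2),
         rk.contr(a1, b1) * rk.null(a2, b2),
         rk.null(a1, b1) * rk.contr(a2, b2),
         rk.contr(a1, b1) * rk.contr(a2, b2))
  }
  J <- Reduce(`+`, Map(`*`, theta, rk5(x1, x2, z1, z2)))  # n x q
  Q <- Reduce(`+`, Map(`*`, theta, rk5(z1, z2, z1, z2)))  # q x q
  list(K = K, J = J, Q = Q, knots = idx)
}
```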

Given a choice of $\lambda$, the optimal function coefficients are given by

$$\begin{pmatrix} \hat{d} \\ \hat{c} \end{pmatrix} = \begin{pmatrix} K'K & K'J \\ J'K & J'J + n\lambda Q \end{pmatrix}^{\dagger} \begin{pmatrix} K' \\ J' \end{pmatrix} y \tag{7}$$

where $(\cdot)^{\dagger}$ denotes the Moore-Penrose pseudoinverse of the input matrix. The fitted values are given by $\hat{y} = K\hat{d} + J\hat{c} = S_{\lambda}\, y$, where

$$S_{\lambda} = \begin{pmatrix} K & J \end{pmatrix} \begin{pmatrix} K'K & K'J \\ J'K & J'J + n\lambda Q \end{pmatrix}^{\dagger} \begin{pmatrix} K' \\ J' \end{pmatrix} \tag{8}$$

is the smoothing matrix, which depends on the chosen smoothing parameters in $\lambda$. A popular criterion for estimating smoothing parameters in penalized regression models is the generalized cross-validation (GCV) score of Craven and Wahba (1979), which is given by

$$\mathrm{GCV}(\lambda) = \frac{n\, \|(I_n - S_{\lambda})\, y\|^2}{[\,n - \mathrm{tr}(S_{\lambda})\,]^2}. \tag{9}$$

The estimates $\hat{\lambda}$ and $\hat{\theta}$ that minimize the GCV score have desirable properties (see Craven & Wahba, 1979; Gu, 2013b; Gu & Wahba, 1991; Li, 1986), so our algorithm focuses on a fast GCV score evaluation for given smoothing parameters.
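The following naive reference implementation of Equations (7)-(9) makes the GCV evaluation concrete. It deliberately mirrors the direct approach, costing $O(nq^2)$ flops per evaluation (like the algorithm critiqued below), and is not the scalable algorithm proposed in this paper; `gcv.ssanova()` is a hypothetical name, and `MASS::ginv()` stands in for a pivoted Cholesky pseudoinverse.

```r
## Naive reference implementation of Equations (7)-(9), given K (n x u),
## J (n x q), and Q (q x q) already formed for a fixed theta.
library(MASS)  # for ginv()

gcv.ssanova <- function(y, K, J, Q, lambda) {
  n <- length(y)
  X <- cbind(K, J)                      # n x (u + q) design matrix
  cidx <- (ncol(K) + 1):ncol(X)         # columns holding the c coefficients
  P <- matrix(0, ncol(X), ncol(X))      # penalty is nonzero only for c block
  P[cidx, cidx] <- n * lambda * Q
  M <- ginv(crossprod(X) + P)           # pseudoinverse from Equation (7)
  dc <- M %*% crossprod(X, y)           # stacked coefficients (d', c')'
  fit <- X %*% dc                       # yhat = S_lambda y
  trS <- sum(diag(M %*% crossprod(X)))  # tr(S_lambda), by the cyclic property
  list(d = dc[1:ncol(K)], c = dc[cidx], fitted = fit,
       gcv = n * sum((y - fit)^2) / (n - trS)^2)
}
```

Combined with the `build.matrices()` sketch above, a GCV-based search over $\lambda$ (and $\theta$) is then a sequence of `gcv.ssanova()` calls, each repeating the costly cross-product work.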

Kim and Gu's (2004) algorithm seeks to find the smoothing parameters that minimize the GCV score in Equation (9). For each choice of smoothing parameters, Kim and Gu's algorithm forms the inner (cross-product) portion of $S_{\lambda}$, and uses a pivoted Cholesky decomposition to compute the pseudoinverse in Equation (8); given the needed inverse calculation, the fitted values can be easily obtained, so the GCV score can be easily evaluated. However, obtaining the inner (cross-product) portion of the smoothing matrix requires $O(nq^2)$ flops, which can be quite costly for large $n$. So, because Kim and Gu's algorithm requires iterative work that depends on the (possibly quite large) sample size $n$, the algorithm is not scalable for large samples.
