Department of Statistics, Yale University

STAT242b Theory of Statistics

Suggested Solutions to Homework 9

Compiled by Marco Pistagnesi

Problem 1

Our goal is to find the value $\hat\mu$ of $\mu$ that minimizes the error $\sum_{i=1}^{n}(Y_i-\mu)^2$.

$$S(\mu) = \sum_{i=1}^{n}(Y_i-\mu)^2 = \sum_{i=1}^{n}Y_i^2 - 2\mu\sum_{i=1}^{n}Y_i + n\mu^2.$$

To minimize, given that the second derivative is always positive (check!), we take the derivative with respect to $\mu$ and set it to zero:

$$\frac{dS}{d\mu} = -2\sum_{i=1}^{n}(Y_i-\mu) = 0 \quad\Longrightarrow\quad \hat\mu = \frac{1}{n}\sum_{i=1}^{n}Y_i = \bar{Y}.$$

Thus we see that the sample mean $\bar{Y}$ is the least squares estimate of $\mu$.
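The convexity check mentioned above is immediate, since the second derivative of $S(\mu)$ does not depend on the data:

$$\frac{d^2 S}{d\mu^2} = 2n > 0.$$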

Problem 2

We take from class that the variance of the slope estimate $\hat\beta$ is:

$$\mathrm{Var}(\hat\beta) = \frac{\sigma^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}. \qquad (1)$$

We want to pick the $x_i$ to minimize the expression in (1). We can do so by maximizing $\sum_{i=1}^{n}(x_i-\bar{x})^2$. This is precisely maximizing the (empirical) variance of the $x_i$, which must lie in the interval $[-1,1]$. To maximize the variance of numbers placed within an interval, the mean should sit at the center of the interval (thus $\bar{x}=0$) and the $x_i$ should be split equally between the two endpoints of the closed interval.

In other words, we want to place half of the $x_i$ at $x_i = 1$ and the other half at $x_i = -1$.
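To see why this placement is optimal, note the following bound, valid for any $x_i \in [-1,1]$:

$$\sum_{i=1}^{n}(x_i-\bar{x})^2 = \sum_{i=1}^{n}x_i^2 - n\bar{x}^2 \le \sum_{i=1}^{n}x_i^2 \le n,$$

and the design with half of the points at $+1$ and half at $-1$ attains the bound $n$ (every $x_i^2 = 1$ and $\bar{x}=0$), so no other placement can give a smaller variance for $\hat\beta$.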

Problem 3

a) We use the prediction method (seen in class) taking the midterm grade to be the predictor:

[pic]

b) Again, we use the prediction method, this time taking the final grade to be the predictor:

[pic]
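The numerical answers depend on the class data set, which is not reproduced here, but the prediction method referred to above is presumably the regression-line prediction: with predictor $x$ and response $y$, sample means $\bar{x},\bar{y}$, sample standard deviations $s_x,s_y$, and sample correlation $r$,

$$\hat{y} = \bar{y} + r\,\frac{s_y}{s_x}\,(x - \bar{x}).$$

In part (a) the midterm grade plays the role of $x$ and the final grade the role of $y$; in part (b) the roles are swapped.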

Problem 4

We want to minimize

$$S(\beta) = \sum_{i=1}^{n}\left(Y_i - \beta x_i\right)^2$$

with respect to $\beta$ (the model here is the regression through the origin $Y_i = \beta x_i + \epsilon_i$), and thus take a $\beta$-derivative and set it equal to zero (again, check convexity!):

$$\frac{dS}{d\beta} = -2\sum_{i=1}^{n}x_i\left(Y_i - \beta x_i\right) = 0 \quad\Longrightarrow\quad \hat\beta = \frac{\sum_{i=1}^{n}x_i Y_i}{\sum_{i=1}^{n}x_i^2}.$$

For the variance, we determine it as follows:

$$\mathrm{Var}(\hat\beta) = \mathrm{Var}\!\left(\frac{\sum_{i=1}^{n}x_i Y_i}{\sum_{i=1}^{n}x_i^2}\right) \overset{(i)}{=} \frac{\sum_{i=1}^{n}x_i^2\,\mathrm{Var}(Y_i)}{\left(\sum_{i=1}^{n}x_i^2\right)^2} \overset{(ii)}{=} \frac{\sigma^2}{\sum_{i=1}^{n}x_i^2} = \frac{\sigma^2}{n\cdot\frac{1}{n}\sum_{i=1}^{n}x_i^2} \qquad (1)$$

where in (i) we used the fact that the x's are fixed (non-random) and that the variance of a sum of independent r.v.'s is the sum of the variances, and in (ii) we used again that the x's are fixed and that the variance is the same for all the errors. Note that at the end we can write the variance in two different ways.
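Note also, for the consistency discussion below, that $\hat\beta$ is unbiased under these same hypotheses (fixed $x_i$ and $E[\epsilon_i]=0$, so that $E[Y_i] = \beta x_i$):

$$E[\hat\beta] = \frac{\sum_{i=1}^{n}x_i\,E[Y_i]}{\sum_{i=1}^{n}x_i^2} = \frac{\beta\sum_{i=1}^{n}x_i^2}{\sum_{i=1}^{n}x_i^2} = \beta.$$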

Now for the consistency part, there has been confusion, and so let us spend some time on that.

Our hypotheses for the model are that the $\epsilon_i$ are independent (but not necessarily identically distributed!) with mean zero and variance $\sigma^2$. This allows us to derive (1) as the variance of $\hat\beta$ (without such hypotheses, we would not be able to state (i) and (ii)). Now for the consistency we can appeal to a result (result[1]) that says that if an estimator is unbiased and its variance goes to zero, then it is consistent. Since our $\hat\beta$ is indeed unbiased, a sufficient condition for it to be consistent is that its variance goes to zero. In other terms, from the expression above, we require $\sum_{i=1}^{n}x_i^2 \to \infty$ as $n \to \infty$. A less general condition (obtained by looking at the other expression we had above for the variance) is that $\frac{1}{n}\sum_{i=1}^{n}x_i^2$ (which depends upon $n$) settles to a nonzero constant as $n$ increases. In fact, if this happens the whole variance is driven to zero by the factor $n$ in the denominator.

To make the point, you had to discuss one of these two extra conditions on the sum of the $x_i^2$. Stating just the hypotheses on the errors is crucial (it is what gives that form of the variance), but not enough.
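As an illustration with made-up designs: if, say, $x_i = 1$ for all $i$, then $\sum_{i=1}^{n}x_i^2 = n \to \infty$ and $\hat\beta$ is consistent; if instead $x_i = 1/i$, then $\sum_{i=1}^{n}x_i^2 \to \pi^2/6 < \infty$, the variance does not go to zero, and the sufficient condition fails.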

A couple more things to note. About the errors, normality need not be assumed; it is never appealed to in (1). Lastly, some people, instead of appealing to result[1], tried to prove consistency from the definition of convergence in probability. Fine, but do it right! A neat (I think) way of doing this is by means of Chebyshev's inequality (recall from 241). Then, for every $\delta > 0$, we can write:

$$P\left(\left|\hat\beta - \beta\right| > \delta\right) \le \frac{\mathrm{Var}(\hat\beta)}{\delta^2} = \frac{\sigma^2}{\delta^2\sum_{i=1}^{n}x_i^2},$$

and we note that to make the last quantity go to zero as $n$ goes to infinity we need to impose the same restriction on the sum of the $x_i^2$ as above, so that (obviously) the two approaches are equivalent.

Problem 5

I am not going to do the actual regressions and graphs, as (almost) everybody did them right. I will instead spend some words on what many did poorly, i.e. the comments and analysis, the most important part! The regression with the log transformation seems more accurate. Why? What are the things to look at to come to this conclusion? $R^2$ (with caution), the residual standard error for the regression as a whole (which gives an idea of how much is left over from our model), the significance of the estimated coefficients (given by the t tests), and finally graphical inspection of the residuals: if they do not look white but rather show some pattern, the regression is poor (why? go back to the theory).

All of these indicators are better for the transformed model. The second step, as far as the (required) comparison is concerned, is to ask why the second model is better. If you look at the scatter plot of the data, it is clear that the relation between the two variables is not linear, but curve-shaped. The log transformation accounts for an exponential component in the data (regressing $\log y$ on $x$ is equivalent to the model $y = e^{\beta_0 + \beta_1 x + \epsilon}$) and hence is a more suitable model for describing a nonlinear relation in the data.
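Since the homework data set is not reproduced here, the sketch below uses synthetic data with an exponential relationship; it is only meant to illustrate, in Python with statsmodels (not necessarily the software used in class), how one can compute the indicators listed above for the raw and log-transformed models.

```python
import numpy as np
import statsmodels.api as sm

# Synthetic stand-in for the homework data: y grows exponentially in x.
rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 60)
y = np.exp(0.4 * x + rng.normal(scale=0.3, size=x.size))

X = sm.add_constant(x)                    # design matrix with intercept

raw_fit = sm.OLS(y, X).fit()              # model 1: y = a + b*x + error
log_fit = sm.OLS(np.log(y), X).fit()      # model 2: log(y) = a + b*x + error

for name, fit in [("raw y", raw_fit), ("log y", log_fit)]:
    print(f"{name:6s}  R^2 = {fit.rsquared:.3f}  "
          f"resid. std. err. = {np.sqrt(fit.mse_resid):.3f}  "
          f"t-values = {np.round(fit.tvalues, 2)}")

# Note: the R^2 values are not directly comparable across the two fits
# because the responses (y vs. log y) are on different scales -- hence
# "with caution" above.  Plotting fit.resid against x (or against the
# fitted values) is the graphical check for leftover patterns.
```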
