Exercise 1: Understanding Regression Output



Instructions: read this document and complete the exercises. You are NOT to work together, except perhaps if you have non-econometric troubles with Excel. You can e-mail me any questions you may have. It is due Tuesday, in class.

Exercise 1: Understanding Regression Output

For this exercise, you will need to use Excel.

1. Enter the data for Exercise 3.2 on page 61(which continues in Exercise 4.6 on page 86) into Excel and run a regression on these 5 observations. When you run the regression (using the Data Analysis tool in the Tools menu), make sure to check on the residual box. This will force Excel to produce a list of the predicted values for Y and the residual values. You should get b1 = 1.0 and b2 = 1.0 (you have already done this regression “by hand”). Print out this spreadsheet (but first use column “autofit” so that all results are visible.). You will use it to answer the next few questions.

2. Examine the questions and answers in Exercise 4.6, parts e), f) and g). You have already answered these questions with a “regression by hand”. You need to learn how Excel presents these results when it runs a regression.

i) Part e) tells you to calculate the residuals [pic] . Circle these 5 values on the Excel spreadsheet.

ii) Part f) tells you to calculate [pic]. The formula for this is[pic]. Circle on the spreadsheet where Excel has calculated [pic], [pic], and [pic]and also circle the names that Excel uses for these values so that you can associate the terminology with the formulas.

iii) Part g) tells you to calculate [pic]. Excel does not printout this estimated variance for you, but it does printout its square root se(b2). Circle it on your Excel spreadsheet. What does Excel call it?

3. Go back to the raw data on X and Y in your spreadsheet. Calculate the sample correlation between the X and Y data. Sample correlation is simply the ratio of sample covariance to the product of the standard deviation on X and standard deviation on Y.

See slides 2.30 and 2.39 from Chapter 2. This is:

[pic]

It is very simple to calculate this value in Excel. Assuming your data on X appear in cells A2:B6 and your data on Y appear in cells B2:B6, the formula is: =CORREL(A2:A6,B2:B6).

Calculate this value and then find it on the Excel printout of regression results. Also square this correlation (r2xy) and find it on the Excel printout. Also, identify the names that Excel uses for these values.

Hand in the Excel regression output with your answers marked on it.

Exercise 2: Understanding Regression Output Part 2

Use your regression output from Exercise 3.17 in which you estimated the regression

[pic]

where Ht is housing starts for single family homes measured in thousands of units (houses) and Mt was the 30-year mortgage rate in %. Note: if you did not do this assignment or think that your data were not correct, I have posted the data on the web site in an Excel spreadsheet that you can download and use to run the regression. You should get b1 = 1571.17 and b2 = -48.09

Print out your Excel output from running the regression using Regression in the Data Analysis Tools. However, I suggest that you do not check the residual box when running the regression, since there were 381 observations and Excel will produce a column of 381 residuals and predicted values. This would be too much to printout. On the regression output, identify the following values:

[pic]

[pic]

[pic]

[pic]

rxy and r2xy

Hand in your Excel print out of the regression results with these values identified. Note that you are NOT calculating these quantities; you are merely identifying them on the printout. Your work in exercise 1 will help you identify the values.

Exercise 3: The T-distribution.

You should all have studied and used the t-distribution in your previous statistics course. The actual theory behind the t-distribution is fairly advanced. Your textbook in Chapter 5 presents some of this material; it discusses how a t random variable is the ratio of a standard normal random variable and a Chi-squared random variable. For our purposes, we don’t need to go into these details. Instead, I want you to understand what has a t-distribution and how to use the t-table (it works quite differently from the Z-table).

First, remember what a standard normal random variable is: Take a normally distributed random variable, minus out its mean and divide by its standard deviation, this gives you a standard normal random variable that has a mean of 0 and a standard deviation of 1.0.

[pic] so that [pic]

However, if ( is unknown and must be estimated using the sample standard deviation, s, where [pic], then we are adding an additional source of uncertainty to the new standardized value. The sample standard deviation is also a random variable. Standardizing a normal random variable by dividing by the sample standard deviation results in a new random variable that has t-distribution. This distribution is pictured on page 94 of your textbook. It is bell-shaped like the standard normal, but it has “wider” tails to allow for the additional uncertainty that is added when dividing through by the sample standard deviation.

Similar to the normal distribution, we use a table to get probabilities. However, the T-table works very differently. It appears on the inside of the front cover of your text book.

Each t value has some measure of degrees of freedom. For our simple regression models it will be T-2 where T is the sample size. Go to the table and look up DF (“degrees of freedom”) of 1 (it could be any value, I chose 10 just for demonstration). You will find this in the first column of the table. Go down to the value 10. In this row, you are given t-values (not probabilities). The probabilities appear at the top of each column: there are only 5 of them: 0.10, 0.05, 0.025, 0.01 and 0.005. In the row for DF = 10 you find

|DF |( = 0.10 |( = 0.05 |( = 0.025 |( = 0.01 |( = 0.005 |

|… |… |… |… |… |… |

|10 |1.372 |1.812 |2.228 |2.764 |3.169 |

How to interpret these values: The probability that the random variable t is greater than or equal to 1.372 is 0.10 (10%): P(t ( 1.372) = 0.10.

The top of the table tells us that the table contains “Right-tail critical values for the t-distribution”. Because the t random variable is symmetric, we also know that

P(t ( -1.372) = 0.10.

What happens if we want to calculate the probability that the t random variable with 10 degrees of freedom takes on any value, such as P(t ( 2.00)? Well, this table will only allow us to approximate it:

The table tells us that P(t ( 1.812) = 0.05 and P(t ( 2.228) = 0.025 (verify this by examining the table).

Therefore, we can say that the probability lies between 2.5% and 5%. This is: 0.025 ( P(t ( 2.00) ( 0.05.

Alternatively, we could use Excel: The formula in Excel is =TDIST( ) in the brackets you need to enter 3 values: the first is the value for t. In our example, this would be 2.00. The second value is the degrees of freedom, and the third value is either a 1 or a 2, for one or two tails. To answer the probability: P(t ( 2.00) with 10 degrees of freedom, we would enter the formula: =TDIST(2.00,10,1). Hit return and Excel returns the value 0.036694, which lies between 0.025 and 0.05 as we indicated above. If you wanted to find the probability that t is greater than 2.00 OR less than –2.00, this would be the sum of the two tails. In excel, the formula would be =TDIST(2.00,10,2) and Excel would return the value 0.073388. (Verify these results using Excel before proceeding).

Answer these questions:

a) Approximate P(t (. 2.00) for 38 degrees of freedom using the table

b) Find P(t (. 2.00) for 38 degrees of freedom using the Excel

c) Find tc where P(t ( tc) = 0.05 for 38 degrees of freedom using the T-table

d) Find tc where P(t ( tc) or P(t ( tc) = 0.05 for 38 degrees of freedom. For this one, the following diagram will help:

-----------------------

1.372

this area under the curve to the right of 1.372 is equal to 0.10

t values

0

0

tc

-tc

you want the sum of the two tail areas to be 0.05, so you need to know what tc value leaves 0.025 in each tail?

0.025

0.025

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download