ECON 761: F tests and t tests with Dummy Variables

L. Magee

November, 2007

------------------------------------------------------------------------------

The main part of this handout contains output from a Stata program with commentary. The do and log files are given at the end.

Loading, generating, and summarizing the data The data set Ftest_B.raw contains 100 observations on the dependent variable y and a "categorical" variable x. x equals 1, 2, or 3, depending on which of three groups the observation belongs to.

. infile y x using "C:\Documents and Settings\courses\761 and 762\f06\handouts\Ftest_B"
(100 observations read)

. summarize y x

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+-----------------------------------------------------
           y |       100    10.19334    5.452461   -3.00243   23.07246
           x |       100        1.65    .6571287          1          3

The variable x should not be used directly in the regressions. That would force the "effect" of being in the x=2 group to be exactly halfway between those of the x=1 and x=3 groups, even though these x values are just labels. Instead, we create group dummy variables, also known as indicator variables. One way to do this in Stata is with the xi command. (xi is especially convenient when there are many groups.) In this example, xi creates two dummy variables, _Ix_2 and _Ix_3. _Ix_2, for example, equals 1 when the original x value equals 2, and equals 0 otherwise (when x equals 1 or 3).
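The same construction can be sketched outside Stata. The snippet below is a minimal illustration in Python with numpy, using a hypothetical x with the same group sizes as the handout's data (45, 45, and 10 observations); it is not the handout's actual data file.

```python
import numpy as np

# Hypothetical categorical variable with the handout's group sizes:
# 45 observations in group 1, 45 in group 2, 10 in group 3.
x = np.array([1] * 45 + [2] * 45 + [3] * 10)

# Dummy (indicator) variables, analogous to Stata's xi-generated _Ix_2, _Ix_3.
Ix_2 = (x == 2).astype(int)  # 1 when x == 2, 0 otherwise
Ix_3 = (x == 3).astype(int)  # 1 when x == 3, 0 otherwise
Ix_1 = 1 - Ix_2 - Ix_3       # same as the handout's generate command

# The mean of a dummy equals the proportion of observations in that group.
print(Ix_1.mean(), Ix_2.mean(), Ix_3.mean())
```

This also demonstrates the point made below about summary statistics: the means of the three dummies are the group proportions 0.45, 0.45, and 0.10.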

. xi i.x
i.x               _Ix_1-3             (naturally coded; _Ix_1 omitted)

. g _Ix_1 = 1 - _Ix_2 - _Ix_3

The above xi command did not create an _Ix_1 variable, so it was generated separately with a generate command (g for short).

When looking over the summary statistics from summarize, dummy variables can be spotted as variables where the minimum and maximum values equal 0 and 1. The mean of a dummy variable equals the proportion of the observations that have that attribute. Here, 45% of these observations are in group 1, 45% are in group 2, and 10% are in group 3.


. summarize _Ix_1 _Ix_2 _Ix_3

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+-----------------------------------------------------
       _Ix_1 |       100         .45          .5          0          1
       _Ix_2 |       100         .45          .5          0          1
       _Ix_3 |       100          .1    .3015113          0          1

Regression 1 In this first regression _Ix_1 is left out; in other words, the "reference group" is group 1. The coefficient on the dummy variable _Ix_2 represents the difference between the means of groups 2 and 1, and the coefficient on _Ix_3 represents the difference between the means of groups 3 and 1. The t-ratio on _Ix_2 of 3.14, with a P-value of 0.002, indicates that the means of groups 1 and 2 are statistically significantly different at the 1% level. The P-value on _Ix_3 is 0.227, indicating that the difference between the means of groups 1 and 3 is not statistically significant even at the 10% level.

. reg y _Ix_2 _Ix_3

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  2,    97) =    4.96
       Model |  272.883773     2  136.441887           Prob > F      =  0.0089
    Residual |  2670.32014    97  27.5290736           R-squared     =  0.0927
-------------+------------------------------           Adj R-squared =  0.0740
       Total |  2943.20391    99  29.7293325           Root MSE      =  5.2468

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _Ix_2 |   3.468338   1.106126     3.14   0.002     1.272984    5.663692
       _Ix_3 |   2.231158   1.834302     1.22   0.227    -1.409424     5.87174
       _cons |   8.409467   .7821491    10.75   0.000     6.857118    9.961817
------------------------------------------------------------------------------
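The mechanics of a dummy-variable regression can be reproduced with ordinary least squares in any environment. The sketch below uses Python with numpy on synthetic data (an assumption; the handout's raw data file is not reproduced here) to show that with only group dummies as regressors, the intercept equals the reference group's sample mean and each dummy coefficient equals that group's mean minus the reference group's mean.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data with the handout's group sizes (45, 45, 10) and
# group means roughly like those implied by regression 1.
x = np.array([1] * 45 + [2] * 45 + [3] * 10)
y = np.where(x == 1, 8.4, np.where(x == 2, 11.9, 10.6)) + rng.normal(0, 5, 100)

# Design matrix: constant, _Ix_2, _Ix_3 (group 1 is the reference group).
X = np.column_stack([np.ones(100), (x == 2).astype(float), (x == 3).astype(float)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# OLS with a full set of group dummies just reproduces the group sample means:
print(np.isclose(beta[0], y[x == 1].mean()))                      # _cons = mean of group 1
print(np.isclose(beta[1], y[x == 2].mean() - y[x == 1].mean()))   # _Ix_2 = mean diff 2 vs 1
print(np.isclose(beta[2], y[x == 3].mean() - y[x == 1].mean()))   # _Ix_3 = mean diff 3 vs 1
```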

The test command below produces an F test of the joint hypothesis that the true coefficients of _Ix_2 and _Ix_3 both equal zero in the model that was just estimated. Taken together, the two restrictions imply that the means of groups 1, 2, and 3 are all equal, or that this characteristic "has no effect on (the mean of) y". The joint null hypothesis is rejected at the 1% level by the F test, since the P-value is smaller than 0.01.

It might seem predictable that this F test would reject, since we already rejected the null hypothesis that groups 1 and 2 have equal means using the t test on the _Ix_2 coefficient. But the conclusion of the F test of the joint null hypothesis is not always consistent with the conclusions of the t tests for the individual null hypotheses. If the joint null hypothesis is the main one of interest, then it is better to focus attention on the F test than on the individual t tests.
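The joint F statistic can be reconstructed by hand from the ANOVA table in regression 1. Under the null, the restricted model contains only an intercept, so its residual sum of squares equals the Total SS. A quick check, using the figures reported above:

```python
# F test of H0: both dummy coefficients are zero, built from regression 1's
# reported sums of squares. The restricted (intercept-only) model's residual
# sum of squares equals the Total SS.
ssr_r = 2943.20391    # Total SS (restricted residual SS)
ssr_u = 2670.32014    # Residual SS (unrestricted)
q = 2                 # number of restrictions
df = 97               # 100 observations minus 3 estimated coefficients

F = ((ssr_r - ssr_u) / q) / (ssr_u / df)
print(round(F, 2))  # 4.96, matching Stata's F( 2, 97)
```

The P-value of 0.0089 then comes from the upper tail of the F(2, 97) distribution.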

. test _Ix_2 _Ix_3

 ( 1)  _Ix_2 = 0.0
 ( 2)  _Ix_3 = 0.0

       F(  2,    97) =    4.96
            Prob > F =    0.0089

Regression 2 In this regression, group 2 is the reference group. The results for the coefficient on _Ix_1 are the same as the results for the coefficient on _Ix_2 in the first regression, with signs reversed where appropriate. This is because in each case the coefficient represents the difference between the means of groups 1 and 2.

The result for _Ix_3 is different from anything in the first regression. It compares groups 2 and 3, a comparison that did not appear in any of the tests shown with the first regression.
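The link between the two parameterizations is simple arithmetic on the group means. Using the coefficients reported in regression 1, one can derive what regression 2's coefficients must be:

```python
# Coefficients from regression 1 (group 1 is the reference group).
cons_1, b2_1, b3_1 = 8.409467, 3.468338, 2.231158

# Implied group means.
mean1 = cons_1          # reference group mean
mean2 = cons_1 + b2_1
mean3 = cons_1 + b3_1

# Regression 2 uses group 2 as the reference, so its intercept is mean2 and
# its dummy coefficients are deviations from mean2.
print(mean2)          # matches regression 2's _cons, 11.87781
print(mean1 - mean2)  # matches regression 2's _Ix_1 coefficient, -3.468338
print(mean3 - mean2)  # matches regression 2's _Ix_3 coefficient, -1.23718
```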

. reg y _Ix_1 _Ix_3

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  2,    97) =    4.96
       Model |  272.883773     2  136.441887           Prob > F      =  0.0089
    Residual |  2670.32014    97  27.5290736           R-squared     =  0.0927
-------------+------------------------------           Adj R-squared =  0.0740
       Total |  2943.20391    99  29.7293325           Root MSE      =  5.2468

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _Ix_1 |  -3.468338   1.106126    -3.14   0.002    -5.663692   -1.272984
       _Ix_3 |   -1.23718   1.834302    -0.67   0.502    -4.877762    2.403402
       _cons |   11.87781   .7821491    15.19   0.000     10.32546    13.43015
------------------------------------------------------------------------------

The following joint test gives exactly the same test statistic and conclusion as the F test shown after regression 1. Both test the same joint null hypothesis, namely that the three group means in the population are equal.


. test _Ix_1 _Ix_3

 ( 1)  _Ix_1 = 0.0
 ( 2)  _Ix_3 = 0.0

       F(  2,    97) =    4.96
            Prob > F =    0.0089

Regression 3 Now group 3 is the reference group. Recall that group 3 has fewer observations than groups 1 and 2. A common mistake when examining results like those of regression 3 is to conclude that the groups do not matter, because the t statistics on _Ix_1 and _Ix_2 are both insignificant. But the joint test (that the three group means are all equal) gives results identical to the joint tests from regressions 1 and 2, and suggests that the groups do matter.

The t tests in regression 3 are insignificant because each compares a group (1 or 2) to group 3, which has only a few observations. Since there is less information about the mean of group 3, the standard errors in these t tests are larger. Neither test compares group 1 to group 2, and that is the comparison giving the strongest indication that the groups are not all the same.
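The effect of group size on the standard errors can be made concrete. In a groups-only regression, the standard error of a difference in group means is s * sqrt(1/n_a + 1/n_b), where s is the Root MSE. Plugging in the handout's numbers:

```python
import math

# Root MSE and group sizes from the regression output above.
s = 5.2468
n1, n2, n3 = 45, 45, 10

# Standard error of the estimated difference in group means.
se_13 = s * math.sqrt(1 / n1 + 1 / n3)  # group 1 vs small group 3
se_12 = s * math.sqrt(1 / n1 + 1 / n2)  # group 1 vs group 2

print(round(se_13, 3))  # about 1.834, the Std. Err. on _Ix_3 in regression 1
print(round(se_12, 3))  # about 1.106, the Std. Err. on _Ix_2 in regression 1
```

Comparisons involving the 10-observation group 3 carry a standard error roughly 66% larger, which is why regression 3's t tests are weak.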

. reg y _Ix_1 _Ix_2

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  2,    97) =    4.96
       Model |  272.883773     2  136.441887           Prob > F      =  0.0089
    Residual |  2670.32014    97  27.5290736           R-squared     =  0.0927
-------------+------------------------------           Adj R-squared =  0.0740
       Total |  2943.20391    99  29.7293325           Root MSE      =  5.2468

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _Ix_1 |  -2.231158   1.834302    -1.22   0.227     -5.87174    1.409424
       _Ix_2 |    1.23718   1.834302     0.67   0.502    -2.403402    4.877762
       _cons |   10.64062   1.659189     6.41   0.000     7.347595    13.93366
------------------------------------------------------------------------------

. test _Ix_1 _Ix_2

 ( 1)  _Ix_1 = 0.0
 ( 2)  _Ix_2 = 0.0

       F(  2,    97) =    4.96
            Prob > F =    0.0089


The next pair of test commands further illustrates that the result of the joint test is unaffected by nonsingular linear transformations of the restrictions. The two restrictions tested below are individually different from the two restrictions tested above. But the pair of restrictions tested below implies the same two underlying coefficient restrictions as the pair tested above, so the results of the joint tests are the same. The accumulate option on the second test command tells Stata to test the second restriction jointly with the first one.
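This invariance is easy to verify directly with the Wald form of the F statistic. The sketch below uses a generic synthetic regression in Python with numpy (an assumption, not the handout's data): if R is the restriction matrix and A is any nonsingular matrix, testing R*beta = 0 and (A*R)*beta = 0 gives the same F statistic.

```python
import numpy as np

rng = np.random.default_rng(1)
# Generic synthetic regression: constant plus two slope coefficients.
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
s2 = resid @ resid / (n - 3)          # residual variance estimate
XtX_inv = np.linalg.inv(X.T @ X)

def wald_F(R):
    """Wald/F statistic for H0: R @ beta = 0."""
    Rb = R @ b
    middle = np.linalg.inv(R @ XtX_inv @ R.T) / s2
    return (Rb @ middle @ Rb) / R.shape[0]

# Original restrictions: both slope coefficients equal zero...
R1 = np.array([[0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0]])
# ...versus a nonsingular transformation of them, analogous to testing
# b1 + 4*b2 = 0 jointly with -3*b1 + b2 = 0.
A = np.array([[1.0, 4.0],
              [-3.0, 1.0]])          # det = 13, nonsingular
R2 = A @ R1

print(np.isclose(wald_F(R1), wald_F(R2)))  # the two joint F statistics agree
```

Since the transformed restrictions hold if and only if the originals do, the quadratic form in the Wald statistic is unchanged, which is exactly what Stata's accumulated test reports.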

. test _Ix_1+4*_Ix_2=0

 ( 1)  _Ix_1 + 4.0 _Ix_2 = 0.0

       F(  1,    97) =    0.09
            Prob > F =    0.7608

. test _Ix_2-3*_Ix_1=0, accumulate

 ( 1)  _Ix_1 + 4.0 _Ix_2 = 0.0
 ( 2) - 3.0 _Ix_1 + _Ix_2 = 0.0

       F(  2,    97) =    4.96
            Prob > F =    0.0089

Conclusions The choice of reference group is arbitrary and has no econometrically important effect on the results. It does, however, determine which pairwise group comparisons are summarized by the t statistics automatically produced in the regression output. For this reason, it is a good idea to choose a reference group for which these pairwise comparisons are interesting. It usually is not a good idea to choose a group that has only a few observations or an unclear interpretation, such as "province unknown"!

In the above examples, a single characteristic splits the data into three groups. An example is labour force status: employed, unemployed, or not in the labour force. In many regressions, several sets of dummy variable regressors appear together along with non-dummy or "continuous" variables. The points made in the above examples still apply. When the regressors include several sets of dummy variables, the reference group has several characteristics (e.g. female, living in Ontario, single, with a university degree, . . .) corresponding to the dummy variables left out of each set. Each coefficient describes how changing one characteristic changes the mean, holding the other characteristics constant. For example, the coefficient on a "living in Quebec" dummy variable indicates the effect of living in Quebec instead of the reference province (Ontario, say) for a person with otherwise identical controlled-for characteristics. This interpretation becomes more complicated when there are interaction variables.
