Two Groups and One Continuous Variable



Psychologists and others are frequently interested in the relationship between a dichotomous variable and a continuous variable – that is, they have two groups of scores. There are many ways such a relationship can be investigated. I shall discuss several of them here, using the data on sex, height, and weight of graduate students in the file SexHeightWeight.sav. For each of 49 graduate students, we have sex (female, male), height (in inches), and weight (in pounds).

After screening the data to be sure there are no errors, I recommend preparing a schematic plot – side-by-side box-and-whiskers plots. In SPSS: Analyze, Descriptive Statistics, Explore. Scoot ‘height’ into the Dependent List and ‘sex’ into the Factor List. In addition to numerous descriptive statistics, you get a schematic plot. The height scores for male graduate students are clearly higher than those for female graduate students, with relatively little overlap between the two distributions. The descriptive statistics show that the two groups have similar variances and that the within-group distributions are not badly skewed, but are somewhat platykurtic. I would not be uncomfortable using techniques that assume normality.

Student’s T Test. This is probably the most often used procedure for testing the null hypothesis that two population means are equal. In SPSS: Analyze, Compare Means, Independent Samples T Test, with height as the test variable and sex as the grouping variable; define groups with values 1 and 2. The output shows that the mean height for the sample of men was 5.7 inches greater than that for the women and that this difference is significant by a separate-variances t test, t(46.0) = 8.005, p < .001.
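For readers who want to see the arithmetic behind the separate-variances (Welch) t test, here is a minimal Python sketch. The height values below are hypothetical stand-ins; the actual SexHeightWeight.sav data are not reproduced here, so the resulting t and df will not match the values reported above.

```python
import math

# Hypothetical height samples (inches) standing in for the real data.
women = [62, 64, 63, 65, 66, 61, 64, 63]
men = [69, 71, 70, 72, 68, 73, 70]

def welch_t(a, b):
    """Separate-variances (Welch) t test for two independent samples."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # unbiased variance
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    se2a, se2b = va / na, vb / nb                  # squared standard errors
    t = (ma - mb) / math.sqrt(se2a + se2b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se2a + se2b) ** 2 / (
        se2a ** 2 / (na - 1) + se2b ** 2 / (nb - 1)
    )
    return t, df

t, df = welch_t(men, women)
```

Note that the Welch df is fractional and falls between min(n1, n2) − 1 and n1 + n2 − 2, which is why SPSS reported t(46.0) rather than t(47) for the separate-variances test.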
A 95% confidence interval for the difference between means runs from 4.28 inches to 7.08 inches.

When dealing with a variable for which the unit of measure is not intrinsically meaningful, it is a good idea to present the difference in means and the confidence interval for that difference in standardized units. While I don’t think that is necessary here (you probably have a pretty good idea of how large a difference of 5.7 inches is), I shall, for pedagogical purposes, compute Cohen’s d (the sample statistic) and a confidence interval for Cohen’s δ (the parameter). In doing so, I shall use a special SPSS script and the separate-variances values for t and df. See the document Confidence Intervals, Pooled and Separate Variances T. For these data, d = 2.36 (quite a large difference), and the 95% confidence interval for δ runs from 1.61 (large) to 3.09 (even larger). We can be quite confident that the difference in height between men and women is large.

Point Biserial Correlation. Here we simply compute the Pearson r between sex and height. The value of that r is .76, and it is statistically significant, t(47) = 8.005, p < .001. This analysis is absolutely identical to an independent samples t test with a pooled error term (see the t test output above) or an analysis of variance with a pooled error term. The value of r here can be used as a strength-of-effect estimate: square it and you have an estimate of the percentage of variance in height that is “explained” by sex.

One-Way Independent Samples Parametric ANOVA. This too is absolutely equivalent to the t test. The value of F obtained will equal the square of the value of t, the numerator df will be one, the denominator df will be N – 2 (just like with t), and the (appropriately one-tailed for nondirectional hypotheses) p value will be identical to the two-tailed p value from t.
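The algebraic links among t, the point-biserial r, and Cohen’s d can be verified directly from the statistics reported above. Here is a quick sketch (Python rather than SPSS, purely for illustration). Note that the simple pooled-statistics formula for d used below gives a value close to, but not identical to, the d = 2.36 obtained above from the separate-variances t and df.

```python
import math

# Point-biserial r from the pooled t test: r = sqrt(t^2 / (t^2 + df)).
t, df = 8.005, 47
r = math.sqrt(t ** 2 / (t ** 2 + df))  # matches the reported r of .76

# Cohen's d from the pooled t and the group sizes (28 women, 21 men):
# d = t * sqrt(1/n1 + 1/n2). Close to the reported separate-variances d = 2.36.
n1, n2 = 28, 21
d = t * math.sqrt(1 / n1 + 1 / n2)
```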
Our t of 8.005, when squared, yields an F of 64.08.

ANOVA: height
Source           Sum of Squares   df   Mean Square      F     Sig.
Between Groups        386.954      1       386.954   64.078   .000
Within Groups         283.821     47         6.039
Total                 670.776     48

Discriminant Function Analysis. This also is equivalent to the independent samples t test, but it looks different. In SPSS: Analyze, Classify, Discriminant, with sex as the grouping variable and height as the independent variable. Under Statistics, ask for unstandardized discriminant function coefficients. Under Classify, ask that prior probabilities be computed from group sizes and that a summary table be displayed. In discriminant function analysis a weighted combination of the predictor variables is used to predict group membership. For our data, that function is DF = -27.398 + .407 × height. The correlation between this discriminant function and sex is .76 (notice that this is identical to the point-biserial r computed earlier) and is statistically significant, χ²(1, N = 49) = 39.994, p < .001. Using this model we are able to predict a person’s sex correctly 83.7% of the time.

Logistic Regression. This technique is most often used to predict group membership from two or more predictor variables. The predictors may be dichotomous or continuous variables. Sex is a dichotomous variable in the data here, so let us test a model predicting sex from height. In SPSS: Analyze, Regression, Binary Logistic. Identify sex as the dependent variable and height as a covariate. The χ² statistic shows us that we can predict sex significantly (p < .001) better if we know the person’s height than if all we know is the marginal distribution of the sexes (28 women and 21 men). The odds ratio for height is 2.536. That is, the odds that a person is male are multiplied by 2.536 for each one-inch increase in height. That is certainly a large effect. The classification table shows us that using the logistic model with height we could correctly predict the sex of a person 83.7% of the time.
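Two of the relationships just described can be checked with a few lines of arithmetic (Python here, purely for illustration; all numbers are taken from the output reported above):

```python
import math

# The one-way ANOVA F equals the square of the pooled t: 8.005^2 = 64.08.
t = 8.005
F = t ** 2

# The logistic regression slope for height implied by the reported
# odds ratio of 2.536 per inch is its natural log.
b_height = math.log(2.536)

# The odds ratio compounds multiplicatively: over the 5.7-inch mean
# difference between men and women, the odds of being male are
# multiplied by 2.536 ** 5.7.
mean_diff_multiplier = 2.536 ** 5.7
```

The compounded multiplier makes vivid why an odds ratio of 2.536 per inch is “certainly a large effect”: across the observed mean difference in height, the odds of being male increase by a factor of roughly two hundred.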
If we did not know the person’s height, our best strategy would be to predict ‘woman’ every time, and we would be correct 28/49 = 57% of the time.

Omnibus Tests of Model Coefficients
                  Chi-square   df   Sig.
Step 1   Step        38.467     1   .000
         Block       38.467     1   .000
         Model       38.467     1   .000

Wilcoxon Rank Sums Test. If we had reason not to trust the assumption that the population data are normally distributed, we could use a procedure that makes no such assumption, such as the Wilcoxon Rank Sums Test (which is equivalent to a Mann-Whitney U Test). In SPSS: Analyze, Nonparametric Tests, Two Independent Samples, with height as the test variable, sex as the grouping variable (values 1 and 2), and the Exact Test selected. The output shows that the difference between men and women is statistically significant. SPSS gives mean ranks; most psychologists would prefer to report medians. From Explore, used earlier, the medians are 64 (women) and 71 (men).

Kruskal-Wallis Nonparametric ANOVA on Ranks. As with the parametric ANOVA, this ANOVA can be used with 2 or more independent samples. While the results of the analysis will not be absolutely equivalent to those of a Wilcoxon rank sums test, it does test the same null hypothesis.

Test Statistics(a,b)
                height
Chi-Square      27.854
df                   1
Asymp. Sig.       .000
a. Kruskal Wallis Test
b. Grouping Variable: sex

Resampling Statistics. Using the bootstrapping procedure in David Howell’s resampling program and the SexHeight.dat file, a 95% confidence interval for the difference in medians runs from 4 to 8. Since the null value (0) is not included in this confidence interval, the difference between group medians is statistically significant. Using the randomization test for differences between two medians, the nonrejection region runs from -2.5 to 3. Since our observed difference in medians is -7, we reject the null hypothesis of no difference.

Karl L. Wuensch
East Carolina University
March, 2015