Match the following histograms with these variables (collected on statistics students):

1. Verbal SAT

2. Coke vs. Pepsi

3. Last 2 digits of SSN

4. Amount paid for last haircut

5. Number of hours exercise per week

6. Male vs. Female

7. Guess of prof’s age

8. Number of CDs

[Figure: eight histograms, labeled (a) through (h); images not reproduced here.]

SPSS instructions:

1. Descriptives: (mean, median, min, max, st. dev, var, histograms, boxplots, stem-and-leaf plots,…)

• Analyze -> Descriptive Statistics -> Explore OR

• Analyze -> Descriptive Statistics -> Descriptives

2. Histograms: (slightly more flexible/editable)

• Graphs -> Interactive -> Histogram

• Note that if you click on the “Histogram” tab, you can set an option to draw a normal curve and also to set the bin widths.

• You choose whether you want count or percentage on the y-axis.

3. Boxplots: (again, slightly more flexible/editable)

• Graphs -> Interactive -> Boxplot

• To make separate boxplots by group (for example, a haircut boxplot split by gender), put the response (haircut) on the y-axis and the explanatory variable (gender) on the x-axis.

4. Scatterplots: (will also give you the regression line)

• Graphs -> Interactive -> Scatterplot

• The response goes on the y-axis; the explanatory variable goes on the x-axis.

• Note that the plot gives the estimated regression line and r² (the square of the correlation coefficient).

5. Quantiles: (Q1, Q3, percentiles,…)

• Analyze -> Descriptive Statistics -> Frequencies

• Click on “Statistics” and check the “Quartiles” box
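For readers who want to check the menu output by hand, here is a minimal Python sketch of the same summaries (mean, median, st. dev, var, quartiles). The data values are made up for illustration and are not from any class data set. Quartile conventions differ between packages, so the cut points below may not match SPSS exactly.

```python
import statistics

# Hypothetical measurements (illustrative only, not real class data).
values = [1.8, 2.0, 2.1, 3.9, 4.1, 4.3, 4.5, 4.6, 4.8]

mean = statistics.mean(values)
median = statistics.median(values)
st_dev = statistics.stdev(values)        # sample standard deviation (n - 1 denominator)
variance = statistics.variance(values)   # sample variance (n - 1 denominator)

# Quartile cut points. Python's default "exclusive" method interpolates
# at position (n + 1) * p in the sorted data.
q1, q2, q3 = statistics.quantiles(values, n=4)

print(f"min = {min(values)}, max = {max(values)}")
print(f"mean = {mean:.3f}, median = {median}")
print(f"st. dev = {st_dev:.3f}, var = {variance:.3f}")
print(f"Q1 = {q1:.2f}, Q3 = {q3:.2f}, IQR = {q3 - q1:.2f}")
```

If the quartiles here differ slightly from the SPSS Frequencies output on the same data, the likely cause is a different interpolation rule rather than an error.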

SPSS Lab Project Instructions:

1. Download the data from the web site: geyser.sav (SPSS format). The data are 223 consecutive eruptions from Old Faithful geyser in Yellowstone National Park. The duration variable is the length (in minutes) of the eruption. The down_time variable is the number of minutes between consecutive eruptions.

2. Using SPSS, make a histogram of each of the variables. (Graphs -> Histogram -> put variable into the variable box.)

3. Change the width of the histogram boxes by using the interactive graph. (Graphs -> Interactive -> Histogram -> drag the variable into the box below the “count” box. From this view, click on the “Histogram” tab to change the interval size. Your options are to set the interval size automatically, set it by the number of intervals, or set it by the width of the intervals.) Play with this option, and create at least 3 histograms with varying interval sizes. Which histogram do you think is the most informative?

4. From the “Histogram” tab (where the interval size info is) check the “Normal Curve” box. This option will place a normal curve on top of your histogram. Print this histogram.

5. Use the exploratory statistics to create summary values and graphs for the duration variable. (Analyze -> Descriptive Statistics -> Explore... -> Put “duration” in the dependent list.)

6. Create a scatterplot for the two variables “duration” and “down_time”.
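To see why the interval size in step 3 matters, here is a rough Python sketch that bins the same values at three different widths and prints a text histogram for each. The `bin_counts` helper and the duration values are hypothetical, chosen only for illustration; they are not the geyser.sav data, and this is not how SPSS draws its histograms.

```python
import math
from collections import Counter

def bin_counts(values, width, start=0.0):
    """Count values in bins [start + k*width, start + (k+1)*width).

    Returns a dict mapping each non-empty bin's left edge to its count.
    """
    counts = Counter(math.floor((v - start) / width) for v in values)
    return {round(start + k * width, 10): counts[k] for k in sorted(counts)}

# Hypothetical durations in minutes (illustrative only).
durations = [1.8, 2.0, 2.1, 3.9, 4.1, 4.3, 4.5, 4.6, 4.8]

for width in (0.5, 1.0, 2.0):
    print(f"interval width = {width}")
    for left, count in bin_counts(durations, width).items():
        percent = 100 * count / len(durations)
        print(f"  [{left:.1f}, {left + width:.1f})  {'*' * count}  ({percent:.0f}%)")
```

Narrow intervals show more detail but look noisier; wide intervals smooth the picture and can hide separate clusters. The percentage column corresponds to switching the y-axis from count to percentage.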

Histogram Write-Up:

1. Were there any plots for which the variable association was not clear? Explain.

2. Would you say any (or all?) of these plots followed a “normal” distribution? Explain.

Geyser Write-Up (use the ‘duration’ variable):

3. Describe which histogram (from the ones with various bin sizes) was the most informative. Attach a copy of this histogram.

4. About how many of the eruptions lasted more than 4.5 minutes? More than 5 minutes? Less than 2 minutes?

5. Change the “count” to “percentage” and answer the previous question using percentages.

6. Based on the overlaid normal curve, do you think the data look normal? Explain.

7. How many distinct clusters (or peaks) can you identify in the “duration” distribution? Roughly where do they fall? Give a reasonable explanation for what might lead to these clusters.

8. The mean and the median are slightly different. What does this tell you about the skewness of the data?

9. A boxplot often gives us information about the outliers in a data set. This particular data set doesn't seem to have any outliers. For this particular variable, what information have we lost when we present the data in a boxplot?

10. In the scatterplot, justify your choice of explanatory and response variables. Report the correlation coefficient and the equation of the linear regression.
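As a reference for question 10, here is a minimal Python sketch of how a least-squares line and correlation coefficient are computed from paired data. The `linear_fit` helper and the numbers are hypothetical and deliberately lie exactly on a line, so they say nothing about the real geyser data. Placing duration on the x-axis here is an arbitrary choice for the example, not a suggested answer.

```python
import math

def linear_fit(x, y):
    """Return (slope, intercept, r) for the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)   # correlation coefficient
    return slope, intercept, r

# Hypothetical paired values (illustrative only; chosen to be exactly linear).
duration = [2.0, 3.0, 4.0, 5.0]
down_time = [55.0, 65.0, 75.0, 85.0]

slope, intercept, r = linear_fit(duration, down_time)
print(f"fitted line: down_time = {intercept:.1f} + {slope:.1f} * duration")
print(f"r = {r:.3f}, r squared = {r ** 2:.3f}")
```

Swapping the roles of the two variables changes the slope and intercept but leaves r unchanged, which is one reason the choice of explanatory and response variables needs a justification.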
