Practice Midterm Exam - Statistical Science



Statistics 103 May 5, 2005

Final Exam

Instructions: Write your answers on the exam in the spaces after the questions. For maximum credit, show all work.

You are permitted to use two sheets of notes, front and back, and a calculator. Any other form of aid is not permitted. If you need clarification on any part of the exam, contact Prof. Reiter.

Provide the information requested below in the adjacent empty spaces.

NAME (print): LAB TIME: .

|Page |Points Possible |Score |
|4 |20 | |
|5 |12 | |
|6 |12 | |
|7 |12 | |
|8 |10 | |
|9 |12 | |
|10 |10 | |
|11 |12 | |
|Total |100 | |

Demographic Questions (not used for grading in any way):

1) Did you take AP Statistics in high school? No ___ Yes ____

2) What was your score on the AP Statistics exam? _____ or circle “Did not take it.”

QUESTIONS 1 – 4 REFER TO THE DATASET DESCRIBED BELOW

In the 1970s, Harris Trust and Savings Bank was sued for sex discrimination. The lawsuit alleged that the Bank systematically paid female employees lower salaries than male employees. The key evidence in the case was data on salaries of employees. Both the prosecution and the defense presented data analyses in attempts to support their cases.

In the problems below, we analyze a subset of the data for one type of employee: the skilled, entry-level clerical workers. You can assume that these data are a random sample of the population of skilled, entry-level clerical workers who work at this Bank.

DESCRIPTION OF THE DATA

========================

There were 61 female and 32 male employees in the data set. The following are variables we consider on this exam.

bsal: Annual salary at time of hire.

sal77: Annual salary in 1977 (the latest year in the study).

educ: Years of education.

exper: Number of months working at other companies prior to being hired at the Bank.

senior: Number of months worked at the Bank since hired.

age: Age in months.

There are no problems on this page. Starting below, the next two pages display output from exploratory data analyses that you should use to answer exam questions. The questions begin on page 4.

Correlations among selected variables, based on all 93 employees

| |Sal77 |Exper |Senior |Educ |
|Sal77 |1.00 |-0.37 |0.13 |0.42 |
|Exper | |1.00 |-0.07 |-0.10 |
|Senior | | |1.00 |0.06 |
|Educ | | | |1.00 |

[pic]

Histogram of sal77 for females. Histogram of sal77 for males.

Scatterplot Matrix: Each graph is based on all 93 employees.

[pic]

EXAM PROBLEMS BEGIN HERE

1. (2 points per part) For parts 1a-1d, circle the answer that is closest to the truth.

a) Estimate the 75th percentile of sal77 for the male employees minus the 75th percentile of sal77 for the female employees. 2200

b) Estimate the standard deviation of sal77 for the females: 1200

c) Estimate the percentage of female employees whose sal77 exceeds $10,000.

30%

d) The standard error for the average of sal77 for males is ______________ the standard error for the average of sal77 for females. larger than (due to smaller sample size)

2. (2 points per part). For 2a – 2f, circle the appropriate answer.

a) Which one of the following three scatter plots displays the relationship between sal77 and experience? Circle the letter of the correct plot.

The plot with the negative slope and not especially tight pattern.

b) Which variable has the weakest linear association with sal77? senior

c) Which of the following lines is the fitted regression line for predicting sal77 (Y) from bsal (X)? Circle the correct line.

Y = 4620 + 1.065 X. You can verify this by plugging in values of bsal into each line, and you’ll see that this is the only line that gives any reasonable predictions of sal77.
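The plug-in check can be sketched in a few lines of Python. This is only an illustration, not part of the original solution: it borrows the group sizes and group means of bsal stated in Problem 3a, and the "Mean of Response" for sal77 (10392.9) from the output in Problem 4. A least-squares line passes through the point of means, so the prediction at the overall mean of bsal should land close to the mean of sal77, up to rounding of the coefficients.

```python
# Group sizes and group means of bsal, taken from Problem 3a of this exam.
n_f, n_m = 61, 32
mean_bsal_f, mean_bsal_m = 5139.0, 5937.0

# Mean of sal77 over all 93 employees ("Mean of Response" in Problem 4).
mean_sal77 = 10392.9


def predict_sal77(bsal):
    """Prediction from the line chosen in part 2c: Y = 4620 + 1.065 X."""
    return 4620.0 + 1.065 * bsal


# The overall mean of bsal is the size-weighted average of the group means.
mean_bsal = (n_f * mean_bsal_f + n_m * mean_bsal_m) / (n_f + n_m)

# A least-squares line passes through (mean of X, mean of Y), so this
# prediction should be close to mean_sal77.
pred_at_mean = predict_sal77(mean_bsal)

print("overall mean bsal:", round(mean_bsal, 1))
print("predicted sal77 at mean bsal:", round(pred_at_mean, 1))
print("gap from mean sal77:", round(mean_sal77 - pred_at_mean, 1))
```

The gap is a few dollars, which is what one would expect from coefficients rounded to three or four significant digits.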

d) Which pair of variables has correlation closest to zero?

bsal, age

e) Which variable has the largest standard deviation? sal77 (it has the biggest numbers)

f) True or false. In the regression of sal77 (Y) on bsal (X), the plot of residuals versus bsal shows no evidence of violations of the regression assumptions.

True. The scatter plot of sal77 (Y) on bsal (X) shows no indications of non-linear relationships, so the regression line would be a good fit to the data.

3. The differences between salaries for men and women.

a) (5 points) The sample average and sample variance of bsal for males equal 5937 and 477066, respectively. The sample average and sample variance of bsal for females equal 5139 and 291460, respectively. Give an interval for the difference in the population average bsal for male skilled, entry-level clerical workers employed at the Bank and the population average bsal for female skilled, entry-level clerical workers employed at the Bank. Use a 99% confidence level. Use 40 degrees of freedom as a conservative approximation to the Welch-Satterthwaite degrees of freedom (the actual value is about 51).

[pic]
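The worked solution for part a was an image that did not survive. The following Python sketch reproduces the calculation from the numbers stated in the problem, under two assumptions: the interval uses the unpooled (Welch) standard error, and the critical value 2.704 is the standard t-table entry for 40 degrees of freedom with 0.005 in each tail.

```python
import math

# Summary statistics for bsal, as stated in the problem.
n_m, mean_m, var_m = 32, 5937.0, 477066.0   # males
n_f, mean_f, var_f = 61, 5139.0, 291460.0   # females

# Point estimate and unpooled standard error of the difference in means.
diff = mean_m - mean_f
a, b = var_m / n_m, var_f / n_f
se = math.sqrt(a + b)

# Welch-Satterthwaite degrees of freedom (the problem says it is about 51).
welch_df = (a + b) ** 2 / (a ** 2 / (n_m - 1) + b ** 2 / (n_f - 1))

# Assumption: t-table critical value for 99% confidence with 40 df.
T_CRIT_99_DF40 = 2.704

margin = T_CRIT_99_DF40 * se
lower, upper = diff - margin, diff + margin

print("difference:", diff)
print("standard error:", round(se, 1))
print("Welch df:", round(welch_df, 1))
print("99% CI:", (round(lower), round(upper)))
```

Under these assumptions the interval runs from roughly $419 to $1,177 and lies entirely above zero.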

b) (1 point) Based on your interval in part a, circle the choice that best completes the statement:

The confidence interval suggests that the population average bsal for male skilled, entry-level clerical workers employed at the Bank ______________________ the population average bsal for female skilled, entry-level clerical workers employed at the Bank.

is larger than

c) (6 points) Test the null hypothesis that the population percentages of men and women hired by the Bank are equal. Write your null and alternative hypotheses, the value of the test statistic, the p-value, and your conclusion. Use a two-sided alternative. Consider p-values in the .05 range as small. Be sure to address the question of interest in your conclusion (write more than reject/not reject the null hypothesis).

Let p = the population percentage of females hired by the bank. Then,

H0: p = 0.5. Ha: p ≠ 0.5.

[pic]

We double this to get the p-value, since we have a two-sided hypothesis, and we get a p-value of .003.

Assuming males and females are hired at equal rates, there is only about a 3 in 1000 chance of getting a sample percentage of females at least as far from 50% as the observed 65.6%. This is a small chance. Therefore, we reject the null hypothesis. There is evidence that the Bank hires males and females in differing percentages (at least for this type of worker).
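The test-statistic calculation was an image in the original. A minimal Python sketch of a one-sample z-test for a proportion, which is assumed here to be the method used, recovers the stated p-value from the sample counts (61 females out of 93 employees). The helper `normal_cdf` is written for this illustration using the standard library's error function.

```python
import math

# Sample counts from the data description.
n_f, n_m = 61, 32
n = n_f + n_m

p_hat = n_f / n          # sample proportion of females
p0 = 0.5                 # null-hypothesis value

# Standard error computed under the null hypothesis.
se0 = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se0


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


# Two-sided p-value: double the upper-tail area beyond |z|.
p_value = 2.0 * (1.0 - normal_cdf(abs(z)))

print("sample proportion:", round(p_hat, 3))
print("z statistic:", round(z, 2))
print("two-sided p-value:", round(p_value, 3))
```

The z statistic is close to 3, and the two-sided p-value rounds to .003, in line with the answer above.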

4. Predicting salaries

In the regression of sal77 (Y) on educ (X), the following output is obtained.

Bivariate Fit of sal77 By educ

Summary of Fit

|RSquare |0.177259 |
|RSquare Adj |0.168218 |
|Root Mean Square Error |1632.19 |
|Mean of Response |10392.9 |
|Observations (or Sum Wgts) |93 |
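As a consistency check on the Summary of Fit, the short Python sketch below recomputes the adjusted R-square from RSquare and the number of observations using the usual formula for a regression with one predictor, and compares the square root of RSquare with the sal77/educ correlation of 0.42 reported in the correlation table. This is an illustrative check, not part of the original output.

```python
import math

# Values read from the Summary of Fit.
r_squared = 0.177259
n = 93
k = 1  # one predictor (educ)

# Adjusted R-square: 1 - (1 - R^2) * (n - 1) / (n - k - 1).
r_squared_adj = 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)

# In simple linear regression, R-square is the squared correlation,
# so its square root should match |corr(sal77, educ)|.
r = math.sqrt(r_squared)

print("adjusted R-square:", round(r_squared_adj, 6))
print("sqrt(R-square):", round(r, 2))
```

Both numbers agree with the exam's output: the adjusted value matches RSquare Adj, and the square root matches the tabulated correlation to two decimals.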

Parameter Estimates

|Term | |Estimate |Std Error |t Ratio |Prob>|t| |

|Intercept | |6264.513 |947.6058 |6.61 | ................