Statistics AP/GT

Math 3339

Homework 7 (Chapters 9, 11 & 12) Name:__________________________________ PeopleSoft ID:_______________

Instructions:
- Homework will NOT be accepted through email or in person. Homework must be submitted through CourseWare BEFORE the deadline.
- Print out this file and complete the problems, or complete it using your computer.
- Use blue or black ink or a dark pencil if completing this by hand.
- Write your solutions in the space provided. You must show all work for full credit.
- Submit this assignment at under "Assignments" and choose HW7.
- Total possible points: 15.

1. *The following data look at how long it takes to get to work. Let x = commuting distance (miles) and y = commuting time (minutes).

x   15  16  17  18  19  20
y   42  35  45  42  49  46

a. Give a scatterplot of this data and comment on the direction, form and strength of this relationship.
b. Determine the least-squares estimate equation for this data set.
c. Give the $r^2$ and comment on what it means.
d. Give the residual plot based on the least-squares estimate equation.
e. Test whether this least-squares estimate equation specifies a useful relationship between commuting distance and commuting time.

a. This is a positive relationship, somewhat strong, somewhat linear.

b. Output from RStudio:

Call:
lm(formula = y ~ x)

Residuals:
     1      2      3      4      5      6
 3.048 -5.638  2.676 -2.010  3.305 -1.381

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  13.6667    16.9572   0.806    0.465
x             1.6857     0.9644   1.748    0.155

Residual standard error: 4.034 on 4 degrees of freedom
Multiple R-squared: 0.433, Adjusted R-squared: 0.2913
F-statistic: 3.055 on 1 and 4 DF, p-value: 0.1554

Equation: $\hat{y} = 13.6667 + 1.6857x$

c. $R^2 = 0.433$; about 43.3% of the variation in commuting time can be explained by this equation.

d. Residual plot

e. $H_0: \beta_1 = 0$ and $H_a: \beta_1 \neq 0$; test statistic $t = 1.6857/0.9644 = 1.748$, p-value = 0.155. Fail to reject the null hypothesis. Using this data, there is no significant evidence of a relationship between commuting distance and commuting time.
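The output in part b can be reproduced with a few lines of R. This is a minimal sketch using the data from the problem statement; the object names `fit1` and `t_slope` are chosen here for illustration and do not come from the original solution.

```r
# Commuting data from Problem 1
x <- c(15, 16, 17, 18, 19, 20)   # distance (miles)
y <- c(42, 35, 45, 42, 49, 46)   # time (minutes)

# Part a: scatterplot
plot(x, y, xlab = "Distance (miles)", ylab = "Time (minutes)")

# Part b: least-squares fit, added to the scatterplot
fit1 <- lm(y ~ x)
abline(fit1)
print(summary(fit1))             # coefficients, R-squared, slope test

# Part d: residuals against x, with a reference line at zero
plot(x, resid(fit1), xlab = "Distance (miles)", ylab = "Residual")
abline(h = 0, lty = 2)

# Part e: the slope t statistic is estimate / standard error
slope_row <- coef(summary(fit1))["x", ]
t_slope <- slope_row[["Estimate"]] / slope_row[["Std. Error"]]
t_slope
```

Note that the t statistic is the ratio of the slope estimate to its standard error, not the standard error itself.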

Problems came from Devore, Jay and Berk, Kenneth, Modern Mathematical Statistics with Applications, Thomson Brooks/Cole, 2007.

2. *The following is another set of data looking at how long it takes to get to work. Let x = commuting distance (miles) and y = commuting time (minutes).

x    5  10  15  20  25   50
y   16  32  44  45  63  115

a. Give a scatterplot of this data and comment on the direction, form and strength of this relationship.
b. Determine the least-squares estimate equation for this data set.
c. Give the $r^2$ and comment on what it means.
d. Give the residual plot based on the least-squares estimate equation.
e. Test whether this least-squares estimate equation specifies a useful relationship between commuting distance and commuting time.
f. Compare this least-squares estimate equation to the least-squares estimate equation in problem 1. In which situation would the least-squares equation be least effective? Justify your answer.

a. This is a positive, strong, linear relationship.

b. Output from RStudio:

Call:
lm(formula = y ~ x)

Residuals:
       1        2        3        4        5        6
-2.58033  2.70820  3.99672 -5.71475  1.57377  0.01639

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   7.8689     2.8760   2.736   0.0521 .
x             2.1423     0.1132  18.930 4.59e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.034 on 4 degrees of freedom
Multiple R-squared: 0.989, Adjusted R-squared: 0.9862
F-statistic: 358.4 on 1 and 4 DF, p-value: 4.587e-05

Equation: $\hat{y} = 7.8689 + 2.1423x$

c. $R^2 = 0.989$; about 98.9% of the variation in commuting time can be explained by this equation.

d. Residual plot

e. $H_0: \beta_1 = 0$ and $H_a: \beta_1 \neq 0$; test statistic $t = 18.930$, p-value = 0.0000459. Reject the null hypothesis. Using this data, there is very strong evidence of a relationship between distance and time.

f. The least-squares equation from problem 1 would be the least effective because its $R^2$ is smaller and there does not appear to be a significant relationship between distance and time, whereas in problem 2 distance is a significant predictor of time.
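The comparison in part f can be laid out side by side in R. The following is a sketch only: the variable names (`x1`, `y1`, `x2`, `y2`, `compare`) are illustrative, and the data are the two commuting data sets from problems 1 and 2.

```r
# Data from problems 1 and 2
x1 <- c(15, 16, 17, 18, 19, 20); y1 <- c(42, 35, 45, 42, 49, 46)
x2 <- c(5, 10, 15, 20, 25, 50);  y2 <- c(16, 32, 44, 45, 63, 115)

fit1 <- lm(y1 ~ x1)
fit2 <- lm(y2 ~ x2)

# Collect slope, R-squared and the slope p-value for each fit
compare <- data.frame(
  problem   = c(1, 2),
  slope     = c(coef(fit1)[[2]], coef(fit2)[[2]]),
  r_squared = c(summary(fit1)$r.squared, summary(fit2)$r.squared),
  p_value   = c(coef(summary(fit1))[2, "Pr(>|t|)"],
                coef(summary(fit2))[2, "Pr(>|t|)"])
)
print(compare)
```

Both fits happen to have the same residual standard error (4.034), so the difference in $R^2$ comes from the much wider range of distances in problem 2, not from smaller scatter about the line.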

3. The cost of a home depends on the number of bedrooms in the house. Suppose the following data is recorded for homes in a given town.

price (in thousands)   300  250  400  550  317  389  425  289  389  559
No. bedrooms             3    3    4    5    4    3    6    3    4    5

a) Make a scatterplot.
b) Fit the data with a least squares regression line.
c) Give a 95% confidence interval for the slope.
d) If one house has one more bedroom than another house, how much additional cost would we expect for the price?
e) Test the hypothesis that an extra bedroom costs $60,000 against the alternative that it costs more.

a. Scatterplot

b. RStudio output:

Call:
lm(formula = price ~ beds)

Residuals:
    Min      1Q  Median      3Q     Max
-108.00  -53.95   -5.75   59.77   99.10

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    94.40      97.98   0.963   0.3635
beds           73.10      23.76   3.076   0.0152 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 75.15 on 8 degrees of freedom
Multiple R-squared: 0.5419, Adjusted R-squared: 0.4846
F-statistic: 9.462 on 1 and 8 DF, p-value: 0.01521

Equation: $\widehat{\text{price}} = 94.40 + 73.10 \cdot \text{beds}$ (price in thousands of dollars)

c. Confidence interval:

> 73.1+c(-1,1)*qt(1.95/2,8)*23.76
[1]  18.30934 127.89066
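As a cross-check on the hand calculation, base R's `confint()` gives the same interval directly from the fitted model. This sketch rebuilds the data from the problem statement; the object names `fit3` and `ci` are illustrative.

```r
# Home price data from Problem 3
price <- c(300, 250, 400, 550, 317, 389, 425, 289, 389, 559)  # thousands of dollars
beds  <- c(3, 3, 4, 5, 4, 3, 6, 3, 4, 5)

fit3 <- lm(price ~ beds)

# 95% confidence interval for the slope on beds
ci <- confint(fit3, "beds", level = 0.95)
print(ci)

# Same interval by hand: estimate +/- t quantile * standard error
est <- coef(summary(fit3))["beds", "Estimate"]
se  <- coef(summary(fit3))["beds", "Std. Error"]
est + c(-1, 1) * qt(0.975, df = df.residual(fit3)) * se
```

Small differences in the last digits relative to the interval above come from using the unrounded standard error rather than 23.76.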

d. This is the interpretation of the slope: the expected price increases by $73,100 for each additional bedroom.

e. $H_0: \beta_1 = 60$ and $H_a: \beta_1 > 60$

$$t = \frac{73.1 - 60}{23.76} = 0.5513$$

p-value = 1 - pt(0.5513, 8) = 0.2982. Fail to reject the null hypothesis.

There is no significant evidence that the slope is greater than 60, that is, that an extra bedroom costs more than $60,000.
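The test in part e is not the one R prints by default (which tests a slope of 0), so it has to be assembled from the estimate and its standard error. A minimal sketch, using the rounded values from the output above; the variable names are illustrative.

```r
# Values taken from the regression output in part b
b1    <- 73.10   # slope estimate (thousands of dollars per bedroom)
se_b1 <- 23.76   # standard error of the slope
df    <- 8       # residual degrees of freedom (n - 2)
beta0 <- 60      # hypothesized slope under H0

# One-sided test of H0: beta1 = 60 against Ha: beta1 > 60
t_stat  <- (b1 - beta0) / se_b1
p_value <- pt(t_stat, df, lower.tail = FALSE)  # same as 1 - pt(t_stat, df)

c(t = t_stat, p = p_value)
```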

4. Section 11.1.4, problem 2. The table below shows summary statistics for normally distributed measurements on 5 groups. The population variances are all equal. Construct an ANOVA table and determine if there is a difference in the population means by calculating a p-value.

        N    Mean      var
grp1   10   52.40   243.38
grp2   21   55.00   142.00
grp3   16   36.25   246.73
grp4   20   53.65   173.82
grp5   18   47.50   267.91

$$\bar{x}_{..} = \frac{10(52.4) + 21(55) + 16(36.25) + 20(53.65) + 18(47.5)}{10 + 21 + 16 + 20 + 18} = 49.25882$$

$$\text{SSTr} = 10(52.4 - 49.25882)^2 + 21(55 - 49.25882)^2 + 16(36.25 - 49.25882)^2 + 20(53.65 - 49.25882)^2 + 18(47.5 - 49.25882)^2 = 3939.856$$

$$\text{SSE} = 9(243.38) + 20(142) + 15(246.73) + 19(173.82) + 17(267.91) = 16588.42$$

ANOVA TABLE

             DF   SS            MS         F
Treatment     4   3939.856      984.964    4.750128
Error        80   16588.42      207.3553
Total        84   20528.27588

p-value = 1 - pf(4.750128, 4, 80) = 0.0017. Reject the null hypothesis: at least one of the population means is different.
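Because only summary statistics are given, the ANOVA table has to be built from the group sizes, means and variances instead of calling `aov()` on raw data. The sketch below mirrors the calculation above; the vector names (`n`, `xbar`, `s2`) are illustrative.

```r
# Summary statistics from the table in Problem 4
n    <- c(10, 21, 16, 20, 18)
xbar <- c(52.40, 55.00, 36.25, 53.65, 47.50)
s2   <- c(243.38, 142.00, 246.73, 173.82, 267.91)

k <- length(n)   # number of groups
N <- sum(n)      # total sample size

grand <- sum(n * xbar) / N            # grand mean
SSTr  <- sum(n * (xbar - grand)^2)    # treatment sum of squares
SSE   <- sum((n - 1) * s2)            # error sum of squares

MSTr   <- SSTr / (k - 1)
MSE    <- SSE / (N - k)
F_stat <- MSTr / MSE
p_value <- pf(F_stat, k - 1, N - k, lower.tail = FALSE)

anova_table <- data.frame(
  DF = c(k - 1, N - k, N - 1),
  SS = c(SSTr, SSE, SSTr + SSE),
  MS = c(MSTr, MSE, NA),
  F  = c(F_stat, NA, NA),
  row.names = c("Treatment", "Error", "Total")
)
print(anova_table)
p_value
```

The total degrees of freedom are N - 1 = 84, the sum of the treatment and error degrees of freedom.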

5. A study was conducted to examine the effect of pets in stressful situations. Fifteen subjects were randomly assigned to each of three groups to do a stressful task alone (the control group), with a good friend present, or with their dog present. The subject's mean heart rate (in beats per minute) during the task is one measure of the effect of stress. The data below are the mean heart rates during stress with a pet (P), with a friend (F), and for the control group (C).

Control    Friend      Pet
 80.369    99.692   69.169
 87.446    83.400   70.169
 90.015   102.154   75.985
 99.046    80.277   86.446
 75.477    88.015   68.862
 87.231    92.492   64.169
 91.754    91.354   97.538
 87.785   100.877   85.000
 77.800   101.062   72.262
 62.646    81.600   58.692
 84.738    89.815   79.662
 84.877    98.200   69.231
 73.277    76.908   69.538
 84.523    86.985   70.077
 70.877    97.046   65.446

This data is available on the homework and calendar website under the name "Stress".

a. Make a side-by-side boxplot of the heart rates for the three groups. To do this in R use: boxplot(Rate~Group,data=Stress)

Does there seem to be a difference in the heart rates of the three groups? Do any of the groups show outliers or extreme skewness?

b. We want to test if there is a difference in the mean heart rates for the three groups. Give the null hypothesis of this test.

c. Does the data suggest that there is a difference among the three groups? Use $\alpha = 0.05$.

d. If there seems to be a difference, complete a Bonferroni pairwise test to determine which of the means, if not all, are different from each other.
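One possible way to carry out parts a, c and d in R is sketched below. It assumes the "Stress" data set has a numeric column `Rate` and a grouping column `Group`, as implied by the boxplot call in part a; since the file layout is not shown here, the data frame is rebuilt by hand from the table above, and the group labels C, F and P are an assumption.

```r
# Heart rates from the table in Problem 5
control <- c(80.369, 87.446, 90.015, 99.046, 75.477, 87.231, 91.754, 87.785,
             77.800, 62.646, 84.738, 84.877, 73.277, 84.523, 70.877)
friend  <- c(99.692, 83.400, 102.154, 80.277, 88.015, 92.492, 91.354, 100.877,
             101.062, 81.600, 89.815, 98.200, 76.908, 86.985, 97.046)
pet     <- c(69.169, 70.169, 75.985, 86.446, 68.862, 64.169, 97.538, 85.000,
             72.262, 58.692, 79.662, 69.231, 69.538, 70.077, 65.446)

# Assumed layout of the "Stress" data set: one row per subject
Stress <- data.frame(
  Rate  = c(control, friend, pet),
  Group = factor(rep(c("C", "F", "P"), each = 15))
)

# Part a: side-by-side boxplots
boxplot(Rate ~ Group, data = Stress)

# Part c: one-way ANOVA of heart rate on group
fit5 <- aov(Rate ~ Group, data = Stress)
print(summary(fit5))

# Part d: pairwise t tests with Bonferroni-adjusted p-values
print(pairwise.t.test(Stress$Rate, Stress$Group, p.adjust.method = "bonferroni"))
```

Compare the ANOVA p-value with $\alpha = 0.05$ for part c; the adjusted p-values from `pairwise.t.test()` can then be compared with the same level for part d.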
