Data Analysis 1016-319



Chapter 3 Review – Simple Linear Regression Model

Can the number of watts be used to help determine the price of a microwave? The table below contains data on microwave ovens found at .

Power (watts) | 1100 | 700 | 700 | 1200 | 1200 | 1200 | 1200 | 1000 | 1000 | 1000 | 700
Price ($) | 80 | 80 | 50 | 90 | 100 | 90 | 110 | 90 | 75 | 80 | 63

1. MODEL

a) Describe (in words) the X and Y variables; verify that X and Y are numerical.

b) Write out the simple linear regression model.

c) State the basic assumptions of the model.

2. ESTIMATING THE MODEL

a) Estimate the slope and intercept of the line.

o CALCULATOR: Use LinReg(a + bx)

b) Write the equation of the estimated regression line.

c) Interpret the slope of the line IN CONTEXT.

3. PREDICTION OF Y USING THE MODEL

a) Predict the Price of a microwave that has a Power of 900 watts.

b) Provide a range of possible Power values (watts) for which you are comfortable using this model to predict Price. EXPLAIN your choice.

4. RESIDUALS

a) Obtain the residuals and write them in the table below.

o CALCULATOR: They are already stored in the list “RESID”

Power (watts) | 1100 | 700 | 700 | 1200 | 1200 | 1200 | 1200 | 1000 | 1000 | 1000 | 700
Price ($) | 80 | 80 | 50 | 90 | 100 | 90 | 110 | 90 | 75 | 80 | 63
RESID | | | | | | | | | | |

b) Compute SSResid.

o CALCULATOR: Obtain 1-Var Stats for the list “RESID”, use $\sum x^2$ (the sum of the squared residuals)

5. EVALUATING THE MODEL

a) Compute $s_e$.

b) Compute the coefficient of determination.

c) Based on the values of $s_e$ and $r^2$, evaluate the performance of the model for this data.

Solution to Simple Linear Regression Model

1. MODEL

a) X = Power of a microwave (watts), Y = Price of a microwave ($); both Power and Price are numerical

b) $Y = \alpha + \beta X + e$

c) The distribution of $e$ at any particular X value has mean value 0, has the same standard deviation, and is normal. The random deviations $e_1, e_2, \ldots, e_n$ are independent.

2. ESTIMATING THE MODEL

a) a = 18.68, b = 0.06386

b) $\hat{y} = 18.68 + 0.06386x$

c) For every one watt increase in Power, we expect Price of a microwave to increase by $0.06, on average. (For every 100 watt increase in Power, we expect Price of a microwave to increase by $6, on average.)
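The calculator's LinReg(a + bx) result can be cross-checked by hand. The sketch below assumes the usual least-squares formulas, $b = S_{xy}/S_{xx}$ and $a = \bar{y} - b\bar{x}$; the function name `least_squares` is only illustrative.

```python
# Microwave data from the table at the top of this review.
power = [1100, 700, 700, 1200, 1200, 1200, 1200, 1000, 1000, 1000, 700]
price = [80, 80, 50, 90, 100, 90, 110, 90, 75, 80, 63]


def least_squares(x, y):
    """Return (a, b): intercept and slope of the least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # S_xy and S_xx: sums of cross-products and squared deviations.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


a, b = least_squares(power, price)
print(f"a = {a:.2f}, b = {b:.5f}")  # a = 18.68, b = 0.06386
```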

3. PREDICTION OF Y USING THE MODEL

a) $\hat{y} = 18.68 + 0.06386(900) \approx 76.15$, so the predicted Price is about 76 dollars.

b) I would be comfortable using this model for Power values between 700 and 1200 watts because that is the range of my data.
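Parts a) and b) can be combined into one small helper that predicts Price and refuses to extrapolate. This is only a sketch: the name `predict_price` and the choice to raise an error outside 700 to 1200 watts are assumptions made for illustration, using the rounded coefficients from part 2a.

```python
A, B = 18.68, 0.06386       # intercept and slope from part 2a (rounded)
X_MIN, X_MAX = 700, 1200    # smallest and largest Power in the data


def predict_price(watts):
    """Predicted Price for a given Power, restricted to the observed range."""
    if not X_MIN <= watts <= X_MAX:
        raise ValueError(f"{watts} watts is outside the data range; "
                         "the linear pattern may not hold there")
    return A + B * watts


predicted = predict_price(900)
print(f"{predicted:.2f}")  # 76.15
```

Calling `predict_price(1500)` raises a `ValueError`, because 1500 watts lies beyond the largest Power value in the sample.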

4. RESIDUALS

a) RESID | -8.93 | 16.61 | -13.39 | -5.32 | 4.68 | -5.32 | 14.68 | 7.45 | -7.55 | -2.55 | -0.39

b) SSResid = 948.2
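The residuals and SSResid can also be reproduced without the calculator's RESID list. The sketch below refits the line from the raw data (so no rounding carries over) and then computes each residual as observed Price minus predicted Price.

```python
power = [1100, 700, 700, 1200, 1200, 1200, 1200, 1000, 1000, 1000, 700]
price = [80, 80, 50, 90, 100, 90, 110, 90, 75, 80, 63]

n = len(power)
x_bar = sum(power) / n
y_bar = sum(price) / n

# Least-squares slope and intercept.
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(power, price))
     / sum((x - x_bar) ** 2 for x in power))
a = y_bar - b * x_bar

# Residual = observed y minus predicted y.
residuals = [y - (a + b * x) for x, y in zip(power, price)]
ss_resid = sum(e ** 2 for e in residuals)

print([round(e, 2) for e in residuals])
print(round(ss_resid, 1))  # 948.2
```

The residuals sum to zero (up to rounding error), which is a quick check that the line was fit correctly.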

5. EVALUATING THE MODEL

a) $s_e = \sqrt{\dfrac{\text{SSResid}}{n - 2}} = \sqrt{\dfrac{948.2}{9}} \approx 10.26$

b) $r^2 = 0.654$

c) The fit of this model is fair – 65.4% of the variation in Price is explained by the approximate linear relationship with Watts, and a typical deviation in Price from what the model predicts (i.e. a typical residual) is $10.
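Both summary measures follow directly from SSResid and the total sum of squares, SSTo. The sketch below takes SSResid = 948.2 from part 4b and assumes the standard definitions $s_e = \sqrt{\text{SSResid}/(n-2)}$ and $r^2 = 1 - \text{SSResid}/\text{SSTo}$.

```python
import math

price = [80, 80, 50, 90, 100, 90, 110, 90, 75, 80, 63]
ss_resid = 948.2  # from part 4b

n = len(price)
y_bar = sum(price) / n
# SSTo: total variation of Price about its mean.
ss_total = sum((y - y_bar) ** 2 for y in price)

s_e = math.sqrt(ss_resid / (n - 2))   # typical size of a residual
r_squared = 1 - ss_resid / ss_total   # share of variation explained

print(f"s_e = {s_e:.2f}, r^2 = {r_squared:.3f}")  # s_e = 10.26, r^2 = 0.654
```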

Chapter 15 – Inferences About the Slope

How is resting body temperature (°F) dependent on heart rate (beats per minute)? The Minitab output below is an analysis of body temperature data from the Journal of Statistics Education Data Archive (Shoemaker, 1996) using a simple linear regression model.

[Figure: scatterplot of Temp (°F) vs. Heart Rate (beats per minute)]

The regression equation is

Temp = 96.3 + 0.0263 Heart Rate

Predictor | Coef | SE Coef | T | P
Constant | 96.3068 | 0.6577 | 146.43 | 0.000
Heart Rate | 0.026335 | 0.008876 | 2.97 | 0.004

S = 0.711969 R-Sq = 6.4% R-Sq(adj) = 5.7%

Analysis of Variance

Source | DF | SS | MS | F | P
Regression | 1 | 4.4618 | 4.4618 | 8.80 | 0.004
Residual Error | 128 | 64.8832 | 0.5069 | |
Total | 129 | 69.3449 | | |

HYPOTHESIS TEST FOR $\beta$

Perform a hypothesis test for model utility using the steps below.

A. POPULATION

▪ Describe (in words) the population characteristic

▪ State H0 and Ha (using $\beta$)

B. STATISTICAL METHOD

▪ Set a reasonable level for $\alpha$

▪ Write the formula of the test statistic (using the hypothesized value from H0)

C. SAMPLE

▪ Describe the sample: Determine b, $s_b$, and the residual df.

▪ Check that the sample meets the necessary assumptions.

D. STATISTICAL RESULTS

▪ Obtain the value of the test statistic and the p-value.

E. CONCLUSION

▪ Reject H0 OR Fail to Reject H0

▪ Make a concluding statement

CONFIDENCE INTERVAL FOR $\beta$

Construct a 95% confidence interval for the slope of the regression line.

A. POPULATION

▪ Describe (in words) the population characteristic (same as for hypothesis test).

B. STATISTICAL METHOD

▪ How much confidence is desired?

▪ Write the formula for the confidence interval.

C. SAMPLE (same as for hypothesis test)

▪ Describe the sample: Determine b, $s_b$, and the residual df.

▪ Check that the sample meets the necessary assumptions.

D. STATISTICAL RESULTS

▪ Construct the CI using the formula from part B. SHOW YOUR WORK.

E. CONCLUSION

▪ Make a concluding statement for your CI.

Solution to Inferences About the Slope

HYPOTHESIS TEST FOR $\beta$

A. POPULATION

▪ $\beta$ = average increase in temperature for each additional beat per minute in heart rate

▪ H0: $\beta = 0$, Ha: $\beta \neq 0$

B. STATISTICAL METHOD

▪ $\alpha = 0.05$

▪ $t = \dfrac{b - 0}{s_b}$

C. SAMPLE

▪ b = 0.0263, $s_b$ = 0.008876, residual df = 128

▪ The scatterplot of the data shows a linear pattern; variability of the points does not appear to change with x.

D. STATISTICAL RESULTS

▪ t = 2.97, p-value = 0.004

E. CONCLUSION

▪ Reject H0

▪ The data provides sufficient evidence to conclude that the average increase in temperature for each additional beat per minute in heart rate is not zero (i.e. that heart rate is useful in determining body temperature).
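The test statistic and p-value in the Minitab output can be reproduced from b, $s_b$, and the residual df alone. The sketch below is one way to do it with only the Python standard library: it integrates the t density numerically with Simpson's rule rather than calling a statistics package. The helper names `t_pdf` and `two_sided_p_value` are illustrative, not part of any library.

```python
import math


def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def two_sided_p_value(t_stat, df, steps=2000):
    """Two-sided p-value: 2 * P(T > |t|), via Simpson's rule on [0, |t|]."""
    upper = abs(t_stat)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # P(0 < T < |t|)
    return 2 * (0.5 - area)


b, s_b, df = 0.026335, 0.008876, 128   # from the Minitab output
t_stat = (b - 0) / s_b                 # hypothesized slope is 0
p_value = two_sided_p_value(t_stat, df)

print(f"t = {t_stat:.2f}, p-value = {p_value:.3f}")
print("Reject H0" if p_value < 0.05 else "Fail to reject H0")
```

With these inputs the p-value falls well below 0.05, which matches the decision to reject H0 above.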

CONFIDENCE INTERVAL FOR $\beta$

A. POPULATION

▪ $\beta$ = average increase in temperature for each additional beat per minute in heart rate

B. STATISTICAL METHOD

▪ 95% confidence

▪ $b \pm (t \text{ critical value}) \, s_b$

C. SAMPLE

▪ b = 0.0263, $s_b$ = 0.008876, residual df = 128

▪ The scatterplot of the data shows a linear pattern; variability of the points does not appear to change with x.

D. STATISTICAL RESULTS

▪ t critical value = 1.98

▪ 95% CI = $0.0263 \pm 1.98(0.008876) = (0.0087, 0.0439)$

E. CONCLUSION

▪ We are 95% confident that the average increase in temperature for each additional beat per minute in heart rate is between 0.0087 and 0.0439 °F.
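The interval arithmetic in part D is short enough to check in a few lines. The sketch below takes the t critical value of 1.98 as given above rather than computing it; the function name `slope_confidence_interval` is only illustrative.

```python
def slope_confidence_interval(b, s_b, t_crit):
    """Return (lower, upper) for b +/- (t critical value) * s_b."""
    margin = t_crit * s_b
    return b - margin, b + margin


lower, upper = slope_confidence_interval(b=0.0263, s_b=0.008876, t_crit=1.98)
print(f"({lower:.4f}, {upper:.4f})")  # (0.0087, 0.0439)
```

The interval does not contain 0, which agrees with rejecting H0: $\beta = 0$ at the 0.05 level in the hypothesis test.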

Introduction to Statistics and Data Analysis

Chapter 13 – Model Adequacy

Every day, James Bond puffs his way through 70 high-tar, unfiltered cigarettes made with a blend of black Turkish tobaccos (Men’s Health Forum). Doesn’t he know that smoking is bad for your health? Cigarette smoke contains carbon monoxide, a colorless, odorless gas that reduces the ability of the blood to carry oxygen.

The Federal Trade Commission evaluates cigarettes to determine their tar (mg) and carbon monoxide (mg) contents. An analysis of a random sample of 35 brands using a simple linear regression model had the following results:

[Figure: scatterplot of CO (mg) vs. Tar (mg)]

The regression equation is

CO = 4.86 + 0.663 Tar

Predictor | Coef | SE Coef | T | P
Constant | 4.858 | 1.005 | 4.83 | 0.000
Tar | 0.66323 | 0.08169 | 8.12 | 0.000

S = 2.26717 R-Sq = 66.6% R-Sq(adj) = 65.6%

Analysis of Variance

Source | DF | SS | MS | F | P
Regression | 1 | 338.78 | 338.78 | 65.91 | 0.000
Residual Error | 33 | 169.62 | 5.14 | |
Total | 34 | 508.40 | | |

1. Getting started…

a) State the simple linear regression model and its assumptions.

b) What is the equation of the estimated regression line?

c) Identify the values of $s_e$ and $r^2$. How well does the model perform?

d) Use the scatterplot to roughly check the model assumptions.

2. You can better check model adequacy (assumptions and performance) by examining the residuals (errors) for the data.

Standardized Residuals vs X. Examine the five plots below.

Which of the plots shows an unusual observation? ________

a potentially influential observation? ________

evidence of nonconstant variance? ________

evidence of curvi-linear relationship? ________

[Figure: five plots of standardized residuals vs. x, labeled A through E]

Normal Probability Plot of Standardized Residuals

Which of the following plots show evidence of non-normality? EXPLAIN.

[Figure: three normal probability plots of standardized residuals, labeled #1 through #3]

3. Back to the cigarette data…

a) Do the residual plots below show

Unusual observations? Influential observations?

Evidence of nonconstant variance?

Evidence of curvi-linear relationship?

Evidence of non-normality?

[Figure: two residual plots for the cigarette data]

b) Have the assumptions of the simple linear regression model been met? EXPLAIN.

Solution to Model Adequacy

1. Getting Started

a) $Y = \alpha + \beta X + e$

Assume: The distribution of $e$ at any particular X value has mean value 0, has the same standard deviation, and is normal. The random deviations $e_1, e_2, \ldots, e_n$ are independent.

b) $\hat{y} = 4.86 + 0.663x$, where $x$ is Tar (mg) and $\hat{y}$ is the predicted CO (mg)

c) $s_e = 2.27$, $r^2 = 0.666$

The model performs fairly well: 66.6% of the variability in CO is explained by the approximate linear relationship with Tar, and a typical deviation from the predicted CO level is 2.27 mg.

d) The scatterplot of the data shows a mostly linear pattern; variability of the points does not appear to change with x.
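The values of $s_e$ and $r^2$ in part c) are read off the Minitab output (S and R-Sq), but they can also be recovered from the Analysis of Variance table. The sketch below assumes the standard relationships $s_e = \sqrt{\text{SSResid}/\text{df}}$, $r^2 = 1 - \text{SSResid}/\text{SSTo}$, and $F = \text{MSRegr}/\text{MSResid}$.

```python
import math

# Sums of squares and df from the Analysis of Variance table.
ss_regr, ss_resid, ss_total = 338.78, 169.62, 508.40
df_regr, df_resid = 1, 33

s_e = math.sqrt(ss_resid / df_resid)                  # Minitab's "S"
r_squared = 1 - ss_resid / ss_total                   # Minitab's "R-Sq"
f_stat = (ss_regr / df_regr) / (ss_resid / df_resid)  # MS Regression / MS Residual

print(f"s_e = {s_e:.2f}, r^2 = {r_squared:.3f}, F = {f_stat:.2f}")
# s_e = 2.27, r^2 = 0.666, F = 65.91
```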

2. Doing Better

Standardized Residuals vs X

Plot D shows an unusual observation (large residual)

Plot B shows a potentially influential observation (small residual, but x is far from rest of data)

Plot E shows nonconstant variance (variance increases as x increases)

Plot A shows a curvi-linear relationship

Normal Probability Plot of Standardized Residuals

Plots #1 and #2 show evidence of non-normality (curved, points outside confidence bands)
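The plots above are built from standardized residuals. The raw cigarette data are not listed on this sheet, so the sketch below uses the microwave data from the Chapter 3 review to show how such values could be computed. It assumes the usual formula, residual divided by $s_e\sqrt{1 - h_i}$ with $h_i = 1/n + (x_i - \bar{x})^2 / S_{xx}$, and the cutoff of 2 for flagging an unusual observation is a common rule of thumb, not something stated in this handout.

```python
import math

# Microwave data from the Chapter 3 review (illustration only).
power = [1100, 700, 700, 1200, 1200, 1200, 1200, 1000, 1000, 1000, 700]
price = [80, 80, 50, 90, 100, 90, 110, 90, 75, 80, 63]

n = len(power)
x_bar = sum(power) / n
y_bar = sum(price) / n
s_xx = sum((x - x_bar) ** 2 for x in power)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(power, price)) / s_xx
a = y_bar - b * x_bar

residuals = [y - (a + b * x) for x, y in zip(power, price)]
s_e = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Each residual is divided by its own estimated standard deviation,
# s_e * sqrt(1 - h_i), where h_i grows as x_i moves away from x_bar.
standardized = [
    e / (s_e * math.sqrt(1 - 1 / n - (x - x_bar) ** 2 / s_xx))
    for x, e in zip(power, residuals)
]

for x, r in zip(power, standardized):
    flag = "  <- unusual?" if abs(r) > 2 else ""  # assumed cutoff of 2
    print(f"{x:5d}  {r:6.2f}{flag}")
```

For the microwave data no standardized residual exceeds 2 in absolute value, so this rule of thumb flags no unusual observations there. Plotting `standardized` against `power` gives the kind of plot examined in this section.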

3. Back to the Cigarette Data

a) Evaluate Residual Plots

Possible unusual observations; no influential observations

Some evidence of nonconstant variance (variability may be increasing with x)

No evidence of curvi-linear relationship

No evidence of non-normality

b) There is some doubt that the assumptions of the model have been met – in particular, the constant variance assumption. This makes any results from the model untrustworthy.
