Chapter 11

Correlation and Regression

Estimation of Relationship

Statistics as a Core Knowledge of Management

“The social scientist will acquire his dignity and his strength when he has worked out his method. He will do that by turning into opportunity the need of the great society for instruments of analysis by which an invisible and most stupendously difficult environment can be made intelligible.” (Walter Lippman in Public Opinion)

Issue:

“Understanding the population using multivariate data.”

Regression

The framework:

Identify one variable as the dependent variable.

The rest are independent variables (predictors, explanatory variables, etc.)

Goal of data analysis:

a. Forecasting

b. Estimation of the effect

Examples for bivariate data:

1. Income and Vacation Expenditures

Target Population: families living in a city

2. Sales Area and Sales

Target Population: All stores of a chain

3. Price and Unit Sales for a New Shampoo Formula

Target Population: All test settings

4. Distance to the UW Campus and Apartment Rent

Target Population: All apartments in Seattle

Four Steps for Regression

A. Scatterplot and Correlation Coefficient

B. Least Squares Regression Equation

C. Specification of Population Regression and Data Collection Model

D. Statistical Inference on the Population Regression Coefficients

Scatterplot

• Display of Bivariate Data

Y = Salary vs. X = Years Experience (n = 6 employees)

We learn from the plot:

(a) Increasing

(b) Linear

(c) Not exact, but strong

(Correlation r = 0.8667)

Example 1: Mortgage Rates & Fees

Y = Interest Rate vs. X = Loan Fee

Description of the Relationship

a)

b)

c)

(Correlation is r = – 0.441)

Example 2: The Stock Market

Y = Today’s vs. X = Yesterday’s Percent Change

(Correlation is r = 0.18)

Example 3: Maximizing Yield

Y = Output Yield vs. X = Temperature for an industrial process.

(a)

(b)

(c)

(Correlation r = – 0.0155)

Example 4 : Telecommunications

Y = Circuit Miles vs. X = Investment, for telecommunications firms

Special Feature: Unequal Variability

Variability is stabilized by taking logarithms.

Correlation r = 0.820 after the log transformation

Example 5: Cost and Quantity

Y = Cost vs. X = Number Produced


Special Feature: An outlier is visible:

• The correlation coefficient is unreliable

Scatterplot and Correlation Coefficient

Computer Demonstration

Computing the Sample Covariance and Correlation Coefficient, r

Formula (see text page 439)

Covariance = Σ (Xi − X̄)(Yi − Ȳ) / (n − 1)

r = Covariance / (SDX × SDY)

Example:

n = 6

SDX = SDY =

Covariance =
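The covariance and correlation formulas above can be sketched directly in code; the n = 6 data points below are invented for illustration.

```python
from statistics import mean, stdev

# Hypothetical n = 6 data (not the chapter's actual numbers).
x = [5, 8, 12, 15, 20, 25]
y = [22, 30, 35, 43, 55, 58]
n = len(x)

xbar, ybar = mean(x), mean(y)

# Sample covariance: sum of cross-products of deviations, divided by n - 1.
cov = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)

# Correlation: covariance rescaled by the two sample standard deviations.
r = cov / (stdev(x) * stdev(y))
```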

The Least-Squares Regression Line

Y = a + bX

The regression line summarizes the scatterplot for predicting Y from X.

Least-squares line: the line that minimizes the sum of squared deviations (actual Y − predicted Y).

a) Computing the slope and the intercept

slope b = r × (SDY / SDX) = 1.673

intercept a = Ȳ − b X̄ = 15.318

Excel Regression Menu Output

Performance of the Regression Line

Checking by Individual Data Points

a. Predicted Value

b. Residual = Data Value − Predicted Value

Overall Measures

a. Standard Error of Estimate

b. R-Squared or Adjusted R-Squared

1. Predicted Values and Residuals

Predicted Value comes from Least-Squares Line

Mary (with 20 years of experience) has predicted salary

15.318 + 1.673 × 20 ≈ 48.8

Residual = actual Y - predicted Y

Mary’s residual = 55 – 48.8 = 6.2

She earns about $6,200 more than the predicted salary for a person with 20 years of experience

A person who earns less than predicted will have a negative residual
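Mary's predicted value and residual can be checked with a few lines of code, using the chapter's fitted line:

```python
# The chapter's fitted line: predicted salary = 15.318 + 1.673 * years
# (salaries in $1000s).
a, b = 15.318, 1.673

def predicted_salary(years):
    return a + b * years

mary_predicted = predicted_salary(20)   # about 48.8, i.e. $48,800
mary_residual = 55 - mary_predicted     # about 6.2: ~$6,200 above prediction
```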

2. Understanding the SE of Estimate & R-Squared

2 (a): Computing the Standard Error of Estimate

Approximate size of prediction errors (residuals)

Se = √[ Σ (Yi − Ŷi)² / (n − 2) ]

Example (Salary vs. Experience)

Se = 6.52


Interpretation

Actual salaries are about 6.52 (i.e., $6,520) away from the predicted salaries.

About 68% of the data are within one “standard error of estimate” of the least-squares line
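A minimal sketch of the Se formula; the residuals below are invented for illustration (the chapter's example gives Se = 6.52).

```python
import math

# Se = sqrt( sum of squared residuals / (n - 2) ).
# Hypothetical residuals; for any least-squares fit they sum to zero.
residuals = [6.2, -4.1, 3.0, -7.5, 1.8, 0.6]
n = len(residuals)

se = math.sqrt(sum(e * e for e in residuals) / (n - 2))
```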

2 (b): Computing the R-Squared, Adjusted R-Squared

|Prediction Method |Unexplained SS                 |Unexplained Variance                       |
|Sample Mean       |Σ (Yi − Ȳ)²                    |Σ (Yi − Ȳ)² / (n − 1)                      |
|Regression        |Σ (Yi − Ŷi)²                   |Σ (Yi − Ŷi)² / (n − 2)                     |
|Ratio             |Regression SS / Sample-Mean SS |Regression Variance / Sample-Mean Variance |
|R-Squared         |R² = 1 − Ratio (unadjusted)    |Adjusted R² = 1 − Ratio (adjusted)         |
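The R² computation can be sketched as follows; the observed and fitted values below are invented for illustration.

```python
from statistics import mean

# y are hypothetical observed values; yhat are hypothetical fitted values.
y    = [22, 30, 35, 43, 55, 58]
yhat = [23.3, 28.9, 36.4, 42.0, 51.4, 60.8]

ybar = mean(y)
ss_mean       = sum((yi - ybar) ** 2 for yi in y)               # unexplained by the sample mean
ss_regression = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # unexplained by the regression

r_squared = 1 - ss_regression / ss_mean   # fraction of variation explained
```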

Population Regression as a Model of Data Collection

Population Regression

Y = α + βX + ε

α + βX Population relationship, on average

ε Randomness of individuals

ε follows N(0, σ)

Inference for the Population Slope, β

Standard Error of the Slope

Sb = Se / (SDX × √(n − 1))

Approximately how far the observed slope b is from the population slope, β.

Example (Salary vs. Experience)

Sb =

95% Confidence Interval for β.
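A sketch of the interval b ± t × Sb. The slope b = 1.673 and Se = 6.52 come from the chapter's example; SDX and the t table value below are stated as assumptions for illustration.

```python
import math

b = 1.673           # fitted slope (chapter's example)
se = 6.52           # standard error of estimate (chapter's example)
sdx = 7.47          # hypothetical sample SD of X
n = 6               # sample size
t_star = 2.776      # two-sided 95% t value for n - 2 = 4 degrees of freedom

sb = se / (sdx * math.sqrt(n - 1))        # standard error of the slope
ci = (b - t_star * sb, b + t_star * sb)   # 95% confidence interval for beta
```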

Test of Significance of the Slope

Why Test?

Illusory regression: the data may seem to show a relationship when, in fact, the population is just random.

Simulation: Samples of size n = 10

from a population with no relationship (correlation 0)

Sample correlations are not zero!


Regression and Causality

Regression does not imply causality

Example:

[Figure: scatterplot with axes X and Y, marking the point of the sample means.]
