Chapter 11
Correlation and Regression
Estimation of Relationship
Statistics as a Core Knowledge of Management
“The social scientist will acquire his dignity and his strength when he has worked out his method. He will do that by turning into opportunity the need of the great society for instruments of analysis by which an invisible and most stupendously difficult environment can be made intelligible.” (Walter Lippmann in Public Opinion)
Issue:
“Understanding the population using multivariate data.”
Regression
The framework:
Identify one variable as the dependent variable.
The rest are independent variables (predictors, explanatory variables, etc.)
Goal of data analysis:
a. Forecasting
b. Estimation of the effect
Examples for bivariate data:
1. Income and Vacation Expenditures
Target Population: families living in a city
2. Sales Area and Sales
Target Population: All stores of a chain
3. Price and Unit Sales for a New Shampoo Formula
Target Population: All test settings
4. Distance to the UW Campus and Apartment Rent
Target Population: All apartments in Seattle
Four Steps for Regression
A. Scatterplot and Correlation Coefficient
B. Least Squares Regression Equation
C. Specification of Population Regression and Data Collection Model
D. Statistical Inference on the Population Regression Coefficients
Scatterplot
• Display of Bivariate Data
Y = Salary vs. X = Years Experience (n = 6 employees)
We learn from the plot:
(a) Increasing
(b) Linear
(c) Not exact, but strong
(Correlation r = 0.8667)
Example 1: Mortgage Rates & Fees
Y = Interest Rate vs. X = Loan Fee
Description of the Relationship
a)
b)
c)
(Correlation is r = – 0.441)
Example 2: The Stock Market
Y = Today’s vs. X = Yesterday’s Percent Change
(Correlation is r = 0.18)
Example 3: Maximizing Yield
Y = Output Yield vs. X = Temperature for an industrial process.
[Scatterplot: Output Yield vs. Temperature]
(a)
(b)
(c)
(Correlation r = – 0.0155)
Example 4 : Telecommunications
Y = Circuit Miles vs. X = Investment (lower left)
For telecommunications firms
[Scatterplot: Circuit Miles vs. Investment]
Special Feature: Unequal Variability
Variability is stabilized by taking logarithms.
Correlation r = 0.820 after the log transformation
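A small simulation can illustrate why logarithms stabilize this kind of variability. The data below are made up for illustration only (they are not the telecommunications figures from the lecture): Y has multiplicative noise, so its spread grows with X, and taking logs turns that into additive noise of constant spread.

```python
import numpy as np

# Hypothetical data whose spread grows with X (illustrative only)
rng = np.random.default_rng(1)
x = rng.uniform(1, 100, size=200)
y = x * np.exp(rng.normal(scale=0.5, size=200))  # multiplicative noise

# On the log scale: log(y) = log(x) + noise of constant spread,
# so a linear summary (and the correlation r) is appropriate there
r_raw = np.corrcoef(x, y)[0, 1]
r_log = np.corrcoef(np.log(x), np.log(y))[0, 1]
```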
Example 5: Cost and Quantity
Y = Cost vs. X = Number Produced
[Scatterplot: Cost vs. Number Produced]
Special Feature: An outlier is visible:
• The correlation coefficient is unreliable
Scatterplot and Correlation Coefficient
Computer Demonstration
Computing the Sample Covariance and Correlation Coefficient, r
Formula (see text page 439)
Covariance = Σ(X − X̄)(Y − Ȳ) / (n − 1)
r = Covariance / (SDX × SDY)
Example:
n = 6
SDX = SDY =
Covariance =
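A minimal numeric sketch of the two formulas. The six (x, y) pairs below are made up, since the lecture's actual data values are not reproduced here:

```python
import numpy as np

# Illustrative sample only; not the lecture's actual six employees
x = np.array([5.0, 10.0, 12.0, 15.0, 20.0, 25.0])   # X values
y = np.array([25.0, 33.0, 38.0, 40.0, 55.0, 53.0])  # Y values
n = len(x)

# Covariance: sum of products of deviations from the means, over n - 1
covariance = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# Correlation: covariance divided by the product of the two SDs
r = covariance / (x.std(ddof=1) * y.std(ddof=1))
```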
The Least-Squares Regression Line
Y = a + bX
The regression line summarizes the scatterplot for predicting Y from X.
Least-squares line: the line that minimizes the sum of squared deviations (actual Y − predicted Y).
a) Computing the slope and the intercept
slope b = r × (SDY / SDX) = 1.673
intercept a = Ȳ − b × X̄ = 15.318
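The slope and intercept formulas can be checked in a few lines with illustrative data (not the lecture's salary sample); `np.polyfit` minimizes the same sum of squared deviations, so it produces the same line:

```python
import numpy as np

# Illustrative data, not the lecture's salary example
x = np.array([5.0, 10.0, 12.0, 15.0, 20.0, 25.0])
y = np.array([25.0, 33.0, 38.0, 40.0, 55.0, 53.0])

r = np.corrcoef(x, y)[0, 1]
b = r * y.std(ddof=1) / x.std(ddof=1)  # slope: b = r * SDY / SDX
a = y.mean() - b * x.mean()            # intercept: a = Ybar - b * Xbar

# np.polyfit fits the same least-squares line, so the results agree
slope, intercept = np.polyfit(x, y, 1)
```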
Excel Regression Menu Output
Performance of the Regression Line
Checking by Individual Data Points
• Predicted Value
• Residual = Data Value − Predicted Value
Overall Measures
• Standard Error of Estimate
• R-Squared or Adjusted R-Squared
1. Predicted Values and Residuals
Predicted Value comes from Least-Squares Line
Mary (with 20 years of experience) has predicted salary
15.32 + 1.673(20) = 48.8
Residual = actual Y - predicted Y
Mary’s residual = 55 – 48.8 = 6.2
She earns about $6,200 more than the predicted salary for a person with 20 years of experience
A person who earns less than predicted will have a negative residual
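Mary's numbers from the lecture, computed directly from the fitted line:

```python
# Fitted line from the lecture: predicted salary = 15.32 + 1.673 * years
a, b = 15.32, 1.673

years, actual = 20, 55          # Mary: 20 years of experience, salary 55
predicted = a + b * years       # 48.78, rounded to 48.8 in the lecture
residual = actual - predicted   # positive: she earns more than predicted
```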
2. Understanding the SE of Estimate & R-Squared
2 (a): Computing the Standard Error of Estimate
Approximate size of prediction errors (residuals)
Se = √[ Σ(Y − Ŷ)² / (n − 2) ]
Example (Salary vs. Experience)
Se = 6.52
Interpretation
Actual salaries are about 6.52 (i.e., $6,520) away from the predicted salaries.
About 68% of the data are within one “standard error of estimate” of the least-squares line
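The standard error of estimate is computed from the residuals, with n − 2 in the denominator because two coefficients (slope and intercept) were estimated. A sketch with illustrative data:

```python
import numpy as np

# Illustrative data, not the lecture's salary sample
x = np.array([5.0, 10.0, 12.0, 15.0, 20.0, 25.0])
y = np.array([25.0, 33.0, 38.0, 40.0, 55.0, 53.0])
n = len(x)

b, a = np.polyfit(x, y, 1)   # least-squares slope and intercept
residuals = y - (a + b * x)  # actual Y minus predicted Y

# Se: typical size of a prediction error, with n - 2 degrees of freedom
Se = np.sqrt(np.sum(residuals**2) / (n - 2))
```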
2 (b): Computing the R-Squared, Adjusted R-Squared
|Prediction Method |Unexplained SS           |Unexplained Variance                          |
|Sample Mean       |Σ(Y − Ȳ)²               |Σ(Y − Ȳ)² / (n − 1)                           |
|Regression        |Σ(Y − Ŷ)²               |Σ(Y − Ŷ)² / (n − 2)                           |
|Ratio             |Σ(Y − Ŷ)² / Σ(Y − Ȳ)²  |[Σ(Y − Ŷ)² / (n − 2)] / [Σ(Y − Ȳ)² / (n − 1)] |
|R-Squared         |Unadjusted: 1 − SS ratio |Adjusted: 1 − variance ratio                  |
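The table's ratios can be computed directly: unadjusted R² compares the sums of squares, adjusted R² compares the variances. Illustrative data again:

```python
import numpy as np

# Illustrative data only
x = np.array([5.0, 10.0, 12.0, 15.0, 20.0, 25.0])
y = np.array([25.0, 33.0, 38.0, 40.0, 55.0, 53.0])
n = len(x)

b, a = np.polyfit(x, y, 1)
sse = np.sum((y - (a + b * x))**2)  # unexplained SS using the regression
sst = np.sum((y - y.mean())**2)     # unexplained SS using the sample mean

r_squared = 1 - sse / sst                              # unadjusted
adj_r_squared = 1 - (sse / (n - 2)) / (sst / (n - 1))  # adjusted
```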
Population Regression as Model of Data Collection
Population Regression
Y = α + βX + ε
α + βX Population relationship, on average
ε Randomness of individuals
ε follows N(0, σ)
Inference for the Population Slope, β
Standard Error of the Slope
Sb = Se / √[ Σ(X − X̄)² ]
Approximately how far the observed slope b is from the population slope, β.
Example (Salary vs. Experience)
Sb =
95% Confidence Interval for β: b ± t × Sb, with t from the t table for n − 2 degrees of freedom.
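A numpy sketch of the interval and the t test, with the t table value hard-coded for n = 6 (so n − 2 = 4 degrees of freedom); the data are illustrative, not the lecture's sample:

```python
import numpy as np

# Illustrative data (n = 6, so n - 2 = 4 degrees of freedom)
x = np.array([5.0, 10.0, 12.0, 15.0, 20.0, 25.0])
y = np.array([25.0, 33.0, 38.0, 40.0, 55.0, 53.0])
n = len(x)

b, a = np.polyfit(x, y, 1)
Se = np.sqrt(np.sum((y - (a + b * x))**2) / (n - 2))
Sb = Se / np.sqrt(np.sum((x - x.mean())**2))  # standard error of the slope

t_table = 2.776  # t value for 95% confidence, 4 degrees of freedom
ci = (b - t_table * Sb, b + t_table * Sb)

# Test of significance: the slope differs significantly from 0
# when |b / Sb| exceeds the t table value
significant = abs(b / Sb) > t_table
```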
Test of Significance of the Slope
Why Test?
Illusory Regression: a sample may seem to show a relationship when, in fact, the population is just random.
Simulation: Samples of size n = 10
from a population with no relationship (correlation 0)
Sample correlations are not zero!
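The simulation is easy to reproduce: draw X and Y independently (population correlation 0) many times and look at the sample correlations. They center near 0, but individual samples of size 10 can easily show correlations large enough to suggest an illusory relationship.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1000 samples of size n = 10 from a population with no relationship
corrs = np.array([
    np.corrcoef(rng.normal(size=10), rng.normal(size=10))[0, 1]
    for _ in range(1000)
])

mean_r = corrs.mean()          # close to the population value, 0
largest = np.abs(corrs).max()  # individual samples stray far from 0
```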
Regression and Causality
Regression does not imply causality
Example:
[Figure: the least-squares line passes through the point of the sample means (X̄, Ȳ)]