Outliers, Leverage & Influential points in regression
A famous data set from Freedman et al. (1991), ‘Statistics’, gives the per capita consumption of cigarettes in various countries in 1930 and the death rates from lung cancer (number of deaths per million people) in 1950. Below are the data and a scatter plot with two regression lines. One line is fitted to all 11 observations; the other (dotted line) is fitted without the observation for the USA.
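|No. |Country |Cigarettes per capita (1930) |Deaths per million (1950) |
|1 |Australia |480 |180 |
|2 |Canada |500 |150 |
|3 |Denmark |380 |170 |
|4 |Finland |1100 |350 |
|5 |Great Britain |1100 |460 |
|6 |Iceland |230 |60 |
|7 |Netherlands |490 |240 |
|8 |Norway |250 |90 |
|9 |Sweden |300 |110 |
|10 |Switzerland |510 |250 |
|11 |USA |1300 |200 |
[Figure: scatter plot of lung cancer deaths against cigarette consumption, with the regression line for all 11 countries and, dotted, the line fitted without the USA.]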
Notice that the two lines are very different, and so are the R-square values (see the computer output below). The whole difference comes from a single observation.
|Regression with all the data: |Regression without the U.S.A. |
|The regression equation is |The regression equation is |
|y = 67.6 + 0.228 x |y = 9.1 + 0.369 x |
|R-Sq = 54.4% |R-Sq = 88.9% |
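The two fits can be checked by hand with the usual least-squares formulas. The following is a minimal sketch in plain Python, assuming the data are transcribed correctly from the table above; the names `DATA` and `fit_line` are illustrative and do not come from the original handout or from the software that produced the output.

```python
# Data transcribed from the table above:
# (country, cigarettes per capita in 1930, lung cancer deaths per million in 1950).
DATA = [
    ("Australia", 480, 180),
    ("Canada", 500, 150),
    ("Denmark", 380, 170),
    ("Finland", 1100, 350),
    ("Great Britain", 1100, 460),
    ("Iceland", 230, 60),
    ("Netherlands", 490, 240),
    ("Norway", 250, 90),
    ("Sweden", 300, 110),
    ("Switzerland", 510, 250),
    ("USA", 1300, 200),
]


def fit_line(points):
    """Ordinary least squares for y = a + b*x.

    Returns (intercept, slope, r_squared). Hypothetical helper, not from the source.
    """
    n = len(points)
    xs = [x for _, x, _ in points]
    ys = [y for _, _, y in points]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # In simple linear regression, R-square is the squared correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


all_fit = fit_line(DATA)
no_usa_fit = fit_line([row for row in DATA if row[0] != "USA"])

for label, (a, b, r2) in [("All data", all_fit), ("Without USA", no_usa_fit)]:
    print(f"{label}: y = {a:.1f} + {b:.3f} x, R-Sq = {100 * r2:.1f}%")
```

Up to rounding, the printed equations and R-square values should agree with the computer output quoted above.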
A point that makes a large difference to the fitted regression is called an ‘influential point’.
Usually influential points have two characteristics:
• They are outliers, i.e. graphically they lie far from the pattern described by the other points. This means that the relationship between x and y is different for that point than for the others. Here the death rate for the USA is lower than we would expect from its high cigarette consumption (health care issues are probably involved).
• They are in a position of high leverage, meaning that the value of the variable x is far from the mean of x. Observations with very low or very high values of x are in positions of high leverage.
In this case the USA is both an outlier and in a position of high leverage, which is why it is an influential observation in the regression. Outliers that are not in a high leverage position, and high leverage points that are not outliers, do not tend to be influential.
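Both ideas can be put into numbers. The sketch below is an illustration, not part of the original handout. It computes, for each country, the standard leverage value for simple regression, h = 1/n + (x - mean of x)^2 / Sxx, and the change in the fitted slope when that country is left out. The leverage formula and the names `DATA`, `slope_of` and `diagnostics` are assumptions introduced here.

```python
# Data transcribed from the table in this handout:
# (country, cigarettes per capita in 1930, lung cancer deaths per million in 1950).
DATA = [
    ("Australia", 480, 180),
    ("Canada", 500, 150),
    ("Denmark", 380, 170),
    ("Finland", 1100, 350),
    ("Great Britain", 1100, 460),
    ("Iceland", 230, 60),
    ("Netherlands", 490, 240),
    ("Norway", 250, 90),
    ("Sweden", 300, 110),
    ("Switzerland", 510, 250),
    ("USA", 1300, 200),
]


def slope_of(points):
    """Least-squares slope of y on x (hypothetical helper)."""
    n = len(points)
    x_bar = sum(x for _, x, _ in points) / n
    y_bar = sum(y for _, _, y in points) / n
    sxx = sum((x - x_bar) ** 2 for _, x, _ in points)
    sxy = sum((x - x_bar) * (y - y_bar) for _, x, y in points)
    return sxy / sxx


n = len(DATA)
x_bar = sum(x for _, x, _ in DATA) / n
sxx = sum((x - x_bar) ** 2 for _, x, _ in DATA)
full_slope = slope_of(DATA)

diagnostics = {}
for i, (country, x, y) in enumerate(DATA):
    # Leverage depends only on how far x is from the mean of x.
    leverage = 1 / n + (x - x_bar) ** 2 / sxx
    # Influence: how much the slope moves when this one point is deleted.
    slope_change = slope_of(DATA[:i] + DATA[i + 1:]) - full_slope
    diagnostics[country] = (leverage, slope_change)

for country, (leverage, slope_change) in diagnostics.items():
    print(f"{country:14s} leverage = {leverage:.3f}  slope change = {slope_change:+.3f}")
```

With these data the USA has both the largest leverage and the largest change in slope when deleted. Finland and Great Britain share the next highest leverage, but removing either one moves the slope much less, since they lie closer to the pattern of the other points.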