Outliers, Leverage, and Influence


James H. Steiger

Department of Psychology and Human Development, Vanderbilt University


1 Introduction

2 Significance Tests for Outliers and Influential Cases

An Outlier Test

A Significance Test for Influence

3 Problems with Multiple Outliers

Masking

Swamping

4 What Should We Do with Outliers?

Contaminated Observations

Rare Cases

An Incorrect Model

5 Remedial Actions


Introduction

After an initial fit of a model (including possible nonlinear transformations of X and/or Y), it may become clear that certain observations are unusual, in the sense that they are extremely atypical in X and/or Y, in either the univariate or the bivariate sense. In this module, we develop and discuss some common techniques, and related terminology, for identifying and evaluating unusual observations.


In examining diagnostics based on residuals (or rescaled residuals), we begin by recalling that our basic model includes a model for the regression errors and, indirectly, for the residuals themselves. If the fitted model yields residuals that do not appear reasonably consistent with that model, then we must question the model and/or its assumptions.
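
To make "a model for the residuals" concrete, here is a short derivation in standard linear-model matrix notation. The symbols (design matrix X, hat matrix H, diagonal elements h_ii) are the usual textbook ones and are assumed here rather than taken from these slides. It assumes errors with mean zero and constant variance sigma^2 that are uncorrelated across cases. The point is that even when the errors are homoscedastic, the residuals are not: their variances depend on h_ii, the leverage.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},
  \qquad E(\boldsymbol{\varepsilon}) = \mathbf{0},
  \qquad \operatorname{Var}(\boldsymbol{\varepsilon}) = \sigma^2 \mathbf{I} \\
\mathbf{H} &= \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}',
  \qquad \hat{\mathbf{y}} = \mathbf{H}\mathbf{y} \\
\mathbf{e} &= \mathbf{y} - \hat{\mathbf{y}} = (\mathbf{I} - \mathbf{H})\mathbf{y}
  = (\mathbf{I} - \mathbf{H})\boldsymbol{\varepsilon}
  \quad \text{(since } \mathbf{H}\mathbf{X} = \mathbf{X}\text{)} \\
\operatorname{Var}(\mathbf{e}) &= \sigma^2 (\mathbf{I} - \mathbf{H}),
  \qquad \operatorname{Var}(e_i) = \sigma^2 (1 - h_{ii})
\end{aligned}
```

The last line uses the fact that I - H is symmetric and idempotent.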


A related issue is the influence of each case on estimation and other aspects of the analysis. In some data sets, the observed statistics may change in important ways if just one case is deleted. We will develop methods for detecting and identifying such influential cases. These methods involve two types of diagnostic statistics: distance measures and leverage values. Although our emphasis will still be graphical, we can also develop numerical indices and related statistical tests.
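
The idea of "deleting one case and watching the statistics change" can be sketched directly. The following is a minimal Python illustration (the slides themselves use R). It assumes a simple linear regression with one predictor and an intercept, and uses hypothetical data. It measures influence only by the crude leave-one-out change in the OLS slope, not by any particular index developed later in the module. The function names `ols_fit` and `deletion_slope_changes` are hypothetical helpers.

```python
# Minimal sketch: how much does the OLS slope change when each case is deleted?
# Assumes simple linear regression (one predictor plus intercept); data are hypothetical.

def ols_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def deletion_slope_changes(xs, ys):
    """For each case i, return slope(all data) - slope(data without case i)."""
    _, full_slope = ols_fit(xs, ys)
    changes = []
    for i in range(len(xs)):
        # Refit with case i left out.
        x_del = xs[:i] + xs[i + 1:]
        y_del = ys[:i] + ys[i + 1:]
        _, slope_i = ols_fit(x_del, y_del)
        changes.append(full_slope - slope_i)
    return changes


# Hypothetical data: the last case is far out in X and off the trend of the others.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 2.0]
for i, d in enumerate(deletion_slope_changes(xs, ys)):
    print(f"case {i}: slope change {d:+.3f}")
```

With these numbers, deleting the last case changes the slope far more than deleting any other case. That single case is both unusual in X and off the line, which is exactly the combination the distance and leverage measures are designed to flag.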


A Graphical Demonstration


It will help motivate our discussion if, before diving into technicalities, we get "the big picture" from a simple example. So let's start with a brief digression to a graphical example in RStudio. Open RStudio, make sure the manipulate package is installed, and then, in the console window, type

    > source("")

This will open up a demonstration scatterplot with sliders. If the sliders are not visible, click on the cog icon at the upper left of the graphics window.


A Graphical Demonstration I

You'll see a scatterplot of 20 points on two variables. One of the points, marked in red, has coordinates X = 0, Y = 1.6. The regression line for the points is plotted in blue, and three statistics for the red point are displayed at the top of the plot. These statistics are:


A Graphical Demonstration II

1 Leverage. This is a measure of how unusual the X value of a point is, relative to the X observations as a whole. In a model with an intercept, the leverage of a point has an absolute minimum of 1/n. The red point is right in the middle of the points on the X axis, and has a leverage of 0.05, which is exactly this minimum (1/20 for n = 20).

2 Studentized Residual. This is a measure of the size of the residual, standardized by its estimated standard deviation, where the error variance is estimated from all the data except the red point. The red point is a barely detectable smidgen below the regression line, and has a Studentized Residual of -0.025.
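
Both statistics can be computed in a few lines for a simple linear regression with an intercept. The following Python sketch is not the code behind the RStudio demonstration. It assumes the usual closed-form leverage h_i = 1/n + (x_i - xbar)^2 / sum_j (x_j - xbar)^2 and the standard deletion identity (n - p - 1) s_(i)^2 = (n - p) s^2 - e_i^2 / (1 - h_i), so the "all the data but this point" variance estimate needs no refitting. The function name and data are hypothetical.

```python
import math


def leverage_and_studentized(xs, ys):
    """Return (leverages, externally studentized residuals) for simple OLS.

    Hypothetical helper; assumes one predictor plus an intercept (p = 2)
    and that no case has leverage exactly 1.
    """
    n = len(xs)
    p = 2  # number of regression coefficients: intercept and slope
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Leverage: 1/n plus a term that grows with distance from the mean of X.
    lev = [1.0 / n + (x - x_bar) ** 2 / sxx for x in xs]
    rss = sum(e ** 2 for e in resid)  # equals (n - p) * s^2
    t = []
    for e, h in zip(resid, lev):
        # Error variance estimated without case i, via the deletion identity.
        s2_del = (rss - e ** 2 / (1.0 - h)) / (n - p - 1)
        t.append(e / math.sqrt(s2_del * (1.0 - h)))
    return lev, t


# Hypothetical data with the middle case at the mean of X.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [-2.1, -0.8, 0.3, 1.2, 1.9]
lev, t = leverage_and_studentized(xs, ys)
for x, h, ti in zip(xs, lev, t):
    print(f"x = {x:+.1f}  leverage = {h:.3f}  studentized residual = {ti:+.3f}")
```

Because the case at x = 0 sits at the mean of X, it attains the minimum leverage 1/n (here 1/5 = 0.2). In the same way, a point at the center of the X values in a sample of 20 has leverage 1/20 = 0.05.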
