
STAT 101, Module 4: Fitting Straight Lines

(Textbook: Section 3.5)

From Linear Association and Correlation to Straight Lines

• What do we mean exactly when we say the variables x and y are linearly associated? One hypothetical answer is as follows:

If we had lots of y-values for each x-value, and if we formed the means of the y-values at each x-value, then we would say that y is linearly associated with x if the y-means fell on a straight line:

mean(y|x) = a x + b

Read “mean(y|x)” as “mean of y-values at x”. Instead of “at x” we also say “conditional on x”.

• Example: The variable Height in PennStudents.JMP has its values rounded to half inches. If we round to full inches, we get several Weight values for each Height value, hence we can form the means of the Weights for each value of Height. This is depicted in the plot below:

o the fine dots show the Heights and Weights of individual cases, and

o the fat dots show the means of the Weights for each value of Height.

If we ignore the left-most two points and the right-most point (for which there are only single cases, hence no averaging), we find that the fat dots follow a straight line quite closely. If we had more cases (N larger), the fat dots might follow the line even more closely.

[pic]

Thus we say: Weight is linearly associated with Height if the conditional means of Weight fall near a straight line as a function of Height, which is the case for this dataset.
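For readers who want to see the computation behind the fat dots spelled out, here is a minimal Python sketch (not JMP). The arrays height and weight below are simulated stand-ins for the PennStudents.JMP columns, so the numbers will not match the plot; the point is only the round-then-average logic:

    import numpy as np

    # Simulated stand-ins for the Height and Weight columns of PennStudents.JMP
    rng = np.random.default_rng(0)
    height = np.round(rng.uniform(60, 76, size=200) * 2) / 2      # recorded to half inches
    weight = 5.4 * height - 216.7 + rng.normal(0, 20, size=200)   # weights scattered around a line

    # Round Height to full inches, then average Weight at each rounded Height ("fat dots")
    rounded = np.round(height)
    for h in np.unique(rounded):
        mask = rounded == h
        print(f"Height {h:.0f} in: n = {mask.sum():3d}, mean Weight = {weight[mask].mean():6.1f} lb")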

The Straight Line Equation: Interpretation and Fitting

• The equation for the straight line in the above plot is:

mean(Weight | Height) = 5.412 Height – 216.650

Some comments and definitions:

o The coefficient 5.412 is called the slope.

It expresses the fact that a difference of 1 inch in Height corresponds to a difference of 5.412 lb in Weight on average!

[Note that by talking about differences we avoid the causal trap: we are not saying that “increasing Height increases Weight”. This would be nonsense anyway because we can’t increase Height.]

o The coefficient –216.650 is called the intercept. It would express that students with zero Height would on average have a negative weight of –216.650 lb, which is obvious nonsense. The reason is that zero Height is an unrealistic extrapolation, that is, reaching outside the range of the data. This constant is simply needed to express the best fitting line on the range of the data (Heights from about 60in to about 76in ).

o Together, slope and intercept are called the regression coefficients for the regression of Weight on Height.

o Regression means fitting a straight line to data.

o The variable Height is the predictor variable or simply the predictor. It is also called the x-variable.

o The variable Weight is the response variable or simply the response. It is also called the y-variable.

(We avoid the terms “independent” and “dependent” variables for x and y because these terms conjure up causality: “y depends on x” is proper language only in the controlled experiments familiar from high school science labs. Yet you should know which term means x and which means y, because the terms are very common, even though misleading.)

o The fitted line is written in general as

ŷ = b0 + b1 x

where in our case x=Height and y=Weight, and

b1 is the slope and b0 is the intercept.

If one evaluates the equation at the observed values xi , one writes

ŷi = b0 + b1 xi .

The ‘hat’ on the y is meant to indicate an estimate or prediction of the y-values, not an actually observed y-value. Given the value of the x-variable, ŷ is our best guess of the location of the y-values.

• Q: How does one fit a straight line to x-y data?

How can these values, the slope 5.412 and the intercept –216.650, be found?

A: With the Least Squares or LS method.

Find the values of b0 and b1 that make the sum of squared vertical deviations, Σi ( yi – (b0 + b1 xi) )², as small as possible.

In detail: Form the so-called Residual Sum of Squares,

RSS = Σi ( yi – (b0 + b1 xi) )²

The name of this quantity derives from the name residual for the deviation of the response value from the straight line estimate:

ei = yi – ŷi = yi – (b0 + b1 xi) ,

and therefore:

RSS = Σi ei² = e1² + e2² + … + eN²

This quantity really depends on the choice of the slope b1 and the intercept b0 , hence we should write it as

RSS(b0,b1).

Imagine playing with the values of b0 and b1 till you can’t get the RSS any lower. For the Height-Weight data, for example, if we choose b1 = 5 and b0 = –200, we get RSS = 221,406.8; if, however, we choose b1 = 5.412 and b0 = –216.650, we get RSS = 170,935.4. It turns out that this is the lowest RSS value for the Height-Weight data: no other combination of values for b0 and b1 can beat it. This is what we mean by the best fitting line.
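If you want to repeat this kind of experiment outside JMP, here is a small Python sketch of RSS(b0,b1). The Height-Weight arrays are simulated stand-ins, so the RSS values will not reproduce the numbers quoted above; the mechanics are the same:

    import numpy as np

    def rss(b0, b1, x, y):
        # Residual sum of squares of the line yhat = b0 + b1*x
        residuals = y - (b0 + b1 * x)
        return np.sum(residuals ** 2)

    # Simulated stand-ins for the Height and Weight columns
    rng = np.random.default_rng(1)
    x = rng.uniform(60, 76, size=150)
    y = 5.412 * x - 216.650 + rng.normal(0, 25, size=150)

    print(rss(-200.0, 5.0, x, y))       # a guessed line
    print(rss(-216.650, 5.412, x, y))   # close to the least-squares line: smaller RSS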

Here is an applet that lets you play interactively with the slope b1 and the intercept b0:

It shows five data points in red and a blue line. You can move the line up and down by gripping the intercept point, and you can play with the slope by gripping the right-hand part of the line. The plot shows the squared residuals as actual squares attached to the residuals, so you can think of the RSS as the sum of the areas of the squares. The applet also allows you to move the red data points. The beauty is that you see the residuals and the RSS change in real time as you manipulate the data points (xi,yi) and b0, b1. Unfortunately, the applet does not compute the exact Least Squares line.

(The same applet also allows you to change the fitting criterion: instead of the RSS, you can choose what we may call the RSA: the “residual sum of absolute values”. There is even a third criterion, the vertical sum of squared distances.)

You should absorb the idea that by playing with slope and intercept one can obtain a straight line that is nearest to the data in the sense that the RSS is made smallest. This is called the Least Squares method, and the coefficients b0, b1 that minimize RSS(b0, b1) are called the Least Squares or LS estimates of the intercept and slope.

Here is another applet that allows you to place or enter as many data points as you like:



You can make a guess as to the LS line, and you can then ask for the actual LS line to be shown. The coolest part is that you can move the data points around and the LS line follows in real time. The drawback of this applet is that it does not show the RSS of your guessed line.

• Q: Why squared residuals? Why not absolute values of the residuals?

A: Once again, squares are good for doing algebra. Below we will give explicit formulas for the LS estimates of b0 and b1. There are deeper reasons that have to do with the bell curve, but for those, stay tuned. (Minimizing the sum of absolute values of the residuals can be done also, but there are no explicit formulas. The first of the above applets lets you play with the RSA.)

• Q: Why vertical distances and not orthogonal distances?

A: Because we want to predict Weight from Height. That is, the formula b1x+b0 where x=Height should produce a value that is close to y=Weight, and closeness here means vertical distance in the plot: the distance between the numbers y and b1x+b0 is the vertical distance between the data point (x,y) and the point on the line (x, b1x+b0).

The first of the above applets lets you play with orthogonal distances also. The result is not regression for prediction but something that is called “principal components”, which has an entirely different use.

If you think about it, orthogonal distance is messy. Do you remember working with formulas for orthogonal distances of points from lines in high school? They involve Pythagoras’ formula which blends horizontal and vertical distance into oblique (=tilted) distance.

Regression in Practice

• Data example: Diamond.JMP

• JMP: Analyze > Fit Y by X > … (select variables as usual) > OK

(click little red triangle in top left of the scatterplot window) > Fit Line

Aesthetics: thicker line; (right-click on scatterplot) > Line Width Scale > 2

• Output: only relevant part is the equation under “Linear Fit”

Here:

Price (Singapore dollars) = -259.6259 + 3721.0249 Weight (carats)

• Interpretations:

o Slope: 3721 (rounded)

“diamonds that differ by 1 carat in weight differ on average by S$3721 in price”

Is this meaningful?... (check the range of the weights)

→ need to change units

“diamonds that differ by .1 carat in weight differ on average by S$372.1 in price”

Or, with ‘points’ = 1/100 carat, often used by traders:

“diamonds that differ by 1 point in weight differ on average by S$37.21 in price”

o Intercept: –259.6 (rounded)

“diamonds with zero weight have on average a price of S$–259.6”

Not meaningful: again a case of extrapolation

The intercept is needed to produce the best fitting line in the range of the data. (What is the range here?)

• Predictions: The linear equation can be used to estimate/predict average prices.

o JMP: (click little red triangle below scatterplot, next to “Linear Fit”) > Save Predicteds

This produces a new column with a formula that describes the fitted straight line.

o Every observed weight in the data now has its “estimated average price” in this new column.

o For predictions/estimated average prices for weights that are not in the dataset, add new rows to the data:

JMP: Rows > Add Rows… > … > OK

Now enter the weight values of interest into the new rows, and the predictions will be calculated instantly.

For example, if we enter a weight of 0.30 carats, the predicted price is shown as S$856.68. For a weight of 0.38 carats, the predicted price is S$1154.36. Finally, for a weight of 0.10 carats the prediction is S$112.48.
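These predictions are nothing more than the fitted equation evaluated at the new weights. A short Python check, using the coefficients reported by JMP above, reproduces them:

    # Evaluate the fitted line Price = -259.6259 + 3721.0249 * Weight at new weights
    b0, b1 = -259.6259, 3721.0249
    for w in (0.10, 0.30, 0.38):
        print(f"Weight {w:.2f} carats -> predicted price S${b0 + b1 * w:.2f}")
    # Weight 0.10 carats -> predicted price S$112.48
    # Weight 0.30 carats -> predicted price S$856.68
    # Weight 0.38 carats -> predicted price S$1154.36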

• Comments on Extrapolation: None of the weights 0.10, 0.30, 0.38 exists in the data, hence these are true predictions. The value 0.38 is a slight extrapolation on the high side as the highest weight seen in the data is 0.35. The predicted/estimated mean price of S$1154.36 is higher than the highest price S$1086 seen in the data. Similarly, the weight 0.10 requires a slight extrapolation, this time on the low side as 0.12 is the lowest weight seen in the data. Again, the predicted/estimated mean price of S$112.48 is lower than the lowest price S$223 seen in the data.

Q1: Which of the two extrapolations would you trust less? Imagine you were the seller of two diamonds of weight 0.10 and 0.38 carats, respectively.

Q2: In general, how would you expect prices to deviate from the estimated line between 0.00 and 0.12 carats, and above 0.35 carats, respectively? Make the scatterplot with the fitted line, extending the x-range to include 0.00 and about 0.45 carats, and the y-range to include S$0 and about S$1,600.

Rule: Know the ranges of the observed x-values. Knowing the ranges of the y-values is good also, but extrapolation is defined in terms of x.

Quantiles Weight (carats)

| 100.0% | maximum  | 0.35000 |
| 90.0%  |          | 0.29300 |
| 75.0%  | quartile | 0.25000 |
| 50.0%  | median   | 0.18000 |
| 25.0%  | quartile | 0.16000 |
| 10.0%  |          | 0.15000 |
| 0.0%   | minimum  | 0.12000 |

Quantiles Price (Singapore dollars)

| 100.0% | maximum  | 1086.0 |
| 90.0%  |          | 865.8  |
| 75.0%  | quartile | 661.0  |
| 50.0%  | median   | 428.5  |
| 25.0%  | quartile | 336.5  |
| 10.0%  |          | 315.9  |
| 0.0%   | minimum  | 223.0  |

And the Formulas are…

The LS estimates of slope and intercept can be obtained through explicit formulas:

b1 = Σi (xi – mean(x)) (yi – mean(y)) / Σi (xi – mean(x))²    and    b0 = mean(y) – b1 · mean(x)

(See the Textbook, p.71.) Deriving these formulas requires derivatives from calculus and will not be done here.

We will never hand-calculate these formulas from the raw x and y columns, because this is what JMP is for. We will, however, do some minor algebra. For example, it is easy to see from the formula for the correlation coefficient that

b1 = c(x,y) · s(y) / s(x)

It follows further:

If s(x) = s(y) = 1 and mean(x) = mean(y) = 0, then b1 = c(x,y) and b0 = 0 .

In particular, if we regress the z-scores of y on the z-scores of x, the least squares line equation is

ẑy = c(x,y) · zx .

That is, after standardization of both x and y, the LS line runs through the origin and its slope is the correlation.
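Here is a short Python sketch of these formulas on simulated data (a stand-in for any x-y dataset); it also confirms numerically that after standardizing both variables the LS slope equals the correlation and the intercept vanishes:

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(0.2, 0.06, size=48)                  # simulated predictor
    y = 3721.0 * x - 260.0 + rng.normal(0, 30, size=48)

    # Least-squares formulas for slope and intercept
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()

    # Regress the z-scores of y on the z-scores of x
    zx = (x - x.mean()) / x.std(ddof=1)
    zy = (y - y.mean()) / y.std(ddof=1)
    b1_z = np.sum(zx * zy) / np.sum(zx ** 2)
    b0_z = zy.mean() - b1_z * zx.mean()                 # essentially 0

    print(b1, b0)
    print(b1_z, np.corrcoef(x, y)[0, 1], b0_z)          # slope of z-scores = correlation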

Some Weirdness: predicting y from x and x from y

Observation: Since we measure distance between the data points and the line in terms of vertical distance, there seems to exist an asymmetry between how we treat x and y.

Q: Comparing

• regression of y on x, that is, finding a formula in x that predicts y, and

• regression of x on y, that is, finding a formula in y that predicts x,

aren’t we getting the same lines?

A: No, we are not. The reason was given in the above observation. We can easily see the consequences in the regression of the z-scores:

ẑy = c(x,y) · zx     (regress y on x)

ẑx = c(x,y) · zy     (regress x on y)

If we solve the second equation to predict zy from zx, we get

ẑy = (1/c(x,y)) · zx ,

which is not the same as

ẑy = c(x,y) · zx .

How sensible is this formula? Here are some special cases:

• If c(x,y) = 1, then the data fall on the straight line zy = zx , so of course the best prediction formula for zy is zx.

• Similarly for c(x,y) = –1.

• If c(x,y) = 0, then the x-values have no information for predicting the y-values with a linear formula. Hence the best prediction is ẑy = 0 (the mean of the z-scores), ignoring x.

In general, note that in the formula for the slope and intercept,

if the correlation is zero, then

• the slope is zero, and

• the intercept is the mean of the y-values.

Hence the best one can do in the presence of a zero correlation is fitting a horizontal straight line at the level of the overall mean of the observed y-values. This is just what we would do if we had no x-variable at all.
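A quick simulation makes the y-on-x versus x-on-y asymmetry concrete. The data below are artificial, and np.polyfit is used as a stand-in for JMP’s line fitting:

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.normal(size=300)
    y = 0.6 * x + rng.normal(size=300)            # correlation well below 1

    zx = (x - x.mean()) / x.std(ddof=1)
    zy = (y - y.mean()) / y.std(ddof=1)
    r = np.corrcoef(x, y)[0, 1]

    slope_y_on_x = np.polyfit(zx, zy, 1)[0]       # ~ r
    slope_x_on_y = np.polyfit(zy, zx, 1)[0]       # also ~ r, NOT 1/r
    print(r, slope_y_on_x, slope_x_on_y)
    # Solving the x-on-y line for zy would give slope 1/r, i.e. a different (steeper) line.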

Changing Units of the x- and y-variables

Problem: We have an equation ŷ = b1 x + b0 that predicts precipitation (=y) from average temperature (=x) in a number of locations. Precipitation is measured in millimeters of rainfall (plus melted snow and hail and dew…), whereas temperature is given in degrees Celsius. We need to translate the equation from metric to US units. How?

Complete solution:

1. Write the starting regression equation more intuitively as

Prec(mm) = b1(mm/C) · Temp(C) + b0(mm) ,

where the parens indicate the units. An example is:

Prec(mm) = 0.558 mm/C · Temp(C) + 9.515 mm

(This equation is obtained from the dataset PhilaMonthlyTempPrec.JMP . Re-create this equation and interpret the regression coefficients.)

The target equation is

Prec(in) = b1(in/F) · Temp(F) + b0(in) .

2. Re-express the old units (mm and C) in new units (in and F). That is, re-express both Prec(mm) and Temp(C) in US units:

Prec(mm) = 25.4 ·Prec(in)

Temp(C) = 5/9 ·(Temp(F) – 32)





3. Substitute in the regression equation:

25.4 · Prec(in) = b1(mm/C) · ( 5/9· (Temp(F) – 32) )

+ b0(mm)

4. Solve for the response in new units, Prec(in), and regroup to separate into a new slope times the predictor in new units, Temp(F), plus constants that form the intercept in new units:

Prec(in) = 1/25.4 · 5/9 · b1(mm/C) · Temp(F)

+ 1/25.4 · ( – 32 · 5/9 · b1(mm/C) + b0(mm) )

= 0.02187 · b1(mm/C) · Temp(F)

– 0.70 · b1(mm/C) + 1/25.4 · b0(mm)

5. Comparison with the target equation

Prec(in) = b1(in/F) · Temp(F)

+ b0(in)

yields:

b1(in/F) = 0.02187 · b1(mm/C)

b0(in) = –0.70 · b1(mm/C) + 0.03937 · b0(mm)

If it helps, you can make this more concrete by assuming some values, such as b1(mm/C) = .56 mm/C and b0(mm) = 9.6 mm.
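To convince yourself that the algebra is right, you can convert the data instead of the coefficients and refit: the two routes must give the same answer. The sketch below does this in Python on simulated temperature and precipitation values (stand-ins for PhilaMonthlyTempPrec.JMP), with np.polyfit playing the role of JMP’s line fitting:

    import numpy as np

    rng = np.random.default_rng(4)
    temp_c = rng.uniform(-2, 28, size=120)                         # simulated temperatures (C)
    prec_mm = 0.558 * temp_c + 9.515 + rng.normal(0, 5, size=120)  # simulated precipitation (mm)

    # Fit in metric units
    b1_mm_c, b0_mm = np.polyfit(temp_c, prec_mm, 1)

    # Convert the coefficients algebraically (steps 2-5 above)
    b1_in_f = (1 / 25.4) * (5 / 9) * b1_mm_c
    b0_in = (1 / 25.4) * (-32 * (5 / 9) * b1_mm_c + b0_mm)

    # Check: convert the data and refit -- the coefficients agree
    temp_f = temp_c * 9 / 5 + 32
    prec_in = prec_mm / 25.4
    print(np.polyfit(temp_f, prec_in, 1))
    print(b1_in_f, b0_in)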

Practice: Given a prediction formula

Prec(cm) = b1(cm/C) · Temp(C) + b0(cm)

find the conversion to temperatures in kelvin.

Practice: Given a prediction formula for quantity sold in million items, Q(mill), based on price in US dollars, P($),

Q(mill) = b1(mill/$) · P($) + b0(mill) ,

translate to quantity in thousands, Q(1000), and price in Euros, P(€), assuming the 2007/02/09 conversion of P($) = 1.30 · P(€).

Simplified Solution for simple multiplicative changes:

Often, as in the second practice example, the unit changes involve only simple multiplications, such as

Q(mill) = Q(1000)/1000

P($) = 1.30 · P(€)

In this case, the algebra is easy:

b1(1000/€) = 1000 · 1.30 · b1(mill/$)

b0(1000) = 1000 · b0(mill)

Measuring the Quality of Fit of Least Squares Lines

In principle, the RSS is the measure for the quality of the fit of the line to the data. If nothing is said to the contrary, the RSS is the value obtained by the Least Squares (LS) estimate, that is, the minimum achievable RSS(b0,b1). There is a problem with the RSS, however: we don’t know how much is much and how little is little. This requires some standardization:

• One way of making the RSS more interpretable is by turning it into something like a standard deviation. One could divide it by N or N–1 or, actually: N–2.

se = √( RSS / (N – 2) )

In spite of the division by N–2, think of se as having a division by N and hence an average of squared residuals under the root. If N is not tiny (greater than 30, say), the subtraction of 2 makes almost no difference.

Still, the intellectually curious will wonder why N–2. The partial answer is based on the following fact:

Fitting a straight line to residuals produces a zero slope and a zero intercept.

Checking what the conditions b1=0 and b0=0 mean, we see that they imply cov(x,e)=0 and mean(e)=0. These are two linear equations for the numbers e1, e2,…, eN . Similar to the argument for the division by N–1 in case of the standard deviation, we are in a situation where knowing N–2 of the N residuals enables us to calculate the remaining two residuals from the equations cov(x,e)=0 and mean(e)=0. Hence the division by N–2…

At any rate, think of se as the residual standard deviation. Its units are those of the response variable. This is therefore a measure of dispersion of the observed y-values around the LS line. Recall that the usual standard deviation is a measure of dispersion around the mean.

JMP calls se the “Root Mean Square Error” or “RMSE”, a term that is unfortunately very common. Our quibble is with the term “error”, which to us is not the same as residual, as we will see later. The term Root Mean Square Residual or RMSR would have been acceptable (it is also used, but less often).
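For concreteness, here is how se would be computed by hand, in a Python sketch on simulated data (JMP’s RMSE is the same quantity):

    import numpy as np

    rng = np.random.default_rng(5)
    x = rng.uniform(0.12, 0.35, size=48)                 # simulated weights
    y = 3721.0 * x - 260.0 + rng.normal(0, 32, size=48)  # simulated prices

    b1, b0 = np.polyfit(x, y, 1)
    residuals = y - (b0 + b1 * x)
    rss = np.sum(residuals ** 2)

    s_e = np.sqrt(rss / (len(y) - 2))       # the residual standard deviation ("RMSE")
    print(s_e)
    print(np.sqrt(rss / len(y)))            # dividing by N instead changes little for N = 48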

• Another standardization of the RSS is by comparing it with the sum of squares around the mean. The RSS is the sum of squares around the LS line. Consider the following ratio:

RSS / Σi (yi – mean(y))²

This ratio is a number between zero and one.

Why? Because:

numerator = min RSS(b0,b1)

denominator = RSS(b0=mean(y), b1=0)

Hence the numerator cannot be greater than the denominator, and both are non-negative.

The numerator measures how close the data fall to the line, while the denominator measures how close the data fall to the mean. Therefore, the ratio measures how much the LS line beats the mean: the smaller the ratio, the more the line beats the mean.

Q: When is the ratio +1, when 0?

By convention one reverses the scale by defining:

R2 = 1 – RSS / Σi (yi – mean(y))²

This is called “R Square” (JMP) or “R Squared”. Another way to express R2 is in terms of se2 and sy2 :

R2 ≈ 1 – se2/sy2

(The ‘≈’ is only because se2 divides by N–2 while sy2 divides by N–1; for moderate N the difference is negligible.)

If se2/sy2 can be thought of as the “fraction of variance unexplained”, then R2 can be thought of as the “fraction of explained variance”. Whatever, just get used to the words. It’s not a good term either. Some soften it to “fraction of variance accounted for by the regression”.

Now here is a minor miracle:

R2 = cor(x,y)2

We are not going to prove it, but it gives a new interpretation for the correlation: its square is the fraction of variance accounted for by the regression. The nearer the fraction is to 1, the better the line summarizes the association between x and y.
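The “minor miracle” is easy to check numerically. The following Python sketch (simulated data again) computes R2 from the sums of squares and compares it with the squared correlation:

    import numpy as np

    rng = np.random.default_rng(6)
    x = rng.uniform(0.12, 0.35, size=48)
    y = 3721.0 * x - 260.0 + rng.normal(0, 32, size=48)

    b1, b0 = np.polyfit(x, y, 1)
    rss = np.sum((y - (b0 + b1 * x)) ** 2)               # sum of squares around the LS line
    tss = np.sum((y - y.mean()) ** 2)                    # sum of squares around the mean

    r_squared = 1 - rss / tss
    print(r_squared, np.corrcoef(x, y)[0, 1] ** 2)       # the same number, up to rounding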

• JMP: R2 and RMSE (se) are reported in the following table.

Summary of Fit

| RSquare                    | 0.978261 |
| RSquare Adj                | 0.977788 |
| Root Mean Square Error     | 31.84052 |
| Mean of Response           | 500.0833 |
| Observations (or Sum Wgts) | 48       |

The last two lines are self-explanatory: mean(y) and N.

We can ignore “R Square Adj” for now.

Diagnostics Check: Is Y Linearly Associated with X?

Straight lines can be fitted to any pair of quantitative variables x and y. Whether the fit makes sense is related to the question of whether the association is linear. Here is a diagnostic check that helps us get a sense of whether the association is really linear or in fact curved: draw a curve in the x-y scatterplot.

JMP: As usual, do Analyze > Fit Y by X > (select x and y) > (click the top left red triangle near the plotting area) > Fit Spline > Other… > 0.01, flexible (then play with the slider in the horizontal box at the bottom) (if there is any other output, close it by clicking on its blue icon)

By playing with the slider, we can make the fitted ‘spline’ curve more or less smooth or wiggly. Make the curve so smooth that there is no more than one inflection point, and preferably just a concave or convex curve. If the swings of the curve are large, comparable in magnitude with the residuals, then the association is more likely curved, not linear. (This is obviously not an exact science; it requires some practice.)
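Outside JMP, a rough analogue of this check can be done with scipy’s UnivariateSpline, whose smoothing parameter s plays a role loosely similar to JMP’s lambda (larger means smoother). The data below are simulated; the comparison simply measures how far the spline swings away from the straight line:

    import numpy as np
    from scipy.interpolate import UnivariateSpline

    rng = np.random.default_rng(7)
    x = np.sort(rng.uniform(0.12, 0.35, size=48))        # spline fitting wants x in increasing order
    y = 3721.0 * x - 260.0 + rng.normal(0, 32, size=48)

    line = np.polyval(np.polyfit(x, y, 1), x)            # straight-line fit for comparison
    for s in (1e4, 1e5, 1e6):                            # larger s -> smoother spline
        spline = UnivariateSpline(x, y, s=s)
        swing = np.max(np.abs(spline(x) - line))
        print(f"s = {s:g}: largest gap between spline and straight line = {swing:.1f}")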

Examples:

Bivariate Fit of Price (Singapore dollars) By Weight (carats)

[pic]

[pic]

Smoothing Spline Fit, lambda=0.000001

| R-Square             | 0.986639 |
| Sum of Squares Error  | 28663.28 |

This curve is too wiggly. Make it smoother with a larger “lambda”. (The larger the lambda, the smoother the curve.)

The following plot has a smoother curve that is almost a straight line. We have pretty much a linear association. The little bit of convexity is so small that it should not worry us. (Then again, it is consistent with the idea that convexity has to set in below 0.12 carats and above 0.35 carats. What are the reasons for this idea?)

Bivariate Fit of Price (Singapore dollars) By Weight (carats)

[pic]

[pic]

The following example is quite convincingly curved:

Bivariate Fit of MPG Highway By Weight (000 lbs)

[pic]

[pic]

Then again, we might be suspicious that the featherweight Honda Insight in the top left determines the curvature. As a rule, one should not rely on individual points for any pattern. But even after removing (JMP: select, exclude, hide) the Honda Insight as well as the Toyota Prius with the second highest MPG, we still see curvature.

[pic]

Diagnostics Check: Residual Plot

We learnt how to check for non-linearity, at least informally, by fitting a curve and judging by eye whether a curve is needed or a straight line suffices to describe the data.

[pic]

Practitioners of regression go one step further by extracting the residuals from the regression and examining them separately. The idea is that if there is a linear association between x and y then the residuals should look ‘unstructured’ or random. The reason: Subtracting the line from the response should leave behind residuals that are entirely unpredictable even if one knows x.

The above plot illustrates the connection between the original x-y plot and a plot of the residuals against x, which one might call an “x-e plot”, but is called “residual plot”.

Recall that residuals have zero mean and zero correlation with x, hence a residual plot should show no structure if the x-y association is linear: knowing x should give us no information about e.
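Both facts are easy to verify numerically. The sketch below (simulated data) computes the residuals from a fitted line and checks that their mean and their correlation with x are essentially zero; a residual plot is then just a scatterplot of e against x:

    import numpy as np

    rng = np.random.default_rng(8)
    x = rng.uniform(0.12, 0.35, size=48)
    y = 3721.0 * x - 260.0 + rng.normal(0, 32, size=48)

    b1, b0 = np.polyfit(x, y, 1)
    e = y - (b0 + b1 * x)                    # residuals

    print(e.mean())                          # ~ 0 (up to rounding error)
    print(np.corrcoef(x, e)[0, 1])           # ~ 0: x carries no linear information about e
    # Residual plot, e.g. with matplotlib:  plt.scatter(x, e); plt.axhline(0)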

JMP has two ways to plot residuals versus x: either way, fit a line first, then click the red icon near “Linear Fit” below the x-y plot, then select

• Plot Residuals (a residual plot will appear below the output), or

• Save Residuals (creates a new column with residuals in the spreadsheet); plot the residuals against x with Analyze > Fit Y by X > (select x as X, the residuals as Y) > OK

The plots below show four examples of x-y plots with associated residual plots.

• The top left example is artificial and shows a perfect case of linear association.

• The bottom left example is real (Diamond.JMP) and also shows a satisfactory residual plot. Why satisfactory? Ask yourself whether knowing x (Weight) helps predict anything about the residuals. The answer is probably “no”.

• The top right example is artificial and shows non-linear, convex association. The residual plot is unsatisfactory because x has information about the residuals: for small and large x-values, the residuals are positive, and for intermediate x-values the residuals tend to be negative.

• The bottom right example is real (Accord-2006.JMP) and shows unsatisfactory residuals also. Similar to the preceding example, x (Year) has information about the residuals: small and large values of Year have positive residuals, and intermediate values of Year tend to have negative residuals.

(When judging the plots below, keep in mind that individual points do not make evidence. Also, it does not matter how the x-values are distributed. E.g., in the two real data examples you see rounding in x, which is irrelevant.)

[pic] (the four x-y plots described above, each paired with its residual plot)

Quality of Prediction and Prediction Intervals

The predictions ŷ(x) = b0 + b1x would be more useful if one had an idea how precise they are. So one has to think about what “precision” might mean.

• Thought #1: The predictions are precise if the actual y-values are not far off from their predictions. In other words: predictions are likely to be precise if the residuals are small. Smallness, on the other hand, is judged with measures of dispersion such as the standard deviation. But we already have se , the standard deviation of the residuals, as such a measure. In fact, se is used to judge the prediction quality of a regression equation. (More on this later.) Obviously, other measures of residual dispersion could be used as well.

• Thought #2: One can go one step further by asking whether one couldn’t augment the predictions ŷ(x) with an interval around them. The idea would be to give an interval of the form “ŷ(x) ± constant”, and this interval would be constructed such that it is likely to contain a large fraction, such as 95%, of actual observations y: among all y-values observed at x, we would want that about 19 out of 20 satisfy any of the following equivalent conditions:

a) | y – ŷ(x) | ≤ constant

b) – constant ≤ y – ŷ(x) ≤ + constant

c) ŷ(x) – constant ≤ y ≤ ŷ(x) + constant

d) y ∈ [ ŷ(x) – constant, ŷ(x) + constant ]

We will now execute on thought #2 by following up on the form b) of these conditions: it suggests that we could look for cut-off constants by examining the residuals ei = yi – ŷi of the data. More precisely, we could choose a constant such that 95% of the residuals fall between the lower and the upper cut-off. This sounds pretty much like finding the 2.5% and the 97.5% quantiles of the residual distribution. We would end up not with one cut-off but with two: a lower and an upper cut-off.

Here is how it goes, by way of example, with the diamond data:

• Fit the line as usual.

• Save the residuals into a new column in the spreadsheet.

JMP: (click the red triangle that belongs to the line, below the plot on the left) > Save Residuals

• Find the 2.5% and 97.5% quantiles of the residuals.

JMP: Analyze > Distribution > (select) Residuals Price> OK

The 2.5% quantile is shown as –78.36, the 97.5% quantile as 74.53.

The prediction interval is therefore

[ŷ(x) – constant, ŷ(x) + constant ]

= [ (-260+3720·Weight) – 78, (-260+3720·Weight) + 75 ]

Since the values 78 and 75 are close, we are not losing much by stating the prediction interval as

(-260+3720·Weight) ± 78

in Singapore dollars. If the residual distribution were less symmetric, we would keep the two cut-off values distinct.

If we prefer another coverage, such as 90%, for the prediction interval, we have to use the corresponding quantiles. Unfortunately, the only other quantiles easily available in JMP are the quartiles, the upper/lower 10% quantiles, and the upper/lower 0.5% quantiles, which lend themselves to prediction intervals with coverage 50%, 80%, and 99%, respectively.

Later, after we learn about the bell curve, we will have another means of constructing prediction intervals, but it makes more assumptions. The present quantile method is universally applicable.
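The whole quantile-based construction fits in a few lines of Python. The diamond-like data below are simulated, so the cut-offs will not match the JMP numbers above, but the steps are the same: fit, save residuals, take the 2.5% and 97.5% residual quantiles, and attach them to ŷ(x):

    import numpy as np

    rng = np.random.default_rng(9)
    weight = rng.uniform(0.12, 0.35, size=48)                     # simulated weights (carats)
    price = 3721.0 * weight - 260.0 + rng.normal(0, 32, size=48)  # simulated prices (S$)

    b1, b0 = np.polyfit(weight, price, 1)
    residuals = price - (b0 + b1 * weight)

    lo, hi = np.quantile(residuals, [0.025, 0.975])               # lower and upper cut-offs

    def prediction_interval(w):
        yhat = b0 + b1 * w
        return yhat + lo, yhat + hi                               # aims to cover ~95% of prices

    print(prediction_interval(0.25))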

Potential problems with prediction intervals:

[pic]

[pic]

When residuals have problems, prediction has problems, too. Two kinds of problems are illustrated by the regression of ‘HealthCare…’ on ‘Education’ shown in these plots:

• The line shoots below the data points on the left. ‘HealthCare…’ does not seem to take on negative values, yet the line dips into the negative for about ‘Education’ < 2000.

• The point scatter exhibits a fan pattern, opening up from left to right.

(Try to ignore the two high points at ‘Education’ ≈ 2200 and 3000, respectively. As always, one or two points don’t make a pattern.)

Conclusions:

• The first bullet implies that prediction intervals around the line make no sense for low values of ‘Education’.

• The second bullet points to what is called non-constant variance or heteroscedasticity. This term means that the dispersion of the response is not the same across the values of the predictor. In the above plots, we see greater variance on the right and smaller variance on the left. One would therefore want a narrower prediction interval on the left (e.g., at ‘Education’ = 2400) than on the right (e.g., at ‘Education’ = 3400). Our method based on an upper and a lower cut-off creates a prediction band of constant width, which seems wastefully wide on the left.
