OLS Assumptions



OLS Assumptions

No measurement error (biased estimates: inflated)
No specification error:
- non-linear relationship modeled as linear (biased estimates)
- relevant X's excluded (biased estimates)
- irrelevant X's included (inflated standard errors)
Error term assumptions:
- Homoskedasticity: the variance of the error term is constant (if not, inflated/deflated standard errors)
- No autocorrelation: the residuals are not correlated (if not, inflated/deflated standard errors)
- Residuals average out to 0 (this is built into OLS)
- Covariance between the residuals and the independent variables = 0 (if a relevant variable is left out, biased estimates)
- Error terms are normally distributed

Parameter Estimates

α = "true" (population) intercept; a = sample estimate of the intercept
β = "true" (population) slope; b = sample estimate of the slope
ε = "true" (population) error; e = sample residual
The "true" regression equation is for the total population; the estimated regression equation is for the sample.

Criteria used by OLS for fitting a line

Basic idea is to choose the estimates b1 to bk to minimize the sum of the squared residuals (errors).

Centering

How: subtract the mean of X from each X value (and do the same for Y), then run OLS with the new values.
Why: might make the results easier to interpret.
Consequences: the slope (b) does not change; the intercept (a) changes.
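A minimal numpy sketch of the two ideas above — the least-squares criterion and centering; the toy data and variable names are mine, not from the notes:

import numpy as np

# illustrative data (hypothetical)
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([1.0, 3.0, 4.0, 6.0, 7.0])

# OLS chooses a and b to minimize the sum of squared residuals
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

# centering: subtract the mean of X from each X (same for Y), then rerun OLS
xc, yc = x - x.mean(), y - y.mean()
b_c = np.sum(xc * yc) / np.sum(xc ** 2)
a_c = yc.mean() - b_c * xc.mean()   # intercept changes (here it becomes 0)

print(b, b_c)   # slope is unchanged by centering
print(a, a_c)   # intercept changes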

Variance (Standard Error) of the Slope Estimate

Tells you how stable the line is.
High variance (points compacted around the line): bad → the line is less stable.
Low variance (points more spread around the line): good → the line is more stable.
As the sample size increases, the variance usually decreases.

Problems with the Slope (Parameter) Estimate

1. Biased slope estimate: over an infinite number of samples, the estimate will not equal the true population value.
2. Want efficient estimators: the unbiased estimator with the least variance.

Forcing the Intercept Through the Origin

Used if theory predicts that when X = 0, Y should = 0.
Not a good idea:
1) changes the slope (the strength of the relationship)
2) can't test H0: α = 0
3) won't work if you really have a curvilinear relationship
4) maybe it doesn't make sense to talk about the line at all when X = 0
5) may have a bad sample; forcing the origin makes it appear significant
6) if you force the line you deny yourself the chance to see whether there is something wrong with the model and whether the model actually predicts an intercept of 0
7) the costs of leaving a in are minor compared to taking it out
8) R², slope, and intercept all change; difficult to interpret

R²

The % of the variance in Y that is "explained" by X.
Measures goodness of fit.
R² = Regression Sum of Squares (RSS) / Total Sum of Squares (TSS) = Σ(ŷi − ȳ)² / Σ(yi − ȳ)²
Problems with R²:
1) not a measure of the magnitude of the relationship between X & Y
2) dependent on/vulnerable to the standard deviations of X & Y; can't compare across samples; biased in small samples
3) the addition of any variable will increase R²; include variables only because of theory
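A short numpy sketch of the R² formula above (RSS over TSS), again with made-up data:

import numpy as np

x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([1.0, 3.0, 4.0, 6.0, 7.0])

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
y_hat = a + b * x

rss = np.sum((y_hat - y.mean()) ** 2)   # regression sum of squares
tss = np.sum((y - y.mean()) ** 2)       # total sum of squares
r2 = rss / tss                          # share of the variance in Y "explained" by X
print(r2)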

Standardized Estimates (Beta Weights)

The change in Y in standard deviation units brought about by a one standard deviation unit change in X.
The units are lost in the transformation (everything is now in standard deviation units).
Hard to convey the meaning to the reader (because of the standard deviation units).
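One way to get the beta weight from the unstandardized slope (a sketch; it uses sample standard deviations and made-up data):

import numpy as np

x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([1.0, 3.0, 4.0, 6.0, 7.0])

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# beta weight: change in Y (in std dev units) for a one std dev change in X
beta = b * x.std(ddof=1) / y.std(ddof=1)

# equivalently, z-score both variables and refit the slope
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)
beta_check = np.sum(zx * zy) / np.sum(zx ** 2)
print(beta, beta_check)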

Functional Transformations of the Independent Variable

Used if there is a non-linear relationship between X & Y.
Examples: log(X), √X, X².
To interpret the results, the new values are plugged into the equation to get predicted Y values.
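A sketch of one such transformation, log(X), and of plugging new values into the equation for predicted Y (assumes all X values are positive; data are illustrative):

import numpy as np

x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
y = np.array([0.1, 1.2, 1.9, 3.1, 3.9])

lx = np.log(x)                      # transformed independent variable
b = np.sum((lx - lx.mean()) * (y - y.mean())) / np.sum((lx - lx.mean()) ** 2)
a = y.mean() - b * lx.mean()

# to interpret, plug new values of X into the equation to get predicted Y
new_x = np.array([3.0, 10.0])
print(a + b * np.log(new_x))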

Confidence Intervals for β

Over all samples, 95% of the computed confidence intervals will cover the true β.
β̂i ± (tα/2)·(standard error of β̂i)

Confidence Intervals for E(Y0)

Over all samples, 95% of the confidence intervals will cover the true Y value.
Each Y value has its own confidence interval.
Extrapolation is when you predict a value for Y with an X that is not actually in your sample.

Parameter Estimates & Degrees of Freedom

Degrees of freedom: n − k − 1
Parameters: k + 1 (the +1 is for the intercept)

t-statistics and One-Tailed Tests

t = b / (standard error of b)
One-tailed tests should be used if the researcher's theory suggests that the relationship between the two variables goes in a specific direction.

P values

A p value is the lowest significance level at which a null hypothesis can be rejected; p values are connected with the t statistic.
To determine whether β ≠ 0, find the actual probability of obtaining a value of the test statistic as large as or larger than the one obtained (the p value for the t statistic); accept or reject the hypothesis on the basis of that number (.05 needed).
To determine whether β > 0, a one-tailed test is needed; only one half of the area under the curve is analyzed (1/2 of the two-tailed p value).
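A sketch tying the t statistic, the p values, and the confidence interval for β together (bivariate case, toy data; scipy is assumed to be available):

import numpy as np
from scipy import stats

x = np.array([2.0, 4.0, 5.0, 7.0, 9.0, 11.0])
y = np.array([1.0, 3.0, 4.0, 6.0, 7.0, 10.0])
n, k = len(y), 1                                  # one independent variable

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
e = y - (a + b * x)                               # residuals
df = n - (k + 1)                                  # degrees of freedom: n - k - 1

se_b = np.sqrt(np.sum(e ** 2) / df / np.sum((x - x.mean()) ** 2))
t = b / se_b                                      # t = b / std err of b

p_two = 2 * stats.t.sf(abs(t), df)                # two-tailed p value
p_one = stats.t.sf(t, df)                         # one-tailed (H1: beta > 0); half of p_two when t > 0
ci = (b - stats.t.ppf(0.975, df) * se_b,          # 95% confidence interval for beta
      b + stats.t.ppf(0.975, df) * se_b)
print(t, p_two, p_one, ci)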

Adjusted R²

Adj R² = (R² − k/(n−1)) × ((n−1)/(n−(k+1)))
Since adding variables inflates R², adjusted R² takes the degrees of freedom into account to correct for this.

Standard Error of Regression (Standard Deviation of the Residuals)

Also called Root Mean Squared Error or Standard Error of the Estimate.
SER = √( Σei² / (n − (k+1)) ) = √(mean squared error)
Measures the goodness of fit.
An alternative to using R²: not dependent on the standard deviations of X or Y; just one number for each equation (in Y units); can be compared across samples.
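A small helper for the two quantities above; the function name is mine and it assumes e holds the residuals from an OLS fit that includes an intercept:

import numpy as np

def adj_r2_and_ser(y, e, k):
    n = len(y)
    tss = np.sum((y - y.mean()) ** 2)
    r2 = 1 - np.sum(e ** 2) / tss
    adj_r2 = (r2 - k / (n - 1)) * ((n - 1) / (n - (k + 1)))   # same value as 1 - (1-R2)(n-1)/(n-k-1)
    ser = np.sqrt(np.sum(e ** 2) / (n - (k + 1)))             # std error of regression, in Y units
    return adj_r2, ser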

Dummy Variable

Changes the intercept.
The two groups start at different points, but have the same slope.

Interactions

Changes the slopes.
The interaction term captures the difference in slopes between the groups.
Should calculate predicted Y values to see the impact.
Use the means to calculate the predicted values for the variables used in the interaction (4 equations), but include all the variables.
Possible interpretations (example: an interaction of gender and party on feelings toward Clinton):
- Among Republicans, women like Clinton even less.
- Among women, Republicans like Clinton even less.
- There is an additional effect of gender on feelings toward Clinton.
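A statsmodels sketch of a dummy plus an interaction, with predicted values for the four group combinations (the gender/party variables and the data are hypothetical stand-ins for the Clinton example):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
female = rng.integers(0, 2, n)          # dummy: shifts the intercept
republican = rng.integers(0, 2, n)
y = 50 - 10 * republican - 5 * female - 8 * female * republican + rng.normal(0, 5, n)

X = sm.add_constant(np.column_stack([female, republican, female * republican]))
res = sm.OLS(y, X).fit()

# predicted Y for the four combinations (other variables would be held at their means)
for f in (0, 1):
    for r in (0, 1):
        print(f, r, res.params @ np.array([1, f, r, f * r]))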

Multicollinearity (MC)

X1 can be predicted if the values of X2 and X3 are known.
Can't tell which variable is actually having an impact.
It exists in degrees, and its magnitude determines whether or not it is a problem.
Inflates the standard errors: every variable looks less significant than it really is.
Diagnose it using VIF (Variance Inflation Factor) scores: scores of 4 or 5 are the usual cut-off point for problems; higher scores are problematic.
VIF = 1 / (1 − auxiliary R²)
A high auxiliary R² (> .75) indicates high MC: you can explain a lot of X1 with the other variables.
What to do about it:
- Get more data; pool data (but that is problematic).
- Combine variables (ex: socioeconomic status combines education, income, and occupational prestige, which on their own tend to be highly correlated).
- Drop one X. Don't do this: you only added it because it was theoretically important, and if you drop a relevant right-hand-side (RHS) variable you get biased parameter estimates.
- It is acceptable to run 2 models: 1) with all variables, and 2) with some variables dropped, showing that the model could be misestimating. You are giving full information (important).
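A sketch of the VIF via the auxiliary regression described above (the function name is mine; X is assumed to be the matrix of independent variables without a constant column):

import numpy as np

def vif(X, j):
    """VIF for column j of X: regress X_j on the other X's, then 1 / (1 - aux R^2)."""
    xj = X[:, j]
    others = np.delete(X, j, axis=1)
    Z = np.column_stack([np.ones(len(xj)), others])       # auxiliary regression with intercept
    coef, *_ = np.linalg.lstsq(Z, xj, rcond=None)
    resid = xj - Z @ coef
    aux_r2 = 1 - np.sum(resid ** 2) / np.sum((xj - xj.mean()) ** 2)
    return 1 / (1 - aux_r2)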

Standard Error of the Parameter Estimate

se(b) = √( (Σei² / (n − (k+1))) / Σ(xi − x̄)² )
Equivalently: se(b) = (standard error of regression) / √( Σ(xi − x̄)² )

Slope estimate (bivariate):
b = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²

Miscellaneous Info

As adjusted R² ↑, the standard error of regression ↓.
Sample size = total df from the ANOVA table + 1.
If the standard error of the estimate is inflated, t will drop and p goes up: you keep H0 when it should be rejected.
If the standard error of the estimate is deflated, t will go up and p goes down: you reject H0 when it should be kept.

Outliers

How to deal with outliers → diagnose them:
1. DFBETA measures whether there is a change in a particular variable's coefficient when a particular case is removed; if the value is big, the case has a large impact.
2. Cook's Distance (Cook's D) measures the influence of an observation on the model as a whole; flag cases with Di > 4 / (n − (k+1)).
3. Look at the absolute value of the standardized residuals; helps flag cases; |e| > 3 indicates a case is pretty far off the line.
SPSS can give you all of these numbers.
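A sketch of these three diagnostics using statsmodels' influence measures, with the cut-offs from the notes (toy data with one planted outlier):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(0, 1, 50)
y = 2 + 3 * x + rng.normal(0, 1, 50)
y[0] += 15                                        # plant one outlier

res = sm.OLS(y, sm.add_constant(x)).fit()
infl = res.get_influence()

cooks_d = infl.cooks_distance[0]                  # Cook's D for each case
dfbetas = infl.dfbetas                            # DFBETAS, one column per coefficient
std_resid = infl.resid_studentized_internal       # standardized residuals

n, k = len(y), 1
flag = (cooks_d > 4 / (n - (k + 1))) | (np.abs(std_resid) > 3)
print(np.where(flag)[0])                          # cases worth a closer look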

Missing Values

1. Listwise deletion: if a value for any X is missing, the entire case is deleted (you lose lots of data).
2. Pairwise deletion (not a great idea, but not evil): uses the information in pairs of variables to estimate the slope coefficients. Problem: you get a different sample size for each variable, so some variables have more/less information.
3. Mean substitution — NOT a good idea: substitute the mean value for the missing ones; it will bias over time; there is no positive value in doing this; there is no guarantee that the mean of the missing values would be the same as for the ones you do have values for, so you could be putting the wrong value in.
4. Predict the value that is missing (**best option**): run a regression with the missing variable as the dependent variable — use actual information to make an educated guess.
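A sketch of option 4, regression-based imputation (the function name is mine; it assumes the other X's are observed for the cases with the missing value):

import numpy as np

def impute_by_regression(X, j):
    """Fill missing values (NaN) in column j of X by regressing it on the other columns."""
    miss = np.isnan(X[:, j])
    Z = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
    coef, *_ = np.linalg.lstsq(Z[~miss], X[~miss, j], rcond=None)   # fit on the complete cases
    X = X.copy()
    X[miss, j] = Z[miss] @ coef                                     # predicted values for the missing ones
    return X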

F-test

Testing to see whether the variables are jointly significant.
Null hypothesis: none of the variables has a significant impact (b1 = b2 = b3 = … = bk = 0).
Alternative hypothesis: at least one variable has a significant impact.
When H0 is false, the Mean Regression Sum of Squares > the Mean Error Sum of Squares.
Limitations: it does not tell you which variable is significant, just that something is; if you have an insignificant F, all the b's are insignificant.

Autocorrelation

The residuals are correlated; usually happens with time-series data; can inflate/deflate the standard errors.
Diagnosing it:
1. Scatterplot: look for a pattern in the residuals.
2. Regress the residuals on the previous residuals: ei = ρ·ei−1 + μi → look for a significant ρ.
How much bias is indicated?
|ρ|   % bias induced
.0    0%
.2    3%
.5    8%
.8    19%
.9    29%
3. Durbin–Watson scores: H0 → no autocorrelation (AC); d̂ = 2 − 2ρ̂, so:
ρ = 0  → d = 2
ρ = 1  → d = 0
ρ = −1 → d = 4
The scale runs from 0 to 4: values near 0 → positive AC (reject H0); then an inconclusive zone; values near 2 → no AC (fail to reject); another inconclusive zone; values near 4 → negative AC (reject H0).
Durbin–Watson: save it in SPSS, then look up the critical value.
[sketches in the original: residual patterns illustrating autocorrelation and heteroskedasticity; ↑ variance vs ↓ variance → more stable, lower standard error]
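Two small helpers for diagnostics 2 and 3 above (names are mine; e is the vector of residuals in time order):

import numpy as np

def resid_rho(e):
    """Slope from regressing e_t on e_{t-1} (the rho in e_i = rho*e_{i-1} + mu_i)."""
    return np.sum(e[1:] * e[:-1]) / np.sum(e[:-1] ** 2)

def durbin_watson(e):
    """d = sum of squared successive differences over sum of squared residuals; d is about 2 - 2*rho."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# rough reading: d near 2 -> no autocorrelation, near 0 -> positive AC, near 4 -> negative AC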

Heteroskedasticity

When the variance of the error term is not constant; a violation of the OLS assumption Var(ε) = constant.
When to suspect it:
- Pooled data
- Learning curves with coding
- Fatigue effects
- Whenever you think some portion of the data will be better reported/predicted
- Whenever the predictive power of the model is not consistent across cases or variables
Consequences:
- Inflated standard error (you conclude there is less significance): if the small var(e) cases are located away from the mean of X, the standard error is too large and you are underconfident that b ≠ 0.
- Deflated standard error (you conclude there is more significance): if the large var(e) cases are located away from the mean of X, the reported standard error will be too low.
How to diagnose it:
1. Scatter plot.
2. Goldfeld–Quandt: order the observations by the suspect x; throw out the middle observations; run 2 models using all the original x's; (Mean residual SS1) / (Mean residual SS2) → F; look up the value on an F chart to see whether it is greater than the critical value. Limitation: only diagnoses at the ends, not the middle.
3. Glejser: save the residuals; run a new regression with the absolute residual as the DV and the suspect x as the only IV; if the new parameter estimate is significant, you have heteroskedasticity (the suspect x can predict the residual). Limitations: can't check the middle; only works for linear forms.
4. White's: save the residuals; regress the squared residuals on all the IVs, their squares (but not the dummies), and all potential interactions; n·R² ~ χ² with df = the number of regressors. Limitations: could be overkill; doesn't tell you where the problem is; since there are so many variables, some could be randomly significant and you could diagnose a problem when there isn't really one.
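A sketch of the Glejser idea — regress the absolute residuals on the suspect x and test the slope (function name mine; scipy assumed available):

import numpy as np
from scipy import stats

def glejser(e, xs):
    """Regress |residual| on the suspect x; a significant slope suggests heteroskedasticity."""
    ae = np.abs(e)
    xd = xs - xs.mean()
    b = np.sum(xd * (ae - ae.mean())) / np.sum(xd ** 2)
    a = ae.mean() - b * xs.mean()
    resid = ae - (a + b * xs)
    df = len(e) - 2
    se_b = np.sqrt(np.sum(resid ** 2) / df / np.sum(xd ** 2))
    t = b / se_b
    return t, 2 * stats.t.sf(abs(t), df)    # two-tailed p value for the slope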

Logit

OLS is not appropriate for a dichotomous DV.
OLS predicts values > 1 and < 0, which are not options; the actual values are only 0 & 1.
It induces heteroskedasticity — the residuals are clustered in the middle, but there are no actual values there.
Choice functions tend to be S-shaped.
You can't model the probability as a straight line; if you do, you misspecify the model and bias the parameter estimates.
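A statsmodels sketch of a logit fit on simulated 0/1 data, showing that the predicted probabilities stay between 0 and 1 (the data and coefficients are made up):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(0, 1, 300)
p = 1 / (1 + np.exp(-(0.5 + 1.5 * x)))            # S-shaped choice function
y = rng.binomial(1, p)                            # dichotomous DV: only 0s and 1s

res = sm.Logit(y, sm.add_constant(x)).fit()
print(res.params)
print(res.predict(sm.add_constant(np.array([-2.0, 0.0, 2.0]))))   # predicted probabilities stay in [0, 1]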

WLS

Assumes that the best information in the data is in the observations with the least variance in their error terms.
Weighs some observations more than others.
Divide everything through by √(the heteroskedasticity-driving variable).
Pure heteroskedasticity should not bias the parameter estimates, but it could be an indication of measurement/specification problems, and correcting for it could bias the parameter estimates.
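A statsmodels WLS sketch where the error variance is assumed to grow with a known variable z; weighting by 1/z is the same as dividing everything through by √z (all names and data are illustrative):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
z = rng.uniform(1, 5, 200)                         # variable suspected of driving the error variance
x = rng.normal(0, 1, 200)
y = 1 + 2 * x + rng.normal(0, np.sqrt(z))          # error variance grows with z

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()
wls = sm.WLS(y, X, weights=1 / z).fit()            # weight = 1/variance
print(ols.params, wls.params)
print(ols.bse, wls.bse)                            # WLS downweights the noisy cases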


-----------------------

[Flowchart: dealing with outliers. Decision points from the diagram: Are they severe (|e| > 3 standard deviations from 0) or moderate? Analyze the data with and without the outliers; is there a significant difference in the results? Can the outliers be explained / explained away as anomalies? Outcomes: report from the whole dataset and add a footnote (identified some outliers but they did not make a difference); report both sets of results; delete the cases and explain in a footnote; look harder for an explanation; stop.]
