Model Selection: General Techniques - Stanford University
Statistics 203: Introduction to Regression and Analysis of Variance
Model Selection: General Techniques
Jonathan Taylor
Today
Outline:
- Today
- Crude outlier detection test
- Bonferroni correction
- Simultaneous inference for β
- Model selection: goals
- Model selection: general
- Model selection: strategies
- Possible criteria
- Mallows' Cp
- AIC & BIC
- Maximum likelihood estimation
- AIC for a linear model
- Search strategies
- Implementations in R
- Caveats
- Outlier detection / simultaneous inference.
- Goals of model selection.
- Criteria to compare models.
- (Some) model selection.
Crude outlier detection test
- If the studentized residuals are large: the observation may be an outlier.
- Problem: if $n$ is large and we "threshold" at $t_{1-\alpha/2,\,n-p-1}$, we will get many outliers by chance even if the model is correct.
- Solution: Bonferroni correction, threshold at $t_{1-\alpha/(2n),\,n-p-1}$.
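
To make the effect of the correction concrete, here is a minimal sketch comparing the two cutoffs. It assumes standard normal quantiles as a stand-in for the $t_{n-p-1}$ quantiles (reasonable only when $n - p - 1$ is large); the function names are illustrative and not part of the course material.

```python
from statistics import NormalDist


def outlier_thresholds(n, alpha=0.05):
    """Return (uncorrected, Bonferroni) two-sided cutoffs for n residuals.

    Assumption: normal quantiles approximate the t quantiles, which is
    only sensible when the residual degrees of freedom are large.
    """
    z = NormalDist()  # standard normal, mean 0 and sd 1
    uncorrected = z.inv_cdf(1 - alpha / 2)
    bonferroni = z.inv_cdf(1 - alpha / (2 * n))
    return uncorrected, bonferroni


def expected_false_flags(n, alpha=0.05):
    """Expected number of residuals flagged by chance if the model is correct.

    Each residual exceeds its cutoff with probability alpha (uncorrected)
    or alpha / n (Bonferroni), so the expected counts follow by linearity.
    """
    return n * alpha, n * (alpha / n)


if __name__ == "__main__":
    for n in (20, 200, 2000):
        plain, bonf = outlier_thresholds(n)
        flags_plain, flags_bonf = expected_false_flags(n)
        print(f"n={n}: cutoff {plain:.2f} vs {bonf:.2f}, "
              f"expected false flags {flags_plain:.2f} vs {flags_bonf:.2f}")
```

The uncorrected cutoff does not depend on $n$, so the expected number of spurious outliers grows linearly in $n$; the Bonferroni cutoff grows slowly with $n$ and keeps that expectation at $\alpha$.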
Bonferroni correction
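- If we are doing many $t$ (or other) tests, say $m > 1$, we can control the overall false positive rate at $\alpha$ by testing each one at level $\alpha/m$.
- Proof:

$$
\begin{aligned}
P(\text{at least one false positive})
&= P\left(\bigcup_{i=1}^{m} \left\{ |T_i| \geq t_{1-\alpha/(2m),\,n-p-1} \right\}\right) \\
&\leq \sum_{i=1}^{m} P\left(|T_i| \geq t_{1-\alpha/(2m),\,n-p-1}\right) \\
&= \sum_{i=1}^{m} \frac{\alpha}{m} = \alpha.
\end{aligned}
$$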
- Known as "simultaneous inference": controlling the overall false positive rate at $\alpha$ while performing many tests.
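
The bound can be checked numerically with a short sketch. The exact formula below assumes the $m$ tests are independent and all null hypotheses are true; this is an added assumption for illustration, since the Bonferroni bound itself needs no independence. Function names are hypothetical.

```python
def familywise_error_rate(m, per_test_level):
    """P(at least one false positive) for m independent tests, all nulls true."""
    return 1 - (1 - per_test_level) ** m


def bonferroni_reject(p_values, alpha=0.05):
    """Reject hypothesis i when its p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


if __name__ == "__main__":
    alpha = 0.05
    for m in (1, 10, 100):
        naive = familywise_error_rate(m, alpha)          # each test at level alpha
        corrected = familywise_error_rate(m, alpha / m)  # each test at level alpha/m
        print(f"m={m}: uncorrected {naive:.3f}, Bonferroni {corrected:.3f}")
    # Made-up p-values, purely for illustration.
    print(bonferroni_reject([0.001, 0.02, 0.04, 0.3], alpha))
```

Testing each hypothesis at level $\alpha$ lets the overall rate climb toward 1 as $m$ grows, while testing at level $\alpha/m$ keeps it at or below $\alpha$.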
Simultaneous inference for β
- Another common situation in which simultaneous inference arises is "simultaneous inference" for $\beta$.
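- Using the facts that

$$
\hat\beta \sim N\left(\beta,\ \sigma^2 (X^tX)^{-1}\right), \qquad
\hat\sigma^2 \sim \sigma^2 \cdot \frac{\chi^2_{n-p}}{n-p},
$$

along with $\hat\beta \perp \hat\sigma^2$ (independence), leads to

$$
\frac{(\hat\beta - \beta)^t (X^tX) (\hat\beta - \beta)/p}{\hat\sigma^2}
\sim \frac{\chi^2_p/p}{\chi^2_{n-p}/(n-p)} \sim F_{p,n-p}.
$$

- $(1 - \alpha) \cdot 100\%$ simultaneous confidence region:

$$
\left\{ \beta : (\hat\beta - \beta)^t (X^tX) (\hat\beta - \beta) \leq p\,\hat\sigma^2 F_{p,n-p,1-\alpha} \right\}
$$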
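
As a rough illustration of how the region is used, the sketch below checks whether a candidate $\beta$ lies inside it. The function name and all numbers are hypothetical, and the critical value $F_{p,n-p,1-\alpha}$ is passed in by the caller (taken from a table or a statistics package) rather than computed here.

```python
def in_confidence_region(beta, beta_hat, xtx, sigma2_hat, f_crit):
    """Check (beta_hat - beta)^t (X^t X) (beta_hat - beta) <= p * sigma2_hat * f_crit.

    beta, beta_hat: lists of length p; xtx: p-by-p nested list holding X^t X;
    f_crit: the 1 - alpha quantile of F(p, n - p), supplied by the caller.
    """
    p = len(beta_hat)
    diff = [bh - b for bh, b in zip(beta_hat, beta)]
    # Matrix-vector product (X^t X) diff, then the inner product with diff.
    xtx_diff = [sum(xtx[i][j] * diff[j] for j in range(p)) for i in range(p)]
    quad_form = sum(diff[i] * xtx_diff[i] for i in range(p))
    return quad_form <= p * sigma2_hat * f_crit


if __name__ == "__main__":
    # Made-up inputs: p = 2, n - p = 10; 4.10 is roughly the 0.95 quantile
    # of F(2, 10) as read from a standard table.
    xtx = [[10.0, 0.0], [0.0, 5.0]]
    beta_hat = [1.0, 2.0]
    for candidate in ([1.0, 2.0], [1.5, 2.5], [2.0, 2.0]):
        inside = in_confidence_region(candidate, beta_hat, xtx, 0.5, 4.10)
        print(candidate, inside)
```

Because the region is an ellipsoid shaped by $X^tX$, a candidate that is close to $\hat\beta$ in Euclidean distance can still fall outside when it moves along a well-determined direction.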
Model selection: goals
- When we have many predictors (with many possible interactions), it can be difficult to find a good model.
- Which main effects do we include?
- Which interactions do we include?
- Model selection tries to "simplify" this task.
Model selection: general
- This is an "unsolved" problem in statistics: there are no magic procedures to get you the "best model."
- In some sense, model selection is "data mining."
- Data miners / machine learners often work with very many predictors.
Model selection: strategies
- To "implement" this, we need:
  - a criterion or benchmark to compare two models.
  - a search strategy.
- With a limited number of predictors, it is possible to search all possible models.
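
The two ingredients above can be sketched as an exhaustive search: enumerate all $2^p$ subsets of $p$ predictors and keep the one with the smallest criterion value. The criterion here is a toy stand-in invented for illustration, not one of the criteria from the course; in practice it would be replaced by a function that fits each model and returns its score.

```python
from itertools import combinations


def all_subsets(predictors):
    """Yield every subset of the predictors, from the empty model to the full one."""
    for k in range(len(predictors) + 1):
        for subset in combinations(predictors, k):
            yield subset


def exhaustive_search(predictors, criterion):
    """Return the subset with the smallest criterion value among all 2^p subsets."""
    return min(all_subsets(predictors), key=criterion)


def toy_criterion(subset):
    """Hypothetical score: heavy penalty for omitting a useful predictor,
    light penalty for each predictor included. Lower is better."""
    useful = {"x1", "x3"}  # assumed "true" predictors, for illustration only
    missing = len(useful - set(subset))
    return 10 * missing + 2 * len(subset)


if __name__ == "__main__":
    predictors = ["x1", "x2", "x3"]
    print(len(list(all_subsets(predictors))), "candidate models")
    print("selected:", exhaustive_search(predictors, toy_criterion))
```

The number of candidates doubles with every added predictor, which is why this approach is only feasible for a limited number of predictors.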