Lasso Regression:
Regularization for feature selection
CSE 446: Machine Learning Emily Fox
University of Washington
January 18, 2017
© 2017 Emily Fox
Feature selection task
Why might you want to perform feature selection?
Efficiency:
- If size(w) = 100B, each prediction is expensive
- If ŵ is sparse (many zeros), computation only depends on the # of non-zeros:

  ŷi = Σ_{j: ŵj ≠ 0} ŵj hj(xi)
Interpretability:
- Which features are relevant for prediction?
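As a hedged illustration of the efficiency point (the function and variable names below are my own, not the lecture's): if only the non-zero weights are stored, the cost of a prediction scales with the number of non-zeros rather than with the full feature count.

```python
# Illustrative sketch: store only the non-zero weights of a sparse model,
# so prediction cost depends on # of non-zeros, not on total feature count.

def sparse_predict(nonzero_w, features):
    """Compute y_hat = sum over j with w_j != 0 of w_j * h_j(x).

    nonzero_w: dict mapping feature index j -> weight w_j (non-zeros only)
    features:  list of feature values h_j(x) for all j
    """
    return sum(w_j * features[j] for j, w_j in nonzero_w.items())

# A model over 6 features with only 2 non-zero weights:
w = {0: 2.0, 3: -1.0}
x = [1.0, 5.0, 7.0, 4.0, 2.0, 9.0]
print(sparse_predict(w, x))  # 2.0*1.0 + (-1.0)*4.0 = -2.0
```

The loop touches only the two stored weights; the other four features are never read.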
Sparsity: Housing application
Predict house price ($ ?) from many possible features:
Lot size, Single Family, Year built, Last sold price, Last sale price/sqft, Finished sqft, Unfinished sqft, Finished basement sqft, # floors, Flooring types, Parking type, Parking amount, Cooling, Heating, Exterior materials, Roof type, Structure style, Dishwasher, Garbage disposal, Microwave, Range / Oven, Refrigerator, Washer, Dryer, Laundry location, Heating type, Jetted Tub, Deck, Fenced Yard, Lawn, Garden, Sprinkler System, ...
Option 1: All subsets or greedy variants
Exhaustive approach: "all subsets"
Consider all possible models, each using a subset of features; index each model by which features are included:

yi = εi                                            [0 0 0 ... 0 0 0]
yi = w0 h0(xi) + εi                                [1 0 0 ... 0 0 0]
yi = w1 h1(xi) + εi                                [0 1 0 ... 0 0 0]
...
yi = w0 h0(xi) + w1 h1(xi) + εi                    [1 1 0 ... 0 0 0]
...
yi = w0 h0(xi) + w1 h1(xi) + ... + wD hD(xi) + εi  [1 1 1 ... 1 1 1]

How many models were evaluated? 2^D in total:
2^8 = 256
2^30 = 1,073,741,824
2^1000 ≈ 1.0715 x 10^301
2^100B = HUGE!!!!!!

Typically, computationally infeasible
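A tiny sketch of the count (pure enumeration; nothing here is from the lecture beyond the 2^D fact): each candidate model corresponds to one bit vector over the D features, so enumerating every subset of D = 8 features yields 2^8 = 256 models.

```python
import itertools

# Each candidate model corresponds to one subset of {0, ..., D-1},
# i.e., one bit vector over the D features. Enumerate them all for D = 8.
D = 8
count = 0
for r in range(D + 1):
    for subset in itertools.combinations(range(D), r):
        count += 1  # one candidate model per feature subset
print(count)  # 256 = 2^8
```

Doubling D squares the number of models, which is why the counts above explode so quickly.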
Choosing model complexity?
Option 1: Assess on validation set
Option 2: Cross validation
Option 3+: Other metrics for penalizing model complexity like BIC...
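As a sketch of Option 1 (all data, names, and candidate subsets below are illustrative, not from the lecture): hold out a validation set and compare candidate feature subsets by their validation error.

```python
import numpy as np

# Synthetic data: only features 0 and 2 actually matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(scale=0.5, size=200)

# Hold out a validation set (split sizes are arbitrary here).
X_tr, y_tr = X[:150], y[:150]
X_va, y_va = X[150:], y[150:]

def val_rss(feature_idx):
    """Fit least squares on the training split using only the given
    features; return the residual sum of squares on the validation split."""
    w, *_ = np.linalg.lstsq(X_tr[:, feature_idx], y_tr, rcond=None)
    resid = y_va - X_va[:, feature_idx] @ w
    return float(resid @ resid)

candidates = [(0,), (0, 2), (0, 1, 2, 3, 4)]
best = min(candidates, key=val_rss)
print(best)
```

Training RSS alone always favors the largest subset; assessing on held-out data is what makes the comparison meaningful.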
Greedy algorithms
Forward stepwise: Start with a simple model and iteratively add the feature most useful to the fit
Backward stepwise: Start with the full model and iteratively remove the feature least useful to the fit
Combining forward and backward steps: In the forward algorithm, insert steps to remove features that are no longer as important
Lots of other variants, too.
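The forward stepwise idea can be sketched as follows (a minimal illustration on synthetic data; the function name and stopping rule of "k features" are my own choices, not the lecture's):

```python
import numpy as np

def forward_stepwise(X, y, k):
    """Greedy forward selection sketch: start with no features; at each
    step add the single feature that most reduces training RSS; stop
    after k features have been added."""
    n, D = X.shape
    selected = []
    for _ in range(k):
        best_j, best_rss = None, float("inf")
        for j in range(D):
            if j in selected:
                continue
            idx = selected + [j]
            w, *_ = np.linalg.lstsq(X[:, idx], y, rcond=None)
            rss = float(np.sum((y - X[:, idx] @ w) ** 2))
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
    return selected

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 6))
y = 4.0 * X[:, 5] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)
print(forward_stepwise(X, y, 2))  # the two truly relevant features
```

Each step fits at most D candidate models, so k steps cost O(kD) fits instead of the 2^D of all subsets; the price is that greedy choices are not guaranteed to find the best subset.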
Option 2: Regularize
Ridge regression: L2 regularized regression
Total cost = measure of fit + λ · (measure of magnitude of coefficients)
           = RSS(w) + λ ||w||₂²,  where ||w||₂² = w₀² + ... + w_D²

Encourages small weights, but not exactly 0
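A minimal numeric sketch of this behavior (the closed-form solution and the λ values are standard ridge regression, not taken from the slides): large λ shrinks every weight toward 0, but none lands exactly at 0.

```python
import numpy as np

def ridge_cost(w, X, y, lam):
    """Total cost = RSS(w) + lambda * ||w||_2^2."""
    resid = y - X @ w
    return float(resid @ resid + lam * w @ w)

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: w = (X^T X + lambda I)^{-1} X^T y."""
    D = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ y)

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, 0.0, -2.0]) + rng.normal(scale=0.1, size=60)

w_small = ridge_fit(X, y, lam=0.1)
w_big = ridge_fit(X, y, lam=1000.0)
print(np.linalg.norm(w_big), "<", np.linalg.norm(w_small))
# Every entry of w_big is shrunk toward 0, yet none is exactly 0.
print(np.abs(w_big).min() > 0)
```

This is the key limitation for feature selection: ridge shrinks but never zeroes, so it never "knocks out" a feature on its own.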
Coefficient path: ridge

[Figure: each coefficient ŵj plotted as a function of λ; the weights shrink smoothly toward 0 as λ grows, but never reach exactly 0.]
Using regularization for feature selection
Instead of searching over a discrete set of solutions, can we use regularization?
- Start with the full model (all possible features)
- "Shrink" some coefficients exactly to 0
  - i.e., knock out certain features
- Non-zero coefficients indicate "selected" features
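As a hedged preview of how L1 regularization achieves exact zeros (this sketch uses cyclic coordinate descent with soft-thresholding; the helper names and data are illustrative, and it assumes unit-norm feature columns):

```python
import numpy as np

def soft_threshold(rho, lam):
    """Soft-thresholding: values with |rho| <= lam/2 become exactly 0."""
    if rho < -lam / 2:
        return rho + lam / 2
    if rho > lam / 2:
        return rho - lam / 2
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimal lasso sketch: minimize RSS(w) + lam * ||w||_1 by cyclic
    coordinate descent (assumes columns of X normalized to unit norm)."""
    n, D = X.shape
    w = np.zeros(D)
    for _ in range(n_iter):
        for j in range(D):
            resid = y - X @ w + X[:, j] * w[j]  # residual ignoring feature j
            rho = X[:, j] @ resid
            w[j] = soft_threshold(rho, lam)
    return w

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 5))
X /= np.linalg.norm(X, axis=0)  # unit-norm feature columns
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.05, size=100)

w = lasso_coordinate_descent(X, y, lam=0.5)
print(w)  # coefficients of irrelevant features are driven exactly to 0
```

The soft-threshold step is what produces exact zeros, in contrast to the smooth shrinkage of ridge.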
Thresholding ridge coefficients?
Why don't we just set small ridge coefficients to 0?
Thresholding ridge coefficients?
Selected features for a given threshold value
Thresholding ridge coefficients?
Let's look at two related features...
Nothing measuring bathrooms was included!
Thresholding ridge coefficients?
If only one of the features had been included...
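The failure mode above can be reproduced numerically. In this sketch (feature names and the threshold value are illustrative), ridge splits the weight of one strong signal across two nearly identical features, so a hard threshold discards both, while either feature alone would have survived:

```python
import numpy as np

# Two nearly identical features (say "# bathrooms" and "# showers").
rng = np.random.default_rng(5)
n = 200
bathrooms = rng.normal(size=n)
showers = bathrooms + rng.normal(scale=0.01, size=n)  # near duplicate
X = np.column_stack([bathrooms, showers])
y = 2.0 * bathrooms + rng.normal(scale=0.1, size=n)

# Closed-form ridge fit.
lam = 50.0
w = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
print(w)  # the weight of ~2 is split roughly in half across the twins

# A hard threshold of 1.5 would discard BOTH features, even though
# together they carry a coefficient of about 2.
threshold = 1.5
print((np.abs(w) > threshold).any())
```

This is why thresholding ridge coefficients is unreliable for feature selection: redundant features dilute each other's coefficients below any fixed cutoff.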