CSE 5523: Lecture Notes 10: Linear regression
Contents
10.1 Linear regression
10.2 Sample code
10.3 Fitting polynomial lines
10.4 Convexity
10.5 Regularization
10.6 Gradient descent
10.7 Sample code
10.1 Linear regression
Fit weights $w$ for a numerical response $y$ (e.g. purchases) on continuous variables $x$ (e.g. dimensions of fruits). Use a negative log Gaussian loss function, so deviation from the prediction $w^\top x$ is Gaussian `noise':

$$ \mathcal{L}_{\mathrm{NLG},\sigma}(y, w^\top x) \;=\; -\ln \mathcal{N}_{w^\top x,\sigma}(y) \;=\; -\ln\!\left[ \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\!\left( -\frac{1}{2\sigma^2}\,(y - w^\top x)^2 \right) \right] $$
Find parameters $w$ that minimize the negative log Gaussian likelihood (squared error) over all data:
$$
\begin{aligned}
0 &= \nabla_w \sum_{\langle y,x\rangle \in D} \mathcal{L}_{\mathrm{NLG},\sigma}(y, w^\top x) \\
  &= \nabla_w \sum_{\langle y,x\rangle \in D} -\ln\!\left[ \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\!\left( -\frac{1}{2\sigma^2}(y - w^\top x)^2 \right) \right] && \text{substitution} \\
  &= \sum_{\langle y,x\rangle \in D} \nabla_w -\ln\!\left[ \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\!\left( -\frac{1}{2\sigma^2}(y - w^\top x)^2 \right) \right] && \text{sum rule} \\
  &= \sum_{\langle y,x\rangle \in D} \nabla_w \left[ -\ln \frac{1}{(2\pi\sigma^2)^{1/2}} + \frac{1}{2\sigma^2}(y - w^\top x)^2 \right] && \text{log of product is sum of logs} \\
  &= \sum_{\langle y,x\rangle \in D} \nabla_w\, \frac{1}{2\sigma^2}(y - w^\top x)^2 && \text{derivative of constant} \\
  &= \sum_{\langle y,x\rangle \in D} \frac{1}{2\sigma^2}\, \nabla_w (y - w^\top x)^2 && \text{product rule} \\
  &= \sum_{\langle y,x\rangle \in D} \frac{1}{2\sigma^2}\, 2\,(y - w^\top x)\, \nabla_w (y - w^\top x) && \text{power rule} \\
  &= \sum_{\langle y,x\rangle \in D} \frac{1}{2\sigma^2}\, 2\,(y - w^\top x)\,(-x) && \text{derivative of inner product} \\
  &= \sum_{\langle y,x\rangle \in D} x\,(y - w^\top x) && \text{multiply by } -\sigma^2 \\
  &= X^\top (y - Xw) && \text{definition of inner product} \\
  &= X^\top y - X^\top X\, w && \text{distributive axiom}
\end{aligned}
$$

This has a root where $X^\top X\, w = X^\top y$, i.e. $w = (X^\top X)^{-1}\, X^\top y$. (The matrix $X^\top X$ acts here as a concentration/precision matrix.)
10.2 Sample code
Sample code in pandas:
import sys
import numpy
import pandas

X = pandas.read_csv( sys.argv[1] )   ## predictors, one row per observation
y = pandas.read_csv( sys.argv[2] )   ## responses

## Closed-form solution to the normal equations: w = (X'X)^-1 X'y
w = numpy.linalg.inv(X.T @ X) @ X.T @ y
print( w )
Sample input file `X.csv':
x1,x2
1,1
2,1
3,1
4,1
Sample input file `y.csv':
y
3
5
7
9
Sample output:
     y
0  2.0
1  1.0
In other words: $y = 2 x_{[1]} + 1$ (the second column of X is a constant bias feature).
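Forming the explicit inverse works on small problems like this one, but solving the normal equations directly is numerically steadier. A minimal sketch (my addition, with the sample data above hard-coded) using numpy.linalg.solve:

import numpy

## Hard-coded copy of the sample X.csv and y.csv above.
X = numpy.array([[1., 1.],
                 [2., 1.],
                 [3., 1.],
                 [4., 1.]])
y = numpy.array([3., 5., 7., 9.])

## Solve X'X w = X'y without forming an explicit inverse.
w = numpy.linalg.solve(X.T @ X, X.T @ y)
print( w )   ## [2. 1.], i.e. y = 2*x1 + 1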
10.3 Fitting polynomial lines
You can use any basis function $\phi(x)$ of the input $x$ in place of $x$ itself, as long as the weights $w$ enter linearly. For example:

$$ x' = \phi(x) = \left[\; x_{[1]} \;\; (x_{[1]})^2 \;\; (x_{[1]})^3 \;\; x_{[2]} \;\; (x_{[2]})^2 \;\; (x_{[2]})^3 \;\right] $$

In this case the weights will be coefficients for the terms in a polynomial.
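As a concrete sketch (my own illustration, reusing the closed-form solution from section 10.2), we can expand a single input column into cubic features plus a bias column and fit exactly as before; poly_basis is a hypothetical helper name:

import numpy

def poly_basis(x, degree=3):
    ## Map each scalar input to [x, x^2, ..., x^degree, 1] (with a bias column).
    return numpy.column_stack([x**d for d in range(1, degree + 1)]
                              + [numpy.ones_like(x)])

x = numpy.array([0., 1., 2., 3., 4.])
y = x**3 - 2.*x + 5.                  ## data generated by an exact cubic

Xp = poly_basis(x)
w = numpy.linalg.solve(Xp.T @ Xp, Xp.T @ y)
print( w )                            ## approximately [-2. 0. 1. 5.]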
10.4 Convexity
Note that since this loss function is convex, there is only one minimum. We will see other loss functions that don't have this property, and with those we can get stuck in local optima.
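For the squared loss this is easy to verify (a standard argument, not spelled out in the original notes): up to positive constants, the Hessian of the summed loss is $X^\top X$, and for any vector $v$

$$ v^\top X^\top X\, v \;=\; (Xv)^\top (Xv) \;=\; \lVert Xv \rVert^2 \;\ge\; 0, $$

so the Hessian is positive semidefinite and the loss surface is a single bowl.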
10.5 Regularization
Again, we may add prior information to avoid overfitting (here the symptom is not zero counts, but wiggly lines). A good choice is to add a term to the loss that penalizes large weight parameters:
$$ \mathcal{L}_{\mathrm{NLG},\sigma+\mathrm{L2},\lambda}(y, w^\top x) \;=\; -\ln \mathcal{N}_{w^\top x,\sigma}(y) \;-\; \ln \mathcal{N}_{0,\lambda}(w) $$
Optimizing this (at slope zero) gives:
$$
\begin{aligned}
0 &= \nabla_w \left[ \sum_{\langle y,x\rangle \in D} -\ln \mathcal{N}_{w^\top x,\sigma}(y) \;-\; \sum_{v=1}^{V} \ln \mathcal{N}_{0,\lambda}(w_{[v]}) \right] \\
  &\;\;\vdots \\
  &= \sum_{\langle y,x\rangle \in D} \frac{1}{2\sigma^2}\, \nabla_w (y - w^\top x)^2 \;+\; \sum_{v=1}^{V} \frac{1}{2\lambda^2}\, \nabla_w (w_{[v]})^2 && \text{product rule} \\
  &= \sum_{\langle y,x\rangle \in D} \frac{1}{2\sigma^2}\, 2\,(y - w^\top x)\, \nabla_w (y - w^\top x) \;+\; \sum_{v=1}^{V} \frac{1}{2\lambda^2}\, 2\, w_{[v]}\, \nabla_w w_{[v]} && \text{power rule} \\
  &= \sum_{\langle y,x\rangle \in D} \frac{1}{2\sigma^2}\, 2\,(y - w^\top x)\,(-x) \;+\; \frac{1}{\lambda^2}\, w && \text{derivative of inner product} \\
  &= \sum_{\langle y,x\rangle \in D} x\,(y - w^\top x) \;-\; \frac{\sigma^2}{\lambda^2}\, w && \text{multiply by } -\sigma^2 \\
  &= X^\top (y - Xw) \;-\; \frac{\sigma^2}{\lambda^2}\, I\, w && \text{definition of inner product} \\
  &= X^\top y - X^\top X\, w - \frac{\sigma^2}{\lambda^2}\, I\, w && \text{distributive axiom} \\
  &= X^\top y - \left( X^\top X + \frac{\sigma^2}{\lambda^2}\, I \right) w && \text{distributive axiom}
\end{aligned}
$$

which has a single root, $w = \left( X^\top X + \frac{\sigma^2}{\lambda^2}\, I \right)^{-1} X^\top y$, so the loss is still convex.
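A minimal sketch of this regularized solution in the style of section 10.2 (my addition; the values of sigma and lam, the noise and prior standard deviations, are illustrative):

import numpy

X = numpy.array([[1., 1.],
                 [2., 1.],
                 [3., 1.],
                 [4., 1.]])
y = numpy.array([3., 5., 7., 9.])

sigma, lam = 1., 10.    ## noise and prior standard deviations (illustrative)
V = X.shape[1]          ## number of weight parameters

## Ridge solution: w = (X'X + (sigma^2/lam^2) I)^-1 X'y
w = numpy.linalg.solve(X.T @ X + (sigma**2 / lam**2) * numpy.eye(V), X.T @ y)
print( w )              ## shrunk slightly toward zero relative to [2. 1.]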
10.6 Gradient descent
Log Gaussian (quadratic) loss can be sensitive to outliers. We could use absolute value loss, but it is indeterminate (there are multiple optimal solutions). People often combine the two into log-cosh loss:
$$ \mathcal{L}_{\mathrm{LC},\delta}(y, w^\top x) \;=\; \ln\cosh\!\left( \frac{y - w^\top x}{\delta} \right) $$

(where $\delta$ is the distance at which you want to start ignoring outliers; the derivation below takes $\delta = 1$). It's quadratic in the middle (so it has a single optimum) and linear at the edges (resists outliers):
[Figure: $\mathcal{L}_{\mathrm{LC},\delta}(y, w^\top x)$ plotted against $y - w^\top x$ for $\delta = 1$ and $\delta = \frac{1}{2}$.]
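The two regimes are easy to check numerically (my own snippet): for small residuals $z$, $\ln\cosh(z) \approx z^2/2$, while for large ones $\ln\cosh(z) \approx |z| - \ln 2$.

import numpy

## Compare ln cosh(z) to its quadratic and linear approximations.
for z in [0.1, 0.5, 4.0, 10.0]:
    print( z, numpy.log(numpy.cosh(z)), z**2 / 2, abs(z) - numpy.log(2) )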
The gradient (derivative for multivariable functions) is:
$$
\begin{aligned}
\nabla_w \sum_{\langle y,x\rangle \in D} \mathcal{L}_{\mathrm{LC},\delta}(y, w^\top x)
  &= \nabla_w \sum_{\langle y,x\rangle \in D} \ln\cosh(y - w^\top x) && \text{substitution} \\
  &= \sum_{\langle y,x\rangle \in D} \nabla_w \ln\cosh(y - w^\top x) && \text{sum rule} \\
  &= \sum_{\langle y,x\rangle \in D} \frac{1}{\cosh(y - w^\top x)}\, \nabla_w \cosh(y - w^\top x) && \text{derivative of log} \\
  &= \sum_{\langle y,x\rangle \in D} \frac{\sinh(y - w^\top x)}{\cosh(y - w^\top x)}\, \nabla_w (y - w^\top x) && \text{derivative of cosh} \\
  &= \sum_{\langle y,x\rangle \in D} \frac{\sinh(y - w^\top x)}{\cosh(y - w^\top x)}\, (-x) && \text{derivative of inner product} \\
  &= \sum_{\langle y,x\rangle \in D} \tanh(y - w^\top x)\, (-x) && \text{definition of tanh} \\
  &= -X^\top \tanh(y - Xw) && \text{definition of inner product}
\end{aligned}
$$

(with $\tanh$ applied elementwise in the last line).
Bah, we can't find roots here in closed form (the gradient is non-linear in $w$), but we can use the gradient as an update! Start with any $w^{(0)}$; then move in the opposite direction of the error times the active inputs $x$, to reduce the loss:
$$
\begin{aligned}
w^{(i)} &= w^{(i-1)} - \frac{1}{N} \left( -X^\top \tanh(y - Xw^{(i-1)}) \right) \\
        &= w^{(i-1)} + \frac{1}{N}\, X^\top \tanh(y - Xw^{(i-1)})
\end{aligned}
$$
This algorithm is called gradient descent.
10.7 Sample code
Sample code in pandas:
import sys
import numpy
import pandas

a = .1   ## step size (learning rate)

X = pandas.read_csv( sys.argv[1] )
y = pandas.read_csv( sys.argv[2] )
N = len(X)

## Randomly initialize...
w = pandas.DataFrame( numpy.random.rand(len(X.columns),1), X.columns, ['y'] )

## Run 100 epochs...
for i in range(100):
    w = w + 1/N * a * X.T @ numpy.tanh( y - X @ w )
    ## Print sum of loss on training data: should decrease at each epoch...
    print( numpy.sum( numpy.log( numpy.cosh( y - X @ w ) ) ) )

print( w )
Sample input file `X.csv':
x1,x2
1,1
2,1
3,1
4,1
Sample input file `y.csv' (now with an outlier at the third point):
y
3
5
3
9
Sample output:
           y
x1  1.801261
x2  1.113004
whereas quadratic loss gives us:
     y
0  1.6
1  1.0
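As a quick check (my own snippet), applying the closed-form solution from section 10.2 to the outlier data reproduces those quadratic-loss weights:

import numpy

X = numpy.array([[1., 1.],
                 [2., 1.],
                 [3., 1.],
                 [4., 1.]])
y = numpy.array([3., 5., 3., 9.])    ## outlier at the third point

w = numpy.linalg.solve(X.T @ X, X.T @ y)
print( w )    ## [1.6 1. ]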
Graphically, we can see log-cosh was less sensitive to the outlier:

[Figure: the data points with both fitted lines, $w$ from $\mathcal{L}_{\mathrm{LC},\delta}(y, w^\top x)$ and $w$ from $\mathcal{L}_{\mathrm{NLG},\sigma}(y, w^\top x)$, plotted over $x_{[1]}$ vs. $y$.]