Loss Functions and Regularization - David I. Inouye


David I. Inouye Tuesday, September 15, 2020


Outline

Loss functions

Regression losses
Classification losses

Regularization

"Implicit regularization" by changing in KNN L2 regularization L1 regularization and feature selection

Caveat: Very brief introduction to these concepts

If you want to learn more, take ECE595 Machine Learning I (Prof. Stanley Chan)


Many machine learning methods minimize the average loss (a.k.a. risk minimization)

Remember the linear regression objective:

$$\hat{\theta} = \arg\min_\theta \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \theta^\top x_i \right)^2$$

We can rewrite this as:

$$\hat{\theta} = \arg\min_\theta \frac{1}{n} \sum_{i=1}^{n} \mathcal{L}\left( y_i, \hat{y}_i \right)$$

where $\mathcal{L}(y, \hat{y}) = (y - \hat{y})^2$ is the loss function

Many supervised ML methods can be written in this form
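As a minimal numpy sketch of this idea (the helper names `average_loss` and `squared_loss` are my own, not from the lecture), the least-squares fit is exactly the minimizer of the average squared loss:

```python
import numpy as np

def average_loss(theta, X, y, loss):
    """Empirical risk: the average loss of the linear model y_hat = X @ theta."""
    y_hat = X @ theta
    return np.mean([loss(yi, yhi) for yi, yhi in zip(y, y_hat)])

def squared_loss(y, y_hat):
    return (y - y_hat) ** 2

# Tiny example: the least-squares solution minimizes the average squared loss.
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Here the data lie exactly on the line $y = 2x$, so the minimized average loss is (numerically) zero.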


Many supervised ML methods can be written as minimizing the average loss

Ordinary least squares uses the squared loss: $\mathcal{L}(y, \hat{y}) = (y - \hat{y})^2$

Logistic regression uses the logistic loss: $\mathcal{L}(y, \hat{y}) = -\left[ y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \right]$

Classification error is known as the 0-1 loss:

$$\mathcal{L}(y, \hat{y}) = \begin{cases} 0, & \text{if } y = \hat{y} \\ 1, & \text{otherwise} \end{cases}$$


Example: Absolute error is less sensitive to outliers but is harder to optimize

The absolute error loss is: $\mathcal{L}(y, \hat{y}) = |y - \hat{y}|$
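A quick numerical illustration of the outlier-sensitivity claim (the data values here are made up for the example):

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 100.0])   # last point is an outlier
y_hat = np.full_like(y, 2.0)           # a constant prediction

avg_squared = np.mean((y - y_hat) ** 2)    # dominated by the outlier's error of 98
avg_absolute = np.mean(np.abs(y - y_hat))  # grows only linearly in that error
```

The single outlier inflates the average squared loss to 2401.5, while the average absolute loss is only 25; the non-differentiability of $|\cdot|$ at zero is what makes the absolute loss harder to optimize.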




Example: The hinge loss is used for learning support vector machine (SVM) classifiers

Hinge loss is defined as: $\mathcal{L}(y, \hat{y}) = \max(0, 1 - y\hat{y})$

Note: $y \in \{-1, 1\}$

(Assume $y = 1$ below)

[Figure: hinge loss vs. 0-1 loss as a function of $\hat{y}$; classification is incorrect for $\hat{y} < 0$ and correct for $\hat{y} > 0$]

The hinge loss is the closest convex approximation to the 0-1 loss

The 0-1 loss is nonconvex and hard to optimize
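A small sketch of the hinge loss and its relationship to the 0-1 loss for $y \in \{-1, +1\}$ (function names are my own):

```python
def hinge_loss(y, score):
    # y in {-1, +1}; score is the raw model output, e.g., theta^T x
    return max(0.0, 1.0 - y * score)

def zero_one_loss(y, score):
    # 1 if the sign of the score disagrees with the label
    return 0.0 if y * score > 0 else 1.0
```

Note that a correct prediction inside the margin (e.g., `hinge_loss(1, 0.5)`) is still penalized, and the hinge loss always upper-bounds the 0-1 loss, which is why minimizing it controls classification error.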


Regularization is a common method to improve generalization by reducing the complexity of a model

Changing $k$ in KNN can be seen as an implicit regularization technique

We can use explicit regularization for parametric models by adding a regularizer

$$\min_\theta \frac{1}{n} \sum_{i=1}^{n} \mathcal{L}\left( y_i, \hat{y}_i \right) + \lambda R(\theta)$$
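For instance, taking $R(\theta) = \|\theta\|_2^2$ (L2 regularization) with the squared loss gives ridge regression, which has a closed-form solution. A minimal sketch, assuming the closed form $\theta = (X^\top X + \lambda I)^{-1} X^\top y$ (the `ridge_fit` name is my own):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Solve min_theta ||y - X theta||^2 + lam * ||theta||^2 in closed form."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
theta_ols = ridge_fit(X, y, lam=0.0)    # plain least squares
theta_reg = ridge_fit(X, y, lam=10.0)   # regularization shrinks theta toward 0
```

Increasing $\lambda$ trades training fit for a smaller (less complex) parameter vector, which is the mechanism by which regularization improves generalization.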


Brief aside: 1D polynomial regression can be computed by creating polynomial "pseudo" features

Suppose we have 1D input data, i.e., $x \in \mathbb{R}^{n \times 1}$

We can create pseudo polynomial features, e.g.,

$$\Phi = \begin{bmatrix} x_1 & x_1^2 & x_1^3 \\ x_2 & x_2^2 & x_2^3 \\ \vdots & \vdots & \vdots \\ x_n & x_n^2 & x_n^3 \end{bmatrix} \in \mathbb{R}^{n \times 3}$$

Linear regression can then be used to fit a polynomial model:

$$\hat{y}_i = \theta_1 x_i + \theta_2 x_i^2 + \theta_3 x_i^3 + \cdots$$
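The feature construction above can be sketched in a few lines (the `poly_features` helper is my own name for it):

```python
import numpy as np

def poly_features(x, degree):
    """Map a 1D array x to the pseudo-feature matrix [x, x^2, ..., x^degree]."""
    return np.column_stack([x ** k for k in range(1, degree + 1)])

x = np.array([1.0, 2.0, 3.0])
Phi = poly_features(x, 3)   # shape (3, 3); row i is [x_i, x_i^2, x_i^3]
```

Fitting ordinary linear regression on `Phi` then yields a polynomial model in the original input, since the model is still linear in the parameters $\theta$.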

