
Neural Networks for Machine Learning

Lecture 13a

The ups and downs of backpropagation

Geoffrey Hinton

Nitish Srivastava

Kevin Swersky

Tijmen Tieleman

Abdel-rahman Mohamed

A brief history of backpropagation

• The backpropagation algorithm for learning multiple layers of features was invented several times in the 70s and 80s:
  – Bryson & Ho (1969), linear
  – Werbos (1974)
  – Rumelhart et al. (1981)
  – Parker (1985)
  – LeCun (1985)
  – Rumelhart et al. (1985)

• Backpropagation clearly had great promise for learning multiple layers of non-linear feature detectors.

• But by the late 1990s most serious researchers in machine learning had given up on it.
  – It was still widely used in psychological models and in practical applications such as credit card fraud detection.

Why backpropagation failed

• The popular explanation of why backpropagation failed in the 90s:
  – It could not make good use of multiple hidden layers (except in convolutional nets).
  – It did not work well in recurrent networks or deep auto-encoders.
  – Support Vector Machines worked better, required less expertise, produced repeatable results, and had much fancier theory.

• The real reasons it failed:
  – Computers were thousands of times too slow.
  – Labeled datasets were hundreds of times too small.
  – Deep networks were too small and not initialized sensibly.

• These issues prevented it from being successful for tasks where it would eventually be a big win.

A spectrum of machine learning tasks

Typical Statistics --------------------------------- Artificial Intelligence

Typical Statistics:
• Low-dimensional data (e.g. fewer than 100 dimensions).
• Lots of noise in the data.
• Not much structure in the data, and the structure can be captured by a fairly simple model.
• The main problem is separating true structure from noise.
  – Not ideal for non-Bayesian neural nets. Try SVM or GP.

Artificial Intelligence:
• High-dimensional data (e.g. more than 100 dimensions).
• The noise is not the main problem.
• There is a huge amount of structure in the data, but it is too complicated to be represented by a simple model.
• The main problem is figuring out a way to represent the complicated structure so that it can be learned.
  – Let backpropagation figure it out.

Why Support Vector Machines were never a good bet for Artificial Intelligence tasks that need good representations

• View 1: SVMs are just a clever reincarnation of Perceptrons.
  – They expand the input into a (very large) layer of non-linear, non-adaptive features.
  – They only have one layer of adaptive weights (illustrated in the sketch below).
  – They have a very efficient way of fitting the weights that controls overfitting.

• View 2: SVMs are just a clever reincarnation of Perceptrons.
  – They use each input vector in the training set to define a non-adaptive "pheature": the global match between a test input and that training input.
  – They have a clever way of simultaneously doing feature selection and finding weights on the remaining features.
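To make View 1 concrete, here is a minimal sketch (not from the lecture) of a perceptron-like model built the SVM way: a fixed, non-adaptive layer of kernel features with a single layer of adaptive weights on top. The RBF feature map, the width gamma, the ridge penalty lam, and the regularized least-squares fit are all illustrative assumptions; a real SVM would fit the weights with the max-margin quadratic program instead.

# Minimal sketch of "one layer of adaptive weights on non-adaptive kernel features".
# Illustrative only: the RBF map and ridge fit stand in for the true SVM training.
import numpy as np

def rbf_features(X, centres, gamma=1.0):
    """Non-adaptive features: one RBF 'pheature' per stored training case."""
    # Squared Euclidean distance between every row of X and every centre.
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)                 # shape: (n_inputs, n_centres)

def fit_single_layer(Phi, y, lam=1e-2):
    """The only adaptive part: one layer of weights on the fixed features."""
    n_features = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(n_features), Phi.T @ y)

# Toy binary problem with labels in {-1, +1}.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))
y_train = np.sign(X_train[:, 0] * X_train[:, 1])    # XOR-like, not linearly separable

Phi_train = rbf_features(X_train, X_train)           # each training case defines a feature
w = fit_single_layer(Phi_train, y_train)             # the single adaptive layer

X_test = rng.normal(size=(50, 2))
y_pred = np.sign(rbf_features(X_test, X_train) @ w)  # same fixed features, learned weights
print("train accuracy:", (np.sign(Phi_train @ w) == y_train).mean())

Under View 2, the same rbf_features call is what defines one "pheature" per stored training case; the features themselves never adapt, and only the weight vector w is learned.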
