Backpropagation

Sargur Srihari

Topics in Backpropagation

1. Forward Propagation
2. Loss Function and Gradient Descent
3. Computing derivatives using chain rule
4. Computational graph for backpropagation
5. Backprop algorithm
6. The Jacobian matrix

A neural network with one hidden layer

• D input variables $x_1, \dots, x_D$
• M hidden unit activations:
$$a_j = \sum_{i=1}^{D} w_{ji}^{(1)} x_i + w_{j0}^{(1)}, \qquad j = 1, \dots, M$$
• Hidden unit activation functions:
$$z_j = h(a_j)$$
• K output activations:
$$a_k = \sum_{j=1}^{M} w_{kj}^{(2)} z_j + w_{k0}^{(2)}, \qquad k = 1, \dots, K$$
• Output activation functions:
$$y_k = \sigma(a_k)$$

Composing the two layers, the overall network function is
$$y_k(\mathbf{x}, \mathbf{w}) = \sigma\left( \sum_{j=1}^{M} w_{kj}^{(2)}\, h\left( \sum_{i=1}^{D} w_{ji}^{(1)} x_i + w_{j0}^{(1)} \right) + w_{k0}^{(2)} \right)$$

Augmented network (biases absorbed as weights from an extra unit clamped to 1): the number of weights in w is
$$T = (D+1)M + (M+1)K = M(D+K+1) + K$$

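To make the indexing concrete, here is a minimal NumPy sketch of this forward pass. The slide does not fix h, σ, or the layer sizes, so tanh, the logistic sigmoid, and toy sizes D=3, M=4, K=2 are assumptions:

```python
import numpy as np

# Minimal sketch of the one-hidden-layer forward pass above.
# Assumed (not fixed by the slide): h = tanh, sigma = logistic sigmoid,
# toy sizes D=3, M=4, K=2, and random weights.
rng = np.random.default_rng(0)
D, M, K = 3, 4, 2

W1 = rng.standard_normal((M, D))   # first-layer weights w_ji^(1), one row per hidden unit
b1 = rng.standard_normal(M)        # first-layer biases  w_j0^(1)
W2 = rng.standard_normal((K, M))   # second-layer weights w_kj^(2), one row per output unit
b2 = rng.standard_normal(K)        # second-layer biases  w_k0^(2)

x = rng.standard_normal(D)         # one input vector

a_hidden = W1 @ x + b1             # a_j = sum_i w_ji^(1) x_i + w_j0^(1)
z = np.tanh(a_hidden)              # z_j = h(a_j)
a_out = W2 @ z + b2                # a_k = sum_j w_kj^(2) z_j + w_k0^(2)
y = 1.0 / (1.0 + np.exp(-a_out))   # y_k = sigma(a_k)

# Parameter count matches T = (D+1)M + (M+1)K = M(D+K+1) + K
T = W1.size + b1.size + W2.size + b2.size
assert T == M * (D + K + 1) + K
print(y, T)
```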

Matrix Multiplication: Forward Propagation

• Each layer is a function of the layer that preceded it
• First layer: $\mathbf{z} = h(W^{(1)T}\mathbf{x} + \mathbf{b}^{(1)})$
• Second layer: $\mathbf{y} = \sigma(W^{(2)T}\mathbf{z} + \mathbf{b}^{(2)})$
• Note that W is a matrix rather than a vector
• Example with D = 3, M = 3: $\mathbf{x} = [x_1, x_2, x_3]^T$

First network layer:
$$W^{(1)} = \begin{bmatrix} W_1^{(1)T} \\ W_2^{(1)T} \\ W_3^{(1)T} \end{bmatrix} = \begin{bmatrix} W_{11} & W_{12} & W_{13} \\ W_{21} & W_{22} & W_{23} \\ W_{31} & W_{32} & W_{33} \end{bmatrix}$$

and similarly for the second layer, with rows $W_1^{(2)T}, W_2^{(2)T}, W_3^{(2)T}$; the parameter vector w collects both $W^{(1)}$ and $W^{(2)}$.

Network layer output in matrix multiplication notation: $\mathbf{z} = h(W^{(1)T}\mathbf{x} + \mathbf{b}^{(1)})$

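The same computation in batch matrix form might look as follows. Per the slide's $\mathbf{z} = h(W^{(1)T}\mathbf{x} + \mathbf{b}^{(1)})$ convention, each column of $W^{(l)}$ holds one unit's weights; the nonlinearities (tanh, logistic sigmoid) and the batch size are assumptions:

```python
import numpy as np

# Batch sketch of the two-layer forward pass in matrix notation.
# Each column of W1/W2 holds one unit's weights, matching z = h(W^(1)T x + b^(1)).
# Assumed: tanh hidden units, logistic sigmoid outputs, toy sizes.
N, D, M, K = 8, 3, 3, 2
rng = np.random.default_rng(0)
X = rng.standard_normal((N, D))            # N inputs, one per row
W1, b1 = rng.standard_normal((D, M)), rng.standard_normal(M)
W2, b2 = rng.standard_normal((M, K)), rng.standard_normal(K)

Z = np.tanh(X @ W1 + b1)                   # row-wise: z = h(W^(1)T x + b^(1))
Y = 1.0 / (1.0 + np.exp(-(Z @ W2 + b2)))   # row-wise: y = sigma(W^(2)T z + b^(2))
print(Y.shape)                             # (8, 2)
```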

Loss and Regularization

Network output: $y = f(\mathbf{x}, \mathbf{w})$

The error is the per-example loss averaged over the training set:
$$E = \frac{1}{N} \sum_{i=1}^{N} E_i\left( f(\mathbf{x}^{(i)}, \mathbf{w}),\, t_i \right)$$

[Figure: computational graph of the objective. Forward pass: the network output y yields the per-example loss $E_i$, to which the regularizer $R(W)$ is added. Backward pass: the gradient of $E_i + R$ is computed.]

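A minimal sketch of this objective, assuming squared error for $E_i$ and an L2 penalty for $R(W)$ (neither choice is fixed by the slide), with hypothetical toy shapes:

```python
import numpy as np

# Sketch of the regularized objective E + R(W).
# Assumed: E_i is squared error and R(W) is an L2 penalty with weight lam.
def regularized_loss(y_pred, t, weights, lam=1e-2):
    per_example = 0.5 * np.sum((y_pred - t) ** 2, axis=-1)      # E_i(f(x^(i), w), t_i)
    data_term = per_example.mean()                              # (1/N) * sum_i E_i
    reg_term = 0.5 * lam * sum(np.sum(W * W) for W in weights)  # R(W)
    return data_term + reg_term

rng = np.random.default_rng(1)
y_pred = rng.standard_normal((5, 2))   # N=5 network outputs f(x^(i), w), K=2
t = rng.standard_normal((5, 2))        # targets t_i
weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
print(regularized_loss(y_pred, t, weights))
```

The backward pass then differentiates this scalar with respect to every weight, which is what the chain rule and computational-graph material in the following slides develop.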