Multilayer Learning and Backpropagation

Neural Networks

Bibliography

Rumelhart, D. E. and McClelland, J. L., Parallel Distributed Processing, MIT Press, 1986, Chapter 8, pp. 318-362.

An informal but very readable introduction to backpropagation.

Sejnowski, T. J. and Rosenberg, C. R., "NETtalk: a parallel network that learns to read aloud," The Johns Hopkins University EE and CS Technical Report JHU/EECS-86/01, 1986.

An example of an in-depth application of backpropagation.

Werbos, P. J., "Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences," PhD thesis, Harvard University, Cambridge, MA, 1974.

Backpropagation

(Rumelhart et al., 1986; see also Werbos, 1974)

Multi-layer supervised learning

Gradient Descent Weight Changer

Uses Sigmoid rather than Threshold

(Squashing Function)

Sigmoid is differentiable (Widrow's delta rule took the derivative before the threshold, on the linear sum)

Threshold vs. Sigmoid

[pic]
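To make the contrast concrete, here is a minimal sketch in Python (my own illustration, not from the original notes) comparing the hard threshold with the sigmoid and its derivative:

import math

def threshold(net):
    # Hard threshold unit: output is 0 or 1; its derivative is zero (almost)
    # everywhere, so gradient descent has nothing to follow.
    return 1.0 if net >= 0.0 else 0.0

def sigmoid(net):
    # Squashing function: smooth, bounded in (0, 1), and differentiable.
    return 1.0 / (1.0 + math.exp(-net))

def sigmoid_deriv(net):
    # f'(net) = f(net) * (1 - f(net)); largest near net = 0, where output is ~0.5.
    o = sigmoid(net)
    return o * (1.0 - o)

if __name__ == "__main__":
    for net in (-4.0, -1.0, 0.0, 1.0, 4.0):
        print(f"net={net:+.1f}  threshold={threshold(net):.0f}  "
              f"sigmoid={sigmoid(net):.3f}  sigmoid'={sigmoid_deriv(net):.3f}")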

How does a multi-layer network do non-linearly-separable mappings?

[pic]

Backpropagation Network

[pic]

Input layer → Hidden layer(s) → Output layer

Backpropagation Derivation

It can be derived from fundamentals by seeking the negative gradient of the error with respect to each weight, -∂E/∂wij (gradient descent). Because the sigmoid is differentiable, its derivative can be taken directly.

sigmoid: f(net) = 1 / (1 + e^-net) = output

[pic]

f'(net) = f(net)(1 - f(net))

[pic]

f'(net) is largest when the output is in the middle of the sigmoid (near .5), so weight changes are most active there - unstable?
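For reference, the derivative quoted above follows in one line of algebra (not worked out in the original slides); in LaTeX form:

f'(\mathrm{net}) = \frac{d}{d\,\mathrm{net}}\,\frac{1}{1 + e^{-\mathrm{net}}}
                 = \frac{e^{-\mathrm{net}}}{(1 + e^{-\mathrm{net}})^{2}}
                 = f(\mathrm{net})\,\bigl(1 - f(\mathrm{net})\bigr)

This is why f'(netj) can be written as Oj(1 - Oj) in the network equations below.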

Backpropagation Learning Algorithm

Until convergence do
    Present a training pattern
    Calculate the error of the output units (T - O)
    For each hidden layer, working backwards from the output
        Calculate its error using the error from the next layer
    Update the weights
end

The error propagates back through the network (a runnable sketch follows the Network Equations below).

Network Equations

Output: Oj = f(netj) = 1 / (1 + e^-netj),   where netj = Σi wij Oi

f'(netj) = Oj (1 - Oj)

Δwij (general node): Δwij = C Oi δj   (C is the learning rate)

Δwij (output node):

δj = (tj - Oj) f'(netj)

Δwij = C Oi δj = C Oi (tj - Oj) f'(netj)

Δwij (hidden node):

δj = (Σk δk wjk) f'(netj)   (sum over the nodes k in the next layer)

Δwij = C Oi δj = C Oi (Σk δk wjk) f'(netj)

[pic]
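The equations above map directly onto code. Below is a minimal sketch (my own illustration, not from the lecture): a 2x2x1 sigmoid network trained on XOR with the update rules just given. The layer sizes, learning rate, seed, and epoch count are assumptions chosen for the demo.

import random
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

N_IN, N_HID, N_OUT = 2, 2, 1
C = 0.5  # learning rate

random.seed(0)
# w_hid[j][i]: weight from input i to hidden node j (last slot is the bias weight)
w_hid = [[random.uniform(-0.5, 0.5) for _ in range(N_IN + 1)] for _ in range(N_HID)]
# w_out[k][j]: weight from hidden node j to output k (last slot is the bias weight)
w_out = [[random.uniform(-0.5, 0.5) for _ in range(N_HID + 1)] for _ in range(N_OUT)]

def forward(x):
    o_hid = [sigmoid(sum(w[i] * xi for i, xi in enumerate(x)) + w[-1]) for w in w_hid]
    o_out = [sigmoid(sum(w[j] * oj for j, oj in enumerate(o_hid)) + w[-1]) for w in w_out]
    return o_hid, o_out

def train_pattern(x, t):
    o_hid, o_out = forward(x)
    # Output deltas: delta_k = (t_k - O_k) * O_k * (1 - O_k)
    d_out = [(t[k] - o_out[k]) * o_out[k] * (1 - o_out[k]) for k in range(N_OUT)]
    # Hidden deltas: delta_j = (sum_k delta_k * w_kj) * O_j * (1 - O_j)
    d_hid = [sum(d_out[k] * w_out[k][j] for k in range(N_OUT)) * o_hid[j] * (1 - o_hid[j])
             for j in range(N_HID)]
    # Weight updates: delta_w_ij = C * O_i * delta_j (bias treated as an input of 1)
    for k in range(N_OUT):
        for j in range(N_HID):
            w_out[k][j] += C * o_hid[j] * d_out[k]
        w_out[k][-1] += C * 1.0 * d_out[k]
    for j in range(N_HID):
        for i in range(N_IN):
            w_hid[j][i] += C * x[i] * d_hid[j]
        w_hid[j][-1] += C * 1.0 * d_hid[j]

# XOR training set
patterns = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]

# Note: as the slides warn, a 2x2x1 net can land in a local minimum;
# rerun with a different seed if the outputs stay near 0.5.
for epoch in range(5000):
    for x, t in patterns:
        train_pattern(x, t)

for x, t in patterns:
    print(x, "->", round(forward(x)[1][0], 3), "target", t[0])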

Backprop Examples

Epochs = 558

LR = .5

XOR - 2x1x1

Backprop Examples

Epochs = 6587

LR = .25

No Convergence - Local Minima

XOR - 2x2x1

Parity Problem Solution

Epochs = 2825

LR = .5

Parity - 8x8x1

The hidden layer sets itself up to count the number of active inputs

UCSD - Zipser, Elman (linguists)

Trained to do the phoneme identity function. Why?

Speeding up Learning

Momentum term α

Δwij(t+1) = C δj Oi + α Δwij(t)

Speeds up learning across flat regions of the error surface

Filters out high-frequency variations?

α is usually set to .9 (a code sketch of this update follows the list below)

Dynamic learning rate and momentum

Overloading and Pruning

Different Activation Functions

Recurrent Nets
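As one concrete illustration of the speed-up techniques above, here is a minimal sketch of the momentum update from the momentum bullet (my own code; C, α, and the toy gradient values are assumptions for the demo):

# Momentum-augmented weight update: dw(t+1) = C * delta_j * O_i + alpha * dw(t).
C = 0.5      # learning rate (illustrative)
ALPHA = 0.9  # momentum term (the usual value quoted above)

def momentum_update(w, prev_dw, o_i, delta_j):
    """Return the new weight and the weight change to remember for the next step."""
    dw = C * o_i * delta_j + ALPHA * prev_dw
    return w + dw, dw

if __name__ == "__main__":
    # On a flat stretch the raw step C*O_i*delta_j is tiny, but momentum lets the
    # accumulated change build up toward C*O_i*delta_j / (1 - ALPHA), i.e. ~10x here.
    w, prev_dw = 0.0, 0.0
    for step in range(20):
        w, prev_dw = momentum_update(w, prev_dw, o_i=1.0, delta_j=0.01)
        print(f"step {step:2d}  dw={prev_dw:.5f}  w={w:.5f}")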

Backpropagation Summary

Empirically impressive multi-layer learning

Most used of current neural networks

Truly multi-layer?

Slow learning - Hardware

No convergence guarantees

Lack of Rigor - AI Trap?

Black magic - "eye of newt" tricks, few guidelines for choosing an initial topology
