Multilayer Learning and Backpropagation
Neural Networks
Bibliography
Rumelhart, D. E. and McClelland, J. L., Parallel Distributed Processing, MIT Press, 1986, Chapter 8, pp. 318-362.
An informal but very readable introduction to backpropagation.
Sejnowski, T. J. and Rosenberg, C., "NETtalk: a parallel network that learns to read aloud," The Johns Hopkins University EE and CS Tech. Report JHU/EECS-86/01, 1986.
An example of an in-depth application of backpropagation.
Werbos, P. J., "Beyond Regression: new tools for prediction and analysis in the behavioral sciences," PhD Thesis, Harvard Univ., Cambridge, Mass., 1974.
Backpropagation
(Rumelhart et al., 1986; derived earlier by Werbos, 1974)
Multi-layer supervised learning
Gradient Descent Weight Changer
Uses Sigmoid rather than Threshold
(Squashing Function)
Sigmoid is differentiable (Widrow took the derivative of the linear sum, before the threshold)
Threshold vs. Sigmoid
[Figure: threshold (step) activation vs. the sigmoid squashing function]
How does a multi-layer network do non-linearly-separable mappings?
[Figure: a non-linearly-separable mapping]
Backpropagation Network
[Figure: backpropagation network with input layer, hidden layer(s), and output layer]
Backpropagation Derivation
It can be derived from first principles by gradient descent on the error, seeking the negative gradient -∂E/∂wij. The sigmoid is differentiable, so its derivative can be taken.
[Figure: the sigmoid f(net) and its derivative f'(net)]
sigmoid: f(net) = 1 / (1 + e^-net) = output
f'(net) = output (1 - output)
f'(net) is largest when the output is in the middle of the sigmoid (near .5) - unstable?
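The sigmoid and its derivative are easy to express in code. A minimal sketch in Python (function names are my own, not from the notes):

import numpy as np

def sigmoid(net):
    """Squashing function: maps any net input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-net))

def sigmoid_prime(net):
    """Derivative of the sigmoid, written in terms of its output O = f(net)."""
    out = sigmoid(net)
    return out * (1.0 - out)

# The derivative peaks where the output is .5 (net = 0) and vanishes at the extremes.
print(sigmoid_prime(0.0))   # 0.25
print(sigmoid_prime(5.0))   # ~0.0066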
Backpropagation Learning Algorithm
Until Convergence do
Present a training pattern
Calculate the error of output units (T-O)
For each hidden layer (working backwards from the output)
Calculate its error using the error from the next layer
Update weights
end
The error propagates back through the network
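A minimal sketch of this loop in Python for the XOR problem, using the sigmoid above and the weight-update equations given under Network Equations below; the topology, random seed, and convergence test are illustrative choices, not from the notes:

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

# XOR training patterns: inputs X, targets T
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

# 2x2x1 topology with small random initial weights and bias weights
W1 = rng.uniform(-0.5, 0.5, (2, 2))   # input  -> hidden
b1 = rng.uniform(-0.5, 0.5, 2)
W2 = rng.uniform(-0.5, 0.5, (2, 1))   # hidden -> output
b2 = rng.uniform(-0.5, 0.5, 1)

C = 0.5                                # learning rate

for epoch in range(20000):             # "until convergence", with a fixed cap
    sse = 0.0
    for x, t in zip(X, T):
        # Present a training pattern (forward pass)
        h = sigmoid(x @ W1 + b1)       # hidden outputs
        o = sigmoid(h @ W2 + b2)       # network output

        # Error of the output unit: delta = (T - O) f'(net) = (T - O) O (1 - O)
        delta_o = (t - o) * o * (1 - o)

        # Hidden-layer error uses the error from the next layer
        delta_h = (delta_o @ W2.T) * h * (1 - h)

        # Update weights: delta_w_ij = C * O_i * delta_j
        W2 += C * np.outer(h, delta_o)
        b2 += C * delta_o
        W1 += C * np.outer(x, delta_h)
        b1 += C * delta_h

        sse += float(np.sum((t - o) ** 2))
    if sse < 0.01:                     # crude convergence test
        break

print("epochs:", epoch + 1, "sse:", sse)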
Network Equations
Output: Oj = f(netj) = 1 / (1 + e^-netj), where netj = Σi wij Oi
f'(netj) = Oj(1 - Oj)
Δwij (general node): Δwij = C Oi δj
Δwij (output node):
δj = (tj - Oj) f'(netj)
Δwij = C Oi δj = C Oi (tj - Oj) f'(netj)
Δwij (hidden node):
δj = (Σk δk wjk) f'(netj)
Δwij = C Oi δj = C Oi (Σk δk wjk) f'(netj)
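A quick numeric check of the output-node equations, with made-up values for a single weight:

C, Oi, Oj, tj = 0.5, 0.8, 0.6, 1.0     # learning rate, input activation, output, target
f_prime = Oj * (1 - Oj)                # 0.24
delta_j = (tj - Oj) * f_prime          # 0.4 * 0.24 = 0.096
delta_w = C * Oi * delta_j             # 0.5 * 0.8 * 0.096 = 0.0384
print(delta_w)                         # 0.0384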
Backprop Examples
Epochs = 558
LR = .5
XOR - 2x1x1
Backprop Examples
Epochs = 6587
LR = .25
No Convergence - Local Minima
XOR - 2x2x1
Parity Problem Solution
Epochs = 2825
LR = .5
Parity - 8x8x1
The hidden layer sets itself up to count the number of inputs that are on
UCSD - Zipser, Elman (linguists)
Trained a network to do the phoneme identity function. Why?
Speeding up Learning
Momentum term α
Δwij(t+1) = C δj Oi + α Δwij(t)
Speeds up learning in flat regions of the error surface
Filters out high-frequency variations in the gradient?
Usually α is set to .9 (a minimal code sketch follows this list)
Dynamic learning rate and momentum
Overloading and Pruning
Different Activation Functions
Recurrent Nets
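A minimal sketch of the momentum update, assuming the variable names from the XOR sketch above; the previous change for each weight is kept and blended into the new one:

import numpy as np

C, alpha = 0.5, 0.9                    # learning rate and momentum term

dW2 = np.zeros((2, 1))                 # previous change for the hidden->output weights

def momentum_step(dW_prev, grad_term):
    # delta_w(t+1) = C * (O_i * delta_j) + alpha * delta_w(t)
    return C * grad_term + alpha * dW_prev

# Inside the training loop, in place of "W2 += C * np.outer(h, delta_o)":
#   dW2 = momentum_step(dW2, np.outer(h, delta_o))
#   W2 += dW2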
Backpropagation Summary
Empirically impressive multi-layer learning
Most used of current neural networks
Truly multi-layer?
Slow learning - Hardware
No convergence guarantees
Lack of Rigor - AI Trap?
Black magic - Eye of newt, tricks, few guidelines for initial topology