MT Notes: Neural Networks (s# is the slide number in the NN ppt)

Note: all info is from the ppt (not from the notes7 and notes8 html links)

What Are Artificial Neural Networks?

- An extremely simplified model of the brain (s1)

- Model the network as a graph with cells as nodes and synaptic connections as weighted edges (s4)

Be familiar with the diagrams on these ppt slides:

the diagrams’ architecture and internal & external labels (but not the equations)

(s9) simple NN

(s12) multi-layer NN

Why Use Neural Networks? (s15)

- Ability to learn

- Ability to generalize

How Do Neural Networks Work? (s16)

- The output of a neuron is a function of the weighted sum of the inputs plus a bias

- The function of the entire neural network is simply the computation of the outputs of all the neurons (a minimal sketch follows below)
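A minimal sketch of this computation in Python; the function name, example values, and the choice of a sigmoid activation are illustrative assumptions, not from the slides:

import math

def neuron_output(inputs, weights, bias):
    # Weighted sum of the inputs plus a bias
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Pass through an activation function (sigmoid assumed here;
    # see Activation Functions below)
    return 1.0 / (1.0 + math.exp(-z))

# Example: a neuron with two inputs
print(neuron_output([0.5, -1.0], [0.8, 0.2], bias=0.1))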

Activation Functions (s17)

- A function applied to the weighted sum of the inputs of a neuron to produce the output

- The majority of NNs use sigmoid functions as their activation functions

- Smooth, continuous, and monotonically increasing

- Its derivative is always positive

Sigmoid functions have:

- A bounded range, but never reach the max or min

- Consider “ON” to be slightly less than the max and “OFF” to be slightly greater than the min
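A minimal sketch of the standard logistic sigmoid and its derivative, assuming this is the sigmoid function the slides intend:

import math

def sigmoid(z):
    # Smooth, continuous, monotonically increasing, and bounded:
    # the output stays in (0, 1) but never reaches 0 or 1
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_derivative(z):
    # Always positive; uses the standard identity s'(z) = s(z) * (1 - s(z))
    s = sigmoid(z)
    return s * (1.0 - s)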

The exact nature of the activation function has little effect on the abilities of the neural network

(s21) Definitions:

activity rule - specifies how a neuron will respond to its inputs (short time scale)

learning rule - specifies how a neuron will modify itself over a longer time scale

supervised neural network - networks that are given ‘training sets’, a set of inputs and correct answers to learn from (more on this later)

unsupervised neural network - networks that are given only a set of examples to memorize (more on this later)

How to compute the simple learning rule (s24):

1: compute the error signal, the difference between target and output: e = t - y

where e = error, t = target, y = output

2: adjust the weights accordingly (a sketch follows below)
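A sketch of both steps in Python; scaling each adjustment by the corresponding input and a learning rate is an assumption, since the slide only says to adjust the weights “accordingly”:

def simple_learning_step(weights, inputs, target, output, rate=0.1):
    # Step 1: error signal, e = t - y
    e = target - output
    # Step 2: adjust each weight; the per-input scaling and the
    # learning rate are assumed details, not from the slide
    return [w + rate * e * x for w, x in zip(weights, inputs)]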

(s27) Training is the act of presenting the network with some sample data and modifying the weights to better approximate the desired function

There are two main types of training: Supervised, Unsupervised

Supervised Training: how it works (simple case):

- Supplies the neural network with inputs and the desired outputs

- Response of the network to the inputs is measured

- The weights are modified to reduce the difference between the actual and desired outputs

(s28) Unsupervised Training: how it works (simple case):

- Only supplies inputs

- The neural network adjusts its own weights so that similar inputs cause similar outputs

- The network identifies the patterns and differences in the inputs without any external assistance

Epoch: One iteration through the process of providing the network with an input and updating the network's weights

- Typically many epochs are required to train the neural network

Perceptrons were the first neural network type with the ability to learn (s29)

- Simple, with only one layer

- Made up of only input neurons and output neurons, typically with only on/off states in the input neurons

How Do Perceptrons Learn? (s30)

- Uses supervised training

- If the output is not correct, the weights are adjusted according to the formula:

w_new = w_old + n * (desired - output) * input, where n is the learning rate

SEE SLIDE DIAGRAM
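A minimal sketch of one perceptron learning step using the formula above; the on/off threshold at 0 and the names are assumptions:

def perceptron_step(weights, inputs, desired, n=0.1):
    # On/off output: weighted sum passed through a hard threshold
    # (thresholding at 0 is an assumed detail)
    output = 1 if sum(w * x for w, x in zip(weights, inputs)) > 0 else 0
    # w_new = w_old + n * (desired - output) * input
    return [w + n * (desired - output) * x for w, x in zip(weights, inputs)]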

Multilayer Feedforward Networks (s37)

- Most common neural network. An extension of the perceptron

- They have: multiple layers

- 1. The addition of one or more “hidden” layers in between the input and output layers

- 2. An activation function that is not simply a threshold

- Usually a sigmoid function

- Information flows in one direction

- The outputs of one layer act as inputs to the next layer (a forward-pass sketch follows below)
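A minimal forward-pass sketch, assuming sigmoid activations and one weight row per neuron; all names are illustrative:

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    # One row of `weights` per neuron in the layer
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def network_forward(inputs, layers):
    # Information flows in one direction: the outputs of one layer
    # act as inputs to the next layer
    for weights, biases in layers:
        inputs = layer_forward(inputs, weights, biases)
    return inputs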

Backpropagation (s39-41) SEE SLIDE DIAGRAM

A common training technique for multi-layer NNs

- First calculate error of output units and use this to change the top layer of weights.

- Next calculate error for hidden units based on errors on the output units it feeds into.

- Finally update bottom layer of weights based on errors calculated for hidden units.
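A sketch of the first two steps (the error terms) for a network with one hidden layer; each weight is then changed by learning-rate × error term × incoming activation. Sigmoid units, the error e = t - y, and all names and indexing are assumptions:

def backprop_deltas(h, y, target, w_out):
    # Error terms for the output units; y[k] * (1 - y[k]) is the
    # sigmoid derivative at each output
    delta_out = [(t - yk) * yk * (1 - yk) for t, yk in zip(target, y)]
    # Error terms for each hidden unit, based on the errors of the
    # output units it feeds into (w_out[k][j] connects hidden j to output k)
    delta_hid = [hj * (1 - hj) *
                 sum(w_out[k][j] * delta_out[k] for k in range(len(y)))
                 for j, hj in enumerate(h)]
    return delta_out, delta_hid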

Backpropagation Training Algorithm (s42)

- Set weights to small random real values.

- Until all training examples produce the correct value (within ε), or the mean squared error ceases to decrease, or other termination criteria are met:

Begin epoch

For each training example, d, do:

Calculate network output for d’s input values

Compute error between current output and correct output for d

Update weights by backpropagating error and using learning rule

End epoch
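A self-contained sketch of this algorithm on the XOR problem, assuming one hidden layer of two sigmoid units; the weight range, learning rate, and ε are assumed values, not from the slides:

import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_xor(max_epochs=20000, n=0.5, eps=0.1):
    # Set weights to small random real values (range is an assumption);
    # the last weight in each row is the bias
    w_hid = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
    w_out = [random.uniform(-0.5, 0.5) for _ in range(3)]
    data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
    for epoch in range(max_epochs):        # Begin epoch
        worst = 0.0
        for x, t in data:                  # for each training example
            xb = x + [1]                   # append constant bias input
            # Calculate network output for this example's input values
            h = [sigmoid(sum(w * xi for w, xi in zip(row, xb))) for row in w_hid]
            hb = h + [1]
            y = sigmoid(sum(w * hi for w, hi in zip(w_out, hb)))
            # Compute error between current output and correct output
            d_out = (t - y) * y * (1 - y)
            d_hid = [h[j] * (1 - h[j]) * w_out[j] * d_out for j in range(2)]
            # Update weights by backpropagating error and the learning rule
            w_out = [w + n * d_out * hi for w, hi in zip(w_out, hb)]
            for j in range(2):
                w_hid[j] = [w + n * d_hid[j] * xi
                            for w, xi in zip(w_hid[j], xb)]
            worst = max(worst, abs(t - y))
        # End epoch; stop when all examples are correct within eps
        if worst < eps:
            return epoch
    return max_epochs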

Backpropagation Training Algorithm Issues (s43)

• Not guaranteed to converge to zero training error, may converge to local optima or oscillate indefinitely.

• However, in practice, does converge to low error for many large networks on real data.

• Many epochs (thousands) may be required, hours or days of training for large networks.

• To avoid local-minima problems, run several trials starting with different random weights (random restarts).

– Take results of trial with lowest training set error.

– Build a committee of results from multiple trials (possibly weighting votes by training set accuracy).
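A minimal sketch of the random-restarts idea; train_once is a hypothetical helper assumed to initialize fresh random weights, run training, and return (weights, training_set_error):

def random_restarts(train_once, trials=5):
    # Run several trials, each starting from different random weights
    results = [train_once() for _ in range(trials)]
    # Take the results of the trial with the lowest training set error
    return min(results, key=lambda r: r[1])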

Backpropagation (s45)

- Most common method of obtaining the many weights in the network

- It is a form of supervised training

- The basic backpropagation algorithm is based on minimizing the error of the network using the derivatives of the error function. This is why using sigmoid rather than linear activation functions is important!

(s46) The most common measure of error (in backprop) is the mean square error: E = (target - output)^2

(s47) Calculation of the derivatives flows backwards through the network, hence the name, backpropagation

(s48) The learning rate is important (for backpropagation)

- Too small: convergence is extremely slow

- Too large: may not converge

Momentum:

- Tends to aid convergence

- Applies smoothed averaging to the change in weights (a sketch follows below)
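A sketch of a common momentum formulation; the blending factor alpha is an assumed value, and the slides' exact formula may differ:

def momentum_update(weight, change, previous_change, alpha=0.9):
    # Smoothed averaging: the applied change blends the current raw
    # change with a fraction (alpha) of the previous change
    smoothed = change + alpha * previous_change
    return weight + smoothed, smoothed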

(s49) Local Minima (in Backpropagation of multilayer NNs)

- Key problem is avoiding local minima

- Traditional techniques for avoiding local minima:

- Simulated annealing

- Perturb the weights in progressively smaller amounts

- Genetic algorithms

- Use the weights as chromosomes

- Apply natural selection, mating, and mutations to these chromosomes

(s50) Counterpropagation (CP) Networks

Another multilayer feedforward network:

- Up to 100 times faster than backpropagation

- Not as general as backpropagation

- Made up of three layers:

- Input

- Kohonen

- Grossberg (or just Output)

Counterpropagation

SEE SLIDE DIAGRAM
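A hedged sketch of a CP forward pass, assuming a winner-take-all Kohonen layer and a Grossberg layer that emits the winner's stored weights; the names and the squared-Euclidean distance are assumptions, not from the slides:

def cp_forward(x, kohonen_w, grossberg_w):
    # Kohonen layer: winner-take-all; the unit whose weight vector is
    # closest to the input wins
    dists = [sum((wi - xi) ** 2 for wi, xi in zip(row, x))
             for row in kohonen_w]
    winner = dists.index(min(dists))
    # Grossberg (output) layer: the output is the winner's stored
    # outgoing weight vector
    return grossberg_w[winner]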

(s59) Training and Verification

- Training set : A group of samples used to train the neural network

- Testing set: A group of samples used to test the performance of the neural network

This is used to estimate the error rate (a minimal split sketch follows below)
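A minimal sketch of splitting samples into training and testing sets; the holdout fraction is an assumption:

import random

def split_samples(samples, test_fraction=0.25):
    # Hold out a portion of the samples as the testing set,
    # used to estimate the error rate
    shuffled = samples[:]
    random.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]   # training set, testing set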
