The Perceptron

The perceptron is a program that learns concepts; that is, it can learn to respond with True (1) or False (0) to the inputs we present to it by repeatedly "studying" examples.

The perceptron is a single-layer neural network whose weights and biases can be trained to produce the correct target vector when presented with the corresponding input vector. The training technique used is called the perceptron learning rule. The perceptron generated great interest due to its ability to generalize from its training vectors and to work with randomly distributed connections. Perceptrons are especially suited for simple problems in pattern classification.

Our perceptron network consists of a single neuron connected to two inputs through a set of two weights, with an additional bias input.

The perceptron calculates its output using the following rule:

output = 1 if P * W + b > 0, and 0 otherwise

where P is the input vector presented to the network, W is the vector of weights, and b is the bias.
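To make the rule concrete, here is a minimal Python sketch of the decision rule and the perceptron learning rule. The function names, the implicit learning rate of 1, and the AND training set are illustrative assumptions, not part of the text above:

```python
def perceptron_output(P, W, b):
    """Return 1 if P . W + b > 0, else 0 (the decision rule given above)."""
    s = sum(p * w for p, w in zip(P, W)) + b
    return 1 if s > 0 else 0

def train_step(P, target, W, b):
    """One application of the perceptron learning rule:
    adjust each weight by the error (target - output) times its input."""
    err = target - perceptron_output(P, W, b)
    W = [w + err * p for w, p in zip(W, P)]
    b = b + err
    return W, b

# Learn the logical AND of two inputs (a linearly separable concept).
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
W, b = [0.0, 0.0], 0.0
for _ in range(20):                      # repeatedly "study" the examples
    for P, t in data:
        W, b = train_step(P, t, W, b)

print([perceptron_output(P, W, b) for P, _ in data])  # [0, 0, 0, 1]
```

Because AND is linearly separable, the perceptron convergence theorem guarantees that this loop settles on a correct weight vector after a few passes.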

Backpropagation

Backpropagation is a supervised, error-correcting learning algorithm. It performs gradient descent on the error, where the error is the difference between the actual output of the system and a target output. A simpler version of backpropagation (the simple delta rule, or the perceptron convergence procedure), applicable only in networks with a single modifiable connection layer, can be proven to find a solution for all input-output mappings that are realizable in that simpler architecture. The error surface in such networks has only one minimum, and the system moves on this error surface towards this minimum and stays there after reaching it. This is not true for backpropagation in a network with more than one modifiable connection layer. That is, although in practice one can almost always find a network architecture (even with only two layers of modifiable connections) that can realize a particular input-output mapping, this is not guaranteed. The reason is that in a network with hidden layers of nodes, i.e. nodes that are neither in the input nor in the output layer, the error surface has local minima in addition to the global ("lowest") minimum, and the system can get stuck in such local error minima.

The theoretical foundations for the derivation of the backpropagation learning rule can be found in Rumelhart, Hinton, & Williams (1986). Here only a technical description is given.

The figure shows a typical architecture for networks with backpropagation as the learning rule. Characteristic is the presence of hidden layers (only one is shown in the figure) between the input and the output layers. Also characteristic is that the connectivity structure is feed-forward: there are connections from the input layer nodes to the hidden layer nodes and from the hidden layer nodes to the output layer nodes, but there are no backward connections, for example from the hidden layer nodes to the input layer nodes. There is also no lateral connectivity within the layers. Connectivity between the layers is total, in the sense that each input layer node is connected to each hidden layer node and each hidden layer node is connected to each output layer node. Before learning, the weights of these connections are set to small random values.

Backpropagation learning proceeds in the following way: an input pattern is chosen from a set of input patterns. This input pattern determines the activations of the input nodes. Setting the activations of the input layer nodes is followed by the activation forward propagation phase: the activation values of first the hidden units and then the output units are computed, using a sigmoid activation function such as the following:

a_j = 1 / (1 + exp(-(net_j + theta_j) / T)), with net_j = SUM_i w_ji * a_i

where

a_j = activation of node j in a hidden layer or in the output layer,
net_j = net input to node j,
a_i = activation of node i in the previous layer,
w_ji = weight of the connection from node i in the previous layer to node j in a hidden layer or in the output layer,
T = constant (determining the steepness of the sigmoid),
theta_j = bias (determining the shift of the sigmoid function along the "input" axis).

The bias is normally realized in backpropagation networks as the part of the input coming from a "bias node". The bias node has an activation of 1 during the whole learning process, and it is connected to each hidden and output layer node. The weights of these connections are changed by learning, just like all other weights.

The figure shows this function (for particular values of the steepness constant and the bias).
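As an illustration, the forward propagation phase with a sigmoid activation and a bias-node weight can be sketched in Python. All names, the network size, and the weight values below are illustrative assumptions:

```python
import math

def sigmoid(net, T=1.0):
    """Sigmoid activation; T is the steepness constant (illustrative default)."""
    return 1.0 / (1.0 + math.exp(-net / T))

def forward_layer(prev_acts, weights, bias_weights):
    """Compute one layer's activations.  weights[j][i] is the connection
    from node i of the previous layer to node j of this layer, and
    bias_weights[j] is the weight from the always-on bias node
    (activation 1), which realizes the shift of the sigmoid."""
    return [sigmoid(sum(wi * a for wi, a in zip(row, prev_acts)) + b)
            for row, b in zip(weights, bias_weights)]

# Tiny 2-2-1 forward pass: hidden activations first, then the output.
inputs = [0.0, 1.0]
hidden = forward_layer(inputs, [[0.5, -0.4], [0.3, 0.8]], [0.1, -0.2])
output = forward_layer(hidden, [[1.2, -0.7]], [0.05])
print(len(hidden), len(output))  # 2 1
```

Each activation lies strictly between 0 and 1, as required for the error formulas in the next phase.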

This activation forward propagation phase is followed by the error backward propagation phase. First the error for the output layer nodes is computed using the following formula:

delta_j = (t_j - a_j) * a_j * (1 - a_j)

where

delta_j = error for node j of the output layer,
t_j = "target" activation for node j of the output layer,
a_j = actual activation for node j of the output layer.

Then, successively, the error values for all hidden layer nodes are computed:

delta_j = a_j * (1 - a_j) * SUM_k delta_k * w_kj

where

delta_j = error for node j in a hidden layer,
delta_k = error for node k in the layer above,
w_kj = weight of the connection between node j in the hidden layer and node k in the layer above,
a_j = activation of node j in the hidden layer.

At the end of the error backward propagation phase, the nodes of the network (except the input layer nodes) have error values. The error value of a node is used to compute new weights for the connections that lead to that node. Very generally, the weight change is done using the following formula:

w_ji(n+1) = w_ji(n) + Δw_ji(n)

where

w_ji = weight of the connection between node i in the previous layer and node j in the output layer or in a hidden layer,
Δw_ji = weight change for the connection between node i in the previous layer and node j in the output layer or in a hidden layer.

The Δw_ji values are computed in the same way for each node in the network:

Δw_ji(n) = η * δ_j * a_i + α * Δw_ji(n-1)

where

Δw_ji = weight change for the connection between node i in the previous layer and node j in the output layer or in a hidden layer,
η = learning rate,
δ_j = error for node j,
a_i = activation of node i in the previous layer, from which the connection originates,
α = "momentum" parameter (a value of 0.9 is used in the model),
Δw_ji(n-1) = the previous weight change.

The weight change can be done right after the computation of the Δw values ("on-line" procedure). Alternatively, the Δw values can be summed up for all input patterns in the training set, and the actual weight change is done after each input pattern has been presented once ("off-line" procedure). The backpropagation learning rule is described in Rumelhart, Hinton, & Williams (1986).
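The two error formulas and the momentum weight update can be sketched together in Python. The network size, the weights, and eta are illustrative assumptions (only the momentum value 0.9 comes from the text); biases are omitted for brevity, and only the output-layer weights are updated here:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

eta, alpha = 0.5, 0.9                 # learning rate (assumed) and momentum
x = [1.0, 0.0]                        # input pattern
W_h = [[0.4, -0.3], [0.2, 0.6]]       # W_h[j][i]: input i -> hidden j
W_o = [[0.7, -0.5]]                   # W_o[k][j]: hidden j -> output k
target = [1.0]

# Forward propagation.
h = [sigmoid(sum(wi * xi for wi, xi in zip(row, x))) for row in W_h]
o = [sigmoid(sum(wj * hj for wj, hj in zip(row, h))) for row in W_o]

# Error backward propagation.
# Output layer: delta_k = (t_k - a_k) * a_k * (1 - a_k)
d_out = [(t - a) * a * (1 - a) for t, a in zip(target, o)]
# Hidden layer: delta_j = a_j * (1 - a_j) * sum_k delta_k * w_kj
d_hid = [h[j] * (1 - h[j]) * sum(d_out[k] * W_o[k][j]
                                 for k in range(len(d_out)))
         for j in range(len(h))]

# Weight change with momentum: dw = eta * delta_j * a_i + alpha * previous dw.
prev_dW_o = [[0.0, 0.0]]              # previous changes are zero at the start
dW_o = [[eta * d_out[k] * h[j] + alpha * prev_dW_o[k][j]
         for j in range(len(h))] for k in range(len(d_out))]
W_o = [[wj + dw for wj, dw in zip(row, drow)]
       for row, drow in zip(W_o, dW_o)]

# After the update the output error for this pattern has shrunk.
o2 = [sigmoid(sum(wj * hj for wj, hj in zip(row, h))) for row in W_o]
print(abs(target[0] - o2[0]) < abs(target[0] - o[0]))  # True
```

The hidden-layer weights would be updated with the same Δw formula, using d_hid and the input activations in place of d_out and h.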

ECE4801: Self-Organizing Feature Map, Project 5

 

In this project, you will write code for applying a Self-Organizing Feature Map (SOM) to data that are uniformly distributed over a square region. Please follow the implementation methods and the algorithm described below.

 

First, you will need to create a 10x10 two-dimensional array of neurons, each of which has only two weights (no bias terms). In VB, you may write it as

Dim w(0 To 9, 0 To 9, 0 To 1) As Single

where w(i, j, k) denotes the i-th row, j-th column, and k-th weight. The weights must be initialized with random numbers.

 

The inputs are determined by the random number generator. Since the Rnd function in VB generates a number between 0 and 1, simply creating inputs using

x(0) = Rnd
x(1) = Rnd

will generate uniformly distributed input vectors in a square region, in theory (the random number generator in VB does not have a true uniform distribution). For the graph preparation, please refer to the sample program provided as a link for this project.

 

Next, the weight update procedure works as follows:

 

Step 1: Pick a random vector x(n), where n denotes the present iteration number.

Step 2: Determine the winner node: the node i whose weight vector minimizes || x(n) - w_i(n) || over all i.

Step 3: Determine the neighbor nodes of the winner node, using a square-shaped neighborhood function. We will call the set of nodes inside the neighborhood Nc.

Step 4: Update the weights of the nodes inside the neighborhood Nc using

w_i(n+1) = w_i(n) + η(n) * (x(n) - w_i(n))   for i ∈ Nc

Step 5: Plot the network.

Step 6: Go to Step 1 until the maximum number of iterations is reached.

 

The iteration is terminated based on a predetermined maximum number of iterations. I used 5000 as the maximum number of iterations, but I recommend trying a bigger number. The neighbor size must shrink as the iterations progress. A simple method is a manual schedule of the neighbor size and learning rate as a function of the iteration number. Here is a program segment that you can modify:

 

Select Case True
    Case t < 50
        Nsize = 5
        eta = 0.25
    Case (t >= 50 And t < 200)
        Nsize = 4
        eta = 0.2
    Case (t >= 200 And t < 500)
        Nsize = 3
        eta = 0.13
    Case (t >= 500 And t < 1500)
        Nsize = 2
        eta = 0.1
    Case (t >= 1500 And t < 3000)
        Nsize = 1
        eta = 0.08
    Case Else
        Nsize = 0
        eta = 0.02
End Select

 

In the above example, t denotes the iteration number, Nsize denotes the radius of the square neighborhood around the winner node, and eta denotes the learning rate. To be more specific, Nsize is illustrated by the following examples of neighborhood patterns:

Nsize = 0 (winner only):

*

Nsize = 1 (3 x 3 nodes):

* * *
* * *
* * *

Nsize = 2 (5 x 5 nodes):

* * * * *
* * * * *
* * * * *
* * * * *
* * * * *

 

The reason for defining Nsize this way is that you can now use simple for-loops to enumerate the nodes inside the neighborhood, i.e.,

For i = max(0, center_i - Nsize) To min(9, center_i + Nsize)
    For j = max(0, center_j - Nsize) To min(9, center_j + Nsize)
        ' update node (i, j)
    Next j
Next i

where the winner node index is (center_i, center_j), and min() and max() simply return the minimum or maximum of the two values you are comparing.
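Putting the pieces together, here is a minimal Python sketch of the whole training loop (Steps 1 through 6, without the plotting step). The grid size, iteration count, and schedule breakpoints follow the text; the remaining names and details are illustrative assumptions:

```python
import random

random.seed(0)
ROWS, COLS, MAX_ITER = 10, 10, 5000

# 10x10 grid of nodes, two weights each, initialised with random numbers.
w = [[[random.random(), random.random()] for _ in range(COLS)]
     for _ in range(ROWS)]

def schedule(t):
    """Neighbor size and learning rate as a function of the iteration
    number (same breakpoints as the Select Case segment above)."""
    for limit, nsize, eta in [(50, 5, 0.25), (200, 4, 0.20), (500, 3, 0.13),
                              (1500, 2, 0.10), (3000, 1, 0.08)]:
        if t < limit:
            return nsize, eta
    return 0, 0.02

for t in range(MAX_ITER):
    # Step 1: pick a random input vector from the unit square.
    x = [random.random(), random.random()]
    # Step 2: the winner is the node whose weight vector is closest to x.
    ci, cj = min(((i, j) for i in range(ROWS) for j in range(COLS)),
                 key=lambda p: (w[p[0]][p[1]][0] - x[0]) ** 2
                             + (w[p[0]][p[1]][1] - x[1]) ** 2)
    # Steps 3-4: move every node in the square neighborhood towards x,
    # clamping the loop bounds to the grid just like the VB for-loops.
    nsize, eta = schedule(t)
    for i in range(max(0, ci - nsize), min(ROWS - 1, ci + nsize) + 1):
        for j in range(max(0, cj - nsize), min(COLS - 1, cj + nsize) + 1):
            for k in range(2):
                w[i][j][k] += eta * (x[k] - w[i][j][k])

# Each update is a convex combination, so trained weights stay in [0, 1].
print(all(0.0 <= w[i][j][k] <= 1.0 for i in range(ROWS)
          for j in range(COLS) for k in range(2)))  # True
```

The sketch translates directly to VB: the `min`/`max` clamping plays the role of the helper functions mentioned above, and the plotting step would go at the end of each iteration.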

 

The manual adjustment illustrated above will help you understand the behavior of the SOM once you start playing around with different training parameter schedules; for example, change eta, the t ranges, and Nsize.

 

The first test is for the uniform distribution within the square area between (0,0) and (1,1).

 

For the second test, experiment with the following distribution of input data, which is an XOR pattern:

[Figure: XOR-pattern input distribution, not reproduced here]

For your convenience, the plot routine is provided in the link for this project. Please don't forget to do the check-off; this project requires a check-off.

 

Report:

1. Describe what the SOM does, how it works, and what the implementation issues are.
2. Provide two direct application examples of self-organization.
3. Point out the drawbacks of the SOM.
4. Attach the program listing.
