The Perceptron
The perceptron is a program that learns concepts: it
can learn to respond with True (1) or False (0) to the
inputs we present to it, by repeatedly "studying"
examples.
The perceptron is a single-layer neural network whose
weights and biases can be trained to produce the
correct target vector when presented with the
corresponding input vector. The training technique
used is called the perceptron learning rule. The
perceptron generated great interest because of its
ability to generalize from its training vectors and to
work with randomly distributed connections.
Perceptrons are especially suited to simple problems
in pattern classification.
Our perceptron network consists of a single neuron
connected to two inputs through a set of 2 weights,
with an additional bias input.
The perceptron calculates its output using the
following decision rule: the output is 1 if

P * W + b > 0

and 0 otherwise, where P is the input vector presented
to the network, W is the vector of weights, and b is
the bias.
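The decision rule and the perceptron learning rule can be sketched in Python; the document gives no code for this part, so the AND training set, the learning rate, and the helper names below are illustrative choices:

```python
# Minimal perceptron sketch: two inputs, two weights, one bias,
# trained with the perceptron learning rule on the logical AND
# function (AND is linearly separable, so the rule converges).

def predict(p, w, b):
    # Output 1 if P*W + b > 0, else 0 (the decision rule from the text).
    return 1 if p[0] * w[0] + p[1] * w[1] + b > 0 else 0

def train(samples, epochs=20, lr=1.0):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for p, target in samples:
            err = target - predict(p, w, b)  # perceptron learning rule:
            w[0] += lr * err * p[0]          # w <- w + lr * error * input
            w[1] += lr * err * p[1]
            b += lr * err                    # bias acts as a weight on a constant 1 input
    return w, b

data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train(data)
print([predict(p, w, b) for p, _ in data])  # -> [0, 0, 0, 1]
```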
Backpropagation
Backpropagation is a supervised, error-correcting
learning algorithm. It realizes a gradient descent in
error (where error means the difference of the actual
output of the system and a target output). A simpler
version of backpropagation -- the simple delta rule or
the perceptron convergence procedure, which are
applicable only in networks with only one modifiable
connection layer -- can be proven to find a solution
for all input-output mappings which are realizable in
that simpler architecture. The error surface in such
networks has only one minimum, and the system moves on
this error surface towards this minimum (and stays
there after it has reached it). This is not true for
backpropagation in a network with more than one
modifiable connection layer: although in practice one
can almost always find a network architecture (even
with only two layers of modifiable connections) that
can realize a particular input-output mapping, finding
it is not guaranteed. This is
because in a network with hidden layer(s) of nodes --
i.e. nodes which are neither in the input nor in the
output layer -- the error surface has, in addition to
the global, ``lowest'', minimum also local minima, and
the system can get stuck in such local error minima.
The theoretical foundations for the derivation of the
backpropagation learning rule can be found in
Rumelhart, Hinton, & Williams (1986). Here only a
technical description is given.
Figure shows a typical architecture for networks with
backpropagation as the learning rule.
Characteristic is the presence of hidden layers --
only one is shown in the figure -- between the input
and the output layers. Also characteristic is that the
connectivity structure is feed-forward: that is, there
are connections from the input layer nodes to the
hidden layer nodes and from the hidden layer nodes to
the output layer nodes, but there are no connections
backward, for example, from the hidden layer nodes to
the input layer nodes. There is also no lateral
connectivity within the layers. Also, connectivity
between the layers is total, in the sense that each
input layer node is connected to each hidden layer
node and each hidden layer node is connected to each
output layer node. Before learning, the weights of
these connections are set to small random values.
Backpropagation learning proceeds in the following
way: an input pattern is chosen from a set of input
patterns. This input pattern determines the
activations of the input nodes. Setting the
activations of the input layer nodes is followed by
the activation forward propagation phase: the
activation values of first the hidden units and then
the output units are computed. This is done by using a
sigmoid activation function, like the following one:

a_j = 1 / (1 + e^(-k (net_j + b_j))),   net_j = sum_i w_ij * a_i

where
a_j: activation of node j in a hidden layer or in the
output layer,
net_j: net input to node j,
a_i: activation of node i in the previous layer,
w_ij: weight of the connection from node i in the
previous layer to node j in a hidden layer or in the
output layer,
k: constant (determining the steepness of the sigmoid),
b_j: bias (determining the shift of the sigmoid function
along the ``input'' axis).
The bias is normally realized in backpropagation
networks as the part of the input coming from a ``bias
node''. The bias node has an activation of 1 during
the whole learning process, and it is connected to
each hidden and output layer node. The weights of
these connections are changed by learning, just like
all other weights.
Figure shows this function.
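As a small illustration, the forward-propagation step for a single node with this sigmoid can be sketched in Python; the concrete weights and activations below are made-up values, not from the text:

```python
import math

# Activation of one hidden- or output-layer node: a sigmoid applied
# to the weighted sum of previous-layer activations plus the weight
# from the always-on bias node (activation 1), as described above.

def node_activation(prev_activations, weights, bias_weight, k=1.0):
    # net input: weighted sum of previous-layer activations + bias-node weight
    net = sum(w * a for w, a in zip(weights, prev_activations)) + bias_weight
    # k controls the steepness of the sigmoid
    return 1.0 / (1.0 + math.exp(-k * net))

a = node_activation([0.2, 0.7], [0.5, -0.3], bias_weight=0.1)
print(round(a, 4))  # a value strictly between 0 and 1
```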
This activation forward propagation phase is followed
by the error backward propagation phase. First the
error for the output layer nodes is computed by using
the following formula:

delta_j = (t_j - a_j) * a_j * (1 - a_j)

where
delta_j: error for node j of the output layer,
t_j: ``target'' activation for node j of the output layer,
a_j: actual activation for node j of the output layer.
Then, successively, the error values for all hidden
layer nodes are computed:

delta_i = (sum_j delta_j * w_ij) * a_i * (1 - a_i)

where
delta_i: error for node i in a hidden layer,
delta_j: error for node j in the layer above,
w_ij: weight for the connection between node i in the
hidden layer and node j in the layer above,
a_i: activation of node i in the hidden layer.
At the end of the error backward propagation phase,
nodes of the network (except the input layer nodes)
will have error values. The error value of a node is
used to compute new weights for the connections which
lead to the node. Very generally, the weight change is
done by using the following formula:

w_ij(new) = w_ij(old) + delta_w_ij

where
w_ij: weight of the connection between node i in the
previous layer and node j in the output layer or in a
hidden layer,
delta_w_ij: weight change for the connection between
node i in the previous layer and node j in the output
layer or in a hidden layer.
The delta_w_ij values are computed in the same way for
each connection in the network:

delta_w_ij(n) = eta * delta_j * a_i + alpha * delta_w_ij(n-1)

where
delta_w_ij: weight change for the connection between
node i in the previous layer and node j in the output
layer or in a hidden layer,
eta: learning rate,
delta_j: error for node j,
a_i: activation of node i in the previous layer, from
which the connection originates,
alpha: ``momentum'' parameter (a value of 0.9 is used
in the model),
delta_w_ij(n-1): the previous weight change.
The weight change can be done right after the
computation of the delta_w_ij values (``on line''
procedure). Alternatively, the delta_w_ij values can
be summed up over all input patterns in the training
set, and the actual weight change is done after each
input pattern has been presented once (``off line''
procedure). The backpropagation learning rule is
described in Rumelhart, Hinton, & Williams (1986).
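The two phases described above (activation forward propagation, then error backward propagation with momentum) can be sketched as follows. The 2-2-1 network size, the AND training set, and the learning rate are illustrative choices; only the momentum value 0.9 comes from the text:

```python
import math, random

# Sketch of on-line backpropagation for one hidden layer with sigmoid
# units, bias nodes (activation 1), and momentum, per the formulas above.
random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

n_in, n_hid = 2, 2
# Each weight matrix has one extra row/entry for the bias node.
w_ih = [[random.uniform(-0.5, 0.5) for _ in range(n_hid)] for _ in range(n_in + 1)]
w_ho = [random.uniform(-0.5, 0.5) for _ in range(n_hid + 1)]
dw_ih = [[0.0] * n_hid for _ in range(n_in + 1)]  # previous weight changes
dw_ho = [0.0] * (n_hid + 1)

def forward(x):
    xi = list(x) + [1.0]                      # append bias-node activation
    hid = [sigmoid(sum(xi[i] * w_ih[i][j] for i in range(n_in + 1)))
           for j in range(n_hid)]
    hb = hid + [1.0]                          # hidden layer plus bias node
    out = sigmoid(sum(hb[j] * w_ho[j] for j in range(n_hid + 1)))
    return xi, hb, out

def train_step(x, target, eta=0.5, alpha=0.9):
    xi, hb, out = forward(x)
    # output-layer error: delta = (t - a) * a * (1 - a)
    d_out = (target - out) * out * (1.0 - out)
    # hidden-layer errors: delta_i = (sum_j delta_j w_ij) * a_i * (1 - a_i)
    d_hid = [d_out * w_ho[j] * hb[j] * (1.0 - hb[j]) for j in range(n_hid)]
    # weight changes: dw = eta * delta * a_prev + alpha * previous dw
    for j in range(n_hid + 1):
        dw_ho[j] = eta * d_out * hb[j] + alpha * dw_ho[j]
        w_ho[j] += dw_ho[j]
    for i in range(n_in + 1):
        for j in range(n_hid):
            dw_ih[i][j] = eta * d_hid[j] * xi[i] + alpha * dw_ih[i][j]
            w_ih[i][j] += dw_ih[i][j]

data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]  # AND function
err_before = sum((t - forward(x)[2]) ** 2 for x, t in data)
for _ in range(2000):
    for x, t in data:
        train_step(x, t)
err_after = sum((t - forward(x)[2]) ** 2 for x, t in data)
print(err_after < err_before)  # the squared error decreases with training
```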
ECE4801: Self-Organizing Feature Map, Project 5
In this project, you will write code that applies a Self-Organizing Feature Map (SOM) to data that are uniformly distributed over a square region. Please follow the implementation methods and the algorithm described below.
First, you will need to create a 10x10 two-dimensional array of neurons each of which has only two weights (no bias terms). In VB, you may write it as
Dim w(0 To 9, 0 To 9, 0 To 1) as Single
where w(i,j,k) denotes i-th row, j-th column, and k-th weight. The weights must be initialized with random numbers.
The inputs are determined by the random number generator. Since the Rnd function in VB generates a number between 0 and 1, simply creating inputs using
x(0) = Rnd
x(1) = Rnd
will generate uniformly distributed input vectors in a square region in theory (the random number generator in VB does not have a true uniform distribution). For the graph preparation, please refer to the sample program provided as a link for this project.
Next, the weight update procedure works as follows:
Step 1: Pick a random input vector x(n), where n denotes the present iteration number.
Step 2: Determine the winner node, i.e., the node i that minimizes the distance || x(n) - w_i(n) || over all i.
Step 3: Determine the neighbor nodes of the winner node. Use a square-shaped neighborhood function. We will call the set of nodes inside the neighborhood Nc.
Step 4: Update the weights of the nodes inside the neighborhood Nc using

w_i(n+1) = w_i(n) + eta(n) (x(n) - w_i(n))   for i in Nc

Step 5: Plot the network.
Step 6: Go to Step 1 until the maximum number of iterations is reached.
The iteration is terminated based on a predetermined maximum number of iterations. I used 5000 as the maximum number of iterations, but I recommend that you try a bigger number. The neighborhood size must shrink as the iterations progress. A simple method can be devised using a manual schedule of the neighborhood size and learning rate as a function of the iteration number. Here is a program segment that you can modify:
Select Case True
Case t < 50
Nsize = 5
eta = 0.25
Case (t >= 50 And t < 200)
Nsize = 4
eta = 0.2
Case (t >= 200 And t < 500)
Nsize = 3
eta = 0.13
Case (t >= 500 And t < 1500)
Nsize = 2
eta = 0.1
Case (t >= 1500 And t < 3000)
Nsize = 1
eta = 0.08
Case Else
Nsize = 0
eta = 0.02
End Select
In the above example, t denotes the iteration number, Nsize denotes the radius of the square neighborhood around the winner node, and eta denotes the learning rate. To be more specific, Nsize is illustrated using the following examples of neighborhood patterns:
                        * * * * *
          * * *         * * * * *
  *       * * *         * * * * *
          * * *         * * * * *
                        * * * * *

Nsize=0   Nsize=1       Nsize=2
The reason for defining Nsize this way is that you can now use simple for-loops to enumerate the nodes inside the neighborhood, i.e.,
For i = max(0, center_i - Nsize) To min(9, center_i + Nsize)
    For j = max(0, center_j - Nsize) To min(9, center_j + Nsize)
        ........
    Next j
Next i
where the winner node index is (center_i, center_j), and min() and max() simply return the minimum or maximum of the two values you are comparing.
The manual adjustment illustrated above will help you understand the behavior of the SOM if you play around with different training-parameter schedules, for example by changing eta, the t ranges, and Nsize.
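For readers not working in VB, the whole training loop (winner search, square neighborhood, manual schedule) might look like the Python sketch below. The 10x10 grid, the schedule values, and the 5000 iterations follow the project text; everything else is a straightforward port, not the required implementation:

```python
import random

# SOM on uniform data in the unit square: 10x10 grid of nodes,
# two weights per node, square neighborhood that shrinks over time.
random.seed(0)
W = [[[random.random(), random.random()] for _ in range(10)] for _ in range(10)]

def schedule(t):
    # manual neighborhood-size / learning-rate schedule from the text
    for limit, nsize, eta in [(50, 5, 0.25), (200, 4, 0.2), (500, 3, 0.13),
                              (1500, 2, 0.1), (3000, 1, 0.08)]:
        if t < limit:
            return nsize, eta
    return 0, 0.02

def winner(x):
    # node whose weight vector is closest to the input (squared distance)
    best, bi, bj = float("inf"), 0, 0
    for i in range(10):
        for j in range(10):
            d = (x[0] - W[i][j][0]) ** 2 + (x[1] - W[i][j][1]) ** 2
            if d < best:
                best, bi, bj = d, i, j
    return bi, bj

for t in range(5000):
    x = (random.random(), random.random())   # uniform input in the unit square
    ci, cj = winner(x)
    nsize, eta = schedule(t)
    # update every node in the square neighborhood Nc of the winner
    for i in range(max(0, ci - nsize), min(9, ci + nsize) + 1):
        for j in range(max(0, cj - nsize), min(9, cj + nsize) + 1):
            for k in range(2):
                W[i][j][k] += eta * (x[k] - W[i][j][k])

# every update moves a weight toward a point in [0,1], so the trained
# weights stay inside the unit square
print(all(0.0 <= W[i][j][k] <= 1.0
          for i in range(10) for j in range(10) for k in range(2)))
```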
The first test is for the uniform distribution within the square region between (0,0) and (1,1).
For the second test, experiment with the following distribution of input data, which is an XOR pattern.
For your convenience, the plot routine is provided in the link for this project. Please don't forget the check-off; this project requires a check-off.
Report:
1. Describe what SOM does, how it works, and what the implementation issues are.
2. Provide two direct application examples of self-organization.
3. Point out the drawbacks of SOM.
4. Attach the program listing.