CAP 6615

Project 2 Report

Rajesh Pydipati

---------------------------------------------------------------------------------------------------------------------

Basic functions implemented in the neural networks algorithm: 

•         A feed-forward MLP network with one hidden layer

•         Trained with back-propagation

•         Distinct, randomly selected training and test data sets

Software used: MATLAB (run on the Windows platform)

Important Considerations: 

•         Number of layers

•         Number of processing elements in each layer

•         Randomizing the training and test data sets

•         Expressive power

•         Training error

•         Activation function

•         Scaling input

•         Target values

•         Initializing weights

•         Learning rate

•         Momentum learning

•         Stopping criterion

•         Criterion function 

Let us now carefully examine the effect of each of these considerations/constraints on the overall performance of the neural-network-based classifier.

1. Number of layers:

Multilayer neural networks implement linear discriminants in a space where the inputs have been mapped non-linearly. Non-linear multilayer networks have greater computational, or expressive, power than simple two-layer networks (input and output layers only) and can implement more functions. Given a sufficient number of hidden units, essentially any continuous function can be represented. For this project, a one-hidden-layer MLP was chosen in order to reduce the complexity of the decision hyperplane.
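
To make the structure concrete, the following is a minimal MATLAB sketch of the forward pass through such a one-hidden-layer network. The variable names (W1, b1, W2, b2) and the sigmoid activation are illustrative assumptions, not the project's actual source code.

% Forward pass of a one-hidden-layer MLP (illustrative sketch)
x  = [5.1; 3.5; 1.4; 0.2];               % one input pattern (4 features)
W1 = 0.1*randn(3, 4);  b1 = zeros(3, 1); % input -> hidden (3 PEs)
W2 = 0.1*randn(3, 3);  b2 = zeros(3, 1); % hidden -> output (3 classes)
sigmoid = @(a) 1 ./ (1 + exp(-a));
h = sigmoid(W1*x + b1);                  % hidden-layer activations
y = sigmoid(W2*h + b2);                  % one output per class
[~, predictedClass] = max(y);            % classify by the largest output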

2. Number of processing elements in each layer:

The number of PEs in the input and output layers follows directly from the structure of the input and output spaces. Each pattern has 4 features (attributes), representing petal width, petal length, sepal width, and sepal length, expressed in centimeters; examining the inputs reveals the principal components that can serve as distinguishing features between the three classes of Iris flowers to be classified. An attempt was made to reduce the input space in order to reduce the overall complexity of the classifier. It should not be forgotten, however, that discarding features without proper justification may lose key information and reduce the accuracy of the classifier. The input space was therefore analyzed for its principal components using the PCA algorithm, which is also implemented in the source code of this project. The number of output PEs is 3, one for each of the three classes given in the problem definition. Choosing the number of PEs in the hidden layer is a more intuitive task; 3 hidden PEs were found to give the best results, although varying the number within an acceptable range did not alter the accuracy much. The PCA step is sketched below.
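
The following is a minimal sketch of PCA via the eigendecomposition of the covariance matrix, the idea behind the analysis described above; the stand-in data matrix X and the choice of two retained components are assumptions for illustration, not the project's exact routine.

% PCA sketch: project zero-mean features onto the leading eigenvectors
X  = randn(150, 4);                           % stand-in for the 150x4 Iris features
Xc = X - repmat(mean(X, 1), size(X, 1), 1);   % zero-mean each feature
[V, D] = eig(cov(Xc));                        % eigenvectors of the covariance
[~, order] = sort(diag(D), 'descend');
V = V(:, order);                              % principal directions, strongest first
Z = Xc * V(:, 1:2);                           % projection onto the top two components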

3. Randomizing the training and test data sets:

For most practical purposes, randomizing the training and test data sets is important. In the four parts of the project, 13, 25, 38, and 50 training patterns per class were used, with 37, 25, 12, and 50 test patterns respectively. The data were randomly permuted before being fed forward through the network, as in the sketch below.
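
A minimal sketch of how one class's patterns might be permuted and split, using the part-1 split of 13 training and 37 test patterns; classData is a stand-in for the real feature matrix.

% Random permutation before splitting into training and test sets
classData = randn(50, 4);            % stand-in for the 50 patterns of one class
idx       = randperm(50);            % random ordering of the patterns
nTrain    = 13;
trainSet  = classData(idx(1:nTrain), :);
testSet   = classData(idx(nTrain+1:end), :);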

4. Expressive power:

Although there can be cause to use networks with a different activation function for each layer, or even for each unit within a layer, identical non-linear activation functions were used throughout to simplify the mathematical analysis.

5. Training error:

The training error on a pattern is half the sum over output units of the squared difference between the desired output d_k and the actual output y_k:

J(w) = ½ ||d − y||²

The training error for the hidden layer is calculated by back-propagating the output-layer errors.
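
As a small worked example, the criterion value and the output-layer error term (the delta that is back-propagated for sigmoid units) can be computed as below; the vectors d and y are illustrative values, not project data.

% Squared-error criterion and the back-propagated output delta
d = [1; 0; 0];                       % desired output for a class-1 pattern
y = [0.8; 0.3; 0.1];                 % actual network output
J = 0.5 * norm(d - y)^2;             % J(w) = 1/2 * ||d - y||^2
deltaOut = (d - y) .* y .* (1 - y);  % delta for sigmoid output units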

6. Activation function:

The important constraint on the activation function is that it be continuous and differentiable. The sigmoid is a smooth, differentiable, non-linear, and saturating function; a further benefit is that its derivative can be expressed in terms of the function itself. For these reasons it was chosen.
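
A minimal sketch of the sigmoid and the self-referential form of its derivative, f'(a) = f(a)(1 − f(a)), which is what keeps the back-propagation updates cheap:

% Sigmoid activation and its derivative expressed in terms of itself
f      = @(a) 1 ./ (1 + exp(-a));    % smooth, saturating, non-linear
fprime = @(a) f(a) .* (1 - f(a));    % derivative written in terms of f
a = -5:0.1:5;
plot(a, f(a), a, fprime(a));         % visualize both curves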

7. Scaling input:

In order to avoid difficulties due to differences in scale among the inputs, the input patterns can be shifted so that the average of each feature over the training set is zero. Such standardization requires the full data set, which online protocols do not have at any one time; here, scaling of the inputs was not found necessary.
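
Had the shift been applied, it would amount to subtracting the per-feature training mean, as in this sketch; Xtrain is a stand-in matrix, and the same mean would also have to be subtracted from the test data.

% Shift each feature to zero mean over the training set
Xtrain = randn(39, 4) + 5;                        % stand-in training features
mu     = mean(Xtrain, 1);                         % per-feature average
Xtrain = Xtrain - repmat(mu, size(Xtrain, 1), 1);
% ...and subtract the same mu from the test features as well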

8. Target values:

For a finite value of net_k, the output can never reach the saturation values, so there would always be residual error; full training would never terminate, because the weights would grow extremely large in driving net_k toward plus or minus infinity. Thus target values corresponding to 2*(desired − 1) were used here.

9. Initializing weights:

Initializing the weights is crucial for uniform learning, i.e. for all weights to reach their equilibrium values at roughly the same time. With non-uniform learning, one category is learned well before the others, and the overall error rate is then typically higher than necessary due to the redistribution of error. To encourage uniform learning, the weights of each layer were randomly initialized, as in the sketch below.
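
A sketch of such random initialization for the input-to-hidden weights; the symmetric range ±w0 with w0 = 1/sqrt(fan-in) is a common heuristic assumed here for illustration, not a documented project setting.

% Random, symmetric weight initialization for one layer
w0 = 1 / sqrt(4);                    % 1/sqrt(fan-in); fan-in = 4 features
W1 = 2*w0*rand(3, 4) - w0;           % uniform in [-w0, w0], 3 hidden PEs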

10. Learning rate:

The optimal step size is given by η_opt = (∂²J/∂w²)⁻¹. The proper setting of this parameter greatly affects the convergence as well as the classification accuracy. After many trials, it was found that this parameter should be very small (a value in the range 1/10 to 1/1000) to get close to accurate results.
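
A single plain gradient-descent update with such a small step size looks as follows; W1 and gradJ are placeholders standing in for the real weights and the back-propagated gradient.

% One gradient-descent weight update
W1    = randn(3, 4);                 % placeholder weight matrix
gradJ = randn(3, 4);                 % placeholder for dJ/dW1 from back-propagation
eta   = 0.04;                        % small step size, as discussed above
W1    = W1 - eta * gradJ;            % standard gradient-descent step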

11. Momentum learning:

Error surfaces often have plateaus over which ∂J(w)/∂w is very small; these arise when there are many weights, so that the error depends only weakly on any one of them. Momentum allows the network to learn more quickly across such regions. In narrow, steep regions of the weight space, the effect of the momentum term is to focus the movement in a downhill direction by averaging out the components of the gradient that alternate in sign [Gupta, Homma et al.]. After many trials, it was found that the momentum parameter should be less than 1 for this particular application. Varying the parameter between 0 and 1 did not adversely affect performance; it should be noted, however, that increasing its value significantly reduced the convergence speed.
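
With momentum, a fraction of the previous weight change is blended into the new one, as in this sketch; eta = 0.04 and alpha = 0.9 match the values used in the experiments below, while the gradient itself is a placeholder.

% Gradient descent with a momentum term alpha < 1
W1 = randn(3, 4);  dWprev = zeros(3, 4);
eta = 0.04;  alpha = 0.9;                 % step size and momentum parameter
for epoch = 1:100
    gradJ  = randn(3, 4);                 % placeholder for the true gradient
    dW     = -eta*gradJ + alpha*dWprev;   % momentum-smoothed update
    W1     = W1 + dW;
    dWprev = dW;                          % remember this step for the next one
end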

12. Stopping criterion:

The usual stopping criterion is based on the error achieved on a separate validation set (whose patterns appear in neither the training set nor the test set). Here, after training and testing individually for each class, the error was found to be typically 0.4 at convergence, so that value was used as the stopping threshold, both for its simplicity of implementation and to avoid overfitting. The resulting loop is sketched below.
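
The training loop then reduces to a simple threshold test; the epoch cap and the placeholder error update in this sketch are assumptions added for illustration.

% Stop training once the error falls below the 0.4 threshold
J = Inf;  epoch = 0;
while J > 0.4 && epoch < 10000       % threshold plus a safety cap on epochs
    epoch = epoch + 1;
    % ... one epoch of feed-forward and back-propagation here ...
    J = 10 / epoch;                  % placeholder for the real training error
end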

13. Criterion function:

The squared error has been used as the criterion function for this project. 

Results:

The results of the neural-network-based classifier are presented in the form of a confusion matrix, in which each column corresponds to one true class (one particular set of features) and each row to the class number assigned by the classifier. The diagonal terms give the correct classifications, and the off-diagonal terms in each column give the misclassifications of that class's patterns.
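
A sketch of how such a confusion matrix is accumulated from true and assigned labels, with rows indexing the classifier's output and columns indexing the true class, matching the layout of the matrices below; the label vectors are illustrative.

% Accumulate a 3x3 confusion matrix (rows = assigned class, columns = true class)
trueClass = [1 1 2 2 3 3];           % illustrative true labels
predClass = [1 1 2 3 3 3];           % illustrative classifier outputs
conf = zeros(3, 3);
for i = 1:numel(trueClass)
    conf(predClass(i), trueClass(i)) = conf(predClass(i), trueClass(i)) + 1;
end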

Next, we give the results obtained for the various tests that were conducted.

1. Training patterns = 13 and test patterns = 37 (per class)

trainConf → classification results on the training data

testConf → classification results on the test data

Step size change

a) Initial step size = 0.0400

Number of processing elements in the hidden layer = 3

Momentum factor = 0.9000

trainConf =
    13     0     0
     0    13     0
     0     0    13

testConf =
    37     0     0
     0    33     0
     0     4    37

b) Initial step size = 0.004

Number of processing elements in the hidden layer = 3

Momentum factor = 0.9000

trainConf =
    13     0     0
     0    11     0
     0     2    13

testConf =
    37     2     0
     0    25     0
     0    10    37

Momentum parameter change

a) Initial step size = 0.0400

Number of processing elements in the hidden layer = 3

Momentum factor = 0.9000

trainConf =
    13     0     0
     0    13     1
     0     0    12

testConf =
    37     0     0
     0    34     3
     0     3    34

b) Initial step size = 0.0400

Number of processing elements in the hidden layer = 3

Momentum factor = 1.3000

trainConf =
    13    13    13
     0     0     0
     0     0     0

testConf =
    37    37    37
     0     0     0
     0     0     0

Number of processing elements in the hidden layer

The choice of the number of PEs in the hidden layer is an intuitive task, but the number should be kept reasonable and not too large. With 30 PEs in the hidden layer, as in case (a) below, the results were erroneous; for a reasonable number of PEs (3 to 7), the results were fairly accurate.

a) Initial step size = 0.0400

Number of processing elements in the hidden layer = 30

Momentum factor = 0.9000

trainConf =
    13     9     0
     0     0     0
     0     4    13

testConf =
    37    21     0
     0     0     0
     0    16    37

b) Initial step size = 0.0400

Number of processing elements in the hidden layer = 3

Momentum factor = 0.9000

trainConf =
    13     0     0
     0    13     0
     0     0    13

testConf =
    37     0     0
     0    33     1
     0     4    36

2. Training patterns = 25 and test patterns = 25 (per class)

Step size change

a) Initial step size = 0.0400

Number of processing elements in the hidden layer = 3

Momentum factor = 0.9000

trainConf =
    25     0     0
     0    24     1
     0     1    24

testConf =
    25     0     0
     0    24     6
     0     1    19

b) Initial step size = 0.004

Number of processing elements in the hidden layer = 3

Momentum factor = 0.9000

trainConf =
    25     0     0
     0    24     1
     0     1    24

testConf =
    25     0     0
     0    24     5
     0     1    20

Momentum parameter change

a) Initial step size = 0.0400

Number of processing elements in the hidden layer = 3

Momentum factor = 0.9000

trainConf =
    25     0     0
     0    24     1
     0     1    24

testConf =
    25     0     0
     0    24     6
     0     1    19

b) Initial step size = 0.0400

Number of processing elements in the hidden layer = 3

Momentum factor = 1.3000

trainConf =
    25    25    25
     0     0     0
     0     0     0

testConf =
    25    25    25
     0     0     0
     0     0     0

Number of processing elements in the hidden layer

a) Initial step size = 0.0400

Number of processing elements in the hidden layer = 30

Momentum factor = 0.9000

trainConf =
     0     0     0
     0     0     0
    25    25    25

testConf =
     0     0     0
     0     0     0
    25    25    25

b) Initial step size = 0.0400

Number of processing elements in the hidden layer = 3

Momentum factor = 0.9000

trainConf =
    25     0     0
     0    24     1
     0     1    24

testConf =
    25     0     0
     0    24     6
     0     1    19

3. Training patterns = 38 and test patterns = 12 (per class)

Step size change

a) Initial step size = 0.0400

Number of processing elements in the hidden layer = 3

Momentum factor = 0.9000

trainConf =
    38     0     0
     0    35     1
     0     3    37

testConf =
    12     0     0
     0    11     1
     0     1    11

b) Initial step size = 0.0040

Number of processing elements in the hidden layer = 3

Momentum factor = 0.9000

trainConf =
    38     0     0
     0    35     4
     0     3    34

testConf =
    12     0     0
     0    12     2
     0     0    10

Momentum parameter change

a) Initial step size = 0.0400

Number of processing elements in the hidden layer = 3

Momentum factor = 0.9000

trainConf =
    38     0     0
     0    35     1
     0     3    37

testConf =
    12     0     0
     0    11     1
     0     1    11

b) Initial step size = 0.0400

Number of processing elements in the hidden layer = 3

Momentum factor = 1.3000

trainConf =
    38    38    38
     0     0     0
     0     0     0

testConf =
    12    12    12
     0     0     0
     0     0     0

Number of processing elements in the hidden layer

a) Initial step size = 0.0400

Number of processing elements in the hidden layer = 30

Momentum factor = 0.9000

trainConf =
     0     0     0
     0     0     0
    38    38    38

testConf =
     0     0     0
     0     0     0
    12    12    12

b) Initial step size = 0.0400

Number of processing elements in the hidden layer = 3

Momentum factor = 0.9000

trainConf =
    38     0     0
     0    35     1
     0     3    37

testConf =
    12     0     0
     0    11     1
     0     1    11

4. Training patterns = 50 and test patterns = 50 (per class)

Step size change

a) Initial step size = 0.0400

Number of processing elements in the hidden layer = 3

Momentum factor = 0.9000

trainConf =
    50     0     0
     0    46     1
     0     4    49

testConf =
    50     0     0
     0    46     1
     0     4    49

b) Initial step size = 0.004

Number of processing elements in the hidden layer = 3

Momentum factor = 0.9000

trainConf =
    50     0     0
     0    43     0
     0     7    50

testConf =
    50     0     0
     0    43     0
     0     7    50

Momentum parameter change

a) Initial step size = 0.0400

Number of processing elements in the hidden layer = 3

Momentum factor = 0.9000

trainConf =
    50     0     0
     0    46     1
     0     4    49

testConf =
    50     0     0
     0    46     1
     0     4    49

b) Initial step size = 0.0400

Number of processing elements in the hidden layer = 3

Momentum factor = 1.3000

trainConf =
    50    50    50
     0     0     0
     0     0     0

testConf =
    50    50    50
     0     0     0
     0     0     0

Number of processing elements in the hidden layer

a) Initial step size = 0.0400

Number of processing elements in the hidden layer = 3

Momentum factor = 0.9000

trainConf =
    50     0     0
     0    46     1
     0     4    49

testConf =
    50     0     0
     0    46     1
     0     4    49

b) Initial step size = 0.0400

Number of processing elements in the hidden layer = 30

Momentum factor = 0.9000

trainConf =
     0     0     0
     0     0     0
    50    50    50

testConf =
     0     0     0
     0     0     0
    50    50    50
