The MATLAB Notebook v1.5.2



EMGT378

Homework-6

By: Yuping Wang

E10.4 In this exercise we will modify the reference pattern P2 from Problem P10.3:

{p1 = [1; 1], t1 = 1}, {p2 = [-1; -1], t2 = -1}

i. Assume that the patterns occur with equal probability. Find the mean square error and sketch the contour plot.

ii. Find the maximum stable learning rate.

iii. Write a MATLAB M-file to implement the LMS algorithm for this problem. Take 40 steps of the algorithm for a stable learning rate. Use the zero vector as the initial guess. Sketch the trajectory on the contour plot.

iv. Take 40 steps of the algorithm after setting the initial values of both parameters to 1. Sketch the final decision boundary.

v. Compare the final parameters from parts (iii) and (iv). Explain your results.

Answer:

i. With z = p and x = [w1,1; w1,2], the mean square error can be expressed as F(x) = c - 2 x^T h + x^T R x, where c, h and R are calculated from the equally probable training pairs:

c = E[t^2] = 0.5(1)^2 + 0.5(-1)^2 = 1

h = E[t z] = 0.5(1)[1; 1] + 0.5(-1)[-1; -1] = [1; 1]

R = E[z z^T] = 0.5[1; 1][1 1] + 0.5[-1; -1][-1 -1] = [1 1; 1 1]

so the mean square error performance index is

F(x) = 1 - 2 x^T [1; 1] + x^T [1 1; 1 1] x = 1 - 2(x1 + x2) + (x1 + x2)^2

The eigenvalues and eigenvectors of the Hessian matrix A = 2R of F(x) are:

A = 2*[1 1;1 1];
[V,D] = eig(A)

V =
    0.7071    0.7071
   -0.7071    0.7071

D =
         0         0
         0    4.0000

Since both eigenvalues are non-negative and one of them is zero, this problem has a weak minimum. The surface and contour plots of F(x) are shown in Fig.1 and Fig.2.

ii. The maximum stable learning rate must satisfy α < 1/λ_max = 1/2 = 0.5, where λ_max = 2 is the largest eigenvalue of R (equivalently, α < 2/4 in terms of the largest Hessian eigenvalue computed above).
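As a quick check of these numbers (a small sketch added here; it is not part of the assignment code below), c, h, R and the learning-rate bound can be computed directly from the two training pairs:

% Sanity check for parts (i)-(ii): statistics of the two equally likely pairs
P = [1 -1; 1 -1];            % columns are p1 = [1;1], p2 = [-1;-1]
T = [1 -1];                  % targets t1 = 1, t2 = -1
prob = [0.5 0.5];            % equal probabilities

c = sum(prob .* T.^2);       % c = E[t^2]   -> 1
h = P * (prob .* T)';        % h = E[t z]   -> [1; 1]
R = P * diag(prob) * P';     % R = E[z z^T] -> [1 1; 1 1]

alpha_max = 1 / max(eig(R))  % stable LMS requires alpha < 0.5

Running this reproduces c = 1, h = [1; 1], R = [1 1; 1 1] and alpha_max = 0.5 as found above.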

iii. Take α = 0.2. After 40 iterations we get W = [0.5; 0.5] (W1 in the code), and the trajectory is shown in Fig.2 as dotted symbols.

clear
[X,Y] = meshgrid(-3 : .1 : 3);
F = 1 - 2 * (X + Y) + (X + Y).^2;
surf(X,Y,F)
title('Fig.1 Surface plot of Mean Square Error');
figure;
contour(X,Y,F)
title('Fig.2 Trajectory for different initial weights');
hold on;

%Initialize data
P = [1 -1;1 -1];
T = [1 -1];
alfa = 0.2;
W1 = [0;0];
W2 = [1;1];

for k = 1 : 2
    if (k == 1)
        W = W1;
    else
        W = W2;
    end
    plot(W(1), W(2),'r*')
    text(-0.3,-0.3,'W_0 =(0,0)');
    text(1,1.2,'W_0 =(1,1)');
    %Train the network (20 passes x 2 patterns = 40 LMS updates)
    for step = 1 : 20
        for i = 1 : 2
            a = purelin(W' * P(:,i));
            e = T(i) - a;
            W = W + 2 * alfa * e * P(:,i);
            if (k == 1)
                plot(W(1), W(2),'k.')
                W1 = W;
            else
                plot(W(1), W(2),'b+')
                W2 = W;
            end
        end
    end
end
W1
W2
hold off;

W1 =
    0.5000
    0.5000

W2 =
    0.5000
    0.5000

[Fig.1: Surface plot of the mean square error]

[Fig.2: Trajectories on the contour plot for initial weights (0,0) and (1,1)]

iv. For initial weights [1; 1], after 40 iterations we get W = [0.5; 0.5] (W2 in the code), and the trajectory is also shown in Fig.2 as "+" symbols. The decision boundary is given in Fig.3.

P = [1 -1;1 -1];
W = [0.5 0.5];
figure;
plot(P(1,1),P(2,1),'r+');
hold on;
plot(P(1,2),P(2,2),'r+');
%Decision Boundary
x = -2 : .1 : 2;
y = (-W(1)*x)/W(2);
plot(x,y);
axis([-2 2 -2 2]);
title('Fig.3 Decision boundary for E10.4');
hold off;

[Fig.3: Decision boundary for E10.4]

v. From the figure above, we can see that the LMS algorithm, unlike the perceptron learning rule, places the decision boundary as far from the patterns as possible.

We also see that the two different initial weight vectors converge to the same solution. However, if other arbitrary initial points were selected, we could get different solutions, since the performance index has only a weak minimum, i.e. the minimum is not unique: there is an entire line of weight vectors along which the mean square error is at its minimum (zero), as the factorization below shows.
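To make this explicit, the performance index from part (i) can be written as a perfect square (a short derivation added here for clarity):

F(x) = 1 - 2(x1 + x2) + (x1 + x2)^2 = (1 - x1 - x2)^2

so F(x) reaches its minimum value of zero at every point on the line x1 + x2 = 1. Both initial guesses (0,0) and (1,1) differ from (0.5, 0.5) only along the direction [1; 1] (the eigenvector with the nonzero eigenvalue), so the LMS iterations drive both of them to the same point (0.5, 0.5); an initial guess with a component along [1; -1] (the zero-eigenvalue direction) would converge to a different point on the minimum line.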

E10.5 We again use the reference patterns and targets from Problem P10.3, and assume that they occur with equal probability. This time we want to train an ADALINE network with a bias. We now have three parameters to find: the weights w1,1 and w1,2 and the bias b.

i. Find the mean square error and the maximum stable learning rate.

ii. Write a MATLAB M-file to implement the LMS algorithm for this problem. Take 40 steps of the algorithm for a stable learning rate. Use the zero vector as the initial guess. Sketch the final decision boundary.

iii. Take 40 steps of the algorithm after setting the initial values of all parameters to 1. Sketch the final decision boundary.

iv. Compare the final parameters and the decision boundaries from parts (ii) and (iii). Explain your results.

Answer:

i. In this problem the reference patterns and targets are

{p1 = [1; 1], t1 = 1}, {p2 = [1; -1], t2 = -1}.

When the bias is included, the input and parameter vectors become z = [p; 1] and x = [w1,1; w1,2; b], so z1 = [1; 1; 1] and z2 = [1; -1; 1], and c, h and R can be calculated by:

c = E[t^2] = 0.5(1)^2 + 0.5(-1)^2 = 1

h = E[t z] = 0.5(1)[1; 1; 1] + 0.5(-1)[1; -1; 1] = [0; 1; 0]

R = E[z z^T] = 0.5[1; 1; 1][1 1 1] + 0.5[1; -1; 1][1 -1 1] = [1 0 1; 0 1 0; 1 0 1]

So the mean square error performance index is

F(x) = c - 2 x^T h + x^T R x = 1 - 2 x2 + x1^2 + x2^2 + x3^2 + 2 x1 x3

The eigenvalues and eigenvectors of the Hessian matrix A = 2R of F(x) are:

A = 2*[1 0 1;0 1 0;1 0 1];
[V,D] = eig(A)

V =
         0    0.7071    0.7071
    1.0000         0         0
         0   -0.7071    0.7071

D =
    2.0000         0         0
         0         0         0
         0         0    4.0000

The maximum stable learning rate must satisfy α < 1/λ_max = 1/2 = 0.5, where λ_max = 2 is the largest eigenvalue of R.
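The same check as before (again a sketch, not part of the assignment code) can be run for the augmented input z = [p; 1]:

% Sanity check: statistics for the ADALINE with bias (augmented input z = [p; 1])
P = [1 1; 1 -1];                 % columns are p1 = [1;1], p2 = [1;-1]
T = [1 -1];
prob = [0.5 0.5];
Z = [P; ones(1,2)];              % augmented inputs z1, z2

c = sum(prob .* T.^2);           % -> 1
h = Z * (prob .* T)';            % -> [0; 1; 0]
R = Z * diag(prob) * Z';         % -> [1 0 1; 0 1 0; 1 0 1]

alpha_max = 1 / max(eig(R))      % -> 0.5, so alpha = 0.2 below is stable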

ii. Take α = 0.2. After 40 iterations we get W ≈ [0; 1] and b ≈ 0. The decision boundary is given in Fig.4.

clear
%Initialize data
P = [1 1;1 -1];
T = [1 -1];
alfa = 0.2;
W = [0;0];
b = 0;
%Train the network (20 passes x 2 patterns = 40 LMS updates)
for step = 1 : 20
    for i = 1 : 2
        a = purelin(W' * P(:,i) + b);
        e = T(i) - a;
        W = W + 2 * alfa * e * P(:,i);
        b = b + 2 * alfa * e;
    end
end
W
b
%Display in graph
figure;
plot(P(1,1),P(2,1),'r+');
hold on;
plot(P(1,2),P(2,2),'r+');
%Decision Boundary
x = -2 : .1 : 2;
y = (-W(1,1)*x - b)/W(2,1);
plot(x,y);
axis([-2 2 -2 2]);
title('Fig.4 Decision boundary for x_0=[0 0 0]');
hold off;

W =
   -0.0000
    1.0000

b =
  -3.0058e-015

[Fig.4: Decision boundary for initial parameters x_0 = [0 0 0]]

iii. Take the initial parameters x_0 = [1; 1; 1] with the same α = 0.2. After 40 iterations we again get W ≈ [0; 1] and b ≈ 0. The decision boundary is given in Fig.5.

W =
    0.0000
    1.0000

b =
   6.9295e-015

[Fig.5: Decision boundary for initial parameters x_0 = [1 1 1]]

iv. Since the LMS algorithm tends to place the decision boundary as far from the patterns as possible, the resulting bias for each initial point is close to zero, so the boundary falls almost halfway between the two input vectors.

Also, since the Hessian matrix has a zero eigenvalue, the performance index has a weak minimum; the short derivation below shows why both initial points nevertheless converge to the same parameters.
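For completeness (a derivation added here, consistent with the statistics computed in part (i)), the performance index factors as

F(x) = 1 - 2 x2 + x1^2 + x2^2 + x3^2 + 2 x1 x3 = (x2 - 1)^2 + (x1 + x3)^2

so the minimum F = 0 is reached on the whole line x2 = 1, x1 + x3 = 0 (that is, w1,2 = 1 and b = -w1,1). The zero-eigenvalue direction is [1; 0; -1], and both initial guesses [0; 0; 0] and [1; 1; 1] have zero component along it, so LMS converges to the same minimum point [0; 1; 0] in both cases, i.e. W = [0; 1] and b = 0, which matches the results above.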

E11.7 For the network shown in the figure below, the initial weights and biases are chosen to be [pic].

The network transfer functions are [pic], [pic],

and the input / target pair is given to be [pic].

Perform one iteration of backpropagation with [pic].

[Figure: the network for E11.7]

Answer:

First, find the derivatives of the transfer functions:

[pic], [pic]

Propagate the input through the network:

[pic]

[pic]

Find sensitivities:

[pic]

[pic]

Update the weights and biases:

[pic]

[pic]

[pic]

[pic]
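Since the original network figure and numeric values above were lost in conversion (the [pic] placeholders), here is a generic sketch of what one backpropagation iteration looks like in MATLAB for a small 1-1-1 network. All numbers, the transfer functions (logsig in the first layer, purelin in the second) and the learning rate below are assumed for illustration only; they are not the E11.7 values.

% Hypothetical values -- NOT the E11.7 data (those were lost as [pic] above)
w1 = -1;  b1 = 1;            % assumed first-layer weight and bias
w2 = -2;  b2 = 1;            % assumed second-layer weight and bias
p  = 1;   t  = 1;            % assumed input/target pair
alpha = 0.1;                 % assumed learning rate

% 1. Propagate the input forward (assumed f1 = logsig, f2 = purelin)
a1 = logsig(w1*p + b1);
a2 = purelin(w2*a1 + b2);
e  = t - a2;

% 2. Propagate the sensitivities backward
s2 = -2 * 1 * e;             % purelin derivative is 1
s1 = (1 - a1)*a1 * w2 * s2;  % logsig derivative is (1 - a1)*a1

% 3. Update the weights and biases (steepest descent)
w2 = w2 - alpha * s2 * a1;   b2 = b2 - alpha * s2;
w1 = w1 - alpha * s1 * p;    b1 = b1 - alpha * s1;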

E11.11 Write a MATLAB program to implement the backpropagation algorithm for the 1-2-1 network shown in Figure 11.4. Choose the initial weights and biases to be random numbers uniformly distributed between -0.5 and 0.5 (using the MATLAB function rand), and train the network to approximate the function

g(p) = 1 + sin(πp/8), for -2 ≤ p ≤ 2.

Try several different values for the learning rate α, and use several different initial conditions. Discuss the convergence properties of the algorithm.

Answer:

Several cases were run using the following code, for learning rates α = 0.2, 0.4 and 0.6. For each α, two different initial conditions were used, and both the final approximation curve and some intermediate results are shown in the figures.

clear
%Initialize data: random weights and biases uniformly distributed in [-0.5, 0.5]
W1 = rand(2,1) - 0.5;
W2 = rand(1,2) - 0.5;
b1 = rand(2,1) - 0.5;
b2 = rand - 0.5;
a1 = zeros(2,1);
%Output the initial set
W1_0 = W1
b1_0 = b1
W2_0 = W2
b2_0 = b2
alfa = 0.2;    %learning rate
tol = 0.001;   %tolerance for the stopping criterion
mse = 1;       %sum of squared errors over one pass through the training set
iter = 0;
figure;
while (mse > tol)
    mse = 0;
    i = 0;
    iter = iter + 1;
    for P = -2 : .1 :2
        i = i + 1;
        T = 1 + sin(pi*P/8);
        %Forward pass
        a1 = logsig(W1*P + b1);
        a2 = purelin(W2*a1 + b2);
        mse = mse + (T - a2)^2;
        A(i) = a2;
        %Backpropagate the sensitivities
        dlogsig = [(1 - a1(1))* a1(1) 0;0 (1 - a1(2))* a1(2)];
        s2 = -2 * (T - a2);
        s1 = dlogsig * W2' * s2;
        %Update weights and biases
        W2 = W2 - alfa * s2 * a1';
        W1 = W1 - alfa * s1 * P;
        b2 = b2 - alfa * s2;
        b1 = b1 - alfa * s1;
    end
    P = -2 : .1 : 2;
    if (mod(iter,10) == 0)
        plot(P,A,'g:')    %intermediate approximation every 10 passes
    end
    hold on;
end
%Display in graph
P = -2 : .1 : 2;
T = 1 + sin(pi*P/8);
plot(P,T,'r-',P,A,'b+')
title('Fig6.1 learning rate = 0.2, initial set #1');
text(-1.8,1.7,'red ---- original function');
text(-1.8,1.6,'blue ---- approximation');
text(-1.8,1.5,'green ---- intermediate results');
xlabel('P'), ylabel('Target vs. output');
W1
b1
W2
b2
iter

Case 1.1: α = 0.2; the solution converged after 318 iterations.

W1_0 =
    0.1813
   -0.1205

b1_0 =
    0.2095
   -0.0711

W2_0 =
    0.3318    0.0028

b2_0 =
   -0.1954

W1 =
    0.7448
    0.6206

b1 =
   -0.0800
   -0.1009

W2 =
    1.5828    0.7052

b2 =
   -0.0993

iter =
   318

[Figure: final approximation and intermediate results for Case 1.1]

Case 1.2: α = 0.2; the solution converged after 237 iterations.

W1_0 =
   -0.3103
   -0.3066

b1_0 =
    0.0417
   -0.3491

W2_0 =
    0.1822   -0.1972

b2_0 =
    0.1979

W1 =
    0.7909
   -0.5887

b1 =
   -0.2529
   -0.4091

W2 =
    1.4986   -0.7943

b2 =
    0.6568

iter =
   237

[Figure: final approximation and intermediate results for Case 1.2]

Case 2.1: α = 0.4; the solution converged after 13 iterations.

W1_0 =
   -0.1216
    0.3600

b1_0 =
   -0.0034
    0.3998

W2_0 =
    0.3537    0.0936

b2_0 =
    0.3216

W1 =
    0.8861
    1.2248

b1 =
   -0.1264
    0.2634

W2 =
    0.7618    1.0548

b2 =
    0.0852

iter =
    13

[Figure: final approximation and intermediate results for Case 2.1]

Case 2.2: α = 0.4; the solution converged after 315 iterations.

W1_0 =
    0.1449
    0.3180

b1_0 =
   -0.2103
   -0.1588

W2_0 =
    0.1602   -0.1580

b2_0 =
    0.0341

W1 =
    0.7178
    0.7196

b1 =
   -0.0857
   -0.1080

W2 =
    1.2420    1.0103

b2 =
   -0.0777

iter =
   315

[Figure: final approximation and intermediate results for Case 2.2]

Case 3.1: α = 0.6; the solution converged after 303 iterations.

W1_0 =
    0.2271
   -0.1907

b1_0 =
   -0.1296
    0.2027

W2_0 =
    0.3385    0.0681

b2_0 =
    0.0466

W1 =
    1.1146
    0.9723

b1 =
    0.1758
    0.1192

W2 =
    1.0237    0.7772

b2 =
    0.0754

iter =
   303

[Figure: final approximation and intermediate results for Case 3.1]

Case 3.2: α = 0.6; the solution converged after 331 iterations.

W1_0 =
   -0.0551
    0.1946

b1_0 =
    0.4568
    0.0226

W2_0 =
    0.1213    0.2948

b2_0 =
    0.3801

W1 =
    1.3190
    0.7729

b1 =
    0.4491
   -0.1514

W2 =
    1.1491    0.6853

b2 =
    0.0617

iter =
   331

[Figure: final approximation and intermediate results for Case 3.2]

Discussion:

From the above cases, we see that:

1) Within the stable range, a higher learning rate tends to make the iteration converge faster.

2) When α > 0.6, the solution becomes unstable. For example, when α = 0.8 is used, the training process is very unstable and takes a long time to converge.

3) The initial weights and biases affect the convergence speed and the final results.
