


Neural Networks: Nonlinear Optimization for

Constrained Learning and its Applications

STAVROS J. PERANTONIS

Computational Intelligence Laboratory,

Institute of Informatics and Telecommunications,

National Center for Scientific Research “Demokritos”,

153 10 Aghia Paraskevi, Athens,

GREECE

Abstract: - Feedforward neural networks are highly non-linear systems whose training is usually carried out by performing optimization of a suitable cost function which is nonlinear with respect to the optimization variables (synaptic weights). This paper summarizes the considerable advantages that arise from using constrained optimization methods for training feedforward neural networks. A constrained optimization framework is presented that allows incorporation of suitable constraints in the learning process. This enables us to include additional knowledge in efficient learning algorithms that can be either general purpose or problem specific. We present general purpose first order and second order algorithms that arise from the proposed constrained learning framework and evaluate their performance on several benchmark classification problems. Regarding problem specific algorithms, we present recent developments concerning the numerical factorization and root identification of polynomials using sigma-pi feedforward networks trained by constrained optimization techniques.

Key-Words: - neural networks, constrained optimization, learning algorithms

1 Introduction

Artificial neural networks are computational systems loosely modeled after the human brain. Although it is not yet clearly understood exactly how the brain optimizes its connectivity to perform perception and decision making tasks, there have been considerable successes in developing artificial neural network training algorithms by drawing from the wealth of methods available from the well established field of non-linear optimization. Although most of the methods used for supervised learning, including the original and highly cited back propagation method for multilayered feed forward networks [1], originate from unconstrained optimization techniques, research has shown that it is often beneficial to incorporate additional knowledge in the neural network architecture or learning rule [2]-[6]. Often, the additional knowledge can be encoded in the form of mathematical relations that have to be satisfied simultaneously with the demand for minimization of the cost function. Naturally, methods from the field of nonlinear constrained optimization are essential for solving these modified learning tasks.

In this paper, we present an overview of recent research results on neural network training using constrained optimization techniques. A general constrained optimization framework is introduced for incorporating additional knowledge in the neural network learning rule. It is subsequently shown how this general framework can be utilized to obtain efficient neural network learning algorithms. These can be either general purpose algorithms or problem specific algorithms. The general purpose algorithms incorporate additional information about the specific type of the neural network and the nature and characteristics of its cost function landscape, and are used to facilitate learning in broad classes of problems. These are further divided into first order algorithms (mainly using gradient information) and second order algorithms (also using information about second derivatives encoded in the Hessian matrix). Problem specific algorithms are also discussed. Special attention is paid to recent advancements in the numerical solution of the polynomial factorization and root finding problem, which can be efficiently solved by using suitable sigma-pi networks and incorporating relations among the polynomial coefficients or roots as additional constraints.

2 Constrained learning framework

Conventional unconstrained supervised learning in neural networks involves minimization, with respect to the synaptic weights and biases, of a cost function of the form

E(\mathbf{w}) = \frac{1}{2}\sum_{p=1}^{P}\sum_{j}\left(o_{jp} - t_{jp}\right)^{2}   (1)

Here $\mathbf{w}$ is a column vector containing all synaptic weights and biases of the network, $p$ is an index running over the $P$ patterns of the training set, $o_{jp}$ is the network output corresponding to output node $j$, and $t_{jp}$ is the corresponding target.
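To make the cost of equation (1) concrete, the sketch below evaluates it for a small feedforward network; the one-hidden-layer architecture, sigmoid activation, and all function names are illustrative assumptions, not details taken from this paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(w, patterns, n_in, n_hid):
    # Unpack a flat weight vector into layer matrices (biases omitted for brevity).
    W1 = w[: n_in * n_hid].reshape(n_hid, n_in)
    W2 = w[n_in * n_hid:].reshape(1, n_hid)
    return sigmoid(W2 @ sigmoid(W1 @ patterns.T)).T  # one output row per pattern

def cost(w, patterns, targets, n_in, n_hid):
    # Equation (1): half the summed squared error over all patterns and outputs.
    out = forward(w, patterns, n_in, n_hid)
    return 0.5 * np.sum((out - targets) ** 2)
```

Any gradient-based learning rule discussed below, constrained or not, operates on this scalar as a function of the flat weight vector.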

However, it is desirable to introduce additional relations that represent the extra knowledge and involve the network’s synaptic weights. Before introducing the form of the extra relations, we note that we will be adopting an epoch-by-epoch optimization framework with the following objectives:

• At each epoch of the learning process, the vector $\mathbf{w}$ is to be incremented by $d\mathbf{w}$, so that the search for an optimum new point in the space of $\mathbf{w}$ is restricted to a region around the current point. It is possible to restrict the search to the surface of a hypersphere, as is done, for example, in the steepest descent algorithm. However, in some cases it is convenient to adopt a more general approach whereby the search is done on a hyperelliptic surface centered at the point defined by the current $\mathbf{w}$:

d\mathbf{w}^{T} A\, d\mathbf{w} = (\delta P)^{2}   (2)

where $A$ is a positive definite matrix and $\delta P$ is a known constant.
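The hyperelliptic step constraint of equation (2) can be enforced by a simple rescaling of any search direction; the sketch below uses illustrative names and assumes a positive definite matrix is supplied.

```python
import numpy as np

def step_on_hyperellipse(d, A, dP):
    """Rescale a search direction d so that the resulting step dw
    satisfies dw^T A dw = (dP)^2, i.e. dw ends on the hyperelliptic
    surface of equation (2). A must be positive definite, so that
    d^T A d > 0 for any nonzero d."""
    return (dP / np.sqrt(d @ A @ d)) * d
```

With $A$ equal to the identity matrix, the surface reduces to the hypersphere of radius $\delta P$ mentioned above.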

• At each epoch, the cost function must be decremented by a positive quantity $\delta Q$, so that at the end of learning $E$ is rendered as small as possible. To first order, the change in $E$ can be replaced by its first differential, so that:

dE = -\delta Q   (3)

Next, additional objectives are introduced in order to incorporate the extra knowledge into the learning formalism. The objective is to make the formalism as general as possible and capable of incorporating multiple optimization objectives. The following two cases of importance are therefore considered, that involve additional mathematical relations representing knowledge about learning in neural networks:

Case 1: There are additional constraints $\mathbf{\Phi}(\mathbf{w}) = \mathbf{0}$ which must be satisfied as well as possible upon termination of the learning process. Here $\mathbf{\Phi}$ is a column vector whose components are known functions of the synaptic weights. This problem is addressed by introducing a function [pic] and demanding at each epoch of the learning process the maximization of [pic] subject to the condition that [pic]. In this way, it is ensured that $\mathbf{\Phi}$ tends to $\mathbf{0}$ at a temporarily exponential rate.

Based on these considerations, the following optimization problem can be formulated for each epoch of the algorithm, whose solution will determine the adaptation rule for [pic]:

Maximize [pic] with respect to [pic], subject to the following constraints:

[pic] (4)

[pic] (5)

[pic] (6)

where $m$ is the number of components of $\mathbf{\Phi}$.

Case 2: This case involves additional conditions whereby there is no specific final target for the vector [pic], but rather it is desired that all components of [pic] are rendered as large as possible at each individual epoch of the learning process. This is a multiobjective maximization problem, which is addressed by defining [pic] and demanding that [pic] assume the maximum possible value at each epoch. Thus the constrained optimization problem is as before with equation (6) substituted by

[pic] (7)

The solution to the above constrained optimization problems can be obtained by a method similar to the constrained gradient ascent technique introduced by Bryson and Denham [7] and leads to a generic update rule for [pic]. Therefore, suitable Lagrange multipliers [pic] and [pic] are introduced to take account of equations (5) and (4) respectively and a vector of multipliers [pic] to take account of equation (6) or equation (7). The Lagrangian thus reads

[pic] (8)

where the following quantities have been introduced:

1. A vector [pic] with elements [pic]

2. A matrix [pic] whose elements are defined by [pic] (for case 1) or [pic] (for case 2).

To maximize [pic] under the required constraints, it is demanded that:

[pic] (9)

[pic] (10)

Hence, the factors multiplying [pic] and [pic] in equation (8) should vanish, and therefore:

[pic] (11)

[pic] (12)

Equation (11) constitutes the weight update rule for the neural network, provided that the Lagrange multipliers appearing in it have been evaluated in terms of known quantities. The result is summarized below, while the full evaluation is carried out in the Appendix. To complete the evaluation, it is necessary to introduce the following quantities:

[pic] (13)

[pic] (14)

[pic] (15)

[pic] (16)

[pic] (17)

where [pic] denotes the sum of all elements of a matrix and [pic] is a column vector whose elements are all equal to 1. In terms of these known quantities, the Lagrange multipliers are evaluated using the relations:

[pic] (18)

[pic] (19)

[pic] (20)

In the Appendix, it is shown that $\delta Q$ must be chosen adaptively, as a fixed fraction $\xi$ of an upper bound derived there, where $\xi$ is a real parameter with $0 < \xi < 1$. Consequently, the proposed generic weight update algorithm has two free parameters, namely $\delta P$ and $\xi$.

3 First order constrained learning algorithms

3.1 Algorithms with adaptive momentum

Learning in feedforward networks is usually hindered by specific characteristics of the landscape defined by the mean square error cost function

E = \frac{1}{2}\sum_{p=1}^{P}\sum_{j}\left(o_{jp} - t_{jp}\right)^{2}   (21)

The two most common problems arise from:

• the occurrence of long, deep valleys or troughs that force gradient descent to follow zig-zag paths [8]

• the possible existence of temporary minima in the cost function landscape.

In order to improve learning speed in long, deep valleys, it is desirable to align the current and previous epoch weight update vectors as much as possible, without compromising the need for a decrease in the cost function [6]. Thus, satisfaction of an additional condition is required, namely maximization of the quantity $\Phi = d\mathbf{w}_t^{T}\, d\mathbf{w}_{t-1}$ with respect to the synaptic weight vector at each epoch of the algorithm. Here $d\mathbf{w}_t$ and $d\mathbf{w}_{t-1}$ are the weight updates at the present and the immediately preceding epoch respectively, the latter treated as a known constant vector. The generic update rule derived in the previous section can be applied. There is only one additional condition to satisfy (maximization of $\Phi$ at each epoch), so that Case 2 of the previous section is applicable. It is readily seen that in this case the multiplier vector has only one component, equal to -1 (by equation (12)), and the weight update rule is quite simple:

[pic] (22)

where

[pic] (23)

with

[pic] (24)

Hence, weight updates are formed as linear combinations of the derivatives $\partial E/\partial \mathbf{w}$ of the cost function with respect to the weights and of the weight updates $d\mathbf{w}_{t-1}$ at the immediately preceding epoch. This is similar to back propagation with a momentum term, with the essential difference that the coefficients of the two terms are suitably adapted at each epoch of the learning process.
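The shape of such an update (a blend of the negative gradient and the previous step, rescaled to the allowed step length) can be sketched as follows; the fixed blending coefficient and the descent safeguard are illustrative placeholders, not the closed-form Lagrange-multiplier coefficients of equations (23)-(24).

```python
import numpy as np

def momentum_style_step(g, dw_prev, dP, beta=0.5):
    """Form dw as a linear combination of -g (gradient of the cost) and
    the previous update dw_prev, then rescale to step length dP.
    The constrained algorithm derives both coefficients adaptively from
    Lagrange multipliers; here a fixed blend beta is used purely for
    illustration."""
    d = -g + beta * dw_prev
    if d @ g >= 0:          # safeguard: keep dE ~ g.dw negative (descent)
        d = -g
    return (dP / np.linalg.norm(d)) * d
```

The essential point carried over from the text is that the update always remains a two-term combination of the current gradient and the previous step.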

The resulting learning algorithm (Algorithm for Learning Efficiently using Constrained Optimization, ALECO) is historically the first constrained learning algorithm complying with the optimization framework of the previous section. An example of its behaviour on a benchmark problem, the 11-11-1 multiplexer problem [9], is shown in Fig. 1, where the performance of ALECO is compared with other state-of-the-art algorithms including backpropagation (BP), resilient propagation (RP) [10], conjugate gradient (CG) [11], quickprop (QP) [12] and Delta-bar-Delta (DBD) [9]. From these experiments it is evident that ALECO generally outperforms the other algorithms in terms of success rate and computing time (these first order algorithms require roughly the same CPU time per epoch).

Figure 1: Multiplexer benchmark task (11-11-1): Comparison of a prototype first order constrained learning algorithm (ALECO) with other learning algorithms in terms of success rate and epochs needed to complete the task.

Figure 2: Classification of lithologies and identification of mineral alteration zones using ALECO.

Moreover, the algorithm has exhibited very good generalization performance in a number of benchmark and industrial tasks. Fig. 2 shows the result of supervised learning performed on the problem of classifying lithological regions and mineral alteration zones for mineral exploration purposes from satellite images. Among several neural and statistical classification algorithms tried on the problem, ALECO was most successful in correctly classifying mineral alteration zones which are very important for mineral exploration [13].

3.2 Dynamical system based approach

As mentioned in the previous section, the problem of slow learning in feedforward networks is also associated with the problem of temporary minima in the cost function landscape. In recent work [14], temporary minima have been studied using a method originating from the theory of dynamical systems. One of the major results obtained is a set of analytical predictions for the characteristic dynamical transitions from flat plateaus (or temporary minima) of finite error to the desired levels of lower or even zero error.

It is well known that temporary minima result from the development of internal symmetries and the consequent build-up of redundancy in the hidden layer. In this case, one or more of the hidden nodes perform approximately the same function, and therefore form clusters of redundant nodes which are approximately reducible to a single unit. Due to the formation of these clusters, the network is trapped in a temporary minimum, and it usually takes a very long time before the redundancy is broken and the network finds its way down the cost function landscape. By introducing suitable state variables formed as appropriate linear combinations of the synaptic weights, a dynamical system model can be derived that describes the dynamics of the feedforward network in the vicinity of these temporary minima [14].

The corresponding non-linear system can be linearized in the vicinity of temporary minima, and the learning behaviour of the feedforward network can then be characterized by the largest eigenvalue of the Jacobian matrix of each cluster of redundant hidden nodes in the linearized system. It turns out that in the vicinity of the temporary minima, learning is slow because the largest eigenvalues of the Jacobian matrices of all formed clusters are very small, and therefore the system evolves very slowly when unconstrained back propagation is utilized. The magnitude of the largest eigenvalues gradually grows during learning, until eventually a bifurcation of the eigenvalues occurs and the system follows a trajectory which allows it to move far away from the minimum.
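The quantity this analysis centres on, the largest eigenvalue of a cluster's Jacobian, is straightforward to monitor numerically; in the sketch below the 2x2 matrix used in the test is an arbitrary stand-in for a linearized cluster Jacobian, not a matrix derived in [14].

```python
import numpy as np

def largest_eigenvalue(J):
    """Largest real part among the eigenvalues of the Jacobian J of the
    linearized dynamics. A value close to zero signals the slow evolution
    characteristic of a temporary minimum; its growth signals escape."""
    return float(np.max(np.linalg.eigvals(J).real))
```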

Instead of waiting for the growth of the eigenvalues, it is useful to raise them more rapidly in order to facilitate learning. It is therefore beneficial to incorporate in the learning rule additional knowledge expressing the desire for rapid growth of these eigenvalues. Since it is difficult in the general case to express the maximum eigenvalues in closed form in terms of the weights, appropriate lower bounds for these eigenvalues are raised instead. Details are given in [14], but here it suffices to say that this is readily achieved by the generic weight update rule (Case 2 of section 2), using these lower bounds as the additional quantities to be maximized, with $A = I$ (the unit matrix).

Figure 3: Parity-4 problem: Comparison of the constrained learning algorithm DCBP with other first order algorithms in terms of success rate and complexity.

The algorithm thus obtained, DCBP (Dynamical Constrained Back Propagation), results in significant acceleration of learning in the vicinity of temporary minima, while also improving the success rate in comparison with other first order algorithms. The improvement achieved for the parity-4 problem is shown in Fig. 3. Moreover, in a standard breast cancer diagnosis benchmark (the cancer3 problem of the PROBEN1 set [15]), DCBP was the only first order algorithm tried that succeeded in learning the task with 100% success starting from different initial weights, with resilient propagation achieving a 20% success rate and the other first order algorithms failing to learn the task.

4 Second order constrained learning algorithms

Apart from the information contained in the gradient, it is often useful to include second order information about the cost function landscape, which is contained in the Hessian matrix of second derivatives with respect to the weights. The constrained learning algorithms in this section combine advantages of first order unconstrained learning algorithms, like conjugate gradient, with those of second order unconstrained algorithms, like the very successful Levenberg-Marquardt (LM) algorithm.

The main idea is that a one-dimensional minimization along the previous step direction $d\mathbf{w}_{t-1}$, followed by a second minimization along the current direction $d\mathbf{w}_t$, does not guarantee that the function has been minimized on the subspace spanned by both of these directions. A solution to this problem is to choose minimization directions which are non-interfering and linearly independent. This can be achieved by the selection of conjugate directions, which form the basis of the Conjugate Gradient (CG) method [16]. The two vectors $d\mathbf{w}_t$ and $d\mathbf{w}_{t-1}$ are non-interfering, or mutually conjugate with respect to a matrix $A$, when

d\mathbf{w}_{t-1}^{T}\, A\, d\mathbf{w}_{t} = 0   (25)
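Conjugacy in the sense of equation (25) can be enforced on an arbitrary direction by a Gram-Schmidt-like correction; a minimal sketch with illustrative names:

```python
import numpy as np

def conjugate_component(d, d_prev, A):
    """Remove from d its interfering part along d_prev, so that the result
    is conjugate to d_prev with respect to A: d_prev^T A result = 0."""
    return d - ((d_prev @ A @ d) / (d_prev @ A @ d_prev)) * d_prev
```

Minimizing along the corrected direction does not undo progress already made along the previous one, which is the property the text appeals to.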

Our objective is to decrease the cost function of equation (1) with respect to $\mathbf{w}$ as well as to maximize $\Phi = d\mathbf{w}_t^{T}\, d\mathbf{w}_{t-1}$. Second order information is introduced by constraining the weight increment $d\mathbf{w}$, so that

d\mathbf{w}^{T} H\, d\mathbf{w} = (\delta P)^{2}   (26)

Thus, at each iteration, the search for an optimum new point in the weight space is restricted to a small hyperellipse centered at the point defined by the current weight vector. The shape of such a hyperellipse reflects the scaling of the underlying problem, and allows for a more correct weighting among all possible directions [17]. Hence, in terms of the general constrained learning framework of section 2, this is a one-goal optimization problem, seeking to maximize $\Phi$ so that (26) is respected. This falls under Case 2 of the general framework, with the matrix $A$ replaced by the Hessian matrix $H$. The derived weight update rule reads:

[pic] (27)

where

[pic] (28)

[pic] (29)

Finally, to save computational resources, the Hessian is evaluated using the LM trust-region prescription [18]

H \approx J^{T} J + \mu I   (30)

where $J$ is the Jacobian matrix and $\mu$ is a scalar that (indirectly) controls the size of the trust region.
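A sketch of this prescription: the Gauss-Newton term built from the Jacobian, regularized by the damping term, is positive definite for any positive damping and therefore always defines a proper metric for the hyperelliptic constraint above. Names are illustrative.

```python
import numpy as np

def lm_hessian(J, mu):
    """Levenberg-Marquardt approximation to the Hessian: J^T J + mu*I.
    Larger mu shrinks the trust region and biases the step toward plain
    gradient descent; smaller mu approaches the Gauss-Newton step."""
    return J.T @ J + mu * np.eye(J.shape[1])
```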

As is evident from equation (27), the resulting algorithm is an LM-type algorithm with an additional adaptive momentum term; hence the algorithm is termed LMAM (Levenberg-Marquardt with Adaptive Momentum). An additional improvement over this type of algorithm is the so-called OLMAM algorithm (Optimized Levenberg-Marquardt with Adaptive Momentum), which implements exactly the same weight update rule but also achieves independence from the externally provided parameter values. This independence is achieved by automatically regulating analytical mathematical conditions that must hold in order to ensure that conjugacy between weight changes in successive epochs is constantly maintained. Details are given in [19].

LMAM and OLMAM have been particularly successful in a number of problems, both in the ability to reach a solution and in generalization performance. In the famous 2-spiral benchmark, LMAM and OLMAM have achieved a remarkable success rate of [pic] and the smallest mean number of epochs for a feedforward network with just one hidden layer (30 hidden units) and no shortcut connections that (to the best of our knowledge) has ever been reported in the neural network literature. As shown in Fig. 4, other algorithms like the original LM and conjugate gradient methods have failed to solve this problem in the great majority of cases, while the CPU time required by LMAM and OLMAM is also very competitive.

Figure 4: 2-spiral problem: Success rates and CPU times required for the second-order constrained learning algorithms (LMAM and OLMAM) and two other algorithms (Levenberg-Marquardt and Conjugate Gradient).

Regarding generalization ability, the second order constrained learning algorithms have been successfully applied to a medical problem concerning classification of Pap-smear test images. In particular, OLMAM has achieved the best test-set classification performance reported in the literature, among a multitude of statistical and other classification methods. Some of the results of this comparison are shown graphically in Fig. 5, while a full account is given in [20].

Figure 5: Comparison of classification rates achieved by different classification methods on the Pap-smear classification problem. The constrained learning algorithm (OLMAM) outperforms all other methods, achieving a rate of 98.8%.

5 Problem specific algorithms

Problem specific applications of constrained learning range from financial modeling and market analysis problems, where sales predictions can be made to conform to certain requirements of the retailers, to scientific problems like the numerical solution of simultaneous linear equations [21]. A scientific application of constrained learning that has reached a level of maturity is numerical factorization and root finding of polynomials.

Polynomial factorization is an important problem with applications in various areas of mathematics, mathematical physics and signal processing ([22] and references cited therein).

Consider, for example, a polynomial in the two variables $z_1$ and $z_2$:

F(z_1, z_2) = \sum_{i=0}^{n}\sum_{j=0}^{n} a_{ij}\, z_1^{i} z_2^{j}   (31)

with $n$ even, and [pic]. For the above polynomial, it is sought to achieve an exact or approximate factorization of the form

F(z_1, z_2) \approx F_1(z_1, z_2)\, F_2(z_1, z_2)   (32)

where

F_k(z_1, z_2) = \sum_{i=0}^{n/2}\sum_{j=0}^{n/2} b^{(k)}_{ij}\, z_1^{i} z_2^{j}   (33)

with $k = 1, 2$. We can try to find the coefficients $b^{(k)}_{ij}$ by considering $P$ training patterns selected from the region [pic]. The primary purpose of the learning rule is thus to minimize with respect to the $b^{(k)}_{ij}$ a cost function of the form

E = \frac{1}{2}\sum_{p=1}^{P}\left[F(z_{1p}, z_{2p}) - F_1(z_{1p}, z_{2p})\, F_2(z_{1p}, z_{2p})\right]^{2}   (34)

Note that this cost function corresponds to a sigma-pi neural network with the coefficients $b^{(1)}_{ij}$ and $b^{(2)}_{ij}$ as its synaptic weights.
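This cost, the squared mismatch between the original polynomial and the product of the two candidate factors summed over the training points, can be evaluated directly from coefficient grids. The grid convention (entry [i, j] multiplies z1^i * z2^j) and all names below are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial.polynomial import polyval2d

def factorization_cost(a, b1, b2, z1, z2):
    """Half the summed squared difference between the target polynomial
    (coefficient grid a) and the product of the two factor polynomials
    (grids b1, b2), evaluated over the training points (z1, z2)."""
    err = polyval2d(z1, z2, a) - polyval2d(z1, z2, b1) * polyval2d(z1, z2, b2)
    return 0.5 * float(np.sum(err ** 2))
```

For an exactly factorable polynomial the cost vanishes at the true factor coefficients, which is the minimum the sigma-pi network is trained toward.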

Unconstrained minimization of the cost function has been tried, but often leads to unsatisfactory results, because the algorithm can easily be trapped in flat minima. However, extra knowledge is available for this problem. The easiest way to incorporate it is to take advantage of the constraints relating the coefficients of the desired factor polynomials to the coefficients of the original polynomial. More explicitly, if it is assumed that $F(z_1, z_2)$ is factorable, then these constraints can be expressed as follows:

\Phi_{ij} = a_{ij} - \sum_{k}\sum_{l} b^{(1)}_{kl}\, b^{(2)}_{i-k,\,j-l} = 0

with $0 \le i, j \le n$ (indices outside the ranges of equation (33) are understood to contribute zero). Thus, the objective is to reach a minimum of the cost function of equation (34) with respect to the variables $b^{(k)}_{ij}$ which satisfies as well as possible the constraints $\Phi_{ij} = 0$.
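Since multiplying two polynomials convolves their coefficient arrays, the constraint residuals (one per coefficient of the original polynomial) can be computed as a 2-D convolution of the factor grids. The coefficient-grid convention (entry [i, j] multiplies z1^i * z2^j) and the names are illustrative assumptions.

```python
import numpy as np

def constraint_residuals(a, b1, b2):
    """Residuals Phi = a - (b1 convolved with b2): the coefficients of the
    product of the factor polynomials must reproduce those of the original
    polynomial, so every entry vanishes exactly when the factorization
    is exact."""
    conv = np.zeros((b1.shape[0] + b2.shape[0] - 1,
                     b1.shape[1] + b2.shape[1] - 1))
    for (i, j), v in np.ndenumerate(b1):
        conv[i:i + b2.shape[0], j:j + b2.shape[1]] += v * b2
    return a - conv
```

These residuals are exactly the quantities driven toward zero by the Case 1 formalism of section 2.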

The constraints can be incorporated into the constrained optimization formalism as in Case 1 of section 2. It turns out that the constrained learning algorithm can determine the factor polynomials in factorable cases and gives good approximate solutions in cases where the original polynomial is non-factorable [23].

Recently, there have been interesting developments that allow improved results in numerical factorization and root finding of polynomials using constrained learning techniques. In [24], additional constraints have been incorporated to ensure stability of the resulting factor polynomials in filter factoring problems arising in signal processing. More importantly, the basic constrained learning method has been applied to root finding for arbitrary polynomials of one variable. The parallel structure of the neural network has been exploited in order to obtain all roots of the original polynomial simultaneously or in distinct groups, resulting in efficient algorithms capable of handling polynomials of high degree [25]. The method has also been extended to polynomials with arbitrary complex coefficients and with roots that can lie close to each other and need extra effort to be resolved [26]. A recent extension incorporates constraints among the root moments of the original polynomial instead of relations among the coefficients themselves [27]. These advances have rendered the constrained learning algorithms very competitive with well established numerical root finding methods, such as the Muller and Laguerre techniques, achieving better accuracies at a fraction of the CPU time.

6 Conclusion

An overview of recent research results on neural network training using constrained optimization techniques has been presented. A generic learning framework was derived in which many types of additional knowledge, codified as mathematical relations satisfied by the synaptic weights, can be incorporated. Specific examples were given of the application of this framework to neural network learning including first and second order general purpose algorithms as well as problem specific methods. It is hoped that the constrained learning approach will continue to offer insight into learning in neural networks. It has potential to combine the merits of both connectionist and knowledge based approaches for developing successful applications.

Appendix: Derivation of constrained learning algorithm

In this Appendix, evaluation of the Lagrange multipliers [pic], [pic] and [pic] involved in the general constrained learning framework of section 2 is carried out.

By multiplying both sides of equation (11) by [pic] and by taking into account equation (5) we obtain:

[pic] (35)

Solving for [pic] readily yields equation (20), which evaluates [pic] in terms of [pic] and [pic].

By left multiplication of both sides of equation (11) by [pic] and taking into account equations (6) and (20), we obtain

[pic] (36)

where the matrix [pic] is defined by equation (16). Solving equation (36) for [pic] yields

[pic] (37)

By substituting this equation into equation (12) we arrive at:

[pic] (38)

We can now substitute this equation into equation (37) to obtain equation (19) evaluating [pic] in terms of [pic].

To evaluate [pic], we must substitute our expression for [pic] into equation (4). To make the algebra easier, we note that on account of equation (20), equation (11) can be written as:

[pic] (39)

where

[pic] (40)

From the definition of [pic] we can readily derive the following properties:

[pic] (41)

Substituting equation (39) into equation (4) and taking into account equation (41), we can obtain a relation involving only [pic] and [pic]:

[pic] (42)

where the negative square root sign has been selected on account of inequality (10).

By substituting equation (19) into equation (42) and solving for [pic], equation (18) is obtained, with [pic] given by equation (17). Evaluation of all Lagrange multipliers in terms of known quantities is now complete.

As a final note, let us discuss our choice for [pic]. This choice is dictated by the demand that the quantity under the square root in equation (42) be positive. It can readily be seen by the first of equation (41) that [pic] provided that [pic] is positive definite. Since [pic], it follows from equation (42) that care must be taken to ensure that [pic]. The simplest way to achieve this is to set [pic] with [pic].

References

[1] D. E. Rumelhart, G. E. Hinton and R. J. Williams, Learning internal representations by error propagation, in: Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations, eds. D. E. Rumelhart and J. L. McClelland, MIT Press, 1986, pp. 318-362.

[2] D. Barber and D. Saad, Does extra knowledge necessarily improve generalization?, Neural Computation, Vol. 8, 1996, pp. 202-214.

[3] Y. le Cun, L. D. Jackel, B. E. Boser, J. S. Denker, H-P. Graf, I. Guyon, D. Henderson, R. E. Howard and W. Hubbard, Handwritten digit recognition: Applications of neural network chips and automatic learning, IEEE Communications Magazine, Nov.1989, pp. 41-46.

[4] P. Simard, Y. le Cun and J. Denker, Efficient pattern recognition using a new transformation distance, in: Advances in Neural Information Processing Systems, eds. S. J. Hanson, J. D. Cowan and C. L. Giles, Morgan Kaufmann, 1993, pp. V-50–V-58.

[5] S. Gold, A. Rangarajan and E. Mjolsness, Learning with preknowledge: clustering with point and graph matching distance, Neural Computation, Vol. 8, 1996, pp. 787-804.

[6] S. J. Perantonis and D. A. Karras, An efficient learning algorithm with momentum acceleration, Neural Networks, vol. 8, 1995, pp. 237-249.

[7] A. E. Bryson and W. F. Denham, A steepest ascent method for solving optimum programming problems, Journal of Applied Mechanics, Vol. 29, 1962, pp. 247-257.

[8] S. S. Rao, Optimization Theory and Applications, New Delhi, Wiley Eastern, 1984.

[9] R. A. Jacobs, Increased rates of convergence through learning rate adaptation, Neural Networks, Vol. 1, 1988, pp. 295-307.

[10] M. Riedmiller and H. Braun, A direct adaptive method for faster backpropagation learning: The RPROP algorithm, Proceedings of the International Conference on Neural Networks, San Francisco, Vol. 1, 1993, pp. 586-591.

[11] E. M. Johansson, F. U. Dowla and D. M. Goodman, Backpropagation learning for multilayer feedforward networks using the conjugate gradient method, International Journal of Neural Systems, Vol. 2, 1992, pp. 291-301.

[12] S. E. Fahlman, Faster learning variations on back-propagation: An empirical study, in: Proceedings of the Connectionist Models Summer School, eds. D. Touretzky, G. Hinton and T. Sejnowski, Morgan Kaufmann, 1988, pp. 29-37.

[13] J. Aarnisalo et al., Integrated technologies for minerals exploration: pilot project for nickel ore deposits, Transactions of the Institution of Mining and Metallurgy, Section B, Applied Earth Science, Vol. 108, September-December 1999, pp. 151-163.

[14] N. Ampazis, S. J. Perantonis and J. G. Taylor, A dynamical model for the analysis and acceleration of learning in feedforward networks, Neural Networks, Vol. 14, 2002, pp. 1075-1088.

[15] L. Prechelt, PROBEN1-A set of neural network benchmark problems and benchmarking rules, Technical Report 21/94, Universität Karlsruhe, Germany, 1994.

[16] J. Gilbert and J. Nocedal, Global convergence properties of conjugate gradient methods for optimization, SIAM Journal on Optimization, Vol. 2, 1992, pp. 21-42.

[17] N. I. M. Gould and J. Nocedal, On the modified absolute-value factorization norm for trust-region minimization, In High Performance Algorithms for Software in Nonlinear Optimization, Boston, MA: Kluwer, 1998, pp. 225-241.

[18] K. Levenberg, A method for the solution of certain problems in least squares, Quarterly of Applied Mathematics, Vol. 5, 1944, pp. 164-168.

[19] N. Ampazis and S. J. Perantonis, Two highly efficient second order algorithms for feedforward networks, IEEE Transactions on Neural Networks, Vol. 13(5), 2002, pp. 1064-1074.

[20] N. Ampazis, G. Dounias and J. Jantzen, Efficient second order neural network training algorithms for the construction of a Pap-smear classifier, SETN04, Samos, Greece, 2004, accepted for presentation.

[21] D. S. Huang, On the comparisons between RLSA and CLA for solving arbitrary linear simultaneous equations, Lecture Notes in Computer Science, Vol. 2690, Springer-Verlag, 2003, pp. 169-176.

[22] N. E. Mastorakis, A method of approximate multidimensional factorization via the singular value decomposition, Found. of Comput. Decis. Sci., Vol. 21, No. 3, 1996, pp.137-144.

[23] S. J. Perantonis, N. Ampazis, S. Varoufakis and G. Antoniou, Constrained learning in neural networks: Application to stable factorization of 2-D polynomials, Neural Proc. Lett., Vol. 7, 1998, pp. 5-14.

[24] G. Antoniou, S. J. Perantonis, N. Ampazis and S. J. Varoufakis, Stable factorization of 2-D polynomials using neural networks, Proceedings of 13th International Conference on Signal Processing, Santorini, Greece, July 1997, pp. 983-986.

[25] D. S. Huang and Z. Chi, Neural networks with problem decomposition for finding real roots of polynomials, Proceedings of IJCNN2001, Washington DC, Addendum, 2001, pp. 25-30.

[26] D. S. Huang, H. H. S. Ip, Z. Chi and H. S. Wong, Dilation method for finding close roots of polynomials based on constrained learning neural networks, Physics Letters A, Vol. 309, No. 5-6, 2003, pp. 443-451.

[27] D. S. Huang, Finding roots of polynomials based on root moments. Proc. 8th Int. Conf. on Neural Information Processing (ICONIP), Shanghai, China 2001, Vol. 3, 2001, pp. 1565-1571.
