Neural Networks:

Techniques and Applications in Telecommunications Systems

S. Wu
Department of Computer Science
Sheffield University
Regent Court, 211 Portobello Street
Sheffield S1 4DP, United Kingdom

K. Y. Michael Wong
Department of Physics
Hong Kong University of Science and Technology
Clear Water Bay, Hong Kong, China

I. INTRODUCTION

As their name suggests, neural networks represent a class of intelligent techniques which derive their inspiration from neuroscience in biology [1]. Despite the slowness of signal transmission and processing in the human brain when compared with the digital computer, the brain remains superior in many everyday tasks such as retrieving memories, recognizing objects and controlling body motion. This superiority is attributed to the fundamental difference in the way information is processed in the two systems. In conventional approaches, information is processed serially and with mathematical precision, and explicit algorithms for computation have to be pre-specified. In contrast, the human brain consists of a network of interconnected neurons which receive (often imprecise) information fed by other neurons or by external stimuli in a parallel fashion. The neurons then transmit output signals in an apparently stochastic manner, with probabilities determined by the received input. Remarkably, this endows them with the advantages of being robust, fault tolerant, flexible and easy to adapt to environmental changes, when compared with conventional information processors. They are able to learn from experience even when no precise mathematical models are available. Artificial neural networks are systems which try to capture these features so that they can be utilized in intelligent applications.

With their well-known complexities, telecommunications systems are a natural niche for neural network applications. Studies of telecommunications systems are often classified into layers. The bottommost layers deal with physical connections and data links, addressing such issues as how to minimize the bit error rate, or how to cancel interference in the channel. Intermediate layers deal with the network as a whole, addressing such issues as how to allocate resources according to the demands of individual or groups of customers, while generating maximum revenue for the network at the same time. Topmost layers deal with customer applications, addressing such issues as speech and text recognition, data mining and image processing.

In terms of the temporal nature of the studies, some issues arise in the design stage, in which one has to consider a static optimization problem. Other issues are found in real-time control of the network, where the dynamical aspects of the problem have to be considered, and the computation has to be fast.

In terms of the scope of the computation, network design problems and some real-time control problems may involve the entire network. Network management problems, such as fault diagnosis, may involve monitoring networkwide events which are highly correlated among themselves. Other control issues may involve a group of locally optimal controllers, which nevertheless are expected to collectively attain a networkwide optimal state.

Taking these factors into consideration, complexity is the best single word to summarize the situation of modern telecommunications systems. They are characterized by their openness to a growing population of customers with an expanding demand for bandwidth and new applications, a highly competitive market, and ever-evolving technology. Concerning the last point on technological advances, high-speed networks displacing the present-day ones will emerge, heightening demands for faster control of more volatile situations.

Research literature, mainly (but by no means exclusively) found in several workshop proceedings and special issues, confirms the rich variety of areas that researchers have embarked on [2-6]. At the bottommost layers of telecommunications, researchers have considered neural network techniques for channel equalization of digital signals [7-17], switching [18-22], and adaptive flow control [23] for data transmissions. On the network level, there is much interest in using neural networks for dynamic routing of teletraffic [24], rapid configuration of the intercity backbone network (technically, the synchronous digital hierarchy, or SDH) [25-26], and overload control for distributed call processors [27-28].

At the topmost layer of applications, neural network techniques are employed in speech processing [29-32], smoothing of video delay variations [33], data mining [34-35], image coding [36-38], clone detection [39-40], growth prediction [41], and marketing support [42-43].

Network designers and planners have also found support in neural network techniques, using them in network dimensioning [44], network clustering and topology design [45-47], optimal broadcast scheduling and channel assignment in wireless networks [48-51], optimal routing [52], optimal trunk reservation [53], and expert system design tools [54].

In network management, neural networks are used in monitoring [55], path failure prediction [56], fault detection [57-58], error message classification and diagnosis [59-66], alarm correlation [67], and fraud detection [68-71].

With the recent growth in mobile wireless networks and asynchronous transfer mode (ATM) networks carrying multiple classes of traffic, new and more complicated design, control and management issues arise, stimulating an upsurge in the development of neural networks as tools to deal with them. In mobile wireless networks, neural networks are considered in the handover decision process [71], location detection [72], dynamic channel allocation [73-75], and admission control [68]. In ATM networks, neural networks have attracted interest in the areas of link allocation and call admission [76-89], flow and congestion control [90-92], dynamic routing [93], capacity dimensioning [94-95], traffic trend analysis [96], generation, modeling and prediction of ATM traffic [97-100], and multicast routing [101].

As neural networks and other intelligent techniques matured, more recent applications in telecommunications began to appear in systems integrating neural networks with conventional approaches, fuzzy logic and genetic algorithms. Neural networks may be combined with fuzzy logic to ensure media synchronization in a Web environment [102]. In another, two-level approach to the combined call admission control and bandwidth allocation problem in accessing a multiservice network, neural networks are used to decide the balance of access between real-time and non-real-time traffic from each single user, whereas a dynamic programming algorithm is used to decide the share of each user in the common resource [103]. For more sophisticated applications, neural and fuzzy systems may come in modules which cooperate to perform complex tasks. For example, a fuzzy/neural congestion controller for multimedia cellular networks may consist of three modules: (1) a neural predictor of microwave interference, and (2) a fuzzy performance indicator, both feeding their outputs to (3) a neural controller assigning access probabilities to network users [104].

Two features of neural networks will be apparent from the examples discussed in this chapter. First, even when the function used to implement the task can be generated by conventional methods, it may be too complicated to be computed in real time. In these cases, neural networks are ideal substitutes for the conventional methods, since their outputs can be computed in a small number of passes through their structures. It has been proved that arbitrary functions can be approximated by neural networks with suitable structures [105]. Hence neural networks can play the role of approximating, interpolating or extrapolating complex functions. They undergo a preparatory learning stage, during which examples are generated by the conventional method and used to train the neural networks. After learning, the neural networks are expected to imitate the functions in general network situations. In this chapter we will consider the example of overload control of call processors in telecommunications networks. Here, the networkwide optimal function is known, but is too complex to be computed in real time. The neural network controllers are supposed to implement local control in a distributed manner, with only partial information about the network status available to them. In this way, they can be considered as extrapolators of complex functions.

Second, when the task function cannot be obtained by conventional methods, but learning examples of the target function are available, neural networks can play the role of a rule extractor, capitalizing on their capability to learn from experience without knowing explicitly the underlying rules. The learning examples may be generated off-line, so that the learning and operating stages of the neural network can be separated.

Alternatively, the learning examples may be generated directly from the network during its real-time operation. In many cases, such as those involving the prediction of temporal information, the current states of the neural networks affect the outcomes of the examples, which in turn affect the learning process of the neural networks themselves. The entire system can then be considered as a dynamical system. This approach works when the consequences of the control actions can be assessed shortly afterwards, so that training examples can be collected for supervised learning. An example is call admission control in ATM networks based on the steady-state behavior of the calls in the network [76-79].

Even more complicated are the cases in which no error signals are available for the neural networks to formulate their learning processes. Instead, the environment only provides punishing or rewarding feedback to the neural networks, which is less specific and less informative. In these cases the neural networks may achieve their target tasks through a process of reinforcement learning, in much the same way that circus animals are trained through carrot-and-stick processes. In telecommunications applications, the complexity of the system often renders the long-term consequences of the control actions difficult to assess and collect into sets of training examples for supervised learning. For example, the acceptance of a call into the network may prevent the access of a more valuable call in the future. Thus, reinforcement learning approaches have been used in applications of this nature, such as dynamic packet routing in networks with many nodes [106-108], dynamic channel allocation in mobile networks [74-75], adaptive call admission control and routing in ATM networks [81-85], and power management in wireless channels [109]. Reinforcement learning problems are often also formulated as dynamic programming problems, which are outside the scope of this chapter.

In this chapter, we will use as an example a problem of fault diagnosis in telecommunications management, in which learning examples are available off-line, but no conventional mathematical models are available, and strong unknown correlations exist in the various symptoms used as the inputs.

The chapter is organized as follows. In Section II we will introduce the techniques of neural networks to be used in subsequent examples. In Sections III and IV, we will illustrate the applications of these techniques via two examples, respectively being overload control and fault diagnosis. A data model will be introduced in Section V, in an attempt to understand the choice of appropriate network algorithms for general diagnostic tasks. Concluding remarks are given in Section VI.

II. OVERVIEW OF NEURAL NETWORK METHODS (NNM)

There are many different ways to interpret the working principles of NNM, such as from the perspectives of statistical inference [110] or information geometry [111]. Here, we adopt the perspective of function approximation. Simply speaking, NNM can be considered as a way to construct a function by associating it with a network structure. The neural networks in Figs. 1 and 2 are two common examples. The network weights are the adjustable parameters. Given examples of the function to be learned (often referred to as the teacher), a neural network adjusts the weights so that its outputs imitate those of the teacher. The power of neural networks comes from their universal approximation ability and the ease of training them. A two-layer neural network with a sufficient number of units can learn any continuous function well [105]. It is remarkable that this learning can be done without knowledge of the real function, using examples only. These properties make NNM extremely useful in applications where the real functions are unknown.

[pic]

Figure 1. The single layer perceptron.

[pic]

Figure 2. The multilayer perceptron.

A. EXAMPLES OF NEURAL NETWORKS

Below, we will first introduce two standard feedforward networks, namely, the single layer perceptron (SLP) and the multilayer perceptron (MLP), as shown in Figs. 1 and 2, respectively. Various forms of neural networks exist in the literature, either for the purpose of simplifying the training process, or as designs for specific applications. One such variation, called the radial basis function neural network (RBFNN), will be introduced next. Finally, a modern learning method, called the support vector machine (SVM), will be described and may be understood as an advanced version of RBFNN.

Apart from the feedforward networks, another widely used neural network model is the Hopfield network, in which nodes of the same layer interact with each other (instead of acting only in the feedforward direction) [112]. This kind of network is often used to solve difficult optimization problems, utilizing its recurrent dynamics, which evolve to a steady optimal state. It will not be discussed in this chapter.

1. Single Layer Perceptron (SLP)

Figure 1 shows the structure of the SLP, where x = {x_i}, for i = 1, …, d, represents the input; w = {w_i}, for i = 1, …, d, the network weights; and y the network output. w_0 is the bias, and x_0 is permanently set to +1. The function expressed by this network is

y = f\left( \sum_{i=0}^{d} w_i x_i \right) \qquad (1)

where f is a function chosen for the network. Given a set of examples of the teacher, {x^l, O^l} for l = 1, …, N, where O^l is the output of the teacher for the input x^l, the SLP optimizes the weights by minimizing a cost function, which is often chosen to be the mean square error

E = \frac{1}{2N} \sum_{l=1}^{N} \left[ y(\mathbf{x}^l) - O^l \right]^2 \qquad (2)

The simplest method to minimize the function (2) is the standard gradient descent method, which updates the weights as

w_i \leftarrow w_i - \eta \frac{\partial E}{\partial w_i} \qquad (3)

where η is a chosen learning rate.

The most serious limitation of the SLP lies in the range of functions it can represent. For example, if the SLP is used for a classification task, it outputs the class index of an input (+1 or −1), i.e., y(x) = sign(w·x + w_0), where the function sign(·) is the sign of its argument. The separating boundary generated by the SLP is determined by w·x + w_0 = 0, which is a linear hyperplane in the d-dimensional space. To allow for more general mappings (nonlinear boundaries, for example), the SLP should be extended to have more than one layer, which leads us to the MLP.
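Eqs. (1)-(3) translate into a short training loop. The sketch below is a toy illustration rather than an implementation from this chapter: the sigmoid choice of f, the data, the learning rate and the epoch count are all arbitrary assumptions.

```python
import math
import random

def slp_output(w, x):
    # Eq. (1): y = f(sum_i w_i x_i); here f is the logistic sigmoid,
    # and x[0] is permanently +1 so that w[0] acts as the bias
    return 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))

def train_slp(examples, eta=0.5, epochs=500):
    random.seed(0)
    d = len(examples[0][0])
    w = [random.uniform(-0.1, 0.1) for _ in range(d)]
    for _ in range(epochs):
        for x, target in examples:
            y = slp_output(w, x)
            # Gradient of the squared error of Eq. (2) through the sigmoid,
            # applied as the weight update of Eq. (3)
            delta = (y - target) * y * (1.0 - y)
            for i in range(d):
                w[i] -= eta * delta * x[i]
    return w

# Toy task: separate points according to the sign of x1 + x2 - 1
data = [([1.0, 0.0, 0.0], 0), ([1.0, 1.0, 1.0], 1),
        ([1.0, 0.2, 0.3], 0), ([1.0, 0.9, 0.8], 1)]
w = train_slp(data)
predictions = [int(slp_output(w, x) > 0.5) for x, _ in data]
```

Note that the decision boundary is still the linear hyperplane of the SLP, so this sketch succeeds only because the toy data are linearly separable.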

2. Multilayer Perceptron (MLP)

Figure 2 shows a two-layer MLP. The nodes in the second layer are called the hidden units, which receive inputs from the lower layer and feed signals to the upper one. The function expressed by this two-layer network has the form

y = f\left( \sum_{j} W_j \, g\left( \sum_{i} w_{ji} x_i \right) \right) \qquad (4)

where the function g must be nonlinear, since the network otherwise reduces to an SLP. The standard gradient descent method, when adapted to the MLP, is called Back-Propagation (BP) [113]. According to BP, the MLP updates its weights by

W_j \leftarrow W_j - \eta \frac{\partial E}{\partial W_j} \qquad (5)

w_{ji} \leftarrow w_{ji} - \eta \frac{\partial E}{\partial w_{ji}} = w_{ji} - \eta \frac{\partial E}{\partial g_j} \frac{\partial g_j}{\partial w_{ji}} \qquad (6)

where E is a suitably chosen cost function. In (6), the chain rule is used to calculate the derivative of E with respect to w_{ji}, which can be understood as back-propagating an error, ∂E/∂g, from the second layer to the first.
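As a sketch of Eqs. (4)-(6), the stochastic back-propagation loop below trains a two-layer MLP on the XOR problem, which no SLP can represent. The tanh hidden units, identity output function, and all sizes and rates are illustrative assumptions, not choices made in this chapter.

```python
import math
import random

def mlp_output(W, w, x):
    # Eq. (4): y = f( sum_j W_j g( sum_i w_ji x_i ) ), with g = tanh and
    # f = identity; x[0] is permanently +1 to provide hidden-unit biases
    g = [math.tanh(sum(wji * xi for wji, xi in zip(wj, x))) for wj in w]
    return sum(Wj * gj for Wj, gj in zip(W, g)), g

def train_mlp(examples, hidden=6, eta=0.1, epochs=4000):
    random.seed(1)
    d = len(examples[0][0])
    w = [[random.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden)]
    W = [random.uniform(-0.5, 0.5) for _ in range(hidden)]
    for _ in range(epochs):
        for x, target in examples:
            y, g = mlp_output(W, w, x)
            err = y - target                 # dE/dy for the squared error
            for j in range(hidden):
                dW = err * g[j]              # Eq. (5): second-layer gradient
                # Eq. (6): the chain rule back-propagates err through g_j
                dg = err * W[j] * (1.0 - g[j] ** 2)
                for i in range(d):
                    w[j][i] -= eta * dg * x[i]
                W[j] -= eta * dW
    return W, w

# XOR, a nonlinearly separable task; x[0] = +1 is the bias input
xor = [([1, 0, 0], 0), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 0)]
W, w = train_mlp(xor)
mse = sum((mlp_output(W, w, x)[0] - t) ** 2 for x, t in xor) / len(xor)
```

Because the cost surface has local minima, convergence from a given random initialization is not guaranteed in general, which is exactly the complication motivating the RBFNN below.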

3. Radial Basis Function Neural Network (RBFNN)

Though an MLP with a sufficient number of hidden units has guaranteed approximation ability, it is still not preferable in many cases. One complication in training the MLP arises from the fact that the cost function is a nonlinear function of the network weights, and hence has many local minima. In this case, it is difficult to find the optimal solution by any method based on gradient descent. To overcome this shortcoming, a new network model, the RBFNN, was proposed [114].

As shown in Fig. 3, RBFNN can be expressed as a two-layer network. In contrast to MLP, it has no adaptive weights in the first layer. The activations of the hidden units are radial basis functions (RBF) of the inputs, centered at different locations. For example, when the RBF is chosen to be Gaussian,

\phi_j(\mathbf{x}) = \exp\left( -\frac{\|\mathbf{x} - \mathbf{c}_j\|^2}{2\sigma^2} \right) \qquad (7)

where c_j is the RBF center of the jth hidden unit, and σ the RBF width. The final output of the RBFNN is a weighted summation of all the RBFs

y(\mathbf{x}) = \sum_j W_j \phi_j(\mathbf{x}) \qquad (8)

The universal approximation ability of the RBFNN with adequate RBFs is also guaranteed [115]. The adjustable parameters in the case of Gaussian RBFs include the centers and widths, and the network weights in the second layer. The efficiency of the RBFNN comes from splitting the training process into two simple steps. The first step is an unsupervised procedure aiming to find the centers and widths, often done by modeling the density distribution of the input data using methods such as K-means clustering [114] or the EM algorithm [116]. (Here, supervised or unsupervised learning refers to whether the teacher outputs of the examples are used.) The second step is to optimize the network weights by supervised learning analogous to that in the SLP. Note that, in the second step of training, the network output is linear in the weights, so the quadratic cost function has a single minimum which can easily be obtained.
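The two-step training of the RBFNN can be sketched as follows. For simplicity, the unsupervised first step is replaced here by the crude assumption of one center per training input (a stand-in for K-means), after which the second-layer weights of Eq. (8) follow from a single linear solve; the target function and width are arbitrary illustrative choices.

```python
import math

def rbf(x, c, sigma):
    # Eq. (7): Gaussian radial basis function (one-dimensional input)
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def solve(A, b):
    # Gaussian elimination with partial pivoting for the linear second step
    n = len(A)
    M = [row[:] + [bv] for row, bv in zip(A, b)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[p] = M[p], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w

# Step 1 (unsupervised, stand-in for K-means): one center on each input
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
targets = [math.sin(x) for x in xs]   # the teacher function, here sin(x)
sigma = 0.5

# Step 2 (supervised): the output of Eq. (8) is linear in the weights,
# so they are obtained from one linear solve
Phi = [[rbf(x, c, sigma) for c in xs] for x in xs]
W = solve(Phi, targets)

def rbfnn(x):
    return sum(Wj * rbf(x, c, sigma) for Wj, c in zip(W, xs))
```

With as many centers as examples the network interpolates the teacher exactly; in practice K-means would yield far fewer centers, trading exactness for smoothness.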

[pic]

Figure 3. The radial basis function neural network.

4. Support Vector Machine (SVM)

The SVM is currently the most advanced technique for classification and regression [117]. It was originally developed by Vapnik and co-workers from the point of view of Structural Risk Minimization [118], and later connected with regularization networks [119]. As far as we are aware, the application of SVM to telecommunications networks has not yet appeared in the literature. However, its popularity in the near future is anticipated, and it is therefore worthwhile to give an introduction here. We will not go into the details of the SVM implementation, but will rather compare it with the RBFNN to show its power. For more details on SVM, the reader may refer to [120].

The SVM looks for a solution of the form y(x) = w·Φ(x) by minimizing the cost function (the regression problem is used here as an illustration)

E = \sum_{l=1}^{N} \left| y(\mathbf{x}^l) - O^l \right|_{\varepsilon} + \lambda \|\mathbf{w}\|^2 \qquad (9)

where |f|_ε denotes the ε-insensitive function, which takes the value |f| − ε when |f| > ε, and 0 otherwise. The solution of the SVM turns out to have the form

y(\mathbf{x}) = \sum_{l=1}^{N} h_l K(\mathbf{x}, \mathbf{x}^l) \qquad (10)

where K(x, x^l) = Φ(x)·Φ(x^l) is called the kernel function, and h_l = 0 for M + 1 ≤ l ≤ N, after relabeling the examples. Owing to the specific form of the cost function, an attractive property of the SVM is that normally only a small number of the coefficients h_l are nonzero, i.e., M p0–1, or y > –x on using Eqs. (39-40), n_ki >> 1 for all bits. Both the informators and backgrounds contribute to classification, and the Bayesian probabilities can be estimated accurately.
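The sparse expansion of Eq. (10) and the ε-insensitive function of Eq. (9) can be sketched as below. The Gaussian kernel and the coefficient values are placeholders chosen for illustration; no actual SVM optimization is performed here.

```python
import math

def gaussian_kernel(x, xl, sigma=1.0):
    # K(x, x^l) = Phi(x) . Phi(x^l), realized here as a Gaussian kernel
    return math.exp(-((x - xl) ** 2) / (2.0 * sigma ** 2))

def eps_insensitive(f, eps=0.1):
    # The epsilon-insensitive function of Eq. (9):
    # 0 inside the tube, |f| - eps outside
    return max(0.0, abs(f) - eps)

def svm_output(x, inputs, coeffs, sigma=1.0):
    # Eq. (10): only the M inputs with h_l != 0 (the support vectors)
    # contribute to the sum
    return sum(h * gaussian_kernel(x, xl, sigma)
               for xl, h in zip(inputs, coeffs) if h != 0.0)

# Placeholder solution: N = 5 training inputs, of which only M = 2 carry
# nonzero coefficients, illustrating the sparseness of the SVM solution
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
hs = [0.8, 0.0, -0.3, 0.0, 0.0]
y = svm_output(1.0, xs, hs)
```

The sparseness means that, at run time, only the M support vectors need to be stored and evaluated, in contrast to the RBFNN, where every center contributes to the output.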

D. THE OBSERVED TRENDS

Figure 10 shows the rich behavior in the phase diagrams for the Bayesian classifier when the informator strength varies at a given informator frequency γ. We observe the following trends.

1. Classification Eases with Error Rate

For a given error rate exponent x, there is a critical training set size, with exponent y, necessary for perfect generalization; this yields the phase lines. When the error rate p0 (or its exponent x) increases, both the background bits and the informators carry more and more information, since the former become more and more populous, and the latter more and more certain. Hence the critical training set size exponent y decreases. When x is sufficiently large, classification is easy and the critical y reaches 0. A small training set, of size of order N^0, is already sufficient for achieving perfect generalization.

2. Classification Eases with Informator Strength

When the informator strength is below (1 − γ)/(1 + γ) (equal to 1/3 in Fig. 10), the random generalization phase is maximally bounded by the phase line 2x + y + 1 = 0. When the informator strength increases, classification becomes easier and the random generalization phase narrows. For informator strengths above 1 − γ/2 (equal to 3/4 in Fig. 10), the entire space has perfect generalization.

It is interesting to note the presence of a kink for informator strengths lying between (1 − γ)/(1 + γ) and 1 − γ (the example of 0.4 in Fig. 10). It arises from the transition between the white and gray regimes. Indeed, the boundary y = −γx separating the two regimes passes through the kink, causing a discontinuity in the slope of the transition line between perfect and random generalization.

3. Backgrounds May Interfere or Assist Classification

In the gray regime, the probability of occurrence of a symptom at a given background bit in the training set is of the order Pp0. Dominantly, the weights wki feeding out from these background bits have values ln(1/P). All the other background bits, recording no occurrence of symptoms in the training set, remain at the very negative value ln β. Since the number of background bits activated by a test example is of the order p0N, the contribution of the background bits to the activation zk of an output class is of the order Pp0^2 N ln(1/(Pβ)), after subtracting the common background value ln β.

This should be compared with the contribution to the activation due to the informators. In the gray regime, dominantly, all informators record occurrence of symptoms in a fraction p0^γ of the training examples. The weights wki feeding out from them are of the order ln(p0^γ). When a test example is presented, the number of informators activated is of the order p0^γ C. Hence the contribution of informators to the activation zk of an output class is of the order p0^γ C ln(p0^γ/β), after subtracting the common background value ln β.

When the background bits make a smaller contribution than the informators, generalization is performed by the information extracted from the informators, and the background bits only interfere with the task, since they send signals to all output classes. It is not surprising that classification becomes more difficult when the error rate p0 (or its exponent x) increases. However, on further increase in p0 or x, the background bits become numerous enough that the information traces from the background bits accumulate in the contribution to the activation of an output class. The background bits then assist the classification, and the task becomes easier as p0 or x increases.

We therefore conclude that there is a role reversal of the background bits from interference to assistance when p0 or x increases. This happens when the contributions from the background bits and informators to the activation are comparable, namely, Pp0^2 N ~ p0^γ C, where the logarithmic terms can be neglected. In terms of the exponents, role reversal occurs along the line (2 − γ)x + y + 1 − γ = 0. We expect a drop in generalization around this line.

This drop in generalization is most marked when Pp0^2 N ~ p0^γ C ~ N^0. In terms of the exponents, this occurs for certainty informators (strength 1) at the point (x, y) = (−1/2, 0) when the number of informators scales as N^0 (γ = 0). As we will see, this accounts for the performance depression observed in simulations at finite values of N.
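The weights and activations just described behave like a naive-Bayes classifier. The sketch below is a minimal illustration, assuming count-based weight estimates with an arbitrary floor β for symptoms never seen in training; the exact estimator and data model of this chapter may differ.

```python
import math

def bayes_weights(examples, n_bits, n_classes, beta=1e-3):
    # w_ki = ln of the estimated probability that bit i fires in class k,
    # floored at beta for symptoms never seen in the training set
    counts = [[0] * n_bits for _ in range(n_classes)]
    totals = [0] * n_classes
    for bits, k in examples:
        totals[k] += 1
        for i, b in enumerate(bits):
            counts[k][i] += b
    return [[math.log(max(counts[k][i] / max(totals[k], 1), beta))
             for i in range(n_bits)] for k in range(n_classes)]

def classify(w, bits):
    # Activation z_k: sum of weights over the bits set in the test example
    z = [sum(wk[i] for i, b in enumerate(bits) if b) for wk in w]
    return z.index(max(z))

# Toy diagnostic data: bit 0 is an informator for class 0, bit 1 for
# class 1; bits 2-3 are background noise present in both classes
examples = [([1, 0, 1, 0], 0), ([1, 0, 0, 0], 0), ([1, 0, 0, 1], 0),
            ([0, 1, 0, 1], 1), ([0, 1, 0, 0], 1), ([0, 1, 1, 0], 1)]
w = bayes_weights(examples, n_bits=4, n_classes=2)
pred = [classify(w, bits) for bits, _ in examples]
```

The floor β plays the role of the very negative value ln β above: an unseen symptom heavily penalizes a class, which is exactly why scarce training data can make the estimated probabilities, and hence the classification, unreliable.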

[pic]

Figure 10. Phase diagrams for the Bayesian classifier at informator frequency γ = 0.5 for several values of the informator strength. Below and above the lines, the classifier is in the random generalization and perfect generalization (denoted by PG) phases, respectively.

4. Weak Informators May Misinform

This is observed in the Hebb classifier, where generalization may become impossible if the error rates are too low, even for infinitely large training sets. This threshold error rate exists at intermediate informator strengths, namely between (1 − γ)/3 and 1 − γ. This is because the informators help classification when they are strong, and are irrelevant when they are weak. When their strength is intermediate, they are relevant but may misinform the classifier. We may say that strengths between 1 − γ and 1 constitute the true informator phase, strengths between (1 − γ)/3 and 1 − γ the misinformator phase, and strengths between 0 and (1 − γ)/3 the non-informator phase. On the other hand, generalization in the Bayesian classifier is always possible for sufficiently large training sets, irrespective of the informator strength.

E. SIMULATIONS

1. No Informator

To study finite-size corrections to the large N limit, we perform Monte Carlo simulations of the classifiers using an input dimension of N = 100 and K = 4 output classes. For comparison, we first consider the predictions of the large N theory for the Bayesian and Hebb classifiers. With no informators (C = 0, or γ → −∞), the SNR exponent E of both classifiers becomes 2x + y + 1 in the white and gray regimes, and x + 1 in the black regime. In both classifiers, the behavior changes from random to perfect generalization when x or y increases across the line 2x + y + 1 = 0. The only exception is that at x = −1 in the black regime, fg tends to 0.39 instead of 1; this is thus a singular line.

As shown in Fig. 11(a) for the case of no informators (C = 0) using the Hebb classifier, the region of poor generalization decreases with p0, in agreement with the theory. The asymptotic generalization drops as x tends to −1, in agreement with the large N prediction that x = −1 is a singular line.

2. One Informator

For one informator (C = 1, or γ = 0) with certainty (strength 1), generalization of the Bayesian classifier is perfect in the entire space except along the singular line 2x + y + 1 = 0, where it is marginal: fg tends to 0.55 there, but to 1 elsewhere. This dip in generalization is a consequence of the role reversal between the informators and backgrounds explained in Section V.D.3. By comparison, marginal generalization in the Hebb classifier exists only at the singular point (x, y) = (−1/2, 0), where fg tends to 0.55, but to 1 elsewhere.

Fig. 11(b) shows the case with one informator (C = 1) classified by the Hebb classifier. Compared with Fig. 11(a), the region of poor generalization is much reduced. This confirms that the informators provide significant assistance to the classification task. The initial generalization drops near x = −1/2, reflecting the large N prediction that this is a singular point.

Fig. 11(c) shows the same case using the perceptron classifier. The generalization performance is similar to that of the Hebb classifier. A detailed comparison reveals that the asymptotic values improve slightly over the Hebbian results.

Fig. 11(d) shows the case with one informator using the Bayesian classifier. Compared with Figs. 11(b-c), generalization is poorer for intermediate training set sizes in the present case, in agreement with the line of marginal generalization predicted by the role reversal of the background bits in the large N theory.

For the CART-perceptron hybrid classifier, no analytical results have been worked out. Turning to the simulation results in Fig. 12 [136], we see that in the case of no informators in Fig. 12(a), its performance is even worse than that of the perceptron classifier shown in Fig. 11(c). Since there are no informative bits in the model, CART gains no advantage from preprocessing the data.

The curves in Fig. 12(b) are interesting. For few background bits at small p0, the classifier needs only a few training examples for perfect generalization, the most satisfactory among the series of results presented here. This is because CART can extract the informator from the data and produce high-quality rules for subsequent processing by the perceptron layer.

However, when p0 increases further, the results are unsatisfactory, especially when the training examples are few. This is probably because the background bits are too active for CART to accurately extract the relevant informators.

We also perform simulations using uncertain informators with strength less than 1, and confirm that a task lacking informators of high certainty is not a suitable application for the CART-perceptron classifier.

F. IMPLICATIONS FOR THE CHOICE OF CLASSIFIERS

By studying the phase diagram in the space of background error rate and training set size, we have demonstrated, both analytically and by simulation, that the presence of informators makes the classification task easier. When the informators are not too strong, the data are relatively uniform. Hence the Bayesian classifier performs better than other classifiers such as Hebb, as illustrated by the absence of the misinformator phase. However, the Bayesian classifier is not always optimal, as shown both in the theory in the large N limit and in simulations of finite-size networks.

When the training examples are not sufficient, the Bayesian probabilities are not estimated accurately. This is especially true in diagnostic models such as ours, in which the probability of occurrence differs greatly between the informators and backgrounds.

On the other hand, a Hebb or a perceptron classifier is preferable for extracting the informator features from relatively few examples, though they are not necessarily optimal in the asymptotic limit of a large number of examples.

We also confirm that the CART-perceptron hybrid classifier works best when there are strong informative bits and few background bits. This condition is exactly that of the fault diagnosis of the telephone system discussed in Section IV, and hence accounts for the success of the hybrid network approach to the problem. However, the same network may not be suitable in applications where strong informators are absent, or where backgrounds are too active. The complementary dependence on the informator strength implies that a combination of all these classifiers may be useful in a wider range of applications.

[pic][pic]

[pic][pic]

Figure 11. Simulations of the generalization performance with no informators for (a) the Hebb classifier, and with one certain informator for (b) the Hebb classifier, (c) the perceptron classifier, and (d) the Bayesian classifier.

[pic]

Figure 12. Simulations of the generalization performance of the CART-perceptron hybrid classifier for (a) no informators, and (b) one certain informator. The dotted lines indicate the position where the generalization performance is at mid-value between the initial and asymptotic values.

VI. CONCLUSION

We have surveyed the applications of neural networks in telecommunications systems, illustrating their usage with the two examples of overload control and fault diagnosis. While the technique has been extensively adopted at all levels of the system, the considerations involved in these two applications are typical in many respects, and exemplify the care needed to make such applications perform.

In the first example, we use a group of neural networks to implement distributed overload control by learning the control actions of an optimal centralized teacher. This avoids two complications of the centralized control method, namely, the complex optimization which scales with the size of the network, and the overloading of the signaling network caused by the collection of network status information and the dissemination of control decisions. On the other hand, each of the neural network controllers learns to infer its own control action as part of the globally optimal decision, thus achieving cooperative control even without global coordination. This is in contrast to traditional local controllers, which base their decisions on strictly locally optimal objectives.

In implementing this concept, we have exercised care in a number of aspects. First, the examples are generated by a teacher with globally optimal performance; their high quality is ensured by solving a sequence of linear programs for each network condition. Second, we have preprocessed the data to form inputs which are most relevant to the geometry of locating the optimal point in the solution space. Third, we have used the radial basis function architecture, so that each RBF center approximates the teacher control function within its range of loading level, and the combination of centers provides a smooth interpolation between them. Simulations show that the method is successful in both steady overload and traffic upsurges.
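
The smooth interpolation among RBF centers can be sketched in a toy one-dimensional setting. This is a minimal sketch, assuming a Gaussian basis, evenly spaced centers, and a hypothetical one-dimensional "teacher" control curve; none of these are the chapter's actual parameters.

```python
import numpy as np

def rbf_predict(x, centers, widths, weights):
    """Evaluate a radial basis function network at the inputs x.

    Each center contributes a Gaussian bump; the weighted sum of bumps
    interpolates smoothly between the local approximations.
    """
    phi = np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * widths ** 2))
    return phi @ weights

# Samples of a hypothetical teacher control function over loading levels.
x_train = np.linspace(0.0, 1.0, 40)
teacher = np.clip(1.2 - x_train, 0.0, 1.0)

# One center per range of loading level; weights fitted by least squares.
centers = np.linspace(0.0, 1.0, 8)
widths = np.full(8, 0.15)
phi = np.exp(-(x_train[:, None] - centers[None, :]) ** 2 / (2 * widths ** 2))
weights, *_ = np.linalg.lstsq(phi, teacher, rcond=None)

pred = rbf_predict(np.linspace(0.0, 1.0, 11), centers, widths, weights)
```

Each center dominates the prediction near its own loading level, while the overlapping Gaussians blend the neighboring local fits into one smooth global control function.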

We have applied the same method to the dynamic routing of teletraffic in hierarchical networks [24,137]. First, a training set of globally optimal examples is obtained by a sequence of linear programs which progressively break the degeneracy of the optimal solution until no ambiguity is left. Then a group of neural networks, each located at an originating node, learns the teacher control function from the examples, and infers the optimal routing ratios as part of the globally optimal control, based on locally available information. The method yields blocking and crankback rates comparable to those of the centralized controller. Although we subsequently found a heuristic method which outperforms the neural controllers in hierarchical networks [137], the neural controllers may retain their advantages in networks of higher complexity. Further studies are needed.
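
The degeneracy-breaking sequence of linear programs can be illustrated on a toy routing problem; the demand, route costs, and the secondary tie-breaking objective below are hypothetical, with SciPy's linprog standing in for whatever LP solver is actually used.

```python
from scipy.optimize import linprog

# A demand of 10 units may be split over three routes; routes 0 and 1
# have unit cost while route 2 costs double, so every split between
# routes 0 and 1 is cost-optimal and the first LP is degenerate.
demand = 10.0
cost = [1.0, 1.0, 2.0]

lp1 = linprog(cost, A_eq=[[1.0, 1.0, 1.0]], b_eq=[demand],
              bounds=[(0, None)] * 3)

# Second LP of the sequence: restrict to solutions attaining the optimal
# cost found above, and break the tie with a secondary objective (here,
# minimizing the flow on route 1), leaving a unique training example.
lp2 = linprog([0.0, 1.0, 0.0],
              A_eq=[[1.0, 1.0, 1.0], cost],
              b_eq=[demand, lp1.fun],
              bounds=[(0, None)] * 3)
```

Further tie-breaking objectives can be appended in the same way until no ambiguity remains, so that each network condition maps to a single well-defined control action for the teacher.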

In general, many control problems in telecommunications systems share the same features of a complicated centralized control versus a sub-optimal local control. Examples are dynamic packet routing in computer networks, dynamic channel allocation in wireless networks, multiservice call admission control in wireless networks, and dynamic call admission control and routing in ATM networks. Neural network control provides an attractive solution to them.

In the second example, we use a hybrid classifier consisting of a rule-based preprocessing layer and a perceptron output layer. The rule-based layer plays the role of feature extraction and dimensionality reduction, easing the classification task of the perceptron layer. That the hybrid network performs better than each individual component again illustrates the importance of preprocessing the data. We remark that the hybrid classifier works best when the components have complementary features. For example, we found that CART and Bayesian classifiers are incompatible, probably because their common decisive behavior offers no complementary advantages on hybridization.
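
A minimal sketch of this hybrid structure is given below, with a hand-picked rule layer standing in for a tree grown by CART; the data model, the informative bit, and the selected indices are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy diagnostic data: bit 0 is a strong "informator" that determines
# the class; bits 1-9 are uninformative background bits.
n, d = 200, 10
X = rng.integers(0, 2, size=(n, d)).astype(float)
y = np.where(X[:, 0] == 1, 1.0, -1.0)

def rule_features(X):
    """Stand-in for the CART layer: keep only rule-selected bits.

    A real tree would be grown on the data; here the selected indices
    are assumed known, to keep the sketch short.
    """
    return X[:, [0, 1]]

# Perceptron output layer trained on the reduced features (plus bias).
Z = np.hstack([rule_features(X), np.ones((n, 1))])
w = np.zeros(Z.shape[1])
for _ in range(20):                       # classical perceptron rule
    for z, t in zip(Z, y):
        if t * (w @ z) <= 0:
            w += t * z

acc = float(np.mean(np.sign(Z @ w) == y))
```

Because the rule layer discards most of the background bits, the perceptron faces a low-dimensional, linearly separable task and converges quickly.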

We reason that a diagnostic problem differs from a typical pattern recognition problem in having a non-uniform distribution of information in the data. Some input bits are more informative than others, or appear in high correlation with one another, thus reducing the applicability of conventional classification techniques. Based on these characteristics, we further propose the informator model of diagnostic data.

Both analytical and simulation results show that the presence of informators makes the classification task easier, as evident from the shrinking of the random generalization region. However, it is difficult to find a universal technique that gives superior performance on all problems, again illustrating the importance of carefully choosing the right classifier. For example, while the Bayesian classifier works perfectly in the asymptotic regime of many examples, it does not perform well in the regime of few examples, which does not allow the Bayesian probabilities to be estimated precisely. It also deteriorates quickly when there are correlations among the input bits.
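
The difficulty of estimating Bayesian probabilities from few examples can be seen in a toy calculation; the bit probability and sample sizes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# A Bayesian classifier relies on class-conditional bit probabilities.
# Measure how precisely one such probability (p_true, an assumed value)
# is recovered from n training examples, averaged over many experiments.
p_true = 0.7
trials = 2000

def rms_estimation_error(n):
    """RMS error of the empirical frequency estimate of p_true."""
    estimates = rng.binomial(n, p_true, size=trials) / n
    return float(np.sqrt(np.mean((estimates - p_true) ** 2)))

err_few = rms_estimation_error(5)     # regime of few examples
err_many = rms_estimation_error(500)  # asymptotic regime
```

The error shrinks roughly as the inverse square root of the sample size, so probabilities estimated from a handful of examples are an order of magnitude noisier than those in the asymptotic regime.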

When insufficient training is taken into consideration, we find that the CART-perceptron classifier performs especially well when there are strong informative bits and little background noise in the data model, where CART has the advantage of extracting the informative data for subsequent network processing.

For these reasons, we conclude that different problems may require specialized techniques for good performance, and that the use of classification trees as data preprocessing for a perceptron classifier is well suited to diagnostic problems.

ACKNOWLEDGEMENTS

We thank the Research Grants Council of Hong Kong for partial support (grant no. HKUST6157/99P).

REFERENCES

[1] J. Hertz, A. Krogh and R. G. Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley, Redwood City (1991).

[2] J. Alspector, R. Goodman and T. X. Brown, eds., Applications of Neural Networks to Telecommunications, Lawrence Erlbaum, Hillsdale (1993).

[3] J. Alspector, R. Goodman and T. X. Brown, eds., Applications of Neural Networks to Telecommunications 2, Lawrence Erlbaum, Hillsdale (1995).

[4] J. Alspector, R. Goodman and T. X. Brown, eds., Applications of Neural Networks to Telecommunications 3, Lawrence Erlbaum, Hillsdale (1997).

[5] IEEE Journal on Selected Areas in Communications vol. 15, no. 2 (1997).

[6] IEEE Journal on Selected Areas in Communications vol. 18, no. 2 (2000).

[7] S. H. Bang, B. J. Sheu and J. Choi, “Programmable VLSI neural network processors for equalization of digital communication channels”, 1-12 in [2] (1993).

[8] J. Cid-Sueiro and A. R. Figueiras-Vidal, “Improving conventional equalizers with neural networks”, 20-26 in [2] (1993).

[9] T. X. Brown, “Neural networks for adaptive equalization”, 27-33 in [2] (1993).

[10] M. Meyer and G. Pfeiffer, “Multilayer perceptron based equalizers applied to nonlinear channels”, 188-195 in [2] (1993).

[11] M. K. Sönmez and T. Adali, “Channel equalization by distribution learning: the least relative entropy algorithm”, 218-224 in [2] (1993).

[12] M. J. Bradley and P. Mars, “Analysis of recurrent networks as digital communication channel equalizer”, 1-8 in [3] (1995).

[13] D. S. Reay, “Nonlinear channel equalization using associative memory neural networks”, 17-24 in [3] (1995).

[14] A. Jayakumar and J. Alspector, “Experimental analog neural network based decision feedback equalizer for digital mobile radio”, 33-40 in [3] (1995).

[15] Q. Gan, N. Sundararajan, P. Saratchandran and R. Subramanian, “Equalisation of rapidly time-varying channels using an efficient RBF neural network”, 224-231 in [4] (1997).

[16] E. Dubossarsky, T. R. Osborn and S. Reisenfeld, “Equalization and the impulsive MOOSE: Fast adaptive signal recovery in very heavy tailed noise”, 232-240 in [4] (1997).

[17] K. Raivio, J. Henrikson and O. Simula, “Neural receiver structures based on self-organizing maps in nonlinear multipath channels”, 241-247 in [4] (1997).

[18] T. X. Brown, “Neural Networks for Switching”, IEEE Communications Magazine 27, 72-80 (1989).

[19] S. Amin and M. Gell, “Constrained optimization for switching using neural networks”, 106-111 in [2] (1993).

[20] Y. K. Park, V. Cherkassky and G. Lee, “ATM cell scheduling for broadband switching systems by neural network”, 112-118 in [2] (1993).

[21] Y. K. Park and G. Lee, “NN based ATM cell scheduling with queue length-based priority scheme”, 261-269 in [5] (1997).

[22] A. Varma and R. Antonucci, “A neural-network controller for scheduling packet transmissions in a crossbar switch”, 121-128 in [3] (1995).

[23] A. Murgu, “Adaptive flow control in multistage communications networks based on a sliding window learning algorithm”, 112-120 in [3] (1995).

[24] W. K. F. Lor and K. Y. M. Wong, “Decentralized neural dynamic routing in circuit-switched networks”, 137-144 in [3] (1995).

[25] P. Campbell, A. Christiansen, M. Dale, H. L. Ferrá, A. Kowalczyk and J. Szymanski, “Experiments with simple neural networks for real-time control”, 165-178 in [5] (1997).

[26] A. Christiansen, A. Herschtal, M. Herzberg, A. Kowalczyk and J. Szymanski, “Neural networks for resource allocation in telecommunication networks”, 265-273 in [4] (1997).

[27] S. Wu and K. Y. M. Wong, “Overload control for distributed call processors using neural networks”, 149-156 in [4] (1997).

[28] S. Wu and K. Y. M. Wong, “Dynamic overload control for distributed call processors using the neural network method”, IEEE Trans. on Neural Networks 9, 1377-1387 (1998).

[29] B. de Vries, C. W. Che, R. Crane, J. Flanagan, Q. Lin and J. Pearson, “Neural network speech enhancement for noise robust speech recognition”, 9-16 in [3] (1995).

[30] S. Frederickson and L. Tarassenko, “Text-independent speaker recognition using radial basis functions”, 170-177 in [3] (1995).

[31] N. Kasabov, “Hybrid environments for building comprehensive AI and the task of speech recognition”, 178-185 in [3] (1995).

[32] E. Barnard, R. Cole, M. Fanty and P. Vermeulen, “Real-world speech recognition with neural networks”, 186-193 in [3] (1995).

[33] M. C. Yuang, P. L. Tien and S. T. Liang, “Intelligent video smoother for multimedia communications”, 136-146 in [5] (1997).

[34] R. A. Bustos and T. D. Gedeon, “Learning synonyms and related concepts in document collections”, 202-209 in [3] (1995).

[35] T. D. Gedeon, B. J. Briedis, R. A. Bustos, G. Greenleaf and A. Mowbray, “Query word-concept clusters in a legal document collection”, 189-197 in [4] (1997).

[36] H. Liu and D. Y. Y. Yun, “Self-organizing finite state vector quantization for image coding”, 176-182 in [2] (1993).

[37] T. D. Chiueh, T. T. Tang and L. G. Chen, “Vector quantization using tree-structured self-organizing feature maps”, 259-265 in [2] (1993).

[38] F. Mekuria and T. Fjällbrant, “Neural networks for efficient adaptive vector quantization of signals”, 218-225 in [3] (1995).

[39] S. Carter, R. J. Frank and D. S. W. Tansley, “Clone detection in telecommunications software systems: a neural net approach”, 273-280 in [2] (1993).

[40] P. Barson, N. Davey, S. Field, R. Frank and D. S. W. Tansley, “Dynamic competitive learning applied to the clone detection problem”, 234-241 in [3] (1995).

[41] J. T. Connor, “Prediction of access line growth”, 232-238 in [2] (1993).

[42] C. Giraud-Carrier and M. Ward, “Learning customer profiles to generate cash over the Internet”, 165-170 in [4] (1997).

[43] M. C. Mozer, R. Wolniewicz, D. B. Grimes, E. Johnson and H. Kaushansky, “Churn reduction in the wireless industry”, Advances in Neural Information Processing Systems 12, S. A. Solla, T. K. Leen, K.-R. Müller, eds., 935-941, MIT Press, Cambridge (2000).

[44] A. P. Engelbrecht and I. Cloete, “Dimensioning of telephone networks using a neural network as traffic distribution approximator”, 72-79 in [3] (1995).

[45] C. X. Zhang, “Optimal traffic routing using self-organization principle”, 225-231 in [2] (1993).

[46] L. Lewis, U. Datta and S. Sycamore, “Intelligent capacity evaluation/planning with neural network clustering algorithms”, 131-139 in [4] (1997).

[47] D. B. Hoang, “Neural networks for network topological design”, 140-148 in [4] (1997).

[48] G. Wang and N. Ansari, “Optimal broadcast scheduling in packet radio networks using mean field annealing”, 250-260 in [5] (1997).

[49] A. Jagota, “Scheduling problems in radio networks using Hopfield networks”, 67-76 in [2] (1993).

[50] F. Comellas and J. Ozón, “Graph coloring algorithms for assignment problems in radio networks”, 49-56 in [3] (1995).

[51] M. O. Berger, “Fast channel assignment in cellular radio systems”, 57-63 in [3] (1995).

[52] M. W. Dixon, M. I. Bellgard and G. R. Cole, “A neural network algorithm to solve the routing problem in communication networks”, 145-152 in [3] (1995).

[53] R. M. Goodman and B. E. Ambrose, “Learning telephone network trunk reservation congestion control using neural networks”, 258-264 in [3] (1995).

[54] H. I. Fahmy, G. Develekos and C. Douligeris, “Application of neural networks and machine learning in network design”, 226-237 in [5] (1997).

[55] C. S. Hood and C. Ji, “An intelligent monitoring hierarchy for network management”, 250-257 in [3] (1995).

[56] C. Cortes, L. D. Jackel and W. P. Chiang, “Predicting failures of telecommunication paths: limits on learning machine accuracy imposed by data quality”, 324-333 in [3] (1995).

[57] L. Lewis and S. Sycamore, “Learning index rules and adaptation functions for a communications network fault resolution system”, 281-287 in [2] (1993).

[58] M. Collobert and D. Collobert, “A neural system to detect faulty components on complex boards in digital switches”, 334-338 in [3] (1995).

[59] R. Goodman and B. Ambrose, “Applications of learning techniques to network management”, 34-44 in [2] (1993).

[60] A. Chattell and J. B. Brook, “A neural network pre-processor for a fault diagnosis expert system”, 297-305 in [2] (1993).

[61] A. Holst and A. Lansner, “Diagnosis of technical equipment using a Bayesian neural network”, 147-153 in [2] (1993).

[62] A. Holst and A. Lansner, “A higher order Bayesian neural network for classification and diagnosis”, 347-354 in [3] (1995).

[63] H. C. Lau, K. Y. Szeto, K. Y. M. Wong and D. Y. Yeung, “A hybrid expert system for error message classification”, 339-346 in [3] (1995).

[64] T. Sone, “Using distributed neural networks to identify faults in switching systems”, 288-296 in [2] (1993).

[65] T. Sone, “A strong combination of neural networks and deep reasoning in fault diagnosis”, 355-362 in [3] (1995).

[66] P. Leray, P. Gallinari and E. Didelet, “Local diagnosis for real-time network traffic management”, 124-130 in [4] (1997).

[67] H. Wietgrefe, K. D. Tuchs, K. Jobmann, G. Carls, P. Fröhlich, W. Nejdl and S. Steinfeld, “Using neural networks for alarm correlation in cellular phone networks”, 248-255 in [4] (1997).

[68] B. P. Yuhas, “Toll-fraud detection”, 239-244 in [2] (1993).

[69] J. T. Connor, L. B. Brothers and J. Alspector, “Neural network detection of fraudulent calling card patterns”, 363-370 in [3] (1995).

[70] S. D. H. Field and P. W. Hobson, “Techniques for telecommunications fraud management”, 107-115 in [4] (1997).

[71] M. Junius and O. Kennemann, “Intelligent techniques for the GSM handover process”, 41-48 in [3] (1995).

[72] J. Biesterfeld, E. Ennigrou and K. Jobmann, “Neural networks for location prediction in mobile networks”, 207-214 in [4] (1997).

[73] K. Smith and M. Palaniswami, “Static and dynamic channel assignment using neural networks”, 238-249 in [5] (1997).

[74] S. Singh and D. Bertsekas, “Reinforcement learning for dynamic channel allocation in cellular telephone systems”, Advances in Neural Information Processing Systems 9, M. C. Mozer, M. I. Jordan, T. Petsche, eds., 974-980, MIT Press, Cambridge (1997).

[75] E. J. Wilmes and K. T. Erickson, “Reinforcement learning and supervised learning control of dynamic channel allocation for mobile radio systems”, 215-223 in [4] (1997).

[76] A. Hiramatsu, “ATM communications network control by neural networks”, IEEE Trans. on Neural Networks 1, 122-140 (1990).

[77] A. Hiramatsu, “Integration of ATM call admission control and link capacity control by distributed neural networks”, IEEE J. Selected Areas in Commun. 9, 1131-1138 (1991).

[78] S. A. Youssef, I. W. Habib and T. N. Saadawi, “A neurocomputing controller for bandwidth allocation in ATM networks”, 191-199 in [5] (1997).

[79] T. X. Brown, “Adaptive access control applied to Ethernet data”, Advances in Neural Information Processing Systems 9, M. C. Mozer, M. I. Jordan, T. Petsche, eds., 932-938, MIT Press, Cambridge (1997).

[80] C. K. Tham and W. S. Soh, “ATM connection admission control using modular neural networks”, 71-78 in [4] (1997).

[81] M. B. Zaremba, K. Q. Liao, G. Chan and M. Gaudreau, “Link bandwidth allocation in multiservice networks using neural technology”, 64-71 in [3] (1995).

[82] E. Nordström and J. Carlström, “A reinforcement learning scheme for adaptive link allocation in ATM networks”, 88-95 in [3] (1995).

[83] O. Gällmo and L. Asplund, “Reinforcement learning by construction of hypothetical targets”, 300-307 in [3] (1995).

[84] P. Marbach, O. Mihatsch and J. N. Tsitsiklis, “Call admission control and routing in integrated services networks using neuro-dynamic programming”, 197-208 in [6] (2000).

[85] H. Tong and T. X. Brown, “Adaptive call admission control under quality of service constraints: a reinforcement learning solution”, 209-221 in [6] (2000).

[86] J. Carlström and E. Nordström, “Control of self-similar ATM call traffic by reinforcement learning”, 54-62 in [4] (1997).

[87] A. D. Estrella, E. Casilari, A. Jurado and F. Sandoval, “ATM traffic neural control: Multiservice call admission and policing function”, 104-111 in [3] (1995).

[88] A. Faragó, M. Boda, H. Brandt, T. Henk, T. Trón and J. Bíró, “Virtual lookahead – a new approach to train neural nets for solving on-line decision problems”, 265-272 in [3] (1995).

[89] I. Mahadevan and C. S. Raghavendra, “Admission control in ATM networks using fuzzy-ARTMAP”, 79-87 in [4] (1997).

[90] Y. Liu and C. Douligeris, “Rate regulation with feedback controller in ATM networks – a neural network approach”, 200-208 in [5] (1997).

[91] A. Pitsillides, Y. A. Şekercioğlu and G. Ramamurphy, “Effective control of traffic flow in ATM networks using fuzzy explicit rate marking (FERM)”, 209-225 in [5] (1997).

[92] A. Murgu, “Fuzzy mean flow estimation with neural networks for multistage ATM systems”, 27-35 in [4] (1997).

[93] Z. Fan and P. Mars, “Dynamic routing in ATM networks with effective bandwidth estimation by neural networks”, 45-53 in [4] (1997).

[94] A. Faragó, J. Bíró, T. Henk and M. Boda, “Analog neural optimization for ATM resource management”, 156-164 in [5] (1997).

[95] T. X. Brown, “Bandwidth dimensioning for data traffic”, 88-96 in [4] (1997).

[96] T. Edwards, D. S. W. Tansley, R. J. Frank and N. Davey, “Traffic trends analysis using neural networks”, 157-164 in [4] (1997).

[97] E. Casilari, A. Reyes, A. D. Estrella and F. Sandoval, “Generation of ATM video traffic using neural networks”, 19-26 in [4] (1997).

[98] E. Casilari, A. Jurado, G. Pansard, A. D. Estrella and F. Sandoval, “Model generation of aggregate ATM traffic using a neural control with accelerated self-scaling”, 36-44 in [4] (1997).

[99] A. A. Tarraf, I. W. Habib and T. N. Saadawi, “Neural networks for ATM multimedia traffic prediction”, 85-91 in [2] (1993).

[100] J. E. Neves, L. B. de Almeida and M. J. Leitão, “ATM call control by neural networks”, 210-217 in [2] (1993).

[101] E. Gelenbe, A. Ghanwani and V. Srinivasan, “Improved neural heuristics for multicast routing”, 147-155 in [5] (1997).

[102] Z. Ali, A. Gafoor and C. S. G. Lee, “Media synchronization in multimedia web environment using a neuro-fuzzy framework”, 168-183 in [6] (2000).

[103] F. Davoli and P. Maryni, “A two-level stochastic approximation for admission control and bandwidth allocation”, 222-233 in [6] (2000).

[104] C. J. Chang, B. W. Chen, T. Y. Liu and F. C. Ren, “Fuzzy/Neural congestion control for integrated voice and data DS-CDMA/FRMA cellular networks”, 283-293 in [6] (2000).

[105] G. Cybenko, “Approximation by superpositions of a sigmoidal function”, Mathematics of Control, Signals and Systems 2, 303-314 (1989).

[106] J. Boyan and M. L. Littman, “Packet routing in dynamically changing networks: a reinforcement learning approach”, Advances in Neural Information Processing Systems 6, J. Cowan, G. Tesauro, J. Alspector, eds., 671-678, Morgan Kaufmann, San Francisco (1994).

[107] S. Choi and D. Y. Yeung, “Predictive Q-routing: a memory-based reinforcement learning approach to adaptive traffic control”, Advances in Neural Information Processing Systems 8, D. Touretzky, M. C. Mozer, M. E. Hasselmo, eds., 945-951, MIT Press, Cambridge (1996).

[108] L. Hérault, D. Dérou and M. Gordon, “New Q-routing approaches to adaptive traffic control”, 274-281 in [4] (1997).

[109] T. X. Brown, “Low power wireless communication via reinforcement learning”, Advances in Neural Information Processing Systems 12, S. A. Solla, T. K. Leen, K.-R. Müller, eds., 893-899, MIT Press, Cambridge (2000).

[110] C. M. Bishop, Neural Networks for Pattern Recognition, Clarendon Press, Oxford (1995).

[111] S. Amari, Differential-Geometrical Methods in Statistics, Springer-Verlag, New York (1985).

[112] J. J. Hopfield, “Neural networks and physical systems with emergent collective computational abilities”, Proc. Natl. Acad. Sci. U.S.A. 79, 2554-2558 (1982).

[113] D. E. Rumelhart, G. E. Hinton and R. J. Williams, “Learning internal representations by error propagation”, Parallel Distributed Processing: Explorations in the Microstructure of Cognition 1, 318-362, MIT Press, Cambridge (1988).

[114] J. Moody and C. J. Darken, “Fast learning in networks of locally-tuned processing units”, Neural Computation 1, 281-294 (1989).

[115] J. Park and I. W. Sandberg, “Universal approximation using radial basis function networks”, Neural Computation 3, 246-257 (1991).

[116] A. P. Dempster, N. M. Laird and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm”, Journal of the Royal Statistical Society B 39, 1-38 (1977).

[117] B. E. Boser, I. M. Guyon and V. N. Vapnik, “A training algorithm for optimal margin classifiers”, Proc. 5th ACM Workshop on Computational Learning Theory, 144-152 (1992).

[118] V. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, New York (1995).

[119] F. Girosi, M. Jones and T. Poggio, “Regularization theory and neural network architectures”, Neural Computation 10, 1455-1480 (1998).

[120] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel Based Methods, Cambridge Univ. Press, Cambridge (2000).

[121] S. Geman, E. Bienenstock and R. Doursat, “Neural networks and the bias/variance dilemma”, Neural Computation 4, 1-58 (1992).

[122] M. G. Bello, “Enhanced training algorithms, and integrated training/architecture selection for multilayer perceptron networks”, IEEE Trans. Neural Networks 3, 864-875 (1992).

[123] M. C. Mozer and P. Smolensky, “Skeletonization: a technique for trimming the fat from a network via relevance assessment”, Advances in Neural Information Processing Systems 1, 107-115, Morgan Kaufmann, San Mateo (1989).

[124] S. Amari, “Natural gradient works efficiently in learning”, Neural Computation 10, 251-276 (1998).

[125] W. H. Press, S. A. Teukolsky, W. T. Vetterling and B. P. Flannery, Numerical Recipes in C: The Art of Scientific Computing (2nd ed.), Cambridge University Press, Cambridge (1992).

[126] P. Hanselka, J. Oehlerich and G. Wegmann, “Adaptation of the overload regulation method stator to multiprocessor controls and simulation results”, ITC-12, 395-401 (1989).

[127] D. Manfield, B. Denis, K. Basu and G. Rouleau, “Overload control in a hierarchical switching system”, ITC-11, 894-900 (1985).

[128] M. Villen-Altamirano, G. Morales-Andres and L. Bermejo-Saez, “An overload control strategy for distributed control systems”, ITC-11, 835-841 (1985).

[129] J. S. Kaufman and A. Kumar, “Traffic overload control in a fully distributed switching environment”, ITC-12, 386-394 (1989).

[130] M. J. Best and K. Ritter, Linear Programming: Active Set Analysis and Computer Programs, Prentice-Hall, Englewood Cliffs (1985).

[131] K. Stokbro, D. K. Umberger and J. A. Hertz, “Exploiting neurons with localized receptive fields to learn chaos”, Complex Syst. 4, 603-622 (1990).

[132] R. M. Goodman, C. M. Higgins, J. W. Miller and P. Smyth, “Rule-based neural networks for classification and probability estimation”, Neural Computation 4, 781-803 (1992).

[133] L. Breiman, J. H. Friedman, R. A. Olshen and C. J. Stone, Classification and Regression Trees, Wadsworth, Pacific Grove (1984).

[134] J. Sklansky and G. N. Wassel, Pattern Classifiers and Trainable Machines, Springer-Verlag, New York (1981).

[135] K. Y. M. Wong and H. C. Lau, “Neural network classification of non-uniform data”, Progress in Neural Information Processing, S. Amari, L. Xu, L. W. Chan, I. King, K. S. Leung, eds., 242-246, Springer, Singapore (1996).

[136] H. C. Lau, “Neural network classification techniques for diagnostic problems”, MPhil Thesis, HKUST (1995).

[137] W. K. Au, “Conventional and neurocomputational methods in teletraffic routing”, MPhil Thesis, HKUST (1999).
