


A4 Simulator

Junwei Cao, Darren J. Kerbyson, and Graham R. Nudd

Department of Computer Science, University of Warwick, Coventry, UK



{junwei, djke, grn}@dcs.warwick.ac.uk

Abstract

A4 (Agile Architecture + Autonomous Agents) is a philosophy for building large-scale distributed software systems with high dynamics. The A4 model comprises three parts: a hierarchical model, a discovery model, and a coordination model. Service advertisement and discovery is one of the most essential parts of the A4 model for implementing complex software applications with high performance.

A4 Simulator is a modeling and simulation environment that aids the performance optimisation of service advertisement and discovery in practical A4 systems. It supports all of the performance optimisation strategies described in this work and can output simulation results against several high-performance criteria. It also supports service reconfiguration for agent mobility. A4 Simulator is developed in Java and has a friendly graphical user interface. A case study is also presented to show the A4 Simulator in use.

1 Introduction

It is now widely accepted that future distributed software systems (DSS) will be built as collections of components together with the interactions between them. A4 (Agile Architecture + Autonomous Agents) is a concrete extension of this idea. The components are not only computational but also intelligent and autonomous, and so can be abstracted as agents. The interactions between them, which can be described by the software architecture, are not static but dynamic and agile. A4 aims to provide a methodology that meets two main requirements of future DSS.

• Scalability. Large-scale DSS may contain millions of agents. Each agent has its own motivation, function, resources and environment. The agents are not predetermined to work together; they may even be developed by different groups. Organising these agents to cooperate with each other when necessary becomes very difficult, as they may be distributed over a large area and may not be aware of each other.

• Adaptability. The environment of the system can change from time to time, and adaptation to environmental changes should be achieved at two levels. Agents can change their identities, functions, interfaces, performance and so on; this is agent-level adaptability, described as autonomy in A4. The system architecture can also change with the environment dynamics: for example, agents can be added and removed at run time, and the coordination style can vary between situations. This is architecture-level adaptability, described as agility in A4.

Many tools and infrastructures have been implemented to aid multi-agent system development, varying in both model and motivation. Recent examples include HiMAT [8], Jackal [7], OAA [19], SIM_AGENT [26], and JAFMAS [6]. None of them focuses on the combined problems of scalability and agent coordination. Some software engineering research on software architecture [13, 22] and on coordination models and languages [18, 21] has recently been applied to agent coordination problems, but most architecture styles and coordination models cannot be extended to address practical problems like scalability, despite their rigorous theoretical foundations.

Much of the research on distributed software systems is not undertaken in the context of agent research. CORBA [17] is middleware for client/server systems and aims to standardise distributed object management. Jedi [9] is a Java event-based distributed infrastructure that supports asynchronous and decoupled communication among active objects. Applying these technologies to a multi-agent system may mean that the autonomy of agents is not fully considered and the problem of dynamism is not well addressed.

Many other distributed system infrastructures have been developed, or are under development, to overcome similar difficulties in research areas such as mobile computing and home networking. Their models and methodologies can be exploited, even though their target systems may not be purely software systems. For example:

• Bluetooth [1]: The Bluetooth protocols allow for the development of interactive services and applications over interoperable radio modules and data communication protocols.

• HAVi [14]: Home Audio-Video interoperability is a specification for home networks of consumer electronics devices such as CD players, TVs, VCRs, digital cameras, and set top boxes.

• JetSend [15]: This technology is an example of a service protocol that allows devices like printers, digital cameras, and PCs to intelligently negotiate information exchange without user intervention.

• Jini [16]: This is a distributed system based on the idea of federating groups of users and the resources, which can be implemented as either hardware devices or software programs.

• Salutation [25]: The Salutation architecture is created to solve the problems of service discovery and utilisation among a broad set of appliances and equipment and in an environment of widespread connectivity and mobility.

• UPnP [27]: Universal Plug and Play is an open network architecture that is designed to enable simple, ad hoc communication among distributed devices and services from different vendors.

Service is considered to be the most important concept within the above architectures. A service is an entity that can be used by a person, a program, or another service. All of these architectures provide protocols for service advertisement and discovery, such as SDP (Service Discovery Protocol) in Bluetooth, lookup service in Jini, SSDP (Simple Service Discovery Protocol) in UPnP and SLP (Service Location Protocol) in Salutation.

The concept of service is also used in the A4 system. Each agent can be viewed as both a service provider and a service consumer, and a particular service implementation may require other services. Agents can work together by federating other agents that provide the required services. Most of the above research on service discovery focuses on service models and discovery protocols; performance issues are mentioned but not discussed or analysed in detail. Some of the service discovery protocols cannot be applied directly to large-scale DSS. For example, SLP is a non-hierarchical approach to managing local network resources.

In this work, we give a brief introduction to the A4 models in the next section, extending our work presented in [3]. The performance issues arising from the service advertisement and discovery of the A4 model are discussed in Section 3; these form the basis for understanding the design and implementation of the A4 Simulator. The A4 Simulator, a modeling and simulation environment that aids the performance optimisation of service advertisement and discovery in practical A4 systems, is introduced in Section 4 in greater detail. A case study is given in Section 5 to illustrate how the A4 Simulator can aid the design of a DSS. Preliminary conclusions and future work are given in Section 6.

2 A4 Models

An A4 system is a distributed system based on the idea of federating agents, each of which can provide services, into a large-scale, dynamic, multi-agent system. The A4 models, summarised in Figure 1, include the hierarchical model, the discovery model and the coordination model. The hierarchical model describes a simple way to organise a large number of agents. The discovery model, built on top of the hierarchical model, solves the problem of how services in different agents find each other. The coordination model focuses on the ways services can be organised to provide more complex services; it is outside the scope of this work and is not discussed further.


Figure 1. A4 Models

2.1 Hierarchical Model

In this section the hierarchical model of the agent system is introduced as the basis for understanding the service advertisement and discovery mechanism described in the next section.

The hierarchical model is illustrated in Figure 2. The only type of component in the system is the agent; no agent has more special purposes or functions than the others, and every agent can act as a router between a request and a service. However, in Figure 2 different terms are used to distinguish the levels of the agents. The broker is the agent that heads the whole hierarchy and maintains all the service information of the system. A coordinator is an agent that heads a sub-hierarchy. Only a leaf node of the hierarchy is simply called an agent.


Figure 2. Hierarchical Model

In the hierarchical model, when a new agent wants to join the system, it broadcasts to find its nearest existing agent. An agent can register with only one upper agent, but may itself be registered with by many lower agents. All requests that enter a sub-hierarchy must first arrive at the coordinator of that sub-hierarchy and are then dispatched to the lower agents. From the viewpoint of service providers, a sub-hierarchy is the same as an agent.
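As an illustration only (not taken from the A4 implementation; all names are invented), the registration structure of the hierarchical model might be sketched in Java as follows:

import java.util.ArrayList;
import java.util.List;

class Agent {
    final String identity;
    Agent upper;                                   // at most one upper agent (null for the broker)
    final List<Agent> lowers = new ArrayList<>();  // the lower agents registered with this agent

    Agent(String identity) { this.identity = identity; }

    // A new agent joins by registering with its nearest existing agent.
    void registerWith(Agent upperAgent) {
        this.upper = upperAgent;
        upperAgent.lowers.add(this);
    }

    boolean isBroker()      { return upper == null; }      // heads the whole hierarchy
    boolean isCoordinator() { return !lowers.isEmpty(); }  // heads a sub-hierarchy
}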

If an agent has the required service information, it can contact the target agent directly. Otherwise, it must search its local agents or ask its upper agent to discover an agent that can provide the service. The lower or upper agents can in turn ask other agents for assistance until the service information is finally found and returned to the original agent. The agent can then connect directly to the target agent to ask for the service. All connections between agents are broken after the task is finished.

All agents are able to change their services from time to time. The identity of an agent may also change when it moves from one host to another. This dynamism increases the difficulty of service discovery. The most essential problem is how an agent advertises its services and coordinates with the other agents to find the required services in the most efficient way.

2.2 Discovery Model

In this section, we discuss how services are advertised and discovered in an A4 system built on the above hierarchical model. The discovery model includes service advertisement and service discovery.

• Service Advertisement. The service information of an agent can be advertised upwards or downwards along the hierarchy. There are different strategies for deciding when and how to advertise a service, with different performance implications.

• Service Discovery. When an agent requires a service, it first checks its own knowledge for the related service information. If it has the information, it contacts the target agent; otherwise it may contact its upper agent, and so on, until the defined discovery scope is reached.

The main data type for advertisement and discovery is the Agent Capability Table (ACT). Different strategies can use different kinds of ACTs, and the process of service advertisement and discovery is essentially the process of editing and looking up ACTs. The basic structure of an ACT item includes two parts: an agent identity and service information. The service information can differ between systems: simple service information may include the service name, its performance value, its valid time, its scope limitation, and so on, while complex service information may include the service interface information.
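As a minimal illustrative sketch (the field names are assumptions, not the A4 data format), a simple ACT item might look like this in Java:

class ACTItem {
    final String agentIdentity;  // the agent that provides the service
    final String serviceName;
    final int performance;       // advertised performance value
    final long validUntilStep;   // validation limitation (Long.MAX_VALUE = unlimited)
    final int scopeLimit;        // advertisement/discovery scope, in hierarchy levels

    ACTItem(String agentIdentity, String serviceName, int performance,
            long validUntilStep, int scopeLimit) {
        this.agentIdentity = agentIdentity;
        this.serviceName = serviceName;
        this.performance = performance;
        this.validUntilStep = validUntilStep;
        this.scopeLimit = scopeLimit;
    }
}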

Some other work on service discovery focuses more on the protocols; here we concentrate on performance issues. Consider two extreme situations.

• No service advertisement, complex service discovery. In this discovery model, no service advertisement is performed, so no ACTs are maintained in the agents. Each agent has no knowledge of the services of the other agents, and a single service discovery may be complex enough to traverse all of the agents in the system. The service information is pulled from the agents at discovery time, so we call this the pure “data-pull” model.

• Full service advertisement, no service discovery. In this discovery model, the services of one agent are advertised to all the other agents. Each agent has complete knowledge of the services of the other agents, so no service discovery process is needed and the target agent can be found at once by the agent that sends the request. The service information has been pushed to the other agents during the advertising processes, so we call this the pure “data-push” model.

Different systems can use different discovery models to achieve high performance. For example, static systems, where the frequency of service information changes is far lower than the frequency of requests for services, can use the pure data-push model to achieve high performance service discovery. Extremely dynamic systems, where the frequency of service information changes is far higher than the frequency of requests, can use the pure data-pull model. Most practical systems should find a middle point between these two extremes. The performance-related issues are discussed in greater detail in the next section.

3 High Performance Service Discovery

Service discovery in large-scale DSS with high dynamics is difficult because the additional workload may be very high. Different systems may have different kinds of performance requirements, and there are many performance optimisation strategies that can be used to meet them. In this section, several commonly accepted criteria for high performance are introduced first, followed by several performance optimisation strategies in detail.

3.1 Performance Criteria

Different performance criteria can be used to describe the performance of the service discovery model of a system; what counts as high performance depends on the requirements of that system. However, some common characteristics of a system are usually of most concern to its developers.

• Quick discovery

Each request from an agent may pass through one or several agents before finding the target agent that can provide the corresponding service. The efficiency of the discovery process depends mainly on the number of connections between the routing agents, since the amount of data communicated is very small. The fewer the connections, the quicker the discovery process, and the higher the system performance. In a whole system, there may be many requests for different services at the same time. We can use the total number of requests r during a certain period, in proportion to the total number of connections d made for their discovery processes, to describe the average service discovery velocity v.

v = r / d    (1)

• High efficiency

The cost of the system for service discovery also includes the connections made for service advertisement and data maintenance. Service advertisement may add additional workload to the system, so when we consider the efficiency of the system for service discovery, both must be considered. For each request to find its corresponding service, the total number of connections between agents, c, includes not only those made for the discovery processes, d, but also those made for the advertising processes, a.

c = d + a    (2)

We can use the total number of requests r during a certain period, in proportion to the total number of connections c, to describe the average system discovery efficiency e. The higher the system discovery efficiency, the higher the performance.

e = r / c = r / (d + a)    (3)

• Load balancing

In systems where resources are critical, users may want to keep the workload of different resources balanced. In the A4 system, there are no special agents used only for service discovery; all agents have their own design functions, and there is no reason for any agent to carry a higher discovery workload than the others. To describe the workload of each agent, assume there are n agents in the system. The outgoing and incoming connection counts of agent k are o_k and i_k respectively (k = 1, ..., n). Let w_k, the sum of o_k and i_k, describe the workload of each agent.

w_k = o_k + i_k,    k = 1, ..., n    (4)

We can use the mean square deviation of the w_k (k = 1, ..., n) to describe the level of load balancing b of the system.

\bar{w} = (1/n) \sum_{k=1}^{n} w_k    (5)

b = \sqrt{(1/n) \sum_{k=1}^{n} (w_k - \bar{w})^2}    (6)

• High found ratio

Some performance optimisation strategies mean that the discovery model cannot guarantee to find a target service that actually exists in the system, because the discovery process may be given up if its cost is too high. In a general system, however, a reasonable service found ratio should always be achieved. We use f to describe the proportion of the r requests whose services are found, r_f.

f = r_f / r    (7)
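As a minimal sketch of how these criteria follow from raw connection counters (names invented; Equations (1)-(7)):

final class DiscoveryMetrics {
    // v = r / d : average service discovery velocity, Eq. (1)
    static double velocity(long r, long d) { return (double) r / d; }

    // e = r / (d + a) : average system discovery efficiency, Eqs. (2)-(3)
    static double efficiency(long r, long d, long a) { return (double) r / (d + a); }

    // b : mean square deviation of the per-agent workloads w_k = o_k + i_k, Eqs. (4)-(6)
    static double loadBalance(long[] outgoing, long[] incoming) {
        int n = outgoing.length;
        double[] w = new double[n];
        double mean = 0;
        for (int k = 0; k < n; k++) { w[k] = outgoing[k] + incoming[k]; mean += w[k] / n; }
        double sum = 0;
        for (int k = 0; k < n; k++) sum += (w[k] - mean) * (w[k] - mean);
        return Math.sqrt(sum / n);
    }

    // f = r_f / r : service found ratio, Eq. (7)
    static double foundRatio(long rf, long r) { return (double) rf / r; }
}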

Most of the time, these criteria for high performance service discovery conflict. For example, quick discovery does not imply high system efficiency: quick discovery may be achieved at the price of a high service advertisement and data maintenance workload, which leads to low system efficiency. The proper method is to identify the critical factors of the practical system, then apply different performance optimisation strategies and tune them to reach the best operating point.

3.2 Performance Optimisation Strategies

There are several kinds of performance optimisation strategies. Most of them are used in current practical systems under the simple assumption that they improve performance; their effects are rarely discussed in detail, especially as the dynamics of the system increase. Combinations of these strategies may also lead to different performance. These strategies are introduced below.

• Using cache

Caching previous service discovery results is a good performance optimisation strategy under the assumption that the same request may be issued more than once. Cached service information is stored in a C_ACT. When an agent sends a request for service discovery, the result can be stored in the C_ACT; the next time the request is sent, the agent can look up the information in the C_ACT first. If a satisfying entry exists, the agent contacts the target agent to confirm it. If the service has changed and is no longer available, the agent updates the C_ACT and performs the other service discovery functions.

Many current network applications use caches to optimise performance. Using cached service information may allow direct service discovery in one step, and caching adds no additional data maintenance workload. However, if the service information changes very frequently compared to the request frequency, and the cached information cannot be updated in time, using a cache may actually decrease the service discovery speed. The efficiency of using a cache therefore depends on the real situation of the system, as the sketch below suggests.
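A sketch of the C_ACT behaviour just described, reusing the hypothetical ACTItem above (the verification step is a placeholder):

import java.util.HashMap;
import java.util.Map;

class CacheStrategy {
    private final Map<String, ACTItem> cAct = new HashMap<>();  // keyed by service name

    // Returns a one-step discovery result, or null to fall back to normal discovery.
    ACTItem lookup(String serviceName) {
        ACTItem cached = cAct.get(serviceName);
        if (cached == null) return null;           // cache miss
        if (stillProvides(cached)) return cached;  // confirmed: discovery in one step
        cAct.remove(serviceName);                  // stale entry: update the C_ACT
        return null;
    }

    void store(String serviceName, ACTItem result) { cAct.put(serviceName, result); }

    private boolean stillProvides(ACTItem item) {
        return true;  // placeholder: contact the target agent to confirm the service
    }
}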

• Using local and global knowledge

Adding local or global knowledge to an agent is another kind of performance optimisation, under the assumption that local services are more often required by local agents. Local requests then need fewer connections to find a local service, because higher-level agents need not take part in the routing process. The system load is also reduced.

In order to coordinate the agents to find the services, two kinds of ACTs can be used in each agent to record the details of the services and their information, which are local ACT (L_ACT) and global ACT (G_ACT).

• L_ACT. Each agent has one L_ACT to record the service information of the agents registered with it. If a request arrives that is within the capabilities of the local agents, the agent can dispatch the request directly to the target agent.

• G_ACT. An agent may hold a copy of the L_ACT of its upper agent, called a G_ACT. This gives the agent information about many more services, which it can contact directly without submitting the request to the upper agent.

Unlike the C_ACT, additional data maintenance workload is needed for the L_ACT and G_ACT. There are basically three ways to maintain the contents in the L_ACT and G_ACT.

• Pulling. The agent itself can actively pull the corresponding data. For the L_ACT, the agent can periodically ask its lower agents for their L_ACTs; for the G_ACT, it can periodically ask its upper agent for its L_ACT.

• Pushing. The maintenance of the L_ACT and G_ACT can also be driven by service changes. If the contents of an agent's L_ACT change, it may report the change to the L_ACT of its upper agent, and may also push the change to the G_ACTs of its lower agents.

• Caching. The L_ACT and G_ACT can also be updated in the same way as the C_ACT.

The process of service discovery using the L_ACT and G_ACT also differs from that using the C_ACT. When an agent receives a request, it looks up its L_ACT. If it finds that one of its lower agents can provide the service, it dispatches the request to that agent. Otherwise, it looks up its G_ACT; if the G_ACT shows that another agent can provide the service, it dispatches the request to that agent. If no such agent can be found, or there is no information at all in the G_ACT, the agent asks its upper agent for help. After the upper agent returns the result, the agent can update its G_ACT and return the result to the agent that originated the request. The sketch below illustrates this lookup order.
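A minimal sketch of this lookup order, assuming a tree of agents with the three kinds of ACTs (all names invented; a real implementation also needs the deadlock avoidance discussed in Section 4.2):

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class DiscoveryAgent {
    final String identity;
    DiscoveryAgent upper;                                     // null for the broker
    final Set<String> ownServices = new HashSet<>();
    final Map<String, String> cAct = new HashMap<>();         // service name -> provider identity
    final Map<String, DiscoveryAgent> lAct = new HashMap<>(); // service name -> lower agent
    final Map<String, String> gAct = new HashMap<>();         // service name -> provider identity

    DiscoveryAgent(String identity) { this.identity = identity; }

    // Returns the provider's identity, or null when the discovery scope is exhausted.
    String discover(String service, int scopeLeft) {
        if (ownServices.contains(service)) return identity;            // this agent provides it
        String cached = cAct.get(service);
        if (cached != null) return cached;                             // C_ACT hit
        DiscoveryAgent lower = lAct.get(service);
        if (lower != null) return lower.discover(service, scopeLeft);  // dispatch downward
        String another = gAct.get(service);
        if (another != null) return another;                           // G_ACT: contact directly
        if (upper != null && scopeLeft > 0)
            return upper.discover(service, scopeLeft - 1);             // ask the upper agent
        return null;                                                   // broker or scope limit reached
    }
}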

The behaviour of the whole system for service discovery may become very complex because of the dynamism of the system. We therefore provide a formal approach that describes the service discovery process more clearly using rule-based reasoning; a simple case study is given below.

The formal representation of the problem is summarised in Table 1, which includes the definitions of agents, evaluations, and processes. This is the basis for the rule-based reasoning of system dynamic discovery processes.

| Agents      | Ai (i = 1, ..., n), one of the agents                             |
|             | s, a given service request                                        |
| Evaluations | l(s), evaluation result of s in the L_ACT                         |
|             | g(s), evaluation result of s in the G_ACT                         |
|             | l(s), g(s) ∈ {Ai (i = 1, ..., n), null}                           |
|             | null means no service information is available for the request s |
| Processes   | Ai(s), Ai processes the request s                                 |

Table 1. Formal Representation

We represent the process by which an agent requires a service in a logical way. The rules show the routes a request takes from the original agent to the target agent, even though the required service may change dynamically. Several basic rules are used, formalising the service discovery process described above.

• Rule 1: Ai(s) → (l(s), g(s))Ai

• Rule 2: (Athis, *)this → ServiceFound

• Rule 3: (Alower, *)this → Alower(s)

• Rule 4: (null, Aanother)this → Aanother(s) / Aupper(s)

• Rule 5: (null, null)this → Aupper(s)

• Rule 6: (null)broker → NoService

These rules can be combined to reason about the route of a service discovery process. A simple example illustrates the formal approach. The example shown in Figure 3 is a simple system with three levels. Consider a typical process: A1 sends a request, s, for a service that can be provided by A3. However, A3 has just moved from coordinator C2 to C3, changing its identity to A4, and the G_ACTs of these agents have not yet been updated.


Figure 3. Example System

The derivation is shown below, writing B for the broker at the top of the hierarchy. At each step, the evaluation results of all the ACTs for the request s automatically replace the corresponding parts, (l(s), g(s))Ai, in the process. The number at the end of each line indicates the rule used for the transformation.

A1(s) → (null, null)A1        [1]

(null, null)A1 → C1(s)        [5]

C1(s) → (null, C2)C1          [1]

(null, C2)C1 → C2(s)          [4]

C2(s) → (null, null)C2        [1]

(null, null)C2 → B(s)         [5]

B(s) → (C3, *)B               [1]

(C3, *)B → C3(s)              [3]

C3(s) → (A4, *)C3             [1]

(A4, *)C3 → A4(s)             [3]

A4(s) → (A4, *)A4             [1]

(A4, *)A4 → ServiceFound      [2]

Five connections are needed for A1 to find the required service in A4. In the G_ACT of C1 the service is still recorded as being within the capability of C2, so C2 still has to take part in the routing process. The routing process could be simplified if C2 cached this routing result or if the G_ACT of C1 were updated some time later.

The system can have more than three levels, and services may change many times, so the system behaviour for service discovery can become much more complex. The formal approach is then even more helpful for reasoning about and understanding the system behaviour. Modeling and simulation tools can be developed to estimate the system performance, as introduced in the next section.

• Service validation limitation

Another performance optimisation strategy is to add a service validation limitation to the attributes of each piece of service information. The valid time of a service is pre-estimated before the service is advertised. When the service information is stored in the ACTs of other agents, those agents can check their records periodically and delete out-of-date service information in time. This avoids unnecessary routing processes and makes the discovery process quicker, with no additional data maintenance workload. However, the valid time of some services in the system may be unpredictable. A sketch of this purge is given below.
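A sketch of the purge, reusing the hypothetical ACTItem fields above:

import java.util.Iterator;
import java.util.List;

class ValidationLimiter {
    // Each step, scan an ACT and drop entries whose pre-estimated valid time has passed.
    static void purgeExpired(List<ACTItem> act, long currentStep) {
        for (Iterator<ACTItem> it = act.iterator(); it.hasNext(); ) {
            if (it.next().validUntilStep < currentStep) {
                it.remove();  // out-of-date service information is deleted in time
            }
        }
    }
}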

• Scope limitation

The scope within which a service can be advertised and discovered can also be pre-defined by adding a service scope limitation to the attributes of each piece of service information. A service is then only advertised within a certain scope of the system, which reduces the advertisement and data maintenance workload, and only searched for within a certain scope, which avoids unnecessary discovery processes. However, prior knowledge of the service and its requests is needed to achieve this optimisation, and a mismatch between the scope limitations of the service and the request may result in a low found ratio. A sketch of scope-limited advertisement follows.
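As an illustration, scope-limited advertisement might push a service at most a given number of levels up the hierarchy, reusing the hypothetical DiscoveryAgent above (the routing interpretation is an assumption):

class ScopeLimitedAdvertiser {
    // Record, at each level going up, which child branch leads to the service.
    static void advertiseUp(DiscoveryAgent origin, String service, int scope) {
        DiscoveryAgent child = origin;
        for (DiscoveryAgent a = origin.upper; a != null && scope > 0; a = a.upper, scope--) {
            a.lAct.put(service, child);  // the route downward from a goes via this child
            child = a;
        }
    }
}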

4 A4 Simulator

The system behaviour becomes complex when different performance optimisation strategies are used in a practical system, so it is helpful to model and simulate the performance of the service advertisement and discovery of the system in advance. The A4 Simulator has been developed in Java for this purpose. In this section, we introduce the A4 Simulator, covering its modeling part and its simulation part; the next section shows a concrete example of the tool in use.

4.1 Modeling

The A4 Simulator can be used to model a practical system. It abstracts an agent as a request sender and a service provider with a unique identity. An A4 model can contain many agents, each with its own services, requests, and performance optimisation strategies, organised into a hierarchy. A model can be saved and reloaded for later reuse. All of the modeling functions can be performed via the graphical user interface (GUI) of the tool, illustrated in Figure 4. The A4 Simulator supports both agent-level modeling and model-level modeling.

Agent-level modeling defines the attributes of each agent, which include the basic information, requests, services, and performance optimisation strategies shown in Figure 4(b). Users can add multiple agents with the same attributes to the model at one time, which eases the modeling process.

• The basic information of an agent includes its name, its description and the name of its upper agent. The name of an agent is its identity in the system and must be unique. Users can use the description item to record details of the agent's functions. Each agent chooses one upper agent to form a hierarchy; no loop is permitted in the system, as a loop could cause deadlock during the service discovery process.

• One agent can contain many requests for different services with different frequencies. A request item includes the service name, the required performance value, the sending frequency, and the service discovery scope limitation. The service name is essential to each request. The default required performance value is 0; the default sending frequency is once per simulation step; and the default scope limitation is “Unlimited”, which means the discovery functions can be performed as far as possible without limitation.

• One agent can also be modeled to provide multiple services. Each piece of service information includes the service name, its performance value, the performance changing frequency, the service validation limitation, and the service advertisement scope limitation. The service name need not be unique in the system: different agents can provide the same service with different performance. The default performance value of a service is the pre-defined maximum value of the service performance. The default performance changing frequency is “Static”, which means the performance value never changes during the simulation steps; the A4 Simulator currently only supports random changes of the service performance value. The default validation limitation is “Unlimited”, which means the service is always valid during the simulation. The default scope limitation is “Unlimited”, which means the service can be advertised to the top of the hierarchy.

• The A4 Simulator supports modeling of all the performance optimisation strategies described in the previous section. It provides different configurations, including whether an agent uses the C_ACT, L_ACT and G_ACT to optimise its performance, how the data in these ACTs are maintained (by pulling or pushing), and how often the data are updated. The default data updating frequency is “Never”, which means the data in the L_ACT or G_ACT are never updated by pulling.

The A4 model is composed of agents, so agent-level modeling can define all the model attributes used during simulation. However, model-level modeling is also provided to ease the modeling process. It covers five categories of information: basic information, requests, services, agent mobility, and performance optimisation strategies, as shown in Figure 4(c).


(a) Main Modeling Pad


(b) Agent Editing Window (c) Model Editing Window

Figure 4. A4 Simulator GUI

• The basic information of a model includes its name and a brief description. This information is only for the user's convenience and is not used during simulation.

• Users can also define requests at the model level. A model-level request item includes the same attributes as those defined in an individual agent, plus an additional attribute: the distribution of the request. This eases the definition of the same request in many agents. The user can choose the distribution mode; the A4 Simulator currently only supports random distribution with different percentages.

• Users can also define services at the model level, which is similar to the model-level request definition.

• A unique feature of the A4 Simulator is its support for agent mobility. With the development of technologies like mobile agents and mobile computing, the mobility of software will increase, which greatly increases the dynamics of the system. When an agent moves, its identity and its upper agent change, and the system must be reorganised into a hierarchy. An identity change means all current service information about this agent in the system must be updated, otherwise wrong routing processes may occur. Users can define agent mobility at the model level, specifying the original agent identity, the new agent identity, the new upper agent identity, and the step number at which the move occurs during simulation. The new agent and upper identities can be omitted, which means the original agent is simply removed from the system.

• Model-level configuration of the performance optimisation strategies is also supported. If the user wants all agents to have the same performance optimisation strategies, there is no need to configure them agent by agent: a model-level definition covers all agents with the same strategy configuration.

The user can add, edit and delete agents from the model pad via the main GUI shown in Figure 4(a). In the left column of the main GUI, all of the agents are listed, and a brief description of the selected agent is shown below the agent list. The text field above the agent list can be used to search for an agent by name. The model can be saved and reloaded for later reuse. The simulation of service discovery is processed step by step using the model data.

4.2 Simulation

When a model starts simulating the service discovery of the system, a thread is created to calculate the related statistical data step by step. A step can be defined as an arbitrary number of seconds. The simulation process is divided into three main phases, illustrated in Figure 5 and introduced in detail below.
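The structure of this thread might be sketched as follows (assumed from the description in this section, not taken from the A4 Simulator source; names are invented):

class SimulationThread extends Thread {
    private final int totalSteps;  // given by the user before the simulation starts

    SimulationThread(int totalSteps) { this.totalSteps = totalSteps; }

    @Override public void run() {
        for (int step = 1; step <= totalSteps; step++) {
            advertiseAndMaintain(step);    // validation, mobility, re-advertisement, pulling
            sendRequestsAndDiscover(step); // the key phase: requests routed through the ACTs
            updateViews(step);             // refresh the each-step/accumulative/agent/log views
            try { Thread.sleep(50); }      // sleep briefly so the GUI can update
            catch (InterruptedException e) { return; }
        }
    }

    private void advertiseAndMaintain(int step) { /* ... */ }
    private void sendRequestsAndDiscover(int step) { /* ... */ }
    private void updateViews(int step) { /* ... */ }
}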

• Pre-simulation phase

During this phase, the simulation-related parameters are first initialised. The user is not permitted to modify the model any more, so once the simulator begins simulating, all menus used for modeling are disabled and those for simulation are enabled. The simulator makes a copy of the current model data in a temporary file, because the model may change during the simulation. The model-level attributes are set on each agent automatically by the simulator; for example, the requests and services are distributed to the agents in the model. The simulation results are also initialised. Finally, a pre-simulation service advertisement advertises the current services so that the simulation begins from a normal system in a stable state.

• Simulation phase.

When the simulator is ready, it enters the real simulation phase, which implements the simulation of the service advertisement and discovery of the model. The simulation proceeds step by step; the total number of steps is given by the user before the simulation starts. Each step consists of three main parts.

During each step, the simulator first performs functions related to service advertisement and data maintenance. Invalid services are removed from the model ACTs if the service validation limitation is applied. When agent mobility occurs, the hierarchy is changed and the related service information is updated; for example, the services of the moving agent advertised along the original hierarchy are removed and re-advertised along the new hierarchy. The performance of some services in the model may change frequently, which may require additional advertisement. Each agent may also update its L_ACT and G_ACT by pulling, as described earlier.

The phase for request sending and service discovery is the key part of the whole simulation process. Whether a request is sent is decided according to its frequency. When an agent receives a request, it checks its own service capability, the C_ACT, the L_ACT, and the G_ACT sequentially, where the corresponding performance optimisation strategy is applied. If the agent does not maintain an L_ACT, it may dispatch the request to one of its lower agents to achieve a local traverse. If the service information cannot be found after all these searches, the agent submits the request to its upper agent. The discovery stops in three situations: the service is found in an agent; the top of the hierarchy is reached and the service still cannot be found; or the scope limitation of the request is reached and the service still cannot be found. Whenever a connection between agents occurs, the simulator increments the related simulation data. Mechanisms to avoid deadlock are also applied in this phase: deadlock in service discovery means a request is dispatched to the same agents repeatedly and never reaches the target agent. A sketch of one such mechanism follows.
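One plausible sketch of such a mechanism (the actual A4 Simulator mechanism is not detailed here) is to track the agents a request has already visited so it is never dispatched to the same agent twice:

import java.util.HashSet;
import java.util.Set;

class RoutedRequest {
    final String serviceName;
    private final Set<String> visited = new HashSet<>();  // identities already routed through

    RoutedRequest(String serviceName) { this.serviceName = serviceName; }

    // Returns false if dispatching to agentId would revisit an agent, i.e. a potential deadlock.
    boolean tryVisit(String agentId) {
        return visited.add(agentId);  // Set.add() is false when the id was already present
    }
}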


Figure 5. Simulation Flowchart

At each step the simulator updates the results shown in the GUI. It provides multiple views of the simulation data: an each-step view, an accumulative view, an agent view, and a log view, shown in Figure 6. The data in all of these views are updated in real time. The each-step view in Figure 6(a) shows the simulation data r, a, d, r_f and the statistics v, e, b, f for each step. The accumulative view in Figure 6(b) shows the statistics accumulated over steps. In the agent view in Figure 6(c), the user can inspect the contents of all the ACTs of a given agent, together with the each-step, accumulative and average values of its o_k, i_k, and w_k. The log view in Figure 6(d) presents the log file of the simulation, which records all the details of the simulation and can be checked when necessary. At each step the simulation thread sleeps for a while to release system resources for result updating.


(a) Each Step View (b) Accumulative View


(c) Agent View


(d) Log View

Figure 6. Simulation Results

• Post-simulation phase

After the simulation is finished, several things must be done to switch the simulator from simulation mode back to modeling mode. The original saved model is reloaded and the menus are restored. The related simulator parameters are reset, and the GUIs showing the simulation results remain available to the user until a new simulation begins.

The A4 Simulator also supports simulating multiple models at the same time. For example, Figures 6(a) and (b) show the simulation results of two models together in different colours. Users can thus simulate models with different configurations and compare their results conveniently. In the next section, an example illustrates the use of the simulator.

5 A Case Study

In this section, we use the A4 methodology to model a simplified Computational Grid environment. We simulate the execution of the model to illustrate how the A4 Simulator helps the user choose the most suitable performance optimisation strategies for the service advertisement and discovery of the system.

5.1 Background

A Computational Grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capability [12]. It may contain many supercomputers connected by high-speed networks, possibly with nation-wide distribution. The environment also contains end systems with poor resources and computational capabilities; however, these can make use of the computational resources of the supercomputers to achieve high performance, as shown in Figure 7.


Figure 7. The Computational Grid

The software infrastructure of the Computational Grid must implement many functions, such as performance evaluation, distributed scheduling, information services, security, and accounting. Here we consider only a simplified problem. If we take each supercomputer to be a high performance computing service provider, and users can request these services to execute high performance applications, how can each request find a suitable service most efficiently? Note that the system is large-scale and the performance of the service providers varies over time. A4 is designed to address exactly this kind of problem.

There are also many grid-oriented software systems and tools whose resource management subsystems provide partial solutions to the problem mentioned above; however, most of them cannot meet the requirements of scalability and adaptability simultaneously.

• Condor [23]. The matchmaking framework of Condor uses a matchmaker/entity structure (an entity can be both provider and requestor), in which the matchmaker becomes the bottleneck of the system, making scalability difficult to achieve; it also focuses more on protocol issues than on performance issues.

• DPSS [2]. DPSS uses a broker/agent structure to manage the distributed system. Each agent maintains a continually updated view of the global state of the system, which cannot scale.

• Globus [10]. Globus uses a Metacomputing Directory Service (MDS) [11], which adopts the data representations and API defined by the LDAP directory service [29] and is designed to meet the requirements of both scalability and dynamic data. Performance issues are mentioned in its implementation, but not discussed further.

• Hector [24]. Hector uses master/slave allocators to manage dynamic resources; the issue of scalability is not addressed.

• Legion [5]. In the Resource Management Infrastructure (RMI) of Legion, the functions and structure of the collections are almost the same as those of the information service in Globus.

• NetSolve [4]. NetSolve uses an agent as a database and a resource broker to implement resource management. The issue of coordinating different agents is not discussed in detail.

• Ninf [20]. The Ninf metaserver monitors multiple Ninf computing servers on the network, and performs scheduling and load balancing of client requests, similarly to the agent in NetSolve.

• NWS [28]. In the Network Weather Service (NWS), sensor processes are used to gather performance measurements from a specified resource. However, a central name server limits its scalability.

In this section, we give a simple A4 model in the context of resource management and allocation for high performance computing. Two features differentiate the A4 solution from the other models. First, in the A4 model there are no special software entities that function only as resource managers: all components in the A4 model, abstracted as agents, are providers and requestors as well as go-betweens for agent coordination. Second, the A4 model focuses more on performance issues than on protocols or data representation.

5.2 Example Model

| Agents                 | Upper Agent |
|------------------------|-------------|
| gem                    | -           |
| sprite~0 ... sprite~49 | gem         |
| tup~0 ... tup~49       | sprite~9    |
| cola~0 ... cola~49     | sprite~19   |
| tango~0 ... tango~49   | sprite~29   |
| pepsi~0 ... pepsi~49   | sprite~39   |

(a) Hierarchy

| Name | Performance | Frequency | Validation | Scope | Distribution |
|------|-------------|-----------|------------|-------|--------------|
| HPC  | 1000        | 5         | Unlimited  | Top   | 10%          |
| HPC  | 600         | 10        | Unlimited  | Top   | 20%          |
| HPC  | 200         | 20        | Unlimited  | Top   | 30%          |

(b) Services

| Name | Performance | Frequency | Scope | Distribution |
|------|-------------|-----------|-------|--------------|
| HPC  | 100         | 5         | Top   | 80%          |
| HPC  | 300         | 10        | Top   | 60%          |
| HPC  | 500         | 20        | Top   | 40%          |
| HPC  | 800         | 40        | Top   | 20%          |
| HPC  | 1000        | 60        | Top   | 10%          |

(c) Requests

Table 2. Example Model

As shown in Table 2, we use the A4 Simulator to design an example system model composed of about 250 agents, organised into a three-layer hierarchy. The identity of the root agent is gem. There are 50 agents registered with gem, four of which each have 50 lower agents of their own. The hierarchy is shown in Table 2(a).

To simplify the modeling process, we define the services and requests at the model level. All services and requests share the same name but have different performance values. A service frequency of 5, for example, means the service performance changes once every 5 steps during simulation, taking a random value between 0 and the listed performance value. A request frequency of 5 means the request is sent once every 5 steps. The validation and scope limitation strategies are not used in this model. The distribution value defines how many agents are configured with the corresponding service or request; the simulator chooses agents randomly before the simulation begins. The sketch below illustrates the frequency semantics.
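As a small illustration of these semantics (names invented), "once every n steps" can be checked with a modulus on the step counter:

import java.util.Random;

class FrequencyDemo {
    // A frequency value of n means the event fires once every n simulation steps.
    static boolean fires(int step, int frequency) { return step % frequency == 0; }

    public static void main(String[] args) {
        Random rnd = new Random();
        int perfMax = 1000, changeFreq = 5, requestFreq = 10;
        for (int step = 1; step <= 20; step++) {
            if (fires(step, changeFreq))   // performance takes a random value in [0, perfMax]
                System.out.println("step " + step + ": HPC performance = " + rnd.nextInt(perfMax + 1));
            if (fires(step, requestFreq))  // the request is sent this step
                System.out.println("step " + step + ": request for HPC sent");
        }
    }
}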

5.3 Simulation Results

Once the model is built, we can use the A4 Simulator to obtain simulation results. As mentioned above, the simulator helps users choose suitable performance optimisation strategies. We first ran experiments with different selections of the strategies; the results are shown in Table 3.

| Experiment No.      | 1     | 2     | 3     | 4     | 5     | 6      |
|---------------------|-------|-------|-------|-------|-------|--------|
| Using C_ACT         | -     | √     | √     | √     | √     | √      |
| Using L_ACT         | -     | -     | √     | √     | √     | √      |
| Using G_ACT         | -     | -     | -     | √     | √     | √      |
| Updating L_ACT**    | -     | -     | -     | -     | √     | √      |
| Updating G_ACT**    | -     | -     | -     | √     | √     | √      |
| Advertising L_ACT   | -     | -     | √     | √     | √     | √      |
| Multicasting L_ACT  | -     | -     | -     | -     | -     | √      |
| r*                  | 12296 | 12355 | 12576 | 12560 | 12645 | 11715  |
| a*                  | 0     | 0     | 5604  | 8051  | 10172 | 285148 |
| d*                  | 65595 | 51113 | 7435  | 6901  | 6910  | 7056   |
| v = r/d (×100)*     | 18    | 24    | 169   | 182   | 182   | 184    |
| e = r/(a+d) (×100)* | 18    | 24    | 96    | 84    | 74    | 4      |

* Accumulative results after 100 simulation steps.

** The updating frequency is once every 10 steps.

Table 3. Simulation Results (I): Strategy Selection

The strategies were chosen at the model level, meaning all agents in the model adopt the same performance optimisation strategies. Load balancing and service found ratio are not critical in this model, so we pay more attention to the discovery speed and the system efficiency and try to find a good balance between them.

• In the first simulation, no strategies are selected and the system maintains no data at all. Each time a request arrives, many connections must be made to traverse the agents and find a satisfying service. In this situation, the discovery speed and the system efficiency are both rather low.

• In the second, a cache is used in each agent, which needs no extra data maintenance and improves the discovery speed and system efficiency a little. The improvement is small because the dynamics of the services are very high, so the cached information quickly becomes unreliable and the system still has to traverse many agents to find the target service.

• An L_ACT is added to each agent in the third experiment. Each time a service performance changes, the corresponding agent advertises the change upward along the hierarchy. This adds data maintenance workload to the system but reduces the discovery workload dramatically, so the discovery speed and the system efficiency both improve greatly.

• A G_ACT is also added in the fourth experiment. Each agent obtains a copy of the L_ACT of its upper agent once every 10 simulation steps, which adds further data maintenance workload. The simulation results show that this improves the discovery speed further, but the system efficiency decreases a little because of the additional data maintenance.

• Another maintenance of the L_ACT is added in the fifth experiment: each agent collects the L_ACTs of its lower agents once every 10 steps. This does not improve the discovery speed any further and only adds data maintenance workload, which decreases the system efficiency.

• Another maintenance of the G_ACT is added in the sixth experiment: changes to the L_ACTs are multicast to the G_ACTs of the lower agents. This improves the discovery speed only a little but adds a great deal of data maintenance workload, which decreases the system efficiency dramatically.

Balancing the effects that the different strategy selections have on the discovery speed and the system efficiency, the fourth configuration is the most suitable for the example model: all of the ACTs are used, service performance changes are advertised in real time, and the G_ACTs are updated periodically by the corresponding agents. Clearly, changing the G_ACT updating frequency also changes the performance of the model. The simulation results in Table 4, also illustrated in Figure 8, show the best choice.

| G_ACT Updating Frequency | 1   | 2   | 5   | 10  | 20  | 30  | 40  | 80  | Never |
|--------------------------|-----|-----|-----|-----|-----|-----|-----|-----|-------|
| v (×100)*                | 188 | 185 | 180 | 182 | 182 | 177 | 169 | 172 | 169   |
| e (×100)*                | 33  | 50  | 71  | 84  | 87  | 92  | 94  | 98  | 96    |

* Accumulative results after 100 simulation steps.

Table 4. Simulation Results (II): G_ACT Updating Frequency


Figure 8. Choosing a Proper G_ACT Updating Frequency

The example model should therefore use all of the C_ACT, L_ACT, and G_ACT, with the L_ACT maintained by real-time service advertisement and the G_ACT updated once every 20 steps. In fact, the performance of the example model could be improved further using agent-level modeling: different agents could use different strategies to achieve higher performance for the whole system, though this is not discussed here in detail. In summary, the A4 Simulator is a useful tool to aid the performance optimisation of the service advertisement and discovery of a distributed system.

6 Conclusions

The A4 model provides a good abstraction of DSS in terms of agents, services, requests, and so on, mainly to address the problems of scalability and adaptability. The A4 discovery model is built on the hierarchical model and focuses more on performance optimisation than on communication protocols or data representation. The performance of service discovery in an A4 system can be described by its discovery speed, system efficiency, load balancing, and service found ratio. The performance optimisation strategies include using caches, maintaining local and global service information, service validation limitation, and service and request scope limitation.

The A4 Simulator introduced in this work is a modeling and simulation environment that aids the performance optimisation of service advertisement and discovery in practical A4 systems. It supports all of the above performance criteria and optimisation strategies, and provides features such as two-level modeling and multi-model simulation that make the modeling and simulation processes much easier. It is developed in Java and has a friendly graphical user interface, and it also supports modeling of service performance changes and agent mobility. The case study in this work shows how users can use the simulator to select the most suitable performance optimisation strategies for their practical systems.

Research on A4 is just beginning and much work remains. The simulator itself can be extended to support more kinds of performance changing patterns and request and service distribution styles. The amount of data communicated between agents could also be considered when evaluating system performance, and more practical models should be studied to assess the effectiveness of the A4 Simulator. For the A4 methodology, Java APIs are being developed to help application developers build practical A4 systems rather than only model and simulate them.

References

[1] Bluetooth, "Bluetooth Protocol Architecture Version 1.0", Bluetooth White Paper, 1999.

[2] C. Brooks, B. Tierney, and W. Johnston, "JAVA Agents for Distributed System Management", LBNL Report, 1997.

[3] J. Cao, D. J. Kerbyson, and G. R. Nudd, "Dynamic Application Integration Using Agent-Based Operational Administration", in Proc. of 5th Int. Conf. on Practical Application of Intelligent Agents and Multi-Agent Technology, Manchester, UK, pp. 393-396, 2000.

[4] H. Casanova, and J. Dongarra, "Applying NetSolve's Network-Enabled Server", IEEE Computational Science & Engineering, Vol. 5, No. 3, pp. 57-67, 1998.

[5] S. J. Chapin, D. Katramatos, J. Karpovich, and A. Grimshaw, "Resource Management in Legion", Future Generation Computer Systems, Vol. 15, No. 5, pp. 583-594, 1999.

[6] D. Chauhan, and A. D. Baker, "JAFMAS: A Multiagent Application Development System", in Proc. of 2nd Int. Conf. on Autonomous Agents, Minneapolis/St. Paul, pp. 100-107, 1998.

[7] R. S. Cost, T. Finin, Y. Labrou, X. Luan, Y. Peng, and I. Soboroff, "Agent Development with Jackal", in Proc. of 3rd Int. Conf. on Autonomous Agents, Seattle, Washington, U.S.A., pp. 358-359, 1999.

[8] M. Cremonini, A. Omicini, and F. Zambonelli, "Modelling Network Topology and Mobile Agent Interaction: An Integrated Framework", in Proc. of the 1999 ACM Symposium on Applied Computing, San Antonio, Texas, U.S.A., pp. 410-412, 1999.

[9] G. Cugola, E. D. Nitto, and A. Fuggetta, "Exploiting an Event-based Infrastructure to Develop Complex Distributed Systems", in Proc. of 20th Int. Conf. on Software Engineering, Japan, pp. 261-270, 1998.

[10] K. Czajkowski, I. Foster, N. Karonis, C. Kesselman, S. Martin, W. Smith, and S. Tuecke, "A Resource Management Architecture for Metacomputing Systems", in Proc. of IPPS/SPDP '98 Workshop on Job Scheduling Strategies for Parallel Processing, 1998.

[11] S. Fitzgerald, I. Foster, C. Kesselman, G. von Laszewski, W. Smith, and S. Tuecke, "A Directory Service for Configuring High-Performance Distributed Computations", in Proc. of 6th IEEE Symp. on High-Performance Distributed Computing, pp. 365-375, 1997.

[12] I. Foster, and C. Kesselman, The GRID: Blueprint for a New Computing Infrastructure, Morgan-Kaufmann, July 1998.

[13] D. Garlan, and M. Shaw, "An Introduction to Software Architecture", in Advances in Software Engineering and Knowledge Engineering, Vol. 1, World Scientific, New York, 1993.

[14] HAVi, "Specification of the Home Audio/Video Interoperability Version 1.0", The HAVi Specification, 2000.

[15] JetSend.

[16] Jini, "Jini Architectural Overview", Sun Technical White Paper, 1999.

[17] S. M. Lewandowski, "Frameworks for Component-Based Client/Server Computing", ACM Computing Surveys, Vol. 30, No. 1, 1998.

[18] T. W. Malone, and K. Crowston, "The Interdisciplinary Study of Coordination", ACM Computing Surveys, Vol. 26, No. 1, 1994.

[19] D. B. Moran, A. J. Cheyer, L. E. Julia, D. L. Martin, and S. Park, "Multimodal User Interfaces in the Open Agent Architecture", in Proc. of the 1997 Int. Conf. on Intelligent User Interfaces, pp. 61-68, 1997.

[20] H. Nakada, H. Takagi, S. Matsuoka, U. Nagashima, M. Sato, and S. Sekiguchi, "Utilizing the Metaserver Architecture in the Ninf Global Computing System", in Proc. of High-Performance Computing and Networking '98, LNCS 1401, pp. 607-616, 1998.

[21] G. Papadopoulos, and F. Arbab, "Coordination Models and Languages", in Advances in Computers, Vol. 46: The Engineering of Large Systems, Academic Press, 1998.

[22] D. E. Perry, and A. L. Wolf, "Foundations for the Studies of Software Architecture", ACM SIGSOFT Software Engineering Notes, Vol. 17, No. 4, 1992.

[23] R. Raman, M. Livny, and M. Solomon, "Matchmaking: Distributed Resource Management for High Throughput Computing", in Proc. of 7th IEEE Int. Symp. on High Performance Distributed Computing, Chicago, Illinois, July 1998.

[24] S. H. Russ, K. Reece, J. Robinson, B. Meyers, R. Rajan, L. Rajagopalan, and C. Tan, "Hector: An Agent-Based Architecture for Dynamic Resource Management", IEEE Concurrency, April-June, pp. 47-55, 1999.

[25] Salutation, "Salutation Architecture Specification Version 2.1", The Salutation Consortium Inc., 1999.

[26] A. Sloman, and B. Logan, "Building Cognitively Rich Agents Using the SIM_Agent Toolkit", Communications of the ACM, Vol. 42, No. 3, pp. 71-77, 1999.

[27] UPnP, "Universal Plug and Play Device Architecture Reference Specification Version 0.90", Microsoft Corporation, 1999.

[28] R. Wolski, N. T. Spring, and J. Hayes, "The Network Weather Service: A Distributed Resource Performance Forecasting Service for Metacomputing", Future Generation Computer Systems, 1999.

[29] W. Yeong, T. Howes, and S. Kille, "Lightweight Directory Access Protocol", RFC 1777, Draft Standard, 1995.
