


A New Taxonomy for Parallel Java Models Applied on Cluster Computing

Mohamed Galal*, Hesham Eldeeb**, Salwa Nassar***

* National Center for Examination and Educational Evaluation (NCEEE)

**, *** Electronic Research Institute (ERI)

EGYPT

Abstract: - Efficient Java parallel programming models are needed to make it easier for parallel programmers to synchronize and exchange data between processors in a networked PC cluster at different levels of parallelism. A cluster of PCs provides a scalable parallel hardware platform for high-performance computing. Moreover, parallel and distributed computing on a cluster of PCs is being increasingly applied to a variety of large computational problems. Building Java programming, with its advantages, on cluster computing presents a huge computational infrastructure characterized by scalability, flexibility, and improvements in cost and time.

In this paper, we propose a new taxonomy that is useful for describing the different models of parallelism that can be used for writing parallel Java programs. Firstly, we present the taxonomy models, including the usage, advantages, and disadvantages of each model. Secondly, we provide a set of criteria to be used as a reference for the evaluation and comparison of each model.

In addition, we describe the design and implementation of three parallel models and extract their features to make it easy for parallel program developers to choose the appropriate Java model for their application.

Key-Words: - Java, Parallel programming, Distributed computation, Taxonomy, Cluster computing, JavaSpaces, RMI, Java sockets.

1 Introduction

Computational power of high-performance computing systems is required to solve many scientific and engineering problems. Complex parallel and distributed applications take advantage of the increased raw computational power provided by the cooperation of a multitude of computers connected by today's networks. The crucial point in building such applications is to establish the necessary communication structures between the participating processes, allowing them to synchronize and exchange data, hiding the hardware differences from the designer, and remaining simple to use.

Cluster computing is an efficient platform for running computationally heavy applications [1]. Cluster computing offers several important advantages, among them better price/performance ratios, scaling to very large numbers of nodes, and flexibility of configuration and upgrading.

The Java platform is based on the power of networks and the idea that the same software should run on many different kinds of computers. Since its initial commercial release in 1995, Java technology has grown in popularity and usage because of its portability. The designers of Java describe it as a simple, distributed, interpreted, secure, architecturally neutral, portable, high-performance, multithreaded, and dynamic language. Some of these adjectives have also been used to describe other languages, but Java is unique in several respects. It is the first language that can be used for writing general-purpose programs as well as programs designed for use on internets and intranets. In addition, it is a completely object-oriented language, and its code runs without any change on any platform.

The main goal of this paper is to provide a new taxonomy of parallel Java models that can be used to build parallel, distributed applications in Java, and to make a comparative study between them. The next section presents the new taxonomy levels, followed by an overview of each level of this taxonomy. Two further sections cover the implementation of the three models and extract features that help parallel program developers choose the appropriate Java model for parallelism. The last section presents the discussion and conclusion.

2 The new taxonomy of parallel Java models

Multithreading in the Java language utilizes a multiprocessor system, and multithreading shares the CPU time in the case of a single-processor system. Java automatically distributes the threads of multithreaded Java code over the CPUs, with no programming control, in the case of cluster computing; we do not study Java multithreading in this paper. We want to study the different Java communication structures that are used as tools for building parallel and distributed applications, given the rising importance of the network. We provide a taxonomy that makes it easier for parallel programmers to select the suitable Java model for synchronizing and exchanging data between processors in a networked, distributed environment.

The standard Java libraries offer network communication via UDP and TCP/IP, plus a higher-level communication method, RMI. In addition, there is a new technology for the networked world called Jini, which is built on top of the Java language, object serialization, and RMI. Jini enables objects to move around the network from virtual machine to virtual machine, and it extends the benefits of the Java language to the network.

Building parallel and distributed applications requires special programming models that explicitly handle synchronization and data exchange. The classical programming languages are based on extensions of the sequential programming paradigm, and thus they are not very well suited to handle these problems. To exploit the potential of Java, there are three levels of parallel programming with the Java language, as shown in Fig. 1.

[Figure 1: a diagram with "Parallel Java model" at the root and three branches: low level (network tools), basic level (Java RMI), and high level (JavaSpaces).]

Fig.1 A new taxonomy for parallel Java models.

These levels and their names are chosen to form a new taxonomy for parallel programming in Java. The levels are:

1. Low programming level using Java network tools model.

2. Basic programming level using Java RMI model.

3. High programming level using Jini & JavaSpaces model.

3 Low level programming using Java network tools

Networks are always designed as a series of protocol layers, with each layer responsible for some aspect of the network operation.

3.1 TCP/IP & UDP protocol

TCP/IP is a well-known protocol suite that includes a rich set of application services. TCP, UDP, and IP provide the infrastructure for these application services.

Comparing TCP and UDP, it is clear that UDP is a connectionless transport protocol: no connection needs to be established prior to sending data, in contrast to TCP. UDP is less reliable than TCP, but it is faster and simpler [2]. Depending on the application, the appropriate protocol is selected.

3.2 Java sockets programming

Java provides a programming mechanism called sockets. A Java application is free to open a socket connection to any host it wants, open any file, and create its own custom class loaders. There are actually two kinds of sockets: connection-oriented sockets and connectionless sockets.

3.2.1 Connection-oriented programming

In cluster computing applications over TCP, a client program and a server program establish a connection to one another. Each program binds a socket to its end of the connection. The Socket and ServerSocket classes provided in the java.net package implement the client side and server side of the connection, respectively [3].

In fact, the Java classes have significantly reduced the skill needed to create a sockets program. Each transmission mode is implemented in a separate set of Java classes. Once the socket is created and listening for connections, incoming connections are created and placed on the listen queue.
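As an illustration, here is a minimal sketch of a connection-oriented pair; the hostname "master-node" and port 9090 are hypothetical placeholders, not values from our experiments.

    import java.io.*;
    import java.net.*;

    // Minimal TCP pair: the server echoes one line back to the client.
    public class TcpSketch {
        // Server side: accept one connection and reply to a single line.
        static void server() throws IOException {
            try (ServerSocket listener = new ServerSocket(9090);
                 Socket conn = listener.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(conn.getInputStream()));
                 PrintWriter out = new PrintWriter(conn.getOutputStream(), true)) {
                out.println("echo: " + in.readLine());
            }
        }

        // Client side: connect, send a line, print the reply.
        static void client() throws IOException {
            try (Socket conn = new Socket("master-node", 9090);  // hypothetical host
                 PrintWriter out = new PrintWriter(conn.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(conn.getInputStream()))) {
                out.println("hello");
                System.out.println(in.readLine());
            }
        }
    }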

3.2.2 Connectionless programming

Connectionless programming and UDP protocol provide a mode of network communication whereby applications send packets of data, called datagrams, to one another. The application sends and receives DatagramPackets through a DatagramSocket [4].

In addition, we can use connectionless programming over UDP for multicasting. Most high-level network protocols provide only unicast transmission; that is, nodes of the network have the ability to send to only one other node at a time.
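A corresponding connectionless sketch, again with a hypothetical host and port; note that send() gives no delivery guarantee. For multicasting, java.net also provides a MulticastSocket that can join a group address.

    import java.net.*;

    // Minimal UDP send/receive sketch.
    public class UdpSketch {
        static void receiver() throws Exception {
            DatagramSocket socket = new DatagramSocket(9091);
            byte[] buf = new byte[1024];
            DatagramPacket packet = new DatagramPacket(buf, buf.length);
            socket.receive(packet);                   // blocks until a datagram arrives
            String msg = new String(packet.getData(), 0, packet.getLength());
            System.out.println("got: " + msg);
            socket.close();
        }

        static void sender() throws Exception {
            byte[] data = "task-0".getBytes();
            DatagramPacket packet = new DatagramPacket(
                    data, data.length, InetAddress.getByName("worker-node"), 9091);
            DatagramSocket socket = new DatagramSocket();
            socket.send(packet);                      // fire-and-forget: no delivery guarantee
            socket.close();
        }
    }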

4 Basic level programming using Java RMI model

The problems with sockets essentially stem from the fact that they are very primitive: they are not part of the object model, and they provide only point-to-point connections. RMI is a software layer on top of sockets.

The purpose of RMI is to provide an abstraction that enables an application designer to invoke methods on remote objects instead of having to communicate at a lower level. To transfer or persist objects, the RMI API uses the serialization API to wrap and unwrap them [4].

4.1 Serialization

Serialization is a mechanism for converting Java objects to a stream of bytes that is platform independent. When a version or type is unknown, the deserializer can use the bytecode loader to load the correct class file into the running application. Serialization and its performance are of critical importance for parallel programming.
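A minimal sketch of the mechanism; the Task class here is a hypothetical example, not the paper's implementation.

    import java.io.*;

    // Serialization sketch: any Serializable object can be flattened to bytes
    // and restored later, possibly on a different machine.
    public class SerializationSketch {
        static class Task implements Serializable {
            int row;                   // which row of A this task covers
            int[] a, b;                // operand data
            Task(int row, int[] a, int[] b) { this.row = row; this.a = a; this.b = b; }
        }

        public static void main(String[] args) throws Exception {
            Task original = new Task(0, new int[]{1, 2}, new int[]{3, 4});

            // Object -> platform-independent byte stream.
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(original);
            out.flush();

            // Byte stream -> object (the class file is located by the class loader).
            ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()));
            Task copy = (Task) in.readObject();
            System.out.println("row = " + copy.row);
        }
    }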

4.2 RMI Advantages

At the most basic level, RMI is Java's remote procedure call (RPC) mechanism. The primary advantages of RMI over RPC and sockets are that it is object-oriented and provides mobile behavior, distributed garbage collection, concurrent computing, and a distributed computing solution.

4.3 RMI Architecture for cluster of PC's

The RMI architecture is based on one important principle: the definition of behavior and the implementation of that behavior are separate concepts.

RMI allows the code that defines the behavior and the code that implements the behavior to remain separate and to run on separate Java virtual machines.

The RMI implementation is essentially built from three abstraction layers. The first is the stub and skeleton layer, which intercepts method calls made by the client on the interface reference variable and redirects these calls to a remote RMI service. The next layer is the remote reference layer, which understands how to interpret and manage references made from clients to the remote service objects [5].

The third layer is the transport layer, which is based on TCP/IP connections between machines in a network. There is no need to design an application-level protocol; RMI takes care of the low-level communication details.
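To illustrate how these layers hide the communication details, here is a minimal hypothetical sketch; the interface, service name, and host are placeholder illustrations, and a matching server object is assumed to be registered elsewhere.

    import java.rmi.*;

    // Behavior is declared in a remote interface...
    interface Adder extends Remote {
        int add(int x, int y) throws RemoteException;
    }

    // ...while the client only holds a reference to that interface; the stub layer
    // forwards the call and the transport layer moves it over TCP/IP.
    class AdderClient {
        public static void main(String[] args) throws Exception {
            Adder adder = (Adder) Naming.lookup("rmi://master-node/AdderService"); // hypothetical URL
            System.out.println(adder.add(2, 3));   // looks local, executes remotely
        }
    }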

5 High level programming using Jini & JavaSpaces for cluster of PC's

Jini extends RMI to provide services to a network on the cluster. Jini provides a new registry called the Jini lookup service. It has some advantages compared with RMI: entries are leased rather than bound, the Jini lookup service is discovered, and searching is by interface rather than by name [6]. The most important Jini service of interest to a cluster computing environment is JavaSpaces.

5.1 JavaSpaces services

JavaSpaces are object repositories through which any platform architecture can share objects. JavaSpaces technology provides a programming model that views applications as a collection of processes cooperating via the flow of objects into and out of one or more spaces. A space is a shared, network-accessible repository for objects [7]. It provides associative lookup of persistent objects and maintains security through transactions.

Using a space-based implementation allows exchanging executable content across the network. This decouples the semantics of distributed computing from the semantics of the problem domain. Loose coupling in this case means that the two elements can be managed and developed independently. This eliminates issues around multithreaded server implementation, low-level synchronization, and network communication protocols.

5.2 Application Model and Terms

A JavaSpaces service holds entries. An entry is a typed group of objects, expressed in a class for the Java platform. The following methods are the essentials of the JavaSpaces API and are supplied by any object that implements the JavaSpace interface: write, read, and take. In addition, the JavaSpace interface supplies two other methods, notify and snapshot, which can be useful in many applications [7].
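A sketch of these operations, assuming the Jini libraries (net.jini.*) are on the classpath; the TaskEntry class and its fields are hypothetical illustrations, not the paper's classes.

    import net.jini.core.entry.Entry;
    import net.jini.core.lease.Lease;
    import net.jini.space.JavaSpace;

    // An entry is a typed group of objects: public fields plus a public no-arg constructor.
    public class TaskEntry implements Entry {
        public Integer row;          // which row of C this task computes
        public int[][] data;         // operand data for the task
        public TaskEntry() {}        // required by the entry contract
        public TaskEntry(Integer row, int[][] data) { this.row = row; this.data = data; }
    }

    // The three essential operations (lookup of the space itself is omitted).
    class SpaceOps {
        static void demo(JavaSpace space) throws Exception {
            space.write(new TaskEntry(0, new int[2][2]), null, Lease.FOREVER); // add an entry
            TaskEntry template = new TaskEntry();               // null fields act as wildcards
            TaskEntry copy = (TaskEntry) space.read(template, null, Long.MAX_VALUE);  // non-destructive
            TaskEntry taken = (TaskEntry) space.take(template, null, Long.MAX_VALUE); // removes the entry
            System.out.println("read row " + copy.row + ", took row " + taken.row);
        }
    }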

5.3 JavaSpaces Advantages

In JavaSpaces applications, processes do not communicate directly, but instead coordinate their activities by exchanging objects through a space [8].

JavaSpaces leverages features that come with Java objects, such as strong typing, mobile code, and secure execution. It provides a space with many advantages: it is shared, persistent, associative, and transactionally secure, and it allows the exchange of executable content.

6 Implementation of the different taxonomy levels on cluster of PCs

Matrix multiplication is a building block in many matrix operations and in solving many problems, such as graph theory problems and neural network systems. Many papers provide insight into why this problem is complex. Practical algorithms that use matrix multiplication tend to use differently shaped matrices, and the parallel implementation of matrix-matrix multiplication chosen can significantly impact their performance.

In particular, we consider in this paper the problem of developing code to compute C = A·B, where A, B, and C are dense matrices of size N×N.

This matrix-matrix multiplication involves O(N³) operations, since for each element c_ij of C we must compute formula (1). The data are distributed into tasks such that a row of A, together with B, is allocated to a single task; task n generates result n for the C matrix.

C_{ij} = \sum_{k=0}^{N-1} A_{ik} \, B_{kj} \qquad (1)
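For concreteness, a minimal serial sketch of one such task in Java, assuming integer matrices as in our experiments: row i of A together with the whole of B yields row i of C.

    // Compute one row of C = A.B from one row of A and the whole of B,
    // exactly as formula (1) prescribes for a single task.
    public class RowMultiply {
        static int[] computeRow(int[] aRow, int[][] b) {
            int n = aRow.length;
            int[] cRow = new int[n];
            for (int j = 0; j < n; j++) {
                int sum = 0;
                for (int k = 0; k < n; k++) {
                    sum += aRow[k] * b[k][j];   // C[i][j] = sum over k of A[i][k] * B[k][j]
                }
                cRow[j] = sum;
            }
            return cRow;
        }
    }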

We designed programs that distribute the parallel implementation of the multiplication of A and B into C using the three models: Java sockets, Java RMI, and Jini & JavaSpaces. We fixed the number of processors at five: one machine acts as a master that creates and distributes tasks over four machines acting as workers, which perform the task computation and return the results back to the master. Only the size (N) of the matrices changes.

6.1 Implementation of the low-level program using the Java sockets model

In this paper, we choose TCP for its reliability, multiplexing, and connections. Moreover, an implementation using the TCP protocol is more suitable for comparison with the RMI and JavaSpaces implementations. To communicate over TCP, a worker program and a master program establish a connection to one another. Each program binds a socket to its end of the connection. The Socket and ServerSocket classes provided in the java.net package implement the worker side and master side of the connection, respectively.

The master program gets the number of workers and N from the user, then generates tasks with random integers for both matrices A and B. After that, it creates a ServerSocket and waits for workers to connect on it. Once a worker program connects to the ServerSocket, the master program creates a thread to deal with this worker, sending it a task and waiting for a result, while it continues listening for other workers to connect. A shared object local to the master program holds the collected outputs from each thread to form matrix C.

On the other hand, the worker program tries to connect to the master's ServerSocket; it then synchronizes with the master's thread to receive the task, converts the data types, computes the task, and returns the result to the thread in the master program.
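A compact sketch of this master/worker exchange; the line-based message encoding shown here is a hypothetical simplification of the type conversions described above.

    import java.io.*;
    import java.net.*;

    // Master/worker exchange over TCP, one thread per connected worker.
    public class SocketMasterWorker {
        // Master: accept a worker, hand it an encoded task, wait for the result.
        static void master() throws IOException {
            ServerSocket listener = new ServerSocket(9090);
            while (true) {
                final Socket worker = listener.accept();
                new Thread(() -> {
                    try (BufferedReader in = new BufferedReader(
                                 new InputStreamReader(worker.getInputStream()));
                         PrintWriter out = new PrintWriter(worker.getOutputStream(), true)) {
                        out.println("row=0;a=1,2;b=3,4");  // send an encoded task
                        String result = in.readLine();     // block until the worker replies
                        // ...merge result into the shared matrix C under a lock...
                    } catch (IOException ignored) { }
                }).start();
            }
        }

        // Worker: connect, receive a task, compute, return the encoded result.
        static void worker() throws IOException {
            try (Socket conn = new Socket("master-node", 9090);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(conn.getInputStream()));
                 PrintWriter out = new PrintWriter(conn.getOutputStream(), true)) {
                String task = in.readLine();
                out.println("result-for:" + task);  // real code would parse and multiply
            }
        }
    }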

Design notes

• Synchronization between master and worker: if the worker starts before the master, or while the master is generating the matrices and has not yet started listening on the port, the connection fails.

• A lot of work is done converting data from one type to another, because the socket cannot transfer all types of data.

• After every send or receive between master and worker, the sender waits for the receiver to reply before moving to the next step.

• The code written to do this job is very low level and manages everything, and the worker is dedicated to doing one special task only.

• All details about the workers, the master location, and the port numbers must be known a priori.

Sockets cannot transfer objects directly, which is at odds with our use of an object-oriented programming language. Distributed objects are a potentially powerful tool that has only become broadly available to developers at large in the past few years. The power of distributing objects lies not only in the fact that an object is a complex data type, but also in the fact that it is executable content. From this point, we can design a new tool for parallel and distributed programming called a compute server (not a new term, but we redesign it with Java RMI and JavaSpaces).

Compute Server

Historically, compute servers have been used to take advantage of the resources of large farms of processors in order to tackle computationally intensive problems like ray tracing, weather modeling, cryptography, and so-called "grand challenge" problems. Here we design general-purpose compute servers in both RMI and JavaSpaces that allow us to compute any task and add new tasks at will; the server automatically incorporates their corresponding code.

6.2 Implementation of the basic level using the Java RMI model to design a compute server

We design a general compute server using RMI that accepts tasks from the master program, executes these tasks, and returns results. RMI users usually use a remote object on a server called the RMI server; this server executes a method invoked by any client through the RMI mechanism. This style is reversed in our design: every client (worker) runs as a compute server that waits for a local method called runTask() to be invoked remotely by the master. A Task object defines the interface between the compute server and the work it needs to do, providing the way to start the work. The implementation of the Task object is downloaded by RMI into the compute server's virtual machine when needed.

The compute server knows only that each object it receives can be run through the runTask method; it does not know, and does not need to know, what the implementation does.

The worker processor runs a program called ComputeWorker that implements the main Compute interface. It declares the remote interface being implemented, defines the no-argument constructor for the remote object, and implements each remote method in the remote interface. The main method of ComputeWorker creates and installs a security manager to determine whether downloaded code has access to the local file system or can perform any other privileged operations. This restriction ensures that the operations performed by downloaded code go through a set of security checks. In addition, ComputeWorker creates an instance of the remote object and registers it with the RMI remote object registry. The Task interface extends the java.io.Serializable interface because all method parameters and return values are serialized.

The master program gets N, the number of compute servers, and a list of locations (hostnames or IPs) for the compute servers, then generates tasks with random integers for both matrices A and B. After that, it looks up a compute server from the input list and creates a thread called SlaveManager to deal with this worker by invoking the remote runTask() method with a Task object as its parameter; the returned object is the task result from the worker, and so on with the other compute servers until all tasks are finished. A shared object local to the master program holds the collected outputs from each thread to form matrix C. A sketch of this design follows.
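The sketch below shows the interfaces and the worker side, under the assumption that the Task interface exposes a single run() method (the paper's text does not name it); registry setup, codebase configuration, and error handling are omitted.

    import java.io.Serializable;
    import java.rmi.Naming;
    import java.rmi.Remote;
    import java.rmi.RemoteException;
    import java.rmi.server.UnicastRemoteObject;

    // The work travels with the object: the server only knows it can be run.
    interface Task extends Serializable {
        Object run();                     // hypothetical method name
    }

    // The remote interface invoked by the master on each worker.
    interface Compute extends Remote {
        Object runTask(Task task) throws RemoteException;
    }

    // Worker side: a remote object that executes whatever task it receives.
    class ComputeWorker extends UnicastRemoteObject implements Compute {
        ComputeWorker() throws RemoteException { super(); }

        public Object runTask(Task task) { return task.run(); }

        public static void main(String[] args) throws Exception {
            System.setSecurityManager(new SecurityManager());    // checks on downloaded code
            Naming.rebind("ComputeWorker", new ComputeWorker()); // register with the local registry
        }
    }

A SlaveManager thread on the master would then obtain a stub with Naming.lookup and call runTask(task) on it.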

6.3 Implementation of the high level using the JavaSpaces model to design a compute server

We design a general compute server using JavaSpaces. It is an implementation of a powerful, all-purpose computing engine and a new tool for parallel and distributed programs. Tasks are dropped into the space, picked up by processes, and then computed; the result is written back into the space and, at some point, retrieved.

For the master and workers to exchange task objects, they must agree on a common interface for those objects. The well-known Command pattern is used for this purpose. This pattern, which consists of a single method called execute(), lets us decouple the object that invokes an operation from the object that performs it. In this case, the object that invokes the operation is a worker that has cycles to spare and time to compute a task. As in the RMI implementation, the object that knows how to perform the operation is the task itself. The master process is free to implement various execution methods.

The code here is simpler than in RMI. The worker is a straightforward concept: after connecting to the space, it continually looks for a task, takes it from the space, computes it, returns a result back to the space, and then starts over. The result is a shared object in the space: every worker can take it out of the space, write its own result into that object, and then return it to the space to be available to the other workers. In this way, many complex parallel problems can be solved with few lines of code.
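A sketch of a Command-pattern task and the worker loop, again assuming the Jini libraries; the class names are hypothetical, and unlike the single shared result object described above, this simplified variant writes one result entry per task.

    import net.jini.core.entry.Entry;
    import net.jini.core.lease.Lease;
    import net.jini.space.JavaSpace;

    // Command-pattern task: the entry carries its own execute() method.
    class RowTask implements Entry {
        public Integer row;
        public int[] aRow;
        public int[][] b;
        public RowTask() {}
        public RowResult execute() {              // compute row `row` of C
            int n = aRow.length;
            int[] c = new int[n];
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    c[j] += aRow[k] * b[k][j];
            return new RowResult(row, c);
        }
    }

    class RowResult implements Entry {
        public Integer row;
        public int[] values;
        public RowResult() {}
        public RowResult(Integer row, int[] values) { this.row = row; this.values = values; }
    }

    // Worker loop: take a task, execute it, write the result back.
    class SpaceWorker {
        static void run(JavaSpace space) throws Exception {
            RowTask template = new RowTask();     // null fields match any task
            while (true) {
                RowTask task = (RowTask) space.take(template, null, Long.MAX_VALUE);
                space.write(task.execute(), null, Lease.FOREVER);
            }
        }
    }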

The master program is simple, unlike the corresponding one in the RMI model. No multithreading is needed to manage each worker. It just connects to the space, creates tasks, drops them into the space, waits for the workers to complete their tasks, and takes the shared object (matrix C) as the final result.

7 Results

In this section, the measurements of the three different models of parallelism are presented. Fig. 2 shows the performance of the practical experiments with a cluster of five nodes, one node being the master and the other four the workers, using the three programs written for sockets, RMI, and JavaSpaces.

From this implementation, its model code, and the results, we can conclude some differences between the models of our proposed taxonomy. Table 1 presents a comparative study of parallel programming using the Java sockets, Java RMI, and JavaSpaces models.


Fig. 2. Performance of the practical experiments with different matrix sizes.

Table 1. Comparative study between the models of the proposed taxonomy

| Criterion                     | Sockets     | RMI       | JavaSpaces |
| Significant execution time    | Yes         | No        | No         |
| Network traffic               | Independent | Dependent | Dependent  |
| Coupled communication         | Tightly     | Tightly   | Loosely    |
| Data transfer type            | Primitive   | Object    | Object     |
| Coding difficulty             | Complex     | Normal    | Easy       |
| Requires an initial setup     | No          | Yes       | Yes        |
| Compute server designable     | No          | Yes       | Yes        |
| Garbage collection            | No          | Yes       | Yes        |
| Scalability                   | No          | No        | Yes        |
| Add/remove running worker     | No          | No        | Yes        |
| Searching by contents         | No          | No        | Yes        |
| Dealing with partial failure  | No          | No        | Yes        |
| Provides leasing feature      | No          | No        | Yes        |
| Distributed persistence       | No          | No        | Yes        |

8 Conclusion

In this paper, we have proposed a new taxonomy for writing parallel Java programs applied to cluster computing. This taxonomy makes it easy for the parallel Java developer to choose the appropriate model for Java parallelism according to the application. Three models, for sockets, RMI, and JavaSpaces, were evaluated and tested.

Socket programming provides fast execution time, and the program does not depend on the network traffic because of the direct connections through sockets and ports. However, it transfers only primitive data. It is hard to code and complex to debug. Therefore, it is suitable for small and very special applications that need more speed and must not depend on network traffic, such as real-time applications.

RMI, as an application layer, can invoke a remote method, and it can be used to design a compute server. In addition, it can search for service availability, but the registry location must be known a priori. However, it is not easy to program and has no mechanism for dealing with partial failure. We can conclude that this model is suitable for small intranet applications where the locations of the servers are known a priori. It can also be used for application-level programming without concern for low-level details.

Finally, JavaSpaces is a robust tool for designing parallel programs. It provides intrinsic forms of dynamic scalability and fault tolerance. However, it consumes more time compared to the other models.

References

[1] A. Abd-Elsamea, H. Eldeeb, S. Nassar, PC cluster as a platform for parallel applications, WSEAS Transactions on Computers, Issue 5, Vol. 3, Nov. 2004.

[2] W. Stallings, Data and Computer Communications, Seventh Edition, Prentice Hall, 2004.

[3] M. Campione, K. Walrath, The Java Tutorial: Object-Oriented Programming for the Internet, Third Edition, Addison-Wesley, 2002.

[4] Sun Microsystems, The Java 2 SDK, Standard Edition, package documents, 2001.

[5] C. Austin, M. Pawlan, Advanced Programming for the Java 2 Platform, Addison-Wesley, 2000.

[6] M. L. Liu, Distributed Computing: Principles and Applications, Addison-Wesley, 2004.

[7] E. Freeman, S. Hupfer, K. Arnold, JavaSpaces: Principles, Patterns, and Practice, Addison-Wesley, 1999.

[8] M. Galal, H. Eldeib, S. Nassar, Distributed Backpropagation Neural Network System for Pattern Recognition using JavaSpaces, Artificial Neural Network Applications, IPMM '03 Proceedings, Japan, 2003.
