


ABSTRACT

Very often applications need more computing power than a sequential computer can provide. One way of overcoming this limitation is to improve the operating speed of processors and other components so that they can offer the power required by computationally intensive applications. Even though this is currently possible to a certain extent, future improvements are constrained by the speed of light, thermodynamic laws, and the high financial cost of processor fabrication. A viable and cost-effective alternative is to connect multiple processors together and coordinate their computational efforts. The resulting systems are popularly known as parallel computers, and they allow a computational task to be shared among multiple processors.

INTRODUCTION

The needs and expectations of modern-day applications are changing in the sense that they not only need computing resources (be they processing power, memory or disk space), but also the ability to remain available to service user requests almost constantly, 24 hours a day and 365 days a year. These needs and expectations of today's applications result in challenging research and development efforts in both computer hardware and software.

It seems that as applications evolve they inevitably consume more and more computing resources. To some extent we can overcome these limitations; for example, we can create faster processors and install larger memories. But future improvements are constrained by a number of factors, including physical ones, such as the speed of light and the limits imposed by thermodynamic laws, as well as financial ones, such as the huge investment needed to fabricate new processors and integrated circuits. The obvious solution to these problems is to connect multiple processors and systems together and coordinate their efforts. The resulting systems are popularly known as parallel computers, and they allow the sharing of a computational task among multiple processors.

Parallel supercomputers have been in the mainstream of high-performance computing for the last ten years. However, their popularity is waning. The reasons for this decline are many, but include being expensive to purchase and run, potentially difficult to program, slow to evolve in the face of emerging hardware technologies, and difficult to upgrade without, generally, replacing the whole system. The decline of the dedicated parallel supercomputer has been compounded by the emergence of commodity-off-the-shelf clusters of PCs and workstations. The idea of the cluster is not new, but certain recent technical capabilities, particularly in the area of networking, have brought this class of machine to the vanguard as a platform to run all types of parallel and distributed applications.

The emergence of cluster platforms was driven by a number of academic projects, such as Beowulf [1], Berkeley NOW [2], and HPVM [3]. These projects helped to prove the advantage of clusters over other traditional platforms. These advantages included low entry costs to access supercomputing-level performance, the ability to track new technologies, incrementally upgradeable systems, an open-source development platform, and not being locked into particular vendor products.
Today, the overwhelming price/performance advantage of this type of platform over proprietary ones, as well as the other key benefits mentioned above, means that clusters have infiltrated not only the traditional science and engineering marketplaces for research and development, but also the huge commercial marketplaces of commerce and industry. It should be noted that this class of machine is not only used for high-performance computation, but increasingly as a platform to provide highly available services for applications such as Web and database servers.

A cluster is a type of parallel or distributed computer system which consists of a collection of inter-connected stand-alone computers working together as a single integrated computing resource.

HISTORY OF CLUSTER COMPUTING

The history of cluster computing is best captured by a footnote in Greg Pfister's In Search of Clusters: "Virtually every press release from DEC mentioning clusters says 'DEC, who invented clusters...'. IBM did not invent them either. Customers invented clusters, as soon as they could not fit all their work on one computer, or needed a backup. The date of the first is unknown, but it would be surprising if it was not in the 1960s, or even late 1950s."

The formal engineering basis of cluster computing as a means of doing parallel work of any sort was arguably invented by Gene Amdahl of IBM, who in 1967 published what has come to be regarded as the seminal paper on parallel processing: Amdahl's Law. Amdahl's Law describes mathematically the speedup one can expect from parallelizing any given, otherwise serially performed, task on a parallel architecture. This article defined the engineering basis for both multiprocessor computing and cluster computing, where the primary differentiator is whether the interprocessor communications are supported "inside" the computer (for example, on a customized internal communications bus or network) or "outside" the computer, on a commodity network.

Consequently, the history of early computer clusters is more or less directly tied to the history of early networks, as one of the primary motivations for the development of a network was to link computing resources, creating a de facto computer cluster. Packet-switching networks were conceptually invented by the RAND Corporation in 1962. Using the concept of a packet-switched network, the ARPANET project succeeded in 1969 in creating what was arguably the world's first commodity-network-based computer cluster by linking four different computer centers (each of which was something of a "cluster" in its own right, but probably not a commodity cluster). The ARPANET project grew into the Internet, which can be thought of as "the mother of all computer clusters" (as the union of nearly all of the compute resources, including clusters, that happen to be connected). It also established the paradigm in use by all computer clusters in the world today: the use of packet-switched networks to perform interprocessor communications between processor (sets) located in otherwise disconnected frames.

The development of customer-built and research clusters proceeded hand in hand with that of both networks and the Unix operating system from the early 1970s, as both TCP/IP and the Xerox PARC project created and formalized protocols for network-based communications. The Hydra operating system was built for a cluster of DEC PDP-11 minicomputers called C.mmp at Carnegie Mellon University in 1971.
However, it was not until circa 1983 that the protocols and tools for easily doing remote job distribution and file sharing were defined (largely within the context of BSD Unix, as implemented by Sun Microsystems) and hence became generally available commercially, along with a shared filesystem.

ARCHITECTURE OF A CLUSTER

The typical architecture of a cluster computer is shown in Figure 1. The key components of a cluster include multiple standalone computers (PCs, workstations, or SMPs), an operating system, a high-performance interconnect, communication software, middleware, and applications.

Figure 1. A Cluster Architecture.

DESIGNING A CLUSTER COMPUTER

Choosing a processor

The first step in designing a cluster is to choose the building block. The processing power, memory, and disk space of each node, as well as the communication bandwidth between the nodes, are all factors that can be chosen. You will need to decide which are important based on the mixture of applications you intend to run on the cluster and the amount of money you have to spend.

- Best performance for the price: PCs (currently dual-Xeon systems).
- If maximizing memory and/or disk space is important: choose faster workstations.
- For maximum bandwidth: more expensive workstations may be needed.

PCs running Linux are by far the most common choice. They provide the best performance for the price at the moment, offering good CPU speed with cheap memory and disk space. They have smaller L2 cache sizes than some more expensive workstations, which can limit SMP performance, and they have less main-memory bandwidth, which can limit performance for applications that do not reuse the data cache well. The availability of 64-bit PCI-X slots and up to 16 GBytes of memory removes several bottlenecks, but newer 64-bit architectures will still perform better for large-memory applications.

For applications that require more networking than Gigabit Ethernet can provide, more expensive workstations may be the way to go. You will have fewer but faster nodes, requiring less overall communication, and the memory subsystem can support communication rates upwards of 200-800 MB/sec.

When in doubt, it is always a good idea to benchmark your code on the machines that you are considering. If that is not possible, there are many generic benchmarks that you can look at to help you decide. The HINT benchmark developed at the SCL, or a similar benchmark based on the DAXPY kernel shown below, shows the performance of each processor for a range of problem sizes. If your application uses little memory, or heavily reuses the data cache, it will operate mainly on the left side of the graph; here the clock rate is important, and the choice of compiler can make a big difference. If your application is large and does not reuse data much, the right side will be more representative and memory speed will be the dominant factor.
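The listing below is only a minimal sketch of such a DAXPY timing loop in C. The problem sizes, repetition counts, and use of clock() are illustrative choices rather than the actual HINT or SCL benchmark code; the point is simply that small vectors stay in cache while large ones stress main memory.

    /* daxpy_bench.c - a rough DAXPY timing sketch (illustrative only).
     * Build: gcc -O2 -o daxpy_bench daxpy_bench.c
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* y = a*x + y over n elements: 2 floating-point operations per element */
    static void daxpy(int n, double a, const double *x, double *y)
    {
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }

    int main(void)
    {
        /* Sweep problem sizes from cache-resident to memory-bound. */
        for (int n = 1 << 10; n <= 1 << 24; n <<= 2) {
            double *x = malloc((size_t)n * sizeof *x);
            double *y = malloc((size_t)n * sizeof *y);
            if (!x || !y) { fprintf(stderr, "out of memory\n"); return 1; }
            for (int i = 0; i < n; i++) { x[i] = 1.0; y[i] = 2.0; }

            int reps = (1 << 28) / n;     /* keep total work roughly constant */
            clock_t t0 = clock();
            for (int r = 0; r < reps; r++)
                daxpy(n, 3.0, x, y);
            double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
            if (secs <= 0.0)
                secs = 1e-9;              /* guard against coarse clock resolution */

            /* Printing y[0] keeps the result live so the loop is not optimized away. */
            printf("n = %9d   %8.1f MFLOP/s   (y[0] = %.1f)\n",
                   n, 2.0 * n * reps / secs / 1e6, y[0]);

            free(x);
            free(y);
        }
        return 0;
    }

For the small sizes the reported rate mostly reflects clock speed and compiler quality; for the large sizes it mostly reflects main-memory bandwidth, which is exactly the left-side/right-side distinction described above.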
Designing the network

Along with the basic building block, you will need to choose the fabric that connects the nodes. As explained above, this depends greatly on the applications you intend to run, the processors you choose, and how much money you have to spend. Gigabit Ethernet is clearly the cheapest option; if your application can function with a lower level of communication, it is cheap and reliable, but it scales only to around 14 nodes using a flat switch (a completely connected cluster with no topology).

Which OS?

The choice of an OS is largely dictated by the machine that you choose. Linux is always an option on any machine, and is the most common choice. Many of the cluster computing tools were developed under Linux, and Linux, along with many compilers that run on it, is available for free. With all that being said, there are PC clusters running Windows NT, IBM clusters running AIX, and we have even built a G4 cluster running Linux.

Loading up the software

I would recommend choosing one MPI implementation and going with that. PVM is still around, but MPI is the way to go (IMHO). LAM/MPI is distributed as an RPM, so it is the easiest to install, and it also performs reasonably well on clusters. (A minimal MPI test program is sketched at the end of this section.) There are many free compilers available, and the availability will of course depend on the OS you choose. For PCs running Linux, the GNU compilers are acceptable. The Intel compilers provide better performance in most cases on the Intel processors, and their pricing is reasonable. The Intel or PGI compilers may help on the AMD processors; however, the cluster licenses for the PGI compilers are prohibitively expensive at this point. For Linux on the Alpha processors, Compaq freely distributes the same compilers that are available under Tru64 Unix. There are also many parallel libraries available, such as ScaLAPACK. For Linux PCs, you may also want to install a BLAS library such as the Intel MKL or one developed at Sandia.

If you have many users on a cluster, it may be worthwhile to put on a queueing system. PBS (Portable Batch System) is currently the most advanced, and is under heavy development. DQS can also handle multiprocessor jobs, but is not quite as efficient. You will also want users to have a quick view of the status of the cluster as a whole. There are several status monitors freely available, such as statmon, developed locally. None are up to where I'd like them to be yet, although commercial versions give a more active and interactive view.

Assembling the cluster

A freestanding rack costs around $100 and can hold 16 PCs. If you want to get fancier and reduce the footprint of your system, most machines can be ordered with rackmount attachments. You will also need a way to connect a keyboard and monitor to each machine for when things go wrong. You can do this manually, or spend a little money on a KVM (keyboard, video, mouse) switch that makes it easy to access any computer.

Pre-built clusters

If you have no desire to assemble a system yourself, there are many vendors who sell complete clusters built to your design. These are 1U or 2U rackmounted nodes pre-configured to your specifications. They are compact, easy to set up and maintain, and usually come with good custom tools such as web-based status monitors. The price really isn't much more than building your own system now.

Cluster administration

With large clusters, it is common to have a dedicated master node that is the only machine connected to the outside world. This machine then acts as the file server and the compile node. This provides a single-system image to the user, who launches jobs from the master node without ever logging into any of the compute nodes. There are boot disks available that can help in setting up the individual nodes of a cluster. Once the master is configured, these boot disks can be configured to perform a complete system installation for each node over the network. Most cluster administrators also develop other utilities, such as scripts that operate on every node in the cluster; the rdist utility can also be very helpful. If you purchase a cluster from a vendor, it should come with software installed to make it easy to use and maintain the system.
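As a quick check that the MPI installation mentioned above is working, the following minimal C program (an illustrative sketch, not part of any particular distribution) has every rank report its rank and the node it is running on. It assumes the usual mpicc compiler wrapper and an mpirun or mpiexec launcher, as provided by implementations such as LAM/MPI and MPICH.

    /* mpi_hello.c - minimal MPI sanity check (illustrative sketch).
     * Build: mpicc -O2 -o mpi_hello mpi_hello.c
     * Run:   mpirun -np 4 ./mpi_hello   (or: mpiexec -n 4 ./mpi_hello)
     */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size, namelen;
        char node[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);                  /* start the MPI runtime   */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's rank     */
        MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of ranks   */
        MPI_Get_processor_name(node, &namelen);  /* hostname of this node   */

        printf("rank %d of %d running on %s\n", rank, size, node);

        MPI_Finalize();                          /* shut down cleanly       */
        return 0;
    }

If each rank reports the node you expect, the compilers, the interconnect, and the job launcher are wired together correctly; a real application would replace the printf with MPI communication and computation.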
If you build your own system, there are software packages available to do the same. OSCAR is a fully integrated software bundle designed to make it easy to build a cluster. Scyld Beowulf is a commercial package that enhances the Linux kernel, providing system tools that produce a cluster with a single system image. If set up properly, a cluster can be relatively easy to maintain: the operations that you would normally do on a single machine simply need to be replicated across many machines. If you have a very large cluster, you should keep a few spare machines to make it easy to recover from hardware problems.

HOW CLUSTER COMPUTING WORKS

The software architecture consists of a user interface layer, a scheduling layer, and an execution layer. The interface and scheduling layers reside on the head node; the execution layer resides primarily on the compute nodes. The execution layer as shown here includes the Microsoft implementation of MPI, called MS MPI, which was developed for Windows and is included in the Microsoft Compute Cluster Pack. This implementation is based on the Argonne National Laboratory MPICH2 implementation of the MPI-2 standard.

The user interface layer consists of the Compute Cluster Job Manager, the Compute Cluster Administrator, and the Command Line Interface (CLI). The Compute Cluster Job Manager is a Win32 graphical user interface to the Job Scheduler that is used for job creation and submission. The Compute Cluster Administrator is a Microsoft Management Console (MMC) snap-in that is used for configuration and management of the cluster. The Command Line Interface is a standard Windows command prompt which provides a command-line alternative to the Job Manager and the Administrator.

The scheduling layer consists of the Job Scheduler, which is responsible for queuing jobs and tasks, reserving resources, and dispatching jobs to the compute nodes.

In this example, the execution layer consists of the following components replicated on each compute node: the Node Manager Service, the MS MPI launcher mpiexec, and the MS MPI Service. The Node Manager is a service that runs on all compute nodes in the cluster; it executes jobs on the node, sets task environment variables, and sends a heartbeat (health check) signal to the Job Scheduler at specified intervals (the default interval is one minute). Mpiexec is the MPICH2-compatible multithreading executable within which all MPI tasks are run. The MS MPI Service is responsible for starting the job tasks on the various processors.

CONCLUSION

Network clusters offer a high-performance computing alternative to SMP and massively parallel computing systems. Aggregate system performance aside, cluster architectures can also lead to more reliable computer systems through redundancy. Choosing a hardware architecture is just the beginning step in building a useful cluster: applications, performance optimization, and system management issues must also be handled.

REFERENCES

Bader, David; Pennington, Robert (June 1996). "Cluster Computing: Applications". Georgia Tech College of Computing. Retrieved 2007-07-13.
TOP500 List - June 2006 (1-100), TOP500 Supercomputing Sites.
Farah, Joseph (2000-12-19). "Why Iraq's buying up Sony PlayStation 2s". World Net Daily.
Pfister, Gregory (1998). In Search of Clusters (2nd ed.). Upper Saddle River, NJ: Prentice Hall PTR. p. 36. ISBN 0-13-899709-8.
gridMathematica Cluster Integration.
"Mastering the Odyssey of Scale from Nano to Peta: The Smart Use of High Performance Computing (HPC) Inside IBM?". Denbury, CT: IBM. p. 5.Robert W. Lucke: Building Clustered Linux Systems, Prentice Hall, 2005, ISBN 0-13-144853-6Evan Marcus, Hal Stern: Blueprints for High Availability: Designing Resilient Distributed Systems, John Wiley & Sons, ISBN 0-471-35601-8Greg Pfister: In Search of Clusters, Prentice Hall, ISBN 0-13-899709-8Rajkumar Buyya (editor): High Performance Cluster Computing: Architectures and Systems, Volume 1, ISBN 0-13-013784-7, Prentice Hall, NJ, USA, 1999.Rajkumar Buyya (editor): High Performance Cluster Computing: Programming and Applications, Volume 2, ISBN 0-13-013785-5, Prentice Hall, NJ, USA, 1999. ................