


Overview of SOA Programming Model and Runtime System for Windows HPC Server 2008

Microsoft Corporation

Published: May 2008

Abstract

With the increasing number and size of the problems being tackled on ever-larger clusters, developers, users, and administrators face increasing challenges in meeting time-to-result goals. Applications must be developed quickly, run efficiently on the cluster, and be effectively managed so that application performance, reliability, and resource utilization are optimized. Taking an approach to building applications using Service-Oriented Architecture (SOA) with Windows® HPC Server 2008 can help meet these challenges.

Windows HPC Server 2008 provides a platform for SOA-based applications. The SOA programming model allows solution developers and architects to rapidly develop new high performance computing (HPC) cluster-enabled interactive applications and easily modify existing distributed computing applications. With Windows HPC Server 2008, the developer build/debug/deploy experience is streamlined, the speed of processing is accelerated, and the management of the applications and systems is simplified.

This white paper provides a technical overview of SOA applications and the Windows HPC Server 2008 functions that support the SOA model, including building and deploying SOA applications; their architecture, runtime system, scaling, and performance considerations; and monitoring and troubleshooting.

This document was developed prior to the product’s release to manufacturing, and as such, we cannot guarantee that all details included herein will be exactly as found in the shipping product.

The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED, OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

© 2008 Microsoft Corporation. All rights reserved.

Microsoft, Active Directory, Excel, SharePoint, SQL Server, Visual Basic, Visual Studio, Windows, the Windows logo, Windows PowerShell, and Windows Server are trademarks of the Microsoft group of companies.

All other trademarks are property of their respective owners.

Contents

Windows HPC Server 2008 Overview

Job Operation in Windows HPC Server 2008

Service-Oriented Architecture Application Overview

What Is SOA?

Batch Applications and Interactive Applications

Target Applications for SOA

Building an Application: The SOA Programming Model

Benefits of Windows HPC Server 2008 for SOA Applications

Getting Started: Building an Application with the SOA Programming Model

Creating the Service

Deploying the Service to a Compute Cluster

Creating a Client Program

Session API

Service Deployment

Copying and Placing the Service DLLs

Registering the Service

Maintaining Multiple Versions of a Service

Running the SOA Application: Architectural Considerations

Running the SOA Application

Recovering from a Node Failure

Compute Node Failure

WCF Broker Node Failure

Security

Sharable Sessions

Service Instance Resourcing Model

How Broker Dispatches Requests to Service Instances

Broker Configuration Parameters

Session Life Cycle Model

Broker Node Level Settings

Session Level Settings

Monitoring and Managing the SOA Infrastructure

Monitoring the Cluster

Advanced Monitoring with System Center Operations Manager

Enabling a Broker Node

Monitoring a Session

Monitoring the WCF Broker Node

Monitoring a Service Job

Reporting

Troubleshooting and Diagnosing SOA Application Runtime Errors

Service Repository Test

Service Model Test

Advanced Programming Topics

Throttling Requests

Handling Large Messages

Reducing the Message Passing Overhead

Summary

Glossary

Windows HPC Server 2008 Overview

High performance computing (HPC) applications use a cluster of computers working together to solve a single computational problem or single set of closely related computational problems. Windows HPC Server 2008 enables such cluster-based supercomputing based on x64 versions of the Windows Server® 2008 operating system. Windows® HPC Server 2008 can efficiently scale to thousands of processing cores and provides a comprehensive set of deployment, administration, and monitoring tools that are easy to deploy, manage, and integrate with an existing infrastructure. A wide range of software vendors in various vertical markets have been designing their applications to work seamlessly with Windows HPC Server 2008, so that users can submit and monitor jobs from within familiar applications without having to learn new or complex user interfaces.

Windows HPC Server 2008 includes an advanced Job Scheduler, a new and faster Microsoft Message Passing Interface (MS-MPI), rapid deployment options using Windows Deployment Services (WDS), and a new management interface built on the Microsoft® System Center user interface (UI) that supports Windows PowerShell™ as a preferred scripting language. Windows HPC Server 2008 takes advantage of Windows Server 2008 failover services, in addition to the failover clustering capabilities of Microsoft® SQL Server®, for cluster failover and redundancy.

Windows HPC Server 2008 integrates with other Microsoft products to help increase HPC productivity and improve the overall user experience. This includes collaboration through Microsoft® Office SharePoint® Server 2007 and the Windows® Workflow Foundation (WF), in addition to improved management and efficiency through integration with System Center solutions.

Windows HPC Server 2008 delivers an integrated platform that makes it possible to create a new breed of applications that can be run in interactive settings, in addition to the traditional batch applications in the engineering, oil and gas, and life science market segments. These new interactive applications include trade and risk management applications in financial services, Microsoft® Office Excel®, and insurance risk modeling applications. Windows HPC Server 2008 can be used for massively parallel programs (computational fluid dynamics, reservoir simulation) in addition to embarrassingly parallel programs (Basic Local Alignment Search Tool [BLAST], Monte Carlo simulations). Through integration with the Windows® Communication Foundation (WCF), Windows HPC Server 2008 empowers software developers working with Service-Oriented Architecture (SOA) applications to harness the power of parallel computing offered by HPC solutions.

Note:   

For general information about Windows HPC Server features and capabilities, see the white paper “Windows HPC Server 2008 Technical Overview.”

For overall management and deployment information, see the white paper “Windows HPC Server 2008 System Management Overview.”

For information about the Windows HPC Server 2008 Job Scheduler, see the white paper “Windows HPC Server 2008 Job Scheduler.”

These papers can be found at .

Job Operation in Windows HPC Server 2008

Jobs, defined as discrete activities scheduled to run on the compute cluster, are the key to operating in a Windows HPC Server environment. Compute cluster jobs are composed of tasks; a job can be a single task, or it can include many individual tasks. Tasks can be serial, running one after another, or parallel, running across multiple processors. Tasks can also run interactively as SOA applications. The structure of the tasks in a job is determined by the dependencies among tasks and the type of application being run. In addition, jobs and tasks can be targeted to specific nodes within the cluster. Nodes can be reserved exclusively for particular jobs, or they can be shared between different jobs and tasks.

To understand job operation, it is helpful to understand the components of an HPC cluster. Figure 1 shows cluster components and how they relate to each other.

Figure 1 Elements of a compute cluster

A cluster consists of a single head node (or a primary and secondary head node, if the deployed cluster is made highly available) and compute nodes. For interactive SOA applications, the cluster also includes one or more WCF broker nodes.

• The head node, which can also operate as a compute node, is the central management node for the cluster. The head node deploys the compute nodes, runs the Job Scheduler, monitors job and node status, runs diagnostics on nodes, and provides reports on node and job activities.

• Compute nodes execute job tasks.

• WCF broker nodes act as intermediaries between the application and the services. The broker load-balances the service requests across the service instances and returns the results to the application.

When a user submits a job to the cluster, the Job Scheduler validates the job properties and stores the job in a SQL Server database. The job is entered into the job queue based on the specified policy. When the necessary resources are available, the job is sent to the compute nodes assigned for the job and run under the user’s security context. As a result, the complexity of using and synchronizing different credentials is eliminated, and the user does not have to employ different methods of sharing data or compensate for permission differences among different operating systems.

An SOA application differs from traditional HPC batch-oriented applications in several ways. The admission, allocation, and activation boundaries are blurred. The initial admission involves a session in addition to the actual job, and the job admission request comes from the library implementing the session, not directly from the application code. Allocation is still fairly typical, with the Job Scheduler still managing resource allocation. Once a session is created, requests are sent to the broker node, and results are returned to the client through the broker node.

Service-Oriented Architecture Application Overview

What Is SOA?

A Service-Oriented Architecture is an approach to building distributed, loosely coupled applications. SOA separates functions into distinct services that can be distributed over a network, and combined and reused. These functions are loosely coupled with the operating systems and programming languages underlying the applications. SOA defines and provisions the IT infrastructure to support and participate in the exchange of data between different applications. SOA services communicate with each other by passing data or by coordinating an activity between several services.

SOA is not tied to a specific technology. It may be implemented using a wide range of technologies (including SOAP, Web services, and WCF) and a variety of languages across different operating systems. The defining characteristic of SOA is independent services with defined interfaces that can be called to perform their tasks in a standard way: the service does not need to know the calling application, and the application does not need to know how the service actually performs its tasks.

Batch Applications and Interactive Applications

While Windows HPC Server 2008 continues to support traditional HPC applications in the engineering, oil and gas, and life science market segments (applications that generally run in batch fashion), it also delivers a platform that supports a new breed of applications that run in interactive settings, including trade and risk management applications in financial services and WCF or Web services–based applications (see Figure 2).


Figure 2 Windows HPC Server 2008 now focuses on interactive applications

Target Applications for SOA

HPC applications submitted to compute clusters are typically classified as either message intensive or embarrassingly parallel. While message-intensive applications consist of tasks that must communicate with one another as they run, embarrassingly parallel problems can be easily divided into very large numbers of independent tasks, with no dependency or communication between them.

To solve embarrassingly parallel problems without having to write the low-level code, developers need to encapsulate the core calculations as a software module. The SOA programming model makes this encapsulation possible and effectively hides the details for data serialization and distributed computing.

Windows HPC Server 2008 includes support for embarrassingly parallel applications that use the SOA programming model; these applications use compute clusters interactively to provide near real-time calculation of complex algorithms. Table 1 shows some example applications and the related tasks.

Table 1 Examples of SOA Applications

|Example Application |Example Task |Units of Work |
|Monte Carlo problems that simulate the behavior of various mathematical or physical systems. Monte Carlo methods are used in physics, physical chemistry, economics, and related fields. |Predicting the price of a financial instrument. |The pricing of each security. |
|BLAST searches. |Gene matching. |Individual matching of genes. |
|Genetic algorithms. |Evolutionary computational meta-heuristics. |Computational steps. |
|Ray Tracing. |Computational physics and rendering. |Each pixel to be rendered. |
|Microsoft Office Excel add-in calculations. |Calling add-in functions. |Each add-in function call. |

The Monte Carlo problem, a frequently used example of an SOA application, simulates the behavior of various mathematical or physical systems; it is used in physics, physical chemistry, economics, and related fields. The Monte Carlo problem is a computational algorithm that relies on repeated random sampling. Because of the reliance on repeated computation and random or pseudo-random numbers, Monte Carlo methods are well-suited for HPC. Monte Carlo methods tend to be used when it is infeasible or impossible to compute an exact result with a deterministic algorithm.

The Monte Carlo method is widely used by financial analysts who want to construct stochastic or probabilistic financial models (as opposed to the traditional static and deterministic models). Many financial corporations use the Monte Carlo methods for making investment decisions or for valuing mergers and acquisitions; for example, financial corporations may need to formulate trading strategy against historical market data, complete risk analysis via Monte Carlo simulation in near real time, and price new derivative instruments.
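
To make the "unit of work" granularity from Table 1 concrete, the following sketch prices a single instrument by repeated random sampling. The payoff model, class name, method name, and parameters are invented for illustration only; in an SOA deployment, a method like this would sit behind an [OperationContract], just as Echo does in the sample service later in this paper.

using System;

// Illustrative Monte Carlo unit of work (not part of any shipped sample):
// estimate the expected payoff of one instrument by repeated random sampling.
public static class MonteCarloExample
{
    public static double PriceInstrument(double spot, double strike, int samples)
    {
        Random rng = new Random();
        double sum = 0.0;

        for (int i = 0; i < samples; i++)
        {
            // Draw a random terminal price around the current spot price
            // (a deliberately simplified model used only for illustration).
            double terminal = spot * (0.8 + 0.4 * rng.NextDouble());

            // Accumulate the payoff of a call option for this sample.
            sum += Math.Max(terminal - strike, 0.0);
        }

        // The average payoff over all samples approximates the expected value.
        return sum / samples;
    }
}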

Another SOA example is Basic Local Alignment Search Tool (BLAST), a computer program that identifies homologous genes (genes in different species that share similar structures and functions). For example, there may be a gene in mice related to liking (or not liking) the consumption of alcohol; using BLAST, it is possible to search the human genome for a homologous gene. Because of the many iterations required, BLAST is well suited for SOA and HPC.

Table 2 describes the features and tools that Windows HPC Server 2008 provides for meeting the needs of SOA applications.

Table 2 Benefits of Windows HPC Server 2008 for SOA Applications

|Tasks |User Needs |Windows HPC Server 2008 Features |
|Build |The ability to solve embarrassingly parallel problems without having to write the low-level code. An integrated development environment (IDE) tool that lets developers develop, deploy, and debug applications on a cluster. |A service-oriented programming model based on WCF that effectively hides the details for data serialization and distributed computing. Microsoft® Visual Studio® 2008 with tools to debug services and clients. |
|Run |Ability to distribute short calculation requests efficiently. Ability to run user applications securely. A system that decides where to run the tasks of the application and dynamically adjusts cluster resource allocation to the processing priorities of the workload. |Low latency round-trip. End-to-end Kerberos with WCF transport-level security. Dynamic allocation of resources to the service instances. |
|Manage |The ability to monitor application performance from a single point of control. The ability to monitor and report service usage. |Runtime monitoring of performance counters, including the number and status of outstanding service calls and resource usage. Service resource usage reports. |

Building an Application: The SOA Programming Model

SOA applications need to be developed quickly, run efficiently on the cluster, and be effectively managed so that application performance, reliability, and resource use are guaranteed. Developers need to be able to encapsulate the core calculations as software modules that can be deployed and run on the cluster; these software modules identify and marshal the data required for each calculation and optimize performance by minimizing the data movement and communication overhead.

The SOA programming model provides the specifications and open technologies that enable developers to write service programs and client programs using the widely adopted WCF platform. The Microsoft® Visual Studio® development system provides easy-to-use WCF service templates and service referencing utilities that let developers quickly prototype, debug, and unit-test applications.

Benefits of Windows HPC Server 2008 for SOA Applications

Windows HPC Server 2008 provides a scalable, reliable, and secure interactive application platform that empowers developers to rapidly develop and easily modify cluster-enabled interactive applications.

Getting Started: Building an Application with the SOA Programming Model

Building an SOA application using the SOA programming model consists of three steps:

1. Creating the service.

2. Deploying the service to a cluster.

3. Creating a client application.

Creating the Service

A service in the SOA programming model is defined as a program exposing a collection of endpoints; all communication with a service happens via the service's endpoints. Each endpoint specifies a contract that identifies which methods are accessible via this endpoint, a binding that determines how a client application can communicate with this endpoint, and an address that indicates where this endpoint can be found.

The following steps can be used to create a service:

1. Launch Visual Studio 2008 and create a Class Library project. Name the project EchoService.

2. In the Visual Studio Project Explorer pane, navigate to the EchoService project.

3. Right-click References. An Add Reference dialog appears.

4. In the Add Reference dialog, select the .NET tab.

5. Select System.ServiceModel. This reference is required for writing the WCF Service code.

6. If this reference is not listed on the .NET tab, follow these steps:

a. Click the Browse tab.

b. Locate and select the file System.ServiceModel.dll in %windir%\Microsoft.NET\Framework\v3.0\Windows Communication Foundation.

c. Click OK.

7. In Solution Explorer, navigate to the EchoService item, and then rename the file Class1.cs to EchoService.cs.

8. Open the file EchoService.cs, and then copy and paste the following code into it:

using System;
using System.Collections.Generic;
using System.Text;
using System.Diagnostics;
using System.ServiceModel;

namespace EchoService
{
    [ServiceContract]
    public interface IEchoService
    {
        [OperationContract]
        string Echo(string input);
    }

    [ServiceBehavior(IncludeExceptionDetailInFaults = true)]
    public class EchoService : IEchoService
    {
        #region IEchoService Members

        public string Echo(string input)
        {
            return Environment.MachineName + ":" + input;
        }

        #endregion
    }
}

9. Compile the service to create EchoService.dll. This file should reside in Visual Studio 2008\Projects\EchoService\EchoService\bin\[Debug|Release]\.

Deploying the Service to a Compute Cluster

The following steps can be used to deploy the service DLL file to the compute cluster.

1. Copy the file EchoService.dll to the \Services folder on the local drive of all compute nodes.

2. Register the service DLL on each node in the cluster by creating the EchoService.config file in the %CCP_HOME%ServiceRegistration folder (Default folder is: c:\Program Files\Microsoft HPC Pack\ServiceRegistration):

<?xml version="1.0" encoding="utf-8" ?>
<!-- EchoService.config: registers the EchoService assembly with the SOA runtime.
     The section-handler type attributes below are abbreviated; the full
     assembly-qualified names are provided with the HPC Pack 2008 SDK. -->
<configuration>
  <configSections>
    <sectionGroup name="microsoft.Hpc.Session.ServiceRegistration"
                  type="Microsoft.Hpc.Scheduler.Session.Configuration.ServiceRegistration, Microsoft.Hpc.Scheduler.Session">
      <section name="service"
               type="Microsoft.Hpc.Scheduler.Session.Configuration.ServiceConfiguration, Microsoft.Hpc.Scheduler.Session" />
    </sectionGroup>
  </configSections>
  <microsoft.Hpc.Session.ServiceRegistration>
    <!-- assembly: path to the service DLL on each compute node;
         contract: the WCF contract interface; type: the implementation class. -->
    <service assembly="C:\Services\EchoService.dll"
             contract="EchoService.IEchoService"
             type="EchoService.EchoService" />
  </microsoft.Hpc.Session.ServiceRegistration>
</configuration>

The contract and service attributes are optional if the service defines only one interface; otherwise, specify these values for each interface that the service defines.

Creating a Client Program

Before creating the client program, install the Microsoft® HPC Pack 2008 Client Utilities on the client computer. The following steps can be used to create the EchoService client proxy code from the EchoService DLL.

1. Navigate to the following folder:

Visual Studio 2008\Projects\EchoService\EchoService\bin\[Debug|Release]\

2. Run the command svcutil EchoService.dll. This command generates the WSDL and XSD files for the service.

3. Run the command svcutil *.wsdl *.xsd /async /language:C# /out:EchoClientProxy.cs.

4. Launch Visual Studio® 2008 and create a Console Application project. Name it EchoClient.

5. In the Solution Explorer, navigate to EchoClient and add references to the files Microsoft.Hpc.Scheduler.dll, Microsoft.Hpc.Scheduler.Properties.dll, and Microsoft.Hpc.Scheduler.Session.dll (these files are in C:\Program Files\Microsoft HPC Pack 2008 SDK\bin).

6. In the Solution Explorer/EchoClient pane, add a reference to System.ServiceModel (as described in step 5 of Creating the Service).

7. Add the file EchoClientProxy.cs to the client program.

8. Right-click EchoClient, click Add, and then click Existing Item. Windows Explorer appears.

9. Browse to the folder where the file EchoClientProxy.cs is located, and select it.

10. Click OK.

11. Add the following code to the file Program.cs, and then compile and run the client program:

using System;
using System.Collections.Generic;
using System.Text;
using System.ServiceModel;
using System.Threading;
using Microsoft.Hpc.Scheduler.Session;

namespace EchoClient
{
    class Program
    {
        static void Main(string[] args)
        {
            string scheduler = "localhost";
            string serviceName = "EchoService";

            if (args.Length > 0)
            {
                scheduler = args[0];
                if (args.Length > 1)
                {
                    serviceName = args[1];
                }
            }

            // Create a session object that specifies the head node to which to
            // connect and the name of the WCF service to use.
            // This example uses the default start information for a session.
            SessionStartInfo info = new SessionStartInfo(scheduler, serviceName);
            info.ResourceUnitType = Microsoft.Hpc.Scheduler.Properties.JobUnitType.Node;
            info.MinimumUnits = 1;
            info.MaximumUnits = 4;

            Console.WriteLine("Creating a session...");

            // Create the session by calling the factory method.
            using (Session session = Session.CreateSession(info))
            {
                Console.WriteLine("Session's Endpoint Reference: {0}", session.EndpointReference.ToString());

                // Bind the client proxy to the session using the NetTcp binding
                // (specify only NetTcp binding). The security mode must be
                // Transport, and reliable sessions cannot be enabled.
                EchoServiceClient client = new EchoServiceClient(
                    new NetTcpBinding(SecurityMode.Transport, false),
                    session.EndpointReference);

                AsyncResultCount = 100;

                // The BeginEcho/EndEcho proxy methods are generated in
                // EchoClientProxy.cs; EchoCallback is defined below.
                for (int i = 0; i < 100; i++)
                {
                    // This call does not block; as each result becomes
                    // available, the EchoCallback method is invoked.
                    client.BeginEcho("hello world", EchoCallback, new RequestState(client, i));
                }

                AsyncResultsDone.WaitOne();

                client.Close();
                Console.WriteLine("Please enter any key to continue...");
                Console.ReadLine();
            }
        }

        static int AsyncResultCount = 0;
        static AutoResetEvent AsyncResultsDone = new AutoResetEvent(false);

        // Encapsulates the context of the function callback.
        class RequestState
        {
            int input;
            EchoServiceClient client;

            public RequestState(EchoServiceClient client, int input)
            {
                this.client = client;
                this.input = input;
            }

            public int Input
            {
                get { return input; }
            }

            public string GetResult(IAsyncResult result)
            {
                return client.EndEcho(result);
            }
        }

        static void EchoCallback(IAsyncResult result)
        {
            RequestState state = result.AsyncState as RequestState;

            Console.WriteLine("Response({0}) = {1}", state.Input, state.GetResult(result));

            // When the last outstanding call completes, signal the main thread.
            if (Interlocked.Decrement(ref AsyncResultCount) == 0)
            {
                AsyncResultsDone.Set();
            }
        }
    }
}

Registering the Service

In addition to placing a registration file in each node's local registration folder, the registration files can be kept in a shared folder. To point the cluster at such a share, an administrator can set the cluster-wide CCP_SERVICEREGISTRATION_PATH environment variable, for example:

cluscfg setenvs CCP_SERVICEREGISTRATION_PATH=\\filer\serviceregistration

Maintaining Multiple Versions of a Service

The goal of the service deployment is to enable the XCOPY style of deployment. For each version of the service, a new service registration file needs to be created. The client that uses the new version of the service needs to use a different service name when creating a session. For example, when moving to service version 2, the client application will be changed as follows:

SessionStartInfo info = new SessionStartInfo("headnode", "serviceV2");
info.ResourceUnitType = Microsoft.Hpc.Scheduler.Properties.JobUnitType.Node;

// Create the session by calling the factory method
Session session = Session.CreateSession(info);

Running the SOA Application: Architectural Considerations

The underlying architecture for supporting the SOA programming model and the general steps for running the SOA application are shown in Figure 3.


Figure 3 Interactive sessions through the WCF

The head node enables an administrator to monitor job status, view service usage reports, and view application logs. The compute nodes let the administrator view service performance counters, compute node health, and event logs.

At the back end, the WCF broker node virtualizes the service endpoints, balances requests, collects responses, and grows/shrinks the service pool. The compute nodes track service usage, run the service as the user, restart the service upon failure, and write trace information.

Running the SOA Application

The following steps can be used to run the SOA application on a Windows HPC Server 2008 cluster:

1. The SOA client application initiates a session with the Job Scheduler.

2. The Job Scheduler allocates the compute nodes and starts the service instances (which load the service DLL files) on those nodes through the node manager. Service instances are responsible for hosting endpoints, which are registered on compute nodes. The Job Scheduler allocates a broker node to launch the WCF broker job, using the round-robin strategy when selecting a broker node. At startup, the broker job publishes its endpoint reference by setting the session’s EPR property. The number of broker and service instance processes depends on the session’s resource requirements, node availability, and workload conditions. These requirements are specified by the client application or by the pre-configured administrative scheduler templates, which are customized according to the dependent resource requirements for the usage scenario.

3. The client retrieves the broker node’s EPR from the Job Scheduler.

4. The client sends requests to the broker node.

5. The broker node routes and load-balances service requests between the client and the service instances. Broker nodes also assist the scheduler service with managing service instance lifetimes and the grow/shrink policies for cluster resources.

6. The broker node forwards the responses received from the service instances back to the client application.

The SOA component roles are summarized in Table 7.

Table 7 SOA Component Roles

|Components |Roles |Descriptions |
|WCF Broker |Request forwarder |Stores and forwards request/response messages between the client application and the service instances. |
|Service Instance |Service |Performs computation. |
|Job Scheduler |Resource allocator |Allocates resources to sessions. |
|Node Manager |Job nodal agent/authorization service |Starts the job on the node and authorizes the service. |

When the SOA client application creates a session, the session API creates two jobs: a WCF broker job (started in the broker nodes) and a service job (started on the allocated compute nodes).

Table 8 Broker and Service Jobs in an SOA Session

|Jobs |Programs |Execution Nodes |
|WCF Broker Job: one task |HpcWcfBroker |Broker nodes |
|Service Job: as many tasks as there are allocated units |HpcServiceHost |Compute nodes |

The number of service instances can change during processing, according to the dynamic workload condition of the cluster. As the job is running, the administrator can use the Windows HPC Server 2008 Administrator Console to monitor the heat map (provides an overview of system utilization) of the broker and compute nodes, and use the job manager to monitor the progress and resource usage of the session job. The resource usage of services is logged so that usage reports based on users, projects, or service names can be created.

Recovering from a Node Failure

Occasionally, an HPC cluster can experience the failure of a compute node or a broker node.

Compute Node Failure

If a compute node fails, the outstanding requests sent to that node are re-routed to the remaining service instances. To restore the processing capacity, the WCF broker node requests that the Job Scheduler start a new service instance; the Job Scheduler then determines whether new resources should be allocated to this session based on the available resources, the relative priority of the session compared to other pending sessions, and any other running jobs or sessions. If the request is granted, the new service instance is started and added to the WCF broker node’s service instance pool.

WCF Broker Node Failure

When the WCF broker node fails, the processing disruption is more severe. There are two ways to recover from the WCF broker node failure: server-side initiated recovery or client-side initiated recovery.

• Server side. For a server-side initiated recovery, the WCF broker node must provide transactional semantics for the message exchange between the client application, the broker node, and the services. A server-side initiated recovery negatively impacts performance and adds management complexity for the persistent storage.

• Client side. For a recovery initiated by the client side, the client side must keep track of all unfulfilled messages; the client application re-establishes the session and resends the outstanding messages. Because recovery initiated by the client application does not require transactional semantics or central storage for message persistence, it is more efficient and adds no extra management overhead. This type of recovery is straightforward for an interactive scenario.

The following code shows how the client application can recover from a broker node failure. The client application uses a queue to track the unfulfilled requests. Initially the queue contains all the requests, and the client application retrieves the requests and sends them asynchronously. When a CommunicationException occurs, the client application re-queues the messages into the unfulfilled request queue, re-creates a session, and resumes from where the client left off. The requests can all be sent in spite of broker node failures—thereby achieving reliable message delivery.

This sample code also shows the use of the Anonymous Delegate. The Anonymous Delegate serves as an AsyncCallback function, and provides more concise code by capturing the free variables used in the context; this eliminates the need to create a context class (such as the RequestState class) to stash the variables required for the callback function. It also makes the thread synchronization performed through the ManualResetEvent more readable:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using LongRunningSvcClient.ServiceReference1;
using System.ServiceModel;
using System.Threading;
using System.Diagnostics;
using Microsoft.Hpc.Scheduler.Session;

namespace LongRunningSvcClient
{
    class Program
    {
        static Semaphore outstandingRequests = null;

        static void Main(string[] args)
        {
            EndpointAddress epr = null;
            bool createSession = true;
            int numServiceInstances = 1;
            int maxOutstandingRequests = 10;

            if (args.Length > 0)
            {
                numServiceInstances = int.Parse(args[0]);
            }

            if (args.Length > 1)
            {
                maxOutstandingRequests = int.Parse(args[1]);
            }

            // Queue of unfulfilled requests; initially it contains all requests.
            Queue<int> unfulfilled = new Queue<int>();

            for (int i = 0; i < 40000; i++)
            {
                unfulfilled.Enqueue(i);
            }

            //
            // Loop until all the service calls are completed
            //
            Stopwatch timer = Stopwatch.StartNew();
            long start = 0;

            ManualResetEvent finishedEvt = new ManualResetEvent(false);
            Session session = null;

            for (;;)
            {
                int cnt = unfulfilled.Count;
                finishedEvt.Reset();

                if (createSession == true)
                {
                    Console.WriteLine("Creating session...");
                    session = CreateSession(numServiceInstances);

                    if (session == null)
                        return;

                    epr = session.EndpointReference;
                    createSession = false;
                }

                // Create the client proxy
                Service1Client client = new Service1Client(
                    new NetTcpBinding(SecurityMode.Transport, false),
                    epr);

                client.InnerChannel.OperationTimeout = new TimeSpan(1, 0, 0, 0);

                Console.WriteLine("Proxy created EndpointReference = {0}", epr.ToString());
                bool brokerConnectionBroken = false;

                outstandingRequests = new Semaphore(maxOutstandingRequests, maxOutstandingRequests);
                start = timer.ElapsedMilliseconds;

                while (unfulfilled.Count != 0)
                {
                    int n = unfulfilled.Dequeue();

                    try
                    {
                        // Keep the outstanding requests within [0, maxOutstandingRequests]
                        outstandingRequests.WaitOne();

                        client.BeginSquare(
                            n, delegate(IAsyncResult result)
                            {
                                try
                                {
                                    int reply = client.EndSquare(result);
                                    // Console.WriteLine("Square({0})={1}", result.AsyncState, reply);
                                }
                                catch (CommunicationException)
                                {
                                    // The broker connection is broken; re-queue the request.
                                    unfulfilled.Enqueue((int)result.AsyncState);
                                    brokerConnectionBroken = true;
                                }
                                catch (TimeoutException)
                                {
                                    unfulfilled.Enqueue((int)result.AsyncState);
                                }

                                Interlocked.Decrement(ref cnt);
                                if (cnt == 0)
                                {
                                    finishedEvt.Set();
                                }

                                outstandingRequests.Release();
                            },
                            n); // callback context
                    }
                    catch (CommunicationException)
                    {
                        // Re-queue the request that failed to send, then rebuild the session.
                        unfulfilled.Enqueue(n);
                        brokerConnectionBroken = true;
                        finishedEvt.Set();
                        break;
                    }
                }
                finishedEvt.WaitOne();

                if (unfulfilled.Count == 0)
                    break;

                if (brokerConnectionBroken == true)
                {
                    // Re-create the session on the next pass through the loop.
                    session.Dispose();
                    createSession = true;
                }
            }
            timer.Stop();
            long end = timer.ElapsedMilliseconds;

            Console.WriteLine("throughput is {0}", 40000 / ((end - start) / 1000.0));

            Console.WriteLine("Please enter any key to continue...");
            Console.ReadLine();
        }

        static Session CreateSession(int numServiceInstances)
        {
            SessionStartInfo startInfo = new SessionStartInfo("r25-1183d1002",
                "SquareService1_0");

            #region resource requirements
            startInfo.ResourceUnitType = Microsoft.Hpc.Scheduler.Properties.JobUnitType.Core;
            startInfo.MinimumUnits = 1;
            startInfo.MaximumUnits = numServiceInstances;
            startInfo.Priority = Microsoft.Hpc.Scheduler.Properties.JobPriority.AboveNormal;
            #endregion

            Session session = null;

            try
            {
                session = Session.CreateSession(startInfo);
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
                if (ex.InnerException != null)
                    Console.WriteLine(ex.InnerException.Message);

                return null;
            }

            return session;
        }
    }
}

In a batch scenario, however, there are additional considerations—for example, the client application, which must be kept highly available, may fail after the session has been created. One solution is to submit the client application as a job. The job then restarts if the node running the job should fail, maintaining client application availability. For this scenario, the client application must be able to maintain its checkpoint (when restarted, it must resume from where it left off). To ensure that the client application is fungible (able to run on any compute node), the checkpoint storage (for example, a shared file system or message queuing system) should be accessible from all compute nodes.
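
As a minimal sketch of such a checkpoint, the helpers below persist the IDs of the unfulfilled requests (the same queue used in the recovery sample above) to a file share. The share path and file name are assumptions for illustration; any storage reachable from every compute node (file share, message queue, or database) would work equally well.

using System.Collections.Generic;
using System.IO;
using System.Linq;

// Minimal checkpoint helpers for the batch scenario described above.
// The share path is a placeholder; it must be reachable from all compute nodes.
static class Checkpoint
{
    const string CheckpointFile = @"\\filer\checkpoints\unfulfilled.txt";

    // Persist the IDs of the requests that have not completed yet.
    public static void Save(Queue<int> unfulfilled)
    {
        File.WriteAllLines(CheckpointFile,
            unfulfilled.Select(n => n.ToString()).ToArray());
    }

    // On restart, resume from the saved queue; otherwise start from scratch.
    public static Queue<int> Load(int totalRequests)
    {
        if (File.Exists(CheckpointFile))
        {
            return new Queue<int>(File.ReadAllLines(CheckpointFile).Select(int.Parse));
        }

        return new Queue<int>(Enumerable.Range(0, totalRequests));
    }
}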

Security

Since HPC clusters are becoming mission critical, it is imperative that the underlying processing infrastructure is designed to deliver security.

The service broker supports the standard, interoperable transport, HTTP, and a more efficient transport, TCP. This enables client applications running on third-party platforms to invoke WCF services and lets native Windows clients get the best performance.

Table 9 details the security approaches taken by the SOA system to authenticate and authorize user requests for TCP and HTTP bindings.

Table 9 Security Approaches

|Bindings |Security Approaches |
|Net.TCP |The service broker establishes endpoints on NetTcpBinding with Transport security. Clients are authenticated using Windows integrated security (Kerberos or NTLM), and messages are signed and encrypted. The broker authorizes client applications based on their Windows identity. |
|HTTP |The service broker establishes endpoints on BasicHttpBinding with TransportWithMessageCredential security. Traffic is secured by HTTPS. The broker authenticates client applications by the user name and password passed in message headers, and authorizes them based on their Windows identity. |

To turn off security, set the SessionStartInfo.Secure property to false.
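
For example, a client might disable security as follows. This is a minimal sketch; the pairing of Secure = false with a NetTcpBinding that uses SecurityMode.None mirrors the shared-session consumer code shown later, and the head node and service names are placeholders.

SessionStartInfo info = new SessionStartInfo("headnode", "EchoService");

// Turn off security for the session (SessionStartInfo.Secure is described above).
info.Secure = false;

using (Session session = Session.CreateSession(info))
{
    // With security turned off, the proxy binding uses SecurityMode.None
    // instead of SecurityMode.Transport.
    EchoServiceClient client = new EchoServiceClient(
        new NetTcpBinding(SecurityMode.None, false),
        session.EndpointReference);

    // ... send requests as in the earlier examples ...

    client.Close();
}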

Figure 4 shows the security model for SOA.


Figure 4 Security model for SOA

Sharable Sessions

By default, each client application initiates a new broker on the broker node; this is suitable for applications that require dedicated compute resources for mission-critical, deadline-sensitive workloads. Each service typically primes itself upon startup (placing the data into memory); this helps to ensure a fast response time.

However, there are scenarios where multiple users run low-compute, but data-intensive, applications that require each service request to access a wide range of domain data. Having each client application create its own copy of the services at startup can incur a high startup time and can be cost prohibitive. Sharable sessions provide a solution. The Job Scheduler API lets the job property of a session be queried by jobid or jobname; a session created by one client application can then be shared among and used by other client applications. The main steps are:

1. Creating a shared session (for example, mysharedsession).

2. Starting the broker on the WCF broker node and the service instances on the compute nodes.

3. Getting the EPR by the jobname (in the example, mysharedsession).

Figure 5 shows the steps and the architecture for shared sessions.


Figure 5 Shared sessions

The following code can be used by a producer client application to create a sharable session:

SessionStartInfo startInfo = new SessionStartInfo("HeadNode", "MyService");

// Create a sharable session
startInfo.ShareSession = true;

Session session = Session.CreateSession(startInfo);

// Write the broker job ID to the output
Console.WriteLine("Broker Job Id is {0}", session.BrokerJob.Id);

The following code can be used by the consumer client applications:

IScheduler sched = new Scheduler();
sched.Connect("HeadNode");
ISchedulerJob job = sched.OpenJob(jobId);

// Get the Endpoint Reference of the Broker
EndpointAddress epr = new EndpointAddress(job.EndpointAddresses[0]);

// Create a proxy for the client
Service1Client client = new Service1Client(new NetTcpBinding(SecurityMode.None, false), epr);

Service Instance Resourcing Model

The Job Scheduler allocates the compute nodes and starts the service instances, which are responsible for hosting endpoints registered on the compute nodes. The Service Instance Resourcing model defines how service instances are mapped to computing resources. There are three Service Instance Resourcing models, as shown in Table 10.

Table 10 Service Instance Resource Models

|Resourcing Model |Description |
|One service process per processor |Used to host services that are linked with non-thread-safe libraries. |
|One service process per node |Multithreaded services. |
|One service process per socket |Single-threaded services that are memory-bus intensive. |

Table 11 shows the details for each of the resourcing models.

Table 11 Resource Model Details

|Resourcing Model |Job Scheduling Type |Example |
|One service process per processor |SessionStartInfo.ResourceUnitType = Core |C++ analytics services in capital market firms. |
|One service process per node |SessionStartInfo.ResourceUnitType = Node |Service code that uses multiple processors on a given node. |
|One service process per socket |SessionStartInfo.ResourceUnitType = Socket |Memory-intensive calculation services. |
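
As the table shows, a client selects a resourcing model entirely through the session start information. The following sketch requests one service process per socket; the head node name, service name, and unit counts are placeholders.

SessionStartInfo startInfo = new SessionStartInfo("headnode", "MyMemoryBoundService");

// One service process per socket: suited to single-threaded,
// memory-bus-intensive services (see Table 10).
startInfo.ResourceUnitType = Microsoft.Hpc.Scheduler.Properties.JobUnitType.Socket;
startInfo.MinimumUnits = 1;
startInfo.MaximumUnits = 8;

Session session = Session.CreateSession(startInfo);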

How Broker Dispatches Requests to Service Instances

The number of batched messages the broker node sends to the services is based on service resource unit type and the service throttling behavior.

Table 12 Broker and Service Request Dispatching

|Resourcing Model |Number of Requests Broker Sends to the Service in One Batch |
|Core-wide |1 |
|Node-wide |Number of cores on the node |
|Socket-wide |Number of cores on the socket |

To override the default behavior, configure the ServiceThrottlingBehavior section of your service.dll.config file to specify the maximum concurrent calls a service can take. For example, if you are using the Parallel Extension to write a service and you want to override the default behavior of the node-wide service instance to only receive one request at a time, you can specify the following service behavior in the service.dll.config:

<!-- In service.dll.config: standard WCF service throttling configuration.
     The behavior name is illustrative; the enclosing <configuration> element
     and the association of the behavior with the hosted service follow
     standard WCF configuration conventions. -->
<system.serviceModel>
  <behaviors>
    <serviceBehaviors>
      <behavior name="Throttled">
        <!-- Limit the service instance to one concurrent call. -->
        <serviceThrottling maxConcurrentCalls="1" />
      </behavior>
    </serviceBehaviors>
  </behaviors>
</system.serviceModel>

The broker uses maxConcurrentCalls as the capacity of the service. This lets the administrator or software developer use a standard WCF setting to fine-tune the broker node’s dispatching algorithm to fit the processing capacity of the service.

Broker Configuration Parameters

The following parameters govern the behavior of the broker:

Table 13 Broker Configuration Parameters

|Parameters |Descriptions |Defaults |
|loadSamplingInterval |Service load sampling interval, in milliseconds. |1,000 |
|allocationAdjustInterval |Service resource allocation adjustment interval, in milliseconds. |60,000 |
|allocationGrowLoadRatioThreshold, allocationShrinkLoadRatioThreshold |Let the load be the number of unfulfilled messages in the broker, and the load ratio be 100 * load / (number of service instances * number of cores per instance). The processing capacity is considered appropriate if (allocationShrinkLoadRatioThreshold) < (load ratio) < (allocationGrowLoadRatioThreshold). The broker grows the allocation if (load ratio) > (allocationGrowLoadRatioThreshold), and shrinks the allocation if (load ratio) < (allocationShrinkLoadRatioThreshold). |Grow threshold: 125. Shrink threshold: 75 |
|clientConnectionTimeout |After a session is created, if no client application connects within this timeout period, the session is closed (see Session Life Cycle Model for details). Unit: milliseconds. |300,000 |
|clientIdleTimeout |After a client application connects to a session, if the client application does not send messages within this timeout period, the broker closes the connection (see Session Life Cycle Model for details). Unit: milliseconds. |300,000 |
|sessionIdleTimeout |When all the client applications are idle (timed out), if no more client applications connect within this timeout period, the session is closed (see Session Life Cycle Model for details). Unit: milliseconds. |0 |
|statusUpdateInterval |The timer interval at which the broker publishes service statistics to the job. Unit: milliseconds. |15,000 |
|messageThrottleStartThreshold, messageThrottleStopThreshold |The broker stops receiving request messages from the client if the number of queued messages exceeds messageThrottleStartThreshold, and accepts request messages again when the number of queued messages drops below messageThrottleStopThreshold. |Start threshold: 5,120. Stop threshold: 3,840 |

Session Life Cycle Model

To understand how clientConnectionTimeout, clientIdleTimeout, and sessionIdleTimeout work, it is helpful to understand how the broker manages the session life cycle.


Figure 6 Broker’s session life-cycle model

Figure 6 shows the life-cycle model of a session. After a session is created, it goes through a busy state and an idle state, and ends up in a closed state. If no client application connects within the clientConnectionTimeout period, the session is closed. When a session is in the busy state, if a client application is idle (it sends no messages for longer than the clientIdleTimeout period), the broker closes that client connection. When all the client applications have disconnected, the session is in the idle state; if no client application connects to an idled session within the sessionIdleTimeout period, the session is closed.

These broker configuration parameters can be controlled at two levels:

1. Broker node level

2. Session level

Broker Node Level Settings

To control the broker settings at the per-node level, specify the monitor element of the HpcWcfBroker.exe.config file in the %CCP_HOME%\bin folder as follows:

<!-- The monitor element in HpcWcfBroker.exe.config (enclosing configuration
     sections omitted here); the attribute names and default values correspond
     to the parameters listed in Table 13. -->
<monitor loadSamplingInterval="1000"
         allocationAdjustInterval="60000"
         allocationGrowLoadRatioThreshold="125"
         allocationShrinkLoadRatioThreshold="75"
         clientConnectionTimeout="300000"
         clientIdleTimeout="300000"
         sessionIdleTimeout="0"
         statusUpdateInterval="15000"
         messageThrottleStartThreshold="5120"
         messageThrottleStopThreshold="3840" />

Session Level Settings

The broker node settings can be overridden by the session level settings from the client application code using the session API. For example, the following code sets the client idle timeout to be 1000 seconds:

SessionStartInfo startInfo = new SessionStartInfo("headnode", "servicename");

startInfo.BrokerSettings.clientIdleTimeout = 1000000;

Session session = Session.CreateSession(startInfo);

Monitoring and Managing the SOA Infrastructure

All IT systems need to be maintained efficiently to maintain a high return on investment (ROI) over time. With Windows HPC Server 2008, administrators can effectively monitor user sessions via the Node Management and Job Management Wunderbar in the Administration Console. With Windows HPC Server 2008, an administrator can configure nodes, monitor broker nodes, manage sessions, and troubleshoot run-time problems.

Monitoring the Cluster

Clicking Node Management in the Navigation pane opens the Node Management view in the Administration Console. There are two basic views available in the Node Management center pane:

• List View—Shows node properties and resources in a standard list format.

• Heat Map View—Provides an at-a-glance view of the node health metrics in a heat map format.

For a quick overview of the overall health and status of all nodes (or for a subset of nodes based on the filtering properties), display the nodes as a metrics heat map, as shown in the following figure.


Figure 7 Node state heat map

From the Heat Map view, you can quickly switch to List view, or take action on the node directly. The list of actions available for a selected node is populated in the Actions pane on the right of the Administration Console, or on the shortcut menu. Double-clicking on a selected node opens a dialog box that provides details about the node.

Advanced Monitoring with System Center Operations Manager

Windows HPC Server 2008 provides basic built-in monitoring. It also includes a custom Microsoft® System Center Operations Manager Management Pack (to be made available when the product is released to manufacturing [RTM]) that supports advanced monitoring of Windows HPC Server 2008 clusters within the familiar and extensive System Center enterprise management environment. With System Center Operations Manager, administrators can monitor and aggregate events, receive e-mail alerts, monitor applications, and perform other management tasks.

Enabling a Broker Node

To run SOA applications, it is necessary to enable a broker node. The head node can be configured as a WCF broker node; however, when a node is configured as a WCF broker, it cannot also be a compute node.

To verify that the broker role is enabled on a node, launch the Administration Console. If enabled, WcfBrokerNodes is listed under Groups, as shown in the following figure:


Figure 8 Verifying that broker node is enabled

If the broker node role is not enabled, a WCF broker node can be configured using the following steps:

1. Click Node Management in the Navigation Pane.

2. Navigate to HeadNodes, and then select By Group.

3. On the Actions pane, select Take Offline.

4. Verify in the central Results pane that the state of the node changes to Offline.

5. On the Actions pane, select Change Role.

6. The Change Additional Role dialog is displayed. Choose Select Router Node, and then click OK.


Figure 9 Change Additional Role dialog box

7. On the Action pane, click Take Online.

8. Verify that RouterNode appears under the head node’s Groups column.

Monitoring a Session

When an SOA client application creates a session, the session API creates two jobs: a WCF broker job and a service job. The WCF broker job can only be started on the broker nodes, and the service job can only be started on the compute nodes. If a session uses the core as the allocation type, there are as many service instances as there are cores allocated to the service job.

Monitoring the WCF Broker Node

WCF broker nodes host the critical SOA infrastructure that serves as the intermediary between the client application and the service. As such, the performance of an SOA application is contingent upon their health.

To let an administrator effectively monitor the broker nodes, Windows HPC Server 2008 provides built-in performance counters that address several areas: system, network, and WCF call rates. These performance counters can be viewed from the heat map of the Administration Console and can make it possible for the administrator to determine whether the system is in a critical health condition.

The Heat Map view is shown in the following figure.


Figure 10 Heat map

There is also a List view, as shown in the following figure.


Figure 11 Monitoring a broker node

By viewing the memory usage of the nodes, for example, an administrator can determine whether certain nodes are reaching their memory threshold, rendering them unfit for new jobs. To ensure that no new jobs start on these nodes, the administrator can take the node offline until the node is below the memory threshold.

Monitoring a Service Job

To monitor a service job, select Active Jobs from the Job Management pane, and then click on a running job in the Active Jobs pane. Details of the Job Properties are provided, as shown in the following figure.


Figure 12 Monitoring a service job

Reporting

To view reports of resource usage by service, select Job Resource Usage from the Charts and Reports pane, and then select Service in the Group By list. Details of the Service Resource Usage reports are provided, as shown in the following figure.


Figure 13 Service resource usage report

Troubleshooting and Diagnosing SOA Application Runtime Errors

SOA applications are distributed in nature, and they can present practical challenges for troubleshooting. Sources of errors include application service errors and system configuration issues.

Because services are running on remote compute nodes that are commonly shielded behind a firewall and behind the head node, error conditions are hard to access programmatically from the client application. Windows HPC Server 2008 provides exception propagation, making it possible for service faults to be caught and processed by the client application in a transparent fashion.

To enable exception propagation, apply the ServiceBehavior attribute with IncludeExceptionDetailInFaults set to true to the service class declaration:

    [ServiceBehavior(IncludeExceptionDetailInFaults = true)]

    public class EchoService : IEchoService

    {

        #region IEchoService Members
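
On the client side, a propagated service fault surfaces as a typed WCF fault that can be caught alongside ordinary communication errors. The following sketch assumes the EchoServiceClient proxy from the earlier example and standard WCF fault types; it is illustrative rather than part of the original sample.

try
{
    string reply = client.Echo("hello world");
}
catch (FaultException<ExceptionDetail> fault)
{
    // With IncludeExceptionDetailInFaults enabled on the service,
    // the fault carries the service-side exception details.
    Console.WriteLine("Service fault: {0}", fault.Detail.Message);
}
catch (CommunicationException ce)
{
    // Transport-level or broker-level failures are reported separately.
    Console.WriteLine("Communication error: {0}", ce.Message);
}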

Because services can be deployed in an out-of-band fashion and because multiple bindings and topologies for the broker and compute nodes are supported in Windows HPC Server 2008, services may be misdeployed and the system may be misconfigured, resulting in potential application runtime failure.

Application exceptions alone cannot provide detailed diagnostic information, because the client application often does not have privileges to access system information. Windows HPC Server 2008, therefore, provides two diagnostic tools to let the administrator effectively troubleshoot the system:

• Service repository test

• Service model test

Service Repository Test

With the service repository test, an administrator can determine which services are installed on particular nodes. The test report contains two sections: a summary section and a details section.

The summary section displays a table of services and their registered nodes. Using this section, the administrator can verify whether a service has been successfully deployed on computers that are accessible to users.

The details section shows the path, the service, and the contract type of each service for each node. This effectively serves as a post-deployment validation, as shown in the following figure.


Figure 14 Service repository test results

Service Model Test

The service model test checks the system configuration and run-time performance of the SOA infrastructure so that the administrator can ensure that the system is ready to run SOA workloads and can determine if the system has any bottlenecks.

To run the test, perform the following steps:

1. Navigate to the page and select Service Model Test.

2. Click Run test. A node selection dialog appears.

3. Select the nodes to run the test and click OK.

4. Navigate to the Show Result page, as shown in the following figure.


Figure 15 Broker service test

Advanced Programming Topics

Throttling Requests

Given the asynchronous nature of the client programming model, an application can potentially be memory-demanding if the size of the data or the number of messages is very large. To control the memory footprint of these applications, the client application can throttle the requests by sending batches of requests at a time. In doing so, both the client-side and the broker-side memory usage can be made efficient and effective.

For throttling to work, the client application uses a semaphore to control the number of outstanding requests that it issues. In the following sample code, the client application keeps the number of outstanding requests to 10; the sending thread blocks further requests when 10 requests are outstanding. The sending thread resumes sending when it receives a signal from the receiving threads:

// Create a semaphore that can satisfy up to 10 concurrent requests. Use an
// initial count of 10 so that initially the sending thread (main program) can
// send up to 10 requests.
outstandingRequests = new Semaphore(10, 10);

SessionStartInfo info = new SessionStartInfo(scheduler, serviceName);

using (Session session = Session.CreateSession(info))
{
    int i;
    NetTcpBinding binding = new NetTcpBinding(SecurityMode.Transport, false);
    EchoSvcClient client = new EchoSvcClient(binding, session.EndpointReference);

    // set the timeout to 1 day
    client.InnerChannel.OperationTimeout = new TimeSpan(1, 0, 0, 0);

    AsyncResultCount = 100;

    for (i = 0; i < 100; i++)
    {
        // Enter the semaphore. This call blocks if there are 10 outstanding
        // requests, until the receiving thread signals it from the callback
        // function EchoCallback().
        outstandingRequests.WaitOne();

        client.BeginEcho("hello world", EchoCallback,
            new RequestState(client, i));
    }

    AsyncResultsDone.WaitOne();
    client.Close();
}

// receiving thread entry point
static void EchoCallback(IAsyncResult result)
{
    RequestState state = result.AsyncState as RequestState;

    // Complete the call and consume the result (EndEcho is called inside
    // RequestState.GetResult, as in the earlier example).
    Console.WriteLine("Response({0}) = {1}", state.Input, state.GetResult(result));

    // Release a slot so that the sending thread can issue another request.
    outstandingRequests.Release();

    if (Interlocked.Decrement(ref AsyncResultCount) == 0)
    {
        AsyncResultsDone.Set();
    }
}

     
