Leveraging Windows Server AppFabric Caching for Common Data

Microsoft Corporation

Published: January 2011

Abstract

This paper provides a technical overview of how to integrate Windows HPC Server 2008 R2 with Windows AppFabric Caching to support distributed workloads that require large data sets to be shared among the compute nodes on an HPC cluster. Specifically, this addresses workloads that conform to the service-oriented architecture (SOA) model of execution and use the SOA programming model for distributed computation.

This paper describes how to add cache hosts to your HPC cluster and includes code samples of SOA clients and services that work with cached data. This paper also provides performance data that you can use as a guideline for setting up and monitoring cache hosts and developing custom applications.

Copyright Information

This white paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places, and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, e-mail address, logo, person, place, or event is intended or should be inferred.

© 2010 Microsoft Corporation.
All rights reserved.

Microsoft, Windows, Windows Server, and AppFabric are registered trademarks of Microsoft Corporation in the United States and/or other countries.

The names of actual companies and products mentioned herein may be the trademarks of their respective owners.

Contents

Introduction
AppFabric Caching architecture
Installation and configuration
  Configure the cache hosts
  Configure the compute nodes and clients
  Additional considerations for AppFabric configuration
    High Availability
    Security Model
  Manage the cache cluster with HPC Cluster Manager
Working with cached data
Code Samples
  Data caching client
  Code Sample: Data Caching Service
  Additional considerations for application development
Performance analysis
  Baseline – 10 cache hosts, Gigabit Ethernet
  Larger cache cluster – 20 cache hosts, Gigabit Ethernet
  Faster network – 10 cache hosts, 10 Gigabit IB
  Compression – 10 cache hosts, Gigabit Ethernet
  Core and socket allocation
  Multiple clients
Best practices
  Infrastructure
  Application

Introduction

This paper provides a technical overview of how to integrate Windows HPC Server 2008 R2 with Windows AppFabric Caching to support distributed workloads that require large data sets to be shared among the compute nodes on an HPC cluster. Specifically, this addresses workloads that conform to the service-oriented architecture (SOA) model of execution and use the HPC SOA programming model for distributed computation.

Many SOA applications work with a large data set that is shared by all services. For example, a stock risk analysis program can run a large number of simulations against a set of historical stock market data. In this example, each SOA request can have different parameters governing the analysis, but all analyses use the same historical data. This common data must be communicated to the services that are running on the compute nodes. Because the data set is static, it is inefficient to transfer this data within each SOA request, of which there can be thousands. A good implementation option for distributing common data to the SOA services is to pre-stage the data into an in-memory cache that is hosted by Microsoft AppFabric Caching. The cached data can then be retrieved once per compute node instead of passing a copy of the data once per message.

AppFabric Caching architecture

While detailing the entire AppFabric Caching functionality is beyond the scope of this document, you must understand certain aspects to effectively install, configure, and use an AppFabric cache within a SOA solution.

The cache is hosted on a group of servers that is referred to as a cache cluster. The cache cluster can be external to the HPC cluster and managed independently, or added to the HPC cluster as offline compute nodes and managed by using HPC Cluster Manager.
After these servers have been configured properly, they are abstracted through the AppFabric Caching API and accessed as a single resource. Objects placed in the cache are hosted in the memory of one of the servers in the cache cluster, and requests for these objects are routed to the server hosting each object.

The diagram below illustrates how a group of cache hosts can be used as part of an HPC cluster. The four steps in the diagram describe how the cached data and SOA messages interact.

Installation and configuration

The following steps walk through the installation and configuration of the AppFabric cache cluster and all cache clients (HPC compute nodes and application clients). Note that all of these computers must support AppFabric Caching installation. The supported operating system versions are as follows:

- Windows XP SP3
- Windows Vista SP1 or later
- Windows 7
- Windows Server 2003 SP2
- Windows Server 2008 (except for Core or HPC editions)
- Windows Server 2008 R2 (except for Core or HPC editions)

The HPC edition of Windows Server does not support AppFabric Caching because it does not include the Application Server role, which is required. To use AppFabric Caching in an HPC cluster, the HPC Pack must be installed on a Windows Server Standard, Foundation, Datacenter, or Enterprise edition operating system.

Configure the cache hosts

The first step in using AppFabric Caching is to designate some servers to act as cache hosts. The cache hosts must be accessible to the client computers and the HPC cluster nodes (the same network requirements that WCF broker nodes have). You will configure one of the cache hosts to administer the cache. Data that is placed in the cache will be spread out across these servers. The following steps describe how to install an AppFabric cache across these servers.

Note: If you added the cache hosts to your HPC cluster as offline compute nodes or broker nodes, you can use the clusrun command-line utility for concurrent installation and configuration. For more details at any stage, see the AppFabric resources on MSDN.

Install .NET 4 on all servers

1. Download the Microsoft .NET Framework 4 Standalone Installer. This installer can be run as an administrator via clusrun with the /q option.

On one server

A minimum of one server is required to create a cache cluster, and at least one server must be configured with the cache administration role to enable maintenance and further configuration.

1. Install Windows Server AppFabric with the Caching Service, Caching Client, and Cache Administration roles.
   - Download the Windows Server AppFabric Standalone Installer. Windows Server 2008 R2 and Windows 7 should use version 6.1 of the installer, while prior operating system versions should use version 6.0.
   - You can use the installation wizard to install the required features (see Install AppFabric) or follow the Automated Installation Instructions using clusrun.
2. Create a new cache cluster.
   - You can use a script to create a new cache cluster (see the Automation Sample) or use the cache cluster configuration wizard.

On any additional servers

Each additional server in the cache cluster adds to the pool of memory available to host the cache and enables the sharing of object hosting responsibilities. Additional servers also provide redundancy.
1. Install Windows Server AppFabric with the Caching Service and Caching Client roles.
   - Use the links above to install, and add each server to the cache cluster.

Log on to the server that has the Cache Administration feature installed and perform the following tasks:

1. Start Windows PowerShell Modules as an administrator. In the Start menu search box, type PowerShell, and then in the search results click Windows PowerShell Modules.
2. Run the following cmdlets:
   Use-CacheCluster
   Start-CacheCluster
   Get-CacheHost
3. In the cmdlet output, verify that each server that you added to the cache cluster is listed and has a service status of "UP".
4. Grant access to any users that need access to the cache:
   Grant-CacheAllowedClientAccount

For additional help with PowerShell cmdlets, see the MSDN resources or run Get-Help *-cache*.

Configure the compute nodes and clients

1. Install the Windows Server AppFabric Caching Client. Use the Windows Server AppFabric installer and select only the Caching Client feature.

Additional considerations for AppFabric configuration

High Availability

AppFabric Caching supports reliable, high-availability caching by configuring the cache to host replicas of each object in the cache. The number and scope of replicas can be configured on a per-cache basis. See the AppFabric Caching High Availability documentation for details.

Security Model

User account access is configurable via the PowerShell cmdlets noted in the cache host configuration ([Get/Grant/Revoke]-CacheAllowedClientAccount(s)). Transport security is used for securing the channel over the network. Security can also be turned off to allow any client account to access the cache.

A good way to provide access to all cluster users is to add the HPCUsers account as an allowed client. This allows the cluster administrator to maintain an access list of allowed clients through HPC Cluster Manager.

Note: There is no model for allowing partial access to the cache. Users either have full access or no access. It is possible to build a more robust security layer over the AppFabric Cache APIs, but this is out of scope for this document.

Manage the cache cluster with HPC Cluster Manager

As mentioned above, you can use HPC Cluster Manager to monitor and manage the servers that host the AppFabric cache cluster. HPC Cluster Manager provides a remote command-line invocation tool (clusrun) and centralized remote desktop access. In addition, the cache hosts can be monitored through the cluster heat map, where many metrics such as CPU, memory, and network usage can be observed in a unified interface.

The recommended way to manage cache hosts within HPC Cluster Manager is to install and configure them as Offline compute nodes or broker nodes. Nodes that are marked as Offline are not used to run jobs or sessions, but are still accessible on the cluster networks. This prevents jobs from being scheduled on the dedicated cache hosts while still providing the administration and monitoring benefits of HPC Cluster Manager. You can use Online compute nodes, but any allocated jobs would contend with the cache hosting services, which can be very resource intensive under high load. This would result in substandard performance for both the cache and any job sharing the cache host node.
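Once the cache cluster is running and the clients are configured, a quick round trip from any client machine confirms end-to-end connectivity. The following is a minimal sketch, assuming a cache host named CACHEHOST01 and the default cache; substitute the names from your environment.

using System;
using Microsoft.ApplicationServer.Caching;

class CacheSmokeTest
{
    static void Main()
    {
        // Point the client at one cache host; requests are routed to the
        // host that owns each object automatically. The host name and
        // cache name below are placeholders for your environment.
        DataCacheServerEndpoint[] servers = new DataCacheServerEndpoint[1];
        servers[0] = new DataCacheServerEndpoint("CACHEHOST01", 22233);
        DataCacheFactoryConfiguration config = new DataCacheFactoryConfiguration();
        config.Servers = servers;

        using (DataCacheFactory factory = new DataCacheFactory(config))
        {
            DataCache cache = factory.GetCache("default");

            // Round-trip a small object to confirm the cluster is reachable.
            cache.Put("SmokeTestKey", "SmokeTestValue");
            Console.WriteLine("Cache returned: {0}", (string)cache.Get("SmokeTestKey"));
            cache.Remove("SmokeTestKey");
        }
    }
}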
Working with cached data

The common data model for data caching essentially has two phases: one phase adds data to the cache and another phase retrieves data from the cache. The process that adds data to the cache can be the same process as the SOA client, or a separate process that distributes data to the HPC cluster. Likewise, the process that retrieves cached data can be the same process as the SOA service, or a separate process that runs on the compute node and retrieves data from the cache.

The simplest model is to add data in the SOA client process, and to retrieve data in the SOA service process. In this model, the SOA client process performs the following steps:

1. Load data into memory.
2. Preprocess data. For example, structure the data into multiple objects (either logically or for object sizing) and compress the data before caching.
3. Establish a connection to the cache and add the data.
4. Establish a SOA session and send requests. The name of the cached data can be provided through message parameters or environment variables, or specified explicitly in the service code.
5. Receive results.

The SOA service performs the following steps:

1. Establish a connection to the cache and retrieve cached data once per service instance. The location and name of the cached objects may be supplied by the client or explicitly set in the service code. In the service code, this can be done in the constructor, during the first service operation, or split between both places.
2. Post-process the data, if necessary. For example, decompress or aggregate the cached data before processing requests.
3. Save the retrieved data in local memory and service each subsequent request using the local copy of the data.

This model is not appropriate for all applications. Sometimes the user submitting the SOA job might not have access to the data locally. In this case, a separate client could be tasked with processing and caching the data so that it is available to the SOA service whenever a user submits a SOA job. An example of this would be a financial company that maintains a set of historical stock data that is uploaded to the cache once a day by an administrator, rather than by each user submitting jobs that use that data.

Alternatively, some applications may want to retrieve the cached data once and keep it in a local store for any SOA services to access without having to go to the cache multiple times. An example of this would be any SOA session that starts multiple service instances per compute node (with Core allocation, for instance). These services can more efficiently retrieve data from a local service than from a remote one, and copying the data only once per compute node reduces network contention on the cache hosts relative to each service retrieving the cached data.

Code Samples

The code samples in this section include a SOA client that caches data and then uses session environment variables to communicate the location of the data to the SOA service. The SOA service uses this information to retrieve the data.

Note: An end-to-end sample that demonstrates how to use AppFabric Caching is publicly available online. This sample contains client, service, and support code with some advanced options for improving performance through the use of compression and intra-node data sharing. To download the code sample, see the data caching sample in the Microsoft Download Center.

Data caching client

The following code sample demonstrates how to establish a connection to a cache cluster and add a set of objects to a named cache. In this case, we add a large array divided into N pieces of constant size. We add some additional items to inform the service about the nature of the data.
A retry loop provides a simple mechanism for fault tolerance on a congested network.

private static bool PutDataInCache()
{
    // Connect to cache server - assuming default port selection.
    DataCacheServerEndpoint[] servers = new DataCacheServerEndpoint[1];
    servers[0] = new DataCacheServerEndpoint(cacheNode, 22233);
    DataCacheFactoryConfiguration config = new DataCacheFactoryConfiguration();
    config.Servers = servers;

    DataCache cacheClient = null;

    // Gain access to cache. Intermittent failures can be recoverable,
    // so retry in certain conditions.
    int retryCount = 5;
    do
    {
        try
        {
            // Get access to test cache. Follow included directions
            // for creating this cache.
            using (DataCacheFactory factory = new DataCacheFactory(config))
            {
                cacheClient = factory.GetCache(cacheName);
                cacheClient.CreateRegion(cacheRegion);

                // Put data from file into cache
                using (MemoryStream stream = new MemoryStream(fileContents))
                {
                    byte[] fileSegment = new byte[segmentSize];
                    for (int i = 0; ; i++)
                    {
                        int bytes = stream.Read(fileSegment, 0, segmentSize);
                        if (bytes == segmentSize)
                        {
                            cacheClient.Put(dataRoot + i, fileSegment, cacheRegion);
                        }
                        else if (bytes > 0)
                        {
                            // Only add the portion that is actually data
                            byte[] croppedFileSegment = new byte[bytes];
                            System.Buffer.BlockCopy(
                                fileSegment, 0, croppedFileSegment, 0, bytes);
                            cacheClient.Put(
                                dataRoot + i, croppedFileSegment, cacheRegion);
                        }
                        else
                        {
                            // All data cached
                            dataSegments = i;
                            break;
                        }
                        cachedBytes += bytes;
                    }
                }

                // Put configuration into cache at known locations
                cacheClient.Put(dataRoot + "SegmentSize", segmentSize, cacheRegion);
                cacheClient.Put(dataRoot + "Segments", dataSegments, cacheRegion);
                cacheClient.Put(dataRoot + "Bytes", totalBytes, cacheRegion);
                retryCount = -1;
            }
        }
        catch (DataCacheException dex)
        {
            if (dex.ErrorCode.Equals(DataCacheErrorCode.RetryLater) ||
                dex.ErrorCode.Equals(DataCacheErrorCode.Timeout))
            {
                // Retry if error is a timeout or explicit "RetryLater"
                Console.WriteLine("Warning: {0}. Retrying.", dex.Message);
                retryCount--;
                Thread.Sleep(1000);
            }
            else
            {
                // Fail in other circumstances
                Console.Error.WriteLine("Error: {0}", dex);
                retryCount = 0;
            }
        }
    } while (retryCount > 0);

    // End if failed to connect to cache and write data.
    if (retryCount == 0)
    {
        return false;
    }

    return true;
}

The following code creates a session and uses session environment variables to inform the service about where the data is located.

// Add cache access information in environment variables
SessionStartInfo info = new SessionStartInfo(headnode, "DataCachingService");
info.Environments.Add("CacheAccessNode", cacheNode);
info.Environments.Add("CacheName", cacheName);
info.Environments.Add("DataName", dataRoot);

// Create a cluster session and pass the cache node to the services
session = Session.CreateSession(info);
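After the session is created, requests are sent through the standard HPC SOA client pattern. The sketch below is illustrative rather than part of the downloadable sample: the IDataCachingService contract and the AnalyzeRequest/AnalyzeResponse message types are assumed placeholders for the proxy types generated from the deployed service.

// Illustrative request loop; contract and message types are placeholders.
using (BrokerClient<IDataCachingService> client =
    new BrokerClient<IDataCachingService>(session))
{
    // Each request carries only its own parameters; the large common
    // data set is already waiting in the AppFabric cache.
    for (int i = 0; i < requestCount; i++)
    {
        client.SendRequest<AnalyzeRequest>(new AnalyzeRequest(i));
    }
    client.EndRequests();

    foreach (BrokerResponse<AnalyzeResponse> response in
        client.GetResponses<AnalyzeResponse>())
    {
        Console.WriteLine(response.Result.AnalyzeResult);
    }
}
session.Close();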
Code Sample: Data Caching Service

The following set of code samples demonstrates how to retrieve and post-process the data that was added to the cache in the client code sample. The following code reads the environment variables that were set by the client to find out where the data is located.

// Get name of node hosting the cache
string cacheAccessNode = Environment.GetEnvironmentVariable("CacheAccessNode");
if (string.IsNullOrEmpty(cacheAccessNode))
{
    string errorMsg = "CacheAccessNode environment variable must be set.";
    Console.Error.WriteLine(errorMsg);
    throw new InvalidOperationException(errorMsg);
}

// Get the name of the cache to attach to
string cacheName = Environment.GetEnvironmentVariable("CacheName");
if (string.IsNullOrEmpty(cacheName))
{
    string errorMsg = "CacheName environment variable must be set.";
    Console.Error.WriteLine(errorMsg);
    throw new InvalidOperationException(errorMsg);
}

// Get the name of the data to get from the cache
string dataRoot = Environment.GetEnvironmentVariable("DataName");
if (string.IsNullOrEmpty(dataRoot))
{
    string errorMsg = "DataName environment variable must be set prior to starting a DataCachingService session.";
    Console.Error.WriteLine(errorMsg);
    throw new InvalidOperationException(errorMsg);
}

// Get the name of the cache region to use
string cacheRegion = Environment.GetEnvironmentVariable("RegionName");
if (string.IsNullOrEmpty(cacheRegion))
{
    string errorMsg = "RegionName environment variable must be set prior to starting a DataCachingService session.";
    Console.Error.WriteLine(errorMsg);
    throw new InvalidOperationException(errorMsg);
}

The environment variable values can then be used to connect to the cache cluster.

// Connect to cache server - assuming default port selection.
DataCacheServerEndpoint[] servers = new DataCacheServerEndpoint[1];
servers[0] = new DataCacheServerEndpoint(cacheAccessNode, 22233);
DataCacheFactoryConfiguration config = new DataCacheFactoryConfiguration();
config.Servers = servers;

DataCacheException ex = null;
DataCache cacheClient = null;
DataCacheFactory factory = null;

// Gain access to cache.
// Intermittent failures can be recoverable, so retry in certain conditions.
int retryCount = 5;
do
{
    try
    {
        // Get access to test cache.
        factory = new DataCacheFactory(config);
        cacheClient = factory.GetCache(cacheName);
        retryCount = -1;
    }
    catch (DataCacheException dex)
    {
        if (dex.ErrorCode.Equals(DataCacheErrorCode.RetryLater) ||
            dex.ErrorCode.Equals(DataCacheErrorCode.Timeout))
        {
            // Retry if error is a timeout or explicit "RetryLater" error.
            Console.WriteLine("Warning: {0}. Retrying.", dex.Message);
Retrying.", dex.Message); retryCount--; Thread.Sleep(1000); } else { // Fail in other circumstances Console.Error.WriteLine("Error: {0}", dex); retryCount = 0; } ex = dex; }}while (retryCount > 0);Finally, the data can be retrieved from the cache and the original array rebuilt for use by subsequent requests.try{ byte[] cachedData = null; // Attempt to access data and store it in memory for subsequent calls bool retry = false; do { try { // Get segment and compression information from cache // Get number of segments int segments = (int)cacheClient.Get(dataRoot + "Segments"); // Get segment size int segmentSize = (int)cacheClient.Get(dataRoot + "SegmentSize"); // Get number of bytes uncompressed int bytes = (int)cacheClient.Get(dataRoot + "Bytes"); // Get data element from cache cachedData = new byte[bytes]; for (int segment = 0; segment < segments; segment++) { byte[] cachedDataSegment = (byte[])cacheClient.Get( dataRoot + segment); System.Buffer.BlockCopy( cachedDataSegment, 0, cachedData, segment * segmentSize, cachedDataSegment.Length); } retry = false; } catch (Microsoft.ApplicationServer.Caching.DataCacheException dex) { // Record any errors and retry under certain conditions if (dex.ErrorCode.Equals(DataCacheErrorCode.RetryLater) || dex.ErrorCode.Equals(DataCacheErrorCode.Timeout)) { Console.WriteLine(dex.ToString()); retry = true; System.Threading.Thread.Sleep(1000); } else { Console.Error.WriteLine(dex.ToString()); throw new FaultException(dex.ToString()); } } catch (Exception dex) { Console.Error.WriteLine(dex.ToString()); throw new FaultException(dex.ToString()); } } while (retry);}finally{ // Clean up cache factory factory.Dispose();} Additional considerations for application developmentMultiple data sets within a sessionThe samples included above show how to cache common data on a per-session basis. There is a large amount of flexibility when writing SOA clients and services however, and alternative access patterns are supported. For example, one service may process multiple batches of requests where each batch uses different data. In this case, the SOA request could contain a data identifier which could be used to determine whether the data has already been retrieved from a cache.Data expiration and clean upAnother application-specific concern is cached data expiration and removal. AppFabric Caching primarily relies on a specified time-to-live parameter which can be set when each item is added to the cache. Additional details on this parameter and how AppFabric Caching handles cases like running out of memory can be found in the MSDN documentation. Applications may also handle removal manually using the ‘Remove’ command.Aliasing avoidanceTo avoid data name aliasing when multiple users are using the same cache, data elements should be uniquely named. Regions can also guarantee uniqueness, but data within a region resides only on a single cache host, which does not allow taking advantage of multiple cache hosts.Java interoperabilityAppFabric currently does not expose APIs outside of .NET. As such, support for Java SOA clients is out of scope for this document.Performance analysisTo provide some indication of the time required to distribute data to a set of compute nodes, some preliminary performance numbers have been gathered on a 200 Node Windows HPC Server 2008 R2 cluster composed of 8-core 1.86 GHz Intel Xeon processors with 8GB RAM per node. 
Performance analysis

To provide some indication of the time required to distribute data to a set of compute nodes, preliminary performance numbers were gathered on a 200-node Windows HPC Server 2008 R2 cluster composed of 8-core 1.86 GHz Intel Xeon processors with 8 GB RAM per node. These numbers are intended to provide a guideline for expected performance, but will change based on environment and application-specific access patterns.

Several variables are of particular interest when evaluating AppFabric Caching for data distribution:

- Size of data
- Number of cache hosts
- Number of compute nodes
- Speed of network
- Whether compression is used

The methodology for the subsequent tests is to create a SOA job with a certain number of nodes allocated, add a certain size of data to the cache, and send messages to the cluster, forcing each node to retrieve the cached data. The messages each contain 1 KB of data to simulate a real-world application, and the service operations return immediately once the cached data is accessed. This end-to-end time is then compared to the time required to process the same messages without retrieving any cached data. The difference is the time required for all the compute nodes to retrieve the cached data.

These tests were performed with node allocation to measure the time to fetch data into a single service host running on a compute node. Using multiple service hosts per compute node (core or socket allocation) results in a fixed penalty based on the number of service hosts per node and the size of data rather than the cluster environment. This penalty is discussed after the node allocation results. The number of cache hosts and the network are held constant for the duration of each test to allow effective comparison of their effects against a baseline. In this case, a set of 10 cache hosts with a 1 Gigabit Ethernet private network was selected as the baseline system.

Baseline – 10 cache hosts, Gigabit Ethernet

The number of compute nodes allocated to a job is specified on the X-axis. The Y-axis reports the time to access some cached data, the size of which varies for each trend line as shown in the legend. The 190-node data access time is shown numerically in the following table.

Data Size                      16 MB     64 MB     256 MB    1 GB
Data Access Time (Seconds)     3.45      12.93     67.89     284.78

The time to access each size of data increases approximately linearly as the number of compute nodes increases, with the exception of the 16 MB line, which stays nearly flat. Increasing the size of data causes the slope of the data access time to increase at approximately the same rate. This correlation points to a saturation of the network on the cache hosts providing the data to many requesting compute nodes, which is confirmed when monitoring network activity on the cache hosts.

Larger cache cluster – 20 cache hosts, Gigabit Ethernet

One way to reduce the load on the cache hosts is to increase their number. This allows the cached data to be spread out among more hosts and therefore reduces the demands on each. The graph below is laid out the same as above, except that all measurements were taken using 20 cache hosts rather than 10.

Close examination reveals that the curves have both lowered in absolute time and decreased in slope. Comparing the 190-node time numerically to the baseline yields the following results.

Data Size                      16 MB     64 MB     256 MB    1 GB
Data Access Time (Seconds)     3.62      8.75      44.89     203.66
Improvement Over Baseline      -5.07%    32.27%    33.89%    28.49%

The benefit to larger data sizes is very apparent, and validates the bottleneck in hosting resources observed in the baseline performance data. This result shows that adding hardware resources in the form of additional cache hosts can help spread out data access and improve performance when the network is a bottleneck.
However, it is also apparent that the absolute performance improvement is not in line with the increase in resources. A full 100% resource increase only improved access time by roughly 30% for large data sizes, and actually hurt access time for small pieces of data, where the added overhead of forwarding requests to more hosts and managing their behavior offset any gain in data spreading.

Faster network – 10 cache hosts, 10 Gigabit IB

Another strategy for dealing with network saturation is to improve the speed of the network. Our test cluster has 10 Gigabit IB links connecting all the compute nodes and cache hosts, so it was possible to transition to a 10 Gbps connection while keeping the rest of the system constant. The graph below again shows the same set of data points as the baseline.

Again, this change produces a reduction in the time required to access the cached data. The numerical change for 190 nodes is shown in the following table.

Data Size                      16 MB     64 MB     256 MB    1 GB
Data Access Time (Seconds)     2.47      8.25      39.10     173.14
Improvement Over Baseline      28.38%    36.14%    42.41%    39.20%

When improving the network performance overall, there is an improvement across the range of data sizes, with even small pieces of data exhibiting gains. Comparing this result to the use of 20 cache hosts over Gigabit Ethernet, the faster network outperforms the additional cache hosts. A change in network infrastructure may be more complex and costly than adding additional cache hosts, and therefore it is important to balance the costs and benefits of each option.

Compression – 10 cache hosts, Gigabit Ethernet

Depending on the application, compression prior to caching may result in reduced data access time. The benefit will be in direct proportion to how compressible the data is. Spending the time to compress data from 1 GB to 950 MB is probably not worth the effort, while compressing 1 GB to 256 MB can be very valuable. To demonstrate the potential improvement, a real insurance customer's common data was compressed to 27.7% of its original size. Performing compression on this data, caching the result, and decompressing the data upon retrieval resulted in the following behavior.

This graph shows that the time to compress, cache, and decompress data, when it can be compressed to 27.7% of its original size, results in an impressive reduction in the time required to distribute the data to the services running on the compute nodes. Numerical comparisons for 190 nodes are in the following table.

Data Size                                   16 MB     64 MB     256 MB    1 GB
Compression/Decompression Time (Seconds)    0.30      1.21      4.67      18.94
Total Data Access + C/D Time (Seconds)      2.57      6.30      24.45     84.27
Improvement Over Baseline                   25.46%    51.24%    63.98%    70.41%

It is apparent from these improvements that compression can provide a substantial reduction in the time required to distribute data, especially large quantities of data, using AppFabric Caching. The data access time improvement would look much different if the data were not amenable to compression, as the cost of compressing the data would still be paid while the time required to access the data would remain constant.
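The paper does not prescribe a specific codec; as one hedged illustration, the standard GZipStream class available in .NET 4 can compress a buffer before it is segmented and cached, and decompress it after retrieval.

using System.IO;
using System.IO.Compression;

static byte[] Compress(byte[] data)
{
    using (MemoryStream output = new MemoryStream())
    {
        using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
        {
            gzip.Write(data, 0, data.Length);
        }
        // ToArray is valid after the GZipStream has closed and flushed.
        return output.ToArray();
    }
}

static byte[] Decompress(byte[] compressed)
{
    using (MemoryStream input = new MemoryStream(compressed))
    using (GZipStream gzip = new GZipStream(input, CompressionMode.Decompress))
    using (MemoryStream output = new MemoryStream())
    {
        gzip.CopyTo(output); // Stream.CopyTo is available in .NET 4.
        return output.ToArray();
    }
}

The client would call Compress before adding the data to the cache, and the service would call Decompress after reassembling the segments; the actual ratio, and therefore the benefit, depends entirely on the data.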
Core and socket allocation

To perform core or socket allocation without retrieving multiple copies of the data from the AppFabric cache, a node preparation task retrieves the cached data and stores it in a local Windows service on each compute node. Each service host then queries the Windows service for a copy of the data, which is communicated over a named-pipe WCF connection. This adds a fixed cost to the time required for all service hosts to share the data, which depends on the number of service hosts running on each node and how much data is being shared. (A sketch of such a local data service appears at the end of this section.)

The following table shows the time required, in seconds, for each service host to retrieve the data from the local service.

             Service Hosts on Each Compute Node
Data Size    2          4          8
16 MB        1.35 s     1.42 s     1.64 s
64 MB        4.60 s     5.08 s     5.71 s
256 MB       19.96 s    21.00 s    23.81 s
1 GB         75.03 s    80.66 s    93.33 s

The data access time across the cluster when using core allocation, allowing eight service hosts per compute node, is shown below. The penalty to share data among multiple service hosts on a node can be high, but the cost of the additional concurrent cache accesses that would be required if each service host retrieved data from the cache is much more severe. Extrapolating linearly from the baseline performance suggests that 1520 service hosts each retrieving one gigabyte of data from the AppFabric cache would result in a data access time of approximately 1600 seconds, or 4.25 times slower than storing the data locally. This estimate is very conservative, as the additional congestion on the cache hosts is likely to further delay responses to cache requests.
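The SDK sample implements the local data service; as a hedged, self-contained illustration of the idea (the names, endpoint address, and quotas below are assumptions, not the sample's API), a node-local WCF service over named pipes might look like the following.

using System;
using System.Collections.Generic;
using System.ServiceModel;

// Illustrative contract for a node-local data store.
[ServiceContract]
public interface ILocalDataStore
{
    [OperationContract]
    byte[] GetData(string name);
}

[ServiceBehavior(InstanceContextMode = InstanceContextMode.Single)]
public class LocalDataStore : ILocalDataStore
{
    private readonly Dictionary<string, byte[]> store =
        new Dictionary<string, byte[]>();

    // Called once per node (for example, by the node preparation task)
    // after the data has been fetched from the AppFabric cache.
    public void AddData(string name, byte[] data)
    {
        lock (store) { store[name] = data; }
    }

    public byte[] GetData(string name)
    {
        lock (store) { return store[name]; }
    }
}

class LocalDataStoreHost
{
    static void Main()
    {
        LocalDataStore instance = new LocalDataStore();
        // ...populate 'instance' from the AppFabric cache here...

        using (ServiceHost host = new ServiceHost(
            instance, new Uri("net.pipe://localhost/LocalDataStore")))
        {
            NetNamedPipeBinding binding = new NetNamedPipeBinding();
            // Raise the default 64 KB message quota for large transfers.
            binding.MaxReceivedMessageSize = int.MaxValue;
            host.AddServiceEndpoint(typeof(ILocalDataStore), binding, "");
            host.Open();

            // Each service host on the node retrieves its copy with:
            //   var factory = new ChannelFactory<ILocalDataStore>(binding,
            //       new EndpointAddress("net.pipe://localhost/LocalDataStore"));
            //   byte[] data = factory.CreateChannel().GetData("HistoricalData");

            Console.ReadLine(); // Keep the host alive until node release.
        }
    }
}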
Multiple clients

Typically, a cluster and cache are both shared among multiple clients. To verify that this usage pattern follows the performance curves discussed above, similar performance tests were performed using up to 8 clients sharing the cluster evenly with varying data sizes. The performance observed when the whole 190-node cluster is split evenly among multiple clients is shown below. The Y-axis shows the average data access time, while the number of simultaneous clients increases along the X-axis.

This result shows that the number of clients has little effect on cache performance when the number of compute nodes and the size of data are held constant. The moderate decrease in the time to access data as the number of clients increases can be attributed to decreased contention for each piece of data and fairer spreading of accesses to each cache host at any given time.

Best practices

When using AppFabric Caching to handle the data management in a SOA solution, there are a number of best practices to adhere to. Some are infrastructure level and in the domain of the cluster administrator, while others are application specific.

Infrastructure

Ensure cache hosts and compute nodes have adequate memory resources to meet application demands.
If applications attempt to cache more data than fits into the total physical memory on the cache hosts, performance will degrade drastically as virtual memory starts to page onto disk. Compute nodes may also run into memory bottlenecks if a large amount of data is loaded by multiple service hosts per node. This is a general concern for application development that is more pressing when using caching only because loading large quantities of data into memory is easier.

Faster networks are better.
At large numbers of compute nodes and/or large data sizes, the network on the cache hosts can saturate before the processor or memory becomes a bottleneck. Therefore, moving from Gigabit Ethernet to a 10 Gigabit Ethernet or InfiniBand private network can improve performance. Multiple NICs accessing a single logical network may provide an alternative solution, although this has not been verified.

Ensure enough cache hosts are available to adequately serve the compute nodes retrieving cached data.
Cached data is spread out among the available cache hosts, so incoming connections from compute nodes are similarly balanced across multiple nodes, which provides a performance improvement under high congestion.

Disperse cache hosts among compute nodes physically.
Cache access times may benefit from distributing cache hosts among compute nodes rather than putting all the cache hosts in one physical location. For example, one cache host per rack of servers would be preferable to locating all the cache hosts on one rack.

Application

Node allocation
When possible, it is beneficial to use node allocation when accessing common data from an AppFabric cache. Establishing a connection and retrieving the cached data for multiple services on a single compute node is expensive when compared to the single cache connection afforded by node allocation. Alternatively, creating a Windows service that hosts retrieved data locally, and leveraging node preparation and release tasks to access the cache once per node, is an effective way to use core allocation while retaining the benefit of node allocation for cache access. This approach is demonstrated in the SDK sample.

Compression
When the data to be cached is amenable to compression, the reduced size of the data placed in the cache and retrieved by the compute nodes reduces data transfer times. Compression is demonstrated in the SDK sample.

Cache small pieces of data
Rather than putting single, large chunks of data into the cache, it is beneficial to split up the data and cache it in pieces around 64 KB in size. Larger sizes are supported by AppFabric, but optimal performance results from breaking large objects down into smaller pieces. The benefit results from the load balancing across cache hosts and reduced load on memory and garbage collection logic. A chunking sketch follows.
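To make the guidance concrete, here is a minimal sketch of chunked writes. It assumes the cacheClient and cacheRegion from the earlier samples and a byte[] named data; the key scheme and the exact chunk size are illustrative.

// Split a large buffer into ~64 KB pieces before caching.
const int ChunkSize = 64 * 1024;
int chunkCount = (data.Length + ChunkSize - 1) / ChunkSize;
for (int i = 0; i < chunkCount; i++)
{
    int offset = i * ChunkSize;
    int length = Math.Min(ChunkSize, data.Length - offset);
    byte[] chunk = new byte[length];
    Buffer.BlockCopy(data, offset, chunk, 0, length);
    cacheClient.Put("MyData" + i, chunk, cacheRegion);
}

// Store the chunk count so readers know how many pieces to fetch.
cacheClient.Put("MyDataChunks", chunkCount, cacheRegion);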