Integrating FutureGrid and Commercial Clouds



FutureGrid Platform FGPlatform: Rationale and Possible Directions
Geoffrey Fox
June 8 2010

Integrating FutureGrid and Commercial Clouds

The computing landscape is changing rapidly, and this offers remarkable opportunities for scientific discovery. Important changes include multicore, cloud computing, and smartphone and tablet interfaces, as well as the data deluge, which is driving new requirements for Cyberinfrastructure. FutureGrid is an NSF TeraGrid facility providing a distributed testbed for developing research applications and middleware as well as supporting new approaches to education. FutureGrid supports many leading edge Grid and Cloud technologies including MPI, MapReduce, Nimbus, gLite and Globus. Partners are responsible for making particular technologies available, and users request hardware and software resources for their experimental work. Some experiments only need the 6 major resources currently on FutureGrid (4 IBM iDataPlex clusters, 1 Dell cluster and a Cray XT5). Within FutureGrid, the user has full control over resources, allowing reproducible experiments with hardware configured on demand between different operating systems and with either "bare metal" or a virtual machine as base. This enables a suite of experiments that compare the performance of different approaches in a robust fashion. Further, most of FutureGrid lies on a private network that can be isolated, allowing security sensitive experiments, while a programmable network fault generator gives even greater richness to FutureGrid use. However some FutureGrid experiments can involve outside resources such as Condor flocks, GPU clusters or commercial cloud resources, and the unique features of external resources make use of them essential in some cases. Some of these external resources may be dedicated to FutureGrid but, unlike core resources, are treated as external because their operating environment is not under the control of FutureGrid.
Alternatively an experiment "centered" outside FutureGrid (say in the Azure Cloud) may wish to access FutureGrid to run a component, such as a high performance cluster or GPU enhanced cluster, that is not practical to run on Azure. Our experiments are managed by a framework built around the Pegasus system from USC (acting as a manager and not a workflow engine) and the INCA monitoring environment from SDSC. Note that although we offer Eucalyptus and Nimbus on FutureGrid with core functionality similar to an Amazon cloud, one might still wish to experiment on Amazon with its rich set of functionalities such as queuing, notification and multiple storage offerings. FutureGrid can support experiments involving Azure and Amazon (and in fact other external systems, but that is not the focus here) in the several ways given in Table 1. Note that we largely ignore the Google Application Engine, as it is currently targeted at Web applications, while Azure and Amazon offer a general cloud environment.

Table 1: Support of Commercial Clouds in FutureGrid
a) We support experiments that link Commercial Clouds and FutureGrid experiments, with one or more workflow environments and portal technology installed to link components across these platforms.
b) We support environments on FutureGrid that are similar to Commercial Clouds and natural for performance and functionality comparisons. These can be used both to prepare for using Commercial Clouds and as the most likely starting point for porting to them (item c below). One example would be support of MapReduce-like environments on FutureGrid, including Hadoop on Linux and Dryad on Windows HPCS, which are already part of the FutureGrid portfolio of supported software. Of course offering an advanced platform on FutureGrid could be good just because the environment is more attractive than conventional scientific computing environments.
c) We develop expertise and support porting to Commercial Clouds from other Windows or Linux environments.
d) We support comparisons between and integration of multiple commercial Cloud environments -- especially Amazon and Azure in the immediate future.
e) We develop tutorials and expertise to help users move to Commercial Clouds from other environments.

Commercial Cloud Capabilities

Commercial Clouds offer cost effective utility computing with the elasticity to scale up and down in power. However, in addition to this key distinguishing feature, they are adding a growing number of capabilities commonly termed "Platform as a Service". For Azure, current Platform features include Azure Table, Queues, Blob, Database SQL, and Web and Worker roles. Amazon is often viewed as "just" Infrastructure as a Service but it continues to add Platform features including SimpleDB (similar to Azure Table), Queues, Notification, Monitoring, Content Delivery Network, Relational Database and MapReduce (Hadoop). Google does not currently offer a broad-based cloud service but the Google Application Engine offers a powerful Web application development environment. We define a FutureGrid high performance platform, FGPlatform, given in Table 2. It includes those capabilities of Cloud platforms that appear particularly interesting for large scale scientific computing plus those needed to run applications that link commercial clouds to outside resources -- in particular FutureGrid itself. FGPlatform allows us to support the items of Table 1 for Azure and Amazon.
Table 2: Features of FGPlatform supporting Integration of FutureGrid and Commercial Clouds
Authentication and Authorization: Provide single sign on to both FutureGrid and Commercial Clouds linked by workflow
Workflow: Support workflows that link job components between FutureGrid and Commercial Clouds. Trident from Microsoft Research is the initial candidate
Data Transport: Transport data between job components on FutureGrid and Commercial Clouds respecting custom storage patterns
Program Library: Store images and other program material (basic FutureGrid facility)
Blob: Basic storage concept similar to Azure Blob or Amazon S3
DPFS Data Parallel File System: Support of file systems like Google File System (MapReduce), HDFS (Hadoop) or Cosmos (Dryad) with compute-data affinity optimized for data processing
Table: Support of Table data structures modeled on Apache Hbase or Amazon SimpleDB/Azure Table
SQL: Relational Database
Queues: Publish Subscribe based queuing system
Worker Role: This concept is implicitly used in both Amazon and TeraGrid but was first introduced as a high level construct by Azure
MapReduce: Support MapReduce programming model including Hadoop on Linux, Dryad on Windows HPCS and Twister on Windows and Linux
Software as a Service: This concept is shared between Clouds and Grids and can be supported without special attention
Web Role: This is used in Azure to describe the important link to the user and can be supported in FutureGrid with a Portal framework

Note that there are some features, like notification and monitoring, that could be straightforwardly supported but did not seem as important as those in Table 2.

Components of the FutureGrid Platform

Authentication and Authorization

We will provide a single sign on between FutureGrid and Commercial Clouds linked by workflows, with the following discussion emphasizing Azure.
The current security architecture of FutureGrid is explicitly designed to allow integration with other ongoing NSF OCI based projects such as TeraGrid and, in future, XD. One of the features of FutureGrid will be to provide integration of multiple Identity Providers (one of which will be InCommon) as part of the authentication. Authorization can be based on an LDAP directory that integrates a project registry and allows the definition of group attributes and roles to provide a powerful secure architecture. This approach integrates seamlessly with the Azure framework while using claims based security, which allows the integration of external identity providers (in our case InCommon in conjunction with our LDAP server). Thus developers are able to leverage both efforts, while authorized users of FutureGrid can develop secure applications that delegate tasks to services found on FutureGrid or the Azure Platform. In this architecture one can use either a "FutureGrid" LiveID (Azure) account or that of individual users.

Workflow

We need to support workflows that link job components between FutureGrid and Commercial Clouds. One possibility (especially for Azure) is to use Trident from Microsoft Research [1], which is built on top of Windows Workflow Foundation. If Trident runs on Azure, then it will, in conventional fashion, use workflow services that are proxies and launch those components that need to run on FutureGrid.
Alternatively one could run Trident on FutureGrid and use proxies to launch components on Amazon or Azure.

Data Transport

The cost (in time and money) of transporting data into (and to a lesser extent out of) commercial clouds is often discussed as a difficulty in using clouds. If commercial clouds become an important component of the National Cyberinfrastructure we can expect that high bandwidth links will be made available between clouds and TeraGrid (and hence FutureGrid). The special structure of cloud data with blocks (in Azure Blobs) and Tables could allow high performance parallel algorithms, but initially simple HTTP mechanisms will be used to transport data [2-4] between job components on FutureGrid and Commercial Clouds.

Program Library

We can extend FutureGrid's virtual machine image library to manage images used in commercial clouds.

Blobs and Drives

The basic storage concept in clouds is Blobs for Azure and S3 for Amazon. These can be organized (approximately as in directories) by Containers for Azure. Further, in addition to the service interface for Blobs and S3, storage can be attached "directly" to compute instances as Azure Drives and the Elastic Block Store for Amazon. This concept is similar to shared file systems such as Lustre used in TeraGrid and offered on FutureGrid. The cloud storage is intrinsically fault tolerant while that on FutureGrid needs backup storage (HPSS at Indiana University). However the architecture ideas are similar between clouds and FutureGrid and initially we should just add support for the Simple Cloud File Storage API [5].

DPFS Data Parallel File System

This covers the support of file systems like Google File System (MapReduce), HDFS (Hadoop) or Cosmos (Dryad) with compute-data affinity optimized for data processing. It could be possible to link DPFS to the basic Blob and Drive based architecture, but it is simpler to regard DPFS as an application centric storage model with compute-data affinity and Blobs and Drives as the repository centric view. In general data transport will be needed to link these two data views.
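
To make the notion of compute-data affinity more concrete, the following minimal Python sketch places each task on a node that already holds a replica of the data block it reads, and falls back to a remote node (implying data transport) only when no local slot is free. The function name `assign_tasks`, the `slots_per_node` parameter and the node and block labels are hypothetical and chosen for illustration; this is not the scheduler of Hadoop, Dryad or any FutureGrid component, which are considerably more elaborate.

```python
from collections import defaultdict


def assign_tasks(block_locations, nodes, slots_per_node=2):
    """Greedy locality-aware task placement (illustrative only).

    block_locations: dict mapping block id -> list of nodes holding a replica.
    nodes: list of all compute nodes.
    Returns a dict mapping block id -> (node, is_local).
    """
    load = defaultdict(int)  # number of tasks already placed on each node
    placement = {}
    for block, replicas in sorted(block_locations.items()):
        # Prefer the least loaded node that holds a replica of the block.
        local = [n for n in replicas if n in nodes and load[n] < slots_per_node]
        if local:
            node, is_local = min(local, key=lambda n: (load[n], n)), True
        else:
            # No free local slot: the block has to be transported to another node.
            free = [n for n in nodes if load[n] < slots_per_node]
            if not free:
                raise RuntimeError("no free slots for block %r" % (block,))
            node, is_local = min(free, key=lambda n: (load[n], n)), False
        load[node] += 1
        placement[block] = (node, is_local)
    return placement


if __name__ == "__main__":
    locations = {"b1": ["n1", "n2"], "b2": ["n1"], "b3": ["n1"], "b4": ["n3"]}
    for block, (node, is_local) in assign_tasks(locations, ["n1", "n2", "n3"]).items():
        print(block, node, "local" if is_local else "remote")
```

In this toy run the third block cannot be placed on its only replica holder because that node is full, so it is the one task that would need data transport.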
It seems important to consider DPFS carefully for FutureGrid as DPFS file systems are precisely designed for efficient execution of data-intensive applications. However the importance of DPFS for linkage with Amazon and Azure is not clear as these clouds do not currently offer fine grain support for compute-data affinity. We note here Azure Affinity Groups as one interesting capability [6]. We expect that initially Blobs, Drives, Tables and Queues will be the areas where FutureGrid will most usefully provide a platform similar to Azure (and Amazon).

Table and NOSQL Non Relational Databases

There have been substantial important developments in simplified database structures -- termed NOSQL [7-8] -- typically emphasizing distribution and scalability. These are present in the three major clouds: Bigtable [9] in Google, SimpleDB [10] in Amazon and Azure Table [11] in Azure. Tables are clearly important in science, as illustrated by the VOTable standard in Astronomy [12] and the popularity of Excel. However there does not appear to be substantial experience in using tables outside clouds. There are of course many important uses of non relational databases -- especially in the use of triple stores for metadata storage and access.
Recently there has been interest in building scalable RDF triple stores based on MapReduce and Tables or the Hadoop File System [13-14], with early success reported on very large stores.
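
As a rough illustration of how triples can sit on top of a schema free table, the Python sketch below uses the subject as the row key and the predicate as a column, so different rows naturally carry different columns. The class `TripleTable`, its method names and the sample metadata are hypothetical; this layout is one simple possibility and is not claimed to be the design used by the projects cited in [13-14].

```python
from collections import defaultdict


class TripleTable:
    """Toy schema free table: row key = subject, column = predicate."""

    def __init__(self):
        # Each row is its own dict, so rows may have different sets of columns.
        self.rows = defaultdict(lambda: defaultdict(set))

    def add(self, subject, predicate, obj):
        """Store one (subject, predicate, object) triple."""
        self.rows[subject][predicate].add(obj)

    def objects(self, subject, predicate):
        """Row lookup by key: objects for (subject, predicate), sorted."""
        return sorted(self.rows.get(subject, {}).get(predicate, ()))

    def subjects_with(self, predicate, obj):
        """Full scan over all rows; a large store would distribute this scan."""
        return sorted(s for s, cols in self.rows.items()
                      if obj in cols.get(predicate, ()))


if __name__ == "__main__":
    table = TripleTable()
    table.add("dataset42", "createdBy", "alice")
    table.add("dataset42", "format", "VOTable")
    table.add("run7", "createdBy", "alice")
    print(table.objects("dataset42", "format"))
    print(table.subjects_with("createdBy", "alice"))
```

Lookups by subject touch a single row, while queries by predicate and object require a scan, which is the kind of operation a MapReduce job could parallelize.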
The current cloud Tables fall into two groups: Azure Table and Amazon SimpleDB are quite similar [15] and support lightweight storage for "document stores", while Bigtable aims to manage mammoth distributed data sets without size limitations. All these tables are schema free (each record can have different properties) although Bigtable has a schema for column (property) families. It seems likely that tables will grow in importance for scientific computing and FutureGrid could support this using two Apache projects: Hbase [16] for Bigtable and CouchDB [17] for a document store. Another possibility is the open source SimpleDB implementation M/DB [18]. The new Simple Cloud APIs [5] for File Storage, Document Storage Services and Simple Queues could help provide a common environment between FutureGrid and commercial clouds.

SQL and Relational Databases

Both Amazon and Azure clouds offer relational databases and it is straightforward for FutureGrid to offer a similar capability unless there are issues of huge scale, where in fact approaches based on Tables and/or MapReduce might be more appropriate [14]. As one early user we are developing on FutureGrid a new private cloud computing model for OMOP (Observational Medical Outcomes Partnership) for patient related medical data, which uses Oracle and SAS and where FutureGrid is adding Hadoop for scaling to many different analysis methods.

Note that databases can be used to illustrate two approaches to deploying capabilities. Traditionally one would add database software to that found on the computer disk. This software is executed, providing your database instance. However on Azure and Amazon, the database is installed on a separate virtual machine independent from your job (worker roles in Azure). This implements "SQL as a Service". It may have some performance issues from the messaging interface, but the "aaS" deployment clearly simplifies one's system. For N platform features, one only needs N services, whereas the number of possible images with the alternative approach is a prohibitive 2^N, since each feature may be either present or absent in an image.

Queues

Both Amazon and Azure offer similar scalable robust queuing services that are used to communicate between the components of an application. The messages are short (< 8KB) and have a REST Service interface with "deliver at least once" semantics. They are controlled by time-outs for posting length and time allowed for a client to process.
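
The queue behavior just described can be sketched in a few lines of Python. The sketch below is an in-memory toy, not the Azure or Amazon API: the class `VisibilityQueue`, its parameters `visibility_timeout` and `message_ttl`, and the explicit `now` argument (used instead of a real clock so that the behavior is deterministic) are all assumptions made for illustration. A message that is fetched but not deleted within the processing time-out becomes visible again, which is what yields "deliver at least once" semantics.

```python
import itertools

MAX_MESSAGE_BYTES = 8 * 1024  # messages are short (< 8KB)


class VisibilityQueue:
    """In-memory queue with at-least-once delivery (illustrative only)."""

    def __init__(self, visibility_timeout=30.0, message_ttl=3600.0):
        self.visibility_timeout = visibility_timeout  # time a client has to process
        self.message_ttl = message_ttl                # how long a posting is kept
        self._ids = itertools.count(1)
        self._messages = {}  # id -> [body, posted_at, invisible_until]

    def put(self, body, now):
        """Post a message at time `now` and return its id."""
        if len(body.encode("utf-8")) >= MAX_MESSAGE_BYTES:
            raise ValueError("message must be shorter than 8 KB")
        msg_id = next(self._ids)
        self._messages[msg_id] = [body, now, now]
        return msg_id

    def get(self, now):
        """Return (id, body) of the oldest visible message, or None."""
        self._expire(now)
        for msg_id, entry in sorted(self._messages.items()):
            if entry[2] <= now:
                # Hide the message while the client processes it.
                entry[2] = now + self.visibility_timeout
                return msg_id, entry[0]
        return None

    def delete(self, msg_id):
        """Called by the client after it has processed the message."""
        self._messages.pop(msg_id, None)

    def _expire(self, now):
        """Drop postings that are older than the posting time-out."""
        expired = [m for m, e in self._messages.items() if now - e[1] > self.message_ttl]
        for msg_id in expired:
            del self._messages[msg_id]
```

A worker that crashes after `get` simply never calls `delete`, so another worker receives the same message once the time-out passes; tasks therefore have to tolerate being processed more than once.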
We can build a similar approach on the smaller and less challenging FutureGrid environment, basing it on publish-subscribe systems such as ActiveMQ [19] or NaradaBrokering [20-21], with which we have substantial experience.

Worker and Web Roles

The concept of roles introduced by Azure is an interesting one, providing non-trivial functionality that FutureGrid could offer while preserving the better affinity support that is possible on FutureGrid because it is not fully virtualized. Worker roles are the basic schedulable process and are launched automatically. Note that explicit scheduling is unnecessary in clouds, either for individual worker roles or for the "gang scheduling" supported transparently in MapReduce. Queues are a critical concept here, as they provide a natural way to manage task assignment in a fault-tolerant, distributed fashion.

Web roles provide an interesting approach to portals, and here we note that the Google App Engine is largely aimed at web applications. Science Gateways are very successful in TeraGrid but still require non-trivial development. Perhaps support for Web Roles in FutureGrid could both ease the transition to Azure and make it easier to develop Gateways.

MapReduce

There has been substantial interest in "data parallel" languages largely aimed at loosely coupled computations which execute over different data samples. The language and runtime generate and provide efficient execution of "many task" problems that are well known as successful Grid applications. However, MapReduce, summarized in Table 3, has several advantages over traditional implementations of many-task problems, as it supports dynamic execution, strong fault tolerance and an easy-to-use high-level interface. The major commercial MapReduce implementations are Hadoop [22] and Dryad [23-26], with execution possible with or without virtual machines. Hadoop is currently offered by Amazon, and we expect Dryad to be available on Azure. On FutureGrid we already intend to support Hadoop, Dryad and other MapReduce approaches, including Twister [27], which supports the iterative computations seen in many data mining and linear algebra applications.
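The programming model discussed above can be illustrated with a small single-process sketch in Python. It is not the API of Hadoop, Dryad or Twister; the function names map_reduce and kmeans_1d are hypothetical, and the sketch has none of the fault tolerance, scheduling or data locality that the real runtimes provide. It shows one map, shuffle and reduce pass, and then an iterative computation of the kind Twister targets, where a driver repeats the pass until the result stops changing.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Run one map -> shuffle -> reduce pass over an in-memory list."""
    groups = defaultdict(list)
    for record in records:                  # map phase: each record is independent
        for key, value in map_fn(record):
            groups[key].append(value)       # shuffle: collect values by key
    # reduce phase: one call per key
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def kmeans_1d(points, centers, max_iterations=20):
    """Iterative MapReduce: repeat the pass until the centers stop moving."""
    for _ in range(max_iterations):
        def assign(point, centers=centers):
            # map: emit (index of nearest center, point)
            nearest = min(range(len(centers)),
                          key=lambda i: abs(point - centers[i]))
            yield nearest, point

        def mean(_key, values):
            # reduce: the new center is the mean of its assigned points
            return sum(values) / len(values)

        result = map_reduce(points, assign, mean)
        new_centers = [result.get(i, centers[i]) for i in range(len(centers))]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


if __name__ == "__main__":
    counts = map_reduce(
        ["map reduce map", "reduce"],
        lambda line: ((word, 1) for word in line.split()),
        lambda _word, ones: sum(ones),
    )
    print(counts)
    print(kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [0.0, 5.0]))
```

In a real runtime the map calls would run as separate tasks on different nodes, which is why the map function takes a single record and keeps no shared state.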
Note that our approach has some similarities with Cloudera [28], which offers a variety of Hadoop distributions, including ones for Amazon and Linux.

Table 3: Comparison of MapReduce type systems relevant to FutureGrid

| | Google MapReduce [29] | Apache Hadoop [22] | Microsoft Dryad [25] | Twister [27] | Azure Twister [30] |
|---|---|---|---|---|---|
| Programming Model | MapReduce | MapReduce | DAG execution; extensible to MapReduce and other patterns | Iterative MapReduce | MapReduce; will extend to Iterative MapReduce |
| Data Handling | GFS (Google File System) | HDFS (Hadoop Distributed File System) | Shared directories and local disks | Local disks and data management tools | Azure Blob Storage |
| Scheduling | Data locality | Data locality; rack aware; dynamic task scheduling through global queue | Data locality; network topology based run time graph optimizations; static task partitions | Data locality; static task partitions | Dynamic task scheduling through global queue |
| Failure Handling | Re-execution of failed tasks; duplicate execution of slow tasks | Re-execution of failed tasks; duplicate execution of slow tasks | Re-execution of failed tasks; duplicate execution of slow tasks | Re-execution of iterations | Re-execution of failed tasks; duplicate execution of slow tasks |
| High Level Language Support | Sawzall [31] | Pig Latin [32-33] | DryadLINQ [26] | Pregel [34] has related features | N/A |
| Environment | Linux cluster | Linux clusters, Amazon Elastic MapReduce on EC2 | Windows HPCS cluster | Linux cluster, EC2 | Windows Azure Compute, Windows Azure Local Development Fabric |
| Intermediate data transfer | File | File, HTTP | File, TCP pipes, shared-memory FIFOs | Publish/subscribe messaging | Files, TCP |

Software as a Service

Services are used in a similar fashion in commercial clouds and most modern distributed systems. We expect users to package their programs as services wherever possible, and so no special support is needed to enable Software as a Service. In section 3.8, we already discussed the advantages of "Systems Software as a Service".

References

1. Microsoft. Project Trident: A Scientific Workflow Workbench. 2010 [accessed 2010 June 3]; Available from: [...]
2. [...] Lu, Jared Jackson, and Roger Barga, AzureBlast: A Case Study of Developing Science Applications on the Cloud, in ScienceCloud: 1st Workshop on Scientific Cloud Computing co-located with HPDC 2010 (High Performance Distributed Computing). 2010, ACM: Chicago, IL.
3. Distributed Systems Laboratory (DSL) at University of Chicago Wiki. Performance Comparison: Remote Usage, NFS, S3-fuse, EBS. 2010 [accessed 2010 June 8]; Available from: [...]
4. [...] Jensen. Blog entry on Compare Amazon S3 to EBS data read performance. 2009 December 30 [accessed 2010 June 8]; Available from: [...]
5. [...] PHP Company. The Simple Cloud API for Storage, Queues and Table. 2010 [accessed 2010 June 1]; Available from: [...]
6. [...]. Windows Azure Geo-location Live. 2009 April 30 [accessed 2010 June 5]; Available from: [...]
7. [...] Movement. Wikipedia list of resources. 2010 [accessed 2010 June 5]; Available from: [...]
8. [...] Link Archive. List of NoSQL Databases. 2010 [accessed 2010 June 5]; Available from: [...]
9. [...] Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and R.E. Gruber, Bigtable: A Distributed Storage System for Structured Data, in OSDI'06: Seventh Symposium on Operating System Design and Implementation. 2006, USENIX: Seattle, WA.
10. Amazon. Welcome to Amazon SimpleDB. 2010 [accessed 2010 June 5]; Available from: [...]
11. [...] Haridas, Niranjan Nilakantan, and B. Calder. Windows Azure Table. 2009 May [accessed 2010 June 5]; Available from: [...]
12. [...] Virtual Observatory Alliance. VOTable Format Definition Version 1.1. 2004 [accessed 2010 June 5]; Available from: [...]
13. [...] Incubator. Heart (Highly Extensible & Accumulative RDF Table) planet-scale RDF data store and a distributed processing engine based on Hadoop & Hbase. 2010 [accessed 2010 June 1]; Available from: [...]
14. Raytheon BBN. SHARD (Scalable, High-Performance, Robust and Distributed) Triple Store based on Hadoop. 2010 March [accessed 2010 June 5]; Available from: [...]
15. [...] King. Amazon SimpleDB and CouchDB Compared. 2007 December 14 [accessed 2010 June 5]; Available from: [...]
16. [...]. Hbase implementation of Bigtable on Hadoop File System. 2010 [accessed 2010 June 5]; Available from: [...]
17. [...]. The CouchDB document-oriented database Project. 2010 [accessed 2010 June 5]; Available from: [...]
18. [...] Developments Ltd. M/DB Open Source "plug-compatible" alternative to Amazon's SimpleDB database. 2009 [accessed 2010 June 5]; Available from: [...]
19. (2009) ActiveMQ.
20. Pallickara, S. and G. Fox. NaradaBrokering: a distributed middleware framework and architecture for enabling durable peer-to-peer grids. in ACM/IFIP/USENIX 2003 International Conference on Middleware. 2003. Rio de Janeiro, Brazil: Springer-Verlag New York, Inc.
21. NaradaBrokering. Scalable Publish Subscribe System. 2010 [accessed 2010 May]; Available from: [...]
22. (2009) Apache Hadoop.
23. Ekanayake, J., A.S. Balkir, T. Gunarathne, G. Fox, C. Poulain, N. Araujo, and R. Barga. DryadLINQ for Scientific Analyses. in Fifth IEEE International Conference on eScience: 2009. 2009. Oxford: IEEE.
24. Ekanayake, J., T. Gunarathne, J. Qiu, G. Fox, S. Beason, J.Y. Choi, Y. Ruan, S.H. Bae, and H. Li, Applicability of DryadLINQ to Scientific Applications. 2009, Community Grids Laboratory, Indiana University.
25. Isard, M., M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: Distributed data-parallel programs from sequential building blocks. in ACM SIGOPS Operating Systems Review. 2007: ACM Press.
26. Yu, Y., M. Isard, D. Fetterly, M. Budiu, U. Erlingsson, P.K. Gunda, and J. Currey. DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language. in Symposium on Operating System Design and Implementation (OSDI). 2008.
27. J. Ekanayake, H. Li, B. Zhang, T. Gunarathne, S. Bae, J. Qiu, and G. Fox, Twister: A Runtime for iterative MapReduce, in Proceedings of the First International Workshop on MapReduce and its Applications of ACM HPDC 2010 conference June 20-25, 2010. 2010, ACM: Chicago, Illinois.
28. Cloudera. CDH: A free, stable Hadoop distribution offering RPM, Debian, AWS and automatic configuration options. 2010 [accessed 2010 June 5]; Available from: [...]
29. Dean, J. and S. Ghemawat, MapReduce: simplified data processing on large clusters. Commun. ACM, 2008. 51(1): p. 107-113.
30. Thilina Gunarathne, MapReduce Implementation on Azure, Personal Communication to G. Fox. 2010.
31. Pike, R., S. Dorward, R. Griesemer, and S. Quinlan, Interpreting the data: Parallel analysis with sawzall. Scientific Programming Journal Special Issue on Grids and Worldwide Computing Programming Models and Infrastructure vol. 13, no. 4, 2005: p. 227–298.
32. Pig! Platform for analyzing large data sets. 2010; Available from: [...]
33. Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, and Andrew Tomkins, Pig latin: a not-so-foreign language for data processing, in Proceedings of the 2008 ACM SIGMOD international conference on Management of data. 2008, ACM: Vancouver, Canada. p. 1099-1110.
34. Grzegorz Malewicz, Matthew H. Austern, Aart J.C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski, Pregel: a system for large-scale graph processing, in Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures. 2009, ACM: Calgary, Canada. p. 48-48.