Let me share my experience and views on the design and implementation of a system for automatically deploying distributed applications on infrastructure clouds. I am a big fan of open systems, so my efforts are driven in that direction. The system interfaces with several different cloud resource providers to provision virtual machines, coordinates the configuration and initiation of services to support distributed applications, and monitors applications over time.
Infrastructure as a Service (IaaS) clouds are becoming an important platform for distributed applications. These clouds allow users to provision computational, storage and networking resources from commercial and academic resource providers. Unlike other distributed resource sharing solutions, such as grids, users of infrastructure clouds are given full control of the entire software environment in which their applications run. The benefits of this approach include support for legacy applications and the ability to customize the environment to suit the application. The drawbacks include increased complexity and the additional effort required to set up and deploy the application.
Current infrastructure clouds provide interfaces for allocating individual virtual machines (VMs) with a desired configuration of CPU, memory, disk space, etc. However, these interfaces typically do not provide any features to help users deploy and configure their application once resources have been provisioned. In order to make use of infrastructure clouds, developers need software tools that can be used to configure dynamic execution environments in the cloud.
The execution environments for distributed scientific applications, such as workflows and parallel programs, typically need a distributed storage system for sharing data between application tasks running on different nodes, and a resource manager for scheduling tasks onto nodes. Fortunately, many such services have been developed for use in traditional HPC environments, such as clusters and grids. The challenge is how to deploy these services in the cloud given the dynamic nature of cloud environments. Unlike clouds, clusters and grids are static environments. A system administrator can set up the required services on a cluster and, with some maintenance, the cluster will be ready to run applications at any time. Clouds, on the other hand, are highly dynamic. Virtual machines provisioned from the cloud may be used to run applications for only a few hours at a time. In order to make efficient use of such an environment, tools are needed to automatically install, configure, and run distributed services in a repeatable way.
Deploying such applications is not a trivial task. It is usually not sufficient to simply develop a virtual machine (VM) image that runs the appropriate services when the virtual machine starts up, and then just deploy the image on several VMs in the cloud. Often the configuration of distributed services requires information about the nodes in the deployment that is not available until after the nodes are provisioned (such as IP addresses and host names), as well as parameters specified by the user. In addition, nodes often form a complex hierarchy of interdependent services that must be configured in the correct order. Although users can manually configure such complex deployments, doing so is time-consuming and error-prone, especially for deployments with a large number of nodes. Instead, we advocate an approach where the user specifies the layout of their application declaratively and uses a service to automatically provision, configure, and monitor the application deployment. The service should allow for the dynamic configuration of the deployment, so that a variety of services can be deployed based on the needs of the user. It should also be resilient to failures that occur during the provisioning process and allow for the dynamic addition and removal of nodes.
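To make the "correct order" requirement concrete, here is a minimal sketch of how a deployment service might derive a configuration order from declared dependencies. It is an illustration only, not how any particular system does it: the node names and the configuration_order helper are hypothetical, and it assumes Python 3.9 or later for the standard graphlib module.

```python
from graphlib import CycleError, TopologicalSorter


def configuration_order(dependencies):
    """Return node names so that every node comes after the nodes it depends on.

    `dependencies` maps a node name to the set of node names it depends on.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as err:
        # err.args[1] holds the list of nodes that form the cycle.
        raise ValueError(f"circular dependency between nodes: {err.args[1]}") from err


# Hypothetical deployment: workers need both the file server and the scheduler.
deployment = {
    "nfs-server": set(),
    "scheduler": {"nfs-server"},
    "worker": {"nfs-server", "scheduler"},
}

order = configuration_order(deployment)
print(order)
```

A circular dependency (for example, two nodes that each wait for the other) cannot be configured in any order, so the sketch reports it as an error instead of guessing.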
For this blog post we have considered a system called Wrangler that implements this functionality. Wrangler allows users to send a simple XML description of the desired deployment to a web service that manages the provisioning of virtual machines and the installation and configuration of software and services. It is capable of interfacing with many different resource providers in order to deploy applications across clouds, supports plugins that enable users to define custom behaviors for their application, and allows dependencies to be specified between nodes. Complex deployments can be created by composing several plugins that set up services, install and configure application software, download data, and monitor services on several interdependent nodes.
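To give a feel for what a declarative XML description with plugins and dependencies could look like, here is a small sketch. The element and attribute names, the provider values and the script names are all made up for illustration and should not be read as Wrangler's actual schema; the parser uses only the Python standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema for illustration only; NOT Wrangler's actual format.
DESCRIPTION = """
<deployment>
  <node name="server">
    <provider name="example-cloud" image="example-image" type="small"/>
    <plugin script="nfs_server.sh"/>
  </node>
  <node name="worker">
    <provider name="example-cloud" image="example-image" type="small"/>
    <plugin script="nfs_client.sh">
      <param name="SERVER">server</param>
    </plugin>
    <depends node="server"/>
  </node>
</deployment>
"""


def parse_deployment(xml_text):
    """Turn the XML description into a plain dict keyed by node name."""
    root = ET.fromstring(xml_text.strip())
    nodes = {}
    for node in root.findall("node"):
        plugins = []
        for plugin in node.findall("plugin"):
            # Each plugin carries its own user-specified parameters.
            params = {p.get("name"): (p.text or "").strip()
                      for p in plugin.findall("param")}
            plugins.append({"script": plugin.get("script"), "params": params})
        nodes[node.get("name")] = {
            "provider": dict(node.find("provider").attrib),
            "plugins": plugins,
            "depends": [d.get("node") for d in node.findall("depends")],
        }
    return nodes


nodes = parse_deployment(DESCRIPTION)
for name, spec in nodes.items():
    print(name, "depends on", spec["depends"])
```

In this sketch the worker names the server in both a plugin parameter and a dependency, which is the kind of information (who needs whom, and with what settings) that a service needs before it can configure nodes in the right order.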
We have been using Wrangler since mid-2010 to provision virtual clusters for scientific workflow applications on Amazon EC2, the Magellan cloud at NERSC, the Sierra and India clouds on the FutureGrid, and the Skynet cloud at ISI. We have used these virtual clusters to run several hundred workflows for applications in astronomy, bioinformatics and earth science.
So far we have found that Wrangler makes deploying complex, distributed applications in the cloud easy, but we have encountered some issues in using it that we plan to address in the future. Currently, Wrangler assumes that users can respond to failures manually. In practice this has been a problem because users often leave virtual clusters running unattended for long periods. In the future we plan to investigate solutions for automatically handling failures by re-provisioning failed nodes, and by implementing mechanisms to fail gracefully or provide degraded service when re-provisioning is not possible. We also plan to develop techniques for re-configuring deployments, and for dynamically scaling deployments in response to application demand.
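As a rough idea of what automatic failure handling might look like, the sketch below retries provisioning for each failed node a bounded number of times and reports the nodes it could not recover, so that the caller can continue with a smaller, degraded cluster. This is not an existing Wrangler feature; the recover function, the provision callback and the node names are assumptions made up for this example.

```python
def recover(failed_nodes, provision, max_attempts=3):
    """Try to re-provision each failed node.

    `provision` is a caller-supplied function that raises RuntimeError when
    provisioning fails. Returns (recovered, lost) lists of node names.
    """
    recovered, lost = [], []
    for name in failed_nodes:
        for _ in range(max_attempts):
            try:
                provision(name)
            except RuntimeError:
                continue  # try again, up to max_attempts
            recovered.append(name)
            break
        else:
            # Every attempt failed: give up on this node and run degraded.
            lost.append(name)
    return recovered, lost


# Stand-in for a real cloud call: worker-1 succeeds on its second attempt,
# worker-2 never comes back.
attempts = {}


def flaky_provision(name):
    attempts[name] = attempts.get(name, 0) + 1
    if name == "worker-2" or attempts[name] < 2:
        raise RuntimeError(f"could not provision {name}")


recovered, lost = recover(["worker-1", "worker-2"], flaky_provision)
print("recovered:", recovered, "lost:", lost)
```

Bounding the number of attempts matters for unattended clusters: without a limit, a provider outage would leave the service retrying forever instead of falling back to degraded service.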
These are just the initial steps; I will write up the complete scenario in the next blog post. Do write to me at ravindrapande@gmail.com