Sand Castle

by Bert Armijo and Peter Nickolov

Utility computing has gained considerable popularity over the past eighteen months as businesses big and small seek to take advantage of the flexibility the new computing model offers. This hasn’t always been the case, though. For a time, utility computing seemed a lackluster space that hadn’t been able to deliver on its early promise. The renewed interest comes on the heels of rapid market acceptance of server virtualization solutions like VMware and Xen.


Virtualization is commonly used for server consolidation, carving physical servers into smaller virtual machines (VMs) that can be used as if they were real servers. To accomplish this, however, virtualization creates a separation of hardware and software, decoupling virtual machine images from physical assets. Users of virtualization have come to accept that virtual machine images can be moved among servers in their data center, so it’s a logical progression for them to think about running those images on external resources – utility computing. It’s not surprising, then, to find that Xen, an open source virtualization system, is the foundation for more than one commercial utility computing system.

Virtualization by itself, however, is not a complete utility computing solution. While virtualization systems deal exceptionally well with partitioning CPU and memory within a server, they lack abstractions for network and storage interactions, image management, life-cycle control and other services critical to utility computing. That said, two commercial utility computing solutions based on virtualization are now more than a year old: Amazon’s EC2 and 3tera’s AppLogic.


Based on those services, therefore, we can start to evaluate the required elements of a successful utility computing solution. The rest of this article lists the services needed beyond virtualization in order to build a utility computing system.

1. Storage

Storage is easily the biggest hurdle to utility computing; if poorly architected, it can affect the cost, performance, scalability and portability of the system.

Xen provides a basic redirection of block devices to the virtual machines. The block devices may be partitions of a physical hard disk attached to the server, a large file from the server’s hard disk (loopback), or a SAN logical disk. How the disk is associated with the VM and how it becomes available on the server prior to being redirected to the VM is not something virtualization systems deal with.
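
For illustration, the fragment below shows how these three options typically appear in an xm-style Xen domU configuration (which is parsed as Python); the device paths and image locations are hypothetical:

    # Hypothetical Xen domU config fragment (xm-style, Python syntax).
    # Each entry is 'backend-device,frontend-device,mode'.
    disk = [
        'phy:/dev/vg0/vm1-root,xvda,w',               # partition/LVM volume on the server
        'file:/var/xen/images/vm1-data.img,xvdb,w',   # loopback file on the server's hard disk
        'phy:/dev/mapper/san-lun7,xvdc,w',            # SAN logical disk presented to the server
    ]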

Utility computing systems have to deal with this; they cannot leave it to the customer to partition physical hard disks or to deal with hard disk and server failures that may make the local disk unavailable. Some systems provide near-line storage outside of the VM, while others use IP SANs with an associative namespace. In all cases, what is required is a self-managed storage system that fully mimics physical server behavior inside a VM, so that regular, existing software (databases, web servers, etc.) can be used.

Looking forward, some form of quota or throttling of disk I/O is needed that prevents one virtual machine from monopolizing a storage device and starving others. In addition, better detection of hardware failures will allow utility computing systems to take corrective actions automatically. The tools for such detection are available but currently are poorly integrated with the virtualization system and thus require manual intervention.
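
To make the quota idea concrete, here is a minimal sketch, assuming a hypothetical control plane that checks every I/O request against a per-VM token bucket; the class and the rate figures are illustrative, not part of Xen or any existing utility system:

    import time

    class DiskIOThrottle:
        """Hypothetical per-VM token bucket limiting bytes per second to a shared device."""

        def __init__(self, bytes_per_second, burst_bytes):
            self.rate = bytes_per_second
            self.capacity = burst_bytes
            self.tokens = burst_bytes
            self.last = time.monotonic()

        def admit(self, request_bytes):
            """Return True if the I/O request may proceed now, False if it must wait."""
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if request_bytes <= self.tokens:
                self.tokens -= request_bytes
                return True
            return False

    # One throttle per VM keeps a single guest from starving its neighbors.
    throttles = {'vm1': DiskIOThrottle(bytes_per_second=50 * 2**20, burst_bytes=8 * 2**20)}
    throttles['vm1'].admit(4 * 2**20)         # a 4 MB write fits within the burst allowance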

2. Network virtualization

When installing software on a physical server or virtual machine, it’s normal practice for each system to be configured with the names or IP addresses of numerous other resources within the data center. For instance, a web server may be configured with the name of a database server or a NAS. In a utility system, however, configuration isn’t quite so simple.

Xen’s network configuration provides two mechanisms: (1) a flat L2 network, in which domain 0 also acts as a network switch, forwarding packets between the physical network and the VMs; and (2) a routed solution, in which domain 0 acts as an IP router, creating a subnet for all VMs on the same server. Both approaches create their own set of problems when used in utility computing systems, from exceeding the MAC address limits on L2 switches to complicating the IP address space and preventing live migration of VMs.
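
For reference, the bridged (flat L2) case is typically expressed in the domU configuration along these lines; the MAC address and bridge name below are illustrative:

    # Hypothetical Xen domU network fragment: bridged configuration, with
    # domain 0 forwarding frames between the physical NIC and the VM.
    vif = [ 'mac=00:16:3e:1a:2b:3c,bridge=xenbr0' ]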

Most existing utility systems implement either point-to-point connection virtualization or security groups similar to VLANs. As with storage, some form of quota or throttling of network I/O is needed so that one VM cannot monopolize the network interface and starve other VMs. In addition, network virtualization services like better VLAN systems, DHCP and DNS variants that take VM and utility needs into account are needed.
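
A minimal sketch of the security-group idea, assuming a hypothetical control-plane table of group memberships and allowed group pairs (none of this reflects a specific product’s API):

    # Hypothetical security-group check: may traffic from vm_a reach vm_b?
    memberships = {'web-1': {'web'}, 'db-1': {'db'}}
    allowed_pairs = {('web', 'db')}           # the web tier may open connections to the db tier

    def may_connect(vm_a, vm_b):
        return any((ga, gb) in allowed_pairs
                   for ga in memberships.get(vm_a, ())
                   for gb in memberships.get(vm_b, ()))

    assert may_connect('web-1', 'db-1')
    assert not may_connect('db-1', 'web-1')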

3. Scheduling

As users start their applications, the utility system needs a scheduling mechanism that determines where virtual machines will run on available hardware resources. The scheduler must deal not only with CPU and memory, but also with storage and network capacity across the entire system.

Xen’s scheduler already deals well with allocating CPU among multiple virtual machines within a server. Even though Xen also offers some manual CPU assignment, the automatic scheduling seems to work best. The scheduler in Xen v. 3 also provides the ability to cap the CPU use of a VM at a fixed amount even when there are no other VMs to run; this makes it possible to deliver repeatable performance, a strong requirement for utility computing.

All utility computing systems take on the responsibility of scheduling VMs among the pool of physical servers automatically. Where they differ is in VM sizing, from a single fixed CPU/memory size, to a few standard sizes, to the full flexibility offered by Xen. Some systems also have provisions for scheduling multiple related VMs in a way that provides a deterministic and fast network between them. Further, they provide the ability to ensure the placement of VMs on different physical servers, so that VMs that serve as backups for each other will not all go down together in case of a server hardware failure.
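
To illustrate the placement problem, the sketch below is a hypothetical first-fit scheduler that picks a server with enough free CPU and memory while honoring an anti-affinity constraint that keeps paired VMs on different hosts; the data structures and numbers are illustrative only:

    # Hypothetical first-fit placement with an anti-affinity constraint.
    servers = {
        'host-a': {'cpu_free': 4.0, 'mem_free': 8192, 'vms': set()},
        'host-b': {'cpu_free': 8.0, 'mem_free': 16384, 'vms': set()},
    }

    def place(vm_name, cpu, mem, avoid_with=()):
        """Return the chosen host, or None if the VM cannot be placed."""
        for host, res in servers.items():
            if res['cpu_free'] < cpu or res['mem_free'] < mem:
                continue
            if any(peer in res['vms'] for peer in avoid_with):
                continue                      # anti-affinity: peers must land elsewhere
            res['cpu_free'] -= cpu
            res['mem_free'] -= mem
            res['vms'].add(vm_name)
            return host
        return None

    place('db-primary', cpu=2.0, mem=4096)
    place('db-replica', cpu=2.0, mem=4096, avoid_with=('db-primary',))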

More advanced methods are still needed that avoid fragmentation without losing flexibility in the size of each virtual machine. This is an important economic factor, as fragmentation leads to wasted resources and therefore higher costs. Additionally, utility computing systems would definitely benefit from global scheduling and the ability to place VMs and whole services in specific geographic locations to optimize cost and quality of service.

4. Image management

Experienced users of virtualization have observed how the number of images can seemingly explode. Utility systems need to provide image management that allows users to organize their images and easily deal with version control across the system. While existing utility systems all provide some form of image management, they vary widely in approach, from a global store of numerically identified images to catalogs, namespaces and classes.
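
As a sketch of what such a catalog might look like, the following assumes a hypothetical namespace/name/version scheme; real systems differ in how identifiers and storage locations are expressed:

    # Hypothetical versioned image catalog keyed by namespace, name and version.
    catalog = {}

    def register(namespace, name, version, location):
        catalog.setdefault((namespace, name), {})[version] = location

    def latest(namespace, name):
        versions = catalog.get((namespace, name), {})
        return versions[max(versions)] if versions else None

    register('acme/web', 'frontend', (1, 2), 'nfs://images/acme/frontend-1.2.img')
    register('acme/web', 'frontend', (1, 3), 'nfs://images/acme/frontend-1.3.img')
    latest('acme/web', 'frontend')            # -> 'nfs://images/acme/frontend-1.3.img'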

Additional improvements are still needed: a fast way to create instances of images throughout geographically distributed systems, global access to images, access control for licensed images, and convenient version control. More difficult perhaps, but still required, will be licensing mechanisms that allow the most popular software to be purchased directly through the utility.

5. VM configuration

The tremendous increase in the number of images also exacerbates the manual configuration of virtual machines. Unlike physical servers, which are usually configured carefully once and then ideally left alone for a long time, VMs in utility computing systems are frequently moved around, reconfigured, restarted or shut down. Virtualization systems offer little to help the configuration process, as they’re supposed to emulate physical machines and therefore leave configuration to the VMs themselves.

Existing utility computing systems offer a variety of ways to deliver configuration to the VMs, from passing a BLOB, to property abstractions, to modifying parameters in configuration files and setting up network configurations. Some systems also provide the ability to package the configuration of multiple VMs together and to encapsulate the actual changes made inside each VM.
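
As a sketch of the property-abstraction approach, the hypothetical helper below renders key/value properties into a configuration file inside the VM’s mounted image before it boots; the paths, property names and ${...} placeholder syntax are all assumptions for illustration:

    # Hypothetical property injection: substitute ${key} placeholders in a guest
    # config template with values supplied by the utility's control plane.
    properties = {'db_host': '10.0.3.12', 'db_port': '5432', 'cache_size_mb': '256'}

    def render_config(template_path, output_path, props):
        with open(template_path) as src:
            text = src.read()
        for key, value in props.items():
            text = text.replace('${%s}' % key, value)    # e.g. ${db_host} -> 10.0.3.12
        with open(output_path, 'w') as dst:
            dst.write(text)

    # render_config('/mnt/vm1/etc/myapp.conf.tmpl', '/mnt/vm1/etc/myapp.conf', properties)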

As more operating systems are offered on utility systems, these parameterization methods will need to expand. What is needed is an OS-independent configuration method (e.g., being able to configure a Solaris VM from a Linux domain 0), as well as better configuration abstraction facilities.

6. IP address allocation

IP address assignment can create bindings between virtual machines, yet applications often require static IP addresses for public-facing interfaces.

Xen simply provides to VMs what is available to physical servers, essentially either a static IP or DHCP configuration per VM. Utility computing systems extend this by automatically constructing private VLANs for related VMs, or automatically assigning IP addresses without the need for global DHCP service. Systems differ in the way they provide access to fully routable IP addresses, from disallowing routable addresses and using NAT, to fully allowing VMs to use IP addresses and configuration with the same flexibility that is available to physical servers.
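
A minimal sketch of the allocation side, assuming a hypothetical pool of private addresses for automatic assignment plus a small set of public addresses pinned to named service end-points (the address ranges are illustrative):

    import ipaddress

    # Hypothetical allocator: private addresses are handed out automatically;
    # public addresses stay pinned to named service end-points.
    private_pool = list(ipaddress.ip_network('10.20.0.0/24').hosts())
    assigned = {}
    pinned_public = {'www-endpoint': ipaddress.ip_address('198.51.100.10')}

    def allocate_private(vm_name):
        addr = private_pool.pop(0)
        assigned[vm_name] = addr
        return addr

    def release(vm_name):
        private_pool.append(assigned.pop(vm_name))

    allocate_private('web-1')                 # e.g. 10.20.0.1
    pinned_public['www-endpoint']             # stays fixed across VM restarts and moves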

A mechanism is needed for establishing external IP addresses in a way that allows automatic allocation, yet is still flexible enough to maintain static addresses for service end-points and interaction with DNS. IP address assignment needs to be enforced so that one VM cannot interfere with the operation of another. In addition, services that provide the ability to move IP addresses between geographic locations for disaster recovery will be required as larger users begin moving mission critical applications onto the services.

7. Monitoring/high availability

With applications running on a utility computing service, system administrators still need to be able to monitor operations and create systems that offer high availability.

Virtualization breaks the one-box-one-function relationship and makes it very hard to manually track down hardware failures and map them to logical servers (VMs) and to services built from multiple VMs. At the same time, virtualization allows us to provide nearly transparent failover.

Existing utility computing systems deal with isolating the VMs belonging to different customers, as well as with providing performance data for multiple related VMs in the context of a bigger service. Collecting, correlating and analyzing the performance data of large services built from multiple VMs, as well as taking action based on that data, still requires improved tools.

High availability is beyond the single-server scope of standard server virtualization. Some utility computing systems leverage the array of physical resources they control to automatically restart the VMs from a failed server on another, ready server. Improving this capability, though, requires more reliable detection and handling of failures, whether the issue arises from a server, disk, or network. Better integration with existing data center monitoring systems will also improve response and reporting.
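
As a sketch of that restart behavior, assuming hypothetical host and VM bookkeeping (a real system would plug in actual health checks and boot the VMs from shared storage):

    # Hypothetical failover sketch: VMs from an unhealthy host are restarted on
    # any healthy host; names and structures are illustrative only.
    hosts = {
        'host-a': {'healthy': False, 'vms': {'web-1', 'db-1'}},
        'host-b': {'healthy': True,  'vms': set()},
    }

    def restart_elsewhere(vm):
        for host, state in hosts.items():
            if state['healthy']:
                state['vms'].add(vm)          # in a real system: boot the VM from shared storage
                return host
        return None

    def handle_failures():
        for host, state in hosts.items():
            if not state['healthy']:
                for vm in list(state['vms']):
                    state['vms'].discard(vm)
                    restart_elsewhere(vm)

    handle_failures()
    # hosts['host-b']['vms'] is now {'web-1', 'db-1'}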

Extended services

The preceding seven services are those clearly recognizable as being required for basic utility computing, based on existing commercial utility computing systems, but this is not a complete list of possible innovations. Other services may be needed in order to build commercially viable systems. Here are a few examples:

  • import/export of VMs, including multiple VMs and their configuration, in a way that can be recovered elsewhere
  • dynamic resizing of VMs, handling live migration and its interactions with the storage systems
  • resource metering and reporting, including self-serve access
  • truly open standards that allow for interoperability between systems. The currently proposed Open Virtual Machine Format (OVF) has drawn attention, but it is completely inadequate for the task and closed to public comment.

In summary, the current level of virtualization technology is by itself inadequate to deliver true utility computing. Seven critical services are needed to complete the move toward a utility computing model for transactional web applications: storage, network virtualization, scheduling, image management, VM configuration, IP address allocation, and monitoring for high availability.

———
Bert Armijo and Peter Nickolov are founders of 3Tera, the leading innovator of grid and utility computing, simplifying the deployment and scaling of Web applications.

Peter is a serial entrepreneur and recognized expert in scalable infrastructure technologies, including operating systems, networks and storage. Over the last 20 years he has produced over 20 patents and 30 innovations in the areas of operating systems, multithreading, kernel-mode and real-time software, network protocols, file systems and computing resource aggregation.

Bert is a veteran of multiple startups in the networking industry, among which are TopSpin Communications (now part of Cisco Systems) and Rapid City (acquired by Bay Networks and later folded into Nortel). His blog is at www.3tera.com/hotcluster.html