Virtualization Eyes the Big Prize of Critical Production Applications

Now it’s time for virtualization software to pass the big test

Right after virtualization software brings world peace and cures H1N1, it will move on to solving cold fusion and global climate change.

All right, virtualization hasn't been hyped that heavily, but it has been a major hot topic in IT for a few years now, and in reality it has lived up to a good amount of the expectations it has generated.

Virtualization software running on x86 servers is proving itself to be a credible corporate data center technology.

Virtualization software's failover capabilities are a useful initial defense against failure for applications where some downtime is acceptable.

Early x86 virtualization adopters have so far focused on consolidating lightly used servers running workloads that are neither performance-sensitive nor critical. For these more forgiving applications, the failover-and-restart cycle typical of virtualized environments - which may take three or four minutes or more - is still an improvement over a prolonged outage.

Now it's time for virtualization software to pass the big test: supporting business-critical production applications that have little or no downtime tolerance. These include e-mail, messaging and database servers, and business services such as online transactions and credit card authorization. This is where companies make their money - or lose it, if their applications fail for any reason.

Despite its image as a failure-proof solution, virtualization software alone can't deliver the uptime that most critical production applications require. Although virtualized environments can be more failure resistant than conventional IT environments, they are still vulnerable to system failures that can cause unacceptable downtime and data loss for critical applications. IT organizations have to be mindful of virtualization's pitfalls, and how they should influence infrastructure design. If implemented and managed improperly, virtualization adds layers of complexity to IT management. It also re-introduces an old problem that faded with the advent of client-server computing: single points of failure.

Nevertheless, virtualization software's track record demonstrates that a virtualization solution can support the most demanding production applications, provided that uptime is protected and that "virtual server sprawl" and other unnecessary complications are kept in check. Creating virtualized environments that can support critical production applications means combining virtualization software with the right supporting technology to minimize downtime and complexity.

Understanding Virtualization's Limits
Virtualization software's claim to high availability comes from its use of shared resource pools of physical servers and storage to make applications "portable" across the IT infrastructure. When a physical server shuts down, virtualization software can - where the implementation supports it - automatically retrieve images of the affected virtual machines, including configuration state, disk state and so on, from a storage area network (SAN). The replicated VM images can then be restarted at a disaster recovery site. Data can also be replicated to storage elsewhere on the network.
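
As a rough illustration of that restart flow, here is a minimal sketch in Python. The helper functions are placeholders standing in for whatever hypervisor and SAN management APIs a real deployment would call - none of them are actual library functions.

```python
import time

# Placeholder helpers: stand-ins for a real hypervisor/SAN management API.
# These are not actual library calls; a production setup would use the
# virtualization vendor's own tooling instead.
def host_is_alive(host: str) -> bool:
    raise NotImplementedError("query the hypervisor's management API here")

def list_vm_images_on_san(host: str) -> list:
    raise NotImplementedError("enumerate the failed host's VM images on the SAN")

def restart_vm_from_image(image: str, target_host: str) -> None:
    raise NotImplementedError("boot the replicated image on the recovery host")

def monitor_and_restart(primary_host: str, recovery_host: str,
                        poll_seconds: int = 10) -> None:
    """Watch a host; when it fails, restart its VMs from shared storage.

    The VM images (configuration state, disk state and so on) are assumed
    to live on the SAN, so the recovery host can boot them directly. Note
    the gap this leaves: anything in memory or not yet flushed to disk at
    the moment of failure is lost, and the restart itself takes minutes.
    """
    while host_is_alive(primary_host):
        time.sleep(poll_seconds)

    for image in list_vm_images_on_san(primary_host):
        restart_vm_from_image(image, recovery_host)
```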

Virtualized environments' inherent resilience can be deceptive, however, and it comes with its own risks. When a VM reboot is necessary, application state information as well as in-flight data - data not yet written to disk - is lost. At the same time, restarting virtual machines can take anywhere from three to 10 minutes, which is more downtime than many critical applications can tolerate. Concentrating multiple production applications on a single physical server to realize virtualization's cost benefits also violates the proven folk wisdom about putting too many eggs in one basket. A single server crash in a conventionally architected data center can be troublesome, but it is rarely critical. In a virtualized environment, a single server crash can take several critical workloads down with it, subjecting each of those applications to data loss and additional downtime.

"The more workloads that you stack on a single machine, when something does go wrong, the more pain an organization is going to feel because all those applications have dropped on the floor," said industry analyst Dan Kuznetsky, vice president of research operations at The 451 Group. "Even when people are thinking about availability and reliability, they don't often think about how long it takes for a function to move from a failed environment to one that's working. Next, when they choose an availability regimen - a set of tools and processes - they often don't think about how long it will take for that resource to become available again once the failure scenario has started."

In addition to causing immediate data loss, downtime-inducing errors can propagate to the secondary virtual machine upon restart and crash it as well. That's because virtualization software solutions on the market today do not isolate the cause of a failure, whether in software or hardware: the same fault that forced the VM reboot can simply recur, carrying with it the same threats to uptime and data integrity. Finally, software-based high-availability and fault-tolerant solutions are not designed to deal with transient (temporary) hardware errors, including device driver malfunctions, which can cause downtime and data corruption if left uncorrected.

The Hardware Factor
Complementing virtualization solutions with fault-tolerant server hardware hardens virtualized environments against unplanned downtime and data loss. When used with software-based x86 virtualization, fault-tolerant server hardware addresses availability and performance concerns, including scalability limitations, single points of failure, unplanned downtime, fault isolation, data integrity, problem resolution and more.

Fault-tolerant hardware is defined as two processing units running in lockstep, performing exactly the same task in the same way on every compute cycle. Hardware-based fault tolerance provides the highest levels of system availability - 99.999 percent or better, which translates to about five minutes of unscheduled downtime per year. Eliminating single points of failure is inherent to fault-tolerant hardware design, which is why it is described as providing continuous availability.
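
The five-nines arithmetic is easy to verify:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes_per_year(availability_percent: float) -> float:
    """Unscheduled downtime budget implied by an availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)

print(downtime_minutes_per_year(99.999))  # ~5.3 minutes per year ("five nines")
print(downtime_minutes_per_year(99.9))    # ~525.6 minutes, roughly 8.8 hours
```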

If one processor fails, the other keeps operating with no failover time - which means no break in data collection. If one of the server's components fails, its duplicate keeps the server processing with zero interruption and no degradation in performance. Failover is eliminated rather than merely minimized. And because uptime and availability are protected on a single physical server rather than across multiple servers, hardware cost of ownership is lower and software licensing is simpler and less expensive.
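
Conceptually, lockstep execution looks like the toy model below: two redundant units run every step, their outputs are compared, and a unit that faults or diverges is dropped while the survivor keeps answering, with no failover step and no restart. This Python sketch illustrates the idea only; real fault-tolerant servers do this in hardware, cycle by cycle.

```python
from typing import Callable, List, Optional

class LockstepPair:
    """Toy model of two redundant units executing every step in lockstep."""

    def __init__(self) -> None:
        self.healthy = [True, True]

    def execute(self, step: Callable[[], int]) -> Optional[int]:
        results: List[Optional[int]] = [None, None]
        for unit in (0, 1):
            if not self.healthy[unit]:
                continue
            try:
                results[unit] = step()
            except Exception:
                # A unit faulted: drop it and keep running on the survivor.
                self.healthy[unit] = False
        if all(self.healthy) and results[0] != results[1]:
            # Divergent outputs reveal a silent fault; a real system uses
            # additional checks to decide which unit to trust.
            self.healthy[1] = False
        # Whichever unit is still healthy answers immediately - no failover.
        for unit in (0, 1):
            if self.healthy[unit]:
                return results[unit]
        return None
```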

Software-based lockstepping is an often-mentioned alternative to hardware lockstepping for the same reasons software is always positioned as an alternative to hardware: cost and flexibility. Software lockstepping has its place in corporate IT infrastructures. But supporting critical applications isn't it. Software lockstepping increases management complexity and imposes additional overhead on CPU, network and I/O processing.

Software lockstepping depends on primary and secondary physical systems: at least one duplicate physical server, a duplicate copy of any software, and the planning needed to ensure that failover will work properly. That amounts to adding servers - not consolidating them, which is the overarching point of most virtualization programs. In this sense, software lockstep resembles the server-cluster approach that has been a high-availability alternative for many years.

Software lockstepping can also degrade performance. It uses replication and a dedicated Ethernet connection to provide a heartbeat that keeps the primary and secondary virtual machines in sync. The replication and heartbeat can slow down application response time due to latency and processor overhead, particularly for applications that have high transaction volumes. Where does that lead? Complaints from users and a drop in service level performance.
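
A simple bound shows where the overhead comes from. The round-trip figure below is an assumption for illustration, not a measurement: if every transaction must wait for a synchronous round trip on the replication link before it can be acknowledged, the link itself caps throughput.

```python
def max_serialized_tx_per_second(replication_rtt_ms: float) -> float:
    """Throughput ceiling when every transaction waits for one synchronous
    round trip on the dedicated replication/heartbeat link before it is
    acknowledged to the client."""
    return 1000.0 / replication_rtt_ms

# Assumed 0.5 ms link round trip (illustrative only): serialized synchronous
# replication caps the primary at about 2,000 transactions per second, no
# matter how fast its CPUs are.
print(max_serialized_tx_per_second(0.5))  # 2000.0
```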

Another performance-related drawback is that software lockstepping limits a virtual machine to a single processor core. That means virtual machines can't use the symmetric multiprocessing (SMP) or multicore capacity of a CPU, which prevents applications from scaling up in performance.

Fault-tolerant server architectures provide component and functional redundancy within the footprint of a single server to keep deployment and support simple.

Hardware-based fault tolerance eliminates the overhead of software emulation along with the I/O limitations and scalability constraints that software lockstepping imposes. Being able to use true SMP allows applications running in virtual machines to scale across multiple CPU cores. IT organizations must be able to guarantee such performance and scalability before they can entrust revenue-producing applications to a virtual environment.

A Winning Combination - or Else
Although high availability and disaster recovery drive many virtualization projects, companies must determine whether their "high-availability" or "fault-tolerant" virtualization software leaves them exposed to the risk of application downtime and data loss. Corporate IT staffs have little choice but to tackle this question. Economic pressure is driving their companies relentlessly toward virtualization to cut costs, according to Burton Group Senior Analyst Chris Wolf.

"With the budget constraints IT is under, they are going to be under even more pressure to virtualize more production applications. As you virtualize production applications, availability of those virtualized applications becomes an even more critical issue," Wolf said. "The thing that's in virtualization's favor is that from an ROI perspective, it continues to be an absolute no-brainer. It allows you to remove hardware from your data center, so I'm reducing my hardware maintenance costs. I'm reducing my energy cost associated with that hardware as well. The bottom line is that with a well-architected virtualization solution, a 6 to 18 month ROI is highly likely. Many organizations have already made the initial investments for virtualization - at least in the large enterprises - so it's an incremental investment."

Virtualization software and fault-tolerant hardware together are the right combination for realizing virtualization's economic benefits while providing the reliability, scalability and simple management required for running critical production applications. And who knows, maybe cold fusion isn't that far off.

More Stories By Phil Riccio

Phil Riccio is product manager, virtualization, at Stratus Technologies. He is a customer-focused technology professional with extensive product marketing, business development, product management and sales experience.