Virtualization shakes up backup strategy
- 22 February, 2008 09:38
Virtualization is causing customers to rethink their backup strategies, with technology that combines well-understood elements of traditional enterprise backup with elements unique to the virtualized world.
In the past, traditional enterprise backup in the vast majority of shops has included spinning disk for short-term and intermediate data use, archival tape for long-term storage, and software such as IBM Tivoli and HP StorageWorks.
But some say that's no longer enough in a virtualized world.
"You definitely can't take a wait-and-see approach with backup, especially now that more and more companies are using server virtualization in critical production environments," says Stephanie Balaouras, a senior analyst for virtualization strategies at Forrester Research. "Backup is going to become a major challenge if companies haven't explored their options."
Traditional backup systems have a one-to-one relationship with servers. These tried-and-true backup systems and associated software already support storage-area networks (SAN), Fibre Channel, and the latest operating system and server hardware updates. But they are not geared specifically for the complex world of virtualization, which involves multiple guest operating systems on the same box.
Dave Russell, Gartner's vice president of research for servers and storage, outlined three popular strategies for virtualization backups. The most common is putting software agents on each virtual machine (VM) and then using traditional enterprise backup software. A second approach is to create an image of the VM and either use a storage service hosted elsewhere or take daily snapshots of the logical unit number (LUN).
A third strategy is to use VMware Consolidated Backup (VCB), which archives the VM incrementally -- copying only what has changed since the last backup. In this way, companies can restore a single file, even from one of 30 guest operating systems that all reside on a single physical server.
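The core of this third approach is change tracking. As a rough sketch of the idea only (hypothetical names; VCB itself operates on VM images through VMware's own tooling), an incremental pass can compare content fingerprints against the previous run and copy just what differs:

```python
import hashlib

def incremental_backup(source: dict, seen_hashes: dict) -> dict:
    """Return only the files whose content changed since the previous
    pass, updating seen_hashes in place. Illustrative sketch, not
    VCB's actual mechanism."""
    delta = {}
    for name, data in source.items():
        digest = hashlib.sha256(data).hexdigest()
        if seen_hashes.get(name) != digest:  # new or modified file
            seen_hashes[name] = digest
            delta[name] = data
    return delta

# First pass copies everything; later passes copy only the changes.
seen = {}
first = incremental_backup({"a.txt": b"v1", "b.txt": b"v1"}, seen)
second = incremental_backup({"a.txt": b"v1", "b.txt": b"v2"}, seen)
```

The first call returns both files; the second returns only `b.txt`, which is what keeps repeated backup windows short.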
"Most companies gravitate toward the backup agents and traditional backup software, which they are used to doing with a physical server, and it feels very natural and easy," says Russell. "But this approach has proven to be cost prohibitive because of the number and scale of VMs and the licensing required."
Backup agents are included with VMware and other virtualization products to help administrators integrate VMs into the traditional backup process. The main advantage is cost: the agents are free or carry a relatively small fee. On the downside, agents force administrators into a fairly simplistic approach: admins can archive an entire virtualized server, but cannot pick and choose volumes or guest operating systems. Nor can server administrators restore specific portions of data or verify the data integrity of VM volumes.
A new trend is for companies to create mirrors of the VM volumes, says Russell, because mirroring provides more flexibility, reduces costs and allows a company to restore an entire site, which fits into an enterprisewide backup strategy for disaster recovery.
For example, at the Immune Tolerance Network (ITN) -- part of the University of California clinical research group in the US -- virtualization backups have become not just a part of disaster planning, but they actually help researchers with clinical trials to fight new diseases.
ITN archives the LUN, or the specific address of the hard disk drive. Using data de-duplication algorithms that weed out redundant data, it keeps multi-terabyte archives of virtual servers. Researchers can request additional archival LUNs, a process that would be difficult or impossible with physical servers.
"The traditional method of putting a tape in a backup system serving multiple servers is outdated," says Michael Williams, ITN's executive director of IT. "Once you move to virtual storage and separation of the volume from the physical disk, you can do very interesting things. The first thing we do when we provision a LUN is we oversubscribe it. A researcher believes they have 2TB volumes -- and they do."
But in reality, the LUNs are thin-provisioned -- allocated only as much physical disk space as they actually use -- and, based on snapshot policies, they might be only 20GB each. That volume of data is backed up every four hours. This is equivalent to a hard crash backup (a complete archive of data that can be restored to a prior state), Williams says.
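The oversubscription Williams describes can be sketched as a toy model (invented names, not NetApp's implementation): the LUN advertises its full logical size to the researcher, but physical blocks are allocated only when data is actually written.

```python
class ThinLUN:
    """Minimal thin-provisioning sketch: logical size is fixed up
    front, physical space grows only with writes."""
    BLOCK = 4096  # allocation granularity in bytes

    def __init__(self, logical_bytes: int):
        self.logical_bytes = logical_bytes  # what the user sees (e.g. 2TB)
        self.blocks = {}                    # block index -> stored bytes

    def write(self, offset: int, data: bytes) -> None:
        # Allocate a physical block only the first time it is touched.
        for i, byte in enumerate(data):
            idx, pos = divmod(offset + i, self.BLOCK)
            blk = bytearray(self.blocks.get(idx, bytes(self.BLOCK)))
            blk[pos] = byte
            self.blocks[idx] = bytes(blk)

    def physical_bytes(self) -> int:
        return len(self.blocks) * self.BLOCK

# A "2TB" volume that has only been written sparsely.
lun = ThinLUN(logical_bytes=2 * 2**40)
lun.write(0, b"x" * 10_000)
```

After a 10,000-byte write, the volume still reports 2TB logically but occupies just three 4KB blocks physically, which is why hundreds of oversubscribed LUNs can share modest real capacity.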
Williams explains that the archives -- created using Network Appliance's SnapShot and SnapMirror -- are then moved to an off-site location and archived further using Veritas NetBackup over a wide-area network to create a full-image backup on low-cost Serial Advanced Technology Attachment drives.
He describes the snapshot process as a gain for researchers -- requesting a restore is easier and faster than it was in pre-virtualization days -- but it remains complex for IT. A scientist can request a data retrieval, much like a traditional storage-restore request, without waiting for IT to pull a library of tapes and perform the restore. The virtual restore process is harder on IT staff, however, because they might have to, say, find and mount a virtual LUN from a restore point on a separate backup system, such as a Veritas archive. Still, the end user can access the data in a matter of hours instead of the much longer time frames required by tape.
Another advantage involves data de-duplication, a process in which the backup software is smart enough to see the same data multiple times and keep only one archive of it. At ITN, for example, there are 150 virtual servers, and there may be as many as 100 Windows machines. NetApp can make one copy of an identical 8.5GB image for Windows and create a fingerprint file (a reference point) for each additional archive, which saves on disk space because NetApp does not make multiple backups of the same Windows data.
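The fingerprint-file idea can be illustrated with a small sketch (a hypothetical class; real de-duplication engines such as NetApp's typically work at the block level rather than on whole images): identical content is stored once, and every additional archive keeps only a hash that points back to the single stored copy.

```python
import hashlib

class DedupStore:
    """Hash-based de-duplication sketch: one stored copy per unique
    content fingerprint, a catalog entry per archive name."""
    def __init__(self):
        self.store = {}    # fingerprint -> the single stored copy
        self.catalog = {}  # archive name -> fingerprint

    def add(self, name: str, image: bytes) -> str:
        fingerprint = hashlib.sha256(image).hexdigest()
        self.store.setdefault(fingerprint, image)  # keep only the first copy
        self.catalog[name] = fingerprint
        return fingerprint

    def restore(self, name: str) -> bytes:
        return self.store[self.catalog[name]]

# 100 identical Windows images archive down to a single stored copy.
dedup = DedupStore()
image = b"windows-base-image" * 1000
for i in range(100):
    dedup.add(f"vm{i}", image)
```

Here the store holds one copy regardless of how many VMs reference it, while any catalog entry can still restore the full image.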
Continuous data protection
Health First, a group of hospitals and trauma centers, is using this strategy. The company runs 300 guest machines on 19 VMware ESX servers connected to a 150TB SAN. Health First uses IBM Tivoli for traditional backup, but because of its large virtual server infrastructure, the company decided to add a continuous backup system.
"We needed faster rebuild time in case of a disaster," says Jeff Allison, a Health First network engineer in charge of virtualization planning. "We use Vizioncore vRanger for hot backups of every virtual machine we have every night," he says. "Backups start at 5:00pm on two different machines and by 2:00am, we have backups for 230 boxes." The remaining 70 VMs are archived by the morning, and performance for the clinical applications is "not affected by the hot backups," Allison says.
Allison explains that the environment is more demanding in terms of uptime requirements for trauma centers and clinics, because data loss at a health facility could mean loss of life.
He describes one scenario where a controller failed on one test/development physical server that caused 80 VM development servers to be unavailable and unusable until a lengthy restore process could be initiated. On average, it could take several hours, he says. With a continuous backup system, the restore would now take about an hour and require perhaps one technician instead of several.
Indiana University presents another case for continuous backups, as opposed to VM mirroring or agents, because of the faster disaster recovery time and more granular data-archiving benefits.
A virtual machine is contained within a file that can be quiesced (paused and flushed to a consistent state) and captured via a snapshot file, says Robert Reynolds, a senior software analyst at the university. "For the majority of our VMs, that quiesced file is stable enough to be then copied as a [disaster recovery] backup," he says. "Obviously, database servers and other transaction in-flight servers need more care in creating a DR backup.
"We run weekly jobs on each of our VMware ESX servers, using PHD's esXpress virtual backup appliances, to create the DR backups for our VMs," Reynolds says. "We create a copy on the local storage of the ESX server and we are in the process of developing a second phase to FTP the DR backup to another server where it will be picked up by Tivoli Storage Manager and sent offsite..."
A blend of approaches
"In the near term, a blend of these technologies might be the best approach -- taking an image-level backup and indexing those files continuously so that companies could do a single file restore, taking snapshots very rapidly, using a traditional backup application and VM agents to index the content on servers," says Gartner's Russell. "It does add more management complexity and another layer of abstraction to traditional backup, but the storage-resource tools are now catching up."