Averting disaster

So your organization has gone global, with mission-critical applications spanning time zones and national borders. You're more extended - and more vulnerable - relying not only on the glass house down the hall but also on an Internet service provider in Guatemala or a telecommunications company in Kazakhstan to get your fancy Web-enabled applications to customers and suppliers.

How do you protect these far-flung systems against natural or man-made disasters? With a mix of centrally developed recovery processes, enough flexibility to account for local differences in culture and infrastructure, and the clout of upper management to ensure that it all gets done, say IT managers and disaster planning professionals.

Multinational companies have been running global applications for decades, of course. But in the past, those applications were often hosted on tightly controlled internal computer systems and accessed over expensive but reliable private networks, and they could tolerate an occasional 24-hour outage. Today's global applications are often a mishmash of custom and off-the-shelf software running across the less-reliable Web, and because the business depends on them, they must be brought back up within hours or even minutes - not days - after a crash.

Global systems often involve not only multiple locations or divisions of a company but also systems controlled by suppliers or customers.

"We have more than 300 [e-commerce] initiatives in our organization," says Julia Graham, group risk manager at London-based Royal & Sun Alliance Insurance Group PLC. "With a Web-based business, you could have many joint venture partners and suppliers, and the plan becomes a matrix of different recovery needs based on the potential scenarios that might arise."

Different regions of the world differ widely in the quality of physical recovery sites and the quality of staff at those sites, say IT managers. And because these applications support vital business functions, they must often be brought back up immediately.

"It's not fun," says Jay Leader, director of application development at Nypro Inc. in Clinton, Mass., a plastics molding company that operates 75 servers and has 4,000 users around the world. "It's hard enough ... to do domestically, when everyone speaks the same language and is in the same time zone," he says, but it's even harder "to try to coordinate an [IT] vendor in Singapore and a vendor from China."

The first step should be for business managers - ideally at the local business units, to ensure their buy-in - to decide which applications most need protection and how much protection they're worth. This is the point at which the critical but touchy issue of who will pay for this "application insurance" should be tackled but often isn't, says Gerard Minnich, a global business continuity program manager at Electronic Data Systems Corp. in Plano, Texas.

"Typically, where programs fail is at the [funding] level," he says, especially at a local business unit. Along with a corporate edict to provide disaster recovery, says Minnich, management must also provide a clear process for determining backup priorities and how to fund them.

"If you don't have guidelines and you don't have criteria, you won't have funding," Minnich says.

A Range of Price Tags

Business recovery costs vary widely. A basic assessment of a company's recovery needs might cost US$50,000 to US$100,000, while a large company might spend US$1 million per month for high-level disaster protection, says Todd Gordon, general manager of IBM's Business Continuity and Recovery Services division. In general, he says, companies should expect to spend between 7 percent and 15 percent of their overall IT budgets on disaster recovery.
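To make Gordon's 7 percent to 15 percent guideline concrete, here is a minimal sketch of the arithmetic. The function name and the US$20 million IT budget are hypothetical, chosen only for illustration; they do not come from Gordon or IBM.

```python
def dr_budget_range(it_budget, low_pct=7, high_pct=15):
    """Return the (low, high) disaster recovery spend implied by a
    percentage guideline. Whole-number percentages and integer
    arithmetic keep the results exact for whole-dollar budgets."""
    low = it_budget * low_pct // 100
    high = it_budget * high_pct // 100
    return low, high


# Hypothetical example: an overall IT budget of US$20 million a year.
low, high = dr_budget_range(20_000_000)
print(low, high)  # 1400000 3000000
```

Under that assumed budget, the guideline implies spending roughly US$1.4 million to US$3 million a year on disaster recovery.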

Agreeing on how to bring a failed system back up is both more important and trickier in a multinational environment. People in different parts of the world work according to different schedules and cultural rules - not to mention the fact that they speak different languages and live in different time zones.

"Synchronization of the recovery is real key," says Bill DiMartini, vice president of consulting operations at SunGard Planning Solutions, part of SunGard Data Systems Inc. in Wayne, Pa.

Say, for example, that an outage hits an enterprise resource planning system at midnight in Germany and stops data from flowing to and from a factory in Singapore. The factory will keep using parts and shipping products. But when the system in Germany is brought back up, the staffs in Singapore and Germany must synchronize the two databases not to the point when the German system went down but to the last backup on the German system, since a system restored from backup holds nothing entered after that backup was taken.
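The difference between those two synchronization points can be sketched in a few lines. This is an illustration only, not any vendor's recovery procedure: the log format, the timestamps (all assumed to be recorded in UTC) and the function name are hypothetical.

```python
from datetime import datetime


def transactions_to_replay(local_log, sync_point):
    """Return the local entries recorded after the given sync point."""
    return [entry for entry in local_log if entry["time"] > sync_point]


# Hypothetical timeline, all times in UTC.
last_backup = datetime(2000, 1, 10, 18, 0)   # last good backup in Germany
outage_start = datetime(2000, 1, 11, 0, 0)   # moment the German system failed

# Hypothetical activity log kept at the Singapore factory.
singapore_log = [
    {"time": datetime(2000, 1, 10, 17, 30), "event": "parts issued"},
    {"time": datetime(2000, 1, 10, 21, 0), "event": "parts issued"},
    {"time": datetime(2000, 1, 11, 2, 0), "event": "products shipped"},
]

since_backup = transactions_to_replay(singapore_log, last_backup)
since_outage = transactions_to_replay(singapore_log, outage_start)
print(len(since_backup), len(since_outage))  # 2 1
```

Synchronizing only from the moment of the outage would miss the 21:00 entry, which was recorded after the last backup but before the crash.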

Since synchronization is also required in day-to-day operations, some companies link disaster recovery planning to regular IT operations. That means linking the change management and version control done in the corporate data center to that done at a backup site, says Marshall McGraw, manager of IT business services at Phillips Petroleum Co. in Bartlesville, Okla.

"Let's say we do an upgrade internally to SAP [R/3] that affects the data that needs to be recovered, or [we change] the configuration of the hardware" on which R/3 runs, says McGraw. Unless the backup site knows about every such change, he says, "you spend a week trying to find all the changes you made [since you last] declared a disaster." Once the procedures are in place to keep the backup site in the loop, the ongoing effort to communicate those changes is minimal, he says.
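One simple way to picture keeping the backup site in the loop is to compare the data center's change log with the changes the backup site has on record. The sketch below assumes each change has a short text label; the labels and the function name are hypothetical and do not describe Phillips Petroleum's actual process.

```python
def unreported_changes(primary_changes, backup_site_changes):
    """Return the changes made at the primary data center that the
    backup site has not recorded, in their original order."""
    known = set(backup_site_changes)
    return [change for change in primary_changes if change not in known]


# Hypothetical change logs.
primary = ["R/3 upgrade", "server memory expansion", "new disk array"]
backup = ["R/3 upgrade"]

missing = unreported_changes(primary, backup)
print(missing)  # ['server memory expansion', 'new disk array']
```

Run routinely, a check like this surfaces gaps as they occur, rather than leaving them to be reconstructed after a disaster is declared.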

Think Globally, Act Locally

Given the obstacles, few, if any, multinational firms are doing real-time recovery of global applications. They instead recover applications at local sites and then reconcile the changes around the world later, says Gordon.

But one global recovery practice won't serve everyone's needs. "Some of our operations are fairly small, and some of our operations are fairly significant," says Leader. One plan might be overkill at a small location but grossly inadequate at a large facility.

Many multinational companies issue centrally mandated guidelines for business recovery, leaving local units substantial flexibility in how they reach the goal. Some keep the strictest rein on applications that gather and share information affecting the entire business, giving local units more autonomy on site-specific systems.

Phillips Petroleum, for example, has centralized the operation and backup of its core SAP R/3 and Oracle applications, says McGraw. Every 24 hours, IT staff at headquarters ship backup tapes to a disaster recovery center. The central IT group also arranges for backup network links should the primary Web connections go down.

Remote sites are free to make their own arrangements for hot sites, data backup and backup network links, assuming they follow common recovery procedures, says McGraw.

Graham's colleagues at Royal & Sun are currently working on the third release of the company's worldwide standard for business continuity planning, part of which is based on basic principles of disaster recovery planning and part of which "will be very much influenced by the local business needs, including call centers and those related to the 'e' world," she says. If a business unit can develop a disaster recovery plan without using the central standards, "I'm perfectly happy with that," she says.

Something as expensive and unglamorous as disaster planning won't happen unless senior executives demand it and corporate auditors check to make sure it's done.

"The biggest challenges have come down to ... making executives aware of the critical nature of technology and accurately depicting the risks that a company or technology is exposed to," says Damian Walch, senior vice president of professional services at IT services firm Comdisco Inc. in Rosemont, Ill. He estimates that only 15 percent of his customers are proactively planning disaster recovery processes. "Most companies are still managing it in a reactive mode," he says.

Management backing makes disaster recovery an easier sell at Phillips, says McGraw. "We in IT aren't going out there trying to beat on people or begging people to have these things in place," he says. "Our board expects business recovery plans to be in place."

Minnich advises managers not only to establish clear processes for developing and funding disaster plans but also to set specific timetables and goals for each stage of the work. "Don't just throw a process at people and let [them] spin around for months and months," he says. "Set a clear finish line so the people who are writing the plans know when they are finished."

And "don't try to do everything at once," says Minnich. "Go after the things everyone knows needs to be protected," such as critical data centers.

"Show some success, show some value and then start building on top of that capability," he advises.

Scheier is a freelance writer in Boylston, Mass.