Considering continuity

Planning for business continuity means weighing risk: what systems can your business not live without? How can you get access to that data or computing power should something happen? And, perhaps most importantly, how much do you want to spend to have a backup plan - a plan which you, hopefully, might never need?

Taking the first step and conducting a business impact analysis not only shows executives the importance of planning for business continuity, but also highlights which areas need immediate attention and which can be pushed out or planned for future budgets.

Business continuity has evolved beyond IT, encompassing more than technology and moving toward a more communications- or people-centric view. Nevertheless, IT remains an important part of business continuity, often serving as the common link that ties everything together. Companies adopt business continuity plans in many different ways and for different reasons, but all agree on one thing: it can no longer be ignored.

Andreas Tilch, IT security manager at Melbourne-based National Foods, defines business continuity planning as planning for the rapid recovery of business operations in the wake of a disaster by ensuring continuity of the critical business functions. As a publicly listed Australian dairy company with products that have a relatively short shelf life, it is essential National Foods addresses such issues.

"As IT usually has a business supporting function, BCP should be considered as only part of a whole [of company] strategy," Tilch says. "IT certainly plays a critical part in the continuity planning but it is important to understand that the recovery of IT [alone] will not enable other business functions to operate."

The dairy company's critical operations - manufacturing and distribution - bring in annual revenues of more than $1 billion from brands such as Pura, Masters, Farmers Union, Yoplait, Fruche, Divine Classic, and YoGo. It also sells some premium cheese brands including King Island, South Cape, Timboon and Tilba.

Tilch said business units should drive BCP, based on their needs to ensure they have all the critical services and resources available so they can operate. "These critical services and resources are usually more than those IT is providing to them."

All about accessibility

Michael Zepernick, president of Computer Integrated Services (CIS), a systems integrator based in New York, learnt how important accessibility can be when operating in a distributed business model. CIS had been backing up, among other things, a Windows 2000 server, all its office applications, a RedHat Linux database server, a NetWare server, and its SQL-based help desk dispatch software, all part of an in-the-works plan to possibly create a remote-backup option for customers.

On September 11, the CIS office lost all phones, power, and Internet connectivity, and because the office was inside the "frozen zone" encircling the World Trade Centre, no one could enter, Zepernick says. But by restoring the most recent data from its tape libraries and setting up the Citrix MetaFrame application-access solution, CIS was up and running again within 24 hours.

The company adopted multiple locations - a rented site in Manhattan with a fluctuating T1 line for sales staff, technical staff working from CIS customer sites, a network of help desks throughout the city for customer calls, and a good number of employees working from their homes. "We were accessing our data whether it was over dial-up lines, DSL, cable modems, T1 lines, and it didn't matter, MetaFrame handled it," Zepernick said.

Despite the scattered locations, everyone was able to get to the information he or she needed to be productive. The key, says Zepernick, was the commitment to doing remote backups consistently, "Because the need to have your servers and data in the same place seemed silly.

"At the end of the day, because we had the distributed environment ability, our sales and service - compared to what had happened - was really not impacted. We just had to inform people where to report to and where to go. People could choose to work where they felt most comfortable, not in a location dictated by where our systems were," Zepernick says.

Taking this experience to heart, Zepernick is convinced of the importance of backing up systems and data on a regular basis, and testing those backups to make sure everything is being correctly copied.

"[A business continuity plan] is all about staying in business - that's of paramount importance. It's the continuation of your operations, period," Zepernick says. "If you don't have a plan and something happens, you can almost start to assume that you're going to be out of business completely."

Obviously, most potential situations are not as dramatic as September 11. Tilch noted that "fortunately" he had not experienced a disaster situation yet, but added that it would be interesting to see whether Australian businesses hit by bushfires [or floods] had successfully used a BCP or were now creating such a plan.

Protecting centralised resources

At Henry Schein, a US distributor of healthcare products and services, a mix of distributed call centres and distribution centres and a centralised network and computing architecture presents an interesting business continuity planning challenge for CIO Jim Harding.

"Because we have eight distribution centres, if we lost one we can easily reroute that volume to other distribution centres," he said.

"But if we lose our centralised computing or our network, we're down for the count, basically."

Because of its centralised computing structure, the company put great emphasis on revamping its entire disaster recovery plan during 2001. This included working with AT&T to shore up and back up various systems, both centralised and decentralised. For example, by working with AT&T and adding call centre technology from Avaya, the company can automatically reroute customer calls should one call centre go down, or if too many calls flood a single location.

But the main networking resources required a more complex plan. According to Harding, Henry Schein changed its disaster services provider to IBM and added network capabilities from AT&T to back up systems, data, and other resources to IBM's offsite location, contracting with these companies to allow full recovery within 24 hours.

"We could have gone for a four-hour recovery, which means pretty much every time you record a transaction, you're rewriting that to a disk drive at the IBM location," Harding said.

"We opted out of that because the cost is about 10 times what it is to do what we're doing. Now, if you're a financial institution, 24 hours might be the end of the world - but for us, it's OK."
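Harding's reasoning can be sketched as a simple selection rule: pick the cheapest recovery option whose recovery time the business can tolerate. The sketch below is illustrative only. The class and function names are hypothetical, and the costs are relative units that reflect nothing more than the "about 10 times" ratio Harding quotes, not actual contract prices.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RecoveryTier:
    name: str
    recovery_hours: float   # time to full recovery under this option
    relative_cost: float    # relative units only; real prices are not in the article


def cheapest_acceptable(tiers: List[RecoveryTier],
                        max_tolerable_hours: float) -> Optional[RecoveryTier]:
    """Return the cheapest tier that recovers within the tolerable downtime."""
    acceptable = [t for t in tiers if t.recovery_hours <= max_tolerable_hours]
    # default=None covers the case where no tier is fast enough
    return min(acceptable, key=lambda t: t.relative_cost, default=None)


# Assumption: the four-hour option costs about 10x the 24-hour option.
tiers = [
    RecoveryTier("four-hour recovery", 4, 10.0),
    RecoveryTier("24-hour recovery", 24, 1.0),
]

# A distributor that can tolerate a day of downtime
choice = cheapest_acceptable(tiers, max_tolerable_hours=24)
print(choice.name if choice else "no tier meets the requirement")
```

A business that can tolerate only a few hours of downtime, such as the financial institution Harding mentions, would pass a smaller `max_tolerable_hours` and end up with the more expensive tier.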

After losing some network capacity when a previous carrier was affected by the September 11 events, Henry Schein consolidated more services with AT&T, "because its own internal recovery capabilities are so far superior to most carriers", Harding says. Still, the company is going a step further and is in the process of adding a third frame carrier to its main sites as a backup to the ISDN dial-up line that backs up the AT&T connectivity.

"That wouldn't normally be necessary if we weren't so highly centralised," Harding adds. "There's a price for being centralised, and that's part of it. We still think the efficiency of it and our ability to service the market is superior because of our centralisation, but there are some costs associated with the redundancy and recovery systems you need to put in place."

The entire continuity plan, including the IBM and AT&T services, will be put to work in July when Henry Schein tests its resiliency by building systems from scratch based on a previous night's backup tapes, and then cutting the network over for a little while to test recovery plans. Harding says the company expects to do this kind of full test "at least annually", with more moderate continuity tests performed semi-annually.

A key concern for Harding was finding providers that had excellent business continuity and disaster recovery plans of their own, something he says every company should consider when making any outsourcing or service decisions - especially if it involves access and communications.

"Without the capability, I don't care what the price is," Harding explains. "Part of that is not only 'Do you have this hardware and connectivity and so forth', but 'How can you handle a regional disaster? If 10 of us go down, do you have the capacity to handle that?' That was a big criteria in making our selection."

National Foods' Tilch said it is a good idea to partner with your main IT infrastructure supplier.

"This could be IBM or HP, for instance, as it would be too expensive for companies to replicate their IT operations to [a standard that would allow] hot or warm recovery," he said.

However, Tilch noted, such disaster services providers would only recover IT services. Many other issues such as the urgent need to move employees to their new and temporary places of work and then supplying them with everything else they need would not be an IT function.

Another criterion was company buy-in; at Henry Schein, executives understood the need for a full business continuity plan involving not just IT but the support and participation of other business units as well, and were willing to make the necessary investments of time and money. From there, it was a matter of putting all the pieces in place over time, Harding adds.

"There is nothing particularly clever about it: it's just basic, fundamental, good, solid IT management," Harding says. "Getting it done, getting people to focus on it, getting organised - easier said than done."

Facing the nightmare

After experiencing a lot of growth during the past three years, advertising agency The Tracey Edwards Co had a legacy network and a need for better storage strategy, as well as "no real business continuance plan, which is to say: none. It was really an absolute nightmare waiting to happen," says Scot Villeneuve, vice president of e-business.

The company's advertising business creates numerous massive graphics and text files in both PC and Macintosh formats, which must be stored and remain accessible for long periods of time. This information is "crucial - the No. 1 thing to keeping our business alive if a disaster happened", Villeneuve said.

Because Tracey Edwards needed to upgrade its network and technology platform, Villeneuve was able to use that upgrade process to also add some business continuity elements, choosing to add an NAS solution from Storage Computer Corp. The new deployment, put in place about four months ago, includes hot spare drives and RAID 5 for fault tolerance, as well as a "pretty robust nightly, weekly, and monthly backup system with 80GB DLT tapes".

"Those [tapes] are removed offsite, put into safe deposit, and if something happened to that network room and everything was destroyed, we could recover all our data a week back and only be out a week's worth of work, so the disaster recovery problem is addressed nicely," Villeneuve adds. Also, should something go wrong within the network such as a failed drive or a server getting too hot, Villeneuve and his team get an alarm or a page to notify them of the event.

The weekly schedule was chosen after examining and balancing the agency's specific backup needs with the built-in redundancy of the Storage Computer product and the cost of data backups, which become more expensive if done more frequently. Even though the business continuity elements were added as a subset of the network and storage upgrade, they were still a good chunk of the project budget.

"To be honest, the business continuity component of our hardware and software deployment purchase was probably 10 per cent of the total cost of the network, so we're talking a lot of money to have this tape backup automated and configured," Villeneuve says. However, he adds that making sure company owners or executives support business continuity is extremely important to formulating a successful plan, which will mean some added costs and some cooperation from each part of the business to work well.

"The bottom line for us is, if our building was all of a sudden beamed away by aliens and I was still here on Earth, I could take these tapes and probably - within a matter of a few weeks with a serious reinvestment of capital [to replace hardware] - get back up completely," Villeneuve said. "We sleep a little bit better at night."

- David Beynon contributed to this article.

The proof of the plan is in the testing

Your enterprise may have a business continuity plan in place, but when was the last time it was put to the test? If it's sitting in the bottom drawer under 10 centimetres of dust, then it's time for a revamp. Computerworld spoke to users, analysts and business continuity vendors to get some tips on keeping business continuity plans alive.

Analysts agree that the first stage of creating a disaster recovery and business continuity plan is risk assessment. Risks represent a likely cost to a business. It is crucial to quantify that cost in order to determine the appropriate controls. However, while controls reduce the costs associated with risks, it is important to remember controls themselves have a cost.

Stephen Frede, information security manager at AMP, said the fundamental equation to bear in mind when measuring risk is: risk = likelihood x consequences, where likelihood is the probability of an event and consequence is the dollar value.

Frede, speaking at the SecurIT - Technology & Beyond conference in Sydney in May, said systems face many threats with lots of controls available to counter these threats.

However, limited resources in IT shops make it hard to determine which control to implement. This is why risk assessment is so important for business continuity plans.
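As a rough illustration of Frede's equation, the sketch below scores risks as likelihood x consequences and checks whether a control saves more than it costs, which is the trade-off analysts describe above. Everything in it is an assumption made for illustration: the class and function names, the one-year time frame, and all of the probabilities and dollar figures are hypothetical and do not come from AMP.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: float     # assumed probability of the event in a year (0.0 to 1.0)
    consequence: float    # assumed dollar cost if the event occurs

    @property
    def exposure(self) -> float:
        # risk = likelihood x consequences
        return self.likelihood * self.consequence


def control_is_worthwhile(risk: Risk, reduced_likelihood: float,
                          control_cost: float) -> bool:
    """True if the drop in expected loss is larger than the control's cost."""
    reduced_exposure = reduced_likelihood * risk.consequence
    return (risk.exposure - reduced_exposure) > control_cost


# Hypothetical figures, for illustration only
risks = [
    Risk("accidental file deletion", 0.5, 20_000),
    Risk("server room fire", 0.02, 2_000_000),
]

# Rank risks so limited resources go to the largest exposure first
ranked = sorted(risks, key=lambda r: r.exposure, reverse=True)
for r in ranked:
    print(f"{r.name}: expected annual loss ${r.exposure:,.0f}")
```

With these made-up numbers, the rare but expensive fire outranks the frequent but cheap deletion. Calling `control_is_worthwhile` with a control's cost and the likelihood it leaves behind shows whether that control pays for itself.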

Frede, who works with AMP to ensure appropriate information security controls are in place, outlined the steps involved in risk management:

Establish a risk management framework - protecting your company's mission-critical information and assets
Define an appropriate degree of risk
Utilise a risk/security process that (a) identifies IT assets and (b) assesses threats
Define threats associated with the assets
Identify appropriate mitigation strategies
Integrate the IT security risk plan with the project management structure
Measure its success or failure

Graham Penn, research director, Asia-Pacific storage for IDC Australia, said it is critical to classify all the functions and assets, even staff, when undertaking risk assessment.

"There are two ways to look at [assessing your organisation's business continuity status]. Companies can do it from the top down, or the bottom up.

"When you do it top down, most organisations will ask questions like: 'What are my critical systems? Are there others that are critical? How secure are they?' Also physical security: senior managers want to know how well [the systems] are protected," he said.

"Disaster recovery assessment from the bottom up requires looking at who provides business continuity and who does the physical security. It requires looking at the whole infrastructure of the organisation, how secure it is and when did you last prove that [your BC plan] works?"

According to Penn, a common misconception is that organisations thinking about disasters consider only large events such as fire, but not all disasters are of that scale. All risks need to be taken into account when laying out a business continuity plan.

"Not all disasters are September 11 style. Some are caused by the delete key," he said.

John Perkins, general manager of business continuity provider Alphawest 6, said: "There's not really a process that everyone holds up [to assess business continuity]."

However, he believes companies should do it both from the top down and from the bottom up to get an accurate figure on the cost to the business.

Perkins advises that enterprises ask themselves, for each IT asset, what the loss to productivity and the cost to the business would be in the short, medium and long term.

For Perkins, business continuity encompasses more than IT; it involves "the four Ps": process, policy, procedures and people.

"You basically need to look at processes and procedures as laid out in the policies. And you need to look at educating people so that they know," he said.

While IT staff spend time making sure data is backed up or that people know how to access it, training and testing can get overlooked, Perkins said, adding that it is important to constantly review the processes to ensure they stay in tune with a changing market and its changing requirements.

"A few years ago, the process of mobility, such as handhelds and laptops, was not really considered in a business continuity plan, whereas today it's very important. The data that people have needs to be safeguarded," he said.

Finally, Perkins recommends you test your plans and make sure your personnel know how to complete their responsibilities in an emergency.

"Make sure [the plan] works," Penn added.

-- Siobhan Chapman

Small change can cost big dollars

If business operations come to a halt, the damage to any business is related to the nature of the industry in which it functions, according to Ed Binney, chief information officer of IT market researcher Ideas International.

In a foreign currency-dealing operation, for instance, where the main activity is to cover exposures in foreign currencies when big-dollar sales are transacted, the losses could be equated to hard dollars. "Small movements in the exchange rates can add up to large losses in these types of operations," Binney said.

Most Australian organisations have business continuity plans in place, he said, but this trend was mainly driven by their Y2K efforts.

However, Binney pointed out: "A lot of companies would have plans that have not been fully maintained; that are out of step with their current business operations.

"Just how mismatched to the current operations these plans are will determine their effectiveness," he said.

Some classic examples of organisations with half-baked business continuity strategies were both start-ups and mature businesses, which during the early days of the Internet explored it for commercial use - mainly as a sales channel, Binney said. "During the mid to late 90s, the travel industry [in particular] was dotted with smaller players that believed a virtual business could build the scale of business activity that could otherwise only be achieved by having many bricks and mortar outlets.

"Most of these players failed to invest in infrastructure supported by a business continuity plan to sustain their operations 24x7. They disregarded the 24x7 reality of transacting on the Net, which meant they lost credibility and potential sales opportunities as a result."

According to an Ernst & Young 2002 Global Information Security Survey, which canvassed 450 CIOs, including 22 IT executives in Australia, 53 per cent of organisations had business continuity plans. The survey was undertaken after September 11.

The main causes of business interruption failures were cited as hardware or software failure (56 per cent) and telecommunications failure (49 per cent).

-- Helen Han