When five 9s aren't enough

One of the largest financial systems in the world is hidden in a nondescript building near Washington. The owner, Visa International Inc., hasn't put its name on the building, nor will it allow a reporter to say exactly where it is. The secret data center is a fireproof, earthquakeproof concrete fortress with 5,000-pound doors and a basement full of backup gear, but it has fake windows to make it look like any of hundreds of ordinary office buildings in the area.

Paranoia? Not when you consider the stakes. Five minutes of downtime in Visa's worldwide processing system, called VisaNet, would block US$55 million in payment transactions, estimates the Foster City, Calif.-based firm.

"There is no such thing as 99.9 percent reliability; it has to be 100 percent," says Richard L. Knight, senior vice president for operations at Inovant Inc., the Visa subsidiary that runs its data centers. "Anything less than 100 percent, and I'm looking for a job." The company has had 98 minutes of downtime in 12 years.

Visa fights the battle against outages and defects on two broad fronts: Its physical processing plant is protected by multiple layers of redundancy and backups, and the company's IT shop has raised software testing to a fine art.
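
The 98 minutes of downtime in 12 years cited above can be restated in the "nines" shorthand of the headline. The following is a back-of-the-envelope sketch, not Visa's own accounting: it assumes a 365.25-day year and round-the-clock operation for the full 12 years, and the function names are illustrative.

```python
# Assumption: an average year of 365.25 days, with the system expected up 24x7.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def availability(downtime_minutes: float, years: float) -> float:
    """Fraction of the period during which the system was up."""
    total_minutes = years * MINUTES_PER_YEAR
    return 1.0 - downtime_minutes / total_minutes


def allowed_downtime_per_year(nines: int) -> float:
    """Minutes of downtime per year permitted at a given number of nines."""
    return MINUTES_PER_YEAR * 10 ** (-nines)


# Figures from the article: 98 minutes of downtime over 12 years.
visa = availability(98, 12)
print(f"availability over 12 years: {visa:.5%}")
print(f"average downtime: {98 / 12:.2f} minutes per year")
print(f"five-nines budget: {allowed_downtime_per_year(5):.2f} minutes per year")
```

Under these assumptions, the record averages out to roughly eight minutes of downtime a year, in the same range as the five-nines allowance of a little over five minutes a year.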

There are more than 1 billion Visa payment cards outstanding around the world, spawning $2 trillion in transactions per year for 23 million merchants and automated teller machines and Visa's 21,000 member financial institutions.

"We run the biggest payments engine in the world," says Sara Garrison, senior vice president for systems development at Visa U.S.A. Inc. in Foster City, Calif. "If you took all the traffic on all the stock markets in the world in 24 hours, we do that on a coffee break. And our capacity grows at 20 percent to 30 percent year to year, so every three years, our capacity doubles."

Visa has four major processing centers to handle that load, but the Washington facility is the largest, with half of all global payment transactions flowing through the building. It shares U.S. traffic with a center in San Mateo, Calif., but it can instantly pick up the full U.S. load if San Mateo goes down.
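
Garrison's growth figures can be checked with ordinary compound-growth arithmetic. The sketch below assumes steady year-over-year growth, which is a simplification; the function names are illustrative.

```python
import math


def doubling_time(growth_rate: float) -> float:
    """Years needed to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + growth_rate)


def rate_to_double_in(years: float) -> float:
    """Constant annual growth rate that doubles capacity in the given years."""
    return 2 ** (1 / years) - 1


# Growth range quoted in the article: 20 percent to 30 percent per year.
for rate in (0.20, 0.30):
    print(f"{rate:.0%} growth: doubles in {doubling_time(rate):.1f} years")

print(f"doubling in 3 years needs {rate_to_double_in(3):.1%} annual growth")
```

By this arithmetic, doubling every three years corresponds to growth of about 26 percent a year, which sits inside the quoted 20 percent to 30 percent range.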

Indeed, everything in Visa's processing infrastructure, from entire data centers to computers, individual processors and communications switches, has a backup. Even the backups have backups. For example, the Washington center has four rotating uninterruptible power supply (UPS) units (only three are needed) driven by the local utility and backed up by an array of batteries and four 1-megawatt diesel-powered generators. The 24,000 gallons of diesel fuel stored on-site is enough to power the center for a week. The UPS units protect the center from possible power fluctuations. The facility has enough redundant cooling capacity to air-condition 300 homes.

"Visa understood early on that things like triple redundancy and scalability would be the critical, defining factors in a highly competitive landscape," says Randi Purchia, research director at AMR Research Inc. in Boston. "They realized that they are a technology company; it is their business."

The eight IBM mainframes at the Washington data center are rated collectively at 3,000 MIPS. Altogether, worldwide, 7,000 MIPS of processing power can conduct 10,000 payment-authorization transactions per second. Visa's network, one of the largest private networks in the world, consists of 9 million miles of copper and optical fiber, and every Visa customer has two paths into Visa via commercial carriers.

Every operations area at the data center is equipped with a blue light mounted high on a wall. The lights flash when the San Mateo center is down and the Washington facility has picked up the entire U.S. processing load. The lights are a warning to workers not to take any action that might escalate the outage.

"If the light comes on, everyone gets off the floor," says Anthony LaManna, vice president for operations and network services at Inovant. "They go get a cup of coffee or something."

While all these backups and safeguards contribute to Visa's ultrareliable operations, they're only part of the story. Every summer, well in advance of its year-end peak processing season, Visa runs a full-scale stress test at IBM's $1 billion Performance & Scalability Center in Gaithersburg, Md., where IBM has 14,000 MIPS of processing power. The tests cap months of requirements analysis, modeling and testing at Visa's own facilities.

"We introduce failures at that point as well," says Mike Wolfson, senior vice president of engineering at Inovant. "So while we are processing 5,000 messages a second, we'll knock off a storage controller and make sure the system doesn't skip a beat."

This kind of full-volume testing, which Visa doesn't have the capacity to do in-house, has proved itself, Wolfson says. Several applications that ran flawlessly in production at peak loads failed when the test load was increased to reflect volumes projected for the coming holiday season, he says.

And Visa tests more than the impact of higher volumes at the IBM center. New software is tested as well, says Mike McGraw, vice president of systems engineering at Inovant.

"These [legacy] applications have, for the most part, been written in IBM assembler," he says. "But now, with the use of C and C++, we have to see how that's going to behave. You can do all the modeling in the world, but unless you push it to its limit, you won't find out where things break."

Change Control

Visa makes 2,500 system changes per month and modifies 2 million lines of code annually, yet it has essentially no downtime in its worldwide payment-processing systems. How is that possible?

"We spend a lot of time on change management," says Anthony LaManna, vice president for operations and network services at Inovant. Visa recently completed a three-year overhaul of its 25-year-old, assembler-language-based clearing application, which processes 50 million to 100 million transactions each night to settle accounts among merchants and banks. In addition to unit and systems testing of the new C code by the development staff, 50 people in two quality assurance groups put the software through its paces.

One quality assurance group tested 600,000 transactions, carefully selected from production data to represent each of 50 types of services. The other group ran full-scale tests using five days of production data at 70 million transactions per day and then compared the results with actual runs for those days.
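
The second group's approach, replaying production data through the new software and comparing the output with the actual runs, can be sketched roughly as follows. Everything here is hypothetical: the record layout, the two settlement functions and the sample data are stand-ins for illustration, not Visa's code or data.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical record layout: (merchant_id, issuer_id, amount).
Transaction = tuple[str, str, Decimal]


def settle_legacy(txns: list[Transaction]) -> dict[str, Decimal]:
    """Stand-in for the old system: net amount due to each merchant."""
    totals: dict[str, Decimal] = defaultdict(Decimal)
    for merchant, _issuer, amount in txns:
        totals[merchant] += amount
    return dict(totals)


def settle_new(txns: list[Transaction]) -> dict[str, Decimal]:
    """Stand-in for the rewritten system; it should reproduce the old totals."""
    totals: dict[str, Decimal] = {}
    for merchant, _issuer, amount in txns:
        totals[merchant] = totals.get(merchant, Decimal("0")) + amount
    return totals


def compare_runs(expected: dict, actual: dict) -> dict:
    """Return {key: (expected, actual)} for every key where the runs differ."""
    keys = expected.keys() | actual.keys()
    return {
        k: (expected.get(k), actual.get(k))
        for k in keys
        if expected.get(k) != actual.get(k)
    }


# One made-up "day" of data, including a charge and a later adjustment.
day = [
    ("m1", "i1", Decimal("19.99")),
    ("m2", "i1", Decimal("5.00")),
    ("m1", "i2", Decimal("-4.99")),
]
mismatches = compare_runs(settle_legacy(day), settle_new(day))
print("mismatches:", mismatches)
```

The point of the design is that the old system's output serves as the reference: any non-empty set of mismatches flags a behavioral difference to investigate before cutover.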

Visa also conducted user-acceptance testing among a sample of member banks, as well as life cycle testing in which 3,000 composite transactions (for example, a charge plus a later adjustment) were tracked over a seven-day period. About 40 percent of the entire project was devoted to these efforts, says Joel Mittler, Visa's senior vice president for strategic projects. "We added almost a year to the schedule when we realized the complexity of the testing," he says.

Scrutiny of the new software didn't end when it went into production, says Richard L. Knight, senior vice president for operations at Inovant. A command center was set up for 30 days and staffed around the clock with senior technical people able to respond to problems. And the firm set up a help desk for customers who had problems with their own software, which had to be modified to interface with Visa's new system.

Each of the 2,500 system changes is assigned one of four risk ratings, with Level IV being the lowest risk and Level I the highest, Knight says. He reviews those ranked I and II and routinely disapproves or reschedules any for which he feels the risk to system uptime is too great. And he insists that changes be designed in such a way that they can be made or reversed in less than an hour, if necessary.
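
The review rule Knight describes can be expressed as a simple gate. This is only an illustrative sketch: the `Change` record, its field names and the sample changes are assumptions made for the example, and the actual approval decision rests on Knight's judgment rather than on a formula.

```python
from dataclasses import dataclass


@dataclass
class Change:
    name: str                # hypothetical change identifier
    risk_level: int          # 1 = highest risk, 4 = lowest (Levels I to IV)
    rollback_minutes: int    # estimated time to make or reverse the change


MAX_ROLLBACK_MINUTES = 60    # "less than an hour," per the article


def needs_senior_review(change: Change) -> bool:
    """Levels I and II go to the senior reviewer."""
    return change.risk_level in (1, 2)


def meets_rollback_rule(change: Change) -> bool:
    """A change must be reversible in under an hour."""
    return change.rollback_minutes < MAX_ROLLBACK_MINUTES


# Made-up examples for illustration only.
changes = [
    Change("swap network switch", risk_level=1, rollback_minutes=45),
    Change("update report layout", risk_level=4, rollback_minutes=10),
    Change("migrate clearing table", risk_level=2, rollback_minutes=180),
]

for c in changes:
    review = "senior review" if needs_senior_review(c) else "standard review"
    rollback = "ok" if meets_rollback_rule(c) else "redesign: rollback too slow"
    print(f"{c.name}: {review}, {rollback}")
```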

Employee attitudes toward quality at Visa may be the biggest success factor, says Randi Purchia at AMR Research. "Pride in what they do pervades the entire organization. They've engineered that across divisions and down into incentive systems," she says.

"The most important thing is the people," Knight agrees. "They know what 1 second of downtime means."

Inside the Secret Center

Visa's Washington-area processing center houses 50 million lines of code for some 300 applications. Major functions include the following:

-- Authorization system. This online, IBM-mainframe-based system propels a payment card request from a cardholder to a merchant, to the merchant bank, then on to the card issuer and back to the merchant.

-- Clearing and settlement system. This mainframe batch system runs nightly and settles accounts among merchants, merchants' banks and card issuers.

-- Fraud-detection system. This online system runs on Sun Microsystems Inc. servers and uses neural networks and pattern-recognition algorithms to look for fraud in each payment transaction.

-- Data warehouse. This mammoth storage facility consists of 18 Storage Technology Corp. silos and a 250,000-volume tape library holding up to seven years' worth of transaction histories. It grows by 250TB each month.
