Data center fabrics catching on, slowly

It takes some planning -- and expense -- to revamp switching gear at the enterprise level.

When Government Employees Health Association (GEHA) overhauled its data center to implement a fabric infrastructure, the process was "really straightforward," unlike that for many IT projects, says Brenden Bryan, senior manager of enterprise architecture. "We haven't had any 'gotchas' or heartburn, with me looking back and saying 'I wish I made that decision differently.'"

GEHA, based in Kansas City, Mo., is the nation's second-largest health and dental plan serving federal workers, processing claims for more than a million employees, retirees and their families. The main motivator behind switching to a fabric, Bryan says, was to simplify and consolidate the infrastructure and move away from a legacy Fibre Channel SAN environment.

When he started working at GEHA in August 2010, Bryan says he inherited an infrastructure that was fairly typical: a patchwork of components from different vendors with multiple points of failure. The association also wanted to virtualize its mainframe environment and turn it into a distributed architecture. "We needed an infrastructure in place that was redundant and highly available," explains Bryan. Once the new infrastructure was in place and stable, the plan was to then move all of GEHA's Tier 2 and Tier 3 apps to it and then, lastly, move the Tier 1 claims processing system.

GEHA deployed Ethernet switches and routers from Brocade, and now, more than a year after the six-month project was completed, Bryan says the association has a high-speed environment and a 20-to-1 ratio of virtual machines to blade hardware.

"I can keep the number of physical servers I have to buy to a minimum and get more utilization out of them," says Bryan. "It enables me to drive the efficiencies out of my storage as well as my computing."

Implementing a data center fabric does require some planning, however. It means having to upgrade and replace old switches with new switching gear because of the different traffic configuration used in fabrics, explains Zeus Kerravala, principal analyst at ZK Research. "Then you have to re-architect your network and reconnect servers."

Moving flat and forward

A data center fabric is a flatter, simpler network that's optimized for horizontal traffic flows, compared with traditional networks, which are designed more for client/server setups that send traffic from the server to the core of the network and back out, Kerravala explains.

In a fabric model, traffic moves horizontally across the network and between virtual machines, "so it's more a concept of server-to-server connectivity." Fabrics are flatter, with no more than two tiers, versus legacy networks, which have three or more tiers, he says. Storage networks have been designed this way for years, says Kerravala, and now data networks need to migrate in the same direction.
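
The hop-count argument is easy to check in miniature. Here is a small, illustrative sketch (the topology sizes and device names are assumptions, not any vendor's reference design) that counts the links a packet crosses between two servers in a classic three-tier tree versus a two-tier leaf-spine fabric:

```python
from collections import deque

def hops(adjacency, src, dst):
    """Breadth-first search: count the links on the shortest path."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None

# Legacy three-tier design: access -> aggregation -> core and back down.
three_tier = {
    "server_a": ["access1"], "server_b": ["access2"],
    "access1": ["server_a", "agg1"], "access2": ["server_b", "agg2"],
    "agg1": ["access1", "core"], "agg2": ["access2", "core"],
    "core": ["agg1", "agg2"],
}

# Two-tier leaf-spine fabric: every leaf connects to every spine.
leaf_spine = {
    "server_a": ["leaf1"], "server_b": ["leaf2"],
    "leaf1": ["server_a", "spine1", "spine2"],
    "leaf2": ["server_b", "spine1", "spine2"],
    "spine1": ["leaf1", "leaf2"], "spine2": ["leaf1", "leaf2"],
}

print(hops(three_tier, "server_a", "server_b"))  # 6 links, up through the core and back
print(hops(leaf_spine, "server_a", "server_b"))  # 4 links: leaf, spine, leaf
```

Every tier removed from a server-to-server path is one less queue and lookup for east-west traffic to cross, which is where the latency savings come from.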


One factor driving the move to fabrics is that about half of all enterprise data center workloads in Fortune 2000 companies are virtualized, and when companies get to that point, they start seeing the need to reconfigure how their servers communicate with one another and with the network.

"We look at it as an evolution in the architectural landscape of the data center network," says Bob Laliberte, senior analyst at Enterprise Strategy Group. "What's driving this is more server-to-server connectivity ... there are all these different pieces that need to talk to each other and go out to the core and back to communicate, and that adds a lot of processing and latency."

Virtualization adds another layer of complexity, he says, because it means dynamically moving things around, "so network vendors have been striving to simplify these complex environments."

When data centers can't scale

As home foreclosures spiked in 2006, Walz Group, which handles document management, fulfillment and regulatory compliance services across multiple industries, found its data center couldn't scale effectively to take on the additional growth required to serve its clients. "IT was impeding the business growth," says Chief Information Security Officer Bart Falzarano.

The company hired additional in-house IT personnel to deal with disparate systems and management, as well as build new servers, extend the network and add disaster recovery services, says Falzarano. "But it was difficult to manage the technology footprint, especially as we tried to move to a virtual environment," he says. The company also had some applications that couldn't be virtualized that would have to be managed differently. "There were different touch points in systems, storage and network. We were becoming counterproductive."

To reduce the complexity, in 2009 Walz Group deployed Cisco's Unified Data Center platform, a unified fabric architecture that combines compute, storage, networking and management into a single system designed to automate IT as a service across physical and virtual environments. It is connected to NetApp SAN storage as part of a FlexPod configuration.

Previously, when the company was using HP technology, Falzarano recalls, one of its database nodes went down. Resolving it required getting the vendor on the phone and eventually pulling three of the four CPUs in a troubleshooting process that took four hours. By the time the team got the part it needed, installed it and returned to normal operations, 14 hours had passed, says Falzarano.


"Now, for the same [type of failure], if we get a degraded blade server node, we un-associate that SQL application and re-associate the SQL app in about four minutes. And you can do the same for a hypervisor," he says.

IT has been tracking data center performance and benchmarking key metrics, and Falzarano reports that the team immediately saw a port-density reduction of 8-to-1, meaning less cabling complexity and fewer required cables. Where IT previously saw a virtualization efficiency of just 4-to-1 with the earlier technology, Falzarano says it's now greater than 15-to-1, and the team can virtualize apps that it couldn't before.

Other findings include a rack reduction of greater than 50%, due to the amount of virtualization the IT team was able to achieve; more centralized systems management, with one IT engineer now handling 50 systems; and a marked improvement in what Falzarano refers to as "system mean time between failures."

"We were experiencing a large amount of hardware failures with our past technology; one to two failures every 30 days across our multiple data centers. Now we are experiencing less than one failure per year," he says.


Case study: Fabrics at work

When he used to look around his data center, all Dan Shipley would see was "a spaghetti mess" of cables and switches that were expensive to manage and error-prone. Shipley, architect at $600 million Supplies Network, a St. Louis-based wholesaler of office products, says the company had all the typical issues associated with a traditional infrastructure: some 300 servers that consumed a lot of power, took up a lot of space and experienced downtime due to hardware maintenance.

"We're primarily an HP shop, and we had contracts on all those servers, which were from different generations, so if you lose a motherboard from one model, they'd overnight it and it was a big pain," Shipley says. "So we said, 'Look, we've got to get away from this. Virtualization is ready for prime time, and we need to get out of this traditional game.'"

Today, what Supplies Network has built in its data center is about as far from traditional as it gets. Rather than deploying Ethernet and Fibre Channel switches, the company turned to I/O Director from Xsigo, which sits on top of a rack of servers and directs traffic. All of the servers in that rack are plugged into the box, which dynamically establishes connectivity to all other data center resources. Unlike other data center fabrics, I/O Director uses InfiniBand, an open, standards-based switched-fabric interconnect long used in high-performance computing.

"On all your servers you get rid of all those cables and Ethernet and Fibre switches and connect with one InfiniBand cable or two, for redundancy, which is what we did," says Shipley. The cables are plugged into I/O Director. "You say 'On servers one through 10, I want to connect all of those to this external Fibre Channel storage' and it creates a virtual Fibre Channel storage network. So in reality, this is all running across InfiniBand, but the server ... thinks it's still connecting via Fibre Channel."

The configuration means each server now needs only one or two cables instead of several, "and we have a ton of bandwidth."
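
In other words, the physical cabling and the logical connectivity are decoupled. A rough model of the idea (all names here are hypothetical; this is not Xsigo's actual interface) might look like this:

```python
from dataclasses import dataclass, field

@dataclass
class Server:
    name: str
    ib_links: int = 2                          # two InfiniBand cables, for redundancy
    vnics: list = field(default_factory=list)  # virtual Ethernet NICs
    vhbas: list = field(default_factory=list)  # virtual Fibre Channel HBAs

def connect_storage(servers, fc_array):
    """Hand each server a virtual HBA mapped to the external FC array.
    The OS sees an ordinary Fibre Channel adapter; the traffic actually
    rides the shared InfiniBand links through the I/O director."""
    for server in servers:
        server.vhbas.append({"target": fc_array, "transport": "infiniband"})

rack = [Server(f"server{i}") for i in range(1, 11)]  # "servers one through 10"
connect_storage(rack, "external-fc-array")
print(rack[0].vhbas)  # [{'target': 'external-fc-array', 'transport': 'infiniband'}]
```

The operating system keeps loading its ordinary Fibre Channel driver against the virtual HBA; only the transport underneath has changed.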

Supplies Network is fully virtualized, and has seen its data center shrink from about 20 racks to about four, Shipley says. Power consumption and cooling have also been reduced.

Shipley says he likes that InfiniBand has been used in the supercomputer world for a decade, and is low-cost and open, whereas other vendors "are so invested in Ethernet, they don't want to see InfiniBand win." Today, I/O Director runs at 56 gigabits per second, compared with the fastest Ethernet connection, which is 10 gigabits per second, he says.

In terms of cost, Shipley says a single-port 10-gigabit Ethernet card is probably around $600, and an Ethernet switch port is needed on the other side, which runs approximately $1,000. "So for each Ethernet connection, you're looking at $1,600." A 40-gigabit, single-port InfiniBand adapter is probably $450 to $500, he says, and a 36-port InfiniBand switch box is $6,000, which works out to $167 per port.
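
Shipley's per-port arithmetic checks out; running his quoted prices through a quick calculation (redundant links and volume discounts aside):

```python
# Ethernet: adapter on the server plus a port on the switch.
ethernet_nic = 600            # single-port 10GbE card
ethernet_switch_port = 1000   # per-port cost on the switch side
print(ethernet_nic + ethernet_switch_port)   # 1600 per connection

# InfiniBand: adapter plus a share of a 36-port switch.
ib_adapter = 475              # 40Gb single-port adapter ($450 to $500)
ib_switch_port = 6000 / 36    # $6,000 box spread across 36 ports
print(round(ib_switch_port))                 # 167 per port
print(round(ib_adapter + ib_switch_port))    # roughly 642 per connection
```

On those numbers, an InfiniBand connection comes in at well under half the cost of a 10-gigabit Ethernet one, with several times the bandwidth.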

Shipley says the company has now gotten rid of all of its core Ethernet switches in favor of InfiniBand.

"I was afraid at first because ... I didn't know much about InfiniBand," he acknowledges, and most enterprise architectures run on Fibre Channel and Ethernet. "We brought [I/O Director] out here and did a bake-off with Cisco's [Unified Data Center]. It whooped their butt. It was way less cost, way faster, it was simple and easy to use and Xsigo's support has been fabulous," he says.

Previously, big database jobs would take 12 hours, Shipley says. Since the deployment of I/O Director, those same jobs run in less than three hours. Migrating a virtual machine from one host to another now takes seconds, as opposed to minutes, he says.

He says he was initially concerned that because Xsigo is a much smaller vendor, it might not be around over the long term. But, says Shipley, "we found out VMware uses these guys."

"What Xsigo is saying is, instead of having to use Ethernet and Fibre Channel, you can take all those out and put [their product] in and it creates a fabric," explains Bob Laliberte, senior analyst at Enterprise Strategy Group. "They're right, but when you're talking about data center networking and data center fabrics, Xsigo is helping to create two tiers. But the Junipers and Ciscos and Brocades are trying to create that flat fabric."

InfiniBand is a great protocol, Laliberte adds, but cautions that it's not necessarily becoming more widely used. "It's still primarily in the realm of supercomputing sites that need ultra-fast computing."

Easy to implement

Like the IT executives at Walz Group, IT team leaders at GEHA believed that deploying a fabric model would not only meet the business requirements, but also reduce complexity, cost and staff needed to manage the data center. Bryan says the association also gained economies of scale by having a staff of two people who can manage an all-Ethernet environment, as opposed to needing additional personnel who are familiar with Fibre Channel.

"We didn't have anyone on our team who was an expert in Fibre Channel, and the only way to achieve getting the claims processing system to be redundant and highly available was to leverage the Ethernet fabric expertise, which we had on staff," he says.

Bryan says the association has been able to trim "probably a half million dollars of capital off the budget" since it didn't have to purchase any Fibre Channel switching, and a quarter of a million dollars in operating expenses since it didn't need staff to manage Fibre Channel. "Since collapsing everything to an Ethernet fabric, I was able to eliminate a whole stack of equipment," says Bryan.

GEHA used a local managed services provider to help with setting up some of the more complex pieces of the architecture. "But from the time we unpacked the boxes to the time the environment was running was two days," says Bryan. "It was very straightforward."


And the performance, Bryan adds, is "jaw-dropping." In one test, copying a 4-gigabyte ISO file from one blade to another through the network, with network and storage traffic sharing the same fabric, took less than a second. "We didn't even see the transfer; I didn't think it actually copied," he says.

IT now also uses the fabric for its backup environment, with software from CommVault. Bryan says the association is seeing about a terabyte an hour of throughput on the network, "which is probably eight to 10 times greater than before" the fabric was in place.
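
For a sense of scale, those figures convert to sustained line rates as follows (decimal units assumed; the ISO copy is a floor, since it finished in under a second):

```python
GB, TB = 10**9, 10**12

# A 4GB ISO copied in under one second implies at least:
iso_floor_gbps = 4 * GB * 8 / 10**9
print(iso_floor_gbps)                 # 32.0 Gbps, at minimum

# A terabyte per hour of backup throughput works out to:
backup_gbps = TB * 8 / 3600 / 10**9
print(round(backup_gbps, 2))          # 2.22 Gbps sustained
```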

Today, all of GEHA's production traffic runs on the fabric, and Bryan says he couldn't be more pleased with the infrastructure. Scaling it out is not an issue, he says, which, along with speed, is one of the major advantages of a converged fabric. GEHA is also able to run a very dense workload of virtual machines on a single blade, he says. "Instead of having to spend a lot of money on a lot of blades, you can increase the ROI on those blades without sacrificing performance," says Bryan.

Laliberte says he sees a long life ahead for data center fabrics, noting that this type of architecture "is just getting started. If you think about complexity and size, and you have thousands of servers in your environment and thousands of switches, any kind of architecture change isn't done lightly and takes time to evolve."

Just as it took time for the three-tier architecture to evolve, it will take time for three tiers to be collapsed into two, he says, adding that the flat fabric is the next logical step. "These things get announced and are available, but it still takes years to get widespread deployments," says Laliberte.

Esther Shein is a freelance writer and editor. She can be reached at eshein@shein.net.

