Computerworld

Cisco bets state-of-the-art data center on UCS

Cisco has a new green data center built on integrated blade architecture
  • John Dix (Network World)
  • 09 November, 2010 02:06

Cisco bet big on its UCS products for data centers – and now it's going "all in" with a massive, resilient and green data center built on that integrated blade architecture.

In fact, the company as a whole is migrating to the year-old Unified Computing System – Cisco's bold entree into the world of computing -- as fast as possible. Plans call for 90 per cent of Cisco's total IT load to be serviced by UCS within 12 to 18 months.

The strategy - what Cisco calls "drinking its own champagne" instead of the industry's more commonly used "eating your own dog food" - is most evident in the new data center the company is just now completing in the Dallas/Fort Worth area (exact location masked for security) to complement a data center already in the area.

Texas DC2, as Cisco calls it, is ambitious in its reliance on UCS, but it is also forward leaning in that it will use a highly virtualized and highly resilient design, act as a private cloud, and boast many green features. Oh, and it's very cool.

But first, a little background.

John Manville, vice president of the IT Network and Data Services team, says the need for the new data center stemmed from a review of Cisco's internal infrastructure three years ago. Wondering if they were properly positioned for growth, he put together a cross-functional team to analyze where they were and where they needed to go.

The result: a 200-page document that spelled out a wide-ranging, long-term IT strategy that Manville says lays the groundwork for five to 10 years.

"It was taken up to the investment committee of Cisco's board because there was a request for a fairly substantial amount of investment in data centers to make sure we had sufficient capacity, resiliency, and could transform ourselves to make sure we could help Cisco grow and make our customers successful," Manville says. (Manville talks data center strategy, the migration to UCS, cloud TCO and describes a new IT org structure in this Q&A.)

The board gave the green light and Manville's team of 450 (Cisco all told has 3,100 people in IT) is now two and a half years into bringing the vision to reality.

"Part of the strategy was to build data centers or partner with companies that have data centers, and we bundled the investment decisions into phases," Manville says.

The company had just recently retrofitted an office building in the Dallas area – what Cisco calls "Texas DC1" – to create a data center with 28,000 square feet of raised floor in four data halls. The first phase of new investments called for complementing Texas DC1 with a sister data center in the area that would be configured in an active/active mode – both centers shouldering the processing load for critical applications – as well as enhancements to a data center in California and the company's primary backup facility in North Carolina.

The second investment round, which the company is in the middle of, "involves building a data center and getting a partner site in Amsterdam so we can have an Active/Active capability there as well," Manville says.

A third round would involve investment in the Asia-Pacific region "if the business requirements and latency requirements require that we have something there," he says.

Excluding the latter, Cisco will end up with six Tier 3 data centers (meaning n+1 redundancy throughout), consisting of a metro pair in Texas, another pair in the Netherlands, and the sites in North Carolina and California. The company today has 51 data centers, but only seven of those are production centers; the rest are smaller development sites, says IT Team Leader James Cribari. So while there is some consolidation here, this overhaul is more about system consolidation using virtualization and migration to new platforms, in this case UCS.

Cisco today has more than 16,000 server operating system instances, dedicated and virtual, production and development. Of that, 6,000 are virtual and 3,000 of those VMs are already on UCS (Cisco has about 2,500 UCS blades deployed globally). The plan is to get 80 per cent of production operating system instances virtualized and have 90 per cent of the total IT workload serviced by UCS within 12 to 18 months, Manville says.

While job one is about capacity and resiliency, there is a significant TCO story, Manville says.

The cost of having a physical server inside a data center is about $3,600 per server per quarter, including operations costs, space, power, people, the SAN, and so forth, Manville says.

Adopting virtualization drives the average TCO down 37 per cent, he says. "We think once we implement UCS and the cloud technology we can get that down to around $1,600 on average per operating system instance per quarter. Where we are right now is somewhere in the middle because we're still moving into the new data center and still have a lot of legacy data centers that we haven't totally retrofitted with UCS or our cloud."

But he thinks they can achieve more: "If we get a little bit more aggressive about virtualization and squeezing applications down a bit more, we think we can get the TCO down to about $1,200 per operating system instance per quarter."
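
Pulling the quoted numbers together, here is a minimal sketch of the arithmetic; the dollar figures and the 37 per cent reduction are Manville's, and the rest is simple bookkeeping:

```python
# TCO figures quoted above, per quarter. The baseline is per physical server;
# the later figures are per operating system instance.
BASELINE_PER_PHYSICAL_SERVER = 3600   # $: ops, space, power, people, SAN, etc.
VIRTUALIZATION_REDUCTION = 0.37       # average reduction from virtualization alone

virtualized_cost = BASELINE_PER_PHYSICAL_SERVER * (1 - VIRTUALIZATION_REDUCTION)
ucs_cloud_target = 1600               # target with UCS plus cloud technology
aggressive_target = 1200              # target with more aggressive consolidation

print(f"Baseline (physical server):       ${BASELINE_PER_PHYSICAL_SERVER:,}/quarter")
print(f"After virtualization (37% less):  ${virtualized_cost:,.0f}/quarter")
print(f"UCS plus cloud target:            ${ucs_cloud_target:,}/quarter")
print(f"Aggressive consolidation target:  ${aggressive_target:,}/quarter")
```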

Texas DC1

The current anchor site for the grand IT plan is the relatively new DC1 in the Dallas area.

The 5-megawatt facility is already outfitted with 1,400 UCS blades, 1,200 of which are in production, and 800 legacy HP blades. HP was, in fact, Cisco's primary computer supplier, although it also uses Sun equipment in development circles. The goal is to get off the HP stuff as quickly as possible, Manville says. (Tit for tat, HP just announced it has eradicated Cisco WAN routers and switches from its six core data centers.) (Pic 2: A UCS rack with five UCS blade chassis, each of which can accommodate up to eight multicore servers, and top-of-rack switches for connection to storage and network switches.)

While Cisco had initially thought it would need to keep its HP Superdomes for some time – essentially these are mini-mainframes – Manville says tests show a 32-core UCS is an adequate replacement. It also looks like Cisco can migrate off the Sun platforms as well.

Of Cisco's 1,350 production applications, 30 per cent to 40 per cent have been migrated to DC1 and eventually will be migrated to DC2 as well. DC2 will be the crown jewel of the new global strategy, a purpose-built data center that will be UCS from the ground up and showcase Cisco's vision and data center muscle. It will also work hand-in-hand with DC1 to support critical applications.

Texas DC2

Cisco broke ground on DC2 in October 2009, a 160,000-square-foot building with 27,000 square feet of "raised floor" in two data halls. Actually the data center doesn't have raised floors, because an air-side economizer cooling design (more on that later) obviates the need, but many insiders still refer to the data halls using the old lingo. Another twist: the UPS room in this 10-megawatt facility doesn't have any batteries; it uses flywheels instead.

IT Team Leader Cribari, who has built data centers for Perot Systems and others, says it normally takes 18 to 20 months to build a Tier 3 data center, while the plan here is to turn the keys over to the implementation folks in early December and bring the center online in March or April.

"This is very aggressive," agrees Tony Fazackarley, the Cisco IT project manager overseeing the build.

While the outside of the center is innocuous enough – it looks like a two-story office building – more observant passersby might recognize some telltales that hint at the valuable contents. Besides the general lack of windows, the building is surrounded by an earthen berm designed to shroud the facility, deflect explosions and help tornadoes hop the building (which is hardened to withstand winds up to 175 mph). And if they know anything about security, they might recognize the fence as a K8 system that can stop a 15,000-pound truck going 40 mph within one meter. (Pic 3: K8 fence system backed by a hydraulic road block.)

Another thing that stands out from outside: the gigantic transmission towers next door, carrying one of the main high-voltage lines spanning Texas, Fazackarley says. Those lines serve a local substation that delivers a 10-megawatt underground feed to the data center, but Cisco also has a second 10-megawatt feed coming in above ground from a separate substation. The lines are configured in an A/B split, with each line supplying 5 megawatts of power but capable of delivering the full 10 megawatts if needed. (Pic 4. Ample supply of power.)

Network connections to the facility are also redundant. There are two 1Gbps ISP circuits delivered over diversely routed, vendor-managed DWDM access rings, both of which are scheduled to be upgraded to 10Gbps. And there are two 10Gbps connections on DWDM links to the North Carolina and California data centers, with local access provided by the company's own DWDM access ring. As a backup, Cisco has two OC-48 circuits to those same remote locations, both of which are scheduled to be upgraded to 10Gbps in March.

The lobby of Texas DC2 looks ordinary, although the receptionist is behind a bulletproof glass wall and Fazackarley says the rest of the drywall is backed by steel plate.

Once inside you'll find space devoted to the usual mix of computing and networking, power and cooling, but there's innovation in each sector.

Take the UPS rooms. There are two, and each houses four immense assemblies of flywheels, generators and diesel engines, which together can generate 15 megawatts of power.

The flywheels are spun at all times by electric motors and you have to wear earplugs in the rooms because the sound is deafening, even when the diesel engines are at rest.

In the event of a power hiccup, the flywheels spinning the generators keep delivering power for 10 to 15 seconds while the diesel engines are started (each diesel has four car-like batteries for starting, but if the batteries are dead the flywheels can be used to turn over the diesels). Once spun up, clutches are used to connect the diesels to the generators. (Pic 5: The silver tube contains the electric motor that drives the flywheel, the flywheel itself and the generator, and the diesel engine is in blue.)

All the generators are started at once and then dropped out sequentially until the supply matches the load at that moment, Fazackarley says. The transfer is also fast because the whole data center runs on AC and, with no batteries in the path, there is no need to convert the power to DC and back and resync it, as there is when a battery-based UPS is used.

The facility has 96,000 gallons of diesel on premises that can power the generators for 96 hours at full load. If more is needed, there is a remote refueling station and Cisco has service-level agreements with suppliers that dictate how fast the facility has to be resupplied in an emergency.
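
Those two figures imply a burn rate of about 1,000 gallons per hour at full load. A minimal sketch of the arithmetic, with the partial-load line assuming (purely for illustration) that fuel burn scales linearly with load:

```python
# Fuel figures from the article: 96,000 gallons lasts 96 hours at full load.
FUEL_GALLONS = 96_000
FULL_LOAD_HOURS = 96
full_load_burn = FUEL_GALLONS / FULL_LOAD_HOURS   # about 1,000 gallons per hour

def runtime_hours(load_fraction: float) -> float:
    """Estimated runtime at a fraction of full load, assuming linear fuel burn."""
    return FUEL_GALLONS / (full_load_burn * load_fraction)

print(f"Burn rate at full load: {full_load_burn:.0f} gal/hr")
print(f"Runtime at 50% load:    {runtime_hours(0.5):.0f} hours")   # roughly 192 hours
```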

Cooling plant

To cool the data center Cisco uses an air-side economizer design that reduces the need for mechanical chilling by simply ducting filtered, fresh air through the center when the outside temperature is low enough. The design saves energy and money and of course is very green.

To understand how that works you need a handle on the main components of the cooling system: the pre-chilling external towers, the internal chillers and the air handlers.

The first stage includes three 1,000-ton cooling towers on the roof of the facility, where water is cooled by dripping it down over a series of louvers in an open air environment and then collected and fed to the chillers in a closed loop. (Pic 6. Tony Fazackarley in front of the cooling towers.)

That pre-cooled water is circulated through five chillers (three 1,000-ton and two 500-ton machines), reducing the amount of refrigeration required to cool water in a second closed loop that circulates from the chillers to the air handlers. (The chillers don't use CFC coolant, another green aspect of the facility.) (Pic 7. One of five chillers.)

A series of valves activated by cranks spun by chains makes it possible to connect any tower to any chiller via any pump, a redundancy precaution. And on the green side, the chillers have variable frequency drives, meaning they can operate at lower speeds when demand is lower, reducing power consumption. (Pic 8. The pumps used to circulate the cooling fluids; note the chains hanging from the valves that can be used to reconfigure the system on the fly.)

The chillers feed coils in the big boxy air handlers which pull in hot air from the data halls and route conditioned air back to the computing rooms. So far, nothing too outlandish for a large, modern data center. But here is where the air-side economizer design comes into play, a significant piece of the green story. (Pic 9. Air handlers play a key role in the air-side economizer design, making it possible to cool the facility using fresh, outside air.)

When the outside temperature is below 78 degrees Fahrenheit, the chillers are turned off and louvers on the back of the air handlers are opened to let fresh air in, which is filtered, humidified or dehumidified as needed, passed through the data halls and exhausted through another set of vents on the far side.
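
In control terms the decision comes down to a temperature threshold. Here is a minimal sketch; the 78-degree cutover is from the article, while the narrow "mixed" band and everything else about the logic are assumptions for illustration, not the building-management system Cisco actually uses:

```python
# Simplified economizer mode selection based on outside air temperature.
FREE_AIR_MAX_TEMP_F = 78.0   # threshold quoted in the article

def cooling_mode(outside_temp_f: float) -> str:
    """Pick a cooling mode from the outside air temperature."""
    if outside_temp_f < FREE_AIR_MAX_TEMP_F:
        return "free-air"    # louvers open, chillers off
    if outside_temp_f < FREE_AIR_MAX_TEMP_F + 2:
        return "mixed"       # assumed transition band, not from the article
    return "chillers"        # louvers closed, mechanical chilling

for temp in (65, 79, 95):
    print(temp, "->", cooling_mode(temp))
```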

Fazackarley says they estimate that, even in hot Texas, they will be able to operate in so-called free-air mode 51 per cent of the time, while chillers will be required 47 per cent of the time and two per cent of the time they will use a mix of the two.

Savings in cooling costs are expected to be $600,000 per year, a huge win on the balance sheet and in the green column.

When online, DC2 should boast a Power Usage Effectiveness (PUE) rating of 1.25. PUE is the ratio of total facility power to the power that actually reaches the IT equipment, so a 1.25 rating means roughly a quarter of a watt of cooling and other overhead for every watt of computing.
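
As a quick illustration of what that ratio means (the 1.25 figure is Cisco's target; the kilowatt numbers below are an assumed example, not measured loads):

```python
def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """Power Usage Effectiveness: total facility power divided by IT load."""
    return total_facility_kw / it_load_kw

# Hypothetical split: a 10,000 kW facility draw around an 8,000 kW IT load.
print(pue(10_000, 8_000))   # 1.25, i.e. 0.25 W of overhead per watt of computing
```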

How good is a PUE of 1.25? "Very good, as it requires a very high level of IT and physical infrastructure optimization in tandem," says Bruce Taylor, vice president of Uptime Institute Symposia. "But keep in mind a new data center usually has a 'bad' utilization effectiveness ratio because of the standard practice of building the physical facility, including the power and cooling systems, prior to its actually being needed, to allow for capacity demand growth. Leaders like Intel are able to design facilities that tightly couple the IT hardware and the electrical and mechanical systems that power and cool it."

And Taylor is a fan of the air-side economizer design: "Wherever it is feasible to use 'free' outside air in the management of thermals, that increases effectiveness and energy efficiency."

Other green aspects of the facility:
  • Solar cells on the roof generate 100 kilowatts of power for the office spaces in the building.
  • A heat pump provides heating/cooling for the office spaces.
  • A lagoon captures gray water from lavatory wash basins and the like for landscape irrigation.
  • Indigenous, drought-resistant plants on the property reduce irrigation needs.
(Pic 10. The roof-top solar arrays provide electricity for the office spaces.)

Data halls

The data halls, of course, haven't yet been filled with computing gear, just the empty racks that will accept the UCS chassis. While there is no raised floor, the concrete slab has been tiled to mimic the standard raised floor layout to help the teams properly position equipment. (Pic 11. Tony Fazackarley and Jim Cribari in a data hall with the racks that will accept the UCS systems. Note the tiles on the concrete slab mimic the typical raised floor dimensions.)

Air can't be circulated through the floor, but Cisco uses a standard hot/cold aisle configuration, with cold air pumped down from above and hot air sucked up out of the top of the racks through chimneys that extend part way to the high ceiling above the cold air supply. The idea, Cribari says, is to keep the air stratified to avoid mixing. The rising hot air either gets sucked out in free-air mode or is directed back to the air handlers for chilling.

Power bus ducts run down each aisle and can be reconfigured as necessary to accommodate different needs. As currently designed, each rack gets a three-phase, 240-volt feed.

All told, this facility can accommodate 240 UCS clusters (120 in each hall). A cluster is a rack with five UCS chassis in it, each chassis holding eight server blades and up to 96GB of memory. That's a total of 9,600 blades, but the standard blade has two sockets, each of which can support up to eight processor cores, and each core can support multiple virtual machines, so the scale is robust. The initial install will be 10 UCS clusters, Cribari says.
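
The capacity math behind those figures, as a quick sketch; the cluster, chassis, blade, socket and core counts are the ones quoted above, while the VMs-per-core ratio is an assumed number included only to show the scale:

```python
# Back-of-the-envelope capacity math for the two data halls.
CLUSTERS = 240            # racks, 120 per data hall
CHASSIS_PER_CLUSTER = 5
BLADES_PER_CHASSIS = 8
SOCKETS_PER_BLADE = 2
CORES_PER_SOCKET = 8      # "up to" figure quoted above

blades = CLUSTERS * CHASSIS_PER_CLUSTER * BLADES_PER_CHASSIS   # 9,600 blades
cores = blades * SOCKETS_PER_BLADE * CORES_PER_SOCKET          # 153,600 cores

VMS_PER_CORE = 2          # assumed consolidation ratio, illustrative only
print(f"{blades:,} blades, {cores:,} cores, "
      f"roughly {cores * VMS_PER_CORE:,} VMs at {VMS_PER_CORE} VMs per core")
```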

Network-attached storage will be interspersed with the servers in each aisle, creating what Cribari calls virtual blocks or Vblocks. The Vblocks become a series of clouds, each with compute, network and storage. (Pic 12, UCS racks.)

The UCS architecture reduces cable plant needs by 40 per cent, Cribari says. Each chassis in a cluster is connected to a top-of-rack access switch using a 10Gbps Fibre Channel over Ethernet (FCoE) twinax cable that supports storage and network traffic.

From that switch, storage traffic is sent over a 16Gbps connection to a Cisco MDS SAN switch, while network traffic is forwarded via a 40Gbps LAN connection to a Cisco Nexus 7000 switch. In the future, it will be possible to use FCoE to carry integrated storage/LAN traffic to the Nexus and just hang the storage off of that device.
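
Read as a per-rack connectivity map, the description above looks roughly like this; the link speeds and device roles are from the article, but the structure is just an illustrative sketch, not a Cisco configuration:

```python
# Per-rack fabric links described in the article, expressed as data.
rack_fabric = {
    "chassis_to_top_of_rack":    {"media": "FCoE twinax",   "speed_gbps": 10,
                                  "carries": ["storage", "network"]},
    "top_of_rack_to_mds_san":    {"media": "Fibre Channel", "speed_gbps": 16,
                                  "carries": ["storage"]},
    "top_of_rack_to_nexus_7000": {"media": "Ethernet",      "speed_gbps": 40,
                                  "carries": ["network"]},
}

for link, props in rack_fabric.items():
    print(f"{link}: {props['speed_gbps']} Gbps over {props['media']}, "
          f"carrying {', '.join(props['carries'])}")
```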

The cable reduction not only saves on upfront costs – the company estimates it will save more than a million dollars on cabling in this facility alone – but it also simplifies implementation, eases maintenance and takes up less space in the cabinet. The latter increases air circulation so things run cooler and more efficiently.

That air circulation, in fact, is what enables Cisco to put up to five chassis in one rack, Cribari says. That's a total of about 13 kilowatts per rack, "but we can get away with it because the machines run cooler without all that cabling and air flow is better."

Put to use

When all is said and done and Texas DC2 comes online, it will be married to Texas DC1 in an active/active configuration -- creating what Cisco calls a Metro Virtual Data Center (MVDC) -- that will enable critical applications to live in both places at once for resiliency, Cribari says.

With MVDC, which will be emulated in a pair of data centers in the Netherlands as well, traffic arrives at, and data is stored in, two locations, Cribari says. Applications that will implement MVDC include critical customer-facing programs, such as Cisco.com to safeguard order handling, and apps that are central to operations, such as the company's demand production program.

Cisco is currently trialing MVDC using applications in DC1 and a local colocation facility.

DC2 will otherwise serve as a private internal cloud, supporting what the company calls Cisco IT Elastic Infrastructure Services, or CITEIS. "It's basically targeted at the infrastructure-as-a-service layer, combining compute, storage, and networking," Manville says. "CITEIS should be able to service 80 per cent of our x86 requirements, but we think there are still going to be some real high-end production databases we'll have to serve with dedicated environments, and maybe not even virtualized, so using UCS as a bare-metal platform."

The virtualization technology of choice for CITEIS is VMware supporting a mix of Linux and Windows. Regarding the operating system choice, Manville says "there is no religion about that. We'll use whatever is needed, whatever works."

While Manville says cloud tech will account for half of his TCO expectations, the other half will stem from capabilities baked into UCS, many of which improve operational efficiencies.

When you plug a blade into a UCS chassis, for example, the UCS Manager residing in the top-of-rack switch delivers a service profile that configures everything from the IP address to the BIOS, the type of network and storage connections to be used, the security policies and even the bandwidth QoS levels.

"We call it a service profile instead of a server profile because we look more at what the apps that will be supported on the blade will require," says Jackie Ross, vice president of Cisco's Server Access and Virtualization Group.

Once configured, service profiles can be applied to any blade, and storage and network connections can be changed as needed without having to physically touch the machine; any blade can access Ethernet, Fibre Channel, FCoE, etc., Ross says.
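
To make the idea concrete, here is a hypothetical sketch of the kinds of settings such a profile carries, based on the description above; it is not the UCS Manager data model or API, and every field and function name is invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class ServiceProfile:
    """Illustrative stand-in for a UCS-style service profile."""
    name: str
    ip_address: str
    bios_settings: dict = field(default_factory=dict)    # e.g. boot order, virtualization flags
    vlans: list = field(default_factory=list)            # network connections
    storage_targets: list = field(default_factory=list)  # SAN / FCoE targets
    security_policy: str = "default"
    qos_bandwidth_mbps: int = 1000

def apply_to_blade(profile: ServiceProfile, blade_slot: int) -> None:
    """Stand-in for pushing a profile to whichever blade needs it."""
    print(f"Applying '{profile.name}' to blade {blade_slot}: "
          f"IP {profile.ip_address}, VLANs {profile.vlans}, "
          f"QoS {profile.qos_bandwidth_mbps} Mbps")

apply_to_blade(ServiceProfile("web-tier", "10.0.1.25", vlans=[110, 120]), blade_slot=3)
```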

That speeds provisioning, aiding agility, Manville says. The goal is to get to 15-minute self-service provisioning. "We have this running but haven't turned it over to the application developers for various chargeback and other authorization issues. But our sys admins are seeing significant productivity gains by being able to provision virtual machines in an automated fashion."

Taken all together, the broad new IT strategy – including the build-out of Texas DC2 and the shift to a highly virtualized cloud environment driven by the company's new computing tools – is quite ambitious and, if they pull it all off, will be quite an accomplishment.

Cisco is definitely taking the long view. The DC2 complex has enough real estate, and its core infrastructure has been designed with enough headroom, to double the "raised floor" space in coming years.