As more and more servers are virtualized, connections between them are increasingly handled by virtual switches running on those same servers, raising the question: does the top-of-rack (ToR) data center switch ultimately get subsumed into the server?
Advocates say yes, especially given that servers today are packed with multicore processors, additional Layer 2 intelligence and dense optical connectors. Upstream core connectivity could then be provided by optical cross connects that just move traffic based on directional guidance from the server.
Pessimists say no, or not right away. Servers will continue to assume more switching duties between virtual machines, but the ToR will live on for some time to come.
"The short answer is no," says Alan Weckel, switching analyst at Dell'Oro Group, when asked if servers will eventually replace ToR switches. "At the end of the day, it will be rack servers connected to top of rack switches. That's 80% of the market now. That ToR isn't going anywhere."
Fiber Mountain is one company that disagrees. The startup makes software-controlled optical cross connects designed to avoid as much packet processing as possible by establishing what amounts to directly attached, point-to-point fiber links between data center server ports.
"We're getting rid of layers: layers of switches, layers of links between switches," says MH Raza, Fiber Mountain founder and CEO. "Switching as a function moves from being inside a box called a switch to a function that co-resides inside a box we call a server. If we put the switching function inside a server, it's the same logic as a rack front-ending a number of servers; it's the housing of a server with a switch in it front-ending a bunch of VMs. Why can't that decision be made at the server? It can be made at the server."
Raza says he knows of a vendor, which he wouldn't name, offering an Intel multicore server motherboard with a Broadcom Trident II switch chip and a high-capacity fiber connector. The 1U device has a fiber port that supports up to 64 25Gbps lanes, for 800Gbps to 1.6Tbps of capacity, similar to the Intel and Corning MXC connector. With the MXC and similar silicon photonics, servers can communicate directly without any switch between them, Raza says.
"The decision could be made by the server," he says. "I can assign packets going out the right lane. How many places does it need to go? Ten, 12, 40? Not a problem. When you have an MXC connector you could take them to 32 different destinations."
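The capacity figures Raza cites follow directly from the lane arithmetic. A quick back-of-envelope check (assuming, as is typical for such connectors, that the bidirectional case splits the 64 lanes evenly between transmit and receive):

```shell
# Back-of-envelope check of the connector figures cited above
lanes=64   # fiber lanes per connector
rate=25    # Gbps per lane

# All 64 lanes counted together: aggregate capacity
echo "aggregate: $((lanes * rate)) Gbps"          # 1600 Gbps = 1.6 Tbps

# Lanes split 32 transmit / 32 receive: capacity per direction
echo "per direction: $((lanes / 2 * rate)) Gbps"  # 800 Gbps
```

That spread, 800Gbps per direction up to 1.6Tbps aggregate, matches the range quoted for the MXC-class connector.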
Raza says this is possible now, but no one is talking about it because of its disruptive potential; the industry, he suggests, is still wearing the blinders of traditional network thinking. "Nobody is talking about this because it is based on how fast silicon photonics will get adopted," Raza says. "But it can be done now. The timing depends on investments and shifts" in technology and markets.
Given that VMware's NSX product is designed to handle virtual switching in VMware virtual server environments, you might expect the company to be a big proponent of the idea that servers will eventually subsume switches. But even though the server-as-ToR-switch architectural model has been proposed for hyperscale environments, Guido Appenzeller, chief technology strategy officer in VMware's Network Security Business Unit, has never seen it used.
"If you want to get rid of ToR altogether, you need new silicon in servers," like packet sorting engines, Appenzeller says. "You probably need a mini switch in the server. It doesn't work with today's architecture."
That mini switch would be an Ethernet device enabling a direct server-to-server fabric. Another option would be a Layer 1 cross connect and multiplexer on the server motherboard, Appenzeller says.
Appenzeller gives the nod to the Ethernet mini switch implementation due to its familiarity in the server world, and its ability to do virtual LAN separation, something optical cross connects cannot do, he says. "I've never seen either deployed," Appenzeller says. And both may be impractical given the steadily dropping price of ToR switch ports. "ToR prices are coming down quickly."
Dell'Oro Group agrees. The firm reports that the average selling price of a 10G Ethernet port will drop from $715 in 2011 to $212 by 2016.
And the price/performance of network silicon from suppliers like Broadcom and Mellanox is outpacing that of general purpose CPUs, according to JR Rivers, CEO and cofounder of Cumulus Networks, a maker of network operating system software for bare metal switches. Also, bogging down the central CPU with networking features would sap its value. "When you start to put really beefy stuff in the middle of your CPU silicon, you diminish returns," Rivers says.
Optical interconnects and backplanes have also been evaluated before but never took off due to cost and complexity, Rivers says. Intel's RackScale architecture, which disaggregates and pools compute, networking and storage to make an IT rack more flexible and agile through software, proposes a photonic interconnect as a fabric weaving together those pooled resources.
But this may prove too complex to be practical, Rivers suggests.
"Optical backplanes are too complex, and that's why it hasn't played out," he says. "RackScale is too tightly coupled for today's data center environments and too much of a highly engineered system, as opposed to a loosely coupled system able to move at network scale and speeds. RackScale looks like a one type fits all, which inevitably it doesn't, and oftentimes customers don't get the benefit out of it."
He likens it to efforts to embed blade switches in blade servers, which users largely ignored; instead, they used pass-through modules to connect to Cisco switch ports.
In the same vein, Rivers is skeptical of the idea of using optical technology in the data center to directly connect servers and bypass a ToR switch.
"A large file takes less than a second to transfer, so it's hard to get leverage out of that technology," Rivers says. "It's hard to see optical crosspoint switches being the fundamental technology element that changes networking forever. They've existed for quite some time."
As will ToR switches, according to Intel. Even though servers will take on more switching intelligence and local function, the horsepower will remain with a physically separate and distinct switch.
"The ToR will still be relevant in the data center," says Steve Price, general manager of Intel's Communications Infrastructure Division. "The trend is that network intelligence is also increasing in the server shelf. Policy enforcement and multi-tenant tunneling capability today occurs in either the vSwitch or the ToR, for example. With increased compute density within the rack and the emergence of SDN and NFV on servers, there will be an increase in east-west traffic at the shelf level across both virtual and physical switching. The server will become a hybrid platform able to process packets through software on Intel Architecture and use a shelf-level switch to aggregate and manage traffic workloads across multiple servers."
Shelf-level switching in the server can provide low latency connectivity to multiple server sleds within the shelf, along with traffic aggregation through 100G Ethernet uplinks to the ToR, Price says. Besides, providing high port count switching within each server shelf increases cabling costs, he says, so Intel proposes the aggregation of all server shelf traffic through 100G uplinks to a ToR.
Intel's strategy is to increase investment in Open vSwitch community projects, with a focus on the Data Plane Development Kit (DPDK), to advance virtual switching performance on Intel Architecture and to enable hardware offloading to the NIC and/or physical switch when needed. DPDK is currently planned for inclusion in Open vSwitch 2.4, Price says.
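To make that concrete, the sketch below shows the general shape of wiring DPDK into Open vSwitch through its userspace (netdev) datapath. The commands follow the conventions of later OVS releases; configuration keys, port names, and the PCI address shown are illustrative and have changed across versions, so treat this as a sketch rather than a recipe:

```shell
# Illustrative sketch of enabling DPDK acceleration in Open vSwitch.
# Keys, port names, and the PCI address are assumptions for this example;
# consult the OVS-DPDK documentation for your specific versions.

# Tell the ovs-vswitchd daemon to initialize DPDK at startup
ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true

# Create a bridge on the userspace (netdev) datapath instead of the
# kernel datapath -- this is where DPDK's fast packet path lives
ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev

# Attach a physical NIC that has been bound to a DPDK-compatible driver
# (e.g. vfio-pci) as a DPDK port on the bridge
ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
    options:dpdk-devargs=0000:01:00.0
```

The key architectural point is the `datapath_type=netdev` setting: packets are forwarded by DPDK poll-mode threads in userspace on the host CPU, which is exactly the "software switching on Intel Architecture" role Price describes, with hardware offload to the NIC or physical switch as the escape valve when software alone lacks the horsepower.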
As for the RackScale architecture initiative, that's focused on hyperscale data centers where administrators want to reduce total cost of ownership, and increase resource flexibility and agility, Price says.
Intel and Cisco have had discussions on the RackScale architecture, and more generally on server/switch disaggregation and distributed memory, according to Dan Hanson, technical marketing director in Cisco's Computing Systems Product Group. Cisco's views on switch disaggregation complement Intel's, Hanson says, but the two companies diverge on how best to achieve it.
"The idea holds a lot of promise. A lot of people have been driving towards something like that," Hanson says. "We're just trying to figure out the best way to do it."
Intel's DPDK is an enabler, Hanson says, as are some of the hardware assist capabilities of Cisco's UCS servers in Network Functions Virtualization applications where general purpose x86 platforms lack adequate horsepower. But how best to achieve distributed, disaggregated switching and memory management and when the industry is ready for it is still open for debate, he says.
"The reasons we're having these discussions with Intel around RackScale is as a complementary architecture, where we're looking at expanding more sections of that server, distributed and disaggregated, across a rack of servers," Hanson says. "We have that right now within a rack of UCS to share some of these components, but maybe not down to the memory channels that Intel is looking at."
Hanson pointed to Cisco's System Link technology in its UCS M-Series servers, unveiled three months ago, as a capability that could map into RackScale. System Link is silicon that gives M-Series the ability to connect disaggregated subsystems into a fabric with software-defined, policy-based provisioning, deployment and management of resources per application.
But like Dell'Oro's Weckel, Hanson believes the rate at which customers adopt System Link, RackScale or server/switch disaggregation will ultimately determine if or when servers become the ToR switch of the future.
"The question is, how fast some of this might happen and the depth at which that will come together," Hanson says. "There will be underlying technical hurdles that need to be addressed. Depending on the customer ability to consume that kind of change in technology will be the primary driving factor. We're always looking at new and better technology than what we can bring, but a lot depends on the rate of absorption of technology that customers are willing to accept."