Every nanosecond can count when you’re offering web services. In search, games, stock trading and social media, even tiny delays are unacceptable.
To avoid it, the likes of Google and Twitter make sure they have enough server capacity to meet peak demand. That equates to a lot of machines running for a lot of time without actually doing any work – a huge waste of hardware, energy and money, explains Xi Yang from the Australian National University.
But it doesn’t have to be this way. Yang’s ANU research team has found a way to save 25 per cent on data centre energy bills, while improving the responsiveness of services. No wonder the world’s biggest tech brands are scrambling to find out how.
“The companies have no control of when users will request a search, so they have large server capacity that is mostly idle,” explains Yang, a PhD student at the university’s Research School of Computer Science. “The conservative solution providers often take is to significantly over-provision.
“Since these services are widely deployed in large numbers of data centres, their poor utilisation incurs enormous capital and operating costs.”
Companies don’t slip in other work while servers are idle because doing so hurts responsiveness: lower-priority processes simply can’t ‘get out of the way’ fast enough when a latency-critical job comes in. The problem is compounded by the unpredictable nature of latency-critical jobs, which Yang describes as “highly variable and bursty”.
Now Yang, ANU Professor Stephen Blackburn and Microsoft Research’s Kathryn McKinley have found a fix, inspired by the fairytale The Elves and the Shoemaker.
“We're the first to be able to do this,” says Blackburn. “We can slot in tiny little packs of work in the gaps. It's just like the elves that used the shoemaker's tools at night.”
First you need to spot “those tiny, little itty-bitty gaps”, says Blackburn, then you need “the ability to switch from [less critical to latency-critical] really fast. And that didn’t exist.”
Spotting the gaps is possible following 2015 research by Yang and Blackburn into ‘performance microscopy’, which allows for much closer analysis of an operating system's performance.
“It’s like having an electron microscope for your performance,” says Blackburn. “It’s improved our ability to see such things by a factor of 100.”
Squeezing work into the gaps, and out again quickly enough, is done by sharing resources, exploiting a hardware feature called simultaneous multithreading (SMT).
“Many companies turn off this feature, because without our approach, sharing wreaks havoc with the responsiveness of interactive services, such as searches,” explains McKinley, a principal researcher at Microsoft Research.
“With our new fine-grained hardware control, we can substantially improve the efficiency of data centre servers while achieving the same responsiveness.”
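The idea of slotting batch work into the gaps can be caricatured in a few lines. This is purely an illustrative sketch, not the ELFEN code itself: it assumes batch work is chopped into tiny slices (the `run` function and its names are invented for this example), and before running each slice the scheduler checks whether a latency-critical request is pending and steps aside immediately if so.

```python
# Illustrative sketch only -- not the authors' ELFEN implementation.
# Batch work is split into tiny slices; at every time step the borrowed
# lane first serves any pending latency-critical work, and only runs a
# batch slice when there is a genuine gap.

def run(batch_slices, requests):
    """Simulate one SMT lane.

    batch_slices: list of labels for low-priority work units.
    requests: dict mapping time step -> number of latency-critical
              work units arriving at that step.
    Returns the timeline of what the lane executed.
    """
    timeline = []           # what the lane did at each step
    pending = 0             # outstanding latency-critical work units
    batch = list(batch_slices)
    t = 0
    while batch or pending:
        pending += requests.get(t, 0)
        if pending:                 # request in flight: batch yields at once
            timeline.append("request")
            pending -= 1
        elif batch:                 # a gap: slot in one tiny batch slice
            timeline.append(batch.pop(0))
        t += 1
    return timeline

# Usage: a request arriving at step 2 pre-empts the remaining batch slices.
print(run(["b1", "b2", "b3"], {2: 2}))
# -> ['b1', 'b2', 'request', 'request', 'b3']
```

The key property, as in the quotes above, is the granularity: because the check happens before every tiny slice, the batch work never delays a request for more than one slice’s worth of time.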
They call it ELFEN. “In some cases that we studied, the new techniques made a server nine times more efficient,” says Yang. What’s more, it’s not that difficult to implement, adds Blackburn, requiring no “exotic hardware or invasive software”.
“There’s even a little sweetener,” Blackburn adds. “One very sweet little kicker in the tail.”
While sitting idle, CPUs drop into lower-power states. When a latency-critical request comes in, it takes time for them to ramp back up to a high-power state.
“Because we’re having the machine busy all the time with useful work, they don’t fall into those lower power states and as a result your responsiveness actually improves very slightly. It’s a very cool little result.”
The paper was presented at the 2016 USENIX Annual Technical Conference in Denver last month, and the calls are already coming in.
“The uptake from the community has been great already. All the feedback we’ve got from the key players is very positive. It’s a fairly simple change that gives you a real win. That’s why we expect uptake to be very strong,” says Blackburn. “It’s a no-brainer.”