Computerworld

Sandia tests supercomputer virtualization

High-performance computer virtualization could provide more bang for the buck, researchers contend

The U.S. Energy Department's Sandia National Laboratories is investigating the possibility of using virtualization to allow its researchers to make better use of its behemoth Red Storm supercomputer. Researchers from Northwestern University and the University of New Mexico are also participating in the project.

"Our focus is to create a more flexible environment for supercomputer users by leveraging virtual machines," said researcher Kevin Pedretti, who heads up the project for Sandia. "The end-goal is to create a more flexible supercomputer environment for users, without sacrificing performance or scalability."

Pedretti said that, to his knowledge, this is the first systematic test of virtualization for HPC (high-performance computing) platforms. Sandia and the U.S. National Science Foundation funded the work.

Thus far, supercomputing hasn't made great use of virtualization. "The conventional wisdom in the HPC community is that virtualization has too high of an overhead to be useful," Pedretti said in an e-mail interview. "We hoped to challenge this by showing reasonable overhead was possible even for tightly-coupled, communication-intense scientific applications running at a large scale in a virtual machine."

Virtualization could prove valuable, Pedretti contends, because researchers who want to run large-scale simulations or other number-crunching jobs would no longer be limited to applications that run on supercomputer operating systems.

In the team's tests, virtualized programs ran at 95 percent of the speed they achieved on bare metal.
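As a rough illustration of what that figure implies, the sketch below converts a relative speed of 95 percent into extra run time. It assumes "speed" means throughput on a fixed workload, and the 10-hour native run is a hypothetical number chosen for the example, not a result from the tests.

```python
# Relative speed of virtualized runs, as reported in the article.
relative_speed = 0.95

# Assumption: speed is throughput on a fixed workload, so run time
# scales as 1 / relative_speed.
extra_runtime_fraction = 1 / relative_speed - 1

# Hypothetical example job; the 10-hour figure is not from the tests.
native_hours = 10.0
virtual_hours = native_hours / relative_speed

print(f"Extra run time: {extra_runtime_fraction:.1%}")
print(f"A {native_hours:.0f}-hour native job would take about {virtual_hours:.2f} hours virtualized")
```

Under that assumption, the virtualized run takes a little over 5 percent longer than the native one.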

The researchers got 12 hours of dedicated system time on Sandia's Red Storm supercomputer to conduct the tests, which were run on 6,240 quad-core compute nodes.

They ran the applications in two modes -- real and virtual. To set a baseline, one run was done in the native Red Storm environment, which employs the Catamount Lightweight Kernel (LWK), a lightweight operating system.

For the virtual deployment, the researchers booted their own next-generation LWK, called Kitten, on the Red Storm nodes. To run virtual machines, Kitten embeds the Palacios virtual machine monitor, developed by Northwestern University in Evanston, Illinois, and the University of New Mexico.

The virtual machines then booted Catamount as the guest OS. The test applications were launched from Red Storm's application launch tool, as per the normal protocol. "From the user's perspective on the service node, there is no visible change to the normal Red Storm software environment," Pedretti said.

A range of different programs was tested, including simple benchmarking operations and two actual operational applications, Sandia's own CTH shock-wave simulation code and the SAIC Adaptive Grid Eulerian (SAGE) hydrodynamics simulation application.

The programs were spread across multiple nodes, from two nodes to all 6,240, depending on the program itself.

The researchers decided to test only virtualization software developed for HPC environments because mainstream virtualization software, such as Xen, is oriented more toward commodity server consolidation than high-performance computing, Pedretti said. Also, using HPC software allows the researchers to fine-tune the settings to a degree that would not be possible with mainstream hypervisors.

Red Storm was ranked as the 17th most powerful supercomputer in the world in the most recent Top500.org listing of supercomputers. A Cray XT-based machine running Linux, Red Storm runs 12,960 Advanced Micro Devices Opteron processors (6,720 dual-core and 6,240 quad-core, for a total of 38,400 cores). The system has been demonstrated to perform at 204 TFlops (trillion floating-point operations per second).
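The processor and core totals can be checked with a few lines of arithmetic. The sketch below uses only the counts given above; the share of cores on quad-core processors is a derived figure, not one quoted by Sandia.

```python
# Processor counts as reported in the article.
dual_core_processors = 6_720
quad_core_processors = 6_240

total_processors = dual_core_processors + quad_core_processors
dual_core_cores = dual_core_processors * 2
quad_core_cores = quad_core_processors * 4
total_cores = dual_core_cores + quad_core_cores

# Derived figure: fraction of all cores that sit on quad-core processors.
quad_core_share = quad_core_cores / total_cores

print(total_processors)  # 12960
print(total_cores)       # 38400
print(quad_core_cores)   # 24960
print(f"{quad_core_share:.0%}")  # 65%
```

By this count, the 6,240 quad-core nodes used in the tests account for 24,960 of the machine's 38,400 cores.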

This project is not Sandia's first foray into supercomputer virtualization. Last summer, two Sandia researchers investigated the use of the Lguest Linux hypervisor to launch 1 million nodes on another Sandia supercomputer to replicate a massive botnet for research purposes.