University of Auckland moves 550 virtual machines to distributed cluster

Centralisation project enabling the university to achieve a recovery point objective of zero

The University of Auckland has moved 550 virtual machines to a distributed cluster for disaster avoidance, without impacting daily operations, as part of a project to centralise its IT infrastructure.

The university deployed VMware vSphere Metro Storage Cluster, which enables system administrators to move workloads from one data centre to another without any impact on the business.

“The university is going through a process of centralisation, where the servers and applications from the various faculties are being migrated onto the central infrastructure,” University of Auckland’s storage team lead, Sanit Kumar, said this week at the vForum event in Sydney.

“There were a lot of unknown applications and servers that we were inheriting on this central infrastructure. And the Metro Storage Cluster allowed us to put in a highly resilient infrastructure that would cater for these unknown applications and would avoid disaster should an unplanned outage occur.”

The University of Auckland is the largest university in New Zealand, with about 42,000 students and about 5,000 staff. The Metro Storage Cluster has about 550 virtual machines sitting on about 28 ESXi (Elastic Sky X Integrated) hosts, with 650TB of storage.

“Traditionally, the University of Auckland ran a data centre that was unidirectional. Unfortunately, we lost power to our data centre, and this took down some of the applications and servers. The Metro Storage Cluster allowed us to [take] a bidirectional [approach] where we could run the loads across two data centres in an active-active manner,” Kumar said.

“The Metro Storage Cluster also allowed us to achieve a recovery point objective of zero. The recovery time objective entirely depends on your environment, and how fast your virtual machines and their respective services start across on the other data centre,” he said.

The university decided to run 50 per cent of its workloads in each data centre. This limits any unplanned outage to a 50 per cent service disruption, and ensures each site has ample capacity to absorb the other's load.
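The arithmetic behind the 50/50 split can be sketched as follows. The per-site figures here are illustrative assumptions derived from the article's totals, not the university's actual sizing:

```python
# Illustrative sketch of the 50/50 workload split across two data centres.
# The host split is an assumption (28 hosts divided evenly between sites).

TOTAL_VMS = 550
HOSTS_PER_SITE = 14  # assumed: 28 ESXi hosts split evenly across two sites

def surviving_load(total_vms: int, split: float = 0.5) -> int:
    """VMs normally running at each site under an even split."""
    return round(total_vms * split)

# Each site normally runs half the VMs...
per_site = surviving_load(TOTAL_VMS)

# ...so each site must reserve enough spare capacity to absorb the
# other site's half if that site suffers an unplanned outage.
failover_load = per_site * 2

print(per_site, failover_load)
```

The point of the design is the second figure: each site must be provisioned to carry the full VM count on its own, which is what makes the active-active arrangement survivable.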

“An added operational benefit was that the timeline for recovery of services and the cost of recovery reduced from a productivity perspective. This also reduced stress levels amongst staff, especially around the recovery effort and post reviews.”

The university also installed HP 3PAR storage in each data centre, and the two arrays communicate with each other. Storage replication is done through the DWDM (dense wavelength division multiplexing) layer, which meets the requirement of a 2.6 millisecond round trip time (RTT).

Peer persistence functionality was used, which flips the active storage paths across to the partner array so that the ESXi hosts do not lose their storage and the VMs run without disruption.
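The path-flip behaviour can be illustrated with a minimal sketch. The array and datastore names below are hypothetical, and real peer persistence is handled by the arrays themselves; this only models the idea that the active paths switch while the hosts keep their storage:

```python
# Minimal sketch of the peer-persistence idea: if the array serving a
# datastore becomes unavailable, the active paths "flip" to the
# synchronously replicated copy on the partner array, transparently to
# the VMs. Names are hypothetical.

class ReplicatedLun:
    def __init__(self, name: str, active_array: str, standby_array: str):
        self.name = name
        self.active = active_array    # array currently serving I/O
        self.standby = standby_array  # array holding the replicated copy

    def flip_paths(self) -> None:
        """Switch active I/O to the partner array; hosts keep the LUN."""
        self.active, self.standby = self.standby, self.active

lun = ReplicatedLun("datastore-01",
                    active_array="3par-dc-a",
                    standby_array="3par-dc-b")
lun.flip_paths()   # e.g. after the DC-A array goes offline
print(lun.active)  # now served from the partner array
```

Because the replicated copy is already in sync (the RPO of zero mentioned above), the flip changes only which array answers I/O, not the data the VMs see.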

A DRS (distributed resource scheduler) VM-host affinity rule was used to avoid the added latency a VM would incur by reaching across to storage in the opposite data centre. Each VM's compute component runs in the same data centre as its storage component; virtual machines are grouped together to run on a subset of hosts inside the cluster.
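The grouping logic behind such affinity rules can be sketched in a language-neutral way. The host names, VM names, and the datastore-to-site mapping below are all hypothetical; in a real cluster this would be expressed as DRS host groups, VM groups, and "should run on" rules rather than a Python dictionary:

```python
# Sketch of DRS VM-host affinity grouping: keep each VM's compute on
# hosts in the same data centre as its storage, so reads and writes
# never cross the inter-site link. All names are hypothetical.

from collections import defaultdict

# Hypothetical inventory: which site each host sits in, and which site
# holds each VM's datastore.
host_site = {"esx01": "DC-A", "esx02": "DC-A",
             "esx03": "DC-B", "esx04": "DC-B"}
vm_storage_site = {"vm-web": "DC-A", "vm-db": "DC-B", "vm-app": "DC-A"}

def build_affinity_groups(host_site, vm_storage_site):
    """Map each VM to the subset of hosts co-located with its storage."""
    hosts_by_site = defaultdict(list)
    for host, site in host_site.items():
        hosts_by_site[site].append(host)
    # Each VM "should run on" the hosts in its storage's site.
    return {vm: hosts_by_site[site] for vm, site in vm_storage_site.items()}

rules = build_affinity_groups(host_site, vm_storage_site)
print(rules["vm-db"])  # the DC-B host subset
```

Using a "should" rather than "must" rule preserves the disaster-avoidance property: if a whole site fails, DRS and HA are still free to restart the VMs on the surviving site's hosts.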

When it came to cost, Kumar said improving the university's Tier 1 data centre would have cost almost the same as implementing the distributed cluster solution, but the cluster offered greater availability than a single data centre with a higher level of power redundancy.