Staying alive after migrating to the cloud

Test, prod, poke and break, but even that won't stop every outage.
  • Liam Tung (CSO Online)
  • 11 August, 2011 10:00

Multi-tenant cloud providers might promise greater resiliency, ‘five nines’ uptime and better security than some in-house managed infrastructure, but organisations would be wise not to assume the provider has covered all bases.

US movie streaming service Netflix, which began migrating its data centre to Amazon’s EC2 cloud in 2009, has gone well beyond Amazon’s dashboard to better understand the risks it faces.

To find out how it would cope with various disasters, the company has created a dozen automation tools it calls Monkeys, which simulate chaos in the cloud and show how its interdependent systems would respond to “once in a blue moon” failures.

Latency Monkey, for example, simulates service degradation, Conformity Monkey finds and ousts sub-optimal instances and Janitor Monkey hunts for wasted resources, while Security Monkey checks that SSL and DRM certificates are valid and looks for security violations or vulnerabilities.

The biggest ‘monkey’ is, of course, Chaos Gorilla, a scaled-up version of its predecessor, Chaos Monkey, which randomly shuts down individual instances. As the name suggests, Chaos Gorilla simulates an outage of an entire Amazon availability zone to test whether Netflix can shift resources to another functioning zone without disrupting services.
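Chaos Gorilla exercises the live system, but the question it answers can be sketched as a much simpler capacity check: remove each availability zone in turn and ask whether the instances left in the other zones can still carry the load. This is a toy model under stated assumptions, not Netflix’s code; the zone names, instance counts and per-instance capacity figures are hypothetical, and real failover also depends on load balancing, state and data replication.

```python
from dataclasses import dataclass

@dataclass
class Zone:
    name: str                   # hypothetical zone label
    instances: int              # healthy instances currently running in the zone
    capacity_per_instance: int  # requests/second one instance can serve (assumed uniform)

def survives_zone_loss(zones: list[Zone], required_capacity: int) -> dict[str, bool]:
    """For each zone, model losing it entirely and report whether the
    remaining zones can still serve the required load."""
    results = {}
    for failed in zones:
        remaining = sum(z.instances * z.capacity_per_instance
                        for z in zones if z is not failed)
        results[failed.name] = remaining >= required_capacity
    return results

zones = [Zone("zone-a", 10, 100), Zone("zone-b", 10, 100), Zone("zone-c", 4, 100)]
print(survives_zone_loss(zones, required_capacity=1500))
# {'zone-a': False, 'zone-b': False, 'zone-c': True}
```

With 1,500 requests/second of demand, losing either large zone leaves 1,400 and fails the check, while losing the small zone leaves 2,000 and passes. Chaos Gorilla’s value is that it tests the real system rather than a model like this one, which only captures raw capacity.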

The company claims that the Monkeys gave it an “almost free” set of tools to automate resilience and security testing, but its efforts highlight some of the additional investments that could be required by moving infrastructure to the cloud.

Even so, its efforts could not prevent a two-hour disruption to its services this week. Netflix advised customers between August 9 and 10 that it was experiencing problems with its streaming service, a day after an Amazon EC2 zone in North America suffered “connectivity issues”.
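For scale, the ‘five nines’ uptime that providers promise leaves very little room. Assuming a 365-day year of 525,600 minutes, the annual downtime budget at 99.999 per cent, and the figure implied by a single two-hour outage with no other downtime that year, work out as:

```latex
\begin{aligned}
\text{downtime budget at five nines} &= (1 - 0.99999) \times 525\,600\ \text{min} \approx 5.26\ \text{min per year} \\
\text{with one two-hour outage} &= 1 - \frac{120}{525\,600} \approx 0.99977 = 99.977\%
\end{aligned}
```

A single outage of that length is roughly 23 times the whole year’s five-nines budget, although how a given provider defines and measures the figure determines whether its commitment was actually breached.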

Carlo Minassian, chief executive officer of Australian network security specialist Earthwave, was impressed with Netflix’s automation tools because they allowed the company to take AWS cloud performance measurements into its own hands and challenge assumptions about cloud provider reliability.

“Most organisations will assume their cloud provider has security covered,” he said.

“After all, doesn’t the five 9’s mean close to no downtime at all? Doesn’t that mean next to no hardware problems and no security breaches? Does your cloud provider define how they measure uptime or availability?”

Although uptime and availability mean different things for the customer, vendors often “carelessly” use the two terms interchangeably.

“Uptime is a measure of whether the service is actually running; availability is a measure of whether the service is running and accessible,” explained Minassian.

“There are a few among us who may have suffered an outage or two on the services offered by their cloud providers.”
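Minassian’s distinction between uptime and availability can be made concrete with monitoring samples. A minimal sketch, assuming a hypothetical probe record that notes separately whether the service was running and whether an external check could actually reach it; the same samples yield two different percentages:

```python
from dataclasses import dataclass

@dataclass
class Probe:
    """One monitoring sample (hypothetical schema, not any vendor's format)."""
    service_running: bool  # the instance or process reports itself as up
    reachable: bool        # an external check actually got a valid response

def uptime(probes: list[Probe]) -> float:
    """Fraction of samples in which the service was running."""
    if not probes:
        raise ValueError("no samples")
    return sum(p.service_running for p in probes) / len(probes)

def availability(probes: list[Probe]) -> float:
    """Fraction of samples in which the service was running AND reachable."""
    if not probes:
        raise ValueError("no samples")
    return sum(p.service_running and p.reachable for p in probes) / len(probes)

# Eight good samples, one where the server is up but unreachable
# (e.g. a network fault), and one where it is down.
samples = [Probe(True, True)] * 8 + [Probe(True, False), Probe(False, False)]
print(uptime(samples), availability(samples))  # 0.9 0.8
```

A provider quoting the first number while the customer experiences the second is exactly the kind of gap worth checking in the contract’s definitions.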


Despite the mystique around cloud computing, when it comes to security, cloud infrastructure faces exactly the same risks as in-house or hosted infrastructure, according to Drazen Drazic, managing director of penetration testing firm Securus Global.

“The technologies are the same, just adapted differently somewhat – so essentially you get the same issues as you would with any standard security review/penetration testing,” he said.

Potential weak points include security threats commonly overlooked in the enterprise, such as web application vulnerabilities including cross-site scripting (XSS) flaws and SQL injection. Cloud vendors, too, may have poorly configured or unpatched systems.
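Both flaw classes come down to untrusted input being treated as code. A minimal illustration using Python’s built-in `sqlite3` and `html` modules; the `users` table and the function names are hypothetical, chosen only to show the vulnerable pattern a tester probes for next to its usual fix (parameterised queries and output escaping):

```python
import html
import sqlite3

def find_user_unsafe(conn, name):
    # VULNERABLE: user input is pasted into the SQL text, so a value like
    # "x' OR '1'='1" changes the meaning of the query.
    return conn.execute(f"SELECT name FROM users WHERE name = '{name}'").fetchall()

def find_user_safe(conn, name):
    # Parameterised query: the driver passes `name` as data, never as SQL.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()

def render_comment(comment: str) -> str:
    # Escaping user text before inserting it into HTML stops it being
    # interpreted as markup or script, the root of many XSS flaws.
    return f"<p>{html.escape(comment)}</p>"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

payload = "x' OR '1'='1"
print(find_user_unsafe(conn, payload))  # returns every row in the table
print(find_user_safe(conn, payload))    # returns no rows
print(render_comment("<script>alert(1)</script>"))
```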

“But also, because of the nature of the cloud environments in many cases, the ability to jump out of your environment and see data from other clients. That’s always a scary thing,” said Drazic.

When Securus conducts a review of a cloud provider, its staff look for twelve pressure points that could fall to a motivated attacker, from the initial security breach to routes available to the attacker once inside. These include the ability to:

§ Gain unauthorised access to servers or devices
§ Access protected functionality without valid credentials
§ Bypass firewalls and access control devices
§ Modify and manipulate information
§ Access another customer’s information and accounts
§ Perform unauthorised financial transactions, move funds and change payments
§ Capture another user’s information
§ Hijack another user’s session
§ Obtain sensitive information
§ Brute-force services requiring authentication
§ Leverage compromised devices and services to pivot deeper
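The brute-force item above is what a lockout or throttling policy is meant to stop, and a tester will check whether repeated failed logins are ever refused. A minimal sketch of such a policy, assuming a hypothetical limit of five failures per account within a 15-minute sliding window; real services choose their own thresholds and often throttle by source address as well. The injectable `clock` parameter is an illustrative choice so the behaviour can be tested without waiting.

```python
import time
from collections import defaultdict, deque

class LoginThrottle:
    """Hypothetical per-account lockout: after `max_failures` failed logins
    within `window_seconds`, further attempts are refused until the oldest
    failure ages out of the window."""

    def __init__(self, max_failures=5, window_seconds=900, clock=time.monotonic):
        self.max_failures = max_failures
        self.window = window_seconds
        self.clock = clock
        self.failures = defaultdict(deque)  # account -> timestamps of recent failures

    def _prune(self, account):
        # Drop failures that have fallen outside the sliding window.
        cutoff = self.clock() - self.window
        q = self.failures[account]
        while q and q[0] <= cutoff:
            q.popleft()

    def is_locked(self, account):
        self._prune(account)
        return len(self.failures[account]) >= self.max_failures

    def record_failure(self, account):
        self._prune(account)
        self.failures[account].append(self.clock())

    def record_success(self, account):
        # A successful login clears the account's failure history.
        self.failures.pop(account, None)
```

A login handler would call `is_locked` before checking the password and refuse the attempt if it returns `True`; a tester hammering the service with guesses should see that refusal after the fifth failure.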

Sense of Security’s chief technology officer, Jason Edelstein, agreed that customers should make certain their chosen provider clearly segments their networks from those of other customers.

“Otherwise you can open yourself up to puddle hopping attacks where one client behind the firewall gets hacked and then another customer is attacked sideways not afforded the protection of the firewall,” he said.
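Part of that segmentation can be checked from a provider’s network plan and rule set. A minimal sketch using Python’s standard `ipaddress` module, assuming hypothetical tenant networks and firewall allow rules expressed as (source, destination) CIDR pairs: it flags tenant networks that overlap and any allow rule that would let one tenant’s addresses reach another’s. Real isolation also depends on ports, routing and hypervisor separation, which this ignores.

```python
import ipaddress
from itertools import combinations

# Hypothetical tenant networks.
TENANTS = {
    "customer-a": ipaddress.ip_network("10.0.1.0/24"),
    "customer-b": ipaddress.ip_network("10.0.2.0/24"),
}

def overlapping_tenants(tenants):
    """Pairs of tenants whose address ranges overlap (no clean boundary)."""
    return [(a, b) for (a, na), (b, nb) in combinations(tenants.items(), 2)
            if na.overlaps(nb)]

def tenants_touching(net, tenants):
    """Names of tenants whose network shares any address with `net`."""
    return {name for name, tnet in tenants.items() if net.overlaps(tnet)}

def cross_tenant_rules(rules, tenants):
    """Flag allow rules that let one tenant's hosts reach another tenant's."""
    flagged = []
    for src, dst in rules:
        srcs = tenants_touching(ipaddress.ip_network(src), tenants)
        dsts = tenants_touching(ipaddress.ip_network(dst), tenants)
        # Any differing (source tenant, destination tenant) pair is a sideways path.
        if any(s != d for s in srcs for d in dsts):
            flagged.append((src, dst))
    return flagged

rules = [
    ("10.0.1.0/24", "10.0.1.0/24"),     # within customer-a: fine
    ("10.0.1.0/24", "10.0.2.10/32"),    # customer-a may reach a customer-b host
    ("10.0.0.0/16", "10.0.2.0/24"),     # broad source spanning both tenants
    ("192.168.0.0/24", "10.0.1.0/24"),  # non-tenant source: out of scope here
]
print(cross_tenant_rules(rules, TENANTS))
# [('10.0.1.0/24', '10.0.2.10/32'), ('10.0.0.0/16', '10.0.2.0/24')]
```

Checking overlap rather than strict containment is deliberate: an overly broad rule such as the `/16` above belongs to neither tenant on its own, yet still opens the sideways path Edelstein describes.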

Of course, if a customer does not have penetration testers, or for that matter an army of monkey simulators, there are security standards and audits that can help provide assurance that the ship is not a leaky one.

ISO 27001 and the Payment Card Industry Data Security Standard (PCI DSS) were useful indicators of a cloud provider’s trustworthiness, said Edelstein.

Although a customer might not plan to house payment data in the cloud, PCI DSS compliance was a good proxy for the provider’s ability to host non-payment systems at a level of security acceptable for most commercial operations, he added.

Finally, customers should insist that their contract with the provider include a “right to audit” clause.

“[This] entitles the customer the right to audit the environment at any frequency, but recommended at least annually, at the client’s expense with any determined remedial activities for the service provider’s account,” said Edelstein.