Computerworld

Microsoft issues service credits after cloud outage

Microsoft would not answer questions regarding the value of the service credits or how many customers were affected

Microsoft has been forced to issue service credits to customers affected by recent problems accessing hosted Exchange and SharePoint services, and is promising to improve its system for notifying customers about cloud computing outages.

But Microsoft's woes probably won't have any lasting impact on the company's growing hosted software business, one analyst says. The likes of Google Apps and Rackspace have suffered service outages prior to the recent Microsoft incidents.

"We're at the point where everybody's having some hiccups" in the cloud, says Burton Group analyst Guy Creese. "The issue is that whoever it is, Microsoft, Google, you just need to make sure the hiccups don't last a long time. I don't think this most recent episode is a huge issue."

Still, some Microsoft customers were clearly inconvenienced by the limited availability of hosted Exchange and SharePoint, which Microsoft sells as the Business Productivity Online Suite (BPOS).

On 23 August, a Microsoft network infrastructure upgrade unexpectedly led to a two-hour period in which North American customers suffered "intermittent access," according to an official Microsoft blog post. What was described as "another underlying issue" caused similar problems related to the BPOS sign-in service and administrative portals on 3 September and 7 September. Microsoft called the BPOS downtime "unacceptable" and said there are "24/7 efforts underway to ensure we do not have a repeat of these events."

One BPOS user, Guy Gregory, used the comment board on the Microsoft blog post to ask: "Given the 2 hour outage equates to 99.7 per cent for August, will you be honoring your pledge to refund affected users? My understanding was that the 99.9 per cent uptime promise was backed by a money-back guarantee."

Microsoft BPOS official Jim Glynn responded Thursday of this week, saying, "In the case of the widespread 23 August incident, we proactively provided a credit to all affected customers. However, in general practice, customers who believe that we have not met our service level agreement should contact Support to request an SLA credit."
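
Gregory's 99.7 per cent figure can be checked with simple arithmetic. The sketch below is illustrative only: it assumes uptime is measured over the full 31-day calendar month and that every hour counts equally, which may not match how Microsoft's service level agreement actually defines availability. The function names are hypothetical.

```python
# Rough check of the uptime figure quoted in the blog comment.
# Assumption: availability = (total hours - downtime hours) / total hours
# over a whole calendar month. The real SLA may measure it differently.

def monthly_uptime_percent(days_in_month: int, downtime_hours: float) -> float:
    """Return uptime as a percentage of all hours in the month."""
    total_hours = days_in_month * 24
    return 100.0 * (total_hours - downtime_hours) / total_hours


def allowed_downtime_minutes(days_in_month: int, target_percent: float) -> float:
    """Return the downtime, in minutes, that a given uptime target permits."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (100.0 - target_percent) / 100.0


def meets_sla(uptime_percent: float, target_percent: float = 99.9) -> bool:
    """Return True if measured uptime reaches the promised target."""
    return uptime_percent >= target_percent


if __name__ == "__main__":
    # August has 31 days; the 23 August incident lasted about two hours.
    uptime = monthly_uptime_percent(31, 2.0)
    print(f"August uptime: {uptime:.1f} per cent")
    print(f"Meets 99.9 per cent target: {meets_sla(uptime)}")
    budget = allowed_downtime_minutes(31, 99.9)
    print(f"Downtime allowed at 99.9 per cent: {budget:.0f} minutes")
```

Under these assumptions, a two-hour outage in a 744-hour month leaves uptime at roughly 99.7 per cent, while a 99.9 per cent target would permit only about 45 minutes of downtime for the month.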

Microsoft would not answer questions regarding the value of the service credits or how many customers were affected. "The blog post as well as the comments you have seen from Microsoft executives in the comments section of the blog is all of the information we have to share at this time," a Microsoft spokesperson said in an e-mail to Network World.

In a similar instance last year, Rackspace had to pay about $3 million in service credits to customers after a power outage took its hosted IT services offline.

With some Microsoft customers complaining about the company's communication during the downtime incidents, Microsoft also promised to bolster its online tools for keeping users up to date about service health.

"We're reviewing all of our communications and service level measurements to identify areas of improvement," Microsoft BPOS official Morgan Cole wrote in an online comment. "One area of focus that we have is to build better tools to provide timely, accurate and targeted communications about service health."

One customer had complained that on 7 September the Microsoft administration site claimed the "services were 'healthy' during a time when the services were not accessible."

Microsoft's Hotmail service also suffered an outage on 2 September that locked some users out of their e-mail accounts for hours.

Despite these recent service interruptions, Creese of Burton Group says cloud services may still provide better uptime than many organizations are capable of providing internally. While Fortune 500 companies with large data centers and sophisticated backup and failover processes may be able to do just as well on their own, numerous smaller companies and universities are choosing cloud services such as Google Apps because they have trouble guaranteeing uptime themselves, he says.

Cloud services are dispersed over numerous data centers and are highly virtualized, so they also have fewer single points of failure, he notes. In some cases, this means that one part of a service will go down, but the rest of it will remain up. That appears to be what happened with Microsoft on 3 September and 7 September when the service degradation primarily affected the sign-in service and administrative portals.

Most IT shops "aren't supporting as large a population, so they can't afford the kind of dispersed infrastructure that these suppliers have," Creese says. "You're more likely to have a single point of failure."

Follow Jon Brodkin on Twitter: www.twitter.com/jbrodkin
