Business leaders don’t care about the cost of downtime

By the end of 2021, 65 percent of I&O leaders will underinvest in their availability and recovery needs


Infrastructure and operations (I&O) leaders are often on a misguided mission to find mythical “cost of downtime” numbers to justify project investments in availability and recovery capabilities.

These broad-brush numbers are wrong every time, and using them undermines the very objective of justifying increased investment. The result is not only a loss of credibility but, ultimately, a denial of necessary funding.

Generic cost of downtime metrics often use a constant time frame, don’t consider seasonality, use measurements that aren’t meaningful to business leaders, assume a linear impact for any duration and use soft costs to inflate impacts. The resulting figures range from slightly inaccurate to completely wrong.

Gartner predicts that by the end of 2021, 65 percent of I&O leaders will underinvest in their availability and recovery needs because they use estimated cost of downtime metrics.

Stop using generic or average cost of downtime numbers in your discussions. Instead, use metrics that are aligned with your leadership’s leading performance indicators and matter to your organisation. This could include customer experiences, revenue impacted, cost increases or safety issues that can arise as a result of outages.

With a focus on such valuable results, it becomes much easier to justify the need for investments in availability and recovery capabilities for IT systems to reduce outages.

Demystifying cost of downtime myths

To help guide you away from using generic cost of downtime metrics, here are five common myths they’re built on and how you can refocus your efforts:

Myth No. 1 — Assume immediate impact for all transactions

During the first few hours of the Amazon Prime Day sale in the US last year, the company experienced a 75-minute outage that affected consumers’ ability to browse online. Even though customers were inconvenienced, Amazon still grew sales more than ever before within the first 10 hours, largely due to the isolated nature of the event. Only Amazon offers it, so customers couldn’t turn to competitors.

Don’t assume immediate impact for all transactions. Identify if any transactions could be lost to competitors or time, as retried or delayed transactions may have zero impact.

Build the business justification for high levels of availability and recovery capabilities by focusing on business processes that depend on just-in-time inventory supply or involve commodity products, where any interruption to the process would result in a significant loss of revenue.
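As a rough illustration of the point above, the sketch below counts only the transactions that are genuinely lost to competitors or time, rather than every transaction attempted during the outage. The function name, the transaction volumes, the average value and the lost fractions are all hypothetical figures chosen for illustration, not data from any real outage.

```python
def revenue_at_risk(transactions_per_hour, avg_value, outage_hours, lost_fraction):
    """Estimate revenue actually lost during an outage.

    lost_fraction is the assumed share of transactions that are never
    retried (lost to competitors or time). Retried or delayed
    transactions are treated as having zero impact.
    """
    attempted = transactions_per_hour * outage_hours
    return attempted * avg_value * lost_fraction


# Hypothetical figures: the same 75-minute (1.25 hour) outage in two settings.
# An exclusive event where customers simply come back later:
exclusive_event = revenue_at_risk(10_000, 50.0, 1.25, lost_fraction=0.0)
# A commodity product where most customers buy from a competitor instead:
commodity_sale = revenue_at_risk(10_000, 50.0, 1.25, lost_fraction=0.8)

print(f"Exclusive event: {exclusive_event:,.0f}")
print(f"Commodity sale:  {commodity_sale:,.0f}")
```

The same outage duration produces very different results depending on the lost fraction, which is why that fraction deserves more scrutiny than any generic per-hour figure.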

Myth No. 2 — Assume business transactions occur constantly

Many environments are highly seasonal in nature. A cold drinks manufacturer, for example, will need greater levels of stock during the summer periods, whereas a luxury goods retailer will generate most of its revenue during festive periods.

Determine whether any specific business process is seasonal in order to find possible impacts from outages. When planning for the availability and recovery of those systems, validate what the worst case would be for any outage, while also identifying when those periods are. Also, validate the subsequent value of those individual applications to the business.
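To make the seasonal worst case concrete, here is a minimal sketch that compares the peak period against the yearly average. The monthly revenue figures and the function name are hypothetical, loosely modelled on a business with a summer peak.

```python
# Hypothetical monthly revenue for a seasonal business (illustration only).
monthly_revenue = {
    "Jan": 200_000, "Feb": 200_000, "Mar": 300_000, "Apr": 400_000,
    "May": 600_000, "Jun": 900_000, "Jul": 1_200_000, "Aug": 1_000_000,
    "Sep": 500_000, "Oct": 300_000, "Nov": 200_000, "Dec": 200_000,
}


def worst_case_period(revenue_by_period):
    """Return the peak period, its revenue and the average across periods."""
    peak = max(revenue_by_period, key=revenue_by_period.get)
    average = sum(revenue_by_period.values()) / len(revenue_by_period)
    return peak, revenue_by_period[peak], average


peak_month, peak_revenue, average_revenue = worst_case_period(monthly_revenue)
print(f"Peak month: {peak_month}")
print(f"Peak vs average: {peak_revenue / average_revenue:.1f}x")
```

With these assumed numbers, an outage in the peak month puts well over twice the average monthly revenue at stake, a difference that a constant per-hour figure would hide.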

Myth No. 3 — Cost of an outage is the most important impact

I&O leaders often try to leverage information from various sources about how much an outage will cost per hour. Instead, focus on impacts to stakeholders of the business — the end users or the individual business operations leaders — and how those outcomes are directly impacted by the loss of IT services.

Plan for individual business processes that generate revenue or are directly exposed to large numbers of end users, where an outage will have meaningful and measurable impacts. Focus first on those systems that are primary drivers of the business or organisational activities, rather than secondary supporting systems.

Myth No. 4 — Impacts are linear for all durations of an outage

There are often thresholds where the impact of an outage changes from meaningless to meaningful. These will change based on business activities that have well-defined time periods, such as peak sales or production periods, or end-of-month reporting.

Determine when an outage becomes impactful by understanding what your outage duration thresholds are. Allocate higher levels of recovery or availability capabilities to those systems in critical operational periods by understanding the available capacity and potential use for critical systems.
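One way to picture outage duration thresholds is as a step function rather than a straight line. The sketch below is an assumption-laden example: the threshold hours, the impact labels and the function name are hypothetical and would need to come from your own business processes.

```python
from bisect import bisect_right

# Hypothetical duration thresholds (in hours) for one business process.
THRESHOLD_HOURS = [1, 4, 24]

# Hypothetical impact level for each band; one more entry than thresholds.
IMPACT_LEVELS = [
    "negligible",
    "degraded customer experience",
    "missed orders",
    "missed month-end reporting",
]


def impact_for_outage(duration_hours):
    """Map an outage duration to a stepped impact level.

    A duration exactly equal to a threshold counts as having reached it.
    """
    return IMPACT_LEVELS[bisect_right(THRESHOLD_HOURS, duration_hours)]


for hours in (0.5, 1, 3, 4, 30):
    print(f"{hours:>4} h -> {impact_for_outage(hours)}")
```

Under these assumptions, a 30-minute outage and a 3-hour outage fall into different impact bands even though a linear model would treat one as simply six times the other.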

Myth No. 5 — All costs accrue to the downtime impact measurement

Many calculations of the cost of outages try to quantify every possible impact. This often includes many “soft costs,” such as increased labour usage, lower worker productivity or reputational damage perceived in the minds of stakeholders. These often don’t translate into any direct impact on revenue, costs or profitability.

Use hard, quantifiable metrics to measure impacts from an outage. Quantify any actual increases in the cost of production, decreases in revenue or profitability that accrue from an outage by looking at the factors of production and how they’re affected by an outage.

David Gregory is a senior director analyst at Gartner. His focus areas include risk management, business continuity, crisis management, business impact analysis and recovery strategies. David will speak about continuity and recovery at the Gartner IT Infrastructure, Operations & Cloud Strategies Conference in Sydney, 29-30 April.