A cost-effective approach for petabyte storage systems

The onslaught of unstructured digital content -- video, audio and images -- is taxing storage systems and creating the need to store multiple petabytes, but current industry practices that use RAID and replication for data protection are expensive at this scale.

Dispersal, a new approach, is cost-effective for petabytes of digital content storage. Further, it provides extraordinary data protection, meaning digital assets will not be lost. Executives who make a strategic shift from RAID to dispersal can realize significant cost savings for enterprises with at least 50TB under management.

RAID schemes are based on parity, and at their root, if more drives fail simultaneously than the parity can protect against (two, in the case of RAID 6), data is not recoverable. The statistical likelihood of multiple drive failures has not been an issue in the past. However, as systems grow to hundreds of terabytes and petabytes, multiple simultaneous drive failures become a practical reality.

Further, drives aren't perfect, and typical SATA drives have a published bit error rate (BER) of one unrecoverable error per 10^14 bits, meaning that once every 100,000,000,000,000 bits read, there will be a bit that cannot be recovered. Doesn't seem significant? In today's larger storage systems, it is.

Unfortunately, having one drive fail and then encountering an unrecoverable bit error while rebuilding from the remaining RAID set is highly probable. To put this into perspective, when reading 10 terabytes, hitting an unreadable bit is more likely than not (about 56%), and when reading 100TB it is nearly certain (99.97%).
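For those who want to check the math, here is a minimal sketch (in Python; not from the article) of that probability, assuming independent bit errors, one error per 10^14 bits and decimal terabytes -- the exact percentages shift slightly with different unit conventions.

    # Probability of at least one unrecoverable bit error over a large read,
    # assuming the published SATA rate of 1 error per 10^14 bits.
    import math

    BER = 1e-14                          # unrecoverable errors per bit read

    def p_unreadable(terabytes):
        bits = terabytes * 1e12 * 8      # decimal terabytes -> bits
        # P(at least one bad bit) = 1 - (1 - BER)^bits
        return 1 - math.exp(bits * math.log1p(-BER))

    print(f"10 TB:  {p_unreadable(10):.0%}")     # roughly 55%
    print(f"100 TB: {p_unreadable(100):.2%}")    # roughly 99.97%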

As a result, enterprises address the data protection shortcomings of RAID by using replication, the technique of making additional copies of data to avoid unrecoverable errors and lost data. However, those copies add cost: typically 133% or more additional raw storage is needed for each additional copy, once the overhead associated with a typical RAID 6 configuration is included.
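As a rough illustration of where a figure like that comes from (assuming a common 6-data-plus-2-parity RAID 6 group -- the article does not specify the geometry behind its 133% number):

    # Raw capacity consumed per usable terabyte in an assumed 6+2 RAID 6 group.
    data_drives, parity_drives = 6, 2
    raw_per_usable = (data_drives + parity_drives) / data_drives
    print(f"Each additional copy on RAID 6 consumes about {raw_per_usable:.0%} "
          f"of the usable data size in raw storage")   # ~133%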

Organizations also use replication to guard against failure scenarios such as location failures, power outages, bandwidth unavailability and so forth. Having seamless access to data is key to keeping businesses running and profitable.

Executives should realize their storage approach has failed once they are replicating data three times, as it is clear that the replication band-aid is no longer solving the underlying problem associated with using RAID for data protection.

Dispersal -- a better approach

Dispersal can help organizations significantly reduce storage costs, reduce power consumption and the footprint of storage, as well as streamline IT management processes.

Here's how it works. Information Dispersal Algorithms (IDAs) separate data into unrecognizable slices of information, which are then distributed -- or dispersed -- by the dispersed storage protocol to disparate storage locations. These locations can be situated in the same city, the same region, the same country or around the world.

The dispersed storage protocol handles all of the slicing and reconstitution transparently, so users are presented with standard storage interfaces. The time to reconstitute data depends on the speed of the network.

A management system enables administrators to select which storage nodes and locations to utilize to act as a single storage container. The dispersed storage protocol then manages storing and retrieving the slices across the storage nodes.

Since dispersed storage is typically used for distributed networked storage, it is best for cloud storage use cases such as backup, archive or a content store for large unstructured content like videos, images and audio files. Dispersed storage isn't optimized for transactional storage requirements today.

Each individual slice does not contain enough information to reveal the original data, and only a subset of slices is needed to reconstitute it. When data is encoded using the IDAs, two variables are set to provide M of N fault tolerance -- a width (N), the number of slices generated, and a threshold (M), the minimum number of slices required to reconstruct the data.

An example M of N configuration is 10 of 16, in which 16 slices are created and any 10 of them are sufficient to recreate the data. This fault-tolerant design withstands multiple simultaneous failures across hosting devices, servers or networks while the data remains accessible in real time.

It also enables data protection without needing replication. The 16 slices can be spread across storage nodes in four locations (four nodes per location), and the system could tolerate an entire location outage as well as two additional storage node failures while still providing access to data. This approach is storage efficient -- the sum of the 16 slices would equal only 1.6 times the original data size.
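To make the M of N idea concrete, below is a toy sketch in Python of a threshold scheme over a small prime field. It is illustrative only -- it is not Cleversafe's dispersed storage protocol, and production IDAs use optimized arithmetic and add integrity checking -- but it shows the two behaviors described above: data is cut into N slices, and any M of them rebuild the original.

    # Toy "M of N" information dispersal sketch over GF(257) -- illustrative only,
    # not a production IDA and not a security guarantee.
    P = 257  # smallest prime larger than any byte value

    def encode(data, m, n):
        """Cut data into n slices; any m of them can rebuild it."""
        padded = data + bytes((-len(data)) % m)
        slices = [[] for _ in range(n)]
        for g in range(0, len(padded), m):
            coeffs = padded[g:g + m]            # m bytes = polynomial coefficients
            for i in range(n):
                x, y = i + 1, 0                 # distinct nonzero evaluation points
                for c in reversed(coeffs):      # Horner evaluation mod P
                    y = (y * x + c) % P
                slices[i].append(y)
        return [(i + 1, s) for i, s in enumerate(slices)], len(data)

    def decode(subset, m, length):
        """Rebuild the original bytes from any m slices."""
        xs = [x for x, _ in subset[:m]]
        out = bytearray()
        for g in range(len(subset[0][1])):
            ys = [s[g] for _, s in subset[:m]]
            # Solve the Vandermonde system for the coefficients via Gauss-Jordan mod P.
            rows = [[pow(x, j, P) for j in range(m)] + [y] for x, y in zip(xs, ys)]
            for col in range(m):
                piv = next(r for r in range(col, m) if rows[r][col])
                rows[col], rows[piv] = rows[piv], rows[col]
                inv = pow(rows[col][col], P - 2, P)
                rows[col] = [(v * inv) % P for v in rows[col]]
                for r in range(m):
                    if r != col and rows[r][col]:
                        f = rows[r][col]
                        rows[r] = [(a - f * b) % P for a, b in zip(rows[r], rows[col])]
            out.extend(rows[j][m] for j in range(m))
        return bytes(out[:length])

    slices, size = encode(b"dispersed storage demo", m=10, n=16)
    rebuilt = decode(slices[6:], m=10, length=size)   # any 10 of the 16 slices suffice
    assert rebuilt == b"dispersed storage demo"

Each slice in this sketch holds roughly one-tenth of the data, so 16 slices total about 1.6 times the original size -- the same expansion factor cited above.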

Realizable cost savings

Comparing dispersal to RAID and replication, dispersal offers greater efficiency with superior reliability.

Suppose an organization needs 1 petabyte of usable storage and requires six nines of reliability -- 99.9999%. Consider how a system built using dispersal stacks up against one built with RAID and replication.

Comparing these two solutions side by side for 1 petabyte of usable storage, dispersal requires only 40% of the raw storage that RAID 6 plus replication does. This translates into 60% less hardware, floor space, power and cooling. Factor in less storage administrator time as well, and it is easy to see how much more cost-effective dispersal is for large-scale systems.
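The raw-capacity arithmetic behind that comparison can be sketched as follows (assuming the 10-of-16 configuration above and three full copies on 6+2 RAID 6; the article does not spell out its exact assumptions):

    # Raw storage required for 1 PB usable, under the assumptions stated above.
    usable_pb = 1.0
    raid6_overhead = 8 / 6                    # 6 data + 2 parity drives per group
    copies = 3                                # three replicated copies of the data
    raid_raw = usable_pb * raid6_overhead * copies       # 4.0 PB
    dispersal_raw = usable_pb * 16 / 10                   # 10-of-16 IDA: 1.6 PB
    print(f"RAID 6 + replication: {raid_raw:.1f} PB raw")
    print(f"Dispersal (10 of 16): {dispersal_raw:.1f} PB raw")
    print(f"Dispersal needs {dispersal_raw / raid_raw:.0%} of the raw capacity")   # 40%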

Face it -- many executives fear the resume-generating event of having to admit that the system they are responsible for has failed, is expensive and is prohibitively difficult to migrate to a more efficient storage solution.

Some executives may be in denial though -- "My storage is only 40 terabytes, and I have it under control, thank you."

Consider a storage system growing at the average rate of 10 times every five years, per IDC's estimate. A system of 40TB today will be 400TB in five years, and 4 petabytes within just 10 years. This illustrates that a system in the terabyte range that currently relies on RAID will most likely start to fail within the executive's tenure.

Forward-thinking executives will make a strategic shift to dispersal to realize greater data protection as well as substantial cost savings for digital content storage.

Bellanca is a founder and the director of marketing and communications for Cleversafe, which offers resilient storage solutions ideally suited for storage clouds and massive digital archives. For organizations that value assurance in confidentiality, data integrity and limitless availability, Cleversafe provides information resiliency built directly into the storage system DNA.
