Data redundancy is a primary contributor to the explosive growth in data. Initially, deduplication focused on eliminating redundancy in specific cases such as full backups, e-mail attachments, and VMware images. Over time, however, customers have come to recognize how pervasive duplicated data really is.
Test and development data multiplies across an organization: replication, backup, and archiving create multiple data copies scattered across your enterprise, and sometimes users simply copy data to multiple locations for their own convenience.
Studies estimate that multiple copies of data now require organizations to buy, use, and administer two to 50 times more storage than they would actually need with deduplication. Given that impact on the bottom line, organizations are recognizing that, far from being a niche technology, deduplication needs to become an integrated and mandatory element of their overall IT strategy.
Candidates for deduplication
The best candidates for deduplication solutions are mid-size or enterprise customers experiencing issues with:
- Exponential growth of data, resulting in out-of-control storage costs.
- Shrinking or inadequate backup windows.
- Longer recovery times, especially for older data not on the primary backup media.
- Cost, risk and complexity of sending tapes to disaster recovery (DR) sites.
- Slow throughput on both backup and archiving systems.
- eDiscovery, compliance and SLA requirements.
- Bottlenecks in expensive LANs and WANs.
Features to look for in a deduplication solution
When evaluating deduplication solutions, IT decision-makers should look for the following essential features:
- Ability to scale without expensive hardware upgrades.
- More recovery points with shorter recovery times.
- Point-and-click deduplication management.
- Built-in reporting of deduplication across vendors, data types, sources and platforms.
- Tight integration with all necessary applications to minimize end-user downtime.
- Single solution simplicity for ease of deployment and administration.
- Ability to rapidly and securely recover business-critical data across all locations, applications, storage media and points-in-time.
- D2D2T-optimized for backup performance and reliable data recovery.
- Fast, comprehensive search to aid in recovery.
- Data integrity and security features.
- Built-in DR capabilities.
- Data classification.
- Cost-effective and timely eDiscovery.
- Use of a common technology platform.
- Single point of management.
Challenges in deploying a deduplication solution
Like disk-to-disk backup or server virtualization, deduplication should not be evaluated as an isolated product or feature. Customers must consider the broader implications of deduplication within the context of their entire data management and storage strategy. Common challenges in deploying a deduplication solution are related to performance, increased complexity of management, and proliferation of deduplicated data silos.
Performance
Finding and eliminating redundant data can be extremely expensive for an appliance-based deduplication solution. Without contextual knowledge of the data it deduplicates, such an appliance faces significant challenges scaling to the size of most enterprises.
Storage systems perform best when data is written sequentially; scattering data into many small segments can cripple disk performance for most deduplication solutions. By rote sharing of small data segments across multiple objects, appliance-based deduplication can lead to widespread data fragmentation. Over time, read, write, backup, and replication performance on deduplication appliances becomes painfully slow.
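To see where both the space savings and the fragmentation come from, consider a minimal sketch of segment-level deduplication. This is a hypothetical illustration, assuming fixed-size 4 KB segments identified by SHA-256 hashes; commercial appliances typically use variable-size, content-defined chunking, but the mechanism is the same: a segment already in the store is referenced rather than stored again, so one object's data ends up physically interleaved with another's.

```python
import hashlib

SEGMENT_SIZE = 4096  # assumed fixed-size segments; real systems vary segment boundaries

class DedupStore:
    """Hypothetical segment-level deduplication store."""

    def __init__(self):
        self.segments = {}  # SHA-256 digest -> segment bytes, stored exactly once
        self.objects = {}   # object name -> ordered list of segment digests

    def write(self, name, data):
        digests = []
        for i in range(0, len(data), SEGMENT_SIZE):
            chunk = data[i:i + SEGMENT_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # A segment seen before is referenced, not re-stored: this is the
            # space saving, and also the cross-object sharing that fragments data.
            self.segments.setdefault(digest, chunk)
            digests.append(digest)
        self.objects[name] = digests

    def read(self, name):
        # A logically sequential read becomes many segment lookups; on disk,
        # shared segments may sit far from their logical neighbors.
        return b"".join(self.segments[d] for d in self.objects[name])

    def stored_bytes(self):
        return sum(len(chunk) for chunk in self.segments.values())


store = DedupStore()
obj_a = b"A" * 8192 + b"B" * 4096   # 12 KB logical
obj_b = b"A" * 8192 + b"C" * 4096   # 12 KB logical, first 8 KB shared with obj_a
store.write("a", obj_a)
store.write("b", obj_b)
print(store.stored_bytes())         # 12288: 24 KB logical held in 3 unique segments
```

Reading back either object reassembles it from the shared segment pool, which is exactly why a restore from a heavily deduplicated appliance turns into scattered, non-sequential I/O.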
Increased complexity of management
Many deduplication solutions today behave as if the entire workflow revolves around them; it is almost impossible to move data from a deduplication appliance to tape in a D2D2T workflow. To reap the benefits of network optimization, an organization needs to install either new hardware or new software in its remote offices.
Many deduplication solutions require organizations to integrate a unique combination of hardware and software, or to buy new standalone appliances that must each be managed manually. This extra management complexity erodes your storage and network savings, especially as the amount of data being deduplicated grows.
Islands of deduplication
Proprietary solutions create vendor lock-in, combining poor performance with proprietary storage layouts. They make it nearly impossible to move data from a deduplication appliance to other storage.
Duplicate data extends across numerous storage tiers, including data replicas, archives, and test-and-development copies. Too often, deduplication solutions address only one of those areas. As a result, you end up limiting future opportunities to further reduce your storage consumption.
- Paul McClure is a product manager with CommVault Australia and New Zealand