Published on: April 10, 2009
Type of content: WHITE PAPER
Format: Unknown
Length: 14 pages
Price: FREE
Overview:
Disk-based deduplication storage has emerged as the new-generation storage system for enterprise data protection to replace tape libraries. Deduplication removes redundant data segments to compress data into a highly compact form and makes it economical to store backups on disk instead of tape. A crucial requirement for enterprise data protection is high throughput, typically over 100 MB/sec, which enables backups to complete quickly. A significant challenge is to identify and eliminate duplicate data segments at this rate on a low-cost system that cannot afford enough RAM to store an index of the stored segments and may be forced to access an on-disk index for every input segment.


This paper describes three techniques employed in the production Data Domain deduplication file system to relieve the disk bottleneck. These techniques include:


  1. The Summary Vector, a compact in-memory data structure for identifying new segments
  2. Stream-Informed Segment Layout, a data layout method to improve on-disk locality for sequentially accessed segments
  3. Locality Preserved Caching, which maintains the locality of the fingerprints of duplicate segments to achieve high cache hit ratios.
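
To make the first technique concrete, here is a minimal sketch of a compact in-memory structure that can say "definitely new" for a segment without touching disk. It assumes a Bloom-filter-style bit array keyed by a SHA-1 fingerprint of each segment; the overview does not specify the structure, and the class name, sizes, and hash choices below are illustrative assumptions, not the product's implementation.

```python
import hashlib


def fingerprint(segment: bytes) -> bytes:
    """Content hash identifying a segment (SHA-1 is an assumption here)."""
    return hashlib.sha1(segment).digest()


class SummaryVector:
    """Hypothetical Bloom-filter-style summary of stored fingerprints."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, fp: bytes):
        # Derive several bit positions by salting the fingerprint.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + fp).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, fp: bytes) -> None:
        for pos in self._positions(fp):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, fp: bytes) -> bool:
        # False means the segment is certainly new, so no index lookup is needed.
        # True means "possibly stored" and must be confirmed elsewhere.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(fp)
        )


sv = SummaryVector()
sv.add(fingerprint(b"segment A"))
print(sv.might_contain(fingerprint(b"segment A")))
```

A structure like this never reports a stored segment as new, but it can occasionally report a new segment as possibly stored, so a positive answer still has to be checked against the real index.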

Together, they can remove 99% of the disk accesses for deduplication of real-world workloads. These techniques enable a modern two-socket dual-core system to run at 90% CPU utilization with only one shelf of 15 disks and achieve 100 MB/sec for single-stream throughput and 210 MB/sec for multi-stream throughput.
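
The way the second and third techniques could combine to avoid disk accesses can be sketched as follows. The sketch assumes that segments written in stream order are grouped into containers, and that a miss in the fingerprint cache loads all fingerprints from the matching container at once. The container grouping, the LRU eviction policy, and every name below are assumptions made for illustration; the overview does not describe these details.

```python
from collections import OrderedDict


class LocalityPreservedCache:
    """Hypothetical cache that loads fingerprints one whole container at a time."""

    def __init__(self, containers, capacity=2):
        # containers: container_id -> fingerprints in stream order
        # (a stand-in for the on-disk layout).
        self.containers = containers
        # Stand-in for the on-disk index: fingerprint -> container_id.
        self.index = {fp: cid for cid, fps in containers.items() for fp in fps}
        self.capacity = capacity
        self.cached = OrderedDict()  # container_id -> set of fingerprints
        self.index_lookups = 0       # counts simulated disk accesses

    def is_duplicate(self, fp) -> bool:
        hit = next((cid for cid, fps in self.cached.items() if fp in fps), None)
        if hit is not None:
            self.cached.move_to_end(hit)  # mark container as recently used
            return True
        self.index_lookups += 1           # cache miss: consult the "disk" index
        cid = self.index.get(fp)
        if cid is None:
            return False                  # segment has not been stored
        # Load every fingerprint of the container, betting that its
        # neighbours in the stream will be requested next.
        self.cached[cid] = set(self.containers[cid])
        if len(self.cached) > self.capacity:
            self.cached.popitem(last=False)  # evict least recently used
        return True


cache = LocalityPreservedCache({0: ["a", "b", "c"], 1: ["d", "e"]})
for fp in ["a", "b", "c"]:
    cache.is_duplicate(fp)
print(cache.index_lookups)
```

In this toy run, three duplicate segments from the same container cost a single index lookup, which is the effect the locality argument relies on.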
The Fast Company Software & Services Directory is a part of the KnowledgeStorm Network.