What Is Data Deduplication?

The prime cause of out-of-control data duplication is (ironically) the current standard backup protocol requiring numerous copies of every document just in case. The situation is further complicated by ever-expanding legal requirements.

The best way out of this quagmire is data deduplication – a key technology for any organization wanting to optimize the performance, efficiency, and cost-effectiveness of its data storage environment.

As business becomes increasingly paperless, everyone wants to be absolutely sure that their documents are backed up and safe. The result is multiple copies of everything in the data center file share, the Internet FTP server, personal folders, and so on, all contributing to a system-wide clog.

Data deduplication software removes the clog and keeps the stream of data running smoothly.

Data deduplication, commonly shortened to data dedup, is essentially a process of identifying and removing multiple occurrences of the same data. The first time a deduplication system encounters a particular file, block, or bit, it flags it. From there, the system replaces each subsequent identical item with a placeholder and removes the duplicate from storage. The placeholder links to the original data, so users always bring up the original data when they try to open the removed duplicate.
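The flag-and-placeholder process described above can be sketched at the file level in a few lines of Python. This is only an illustration under stated assumptions: the class name `DedupStore`, its methods, and the use of a SHA-256 digest to recognize identical content are all hypothetical choices made for the example, not a description of any particular deduplication product.

```python
import hashlib


class DedupStore:
    """Toy file-level deduplicating store (hypothetical, for illustration only)."""

    def __init__(self):
        self.blobs = {}         # digest -> the single stored copy of the data
        self.placeholders = {}  # path -> digest (the link back to the original)

    def write(self, path, data):
        # Assumption: identical content is recognized by its SHA-256 digest.
        digest = hashlib.sha256(data).hexdigest()
        # Store the bytes only the first time this content is seen.
        if digest not in self.blobs:
            self.blobs[digest] = data
        # Every path, first or duplicate, keeps only a small reference.
        self.placeholders[path] = digest

    def read(self, path):
        # Opening a duplicate follows its placeholder to the original data.
        return self.blobs[self.placeholders[path]]

    def stored_bytes(self):
        # Total size of the unique data actually kept.
        return sum(len(blob) for blob in self.blobs.values())


store = DedupStore()
report = b"quarterly report contents"
notes = b"meeting notes"
store.write("alice/report.doc", report)
store.write("bob/report.doc", report)   # identical content: no new copy stored
store.write("bob/notes.txt", notes)

print(len(store.placeholders), "paths,", len(store.blobs), "stored copies")
```

Both report paths resolve to the same stored bytes, so a user opening either one sees the original document.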

This deduplication process significantly reduces the amount of storage space needed in the system. For example, a system that has 200 copies of the same 5 MB document, one in each employee’s personal folder, can reduce them to a single copy of the original file plus 199 placeholders that link back to the original document. This means 200 copies of a 5 MB file will take up 5 MB of space (plus the size of the index file) instead of 1,000 MB.
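The arithmetic behind this example can be checked with a short calculation. The copy count and file size come from the example above; the 1 KB placeholder size is purely an assumption made here to give the index a concrete size, since real placeholder and index sizes vary by system.

```python
COPIES = 200
FILE_SIZE_MB = 5
PLACEHOLDER_SIZE_KB = 1  # assumed value, for illustration only

# Without deduplication, every copy is stored in full.
without_dedup_mb = COPIES * FILE_SIZE_MB

# With deduplication: one full copy plus 199 small placeholders.
index_overhead_mb = (COPIES - 1) * PLACEHOLDER_SIZE_KB / 1024
with_dedup_mb = FILE_SIZE_MB + index_overhead_mb

savings_pct = 100 * (1 - with_dedup_mb / without_dedup_mb)

print(f"Without dedup: {without_dedup_mb} MB")
print(f"With dedup:    {with_dedup_mb:.2f} MB")
print(f"Space saved:   {savings_pct:.1f}%")
```

Even with the assumed placeholder overhead included, the stored total stays only slightly above the 5 MB of the single original.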

You’re probably already compressing files to save storage space. Compression reduces the size of each file by eliminating redundant bits within it. Compression is better than doing nothing, but it doesn’t eliminate redundant files. In fact, every duplicate copy is still stored, just in compressed form.

Data deduplication goes a step further by eliminating those redundant copies and storing only one. Storage-wise, this makes a big difference. Simply compressing files typically reduces storage space by about 50 percent, depending on the type of data, but data deduplication can reduce storage space by a much greater percentage when many duplicates exist, as the 5 MB document example above illustrates.
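To make the difference concrete, the following sketch compares the two approaches on 200 identical copies of a synthetic document. Several things here are assumptions for illustration: the document contents, the use of `zlib` for compression, and SHA-256 digests for spotting duplicates. The actual ratios depend entirely on the data, so no specific percentages are claimed.

```python
import hashlib
import zlib

# A synthetic, deterministic "document" (an assumption for this demo).
document = b" ".join(str(i).encode() for i in range(20000))
copies = [document] * 200

raw_size = sum(len(c) for c in copies)

# Compression alone: each copy is compressed separately and still stored.
compressed_size = sum(len(zlib.compress(c)) for c in copies)

# Deduplication alone: keep one copy per unique SHA-256 digest.
unique = {hashlib.sha256(c).hexdigest(): c for c in copies}
dedup_size = sum(len(c) for c in unique.values())

# Both together: compress the single surviving copy.
dedup_compressed_size = sum(len(zlib.compress(c)) for c in unique.values())

for label, size in [
    ("raw", raw_size),
    ("compressed only", compressed_size),
    ("deduplicated only", dedup_size),
    ("deduplicated + compressed", dedup_compressed_size),
]:
    print(f"{label:>26}: {size:>10,} bytes")
```

The two techniques are complementary: deduplication removes the repeated copies, and compression can then shrink the one copy that remains.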

A number of characteristics differentiate deduplication processes. Some approaches are inline, others are postprocess. Inline deduplication means that data is deduplicated before it’s committed to a disk drive. The postprocess approach first writes data to disk, and once a job or a dataset has been completed, deduplication follows.
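The difference between the two approaches is mainly one of timing, which a small sketch can show. The classes `InlineStore` and `PostProcessStore`, the in-memory dictionaries standing in for disk, and the SHA-256 block digests are all hypothetical choices for illustration; real systems differ in how they index and stage data.

```python
import hashlib


def block_digest(block):
    # Assumption: identical blocks are recognized by their SHA-256 digest.
    return hashlib.sha256(block).hexdigest()


class InlineStore:
    """Deduplicates each block before it is committed to 'disk'."""

    def __init__(self):
        self.disk = {}          # digest -> unique block
        self.refs = []          # ordered placeholders for every write
        self.bytes_written = 0  # bytes that actually reached disk

    def write(self, block):
        d = block_digest(block)
        if d not in self.disk:  # only new data is written
            self.disk[d] = block
            self.bytes_written += len(block)
        self.refs.append(d)


class PostProcessStore:
    """Writes everything first, then deduplicates once the job is complete."""

    def __init__(self):
        self.staging = []       # raw blocks land on disk as-is
        self.disk = {}
        self.refs = []
        self.bytes_written = 0

    def write(self, block):
        self.staging.append(block)
        self.bytes_written += len(block)

    def deduplicate(self):
        # Runs after the backup job has finished.
        for block in self.staging:
            d = block_digest(block)
            self.disk.setdefault(d, block)
            self.refs.append(d)
        self.staging.clear()


blocks = [b"A" * 4096, b"B" * 4096, b"A" * 4096]
inline, post = InlineStore(), PostProcessStore()
for b in blocks:
    inline.write(b)
    post.write(b)
post.deduplicate()

print("inline bytes written:     ", inline.bytes_written)
print("postprocess bytes written:", post.bytes_written)
```

In this sketch both stores end up holding the same unique blocks; the inline store never writes the duplicate, while the postprocess store temporarily needs room for all of the incoming data.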

Benefits

With data deduplication, users can streamline backup, facilitate emergency data restoration, and reduce costs.

Efficiency – Streamline Backup
As system backups become quicker and easier, users will be able to create and maintain more backup sets that stretch further back in time. This lets users keep a complete set of document versions without straining the system.

 
