Data deduplication, or dedupe for short, is a method used to minimize redundant copies of data stored on a storage device.
By Howard Young
Dedupe is often used when backing up individual files or storage area networks, and it combines compression with the identification and elimination of duplicate data. The combination optimizes the backup storage system, with the goal of reducing both the time and the storage space needed to perform a backup and restore. Deployed solutions operate at either the object/file level or the block level, and each level benefits users in a different way.
Storing each object only once (a file, in the generic case) is often difficult, since it usually involves multiple schemes to minimize redundant copies. The typical process saves a baseline file and then the changes made to it. This works best in desktop environments, where only revisions to files need to be saved. IT departments often install software that performs nightly backups to capture the changes an employee makes to files during the day. These files are pushed to a backup system that may reside on the company's premises or in the cloud, as in the case of Iron Mountain. Best practice includes an application that lets users restore file revisions without the aid of IT staff.
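The baseline-plus-changes idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular backup product works: the helper names `make_delta` and `apply_delta` are hypothetical, files are treated as lists of text lines, and the standard library's `difflib.SequenceMatcher` stands in for whatever differencing scheme a real tool would use.

```python
from difflib import SequenceMatcher


def make_delta(baseline, revised):
    """Describe `revised` as copies from `baseline` plus newly added lines.

    Hypothetical helper: unchanged runs are stored as (start, end) references
    into the baseline, so only new or edited lines are stored in full.
    """
    delta = []
    matcher = SequenceMatcher(a=baseline, b=revised, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged run: keep only a reference into the baseline.
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # New or edited lines: these must be stored in full.
            delta.append(("add", revised[j1:j2]))
        # "delete" needs no entry; the lines are simply not copied.
    return delta


def apply_delta(baseline, delta):
    """Rebuild a revision from the baseline and its delta."""
    restored = []
    for op in delta:
        if op[0] == "copy":
            restored.extend(baseline[op[1]:op[2]])
        else:
            restored.extend(op[1])
    return restored


baseline = ["alpha\n", "beta\n", "gamma\n"]
revised = ["alpha\n", "beta (edited)\n", "gamma\n", "delta\n"]

delta = make_delta(baseline, revised)
restored = apply_delta(baseline, delta)
stored_lines = sum(len(op[1]) for op in delta if op[0] == "add")
```

Here the nightly backup would keep the baseline once and, for the revision, only the two changed lines plus a few references. Restoring a revision is a matter of replaying the delta against the baseline.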
Block-level deduplication inspects chunks of data written to an appliance during a system-level backup. The appliance detects duplicate data and replaces it with a reference to where that data is already stored on the appliance. Since the duplicate data does not need to be written to disk again, the write completes substantially faster. Overall, this reduces both the backup time and the amount of storage needed to back up the system. Backup software that can query an appliance before committing data to the device is becoming common. This reduces transport time, since duplicate data never has to be sent, which matters especially as cloud appliances become widely available.
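A minimal sketch of block-level deduplication, including the query-before-send step, might look like the following. Everything here is an assumption made for illustration: the `DedupeStore` class, its `has`/`put`/`get` methods, the fixed 4096-byte block size, and the use of SHA-256 as the block fingerprint are not taken from any specific appliance, and real products often use variable-size chunking instead.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed chunk size, chosen only for illustration


class DedupeStore:
    """Hypothetical in-memory stand-in for a deduplicating appliance."""

    def __init__(self):
        self.blocks = {}         # fingerprint -> block bytes, stored once
        self.bytes_received = 0  # how much data actually had to be sent

    def has(self, digest):
        # The "query before committing" step: a cheap fingerprint lookup.
        return digest in self.blocks

    def put(self, digest, block):
        self.blocks[digest] = block
        self.bytes_received += len(block)

    def get(self, digest):
        return self.blocks[digest]


def backup(data, store):
    """Split data into blocks and send only blocks the store lacks.

    Returns a recipe: the ordered list of fingerprints (references)
    needed to rebuild the original data.
    """
    recipe = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if not store.has(digest):
            store.put(digest, block)  # new data: transfer and write it
        recipe.append(digest)         # duplicate or not, keep a reference
    return recipe


def restore(recipe, store):
    """Rebuild the original data by following the references in order."""
    return b"".join(store.get(digest) for digest in recipe)


store = DedupeStore()
data = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE  # three identical blocks, one distinct
recipe = backup(data, store)
```

In this example the backup covers four blocks, but only two unique blocks are transferred and stored; the other two are recorded as references. Backing up the same data a second time would transfer nothing, which is where the savings in time and transport come from.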