Various Backup and Archival Approaches

There are three primary purposes for data backup.

  •   First, to restore after a data loss or corruption.
  •   Second, to retrieve an earlier version of a file.
  •   Third, to recreate your data set after a disaster event.

Let us first review the various backup archival approaches for different needs and requirements. All backup archival approaches have pros and cons with regard to how easily the data can be restored and how much storage the scheme requires. Select the appropriate approach based on your work pattern and the type of protection needed.

Note: If a backup archive is compressed to reduce storage space, it may be rendered useless if the decompression tool is not available when you need to restore.

With Manual Backup, the user picks which files need to be backed up. The time and space required for each backup are minimal, but the method is prone to human error and missed backup cycles. It is also tedious to keep track of multiple generations.

Full Backup copies the entire protected region onto the backup medium. Each subsequent backup cycle creates another generation of the backup set, preserving prior sets up to a limit on how many generations are kept. When the maximum is reached, the oldest generation is overwritten. Since all files are in one complete dataset, the restore process does not need to reconstruct the data. But the time and space needed for each additional backup set are the same even if only a few files have changed.
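As a rough sketch of the generation-rotation idea (the gen- naming scheme and the five-generation limit are assumptions for illustration, not from any particular product):

    import shutil
    from datetime import datetime
    from pathlib import Path

    MAX_GENERATIONS = 5  # assumed limit on how many generations are kept

    def full_backup(source: Path, backup_root: Path) -> Path:
        """Copy the entire protected region into a new, timestamped generation."""
        dest = backup_root / datetime.now().strftime("gen-%Y%m%d-%H%M%S")
        shutil.copytree(source, dest)
        # Timestamped names sort chronologically; drop the oldest extras.
        generations = sorted(backup_root.glob("gen-*"))
        for old in generations[:-MAX_GENERATIONS]:
            shutil.rmtree(old)
        return dest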

Incremental Backup starts with a Full Backup dataset. On each subsequent cycle, only the files that have changed since the previous cycle are backed up. The advantage of this method is that it requires less time and space, because each subsequent set stores only the changed files. But the restore logic must reconstruct the dataset: first the Full Backup set is restored, then the changes from each subsequent backup cycle are applied in order until all cycles are restored. To limit this reconstruction, a full backup may be taken again after a set number of incremental cycles, for example every 5-10 cycles.
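A minimal Python sketch of the restore reconstruction, assuming each incremental set is a directory holding only the files that changed in that cycle (Python 3.8+ for dirs_exist_ok):

    import shutil
    from pathlib import Path

    def restore_incremental(full_set: Path, increments: list[Path], target: Path) -> None:
        """Rebuild the dataset: restore the Full Backup set, then replay each
        incremental set in order, letting newer copies overwrite older ones."""
        shutil.copytree(full_set, target)
        for inc in increments:  # must be ordered oldest to newest
            shutil.copytree(inc, target, dirs_exist_ok=True)
        # Note: this sketch does not replay file deletions between cycles.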

Differential Backup starts with a Full Backup dataset. On every cycle, the difference between what is on the computer and the Full Backup is saved. Only the Full Backup dataset and the most recent difference dataset are needed to recreate the full dataset. The downside is that each subsequent difference dataset becomes larger and larger, requiring more and more time and space.
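A minimal sketch of how a differential cycle might select files, assuming changes are detected by file modification time (real products may use archive bits or checksums instead):

    from pathlib import Path

    def differential_set(source: Path, last_full_time: float) -> list[Path]:
        """Select every file modified since the last Full Backup. Because the
        reference point never moves, this set only grows between full backups."""
        return [p for p in source.rglob("*")
                if p.is_file() and p.stat().st_mtime > last_full_time]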

With Continuous Data Protection (CDP), a Full Backup is created and any subsequent changes to the protected files are continuously recorded. Some CDP implementations record a changed file whenever it is saved; some record at a finer granularity, down to each changed character or word. True-CDP records changes instantly, while near-CDP records changes at set time intervals. This method gives users a choice of versions to restore. It is useful for data that changes constantly, such as word-processing files, spreadsheets, and software source code. But True-CDP at fine granularity requires significant processing bandwidth to detect and back up the changes. Near-CDP requires much less bandwidth and is not OS-invasive, making it less affected by system upgrades or changes.
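A conceptual near-CDP sketch, assuming a simple polling loop that detects changed files by modification time at a fixed interval (the interval, the detection method, and the placeholder action are all assumptions; a real product would copy each version into a version store):

    import time
    from pathlib import Path

    def near_cdp_poll(source: Path, interval_seconds: int = 60) -> None:
        """Every interval, detect files whose modification time changed since
        the last pass; each hit is a new version candidate to back up."""
        last_seen: dict[Path, float] = {}
        while True:
            for p in source.rglob("*"):
                if p.is_file():
                    mtime = p.stat().st_mtime
                    if last_seen.get(p) != mtime:
                        last_seen[p] = mtime
                        print(f"new version to back up: {p}")  # placeholder action
            time.sleep(interval_seconds)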

System backup (or Mirror) protects at the system level. A complete, duplicate dataset is kept on a similar device, and each change on the primary dataset is recorded in the mirror dataset instantly. Recovery does not need reconstruction, which gives rise to the High Availability (HA) configuration: the mirror device can instantly replace a failed primary device. This method needs specialized software and purpose-designed hardware.
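Conceptually, mirroring means every write is applied to both copies before it is acknowledged. In practice this is done by the specialized software and hardware mentioned above; the sketch below only illustrates the idea, with two directories standing in for the two devices:

    from pathlib import Path

    def mirrored_write(name: str, data: bytes,
                       primary: Path, mirror: Path) -> None:
        """Apply the same write to both copies; only acknowledge once both
        succeed, so the mirror is always an up-to-date replacement."""
        for root in (primary, mirror):
            target = root / name
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(data)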

System backup (Full PC Backup) is sometimes called bare-metal backup. It clones the entire hard disk image (data plus system) and the hardware settings stored in the PC. When the PC fails, the stored information can be used to replicate the lost PC on a new one. This provides an easy way to restore a broken PC, but many dependencies and limitations exist: the new system must be the same as the old one to accept the system settings, and deviations can render the restore useless.

Besides the backup archival approaches discussed above, you must also consider the location where the archive is stored.

Archive Location – Pros and Cons

1. Local
The backup archive is kept on-site in the office.

  • Advantage: It is quickly accessible for restore if needed.
  • Disadvantage: Since it is in the same place as the primary copy, it is not protected against natural disasters such as fire or flood. It is also prone to internal hacking.

2. Off-site copy
A backup archive is kept off-site, usually on an external hard disk that is brought to the business location to back up the PCs and then taken off-site for safekeeping. Media such as digital tape and CD/DVD can also be used and may be stored in vaults.

  • Advantage: Protection against disaster situations.
  • Disadvantage: There may be a significant time lag between the current on-site copy and the off-site copy, and restores cannot be achieved instantly.

3. Cloud

Similar pros and cons as the off-site copy, except the dataset is transmitted over the Internet to a service provider’s storage device.

  • Advantage: Keeps a copy of your dataset outside the company without the hassle of using your own media and carrying them off-site.
  • Disadvantage: Incurs monthly fees and capacity-overage fees. It is not suitable for large datasets or slow Internet connections, as backup traffic can consume significant bandwidth. There are concerns about data privacy due to data mining, hackers, and government agencies. Restores are not instant.
