“Data Backup” Virtual Machine

Required Planning
Overview

Although backing up data seems like a fairly simple concept (and, indeed, the cpytobak software is fairly simple), the topic of backup can quickly become complex because:

  • There are many options
  • Not everything works well

The first thing to consider is what to back up. Backing up “everything” is possible in theory, although doing so presents challenges.

Concept: Open Files

The most notable challenge is that files which are being actively written need special care to back up. Backing up data takes time, so a file may look different at the end of a backup than it did at the beginning. Some software also requires that a group of files be in a consistent state, meaning that if one file is updated, the other files need corresponding updates. Ignoring these considerations can result in a backup that is not useful once it is restored.
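
As a rough illustration of the problem, the following Python sketch copies a file and then checks whether the source appeared to change while the copy was in progress. The function name copy_if_stable is hypothetical, and comparing size and modification time is only a heuristic assumed here for illustration; it does not guarantee consistency across a group of files.

```python
import os
import shutil


def copy_if_stable(src, dst):
    """Copy src to dst; return True only if src looked unchanged throughout."""
    before = os.stat(src)
    shutil.copy2(src, dst)  # copies file contents and timestamps
    after = os.stat(src)
    # If the size or modification time moved during the copy, the backup
    # may contain a mix of old and new data and should not be trusted.
    return (before.st_size, before.st_mtime_ns) == (after.st_size, after.st_mtime_ns)
```

A caller could retry the copy, or report the file as skipped, whenever the function returns False.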

To protect people from getting invalid versions of files, many operating systems simply prohibit access to files that are actively being written, and even to files that are merely available for writing. (The common term is an “open” file: a file that some software has “opened” in preparation for writing to it.)

Open Files in Microsoft Windows

With Windows XP, Microsoft introduced a feature called “shadow copy”. This can be used to back up a file while it is open. This is supported by some backup software, including Hobocopy (which may be a rather simple way to copy a specified directory, or a single specified file from that directory).

Concept: How many copies

Reverting to an earlier version of a file is a nice ability to have. However, simply making many full copies is generally a poor use of resources; most notably, it uses up a lot of disk space. Instead of keeping just two full backups, a person may be able to store one full backup and a larger number of incremental backups, each of which holds only what changed since the previous backup. Besides using more disk space, full backups can also take longer to create.
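
The idea of an incremental backup can be sketched in Python as follows. This is a minimal sketch, not how any particular backup program works: the function name incremental_backup and the choice of modification time as the test for “changed” are assumptions made for illustration.

```python
import os
import shutil


def incremental_backup(source_dir, backup_dir, since):
    """Copy files under source_dir that were modified after `since`.

    `since` is the time of the previous backup, in seconds since the epoch.
    Returns the sorted list of relative paths that were copied.
    """
    copied = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            src = os.path.join(root, name)
            if os.stat(src).st_mtime <= since:
                continue  # unchanged since the last backup, so skip it
            rel = os.path.relpath(src, source_dir)
            dst = os.path.join(backup_dir, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)
            copied.append(rel)
    return sorted(copied)
```

Only the changed files are stored, which is why many incremental backups can fit in the space of one full backup. Restoring requires the full backup plus the incremental backups made after it.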

Determine what type of data to back up

Common categories include:

Closed files

These are relatively easy to back up.

Databases

Popular “database software” provides an ability to get a consistent/clean backup, resulting in a valid collection of data.

Software may require that some collection of records in a database be updated together. The specific requirements vary among different pieces of software.
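
As one example of a database providing its own consistent backup mechanism, Python's standard sqlite3 module exposes SQLite's online backup feature. SQLite is used here only as a stand-in, since the text above does not name a particular database, and the function name backup_sqlite is hypothetical.

```python
import sqlite3


def backup_sqlite(live_path, backup_path):
    """Write a consistent snapshot of a SQLite database to backup_path."""
    source = sqlite3.connect(live_path)
    target = sqlite3.connect(backup_path)
    try:
        # Connection.backup() (Python 3.7+) copies the database through
        # SQLite itself, so the result is a valid database even if other
        # connections are writing to the source at the time.
        source.backup(target)
    finally:
        target.close()
        source.close()
```

Other database software typically has its own equivalent tool or command, and the details differ from product to product.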

E-Mail data

Many organizations would like to have their E-Mail backed up. The technical implementations are often similar to those of traditional databases, such as SQL databases. Some popular E-Mail software suites provide special support that helps backup software get a useful copy of E-Mail data without needing to take an E-Mail server offline.

Disk images

With “virtual machines” becoming more popular, some people like to simply back up the entire hard drive of a machine. If the virtual machine is not running, this is rather simple: the disk image is treated like a closed file. If the virtual machine is running, there are two options. One is to run “backup software” on the virtual machine. The other option is to back up the disk image, even while it is being used. Doing that may require some special support, similar to backing up a database which is actively being used.

Other data

Hopefully, most data will fit into one of the earlier categories: either “Closed files”, because files sit inactive more often than they are actively written, or a popular “database” format which can support backups.

Sometimes, there is data that really does not need to be backed up, because it can be very easily re-created. For example, if a person can partition a disk and install an operating system about as quickly as the data can be restored from a backup, then backing up that data is rather unnecessary.

Determine how information is to be stored

Sometimes the most useful backup is a backup of an entire disk, including data that is not within a standard partition (like the “boot record” data stored in a classic MBR). However, such backups are larger, may be more challenging to create, and can be harder to work with. If the goal is to restore a copy of a machine that has become completely non-functional, then a full disk backup might be the easiest and fastest type of data to work with. If the goal is to simply restore a single small file, then retrieving an entire disk image might be just the first step of accessing the data, while directly retrieving the individual file may be quite a bit faster and easier.

Very often, the “backup software” may have a huge impact on how the data is stored. If a particular piece of software is desirable, that may impact what choices are available regarding how the data is backed up. If a particular method of backup storage is desired, that may impact decisions about what software to use.

Choose software

Some software has been bundled with operating systems. In addition, there is a variety of backup software that has been released. See “backup” for some information about many of the options.

Having (virtual) hardware

See: creating and configuring a “child image”.

Simple instant backups

This essentially just means making a copy of a file.
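
A minimal sketch of such a copy in Python is shown below. The function name instant_backup and the timestamped “.bak” naming scheme are assumptions chosen for illustration; they are not taken from cpytobak or any other tool.

```python
import shutil
import time


def instant_backup(path):
    """Copy path to a sibling file with a timestamp suffix; return its name."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup_path = f"{path}.{stamp}.bak"
    shutil.copy2(path, backup_path)  # copy contents and timestamps
    return backup_path
```

Including the time in the name lets several copies of the same file sit side by side, as long as they are made at least a second apart.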

More elaborate backups

There are multiple options. Here are some documented approaches:

Using Bacula
Virtual Machine: using Bacula
Using BackupPC
Virtual Machine: using BackupPC
Using rsnapshot
Supports
Data type
Offline files