[#backup]: Backups

Backups

Backing up data
Goals to achieve
Restoring quickly after a mistake

One of the most widely understood goals of data backup is to keep an extra copy that can reverse mistakes, such as accidentally saving unwanted changes to a file.

Off-site

Another goal to pursue is to have the data on multiple devices, so that the failure of a single hard drive doesn't cause all data to be lost. In fact, a very desirable goal is to have, at all times, at least one copy of the data in a different physical location than at least one other copy.

The further away, the better: having information on multiple storage devices is better than relying on one hard drive, but having both of those storage devices connected to the same computer power supply could still allow one issue (like bad power) to damage all copies of the data. Storing information in another computer is better, although two side-by-side computers could be damaged by one localized disaster, such as flooding, or air dirtied when an earthquake causes wooden/metal beams to fall from the ceiling. Having data in separate rooms is better; separate floors of a building may be even better; separate buildings may be better; separate physical addresses may be even better. Separate time zones may be better than using one city. Separate continents may be feasible for large organizations to implement. Separate planets are, as of the early twenty-first century A.D., not generally economically feasible.

Point in time

A simple backup may contain a copy of what the data was like at a single “point in time”. When another backup is created, there are methods to allow for multiple points in time to be represented, so that data may be restored to a copy of whatever the data looked like at one time, or another time.

This way, if a problem with data (such as an accidentally deleted file, or a data file that got altered undesirably) is discovered, and a recent backup is affected by that problem, it may be possible to restore an older copy of the backup from before the problem affected the data. This sort of flexibility may not be available if there is only one backup, storing whatever was worked on today or yesterday.

The simple and obvious way to keep multiple points in time is to back up the data twice, to different locations, so that the newer backup doesn't erase an older backup. This method works, although disk usage grows linearly with the number of copies: a second point in time will likely require about twice as much disk space, and a third about three times as much, as a single point in time. In addition to filling disk capacity, a method that writes this much extra data puts that much extra wear and tear on the equipment.

There may be other ways to effectively get points in time, such as using backups that store copies of only the data that has recently changed, and relying on an additional backup to take care of data that hasn't recently changed. Another possibility is to use patching technology, which is designed to store just the differences in data rather than complete files. The precise options vary among backup programs.
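As a rough illustration of the "only recently changed data" idea, the following sketch makes one full copy and then a second, smaller copy that holds only the files modified after a marker file's timestamp. The directory names, the marker file, and the incremental_backup function are all hypothetical, and the sketch does not record deletions, which real backup programs would need to handle.

```shell
#!/bin/sh
# Sketch: a full backup plus an incremental backup of recently changed files.
# All paths and names here are made up for illustration.

# incremental_backup SRC MARKER DEST
# Copies files under SRC that are newer than MARKER into DEST,
# preserving their relative paths.
incremental_backup() {
    src=$1; marker=$2; dest=$3
    mkdir -p "$dest"
    ( cd "$src" && find . -type f -newer "$marker" ) | while IFS= read -r f; do
        mkdir -p "$dest/$(dirname "$f")"
        cp -p "$src/$f" "$dest/$f"
    done
}

work=$(mktemp -d)
mkdir -p "$work/data/sub"
echo "old report" > "$work/data/report.txt"
echo "old notes"  > "$work/data/sub/notes.txt"
# Pretend these files were last changed long ago.
touch -t 200001010000 "$work/data/report.txt" "$work/data/sub/notes.txt"

# Full backup: the first point in time.
cp -Rp "$work/data" "$work/full"
# Marker recording when the full backup was taken.
touch -t 200101010000 "$work/marker"

# One file changes after the full backup.
echo "new notes" > "$work/data/sub/notes.txt"

# Incremental backup: the second point in time, holding only the change.
incremental_backup "$work/data" "$work/marker" "$work/incr"
( cd "$work/incr" && find . -type f )   # lists only ./sub/notes.txt
```

Restoring the newer point in time would mean restoring the full copy and then laying the incremental copy over it; restoring the older point in time uses the full copy alone.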

Scheduling
...
Some specific types of data to be backed up

Of course, trying to list the types of data that could be backed up could be a huge, possibly endless undertaking, as there could be all sorts of combinations of data. For example, there could be a special backup category for logs, and that could be called a “log backup”.

Files and directories/folders
Restoring by finding and moving
Note: If an end user has stated that a file, or a directory of files, is missing, consider searching the hard drive for the files. There have been many instances where users of a graphical interface have accidentally dragged the data to another folder. (Perhaps the person's computer was slow, and the person simply meant to click on a file but then ended up dragging the file.) If the data is found in a nearby folder, then simply moving the data back to the desired location may be faster than using specialized backup software.

Moving the data also avoids leaving a stray extra copy behind, as would happen if the data were restored from a backup while the misplaced copy remained. That extra copy may use up twice as much disk space (depending on whether the filesystem implements de-duplication) and might be accidentally shared. The user might also find the extra copy and then use it (e.g. printing or editing a document), not realizing that the data is old. Then the user may wonder why those changes aren't found in the first copy of the data, or may keep working with the other copy but wonder why some parts of the document seem to be using old versions of the data.
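A search of this kind might look like the following sketch, which uses find to locate a file by name and then moves it back. The folder layout, the file name, and the find_misplaced function are hypothetical; on a real system the search would more likely start at the user's home directory, and the result should be checked by eye before moving anything, since more than one match is possible.

```shell
#!/bin/sh
# Sketch: look for a "missing" file that may simply have been dragged
# into a neighbouring folder. All names below are hypothetical.

# find_misplaced ROOT NAME
# Prints the path of every regular file called NAME under ROOT.
find_misplaced() {
    find "$1" -type f -name "$2" 2>/dev/null
}

home=$(mktemp -d)
mkdir -p "$home/Documents/Reports" "$home/Documents/Archive"
# The file was dragged into Archive by accident.
echo "quarterly figures" > "$home/Documents/Archive/Budget.xls"

found=$(find_misplaced "$home" "Budget.xls")
echo "Found: $found"

# Moving (rather than restoring a second copy) leaves a single copy on disk.
mv "$found" "$home/Documents/Reports/"
```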
[#cpytobak]: Backing up by copying files

Information about this program, which was previously here, has been moved to a new, separate page about the cpytobak program.

[#amanda]: AMANDA (“Advanced Maryland Automatic Network Disk Archiver”)

The home page at amanda.org hyperlinks to the FAQ at Zmanda. That FAQ includes the question, “What is the difference between Amanda and Zmanda?” The answer given is, “Amanda is the open source backup application. Zmanda is the company sponsoring Amanda's development.” So, it would seem that http://amanda.zmanda.com may also be a valid home page for this software. To clarify, the free software is at amanda.org. Commercial users may want to check out Commercial support for Amanda and Amanda Enterprise FAQ.

Some more documentation includes AMANDA README.txt, AMANDA Wiki, AMANDA FAQ. Additionally, for OpenBSD there had been multiple packages, including amanda-doc-2.4.5.1, which was described as “network-capable tape backup (documentation)”.

OpenBSD FAQ for OpenBSD 5.7, section 14 (“Disk Setup”), section on Backups, archived by the Wayback Machine @ Archive.org mentioned Amanda. However, the package did not seem to be available for OpenBSD 5.6 or 5.7 (it was available for OpenBSD 5.5). CVSWeb for AMANDA shows support getting removed.

Zmanda Windows Client Wiki says, “Zmanda Windows Client Community Edition is an Amanda client for Windows platform that uses native Win32 API and uses Volume Shadow Services (VSS) for backup.”

BackupPC

BackupPC's home page (@ SF) is short and references screenshots of BackupPC's web interface. BackupPC Documentation is available.

BackupPC.com says “Zmanda is working on various aspects of development of BackupPC and will provide formal support for BackupPC installations very soon.” BackupPC Wiki @ Zmanda says “BackupPC Community Edition is an enterprise-grade, disk-based backup system.”

Some additional information is currently available at Tutorial: BackupPC.

[#bacula]: Bacula

AGPL

This is widely viewed as a solution that can handle complex backup scenarios, but not as a very slim one.

Box Backup

Designed around online, encrypted backups.

Box Backup guide to Configuring a Server involves creating a user account, and “sign your certificate and install it as directed.” The encryption support might require more overhead than some simpler solutions.

dump/restore

OpenBSD FAQ about backing up (and restoring) using a tape drive covers using dump and restore. This FAQ also refers to Amanda and Bacula as “More advanced backup utilities”, implying that dump and restore may be fairly basic. Also, the FAQ mentions that these other options have support “for backing up multiple servers to disk and tape.”

To back up a single file, use:

dump -0au -f /safespot/somefile.bak /somespot/demofile

Backing up multiple files might typically be done by referencing entire filesystem volumes. For example, OpenBSD FAQ about backing up (and restoring) using a tape drive discusses how to use this program, and another, to interact with a tape drive (and has information to help avoid accidentally overwriting a previous backup). That FAQ discusses the meaning of the parameters in the -0au parameter combination.

Note that the file formats are not standardized. The restore command should be able to restore data that was created by the dump command that was bundled with the same operating system as the restore command. However, unlike gzip, where one implementation can generally extract data that is created by another implementation, the restore command isn't necessarily able to correctly retrieve data that was stored with a mismatching implementation of the dump command. This isn't a big deal as long as many things are the same, such as the operating system (e.g. Debian), platform (e.g. i386), release version (e.g. 5.0), and a type of filesystem bundled with the operating system (e.g., ext2fs), but differences may cause issues, perhaps particularly when working with the metadata of an entire filesystem volume.

duplicity

OpenPorts.se description states, “uses librsync”, “Currently duplicity supports deleted files, full unix permissions, directories, symbolic links, fifos, etc., but not hard links.” “Currently local, ftp, ssh/scp, rsync, WebDAV, WebDAVs, HSi, and Amazon S3 backends are available.”

FauBackup

OpenPorts.se page on FauBackup describes it as “full and incremental backups on filesystem”. The OpenBSD package is listed as “131.818 KB”.

rdiff-backup
Overview

This program aims to be a more thorough “backup” program, keeping track of “generations” of files (also referred to as “snapshots” in time, so a person can see what files looked like at a certain previous time).

rdiff-backup FAQ about “OSError: [Errno 39] Directory not empty” discourages use with NFS. The main page for rdiff-backup says “Using rdiff-backup to backup files to a server mounted via smbfs or CIFS has been a troublesome configuration for some users.” The difference between SMB and CIFS seems to refer to different specific implementations meant for running under Linux, and the information may be rather old; rdiff-backup FAQ: section about CIFS refers to a 2GB limit of smbfs, and different mount point options for things like “only partial Unicode support”.

rsnapshot
Overview

This is a summary based on one person's current understanding. Hopefully it is rather correct...

Some people seem to think that with rsync's capabilities, and a scheduler such as cron (or the “anacron” package), a person should be able to create a “backup software” program that is really just a fancy command line or two.

The rsnapshot program was designed to be a very simple program that implements that idea.

The rsnapshot.org main/home page refers to the HOWTO, without a hyperlink. Fortunately the FAQ did provide a hyperlink, which was archived before becoming unavailable sometime in October 2015. rsnapshot.org HOWTO archived by the Wayback Machine @ Archive.org

rsync
Overview

Some people may refer to rsync as “backup software”. Let's see what it really does.

This software supports using any one of these separate approaches:

direct data copy

Like the cp command, this can work on local disks and it does not have any special support for networking (although it could be used to copy data over the network if it has help from something like NFS or SMB/CIFS).

SSH

The rsync client can communicate with an SSH server. This can be a standard SSH server that typically receives SSH connections and lets people interact with a remote command line. The SSH server doesn't need to be running a bunch of specialized software that is specific to rsync.

rsync protocol

This may provide some additional capabilities, making it work a bit nicer than the option of just using an SSH server.

The rsync command can compare information about local files with information about remote files, and come up with details about what data needs to be transferred (while being able to ignore a bunch of data that does not need to be transferred because a copy of that data already exists on the server).
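The general idea of skipping data that already exists on the other side can be illustrated locally. The sketch below is not rsync's actual algorithm and does no networking; it simply compares each file in one directory against its counterpart in another and copies only those that differ. The sync_changed function and the directory names are hypothetical.

```shell
#!/bin/sh
# Sketch: copy only the files whose content differs between two directories.
# This illustrates skipping unchanged data; it is NOT rsync's real algorithm.
# sync_changed and all paths are hypothetical.

# sync_changed SRC DEST
# Prints one line per file that had to be copied.
sync_changed() {
    src=$1; dest=$2
    for file in "$src"/*; do
        [ -f "$file" ] || continue
        name=$(basename "$file")
        # cmp -s succeeds only when both files exist and are identical.
        if ! cmp -s "$file" "$dest/$name" 2>/dev/null; then
            cp -p "$file" "$dest/$name"
            echo "copied $name"
        fi
    done
}

work=$(mktemp -d)
mkdir "$work/local" "$work/remote"
echo "unchanged" > "$work/local/a.txt"
echo "unchanged" > "$work/remote/a.txt"
echo "edited"    > "$work/local/b.txt"
echo "stale"     > "$work/remote/b.txt"
echo "brand new" > "$work/local/c.txt"

# a.txt is skipped; b.txt and c.txt are copied.
sync_changed "$work/local" "$work/remote"
```

Running the function a second time copies nothing, since both directories then hold the same content.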

Some others
glastree from igmus code, pdumpfs
Options specific to Microsoft Windows
Options that may be built into the operating system
Removable Storage Manager
RSM Notes

(This may apply regardless of whether command line or GUI is used.)

This isn't as much about backing up files as about handling media. Support for RSM (or at least support for backing up to tapes with RSM) has been reduced/eliminated (perhaps by Server 2008; perhaps Server 2008 R2 and Win7 dropped RSM altogether?). Microsoft forum post about RSM API states, “The tape drive management solution is not supported and there is no alternative OS API available to access tape drives on Windows Server 2008 R2.” However “there are inbox tape drivers and changer drivers via which tapes can be accessed (through IOCTLs).” So, the concept is simply handled by a different design/implementation.

MS KB Q250916 notes “If you manually rearrange media in a tape changer without using Removable Storage Manager (RSM) eject / inject wizards, RSM may attempt to mount the wrong media” which can lead to an unpleasant issue: RSM handles inserted backup media by re-cataloging it, “and moves it to the offline media library”. “This behavior is by design.”

So, if RSM is being used to organize media, make sure that RSM is consistently being sufficiently informed anytime that media gets ejected.

KB 250468: How Removable Storage Manager and Programs Recognize Media

Command line
Windows XP Pro Documentation: Rsm
Graphical interface

May be visible within Computer Management.

BackupAssist ntbackup Media Management (Introduction section) states, “The Microsoft Removable Storage Manager and media (tape) management is perhaps one of the most misunderstood parts of Windows. Users frequently complain that it is too complex and frustrating to understand and use.” Fortunately, the site provides a guide.

NTBackup
Compatibility

This software is bundled with Microsoft Windows XP and Microsoft Windows Server 2003. The NTBackup software from these operating systems will make *.bkf files. Newer versions of Windows (Windows Vista and newer) do not come with software that handles these *.bkf files, although downloads may be available. Downloads are available for Windows NT Backup - Restore for Windows Vista and 2008 (not 2008 R2). The Removable Storage Manager may need to be enabled for those operating systems. Because the Removable Storage Manager is not included in Windows 7 or Windows Server 2008 R2, those operating systems cannot restore data from *.bkf files on tape drives. If the files are obtained (perhaps stored on an external hard drive), restoration from those files can be performed using the restoration software available for download at KB 974674.

For some less official solutions (which may not be legitimate according to at least some license agreements, so be sure not to ignore any requirements before just plunging into these methods), Slick IT: NTBackup in Win7 indicates that the files ntbackup.exe, ntmsapi.dll, and vssapi.dll located in the system32 folder are all that are needed to have most functionality on Windows 7. Considering that some versions of Windows 7 allow Windows XP Mode, those files may be fairly accessible. (Slick IT: Backing up Exchange suggests Exchange may be backed up by pointing HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\BackupRestore\DLLPaths to an esebcli2.dll file that comes from an old server. The web page notes, “Please remember that this is more of a band-aid solution and is totally unsupported by Microsoft.” As a comment on that web page notes, that does not work with 2008 R2.)

Command line usage

(Currently there are few details about this here.)

Windows XP Pro Product Documentation: ntbackup command line parameters, Microsoft KB814583: Windows Server 2003 information on NTBackup command line usage.

Bacula 7.2.x manual: Disaster Recovery of Win32 Systems states, “To restore the system state, you first reload a base operating system, then” after getting the *.bkf file ready, “run NTBackup and catalogue the system statefile, and then select it for restore. The documentation says you can't run a command line restore of the systemstate.”

Graphical interface usage
(Currently there are few details about this here.)

The program may show a calendar view indicating when jobs may run, but does not seem to provide much of an interface to see more information about those jobs. Here is how to find that information: All such backup jobs show up in Task Scheduler. Find the “backup selection file” (generally a “*.bks” file) that is referenced (after the @ in the command line). Then, in NTBackup, use Load Selections.

Microsoft KB Q326216: How to use the backup feature to back up and restore data in Windows Server 2003

WBAdmin
WBAdmin command line reference
Backup

Any command called “Backup” from a Microsoft/IBM operating system that predates NTBackup's inclusion (in Windows XP, perhaps earlier) should not be trusted without being thoroughly investigated, and perhaps not then either. Microsoft KB Q100012 refers to BACKUP.EXE file. There have been numerous reliability issues from various bugs in the various versions of such older software.

Third party options often targeting/marketing Microsoft Windows
[#strarc]: strarc (Stream Archive I/O Utility)

Tools and Utils for Windows (Warning: This software hasn't been tested by the author of this text, at the time of this writing. As a standard disclaimer, please determine any sort of security/stability impacts before using it.)

There is a strarc64 for users of 64-bit versions of Microsoft Windows, and strarc for users of 32-bit versions of Microsoft Windows. Online documentation for strarc is available, and documents some known limitations.

[#hobocopy]: Hobocopy

Hobocopy

A command line program that uses the “Volume Shadow Copy” service and the “Microsoft Software Shadow Copy Provider” service. Howtogeek's Hobocopy guide notes, “They can be left as Manual startup, so they don’t need to be running all the time. Hobocopy will start the two services automatically when needed, and the Volume Shadow Copy service will be turned back off after it’s done.”

Wangdera Tools and Utilities: Hobocopy states, “Hobocopy has moved! Please download hobocopy from GitHub:” https://github.com/candera/hobocopy/downloads

http://www.faqforge.com/windows/back-up-files-on-windows-with-hobocopy/

http://www.pcworld.com/article/232769/hobocopy_64bit.html

http://www.howtogeek.com/howto/windows-vista/backupcopy-files-that-are-in-use-or-locked-in-windows/

BackupAssist

Largely a wrapper around NTBackup, this software handles things like scheduling and reporting.

Free updates may be available on the vendor's website, while upgrades from a major version may not be quite as free. To check for the latest version, see BackupAssist Downloads. Perhaps the latest version of some major version numbers may include: (BackupAssist 5.4.8?), BackupAssist 4.2.2, BackupAssist 3.5.3 upgrader (which should be done before Upgrading from BackupAssist v3 to v5, if upgrading to that version).

BackupExec

This software has a history of being marketed/distributed under many brand names, including Veritas, which was later purchased by Symantec. This software may use its own drivers rather than relying so much on drivers by the operating system. (That may be bad, with notable overhead and likelihood of more compatibility issues, or it may possibly be good in some cases where these drivers may work better than what is bundled in with Microsoft Windows.)

The software has some unique design elements which may make for a steeper learning curve than many simpler solutions. As an example, the term DLO refers to “Desktop and Laptop Option”, and there is at least one service with “DLO” in its name, and there may even be some licensing consideration (with “DLO licenses”). Symantec Forums: DLO Maintenance Server Info: Comment 767431 has a comment by an account marked as “Symantec Employee” and “Accredited”, which states, “You can find information on a DLO Maintenance Server in the BE Admin Guide. Start on page 1048 where the top of the pages starts with Using Delta File Transfer. At the bottom of the page and for the next few pages you can find more information.” (Hyperlink moved from original text.)

Some software designers like to take a simple concept, such as making a copy of data, and provide simple solutions that hardly need any documentation. This software has an employee referring people to page 1,048 in a guide.

Acronis

See: Acronis (namely see Acronis report for applicable warnings).

Image-based backups
The benefits
Image-based backups are about backing up an image of an entire filesystem/partition/computer, including any data which may be on it. With this approach, there may be less need to back up individual components. Re-installing things may be quicker: instead of re-building an operating system, re-installing programs, restoring files as one step, and then restoring databases as another step, all those individual steps may be reduced to one step: restoring an image file.
The disadvantages

There are some disadvantages to image-based backups. One is that image-based backups tend to take longer. They store more data overall, because in addition to storing the most critical data (such as files), these types of backups store other information that the file system may be keeping track of, such as how much free space there is on the file system and how files are fragmented.

By being less modular, if there is a problem with a backup job, the entire backup job becomes suspect. If the task of backing up data is split up into multiple jobs, and all the data gets backed up successfully except for E-Mail, that helps to focus on where the problem may be. The problem might not be E-Mail specific; perhaps the problem was that the backup media ran out of space. However, if someone needs to re-make the specific backup job (because that has been known to help with some software), that will go more quickly if the replacement backup job doesn't require the user to again set up everything, including settings related to backing up databases. After making some changes to the settings of the E-Mail backup, a user may try to re-run the backup job for just the E-Mail. That smaller backup job may require less disk space and time than re-running an image-based backup. By completing more quickly, the tasks of fixing the backups and having all the data backed up may be able to be done sooner.

[#ddncsshb]: Using dd/nc/ssh with pv

This assumes there is:

  • an SSH server on the system that will be storing the data
    • or something more complex, such as an SSH server on the system that has a mounted remote filesystem that causes data to be stored in the actually-desired location
  • An SSH client running on the system that contains the data that should be sent.
  • Some other software, mentioned later in the guide (in the section that describes checking that the software is available).

Tunnel creation is the main reason for the SSH software.

This is considered nice because it uses some common programs that are found on many Unix machines. A whole device, such as a whole hard drive, can be backed up with this method. There are some drawbacks to this approach. For instance, if there is an error in the process (like if part of a disk is unreadable), this may effectively disrupt the pipe and affect the whole backup. Recovering, while retaining any successful progress, may be a bit more challenging (possibly requiring some additional thought in the process). These instructions simply show how to get things set up.

Boot the system

This may be done with some Live CDs. Knoppix has been known to have all of the required software components.

Secure the client

If the system was booted in a rather insecure state, such as a “Live CD” using a publicly known login, then change the logins. (Do so for both the standard account and the administrator/superuser (“root”) account.)

Hopefully the system is rather secure, other than that. (Such a hope might be based on assumptions which may be quite unwarranted.)

Establish connectivity

If this is going to be done over Wi-Fi, then get associated. (Connect to the SSID.)

Make sure the client system has an IP address.

Optional: Run a “remote access” server on the client

On the system that will be running the SSH client, it may be nice to do things remotely. Have that machine run a “remote access” server (such as an SSH server). Then, a person can use a remote location to log in using the recently-secured credentials. The remaining work can then be done from the remote location.

Note: this next suggestion might be a very terrible idea, or it might improve security. (You probably do NOT want to do this on an established system.) For a LiveCD, the first step might be to delete any pre-existing keys in /etc/ssh/ in case some sort of bundled keys got used, and re-create them using “ ssh-keygen -A ”.

Optional: run a terminal multiplexer

This is technically considered optional, because a backup can be performed without this. However, using a terminal multiplexer is recommended because it may add stability, permitting a connection to succeed even if a terminal connection becomes broken.

Basically, try running tmux. If that isn't installed, try screen. If neither is installed, try installing one (or both) of those. (The tmux program is newer, and recommended by the creator of this guide.)

(Once in the terminal multiplexing software, you can interact with the terminal multiplexer as described by the section on terminal multiplexers.)

Verify that the commands exist

You may want to do this earlier in the process. Verify that these are installed:

dd

Try “ which dd ” or “ dd --help ”. If you enter “ dd ” with no parameter, it will just sit there until it gets an interrupt. (In that case, try sending the program the “Interrupt” signal (a.k.a. “SIGINT”), which can be done by pressing the Interrupt sequence (Ctrl-C).)

netcat

These instructions use netcat. Running netcat without any parameters will safely show some online help. Often, the executable name for this software is nc, though some systems may use netcat instead of (or in addition to) having the command available as nc. Additionally, some other programs, like Socat (socat/netcat comparison), NMap's NCat, Cryptcat, or others, may also be used. Wikipedia's article on Netcat: section called “Ports and reimplementations” mentions some other possible executable names.

(For another use of netcat, you can see: remote access for redirected shells.)

Microsoft Windows does not include a copy of netcat. A downloadable variation is available from EternallyBored.org: netcat for Win32(/Win64).

An SSH client

An SSH client is needed. Specifically, these instructions use SSH tunnel creation as the primary feature that gets used.

The samples in this guide will likely be based on OpenSSH's ssh. Running this software without parameters will safely show some help.

It is believed that PuTTY can be used with some parameters similar to OpenSSH's parameters. The precise details might not be identical, and these instructions do not (yet) contain details about using PuTTY instead.

Compression program

Recommended: a compression program, such as xz, bzip2, or gzip. Each of these provides a message about how to show additional help if the program is run without any parameters.

(At the moment, these instructions are just for gzip.)

Progress Display
optional (but recommended for easy progress tracking) : Run “ pv -h ” which shows help. If pv is run without parameters, then it will start showing a progress display. In that case, try sending the program the “Interrupt” signal (a.k.a. “SIGINT”), which can be done by pressing the Interrupt sequence (Ctrl-C).
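The checks above can be done in one pass. The following is a minimal sketch that reports which of the tools named in this section cannot be found on the PATH; the missing_tools function is hypothetical, and the list of names is an assumption that may need adjusting (for example, netcat instead of nc, or xz instead of gzip).

```shell
#!/bin/sh
# Sketch: report which of the tools used in this guide are missing.
# The tool list mirrors the text; adjust it for the system at hand.

# missing_tools NAME...
# Prints the names that cannot be found on the PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools dd nc ssh gzip pv)
if [ -n "$missing" ]; then
    echo "Not found on this system:"
    echo "$missing"
else
    echo "All tools found."
fi
```

Since some of the commands run on the remote server, the same check is worth running there as well.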
Create SSH keys

The easiest way to have this work is generally to have the SSH server allow a connection to be authorized without any manual keystrokes being needed.

If no such key is currently installed/handy, then simply making a new one may be a good approach.

generating SSH key files

This might be easiest to do on a client.

ssh-keygen -N "" -f /tmp/tmpkey

Deploying SSH keys

On server, paste key, e.g.:
cpytobak ~/.ssh/authorized_keys
cat /tmp/tmpkey.pub | tee -a ~/.ssh/authorized_keys

Make sure the file identifies the key, so that the key isn't unrecognized later:
cpytobak ~/.ssh/authorized_keys
echo $VISUAL
$VISUAL ~/.ssh/authorized_keys

Find out how big to back up

The following commands may show the number of bytes on the disk.

In BSD, try: “ fdisk diskname ”. In Linux, list the contents with “ fdisk -l diskname ”.

(Alternatively, the command might show some other useful information, such as the number of sectors on the disk. If the disk has a known sector size, such as the half-kilobyte used on many disks (particularly older disks, smaller than 2TB), then the byte count is the number of sectors multiplied by the sector size.)
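When only a sector count is available, the byte count is a single multiplication. This small sketch shows the arithmetic; the disk_bytes function is hypothetical and the sector counts are made-up examples rather than values read from a real disk.

```shell
#!/bin/sh
# Sketch: turn a sector count into a byte count (e.g. for pv's -s option).
# The sector counts below are examples, not read from a real disk.

# disk_bytes SECTORS [SECTOR_SIZE]
# The sector size defaults to 512 bytes (half a kilobyte).
disk_bytes() {
    echo $(( $1 * ${2:-512} ))
}

disk_bytes 976773168         # prints 500107862016
disk_bytes 244190646 4096    # prints 1000204886016
```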

Test SSH

Create an SSH connection. A key reason for doing this is to handle key verification. (See: Manually verifying/validating key signature fingerprints.) If the key can be trusted, then a key purpose of this test is to be able to manually answer yes to the prompt shown by the OpenSSH client. That way, the answer to that prompt may be saved (by the OpenSSH client, in ~/.ssh/known_hosts on a Unix machine). By saving the answer to that prompt, the prompt won't need to show up during future attempts, such as when trying to perform the backup.

Verify ports unused

If this has been attempted multiple times, it is possible that some additional listeners are still running in the background. This can cause problems, as data may get sent to the wrong listener. Verify that the intended TCP ports are unused. The safest way is to check what ports are actually used (e.g., netstat -na). Another approach, which may be less effective (at least in theory), but possibly faster (in actual practice, because noticing problems might be a bit quicker) may be to check for ssh running on the computer with the SSH client, and nc running on the computer with the SSH server.

Jot down the upcoming commands

Customize them as needed.

The purpose of this is so that commands can be run quickly (possibly by using “copy and paste”). Preliminary testing indicated that the tunnel might time out, so running the next command quickly is desirable.

Set up the tunnel

Set up the tunnel. On the client, run:

ssh -f -i /tmp/tmpkey -p 22 username@192.0.2.1 -L 23333:127.0.0.1:3333 'nc -l 3333 | pv -ptrba -cN recvd | gzip -dc | pv -petrbas 123456789123 -cN expand | dd of=/tmp/trythis'

  • Some changes may be needed to successfully login, including the host address (DNS host name or IP address), username, TCP port number, and location of the private key.
  • Towards the end is a filename. This should probably be customized. The filename specified here is the output filename, which will be created on the remote system. (The specified path must exist on the remote system, and does not have to exist on the local system.)
  • The 3333 could be customized. Just make sure to customize it in both the -L for the SSH client and the -l for netcat.
  • The 23333 can be customized, and if it is, the later command that starts the transfer also needs to be customized.
    • In this example, the numbers 3333 and 23333 are different, but similar looking. They do not need to be similar looking. There is no reason why they must be different, but they can be, so this example shows different values. (That way people can see what values must match, and what values may be different.)
  • All of the commands in between the apostrophes will be run on the remote server. (So, if the remote server doesn't have the pv command, then remove that reference.)
  • The 123456789123 is the amount of data to be transferred. (An earlier instruction involved figuring out that number.) (Specifying this number incorrectly causes no problem except for inaccurate estimates. So, if this detail is problematic to get, then just make an estimate.)
  • You may want to not extract the file right away. For example, remove the “ gzip -dc | ”, and then add a .gz to the filename. (Make appropriate changes to that example suggestion if some other program, like xz, is being used.) One disadvantage is that the incoming data won't use the decompressor's integrity checks to detect potential problems with the data (from when the file was compressed, or transferred). That check can be done after the file is transferred, but any such errors may not be reported, or have any other impact, until a much later time (when the check is performed, which is presumably after the file is transferred).
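If the file is kept compressed (the last option in the list above), the integrity check that was skipped during the transfer can be run afterwards. The following is a minimal sketch, assuming gzip was the compressor; the function name verify_gzip is made up for this example, and the filename in the usage comment simply follows the /tmp/trythis example with .gz added.

```shell
# Hypothetical helper: test a gzip file without extracting it.
# "gzip -t" reads the whole file and checks its integrity.
verify_gzip() {
    if gzip -t "$1" 2>/dev/null; then
        echo "OK: $1"
    else
        echo "CORRUPT: $1"
        return 1
    fi
}

# Example use on the remote system, after the transfer has finished:
#   verify_gzip /tmp/trythis.gz
```

For a large disk image this test may take a while, since the entire file must be read.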

An optional step at this point: you can confirm that the client is listening on TCP port 23333 and that the server is listening on TCP 3333.

Start the transfer

On the client, run:

sudo dd if=/dev/sda conv=noerror,sync | pv -petrbas 123456789123 -cN rawdat | gzip -c -9 | pv -ptrba -cN compr | nc 127.0.0.1 23333
  • The 23333 needs to match what was specified when the tunnel was created.
  • The 123456789123 is the amount of data to be transferred. (An earlier instruction involved figuring out that number.) (Specifying this number incorrectly causes no problem except for inaccurate estimates. So, if this detail is problematic to get, then just make an estimate.)

If pv was unavailable, it may be possible to estimate progress by monitoring the size of the file (or perhaps the amount of disk space that is free on the server, if other changes are not expected). Make sure to note the time that the transfer started (if pv is unavailable; while pv is running, it will show the time that has been used so far).
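A rough sketch of that manual estimate follows. The function name estimate_progress is made up for this example. It assumes the output file is being written uncompressed (so its size is comparable to the total amount of raw data), and that the transfer rate stays roughly constant. All arithmetic is integer arithmetic, so results are rounded down.

```shell
# Hypothetical helper: estimate progress without pv.
#   $1 = bytes written so far (current size of the output file)
#   $2 = total bytes expected (the same number that would be given to pv -s)
#   $3 = seconds elapsed since the transfer started
estimate_progress() {
    ep_done=$1
    ep_total=$2
    ep_elapsed=$3
    ep_percent=$(( ep_done * 100 / ep_total ))
    if [ "$ep_done" -gt 0 ]; then
        # Remaining time, assuming the average rate so far continues.
        ep_remaining=$(( (ep_total - ep_done) * ep_elapsed / ep_done ))
    else
        ep_remaining=unknown
    fi
    echo "${ep_percent}% done, about ${ep_remaining} seconds remaining"
}

# Example: 250 of 1000 bytes after 60 seconds
#   estimate_progress 250 1000 60
```

If the total was only an estimate, the percentage and remaining time will be correspondingly inexact.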

Misc:

Q: Why use netcat?
A: In theory, one could try to pipe directly to ssh. The problem is that sending output directly over ssh could cause an “escape sequence”, such as “~.” (tilde, period), to be treated specially by SSH software. That may effectively break the file transfer. By using netcat, SSH is handling all of the data with a tunnel. The SSH software doesn't check the tunnel for the escape sequence. See TechRepublic.com: How to escape SSH sessions without ending them
Bad output:
You don't want this:
bind: Address already in use
channel_setup_fwd_listener: cannot listen to port: 23333
Could not request local forwarding.
(followed by the results of the file transfer, showing 0% done)

De-buffer
Speculation: disabling pipe buffer (OpenBSD), disabling pipe buffer (Linux?), disabling buffering through pipe bi-redirection
Follow-up

If you're using a terminal multiplexer, you may wish to split the screen, reducing the chance of an accidental keystroke being sent to the window that is running the backup. Recommended directions are as follows:

tmux
Press the command key (Ctrl-B by default), and then " (that's a quotation mark, which is Shift-' (apostrophe)).
screen

If using screen, the instructions are a bit longer. Press the command key (Ctrl-A by default), and then S (that's a capital letter S, so Shift-s).

Then press the command key again (Ctrl-A by default), and then press Tab to start interacting with the new area. The cursor should be shown in the new area.

Finally, create a prompt (if desired) by pressing the command key (Ctrl-A by default) again, and then press c.

Databases
SQL
Microsoft SQL
Using VSS with specialized backup software

(This section may benefit from further research. Consider this material to be preliminary; verify these details before counting on any of them.)

A method often preferred by professionals is to use backup software which has specific support for backing up SQL. There may be some advantage to using a method that supports a VSS (Volume Shadow Copy Service) writer. Such a writer might be visible by using vssadmin list writers.

Using Microsoft's SQL software

Another approach is to use SQL-specific software. The choices available may depend on what version of the SQL software is being used.

Handling log files

This information may be relevant whether using a command line approach, or the graphical user interface. (This might only apply to situations where the data recovery model is set to full or bulk-logged, not when using the recovery model called simple?)

A process which may be at least a bit destructive, and so which should generally be avoided, is flipping the database recovery model from anything else (“full” or “bulk-logged”) to Simple. There may be some benefits to using the Simple recovery model (after all, why would Microsoft keep supporting that recovery model if there were no advantages to it?). However, if backup methods seem to be resulting in log files remaining large, choosing to start using the Simple recovery model is NOT as good an approach as improving the method of backing up the data. Since the switch to Simple may cause some undesirable data loss, it is instead better to try following the instructions in this guide.

Even the alternative of detaching the databases (which is generally recognized as an undesirable alternative) may cause less data loss than flipping to Simple recovery mode. So, do not just flip to Simple as an attempt to quickly implement a solution! However, even better instructions (better than detaching the databases) are provided by the following text.

When learning about transaction logs becoming large, a person may quickly learn that the *.LDF “Log” data files store transactions, and that these files should not become excessively large. They should not get to be larger than the main database. If they do, this may be an indication that the files are not getting backed up properly.

The preferred process, if VSS isn't being used, is to follow these four steps:

  1. Back up the main database file
  2. Separately, back up the transaction logs
  3. Perform a safe “truncation”, which moves needed data to the beginning of the file. Note that there are also unsafe truncations, so learn about that before proceeding with this step. This step should be done right after a successful backup of the transaction log file, and the main database file should also have been backed up.
  4. Shrink a file, which eliminates unneeded space at the end of the file. This can be done anytime, but unless it is done right after a truncation, it is likely that more shrinkage will be possible after performing the truncation task.
Truncating
Discussing the danger

First and foremost: dangerous advice is commonly given. For instance, with at least some versions of Microsoft SQL, the Transact-SQL command called BACKUP may support an option called “TRUNCATE_ONLY”, and a synonym for “TRUNCATE_ONLY” called “NO_LOG”. This is considered to be dangerous (risking data loss). MSDN: Microsoft SQL Server 2005 reference for the BACKUP (Transact-SQL) command notes, “We recommend that you never use NO_LOG or TRUNCATE_ONLY to manually truncate the transaction log, because this breaks the log chain.” Furthermore, (an earlier part of) the article says, “This option will be removed in a future version of SQL Server. Avoid using it in new development work, and plan to modify applications that currently use it.” Indeed, that does seem to be true: Pinal Dave's SQLAuthority.com Blog about “SHRINKFILE and TRUNCATE Log File in SQL Server 2008” shows that a reader had problems using the command in Windows Server 2008.

SQLServerPedia article: “Backup Log with Truncate_Only: Like a Bear Trap” likens this situation to... an unpleasant situation. The article's second paragraph is unique and creative enough to be worth quoting in its entirety:

It’s somewhat akin to asking, “What’s the best way to cut my hand off to free myself from this bear trap before I starve to death in the wilderness?”  Well, you shouldn’t be sticking your hand in bear traps to begin with, but if you find your hand in a bear trap, ANY way to get out of it is a good way.  Pocket knife, teeth, band saw, whatever it takes.  (Now there’s a sunny image to start the day.)

So, a fast way to reduce the size of the log file, which is also a way that might initially appear like it worked okay, can be destructive. So, let's look for a better way to handle the log files.

SQLServerPedia article: “Backup Log with Truncate_Only: Like a Bear Trap” seems to take a dim view of truncating log files, and instead seems to be recommending to just allocate whatever space is needed.

OTOH, Microsoft's documentation on the subject doesn't sound nearly so bad. MSDN: Microsoft SQL Server 2005 reference for the BACKUP (Transact-SQL) command states, “After a typical log backup some transaction log records become inactive, unless you specify WITH NO_TRUNCATE or COPY_ONLY. The log is truncated after all the records within one or more virtual log files become inactive. If the log is not being truncated after routine log backups, something might be delaying log truncation.”

So, surely truncating can't be all that bad if it happens automatically during a “typical” log backup.

Overview

To “truncate” a log file basically involves moving needed data towards the beginning of the file. The log file may remain fairly large, with unneeded/inactive/discarded data left at the end. This way, when more data is needed, the unneeded/inactive/discarded data may be overwritten, and the file doesn't need to grow. This may be preferred so that there isn't unnecessary file shrinkage and growing, since growing takes some resources (e.g. time) and the series of shrinking and growing may lead to increased fragmentation.

Details on preferred implementation
...
Shrinking

Note that repeatedly shrinking the file may be considered to be a novice and futile approach if the file will simply re-grow regularly. The software will spend resources (like time) re-growing the log file, and the practice of regularly shrinking and re-growing the log file may cause the file to be more likely to experience disk fragmentation.

However, there may be some cases where it simply makes sense. A log file for a SharePoint_Config database may grow regularly, unnecessarily. Shrinking the file may cause it to grow from a shorter size, and so it never gets too ridiculously big.

It appears that perhaps the truncation and/or shrinking operation(s) may not be immediate.

MSDN: MS SQL 2008 R2 documentation: Managing the Size of the Transaction Log states, “Log truncation is essential because it frees disk space for reuse, but it does not reduce the size of the physical log file. To reduce its physical size, the log file must be shrunk to remove” inactive sections of the file.

Shrinking is simply removing the unneeded/inactive/discarded data that is left at the end of a log file. In a pinch, shrinking may be done before other actions, such as running a backup and truncating. However, the effect may be minimal. Shrinking after truncating after a backup may have a more noticeable impact. So, after performing such a shrink operation, it will likely be desirable to run a backup, and then carefully truncate (being wary that some methods of truncation can cause data loss), and then to shrink again, so that an impactful shrink operation occurs.

Further details may be in MSDN MS SQL 2008 R2 documentation: Transaction Log Management.

The guide to using command line options will cover shrinking the file.

Microsoft's command line options for working with SQL

Microsoft KB 325003: How to manage the SQL Server Desktop Engine (MSDE 2000) or SQL Server 2005 Express Edition by using the osql utility states, “The only tool that is provided with MSDE 2000 is the osql utility. The executable file, Osql.exe, is located in the MSSQL\Binn folder for a default instance of MSDE 2000.” ... “you can also use the osql utility to manage SQL Server 2005 Express Edition. However, this feature will be removed in a future version of Microsoft SQL Server 2005. We recommend that you do not use this feature in new development work and plan to modify applications that currently use the feature. Use the Sqlcmd utility instead.”

A cursory review makes it appear that both utilities may have some similar syntax, so (in some cases) the same command line parameters might even work with both pieces of software. However, the command to run may need to be different.

The programs may run “Transact-SQL” queries.

The following was written by memory and/or documentation, and so may need to be verified before being put to use:

First, identify the name of the database to be backed up. (Admittedly, further details on this process might be helpful.)

Make a text file, that looks something like this:

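CHECKPOINT
-- Full backup of the database. (NO_TRUNCATE applies only to BACKUP LOG, so it is not used here.)
BACKUP DATABASE databasename TO DISK = 'D:\MyDir\dbdata.bak' WITH NOINIT, COMPRESSION, CHECKSUM, STATS=1;
GO
-- Log backup, to a different file. NO_TRUNCATE is left out so that this routine log backup can truncate the inactive part of the log.
BACKUP LOG databasename TO DISK = 'D:\MyDir\dblog.bak' WITH NOINIT, COMPRESSION, CHECKSUM, STATS=1;
GO
-- DBCC SHRINKFILE expects the logical file name (or file ID), which may differ from the database name.
DBCC SHRINKFILE ( databasename, 0, NOTRUNCATE )
DBCC SHRINKFILE ( databasename, 0)
GO
QUIT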

Note that the two BACKUP lines differ in two ways. One is the type of data getting backed up (which is the second word on the line). The other, which is important to get right, is the specified destination (output filename): make sure to use different filenames.

Then run a command that looks something like this:

Sqlcmd -S . -i input.sql -o output.log

The above command is a bit abbreviated, and may need to be customized a bit further. Most strikingly, the “-S ” parameter specifies the SQL instance. The period (“.”) refers to the current machine. Alternatively, a UNC may be provided (as in something like “\\MachName”). After the name of the machine, one may optionally specify the path for an SQL Server “instance” that is running on the server. In most common cases, that will consist of a backslash, followed by the name of the SQL Server instance. (To help identify the name of the instance, check out the name(s) of related Windows services.) If no instance path is provided, a “default” instance on the machine is used. An “instance” is similar in concept to a separable, isolated installation of the running SQL Server software, and this is different than the concept of separate databases that are part of a single SQL Server instance.

Other command line parameters may be related to authentication.

It may also be possible to put a Transact-SQL query directly on the command line, without using an input file, by using “ -q "command" ”. However, running just a single “query” command this way, like backing up, may perform the backup and write the output files, but then not automatically exit the program. (With sqlcmd, the uppercase “ -Q "command" ” variant runs the query and then exits.)

More information may be available by using: MSDN: Using the sqlcmd Utility, MSDN: MS SQL Server 2008 R2's documentation on the CHECKPOINT Transact-SQL command, MSDN: MS SQL Server 2008 R2's BACKUP (Transact-SQL), DBCC SHRINKFILE.

SQL Server Management Studio

For those who like a graphical approach, SQL Server Management Studio may come bundled with the software, and/or may be downloadable. If it isn't installed yet, check to see if there is a version available to download from Microsoft. This SQL Server Management Studio software may be compatible only with specific intended version(s) of Microsoft's SQL Server software.

Once this software is installed on one machine, it may be able to handle SQL databases stored on the same machine, and/or may be able to work with remote machines.

MSDN: SQL Server 2008 R2 documentation: How to Back Up a Database with SQL Server Management Studio.

Detaching the database files

Detaching the database files is yet another method to back up SQL data. Once the database is detached, the database files shouldn't be in use, so they can be backed up like any other files. (The only thing about database files that are detached, that may make them a bit more difficult to deal with than other regular ol' files, is just that the database files may be fairly large.)

Taking this approach causes the database to be offline while the database files get copied, which makes this situation less than classy. Another factor that makes this situation less than ideal is that the SQL software may not be notified by the backup software that a suitable backup was made, and so Transaction Log Files (*.ldf files) may not end up lowering in size. However, despite these disadvantages, this approach can end up getting the data reliably backed up.

SQL Server Forums Post: Topic 136413 (“Difference between .bak & .mdf, .ldf”), Tara Kizer discusses SQL backups, saying, “We specifically exclude the file backups of all mdf, ldf, and ndf files from our file backup software (Netbackup). On the SQL Server boxes, all we need are the bak, trn, and dif files.” Review should be performed to check into that.

Some information about the files

(This information may, sometime, be moved out of this section about backups, and into a section more specific to database software. For now, though, it is here.)

MDF file
The primary data file, which usually ends with an extension of “.mdf”. Each database has just one “.mdf” file. This file points to the other files as needed. This database's “primary filegroup” will include this file.
NDF file
Many databases may have zero NDF files, but if one or more NDF files do exist, then each holds critical data (much like an MDF file). These may be part of a “filegroup”. SQL Server Forums Post: Topic 136413 (“Difference between .bak & .mdf, .ldf”), a user titled “behrman” notes “.mdf is extension of primary file,” and “.ndf is extension of secondary file.”
LDF

LDF files are transaction log files. Each database should have at least one LDF file. If an LDF file is missing, it may be possible to get an old version of the database to be somewhat functional, although there may still be some data loss.

Relatively large LDF files (particularly when large compared to the MDF file) may indicate that data is not being backed up sufficiently. Beware of using truncate commands.

About the required size: MS KB Q110139: “INFO: Causes of SQL Transaction Log Filling Up” says: “Since a certain fixed amount of transaction record overhead must be recorded for each logged row, the ratio of updated data to log space consumption will vary depending on the row width. For a narrow row, the amount of log space consumed for a particular UPDATE, DELETE or INSERT could be ten times the data space consumed. For wider rows, the amount of log space consumed will be proportionately less.” “The amount of log space required can vary depending on many factors and is very difficult to predict accurately beforehand.” and “Attempting to size the log based solely on calculations and without tests is difficult and often inaccurate.” An example of a test is provided (“DBCC CHECKTABLE(syslogs), which returns the number of 2048-byte data pages in the log”), but requires “executing a representative sample of your data modification queries.” The term “representative” may cause very different input for those tests, even if two different databases used by two different organizations are using the same software.

E-Mail
...
Registry

For Microsoft Windows systems (starting with the 32-bit releases: Windows 95 and probably Windows NT), the “registry” stores a lot of data which is considered critical for the operating system itself to start, as well as for devices to work properly and for a lot of software that has been installed onto the operating system.

There are a few ways to back it up. A lot of backup software will typically just try to include the “system state” which includes the registry. (More details on that approach are in the “System state” section.) By saving the system state, the registry and some other critical data gets saved. Restoring a portion of a registry may be more difficult if it is only available as part of a backup of a system state, but the advantage to the system state is that more information is available (even if it is less convenient to access). Having all of the needed information available, even if some of it may be less convenient to restore, is generally preferred over not having all of the needed information available (even if some of it is more convenient to restore). This is why backing up the system state is such a common practice.

However, if only the registry, or a portion of the registry, is meant to be backed up, then there may be other options. These options may be faster than backing up an entire system state, and exporting does offer restore options that may be much easier.

Exporting

Exporting: This can be done from the command line (with the REG EXPORT command) in some versions of Windows (including Vista; this should be checked to determine what earlier versions/service-packs support this), or using the graphical tool (which almost certainly works in more operating systems).

Other command line options

The REG command may have SAVE and RESTORE options. (The effectiveness of these options, and details on how to use these options, is not documented here at this current time.)

For Windows Millennium Edition (Windows ME) and Windows 98: This may be handled using the “Registry Checker Tool” ( Scanreg.exe and/or Scanregw.exe ). Note that restoring the full registry with this “Registry Checker Tool” may be (or most certainly is) something that requires the user to not be actively running the graphical user interface that relies on the registry. Details are in the section on restoring. The actions of this software may be determined by the command line switches, documented by MS KB Q184023: Command-Line Switches for the Registry Checker Tool, and the ???\Scanreg.ini file which is documented by MS KB Q183603: How to Customize Registry Checker Tool Settings. A special tool for customizing this is SREdit (Is that SREdit.exe? Documentation?) from the Windows 98 Resource Kit CD-ROM. If ScanReg.ini does not have a “BackupDirectory=” line, the default location is Windows\Sysbckup (according to Q183603, although more likely it is Sysbckup under the Windows folder, which is %windir%). MS KB Q183887: Description of the Windows Registry Checker Tool seems to suggest the filenames are rb0*.cab filenames. MS KB Q221512: Manually restoring Win98/Me registry says “By default, five previous copies of the registry are stored.” That number, five, seems to correspond with the default value of MaxBackupCopies in ScanReg.ini. MS KB Q183887: Description of the Windows Registry Checker Tool notes that “no more than five is recommended” for the number of backups to keep.

“System state”

The system state includes several “components”. For Windows XP Professional and Windows Server 2003, System State Data for Windows Server 2003 (and XP Pro) lists these components. The components do include the registry, COM+ Class Registration database, system files that are under Windows File Protection, and boot files including system files. Additional components may also be included depending on the system.

For users of Microsoft Windows systems, the typical way to back up certain data on an active system is to perform a backup of the “system state”. After a special system state backup is performed, that system state backup can be handled directly by the backup software, or the system state may be stored in a file which is not itself backed up yet (but which can then easily be backed up just like any other standard file).

One key component of the system state is the registry. There are methods to back up the registry other than using the system state.

Backing up the system state in Windows Server 2008

For Windows Server 2008, this may be backed up using something like:

wbadmin start systemstatebackup -quiet -backuptarget:c:

However, if the specified destination target is a “critical volume”, such as the boot disk, an error will occur unless the registry specifies this should be allowed.

Backing up the system state in Windows Server 2003, Windows XP, and Windows 2000
For Windows Server 2003, Windows XP, and Windows 2000, the following worked:
ntbackup backup systemstate /F c:\sysstate.bkf
Rotation Schemes
...
Scheduling

Scheduling has been known to be a bit challenging when organizations strive to minimize disruption of a network.

][CyberPillar][ now provides: Backup Sample (from June 2012) (extracted, also available as Backup Sample (from June 2012) compressed). This was taken from an actual IT company (with permission). To protect confidentiality, some details were slightly edited (the names of the computer systems weren't actually “SysB” and “SysD”). It shows how one organization had 64 different jobs that were set up in the backup software. Different types of data, such as the E-Mail database and SQL information, were backed up with different plugins, and a key reason for multiple backup jobs was so different types of data would get backed up frequently to multiple locations. The organization eventually switched to some different backup software. Still, this historic sample shows a taste of how complex the jobs may be. This was for an organization that had just one primary office with data that needed to be backed up.

[rstordat]: Restoring data
Considerations

Two key questions are as follows: What version of the data is desired? Where should the data be restored to?

If backups are made once per day, and a file was deleted on Wednesday, and today is Friday, then Wednesday's version is desired if it is available, otherwise Tuesday's version is desired. Clearly Thursday's version of the data, which indicates that the file does not exist and is gone, is probably not desirable. Try to determine when the incident occurred so that the latest version of the data from before that incident may be restored.
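The selection rule described above (the latest backup from before the incident) can be sketched as follows. This is an illustration only: the function name latest_backup_before is made up, and it assumes each backup is identified by a timestamp in a fixed-width, sortable format such as YYYY-MM-DDTHH:MM, with the incident time given in the same format.

```shell
# Hypothetical helper: print the most recent backup timestamp that is
# earlier than the incident time.
#   $1 = incident time; backup timestamps are read from stdin, one per line.
# Timestamps are compared as plain strings, which works for fixed-width
# formats like YYYY-MM-DDTHH:MM. Exit status 1 means no such backup exists.
latest_backup_before() {
    awk -v incident="$1" '
        ($0 "") < (incident "") && ($0 "") > (best "") { best = $0 }
        END { if (best == "") exit 1; print best }
    '
}

# Example: nightly backups at 23:00, file deleted on the 13th at 15:00.
#   printf "%s\n" 2012-06-11T23:00 2012-06-12T23:00 2012-06-13T23:00 |
#       latest_backup_before 2012-06-13T15:00
```

In that example the backup from the 13th was made after the deletion, so the one from the 12th is the version to restore.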

For files on a Microsoft Windows system (including a network where the data resides on a Microsoft Windows Server, even if the client is using another system), the Shadow Copy Restore functionality may be able to pull a version of the file that is more recent than the full system backups. Consider checking this.

Some specific types of data to be restored
[#restorfl]: Restoring files and directories/folders from backup
[#shdcprst]: Shadow Copy Restore (with Shadow Copy Client)

First, the machine with the data (presumably a server?) needs to have support pre-enabled for this. Also, the client needs to have support for this.

An option may be using Shadow Copy Client software which may be pre-installed or, for Windows 2000 SP3 (and newer) or Windows XP, may be downloadable (and installed if the system has support for Windows Installer 2.0). Archived: Daniel Petri's “How do I setup the Shadow Copy Client? How do I use the Shadow Copy feature?” says “For Windows XP Professional, the code is available on the Windows Server 2003 CD at %Windir%\System32\Clients\Twclient\X86.” (This does not make sense: %Windir% would refer to the installation on a hard drive, not on the CD.) That same web page says supported operating systems include Win XP Pro, XP Home, 2K Server SP3+, 2K Pro, and 98 (but not ME or NT Server 4 or NT Workstation 4).

Information about restoring with the shadow copy client may be available at:

[#prevflvr]: “Previous Versions”
The “Previous Versions” support in Microsoft Windows

Windows.Microsoft.com Vista documentation: Previous versions of files (FAQ)

Note: If none of these options seems to be available, see also the section about recovering files.

Image-based backups

The first step of restoring an image may be to plan how the image will be used. In some cases, the restored data may be used in a different manner than how the data was being used when backed up. The two main methods of consideration here are to load the image onto actual hardware which is run directly, or to use a virtual machine.

Databases
...
E-Mail
...
“System state”/Registry

For Windows Millennium Edition (Windows ME) and Windows 98: Registry backups may exist on the local system and be able to be restored with the following procedure (which is heavily based on MS KB Q221512: Restoring Win98/Me Registry):

  1. Get to a command prompt that isn't using the registry. In Windows 98, Microsoft's recommendation is to hold the Ctrl key while booting, and then choose “Safe Mode Command Prompt Only”. In Windows Me, Microsoft's recommendation is to use a startup disk. Note that there may be another option by altering the startup process on WinME: That does not require applying binary changes to the IO.SYS/MSDOS.SYS files.
  2. From the command prompt, use: “ %windir%\command\scanreg /restore ”
  3. Note that the word “Started” near a date will indicate a “properly working registry” (according to MS KB Q221512, which is hyperlinked above). Choose which item to restore.
  4. Press Enter to reboot the system.
Verifying backed up data
For a single file, see comparing files.
Recovery documentation
Noting things such as where credentials are stored, where software installation keys exist, and where media is located.
[#recovrfl]: Recovering files
Backups
Recovering from backups may be the best way. See the section about restoring data (such as the subsection about Restoring files and directories/folders from backup).
File cache

If a copy of the file hasn't been fully deleted, getting a copy of the not-yet-deleted file could be a way to recover the file. For instance, if a file is in a location used by a web browser's cache, getting the file from that sort of location may be doable until the data is deleted.

Version history

Perhaps related to the previous section about using a file cache.

Microsoft Windows Shadow Copy Restore

See: section about Shadow Copy Restore, previous versions.

Unerase

Note: There was previously some text here about using “Unerase” technologies. That information was an accidentally-created copy, and so was redundant, and so has now been removed from here. An existing copy is still available: see undelete/unerase

Scanning a drive for signatures

Note: There was previously some text here about “Scanning a drive for signatures”. That information was an accidentally-created copy, and so was redundant, and so has now been removed from here. An existing copy is still available: see data signature scanning