[#impristg]:

Images of “Primary Storage” Devices (e.g. Hard Drives)

OLD INFO:

Information about hard drive images is covered in more detail in the Virtualization page: disk image files (and more specifically the Virtualization page: file formats for disk images section). Such disk images could potentially have a use other than being used by software that implements virtual machines. However, software that implements virtual machines will often come bundled with software that handles hard drive images.

For making images, specifically see the Virtualization page: subsection about making disk images. For a discussion about various disk image formats, see that same approximate section of documentation. For converting, see Primary Storage Devices: converting disk images or Virtualization page: converting disk images.

[#mkdskimg]: Creating a hard drive image

Specialized tools do exist for creating disk images in various formats.

[#imprstfm]: Choosing a file format for a hard drive image

Determine what format is desired.

Because support for image types tends to vary between different virtual machine software programs, and because different people may have different goals, this guide does NOT try to recommend any one format over another. Various options are typically documented with the virtual machine software. The biggest thing to be certain of is just to make sure that the chosen format is compatible with the virtual machine software that will be used. Looking over the list of file formats may be one way to do that, as the descriptions of the file formats may make a reference to what virtualization software supports the formats. If the virtualization software provides an interface to create an image, then the image format used to create the image will likely be a compatible format. An even more likely way to get this needed information about image format compatibility is to locate and review documentation about the virtualization software.

Some image formats may support various disk image features, such as data compression or snapshot/child image support.

Some notes about snapshot/child images

For formats that support snapshot/child images, determine whether to make the new image a snapshot/child of an already existing image. As a hint that may be useful when working with some disk formats, a snapshot/child image requires a pre-existing “parent/base image”/“backing file”, so the first image made needs to be a “parent/base image”/“backing file”. For at least some formats, there is no difference between the data in a stand-alone image (which has never had any snapshot/child images created from it) compared to the data in a “parent/base image”/“backing file” which has had one or more snapshot/child images made from it.

There typically is no point to making a snapshot/child image until there is first a useful “parent/base image”/“backing file” image that already has data that is desirable to preserve and/or share to multiple other disk images. Therefore, first-time users making an initial virtual machine will want to be making a “parent/base image”/“backing file” image.

If it is desired to make a snapshot/child image, be sure to choose a format that implements the snapshot/child image feature. The formats listed below will mention support for implementing that feature (and, if instructions are available, the section will also discuss how to use that feature by making a snapshot/child image), if the format is known to support that feature. (Note: Some of these formats may support the feature, but the info may just not be known yet. This guide will require further testing to fully determine which formats use the feature.) Some formats may be known to not commonly support such a feature: some definite examples would be the popular RAW disk image format and *.iso (ISO9660) optical disc images.

Compatibility

Some software (e.g. virtual machine software) may support some format(s) better than other format(s). Like many data file formats that describe an identical type of data as other file formats, software incompatibility is often NOT a show-stopping issue if the formats can be sufficiently converted. Disk image formats can often be converted successfully. However, compatibility issues could arise simply from one piece of software not supporting an image file format as well as other formats. For details, be sure to check all of the relevant documentation with the virtual machine software. The section on Disk image features may also be helpful.

Determine how big the file will be

The file, of course, ideally should be large enough to store the data that is, and will be, needed. So, this will depend on needs. As an example, when virtual machines are being used, the guide on Creating virtual machines: section about determining how much hard drive space to use may have some more info.

[#lcimprst]: Determine location where the file will be stored

Some hints, if the image is going to be used for a virtual machine, may be included in the guide on Creating virtual machines: section about determining a location for the image.

Create the image

Details on how this may be done, using a particular image format, could be in the section about disk image file formats.

[#dskimcnv]: Converting disk images
Converting to an entirely different image file format

This may often be done with software that implements virtual machines. (In at least some cases, a virtual machine software package may include this support using a different executable file than the program that allows a user to start up a virtual machine.)

Some software that supports multiple formats may be listed here, in this section. (If there is other, more specialized software that may work with fewer conversions, such software might be included in the relevant portions of the section describing the various disk image file formats.)

qemu-img

A program useful for converting images of multiple types is the command qemu-img which does come with Qemu. Some people have installed Qemu solely to use this command.

(Note that some versions of this tool may work better with some formats than other versions of the same tool. Using a different version of the tool may be helpful. Since software should, at least in theory, be easily locatable, the following hint should be less useful: If an older version of the tool for one operating system is not readily located, but another operating system is readily accessible, one may wish to check if that other operating system has a Qemu version readily available. The version of the conversion tool might be different. Despite whether or not this hint should be useful, the reality has been that this approach has sometimes helped make a conversion (involving qcow (version 1)) happen successfully.)

virt-v2v
Both the Xen to KVM Migration Wiki and the RedHat Documentation on migrating virtualization data discuss a command called virt-v2v. Multiple web pages about this software seem to have a tie to RedHat.
Changing image properties
e.g.: compression of disk images
[#modskimg]: Modifying contents of a disk image

Of course, one option may be to load an image in a virtual machine and have the virtual machine modify the image as if it was a real disk. Another option, which often might not be a good option for hard drive images (but might be more convenient for a format like an ISO 9660 image where the filesystem would get created at the same time as making the new image), might be to extract the contents of a disk image, and then make a new image. These are, however, often not the most pleasant methods when more specialized tools may offer slicker options.

Other options may include:

  • TOOGAM's Archiver page has some DOS software for manipulating images of FAT-based floppy disks (in the section called “Disk image manipulating”).
  • Imgtool
[#dskifetr]: Disk format features
Software support
An aspect to note, when looking at disk images and various features of a specific way of storing data in a disk image, is whether the image is expected to be supported by the software that is intended to be used.
[#mkchdimg]: Compression of disk images

Some people may think this idea is just as terrible as the idea of using “disk compression” software to try to perform data compression on a hard drive's contents. That is likely true, but see CHD myths to read about some of the misconceptions about just how bad either of those ideas are. (The theory isn't nearly as bad as many people think, and in fact this technology is deployed much more widely than what many of those same people think.)

Naturally, a disk image is data. Although truly randomized data isn't effectively compressible, most interesting data is structured well enough that some level of lossless compressibility may be achievable. Like other data which is compressible, such data is probably compressible using general file compression. (More details about general data compression may be in a guide to compressing data, and TOOGAM's Archiver page.)

However, one noteworthy aspect to compressing images is to determine whether specialized data compression may be available and beneficial. Instead of just relying on a compression technique which is generalized for various types of data, determine whether specialized (instead of rather general) techniques may be usable. Using specialized compression may offer options that result in smaller disk images and/or faster disk images, and at least some of the virtual machine software options may directly support such specialized compression methods natively, allowing the compressed image to be used.

In fact, there is at least one technique that may be fairly widely implemented and which may very frequently have an extremely substantial impact. That technique is for software to recognize “holes” and implement support for them without taking up large amounts of space. This can frequently offer substantial reductions in the disk space needed for an image. Whether this is available may depend on the virtual machine software, the image format (a disk image of “RAW” bytes has no such feature in the format itself, although a host file system that supports sparse files may still avoid storing the holes), and possibly also how the data is stored on the virtual machine. (For example, if the data on the disk of the virtual machine is stored using an encrypted file system, the encryption may cause the holes to not be visibly usable, even by software that does support holes in other types of file systems.)
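As a rough illustration of what a “hole” looks like from the host's point of view, the following sketch creates a file that is entirely one hole and compares its apparent size with the space allocated for it. It assumes a Unix-like host whose file system supports sparse files and a dd that extends the output file when given count=0 together with seek (common, but not guaranteed everywhere). The file name and the 64 MB size are made up for the example.

```shell
#!/bin/sh
# Create a 64 MB file that consists entirely of a hole, then compare
# its apparent size with the space actually allocated for it.
workdir=$(mktemp -d)
img="$workdir/sparse.img"    # hypothetical file name

# count=0 writes no data; seek=131072 skips 131072 * 512 bytes (64 MB)
# into the output, so the file is extended without writing any sectors.
dd if=/dev/zero of="$img" bs=512 count=0 seek=131072 2>/dev/null

apparent_bytes=$(wc -c < "$img" | tr -d ' ')
allocated_kb=$(du -k "$img" | awk '{print $1}')

echo "apparent size: $apparent_bytes bytes"
echo "allocated:     $allocated_kb KB"

rm -rf "$workdir"
```

On a file system with sparse file support, the allocated figure is typically far smaller than the apparent size. On a file system without it, the two will be about the same.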

The ability to compress the disk image may vary, with a key factor being the format of the disk image. For example, a “RAW” disk image (simply storing the data in each sector) is a format with no support for disk compression. (Such an image may still be compressed, and may still be used by virtual machine software, if compression is used in a way that is transparent to the virtual machine software. The word “transparent” refers to the concept that the virtual machine software would not require any extra work/code/processes to support the data compression. For example, if the hard drive image was stored on a compressed disk... an idea that surely gives some experts shudders when they consider the performance impact of such a setup.)

A sensible time for compressing an image may be when there are thoughts of making a child image. After all, at that point there are probably not a lot of thoughts of changing the parent image, so the version of the data which would be the parent image seems to be some data that won't change frequently, and so may be compressed.

The disk image may (in practice, to make things easier overall) need to be taken offline, meaning that no active virtual machines are actively using the disk image, during the compression. This might not actually be an absolute requirement, although trying to work around it might not be reasonably easy.

Having information available during disk compression

It may be nice to run some other commands before running the compression. For example, if modifying an image in Qemu's qcow2 format, the program being used will be the qemu-img command. Using that as an example, running something like the following may be worthwhile.

df ; date ; time qemu-img (qemu-img_subcommand) (other_options_for_qemu-img)

(The above example simply shows a generalization, and is referencing qemu-img as an example command. Further specific details about using qemu-img to compress are covered in the section about converting/compressing an image with qemu-img.)

(These commands show the amount of disk space that was free before the compressed file started to be created, and shows the time when the disk compression started. These commands are being run under the assumption that the compression may use up a fairly high amount of disk space and/or time, although how true that assumption is will depend on the contents of the actual disk image. Progress on completing the compression may be loosely tracked by seeing how much disk space has been taken up by the new image, and how much time has passed so far. The natural assumption used here is that the compression is not likely to need to take up much more space than the uncompressed file. The other assumption is that the amount of disk space which is free is an amount which is not being altered much by other tasks. The latter assumption may be false, but it may help provide an ability to manually perform some time estimates that might be accurate if compression attempts result in insignificant file size savings. If the data written so far was heavily compressed, and so represents a larger portion of the size of the original file, then the compressing task may be done quicker than the estimate.)
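The manual estimate described above can be sketched as a small shell function. This is only an illustration of the arithmetic, under the same pessimistic assumption that the output may end up as large as the original. The function name and the figures are hypothetical.

```shell
#!/bin/sh
# estimate_remaining ORIGINAL_SIZE WRITTEN_SIZE ELAPSED_SECONDS
# Prints a rough number of seconds remaining, assuming the output
# will end up as large as the original (i.e. no space is saved).
# Both sizes must use the same unit.
estimate_remaining() {
    original=$1
    written=$2
    elapsed=$3
    if [ "$written" -le 0 ]; then
        # Nothing written yet, so there is no rate to extrapolate from.
        echo "unknown"
        return 0
    fi
    echo $(( elapsed * (original - written) / written ))
}

# Hypothetical figures: 1000 MB original, 250 MB written after 60 seconds.
estimate_remaining 1000 250 60
```

If the compression is effective, the real task should finish sooner than this figure suggests, which matches the reasoning above.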

[#ecmcmpcd]: Removing ECC data from optical disc images

This is a bit off-topic, since the compressed file is not known to work directly with virtual machines at the time of this writing, but it may still be worth mentioning to see if further interest and support may be drummed up. There is some GPL software called ECM which can compress disc images of (CD-ROM) optical discs. The compression is format-specific and does not involve the standard compression techniques that store bits in ways which cannot be compressed further. This means that after the image is losslessly compressed with ECM, it could still be losslessly re-compressed with other general-purpose compression software. The Error Code Modeler (“ECM”) web page shows some statistics of 15%-18.5% improvement compared to just using a standard compressor. (The theory is elaborated upon by How ECM works. Regarding the statistics on the site's main page, the second graphic, ecm2.gif, is a bit misleading because the non-Data areas are likely to be compressed at least somewhat.)

The only real problem with using ECM is that ECM-compressed images require the ECM software to decompress the images, because there isn't a lot of other software that supports the ECM images. (If the ECC data is very time consuming to produce on the fly (which could be figured out by uncompressing some ECM images), virtual machine software might be able to work around that fact by using an optimized optical drive which ignores the missing ECC data.) Since the software is GPL'ed, though, there isn't a hugely compelling reason why this couldn't be used on a more widespread level.

In times past, there was another noteworthy limit, very much related to the issue just mentioned (of there not being a lot of software that handles these files): There had been no indication on the web site that the source has been compiled for any platform other than Microsoft Windows, or that there are any alternate/derivative versions other than the port for Mac OS X.

Some newer information about that subject, though: ECM may now be a part of Neill Corlett's Command-Line Pack, for which source code is available, and which has versions for DOS (16-bit executables for 8086 compatibility, and 32-bit executables that require compatibility with an i386), Win32 (more specifically there are versions of this software for the following platforms of operating systems that run 32-bit Microsoft Windows code: 32-bit i386/x86, 32-bit Windows NT 3 or 4 for MIPS, and a 64-bit version for x64), and Mac OS X (32-bit PPC, 32-bit i386, and x64).

Compression of files in the QCOW(2)/QED formats may be covered by compressing images with qemu-img and/or using KVM's software.

[#childdsk]: Child images (a.k.a. “snapshot” images)

Does the software support features such as child/snapshot images?

Permanent path reference

At least some snapshot files (QCOW(2)) have been known to have some hard-coded paths for pointing to the parent. Those paths may be relative, but they are hard-coded. So, if the parent's location is flexible, make sure that it is in a most desired location which will be good for the long term. This could involve creating some nice symbolic links in the filesystem, and then making sure those symbolic links are referenced when creating the child/snapshot image. (Then, if the location does need to change later, it may simply be a matter of changing the symbolic links.)

The following is simply an example showing this being taken care of. (This isn't necessarily a recommended series of steps, but just an example. Customize before implementing as appropriate.)

[1]user@ttyp1:host:/somespot/cd /srv
[2]user@ttyp1:host:/srv/ls -laF
total ??
drwxr-xr-x   4 root  wheel   512 Jan  9 22:43 ./
drwxr-xr-x  17 root  wheel   512 Dec 20 15:00 ../
drwxr-xr-x   3 root  wheel  4096 Dec 29 01:31 bigspace/
[3]user@ttyp1:host:/srv/sudo ln -s ./bigspace/somepath/vmsys/ ./virtmach
[4]user@ttyp1:host:/srv/ls -laF
total ??
drwxr-xr-x   4 root  wheel   512 Jan  9 22:43 ./
drwxr-xr-x  17 root  wheel   512 Dec 20 15:00 ../
drwxr-xr-x   3 root  wheel  4096 Dec 29 01:31 bigspace/
lrwxr-xr-x   1 root  wheel    17 Jan  9 23:05 virtmach@ -> bigspace/somepath/vmsys
[5]user@ttyp1:host:/srv/cd virtmach/diskimgs/kiddisks/.
[6]user@ttyp1:host:/srv/virtmach/diskimgs/kiddisks/mkdir newvm
[7]user@ttyp1:host:/srv/virtmach/diskimgs/kiddisks/cd newvm
[8]user@ttyp1:host:/srv/virtmach/diskimgs/kiddisks/newvm/df -hi
(Output, might be useful later to help estimate progress of written data)
[9]user@ttyp1:host:/srv/virtmach/diskimgs/kiddisks/newvm/time qemu-img create -f qcow2 -b /srv/virtmach/baseimgs/openbsd/50/opnbsd50-16gb-cfged-compressed.qc2 ./mynewhdd.qc2

Of particular note: notice that the backing file is referenced with a full path that includes a symlink rather early in the path. Although a relative path involving “../../” would have resulted in a shorter command line, it would have forced the images to have the same relative path to one another. Using the longer absolute path, and involving a symlink high up in that path, was very intentional.

Hopefully that will suffice. If, by chance, the base image ever needs to move, perhaps the referenced directory may be replaced with a symlink that points to the new location.
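The effect of routing the path through a symlink can be tried out harmlessly in a scratch directory. The sketch below uses made-up directory names (disk_a, disk_b) and an empty stand-in file rather than a real image. It only shows that a path going through the symlink keeps working after the data moves and the symlink is re-pointed.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/disk_a/baseimgs" "$work/disk_b"
: > "$work/disk_a/baseimgs/base.img"    # empty stand-in for a base image

# The symlink plays the role of /srv/virtmach in the example above.
ln -s "$work/disk_a" "$work/virtmach"
stable_path="$work/virtmach/baseimgs/base.img"
[ -f "$stable_path" ] && before=reachable || before=missing

# Move the data to another location, then re-point the symlink.
mv "$work/disk_a/baseimgs" "$work/disk_b/baseimgs"
rm "$work/virtmach"
ln -s "$work/disk_b" "$work/virtmach"
[ -f "$stable_path" ] && after=reachable || after=missing

echo "before move: $before, after move: $after"
rm -rf "$work"
```

Any child image that recorded the path through the symlink would not need to be touched after such a move.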

The precise process to take will vary based on which one of the file formats often used for primary storage devices will be getting used. e.g. details are in the text about image files of primary storage devices: section about creating child images and, if relevant, Using software from Kernel-based Virtual Machine (“KVM”).

“Saved state” snapshots

Does the image store information beyond just the data on the virtualized disk? For example, are contents of main RAM and other device state data stored in a hard drive image file so that a system “saved state” snapshot of the whole system is effectively stored in the image file?

Warning: with multiple implementations, if system state data is stored in a disk image file, the information is intended to be embedded permanently. Upon having the data be restored, the extra contents that were added to the disk image are contents which are not erased, so they continue to use space in the main disk image.

[#vrtdskfm]: Disk image file formats (often used with virtualization software)

(This is largely a list of formats used for fixed disks. Formats for other media, such as optical discs, may routinely be supported by virtualization software. Details about such formats may be in a section about disk images.)

Some of these formats may often have common disk format features.

Raw format

Sometimes the name of this “raw” format is written in all capital letters, presumably so that “RAW” looks similar in format to the style used by the abbreviations that make up the names of some other formats. Another possible reason for the capitalization was to try to make it look like a filename, back in the days when most filenames in some popular (MS-DOS compatible) operating systems were most commonly written in upper-case.

This format is probably the most universally supported. In some cases, virtual machine software may be able to interact with this format more quickly than many or any other disk image formats.

For simple/early testing or situations involving limited disk space, using a native format (that supports growing, perhaps onto a larger disk that is anticipated to become available in the future) may be desirable. Raw files may use substantially more disk space. There are some other limitations that may exist with RAW files. They do not embed extra metadata to help support features such as parent/child snapshot imaging. Working with a native format may be simpler, and so in some cases the act of supporting a disk image may be simpler and easier when the image is in another format. These limitations, like using up higher amounts of disk space, may or may not make the RAW format unacceptable for some.

In Unix, the following may work to create a disk of 2,097,152 sectors of half a kilobyte each, which would be one gigabyte. (Half of a kilobyte is 512 bytes.)

dd if=/dev/zero of=/somedir/output.img bs=512 count=2097152

Although one gigabyte is half of the maximum size of partitions commonly used with MS-DOS and the first release of Windows 95, it may not be large enough to hold minimal installations of newer operating systems, much less trying to also store any added programs and data. That size of one gigabyte, relatively small by more modern standards, was simply an example used as a basis for convenience by those wanting to use easy math to use an exact number of gigabytes. (For a 10 GB image, multiply 10 by the shown example value used for the count of blocks.)
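The multiplication can be left to the shell. The following sketch uses a hypothetical helper name and prints the resulting dd command rather than running it, since actually writing many gigabytes of zeros takes time and disk space.

```shell
#!/bin/sh
# sectors_for_gb N: number of 512-byte sectors in N gigabytes,
# using the figure from the text (2097152 sectors per gigabyte).
sectors_for_gb() {
    echo $(( $1 * 2097152 ))
}

count=$(sectors_for_gb 10)
# Print the command instead of executing it.
echo "dd if=/dev/zero of=/somedir/output.img bs=512 count=$count"
```

For 10 gigabytes this gives a count of 20971520.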

Qemu formats: QCOW, QCOW2, QED
Warning of usage

Around the time of Qemu version 0.8.3, the qcow2 format (support for which had only recently been added) had some issues about corrupting data. It is believed that a post on the Qemu “devel” (development) mailing list commented that only the RAW format was considered to be stable enough for recommending for production use. (Unfortunately, an exact reference to the post may not be currently available.)

The older qcow format may have had fewer issues; however, it also seems to have had support removed by at least some versions/ports of the qemu-img software bundled with Qemu. There can be much to be gained by using a format that supports child imaging and dynamic growth. However, consider an excellent backup system to be even more critical if using these file formats.

Example of a mailing list post about qcow2 corruption where the poster states, “Beware of the new qcow2 disk format, several people have complained of it corrupting disk images on the qemu-devel mailing list”.

[#qmuimgmk]: Using Qemu's qemu-img

These instructions were initially written for QCOW and/or QCOW2. Version 0.14 of Qemu added support for the QED file format.

Follow the instructions from whichever following set(s) of instructions are most appropriate for the desired end goal.

[#qcwbasim]: Making a QCOW(2) “Parent/base image”/“backing file”
qemu-img create -f qcow2 basesys.qc2 16G

Then, a great idea is to make sure the newly created image went to the desired location. (For instance, if making this for a virtual machine, use the location that is mentioned in the documentation for the virtual machine.)

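As the qemu-img command takes a parameter of a filesize in various units, and (at least in the older versions documented when this was written) the smallest such unit is kilobytes, it is not possible with those versions to directly create an image with an odd number of half-kilobyte sectors. (Newer versions of qemu-img document the size as a number of bytes when no suffix is given, which would avoid this limit.) However, if such an image is created in another supported format (such as raw), such an image may be successfully converted by qemu-img into a valid image of the desired size.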
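The raw-then-convert route can be sketched as follows. The example makes a raw image of 2049 sectors (an odd count, chosen arbitrarily) with dd, and only attempts the conversion if qemu-img happens to be installed. The file names are made up, and the convert invocation shown is the plain form without compression or a backing file.

```shell
#!/bin/sh
work=$(mktemp -d)
raw="$work/odd.img"    # hypothetical file name

# 2049 sectors of 512 bytes = 1049088 bytes, i.e. 1024.5 kilobytes.
dd if=/dev/zero of="$raw" bs=512 count=2049 2>/dev/null
raw_bytes=$(wc -c < "$raw" | tr -d ' ')
echo "raw image: $raw_bytes bytes"

# Only attempt the conversion when qemu-img is actually available.
if command -v qemu-img >/dev/null 2>&1; then
    qemu-img convert -f raw -O qcow2 "$raw" "$work/odd.qc2"
    qemu-img info "$work/odd.qc2"
fi
rm -rf "$work"
```

Checking the virtual size reported by qemu-img info is a reasonable way to confirm that the converted image kept the intended size.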

[#qcwmkkid]: Making a QCOW(2) snapshot/child image

See some requirements mentioned in the general section about making hard drive images.

Creating a snapshot/child image requires usage of an existing “parent/base image”/“backing file”. Do not try to use a snapshot/child image until there is an existing “parent/base image”/“backing file”.

The precise details for creating a snapshot/child QCOW(2) image depend on which version of qemu-img is being used.

The format for creating a snapshot/child image is similar to creating a new “parent/base image”/“backing file”. However, the size of the output file is not required because that is detected from the “parent/base image”/“backing file” being used. Using the “ -f format-of-new-file” parameter is required to specify the format of the new image. (Specifying the format of the “parent/base image”/“backing file” has historically not been required, but doesn't hurt, and newer releases of qemu-img are known to insist on it. After all, any file at all could be correctly viewed by auto-detection as potentially being a RAW disk image, whether or not using such a disk image would make much sense.)

With some newer versions, the Qemu documentation about using qemu-img is showing a new syntax (in effect by qemu 0.12.0 RC2 documentation, if not earlier). A newer, still untested example would be:

qemu-img create -f qcow2 -o backing_file=base_file output_file

To use backing_fmt in addition to backing_file, note that -o accepts a comma-separated list of options, e.g. “ -o backing_file=base_file,backing_fmt=qcow2 ”.
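As a sketch of the comma-separated form of -o, the helper below only prints a command line so that it can be inspected before being run by hand. The helper name and the file names are hypothetical. The option names backing_file and backing_fmt are the ones used by qemu-img for qcow2 images, but checking the installed version's documentation is still advised.

```shell
#!/bin/sh
# child_create_cmd BASE_FILE BASE_FORMAT OUTPUT_FILE
# Prints (does not run) a qemu-img command that would create a qcow2
# child image, naming both the backing file and the backing file's format.
child_create_cmd() {
    echo "qemu-img create -f qcow2 -o backing_file=$1,backing_fmt=$2 $3"
}

# Hypothetical paths, following the symlink-based layout used earlier.
child_create_cmd /srv/virtmach/baseimgs/base.qc2 qcow2 ./mynewhdd.qc2
```

Printing first and running second is a deliberate choice here, since a mistyped backing file path gets recorded inside the new image.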

In some old versions of Qemu (before version 0.10), the syntax was documented to be:

qemu-img create -f qcow2 -b filename_of_base_file output_file
[#qemcmpim]: Converting/compressing an image with qemu-img

Check the manual page for qemu-img that is installed on the system. Or, run “ qemu-img --help | less ”. If the help/manual shows the convert command is supported, then using the older syntax might work better (for now, until more details are provided here about the newer syntax).

Verified qemu-img syntax

The following syntax is what has been found to work well with some versions of Qemu (particularly versions before version 0.10). This may not exactly match the documentation that comes with the program, but is what was found to actually end up working (avoiding an error message about a syntax error occurring):

qemu-img convert -c -p -f format_of_base_file -O output_format_of_new_file -o backing_file=parent_image filename_for_base_file filename_for_new_file
Details about -p

It is believed that this option, which causes a progress bar to be output, is newer than some of the other options. If using older versions of the software, and if this seems to cause problems, then leave off the -p option.

Details about specifying a backing file

For a hard drive image that is not a child image, just leave off the whole “ -o backing_file=parent_image ” part.

For a child image, do specify the backing file that the newly created output file may be using. Like when creating a child image, specifying the file's full path is likely a good idea. To see the path of a parent image of an existing file, use:

qemu-img info -f qcow2 filename_for_base_file

(For many disk formats, the image's format/type can be autodetected, so specifying the format with “-f disktype” is optional.)

Warning: The syntax shown in this documentation did not seem to be sufficiently compressing new data in a child image. The result could be that the output file is actually substantially larger. In that case, the disk savings may not be successfully achieved from running these commands. (If there was another reason for creating an image, such as wanting a copy, then that additional image could be created using the copy command. That may result in another working image that is much smaller than what gets created when trying to save space by using this command to compress the image.)

Because of the issue that is discussed, checking the size of the output file is recommended for all compressed images that are based on a child image.
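The recommended size check is easy to script. The following sketch uses hypothetical helper names and, for demonstration, two zero-filled stand-in files whose sizes loosely follow the made-up 30MB and 260MB figures used in this section (scaled down to kilobytes). They are not real disk images.

```shell
#!/bin/sh
# file_bytes FILE: size of FILE in bytes.
file_bytes() {
    wc -c < "$1" | tr -d ' '
}

# compare_sizes ORIGINAL COMPRESSED: report whether the second file
# really is smaller than the first.
compare_sizes() {
    before=$(file_bytes "$1")
    after=$(file_bytes "$2")
    if [ "$after" -lt "$before" ]; then
        echo "smaller: $before -> $after bytes"
    else
        echo "NOT smaller: $before -> $after bytes"
    fi
}

# Stand-in files: a 30 KB "child" and a 260 KB "compressed child".
work=$(mktemp -d)
dd if=/dev/zero of="$work/child.qc2" bs=1024 count=30 2>/dev/null
dd if=/dev/zero of="$work/child-compressed.qc2" bs=1024 count=260 2>/dev/null
result=$(compare_sizes "$work/child.qc2" "$work/child-compressed.qc2")
echo "$result"
rm -rf "$work"
```

With real images, the same comparison would be run on the child image and on the output of the compression before deciding which file to keep.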

Using some made-up numbers to describe how this seems to be working: if a parent image is about 900MB, and then it compresses to 250MB, and then a child image is created, and the child image is about 30MB, and the child image is compressed, then the resulting compressed child image might be about 260MB (which is the 250MB plus a compressed version of 30MB). This happens even if the backing file is specified. (It looks like the backing file is then required for the compressed child image, even though it wasn't very usefully used.)

Here is an example sequence of commands to run to create the compressed image:

df -hi
date
time qemu-img convert -c -p -f qcow2 -O qcow2 -o backing_file=/somepath/parent_image.qc2 opnbsd47_base_16gb_big.qc2 \
opnbsd47_base_16gb_compressed.qc2

This alternate syntax was used even though it was clearly known that the man page said to put -O after the first filename. Despite the text in the man page, placing -O (and its subsequent option) before the first filename actually caused qemu-img to work (instead of fail).

For completeness, an example that is consistent with the documented syntax (even if it didn't seem to work as well) would be the following:

qemu-img convert -c -f format_of_base_file filename_for_base_file -O output_format_of_new_file filename_for_new_file

(Previously, there was a note here suggesting that perhaps pv could be used when reading the original image, as a way to try to show a progress report. The page about copy progress in Unix might have had some useful details. However, before getting that to work, reading documentation (which may have been updated due to a later version of the qemu-img program) revealed that qemu-img now supports a -p parameter.)

Using newer syntax

These are some older notes which may not be accurate, and so may need to be removed.

The Qemu documentation about using qemu-img is showing the syntax. It might be:

qemu-img -f format_of_new_file -o backing_file=filename_of_older_file filename_of_new_file

(At the time of this writing, this is still untested by the author of this text. Be sure to verify the impact before making assumptions. For instance, before appreciating saved disk space and removing the image, make sure that this doesn't create a snapshot that requires access to the original image.)

Once the disk compression is started, there may be a lengthy time for the compressing to complete. Somewhat typically, this timeframe might be several minutes or perhaps more like half an hour. (This is for a typical operating system installation: More data may take notably longer, as could other factors such as an overburdened system running on slow hardware.) Of course, this will be dependent on some things. Hardware will likely speed up over time, so that estimate may come to seem excessively long. However, some operating systems may also grow substantially over time, and that may offset some of the speed increase provided by faster hardware.

Compression effectiveness

The compression seems to be fairly good. To provide some specific examples: A blank (uncompressed) 16GB disk image was 197,120 bytes. After installing OpenBSD/amd64 5.3 to a disk image, the image was 913,833,984 bytes. Using qemu-img reduced the file to 260,816,896 bytes. Using 7-Zip to optimally compress that compressed file reduced it further to 255,186,777 bytes. Using 7-Zip to compress the original 913,833,984 byte hard drive image ended up with a zip file that was 223,886,905 bytes. So compressing with qemu-img is only a tad bit over 85% as efficient as Zip (the zip file is about 85.8% of the size of the qemu-img output), and the file became about 12.3% less compressible afterwards. One benefit is that qemu-img was notably faster (though at the cost of effectiveness). The huge benefit to using qemu-img is that the compressed file is readily usable (including being updatable, and probably can be read from non-sequentially faster). However, the qemu-img program's compression may well be entirely inferior if the contents of a zip file can be accessed like a file (similar to Microsoft Windows's abilities when using the GUI -- no solution for this for Unix has been explored at the time of this writing) and if the file is not going to be changed (which may be the case for a parent image).
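The percentages quoted above can be re-derived from the byte counts. The sketch below uses a hypothetical helper and awk for the division. The four sizes are the ones given in the text, and the variable names are made up for the example.

```shell
#!/bin/sh
# Force the C locale so awk prints a period as the decimal separator.
LC_ALL=C
export LC_ALL

# percent PART WHOLE: PART as a percentage of WHOLE, to one decimal place.
percent() {
    awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.1f", 100 * part / whole }'
}

uncompressed=913833984          # image after installing the OS
qemu_img=260816896              # after compressing with qemu-img
zip_of_uncompressed=223886905   # 7-Zip applied to the uncompressed image
zip_of_qemu_img=255186777       # 7-Zip applied to the qemu-img output

qemu_vs_original=$(percent "$qemu_img" "$uncompressed")
zip_vs_qemu=$(percent "$zip_of_uncompressed" "$qemu_img")
extra=$(( zip_of_qemu_img - zip_of_uncompressed ))
penalty=$(percent "$extra" "$zip_of_qemu_img")

echo "qemu-img output is $qemu_vs_original% of the uncompressed image"
echo "zip of the uncompressed image is $zip_vs_qemu% of the qemu-img output"
echo "zipping after qemu-img costs $penalty% compared with zipping directly"
```

The second and third figures correspond to the “a tad bit over 85%” and “about 12.3%” statements.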

The compressed file will contain all of the data accessible from the uncompressed image. If the uncompressed image being specified is a child/snapshot image, the compressed image will contain (a compressed copy of) all of the data from the parent/“backing file” image. Therefore, if a parent image is being kept, then compressing a child image may result in a much larger image than just leaving the child image uncompressed (and instead retaining the space-saving benefits from utilizing a child image). (Perhaps this would be different if using “ -o backing_file=nameOfParentImage”? If so, then qemu-img is just being silly, because qemu-img is clearly capable of finding the parent image itself, as shown when running “ qemu-img info nameOfChildImage ”.)
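Before compressing an image, it may help to check whether it is a child image at all. The following Python sketch assumes that qemu-img is on the PATH and that the installed version supports “ --output=json ” for the info subcommand, in which case a child image's report includes a “backing-filename” entry. The function names and the sample text are made up for illustration.

```python
import json
import subprocess


def backing_file_from_info(info_json):
    """Return the backing file named in `qemu-img info --output=json` text, or None."""
    info = json.loads(info_json)
    return info.get("backing-filename")


def query_backing_file(image_path):
    """Run qemu-img (assumed to be on the PATH) and report the image's backing file."""
    result = subprocess.run(
        ["qemu-img", "info", "--output=json", image_path],
        check=True, capture_output=True, text=True,
    )
    return backing_file_from_info(result.stdout)


# Hand-written sample resembling qemu-img's JSON report for a child image.
sample = '{"filename": "child.qcow2", "format": "qcow2", "backing-filename": "parent.qcow2"}'
print(backing_file_from_info(sample))
```

If query_backing_file returns a name rather than None, the warning above applies: the compressed copy is likely to include the parent's data.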

[#kvmimgmk]: Using software from Kernel-based Virtual Machine (“KVM”)
The presence of Man page for kvm-img suggests that there is a command called kvm-img, but the man page's contents refer to the documented command as being called qemu-img, so kvm-img probably acts the same as qemu-img. This would be no big surprise to those familiar with KVM and Qemu, since KVM is based on Qemu. So, see the section on making new disk images with qemu-img.
Converting/compressing an image in “Kernel Virtual Machine” (“KVM”)
As noted above, kvm-img probably acts the same as qemu-img, since KVM is based on Qemu. (For details on using qemu-img to convert/compress a file, see the section for Qemu.)
[#vmdk]: VMDK
See: File format: VMDK files for basic information. (Usage examples may be added here, but are not currently provided by this documentation.)
[#vhd]: Virtual Hard Disk (“VHD”)
Specifications, free licensing

VHD file format says “As of Tuesday, October 17th 2006, Microsoft is providing access to the VHD Image Format Specification Document as a part of the Open Specification Promise (OSP).” Wikipedia's page on the VHD file format says “The format was created by Connectix”.

Additional name

Qemu and its related qemu-img program (and similar derived programs: KVM and kvm-img) may refer to this format as “vpc”. Qemu documentation section 3.6.4: qemu-img Invocation describes the “vpc” format as “VirtualPC compatible image format (VHD).” (It is not clear why this software refers to the format as vpc instead of VHD.)

Supporting software
Sysinternals Disk2vhd, ...
Format options/abilities
“Parent/base image”/“backing file”
Initial size of disk
MS KB 825092: hard disk image types that are available when you use the Virtual Disk Wizard notes that Connectix Virtual PC for Windows (versions 4.0 - 4.3 and 5.0 - 5.2) supports a dynamically expanding disk image and a fixed-size disk image. Which option to use is something that may be selected by the “Virtual Disk Wizard”.
Child/snapshot image

It is likely that “parent/base image”/“backing file” and snapshot/child images are supported, since the VHD Image Format Specification info page (at TechNet) does refer to “Implementing a Differencing Hard Disk”.

Making a child/snapshot image by using *.vud “undo disk” support in Microsoft Virtual PC 2007

Some of this may still need to be thoroughly tested and this process should be compared to other virtualization software by Connectix/Microsoft.

Petri IT KB's guide to Virtual PC disks also identifies *.vud “undo disks” which can be committed to the disk image that they are based off of. To get a valid and stable *.vud file, Petri's instructions involve writing to the hard drive of the “Parent/base image”/“backing file”: This is necessary to create the “Undo Disk”. However, once the “Undo Disk” is created, then the “Parent/base image”/“backing file” needs to not be modified.

This basically means there is just one opportunity to create an “Undo Disk”. (Once created, that undo disk may be copied, and so it is recommended to copy that file before using it. In the future, a new snapshot/child image may be created by simply copying the initially created “Undo Disk”.)

The following instructions involve using the graphical program.

  • Enable “Undo Disks” support in the settings for the “virtual machine configuration” (which is stored in a *.vmc file.)
  • Run the virtual machine and close the virtual machine. This can be done by telling the virtual machine software to close the virtual machine (by pressing the “Close” button, the “X” button in the upper-right corner.) This can probably also be done by having the virtual machine turn itself off, such as having it “Shut down” or “Turn Off” with MS Windows, at least if the virtual machine is configured to support this. If so, closing the virtual machine that way is probably more likely to result in a cleaner state for the file system when the “Undo disk” is created. That would be preferred so that disk checking/repairing utilities aren't needing to be run every time a new snapshot/child image starts being used.
  • When the virtual machine closes, a “Close” dialog box presents multiple options. In the drop down box, make sure “Turn off and save changes” is selected. Also, ensure that “Commit changes to the virtual hard disk” is unchecked.

Following the above steps results in a *.vud file. Petri's guide sensibly recommends taking the following steps:

  • Make sure that the “Parent/base image”/“backing file” is never written to again. (This may involve setting some sort of filesystem attributes/permissions.)
  • Petri's guide recommends disabling Undo Disk support in the “Parent/base image”/“backing file”. (It is not particularly understood why this is done considering the next step noted here. Perhaps a change is somehow made to the “Parent/base image”/“backing file”?)
  • Since the “Parent/base image”/“backing file” may never be written to again, chances are that it should not ever be used in a virtual machine again (because most hard drive images write to themselves when they are booted). Therefore, remove from the virtual machine console (graphical interface) any virtual machines which require this hard drive image.

One more wise step to take: BACK UP THE UNDO DISK FILE! It is a *.vud file created in the same location where the *.vmc file is located. (There might be a default location for this?) By backing up that undo file before making any further changes, that file may be used to create further virtual machines at a later time. Do so before making *any* changes, because changes are likely to increase the size of the “undo file” (resulting in more disk space being used and more time being needed anytime that file is handled, such as when it is copied). Save a copy of that file. It may also be desirable to rename the copy of the file from *.vud to *.vhd. The copy should also remain unaltered so that it may most effectively be used in the future. So, changing the filesystem's attributes/permissions for the copy of the “undo disk” may be wise.

Finally, copy the saved copy of the “undo disk” to a file that will be used and written to. Make sure the destination filename ends with *.vhd. Have the virtual machine software use that existing *.vhd file as a hard drive image. Make sure that virtual machine configuration has all the desired options: for example, consider whether it is desired to enable “undo disk” support.

[#vhdcmpim]: Compacted VHD (disk) image

Note: At the time of this writing, a suitable test machine was not convenient. So, this information is currently being provided untested.

Here are some directions for compacting a disk image. Currently, it is not very clear whether this is performing any actual compression of data, or if it is performing a far simpler process like eliminating holes. (Eliminating holes can be a great thing to do, and saves some space in an image, but doesn't do much as far as actually compressing any data that is being stored on the disk.)

Checks/preparation/optimization
Disk format

The actual “compact” process might only be available for certain types of drives. In at least some version of Hyper-V, it was known to only be available for the types of drives called “Dynamically expanding virtual hard disk” or, perhaps, “Differencing” disks. (The reason for the weasel word, “might”, is simply that Hyper-V has had substantial design modifications with new versions of the Microsoft Windows Server operating systems. So, it seems conceivable that some new disk formats might become released, and older limitations might end up being removed.)
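One way to check which type of drive an image is, before attempting to compact it, is to read the footer that the VHD Image Format Specification places in the last 512 bytes of the file. The following Python sketch assumes that footer layout (a “conectix” cookie at the start and a big-endian disk type field at byte offset 60) and applies the limitation described above. The function names are made up for illustration, and very old images with a shorter footer are not handled.

```python
import struct

# Disk type codes from the footer described in the VHD Image Format Specification.
DISK_TYPES = {2: "fixed", 3: "dynamic", 4: "differencing"}


def vhd_disk_type(path):
    """Read the 512-byte footer at the end of a VHD file and name its disk type."""
    with open(path, "rb") as image:
        image.seek(-512, 2)           # the footer is assumed to be the last 512 bytes
        footer = image.read(512)
    if footer[:8] != b"conectix":     # footer cookie
        raise ValueError("no VHD footer found")
    (type_code,) = struct.unpack(">I", footer[60:64])  # big-endian 32-bit field
    return DISK_TYPES.get(type_code, f"unknown ({type_code})")


def can_probably_compact(path):
    """Apply the limitation described above: only dynamic or differencing disks."""
    return vhd_disk_type(path) in ("dynamic", "differencing")
```

A result of “fixed” would suggest that the compact option is unlikely to be offered for that image, at least in the versions of Hyper-V that have this limitation.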

Disabling Shadow Copies

Naturally, backing up a drive may be wise to do before deleting any data (including Shadow Copy data).

SpiceWorks Community post: “How to Compact Hyper-V VHDs” indicates that Shadow Copies should be disabled. (Directions are provided.) Details, describing why this is desired, were not provided.

Cleaning

Cleaning up known unused areas of a disk may be beneficial. The “Secure Delete” (“SDelete”) program by (Microsoft) Sysinternals can help with that. The program may need to be downloaded from Microsoft: SDelete. If the drive is mounted to D:, then run:

SDelete -c d:

Do that for each drive letter that is related to a partition stored on the drive that is going to be compacted.

This is totally speculation: this might be good to do before defragging. The reasoning: perhaps SDelete -c can more thoroughly identify some unnecessary data to remove before part of that data might be overwritten by a major process such as Defrag? (Or maybe not? Maybe in all cases it just looks for where there are unused bytes, and efficiently zeroes out absolutely all unused bytes?)

Defrag

First analyze the drive, in order to get an indication of how much defragging may be needed. See: defragmenting.

Re-cleaning
Speculation: SDelete -c might provide additional effectiveness if it is run again after a defragmentation?
Performing the compacting

A full set of tested directions is not yet available here, but here are some pointers that are expected to be very helpful.

Compact VHD Sysinternals

Virtuatopia guide that mentions compacting (via GUI)

Virtual PC Guy article showing how compacting may be scripted

This uses WMI, so using WMIC might be possible. However, a brief review indicates the WMIC command line will likely be fairly complicated, as the VBS version of the script code is storing results in objects. Doing the equivalent with WMIC may require passing return results as parameters for other function calls? (Complexity is anticipated.) Furthermore, a review indicates that perhaps there should basically be 3 WMIC commands: one to initiate the process, one to check the status of the process (to see if it is complete), and perhaps a different one to get a PercentComplete report (under the assumption that it is not done yet).
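The three-step pattern described above (initiate, check status, report progress) can be sketched generically. The following Python example does not talk to WMI at all: FakeCompactJob is a made-up stand-in for a real job object, and wait_for_job is a hypothetical helper that only illustrates how the three calls might fit together.

```python
import time


def wait_for_job(start, is_finished, percent_complete, poll_seconds=1.0):
    """Start a job, then poll until it finishes, collecting progress reports."""
    job = start()                                # step 1: initiate the process
    progress = []
    while not is_finished(job):                  # step 2: check the status
        progress.append(percent_complete(job))   # step 3: report progress so far
        time.sleep(poll_seconds)
    return progress


class FakeCompactJob:
    """Stand-in for a real job object; it reports finished on the fourth check."""

    def __init__(self):
        self.polls = 0

    def finished(self):
        self.polls += 1
        return self.polls > 3

    def percent(self):
        return min(100, self.polls * 30)


reports = wait_for_job(FakeCompactJob, FakeCompactJob.finished,
                       FakeCompactJob.percent, poll_seconds=0)
print(reports)
```

A real implementation would replace the three callables with whatever commands turn out to start the compaction, query its state, and read its PercentComplete value.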

Follow-up

If Shadow Copies were disabled, remember to re-enable them.

VHDX

Taylor Brown's Blog: Expanding Differencing Virtual Hard Disks noted, “With the release of Windows Server 2012 we introduced a new virtual hard disk format known as VHDX. This new format provided the foundation for a number of features and performance enhancements. One such feature is the ability to independently expand differencing virtual hard disks without impacting the base or parent hard disk or any siblings.”

CHD (“Compressed Hunks of Data”)
MAME may supply chdman. MESS also uses this. There have been multiple versions of the CHD format (as noted by MAME 0.130u1 whatsnew). Unlike several of the other hard drive formats listed here, the CHD format is also commonly used for CD images. See: Guide to backing up hard drives; MESS “how to” (episode II: Apple Macintosh Plus) (information at etabeta's playground).
OVF (Open Virtualization Format)
Specifications
...
Info

Wikipedia's entry on the Open Virtualization Format has some information.

IBM's Open-OVF project on SourceForge is sparse, but an announcement about Open-OVF refers to OVF Proposal from a Xen summit and a (no longer existent location for a) subsection of the (still existent) source code repository for Open-OVF.

Supported
Supported by VirtualBox, VMware, and more. Forum post about whether Qemu supports OVF indicates Qemu does not, but also points out that “virt-v2v supports ovf just fine”. Blog on OVF discusses using virt-convert.
OVA Package
Wikipedia's entry on the Open Virtualization Format identifies an OVA package as “a TAR file with the OVF directory inside.”
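Since an OVA package is described as a TAR file, its contents can be inspected with ordinary TAR tools. The following Python sketch uses the standard tarfile module. The file names in the demonstration archive are made up, and a real OVA would be opened from disk instead of being built in memory.

```python
import io
import tarfile


def list_ova_members(ova_file):
    """Return the names of the files stored inside an OVA (TAR) archive."""
    with tarfile.open(fileobj=ova_file, mode="r:") as archive:
        return archive.getnames()


def build_demo_ova():
    """Build a tiny in-memory TAR holding made-up OVF package files."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, data in (("demo.ovf", b"<Envelope/>"), ("demo-disk1.vmdk", b"\0" * 16)):
            member = tarfile.TarInfo(name)
            member.size = len(data)
            archive.addfile(member, io.BytesIO(data))
    buffer.seek(0)
    return buffer


print(list_ova_members(build_demo_ova()))
```

For a real package, something like “ with open(nameOfOvaFile, "rb") as f: print(list_ova_members(f)) ” should show the *.ovf descriptor alongside the disk images it refers to.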
Windows Image (*.wim)

This section does not (yet?) have a clear and concise how-to. Here are some pointers: See: Windows Image (WIM) File System Filter (Wimfltr.sys), and possibly Documentation about ImageX, TechNet documentation: How ImageX works, TechNet: Mount and Modify an Image

[#mkbtimg]: Creating images of existing (bootable) media

The virtualization software may be able to use a drive in the “host machine” (which is the machine running the virtualization software). However, with removable media, this means that whenever the software in the virtual machine needs a certain disc in the drive, there are two drives that need to have that disc: The virtual machine's drive and also the drive in the host machine. If disk space is plentiful on the host machine, it may be more convenient to copy any removable media to one or more files on the host machine.

Various software can perform this. See: making a data image of an optical disc.

Creating an image in Unix

Identify the device that has the removable media. If it is a CD, one item that may help identify the device to read from is to run: ls -l /dev/cd*

One may be able to capture a disk using dd. For many drives, a proper block size to use is 512 bytes. However, for CD drives, a value of 2048 may work better. An example command line for use in OpenBSD follows:

dd if=/dev/cd0c of=/srv/bigspace/opsys/openbsd/latest/opnbsdCD.iso bs=2048

(In the above example, it may make more sense to replace the text “CD” with a reference to the version number of the operating system being used.)
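For systems where dd is not convenient, the same block-by-block copy can be approximated in a few lines of Python. This is only a sketch of what the dd command line above does: the function name is made up, and the default block size of 2048 simply mirrors the value suggested above for CD drives.

```python
def copy_in_blocks(source_path, image_path, block_size=2048):
    """Copy source to image one block at a time; return the number of bytes copied."""
    copied = 0
    with open(source_path, "rb") as source, open(image_path, "wb") as image:
        while True:
            block = source.read(block_size)
            if not block:          # an empty read means the end of the media
                break
            image.write(block)
            copied += len(block)
    return copied
```

On OpenBSD, a call such as “ copy_in_blocks("/dev/cd0c", "opnbsdCD.iso") ” would be the rough equivalent of the dd example, assuming the user has permission to read the device.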

Some other software may be able to create an image from either removable media connected to the computer or from a collection of files. Examples include readcd from cdrtools/cdrecord, or perhaps cdparanoia, cdio, or the dvd+rw-tools package's growisofs command.

Another option may be to make a new disc image using software designed to create disc images from files. This could end up with a disc image that doesn't match an original disc, but may work suitably well. Examples of such software may include mkisofs and mkhybrid. (The mkhybrid program has been known to be included with some operating systems, including OpenBSD, which includes mkhybrid (but not mkisofs) in the default operating system, with no need to add additional packages.) Such code included with an operating system may be a simplified and/or older version of software, even with a new release of the operating system. (Newer code might be available by downloading a package containing a newer version.) Some details that may help: Man page for OpenBSD's mkhybrid.

Copying from a drive with physical problems

There are some reports that the dd command's command line options of conv=noerror,sync may help to copy a drive with physical problems. (Examples of such reports: DD_Rescue recovery scenario, MacWorld article on recovering a dead hard drive using dd.)
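The idea behind conv=noerror,sync is that a failed read does not stop the copy (noerror) and that every block written is padded with zero bytes to the full block size (sync), so later data stays at its original offset. The following Python sketch imitates that behavior. It is not how dd is implemented: rescue_copy and flaky_reader are made-up names, and the “drive” here is simulated in memory.

```python
def rescue_copy(read_block, write, total_size, block_size=512):
    """Copy block by block, zero-filling unreadable blocks; return the bad offsets."""
    bad_offsets = []
    for offset in range(0, total_size, block_size):
        try:
            block = read_block(offset, min(block_size, total_size - offset))
        except OSError:
            block = b""                        # noerror: keep going after a failed read
            bad_offsets.append(offset)
        write(block.ljust(block_size, b"\0"))  # sync: pad every block to full size
    return bad_offsets


def flaky_reader(data, bad_offset):
    """Made-up stand-in for a failing drive: reading at bad_offset raises OSError."""
    def read_block(offset, size):
        if offset == bad_offset:
            raise OSError("simulated read error")
        return data[offset:offset + size]
    return read_block


recovered = bytearray()
bad = rescue_copy(flaky_reader(b"A" * 1024 + b"B" * 512, 512), recovered.extend, 1536)
print(bad, len(recovered))
```

The list of bad offsets shows which parts of the resulting image are zero filler rather than recovered data.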

dd_rescue is similar to, but not identical to, dd. Some of the differences may be particularly helpful if dealing with source media with physical hardware problems. (Hopefully the destination media being used doesn't have known physical hardware problems.) There may be a program called dd_rhelp to serve as a wrapper. (Brief review at: dd_rhelp guide.)

GNU ddrescue may also be an option.

Other specialized tools

These may or may not be particularly useful options. (Further research may be required to properly assess how useful some of this software is.)

Ubuntu's documentation on JeOS and ubuntu-vm-builder describes some software. Documentation at https://help.ubuntu.com/9.04/serverguide/C/jeos-and-vmbuilder.html (from Ubuntu 9.04 documentation) has been removed; an updated version has existed in the /11.04 subdirectory. JeOSVMBuilder may be a roughly equivalent page, at a URL that may stick around longer.

Peter Eriksson's Disk-to-Disk Copying Tool has ANSI C source code available.

Creating an image in DOS
See the “Disk image creation” section of TOOGAM's Archiver page.