Virtual Machine Disk Handling Technologies: Compressing a “Virtual Machine” Disk Image

Information about compressing a “virtual machine” disk image:

Overview: Expectations
  • The disk image is in the baseimgs directory.
    • If that is not the case, these instructions may still work, but some adjustment will certainly be needed.
  • VMDskEnd may be set to point to the full path that the compressed image will have, or it may be blank (in which case a default value will be set)
    • The default value is to use VMDskDir and VMDskSml
      • VMDskSml may be set to point to the filename that the compressed image will have, or it may be blank (in which case a default value will be set)
      • VMDskDir gets overwritten; prior values are not preserved.
  • VMDskBeg may be blank, to use a default value. Otherwise, it should point to an existing file that will be compressed.
Shutdown

Shut down the virtual machine.

Rationale

The rationale for doing this is discussed in the section of the guide about creating disk snapshots. (That section references this guide, so you may have just read it. If you got to this guide another way, you can read that section for the rationale.)

Confirm that the system has stopped...

ps -auxww | grep -i ${VMLILNAM} | grep -v grep
Some notes about using ps

Note that although the above syntax will work okay with OpenBSD, ps syntax differences are significant enough that no single syntax is preferred by all the Unix implementations. Users of other operating systems may need to adjust the example shown. (Common modifications may include taking out the hyphen right after the word ps and/or removing some parameters, such as one or both of the w characters.)

Multiple pipes are shown. The final pipe (into grep -v grep) is discussed in the section about “Having grep exclude itself”.
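
One common way to have grep exclude itself, without a separate grep -v grep stage, is to wrap one character of the search pattern in brackets. The following is only a sketch of that idea: bracket_first is a hypothetical helper name, the approach may or may not match what the “Having grep exclude itself” section describes, and it assumes the name begins with an ordinary letter or digit.

```shell
# Hypothetical helper: turn "myvm" into "[m]yvm".
# The pattern "[m]yvm" still matches the text "myvm", but it does not match
# the grep command line itself, which contains the literal text "[m]yvm".
bracket_first() {
  name=$1
  first=$(printf '%s' "$name" | cut -c 1)
  rest=$(printf '%s' "$name" | cut -c 2-)
  printf '[%s]%s\n' "$first" "$rest"
}

# Usage (assumes VMLILNAM is already set):
#   ps -auxww | grep -i "$(bracket_first "${VMLILNAM}")"
```

The ps portion of the command is unchanged, so the earlier notes about ps syntax differences still apply.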

Specify disk image's location

This involves locating the image and setting a variable. (Setting a variable is not required for disk compression, but it is how this guide performs the task.) First, this guide uses these additional variables (see: virtual machine env var):

echo VMDirBas=${VMDirBas} VMGenNam=${VMGenNam} VMLILNAM=${VMLILNAM}
Setting the disk number

For a multi-disk virtual machine, make sure to specify the disk to compress. (The first time that these instructions are followed, use the number 1. Then, to compress the next disk, repeat this entire process, but use the number 2.)

[ "X${VMCrDsNm}" = "X" ] && export VMCrDsNm=1

Set another variable and verify that the file is found:

export VMDskDir=${VMDirBas}/diskimgs/baseimgs/${VMGenNam}/${VMLILNAM}/.
[ "X${VMDskBeg}" = "X" ] && export VMDskBeg=${VMDskDir}/${VMLILNAM}${VMCrDsNm}.qc2
ls -l ${VMDskBeg}
Verification
Image must not be in use

Once the disk image can be located by using the environment variables, the variables are evidently set. Now is a good opportunity to quickly check that no virtual machine is actively using the file. Note that this next command may need to be customized based on what operating system is being used...

ps -auxww | grep -i ${VMLILNAM} | grep -v grep
The earlier notes about using ps (the syntax differences between operating systems, and having grep exclude itself) apply to this command as well.

File format

The upcoming instructions for compressing the disk will depend on what file format is being used.

In general, the easy way to check this is to just look at the file's extension. (This is also one of the least reliable ways, so it should not be relied upon for verifying the format of a disk image downloaded from an untrusted source. However, if you created the image yourself, and you can trust that you chose a sensible filename, then this ought to work okay.) For example:

  • If the extension is .qc2 or .qcow2, presumably the image is in QCOW2 format
  • If the extension is .qcw or .qcow, presumably the image is in QCOW format (or possibly QCOW2 format)
  • If the extension is .vhd, this is a “virtual hard drive” image (presumably the Connectix/Microsoft format, e.g. as used by Virtual PC / Hyper-V)

If you would rather not just trust the file extension, you can run a more thorough check. This ought to go pretty fast.

qemu-img info ${VMDskBeg}
qemu-img info ${VMDskBeg} | grep "^file format"
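
If the format name is wanted by itself (for instance, to store it in a variable), the “file format” line can be trimmed down. The following is a minimal sketch; image_format is a hypothetical helper name, and it assumes the output contains a line that begins with “file format: ”, as the grep example above expects.

```shell
# Read "qemu-img info" output on standard input and print only the
# format name (the text that follows "file format: ").
image_format() {
  sed -n 's/^file format: //p'
}

# Usage (assumes VMDskBeg is already set):
#   qemu-img info "${VMDskBeg}" | image_format
```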
Perform the compression
If using Qemu and Qcow2

Set some more variables. This guide uses a variable named VMDskSml to store the filename of the compressed (and, therefore, probably smaller) file. The following example is designed not to overwrite an existing VMDskSml value, which makes these directions easier to re-use. To overwrite a variable anyway, either unset it first, or just leave off the double ampersand and everything before it.

[ "X${VMDskSml}" = "X" ] && export VMDskSml=${VMLILNAM}${VMCrDsNm}-initial-changes.qc2
echo ${VMDskSml}
[ "X${VMDskEnd}" = "X" ] && export VMDskEnd=${VMDskDir}/${VMDskSml}
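
For comparison, the same “set only when blank” behavior can be written with the shell's ${VAR:=default} expansion. This is an alternative sketch, not the form this guide uses. The three sample values at the top (demo, 1, and /tmp/demo) are made-up placeholders that only apply when the real variables are not set.

```shell
# Placeholder values, for illustration only (used when the real ones are unset).
VMLILNAM=${VMLILNAM:-demo}
VMCrDsNm=${VMCrDsNm:-1}
VMDskDir=${VMDskDir:-/tmp/demo}

# ":" is the shell's do-nothing command. The ${VAR:=value} expansion assigns
# the value only when the variable is unset or empty.
: "${VMDskSml:=${VMLILNAM}${VMCrDsNm}-initial-changes.qc2}"
: "${VMDskEnd:=${VMDskDir}/${VMDskSml}}"
export VMDskSml VMDskEnd

echo "${VMDskSml}"
echo "${VMDskEnd}"
```

As with the bracket-test form, a value that was already set is left alone.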
Verifying values

This should exist:

ls -l ${VMDskBeg}

This should NOT exist:

echo ${VMDskEnd}
ls -l ${VMDskEnd}
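
The two checks above can also be combined into one test that reports a problem instead of relying on reading the ls output. This is only a sketch; check_paths is a hypothetical helper name.

```shell
# Succeed only if the source file exists and the destination does not.
check_paths() {
  src=$1
  dst=$2
  if [ ! -f "$src" ]; then
    echo "missing source: $src" >&2
    return 1
  fi
  if [ -e "$dst" ]; then
    echo "destination already exists: $dst" >&2
    return 1
  fi
  return 0
}

# Usage (assumes VMDskBeg and VMDskEnd are already set):
#   check_paths "${VMDskBeg}" "${VMDskEnd}" && echo "ready to compress"
```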
Modern syntax

Attempt to compress:

df -hi
date
time sudo qemu-img convert -c -p -f qcow2 -O qcow2 ${VMDskBeg} ${VMDskEnd}
If syntax error

Presumably, that will take some time. However, if qemu-img immediately quits with an error message about an invalid syntax, there are some changes that can be made to the syntax.

One approach is to simply drop the -p option (which is not recognized by some older versions of the qemu-img program).

Another approach is to try this alternate syntax which, through experimentation, was sometimes found to work better than the documented approach. (Presumably this was just an issue with older versions of qemu-img.)

df -hi
date
time sudo qemu-img convert -c -f qcow2 ${VMDskBeg} -O qcow2 ${VMDskEnd}
If using Hyper-V
See Images of “Primary Storage” Devices (e.g. Hard Drives): the “Compacted VHD (disk) image” section discusses compacting, and mentions the concept of compressing.
Otherwise
If not using Qemu, perhaps see: Images of “Primary Storage” devices: making a compressed hard drive image (compression of disk images).
Follow-up

After the compression, do the following:

echo ${?}

(The normal “return code”/“error level” is zero.)
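
Since ${?} only holds the result of the most recent command, it must be checked before anything else is run. One way to avoid losing it is to capture the value as part of running the command. The following is a minimal sketch; run_and_report is a hypothetical helper name.

```shell
# Run the given command, then immediately save and print its exit status.
run_and_report() {
  "$@"
  rc=$?
  echo "exit status: ${rc}"
  return "${rc}"
}

# Usage (the qemu-img line mirrors the one shown earlier in this guide):
#   run_and_report sudo qemu-img convert -c -f qcow2 -O qcow2 "${VMDskBeg}" "${VMDskEnd}"
```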

Don't wrap up this section quite yet. After the “Checking out results” section, there is a step in the “Protecting the file” section.

Checking out results

If you want to see the whole directory, you may:

ls -ltr ${VMDskDir}

However, you'll probably be most interested in checking out the size of at least these two files:

ls -l ${VMDskBeg}
ls -l ${VMDskEnd}
Understanding results

The compression process read from ${VMDskBeg} and wrote to ${VMDskEnd}.

If the source was a “child image” (an image that uses a backing file), the results might look a little confusing: the compressed disk image may look like it is larger than the uncompressed disk image.

If that did not happen, then the new file is smaller, and everything seems fine. In that case, no further explanation may seem necessary. The rest of this section (about “understanding results”) describes why the compressed disk image may seem bigger. (People may find themselves to be far more curious about this if they actually bump into the situation.)

In theory, that is always possible with data compression. (The reason for this can be described by the “pigeon-hole” “counting principle”. TOOGAM's tutorial: Compressing bits may discuss and/or hyperlink to additional information on this topic.) Data compression is not mathematically guaranteed to always create smaller data, and in some cases it can actually create a larger amount of data. Still, data compression usually succeeds in its primary goal of reducing the data size, so a larger result may still seem unexpected.

However, there is another explanation that actually seems to address the issue more convincingly. The compressed file also contains the compressed contents of the backing file(s).

The compressed image may be smaller than the combination of the uncompressed child image and its backing file. If that happened, then the positive side of this (whether this “positive side” was actually desired, or not) is that the compressed file is no longer required to have the “backing file”. Let's see if that is the case:

qemu-img info ${VMDskBeg} | grep "^backing file: "
echo ${?}
qemu-img info ${VMDskEnd} | grep "^backing file: "
echo ${?}
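
The same check can be phrased as a yes/no test, which may be easier to read than two bare return codes. This is only a sketch; has_backing_file is a hypothetical helper name, and it relies on the same “backing file: ” line that the grep commands above look for.

```shell
# Read "qemu-img info" output on standard input.
# Succeed (return 0) if a "backing file: " line is present.
has_backing_file() {
  grep -q '^backing file: '
}

# Usage (assumes VMDskBeg and VMDskEnd are already set):
#   qemu-img info "${VMDskBeg}" | has_backing_file && echo "source has a backing file"
#   qemu-img info "${VMDskEnd}" | has_backing_file || echo "result is standalone"
```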

If the first disk had a “backing file”, and the second didn't, then that is almost certainly the explanation. To see some numbers, you can use:

qemu-img info --backing-chain ${VMDskBeg} | grep "^disk size: "
qemu-img info --backing-chain ${VMDskEnd} | grep "^disk size: "

Add up all of the disk sizes reported by the first command. Commonly, that total will be larger than the disk size reported for the second file, and the difference is the amount of savings. You can also compare those reports to the sizes of the actual files. In the following example, the end file is looked at first.


ls -l ${VMDskEnd}
ls -l ${VMDskBeg}

qemu-img info --backing-chain ${VMDskBeg}| grep "^backing file: "
echo also look at file sizes of any of those backing files:
for x in $( qemu-img info --backing-chain ${VMDskBeg}| grep "^backing file: " | cut -d : -f 2 ) ; do ls -l $x ; done

When taking that into consideration, the resulting amount of data may appear to be smaller, as expected. To see this, first figure out the size of anything in the backing file chain.

Here are some approximate numbers, pulled from some actual files, that demonstrate how this has worked. If a backing file is a compressed image which is about 514 MB, the uncompressed child image is about 212 MB, and the new compressed image is about 596 MB, then the latest compression saved about 130 MB. (514 MB + 212 MB = 726 MB. 726 MB - 596 MB = 130 MB of savings.) So even though 596 MB is far bigger than 212 MB, and 596 MB is even bigger than 514 MB, 596 MB is smaller than the combination of files.
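
The same arithmetic can be done by the shell. The following is a minimal sketch; total_bytes and savings are hypothetical helper names, and the file sizes come from wc -c (the same byte counts that ls -l shows), not from the “disk size” that qemu-img reports.

```shell
# Print the combined size, in bytes, of all of the files named as arguments.
total_bytes() {
  total=0
  for f in "$@"; do
    size=$(wc -c < "$f") || return 1
    total=$((total + ${size}))
  done
  echo "${total}"
}

# Print how much smaller the new image is than the old pieces combined.
# Arguments: backing-file size, child-image size, new compressed-image size.
savings() {
  echo $(( $1 + $2 - $3 ))
}

# The example numbers from the text, in MB:
savings 514 212 596
```

With real files, total_bytes could be given ${VMDskBeg} plus each backing file found by the earlier loop, and that total could then be compared to the size of ${VMDskEnd}.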

Protecting the file

If all looks well, consider reducing the chances that the file will be accidentally altered. If the image is going to be used as a base/parent image, then we may as well make it “read only” now, while a variable conveniently points to this image. (Doing this now could also prevent some accidents that are particularly prone to happen while creating a child image, like placing the wrong filename in a certain spot.)

To mark this disk image as one that should be “read only”, this next command removes write permission for all users.

sudo chmod a-w ${VMDskEnd}
ls -l ${VMDskDir}

If you're going to repeat these instructions for another disk, clear the disk number.

unset VMCrDsNm
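
Note that the lines which set VMDskBeg, VMDskSml, and VMDskEnd only apply their defaults when those variables are blank. If the next pass is meant to pick up the default names for the second disk, something like the following sketch may be needed before repeating the steps. This is an assumption about the intended workflow, not a step taken from the original instructions.

```shell
# Clear the per-disk values so that the default-setting lines apply again.
unset VMDskBeg VMDskSml VMDskEnd

# Assumption: the next disk to compress is disk number 2.
export VMCrDsNm=2
```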

This guide is intentionally leaving some of the (“temporary”) environment variables set. (Some of them may be useful/convenient at a later time.)