Understanding Advanced Snapshot Management

Deleting virtual machine snapshots without wasting disk space

Before using snapshots on a VM, it is important to check the free disk space on the VMFS volume. As a rule of thumb, you should have at least 20% of the virtual machine’s total disk size available as free disk space before taking snapshots. This amount can vary depending on the type of server, how long you keep the snapshots, and whether you plan to use multiple snapshots.

If you plan to use snapshots on database servers or file servers, the free space required on the underlying datastore or VMFS volume is much greater than for servers such as web servers or DNS servers, because file and database servers write far more data to disk than most other server types.

More importantly, if you plan to include the memory state of the VM in snapshots, you also need to allow extra disk space equal to the amount of RAM assigned to the VM.
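
The sizing guidance above can be turned into a quick back-of-the-envelope check. The sketch below is only illustrative: the function name and its parameters are hypothetical, the 20% default is the rule of thumb from the text, and busy file or database servers may need a larger fraction.

```python
def recommended_free_space_gb(total_disk_gb, ram_gb=0.0, include_memory=False,
                              headroom_fraction=0.20):
    """Rough free space (in GB) to keep on the datastore before taking snapshots.

    headroom_fraction is the rule-of-thumb share of the VM's total disk size
    (20% by default); raise it for write-heavy servers or long-lived snapshots.
    """
    if total_disk_gb < 0 or ram_gb < 0:
        raise ValueError("sizes must be non-negative")
    space = total_disk_gb * headroom_fraction
    if include_memory:
        # A snapshot that captures memory state needs room equal to the VM's RAM.
        space += ram_gb
    return space


# Example: 200 GB of virtual disks, 16 GB of RAM, memory state included.
print(recommended_free_space_gb(200, ram_gb=16, include_memory=True))
```

Treat the result as a starting point rather than a guarantee; the real requirement depends on how much data the VM writes while the snapshot exists.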

A VM with only one snapshot requires no extra disk space when the snapshot is deleted (committed). However, a helper delta file is created while the snapshot is being deleted. This helper delta file holds any changes made to the VM’s disk during the deletion. Its size depends on how long the deletion takes, but it is generally small because most snapshots are deleted in less than an hour.

The amount of extra disk space required when deleting multiple snapshots depends on the vSphere version in use, because the way snapshots are merged into the original disk file has changed across vSphere versions.

In VMware Infrastructure 3 and early vSphere 4.0 versions, if a VM has three active snapshots and a delete operation is performed, the following process occurs:

Snapshot 3 is copied to Snapshot 2, which is then copied to Snapshot 1. Next, Snapshot 1 is copied to the original disk file, and the helper snapshot is copied to the original disk file, as outlined below.

[Figure: snapshots copied one into another before being committed to the original disk. Graphic thanks to searchvmware.techtarget.com]

This process requires extra disk space because each snapshot grows as the previous snapshot is added to it. If there isn’t sufficient free disk space on the datastore, the snapshots cannot be committed.

In later vSphere 4.0 versions and vSphere 4.1, each snapshot is merged directly into the original disk instead of being merged with the previous snapshot. The figure below shows what happens when a VM has three active snapshots and you delete them.

[Figure: each snapshot merged directly into the original disk. Graphic thanks to searchvmware.techtarget.com]

Because each snapshot is merged directly into the original disk, one at a time, no extra disk space is needed except for the helper file.
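
To make the difference between the two merge strategies concrete, here is a simplified worst-case model. It is not how ESX accounts for space internally: the function names are hypothetical, and the model assumes that no blocks overlap between snapshots, that a source snapshot is removed only after it has been fully copied, and that the original disk is preallocated and does not grow.

```python
def peak_extra_space_chained(snapshot_sizes_gb):
    """Worst-case extra space when each snapshot is copied into the previous one.

    snapshot_sizes_gb is ordered oldest first (Snapshot 1, Snapshot 2, ...).
    """
    carried = 0.0  # data accumulated from newer snapshots so far
    peak = 0.0
    # Walk from the newest snapshot toward Snapshot 2. Each step copies the
    # accumulated data into the next older snapshot, which grows by that much.
    # Snapshot 1 is finally merged into the original disk (assumed not to grow).
    for size in reversed(snapshot_sizes_gb[1:]):
        carried += size
        peak = max(peak, carried)
    return peak


def peak_extra_space_direct(snapshot_sizes_gb):
    """Extra space when each snapshot is merged straight into the original disk.

    Under the same assumptions nothing grows, so only the helper delta file
    (not modelled here) needs room.
    """
    return 0.0


sizes = [10, 20, 30]  # Snapshots 1, 2 and 3, in GB (made-up figures)
print(peak_extra_space_chained(sizes), peak_extra_space_direct(sizes))
```

With a single snapshot the chained model also returns zero, which matches the earlier point that a VM with only one snapshot needs no extra space beyond the helper file.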

Eric Siebert offers a useful word of caution about snapshot operations on searchvmware.techtarget.com:

Don’t run a Windows disk defragmentation while the VM has a snapshot running. Defragment operations change many disk blocks and can cause very rapid growth of snapshot files

How long does it take to delete a snapshot?

When deleting snapshots through the vSphere Client, the task status bar can be misleading. Generally, the task status jumps to 95% complete fairly quickly, then stays at 95% until the entire commit process is completed. vCenter Server has a default 15-minute timeout for all tasks (this can be increased). If the commit takes longer than that, vCenter Server reports that the operation has timed out even though the files are still being committed.

One simple method for finding out when a task completes is to look at the VM’s directory using the Datastore Browser in the vSphere Client. When the delta files disappear you know that the snapshot deletion has completed.
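
If the VM's directory is reachable as an ordinary filesystem path, the same check can be scripted. This is a minimal sketch, assuming snapshot delta files follow the "-delta.vmdk" naming pattern; the function name is hypothetical and the directory you pass in depends on your own datastore layout.

```python
from pathlib import Path


def remaining_delta_files(vm_dir):
    """Return the names of snapshot delta files still present in vm_dir.

    An empty list suggests that the snapshot deletion has finished.
    """
    return sorted(p.name for p in Path(vm_dir).glob("*-delta.vmdk"))
```

Calling this function periodically on the VM's directory and waiting for an empty list gives the same signal as watching the Datastore Browser by hand.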

There is also a command-line method available in ESXi that you can use to monitor the status of snapshot deletions. It is explained in this VMware KB article.

Snapshots that have been active for a very long time become extremely large and can take a very long time to commit when deleted. The time a snapshot takes to commit varies with the VM’s activity level; it commits faster if the VM is powered off. The amount of activity on your host’s disk subsystem also affects the commit time.

A 100 GB snapshot can take hours to merge into the original disk, which can affect VM and host performance. For this reason you should limit the length of time you keep snapshots and delete them as soon as you no longer need them.

Effect of Snapshots and metadata locks on host performance

Snapshots have a negative impact on the performance of your host and virtual machines in several ways.

When a snapshot is first taken, activity on the VM is paused briefly. You may even see a few ping timeouts on the VM while snapshot creation is in progress. Also, creating a snapshot causes metadata updates, which can cause SCSI reservation conflicts that briefly lock your LUN. As a result, the LUN is available exclusively to a single host for a brief period of time.

When a VM has an active snapshot, the performance of the VM is degraded because the host writes to delta files differently and less efficiently than it does to standard VMDK files.

Also, each time the delta file grows by a 16 MB increment, it causes another metadata lock. This can affect your VMs and hosts; how big the performance impact is depends on how busy your VM and hosts are.
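
A little arithmetic shows how quickly these locks add up. The sketch below simply counts how many 16 MB growth steps a delta file goes through to reach a given size; the function name is hypothetical, and it assumes one metadata lock per increment, as described above.

```python
import math

INCREMENT_MB = 16  # growth increment of a delta file, as stated in the text


def growth_increments(delta_size_mb):
    """Number of 16 MB growth steps needed for a delta file to reach delta_size_mb."""
    if delta_size_mb < 0:
        raise ValueError("size must be non-negative")
    return math.ceil(delta_size_mb / INCREMENT_MB)


# A delta file that has grown to 100 GB (102,400 MB):
print(growth_increments(100 * 1024))
```

Under that assumption a 100 GB delta file corresponds to 6,400 growth steps, each of which takes a metadata lock on the LUN.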

Deleting (committing) a snapshot also creates a metadata lock. In addition, the VM’s performance can be greatly reduced while the delta files are being committed; this is more noticeable if the VM is very busy. To avoid this problem, delete large or numerous snapshots during off-peak hours, when the ESXi host is less busy.

Snapshot Best Practices

There are certain things you should keep in mind when using snapshots. They are discussed below:

Never expand a disk file with a snapshot running

Virtual disks can be expanded using the vmkfstools -X command or the vSphere Client, but you should never expand a virtual disk while snapshots are active.

In VI3, if you expand a disk using the VI Client while a snapshot is active, the task is reported as completed successfully, but the disk file is not actually expanded. If the virtual disk is expanded with the vmkfstools command while a snapshot is active, the VM will no longer start, and you will receive an error:

“Cannot open the disk ‘.vmdk’ or one of the snapshot disks it depends on. Reason: The parent virtual disk has been modified since the child was created”

In later versions of vSphere, it is not possible to expand a VM’s virtual disk while a snapshot is running. The vmkfstools command also fails with an error:

“Failed to extend the disk. Failed to lock the file”

The option to resize a VM’s disk (select the disk under Edit Settings) is grayed out in the vSphere Client while a snapshot is running. Once the snapshot is deleted, you can resize the virtual disk.

If a VM has an RDM disk attached, the disk size is managed by the physical storage system and not by vSphere. As a result, it is possible to increase the size of an RDM disk while snapshots are active.

Caution: This action can corrupt the RDM disk, so always delete snapshots before increasing the size of an RDM disk.

Excluding virtual disks from using snapshots

If a VM has more than one disk, it is possible to exclude a disk from snapshots. To do this, edit the VM’s settings and change the disk mode to Independent (make sure you select Persistent). The Independent setting lets you control how each disk behaves independently of the others; it makes no difference to the disk file or its structure. Once a disk is Independent, it is not included in any snapshots.

Note: You will not be able to include memory snapshots on a VM that has independent disks. This protects the independent disk in case you revert to a previous snapshot whose memory state includes a running application that was writing to the independent disk. Because the independent disk is not reverted when the other disks are, this could potentially corrupt the data on it.

For VMs that have RDM disks, if the RDM was configured in physical compatibility mode, it will not be included in any VM snapshots. But if the RDM was configured in virtual compatibility mode, it will be included in snapshots.
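
The inclusion rules in this section can be summarized as a small decision function. This is only a sketch of the rules as stated here; the function and parameter names are hypothetical and do not correspond to any vSphere API.

```python
def included_in_snapshot(independent=False, rdm_mode=None):
    """Return True if a disk would be captured by a VM snapshot.

    independent: True if the disk mode is set to Independent.
    rdm_mode: None for a regular virtual disk, or "physical" / "virtual"
              for an RDM in the corresponding compatibility mode.
    """
    if rdm_mode not in (None, "physical", "virtual"):
        raise ValueError("rdm_mode must be None, 'physical' or 'virtual'")
    if independent:
        return False  # independent disks are excluded from snapshots
    if rdm_mode == "physical":
        return False  # physical compatibility mode RDMs are excluded
    return True       # regular disks and virtual compatibility mode RDMs


print(included_in_snapshot())                      # regular disk
print(included_in_snapshot(independent=True))      # independent disk
print(included_in_snapshot(rdm_mode="physical"))   # physical-mode RDM
print(included_in_snapshot(rdm_mode="virtual"))    # virtual-mode RDM
```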
