Sunday, May 14, 2017

VM Snapshots Deep-Dive

Author: Stan Jurena

A while ago I received an interesting question about snapshot consolidation from one of my customers, and since I was not 100% sure about the particular details (file naming, consolidation, pointers, etc.), I did some testing in a lab. The scenario was pretty simple: create a virtual machine with a non-linear snapshot tree and start removing the snapshots.

Lesson learned: when doing such tests, it is always good to add some files or something a bit more sizable to each snapshot. My initial attempt just created folders named snap[1-7], which during consolidation did not help at all in identifying where the data from each snapshot actually went.

The non-linear snapshot tree I mentioned earlier looks like this:


The first point of confusion, which was the most important one and took me a while to wrap my head around, was the file naming convention. The file SnapTest-flat.vmdk is the main data file of the server, in this case the C: drive of a Microsoft Windows server, around 26GB in size. This file is not visible in the Web Client; only the descriptor <VM name>.vmdk (in our case SnapTest.vmdk) is directly visible. When you create the first snapshot, this is the file it uses, as you can see in the following image:


The command grep -E 'displayName|fileName' SnapTest.vmsd lists all lines containing displayName and/or fileName in the file SnapTest.vmsd. The vSphere documentation describes this file as follows:
A .vmsd file that contains the virtual machine's snapshot information and is the primary source of information for the Snapshot Manager. This file contains line entries, which define the relationships between snapshots and between child disks for each snapshot.

With that said, the output of the command above lists our predefined snapshot names (I used the number of the snapshot and the size of the file I added) and their respective files. So the first snapshot created is named Snap1+342MB and uses the file SnapTest.vmdk.
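The same name-to-file pairing can be pulled out of the .vmsd text programmatically. The following is only a minimal sketch: the sample content is a hypothetical excerpt in the key = "value" style of the file (not the actual output from my lab), and the function name snapshot_disks is made up for illustration.

```python
import re

# Hypothetical excerpt in the key = "value" style of a .vmsd file.
# These lines are illustrative only, not copied from the lab VM.
SAMPLE_VMSD = '''\
snapshot0.uid = "1"
snapshot0.displayName = "Snap1+342MB"
snapshot0.disk0.fileName = "SnapTest.vmdk"
snapshot1.uid = "2"
snapshot1.displayName = "Snap2+348MB"
snapshot1.disk0.fileName = "SnapTest-000001.vmdk"
'''

# Same case-sensitive keys as the grep pattern: displayName or fileName.
LINE_RE = re.compile(
    r'^(snapshot\d+)\.(?:disk\d+\.)?(displayName|fileName)\s*=\s*"(.*)"\s*$'
)

def snapshot_disks(vmsd_text):
    """Map each snapshot display name to the disk files listed for it."""
    names = {}   # e.g. "snapshot0" -> "Snap1+342MB"
    disks = {}   # e.g. "snapshot0" -> ["SnapTest.vmdk"]
    for line in vmsd_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        prefix, key, value = match.groups()
        if key == "displayName":
            names[prefix] = value
        else:
            disks.setdefault(prefix, []).append(value)
    return {names[p]: disks.get(p, []) for p in names}

if __name__ == "__main__":
    for name, files in snapshot_disks(SAMPLE_VMSD).items():
        print(name, "->", ", ".join(files))
```

To try it against a real VM, read the .vmsd file into a string and pass it to snapshot_disks instead of the sample.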


The second useful command during this test, grep parentFileNameHint SnapTest-00000[0-9].vmdk, goes through all the snapshot descriptor files and lists parentFileNameHint. As you probably guessed, this is the disk the snapshot depends on (its parent file).
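Following the parentFileNameHint entries from one descriptor to the next gives the full chain down to the base disk. Here is a rough sketch of that walk, assuming the descriptor texts have already been read into a dictionary keyed by file name; the sample descriptors are hypothetical and heavily shortened, and parent_of and chain are names I made up for this example.

```python
import re

# Hypothetical, shortened descriptor texts keyed by file name.
# Real descriptors contain many more fields than shown here.
SAMPLE_DESCRIPTORS = {
    "SnapTest.vmdk": 'createType="vmfs"\n',
    "SnapTest-000001.vmdk": 'parentFileNameHint="SnapTest.vmdk"\n',
    "SnapTest-000002.vmdk": 'parentFileNameHint="SnapTest-000001.vmdk"\n',
}

HINT_RE = re.compile(r'^parentFileNameHint\s*=\s*"([^"]*)"', re.MULTILINE)

def parent_of(descriptor_text):
    """Return the parent file named in the descriptor, or None if there is none."""
    match = HINT_RE.search(descriptor_text)
    return match.group(1) if match else None

def chain(disk, descriptors):
    """Follow parent hints from the given disk until no further parent is found."""
    path = [disk]
    seen = {disk}
    while True:
        parent = parent_of(descriptors.get(path[-1], ""))
        # Stop at the base disk, or if a hint points back into the path.
        if parent is None or parent in seen:
            return path
        path.append(parent)
        seen.add(parent)

if __name__ == "__main__":
    print(" -> ".join(chain("SnapTest-000002.vmdk", SAMPLE_DESCRIPTORS)))
```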


List of tests I performed:
1) Remove Snapshot 5 (Snap5+366MB)
2) Remove Snapshot 4 (Snap4+356MB)
3) Remove Snapshot 3 (Snap3+337MB)
4) Remove Snapshot 2 (Snap2+348MB)
5) Move "You Are Here"
6) Remove Snapshot 6 (Snap6+168MB)
7) Remove Snapshot 7 (Snap7+348MB)

Now in more detail for each case.

1) Remove Snapshot 5 (Snap5+366MB)
The result can be seen in this visualisation. After removing Snapshot 5 in the Web Client, the Snapshot 6 and Snapshot 5 vmdk files were consolidated and the sizes were updated accordingly, as was the snapshot's vmdk file.


For this first example I will also add the command output here for illustration. The following scenarios should be understandable even without it.



2) Remove Snapshot 4 (Snap4+356MB)
I did this test just to prove to myself that removal works as expected, so it is very similar to the previous one.

3) Remove Snapshot 3 (Snap3+337MB)
Removing Snapshot 3 makes things a bit more challenging. Three more snapshots currently depend on Snapshot 3 (Snap6, Snap7 and You Are Here). As consolidation in this case would need to be performed with each of them, it would be a very "costly" operation. The result was that the snapshot was removed from the GUI, but the files remained on disk and all the dependencies were preserved.


4) Remove Snapshot 2 (Snap2+348MB)
Although it might seem complicated on "paper", the removal process for Snapshot 2 was very similar to every other snapshot removal; only in this case Snapshot 2 was consolidated with the temporary file preserved from the previous step.


5) Move "You Are Here"
Moving the active state of the virtual machine, shown as "You Are Here", is also quite a simple operation. I performed this test mostly to validate how many snapshots can depend on a parent snapshot before the snapshots are consolidated. To spoil the surprise, it has to be just one: in this case Snapshot 6 and Snapshot 7 both still depend on the temporary file, so nothing was consolidated.


6) Remove Snapshot 6 (Snap6+168MB)
As mentioned in the previous step, if a parent snapshot has only one child snapshot and the parent is removed, the data is consolidated. Otherwise a temporary file is preserved for the child snapshots to work with.
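The rule I observed can be written down as a small toy model. This is only a sketch of the behaviour seen in these tests, not VMware's actual consolidation logic; the disk names in SAMPLE_HINTS are hypothetical placeholders, and the mapping is assumed to be child file to parent file, as you would collect it from the parentFileNameHint entries.

```python
from collections import defaultdict

# Hypothetical child -> parent mapping; the names are placeholders,
# not the real file names from the lab.
SAMPLE_HINTS = {
    "disk-snap6": "disk-snap3",
    "disk-snap7": "disk-snap3",
    "disk-current": "disk-snap3",
    "disk-snap3": "disk-snap2",
}

def children_by_parent(parent_hints):
    """Invert the child -> parent mapping into parent -> list of children."""
    children = defaultdict(list)
    for child, parent in parent_hints.items():
        children[parent].append(child)
    return dict(children)

def removal_outcome(disk, parent_hints):
    """Describe what these tests suggest happens when the snapshot on `disk` is removed."""
    kids = children_by_parent(parent_hints).get(disk, [])
    if len(kids) == 1:
        # Exactly one dependent: the data was consolidated in my tests.
        return f"consolidate {disk} with {kids[0]}"
    if len(kids) > 1:
        # Several dependents: the file stayed behind as a temporary parent.
        return f"keep {disk} on disk ({len(kids)} dependents)"
    # A disk with no dependents is not covered by the rule above.
    return f"no dependents on {disk}; not covered by this rule"

if __name__ == "__main__":
    for disk in ("disk-snap3", "disk-snap2"):
        print(removal_outcome(disk, SAMPLE_HINTS))
```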


7) Remove Snapshot 7 (Snap7+348MB)
The final step was to remove the last one, Snapshot 7, leaving just one snapshot, Snap1+342MB, and the main file. If this snapshot were removed too, all the data would be consolidated into the main VMDK; there would be no delta file for the "You Are Here" state and therefore no point to go back to.


Overall, working with snapshots is not rocket science, but my test today showed me in a bit more detail what is happening in the background with the file names, the snapshot IDs in the vmdk files, and data consolidation. It also showed that temporary parent files are left behind if more than one direct child snapshot depends on them. It also forced me to refresh my knowledge of Space Efficient Sparse Virtual Disks (SE Sparse Disks for short), which were well explained by my colleague Cormac Hogan in late 2012.
