Thursday, May 19, 2016

VMware vSphere SDRS - test plan of SDRS initial placement

VMware vSphere Storage DRS (SDRS, Storage Distributed Resource Scheduler) continuously balances storage space usage and storage I/O load while avoiding resource bottlenecks, helping to meet application service levels.

Lab environment:
Five 10 GB datastores grouped into a datastore cluster with SDRS enabled.
SDRS is configured to balance on both storage space usage and I/O load.

  • Storage Space threshold is 1 GB
  • I/O latency threshold is left at the default of 15 ms.
  • Each "empty" datastore has a real capacity of 9.75 GB, of which 8.89 GB is actually free because 882 MB is already used.

Configuration details are shown in the screenshot below.


The capacity of one of the 10 GB datastores is shown below.
The used space (882 MB) is occupied by the following system files (.sf) ...


In VMFS 5, every datastore has its own hidden system files that store the file-system structure.

Test 1

Test description: Does the SDRS initial placement algorithm take VM swap file capacity into account?

Test prerequisites:
  • All 5 datastores in the datastore cluster are empty
  • This means that each datastore has 8.89 GB of free capacity
  • The provisioned VM has no memory (RAM) reservation
Test steps:
  • Manually deploy a virtual machine with 4 GB RAM and an 8 GB disk (through the Web Client)
  • Power on the deployed virtual machine
  • Observe the behavior
Test expectations:

  • I want to test whether the swap file is considered during SDRS initial placement.
  • Each datastore has only 8.89 GB of free space. If the VM swap file were considered, a new VM with an 8 GB disk and 4 GB RAM would not be provisioned, because it would need 12 GB on a single datastore and no datastore has that much.
  • In other words, if provisioning succeeds (and only the later power-on fails), we will have proved that SDRS does not take the VM swap file into account.
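
The expectation above is simple arithmetic. A minimal Python sketch, assuming vSphere's "GB" means 1024 MB and using the 9,103 MB free figure given in Test 3, checks whether the disk alone and the disk plus swap fit on one empty datastore (variable and function names are illustrative only):

```python
MB_PER_GB = 1024  # vSphere reports capacity in binary units (1 GB = 1024 MB)

free_per_datastore_mb = 9103   # ~8.89 GB free on each "empty" datastore
vmdk_mb = 8 * MB_PER_GB        # 8 GB virtual disk
swap_mb = 4 * MB_PER_GB        # 4 GB RAM with no reservation -> 4 GB swap file


def fits(required_mb: int, free_mb: int) -> bool:
    """Return True if an object of required_mb fits into free_mb."""
    return required_mb <= free_mb


disk_only = fits(vmdk_mb, free_per_datastore_mb)                # 8192 MB needed
disk_and_swap = fits(vmdk_mb + swap_mb, free_per_datastore_mb)  # 12288 MB needed
print(f"disk only fits: {disk_only}, disk + swap fits: {disk_and_swap}")
```

If placement counted the swap file, the second check would rule out every datastore and provisioning would be refused; if it ignores the swap file, only the first check matters and the failure moves to power-on.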

Test screenshots:

Deployed Virtual Machine.
VM PowerOn Failure 
Test Result:

  • The virtual machine was successfully provisioned, and 8 GB was consumed from the available space on Datastore 5.
  • Powering on the virtual machine failed because there was not enough storage space for the 4 GB swap file. This is the expected behavior if SDRS does not take the VM swap file into account.

Test Summary:

  • The test shows that the SDRS initial placement algorithm does NOT take VM swap file capacity into account.
  • A virtual machine memory (RAM) reservation would change the outcome of this test: the swap file size is the configured memory minus the reservation, so a VM with, for example, a 100% memory reservation needs no disk space for its swap file.
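
The reservation effect can be written as a one-line formula: the swap (.vswp) file created at power-on is the configured memory minus the memory reservation. The Python sketch below applies it to the 4 GB VM from this test. It is a simplification that ignores other small per-VM files, and `vswp_size_mb` is a hypothetical helper name, not a vSphere API.

```python
def vswp_size_mb(configured_mb: int, reservation_mb: int) -> int:
    """VM swap file size: configured memory minus memory reservation."""
    if not 0 <= reservation_mb <= configured_mb:
        raise ValueError("reservation must be between 0 and the configured memory")
    return configured_mb - reservation_mb


configured_mb = 4096  # 4 GB RAM, as in Test 1
for percent in (0, 50, 100):
    reservation_mb = configured_mb * percent // 100
    size = vswp_size_mb(configured_mb, reservation_mb)
    print(f"{percent:3d}% reservation -> {size} MB swap file")
```

With a 100% reservation the swap file shrinks to 0 MB, so the VM in this test would most likely have powered on from the ~900 MB left after its 8 GB disk.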

Test 2

Test description: How efficient is SDRS defragmentation when the datastore cluster is running out of storage space?

Test prerequisites:
  • 4 datastores in the datastore cluster are almost full
  • 1 datastore (Datastore4) has 7.58 GB of free space
  • One datastore (Datastore5) holds a virtual machine (test1_big) with an 8 GB disk
Test steps:
  • Clone the virtual machine (test1_big) to the datastore cluster (through the Web Client)
  • Observe the behavior
Test expectations:
  • SDRS will free up Datastore4 so that it has enough space for the clone of test1_big
  • Provisioning of the virtual machine clone will succeed

Test screenshots:
Before SDRS defragmentation
After SDRS defragmentation and clone provisioning

Test Result:
  • SDRS freed up Datastore4 as expected
  • Provisioning of the virtual machine clone FAILED because of insufficient disk space on Datastore4.
  • This is unexpected, because Datastore4 is empty (thanks to SDRS defragmentation) and another machine with the same configuration was successfully provisioned on Datastore5.
Test Summary:

  • SDRS successfully freed up the only datastore where the VM clone could be placed, but the clone deployment started before the Storage vMotion had finished, so clone provisioning failed.
  • SDRS defragmentation works, but there can be cases where initial placement fails even though storage was freed up and enough contiguous free space exists on one datastore after defragmentation.
  • It is important to understand how VM provisioning to a datastore cluster really works. A datastore cluster is nothing more than a group of individual datastores, and SDRS is "just" a scheduler on top of it. You can think of the scheduler as a placement engine that prepares recommendations for initial placement and ongoing balancing. Another software component (C# Client, Web Client, PowerCLI, vRealize Automation, vCloud Director, etc.) is responsible for the actual provisioning, and SDRS gives it recommendations on the best place to put new storage objects (a vmdk file or a VM configuration file).
  • In other words, initial VM provisioning itself has nothing to do with SDRS initial placement. The provisioning process is driven by the vSphere Client, vRA, vRO, PowerCLI, or another software component through the vSphere API. SDRS is just a placement engine that recommends the best place at the moment it is asked. The provisioning process selects one particular SDRS recommendation and continues provisioning (API method ApplyStorageDrsRecommendation_Task). However, in the meantime other software may also be provisioning VMs, and the selected datastore may be filled by someone else. There is always some probability of provisioning failure, and this is exactly where good vSphere / storage design plays a crucial role in reducing it.
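
The timing problem can be illustrated with a deliberately simplified Python model. It is not the real SDRS algorithm or the vSphere API: `Datastore`, `recommend` and `apply_recommendation` are hypothetical stand-ins for requesting a recommendation (in the vSphere API, `StorageResourceManager.RecommendDatastores`) and applying it (`ApplyStorageDrsRecommendation_Task`), and the free-space numbers are made up. Two workflows receive the same recommendation from the same snapshot, and only the first one to apply it succeeds.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Datastore:
    name: str
    free_mb: int


def recommend(datastores: List[Datastore], size_mb: int) -> Optional[Datastore]:
    """Toy placement engine: pick the datastore with the most free space that
    can hold the object, based on the free space seen right now."""
    candidates = [ds for ds in datastores if ds.free_mb >= size_mb]
    return max(candidates, key=lambda ds: ds.free_mb, default=None)


def apply_recommendation(ds: Datastore, size_mb: int) -> None:
    """Toy provisioning step: space is consumed only when the workflow applies
    the recommendation, so it fails if someone else used the space meanwhile."""
    if ds.free_mb < size_mb:
        raise RuntimeError(f"Insufficient disk space on datastore '{ds.name}'.")
    ds.free_mb -= size_mb


cluster = [Datastore("Datastore1", 900), Datastore("Datastore4", 9000)]

rec_a = recommend(cluster, 8192)   # workflow A asks where to put an 8 GB disk...
rec_b = recommend(cluster, 8192)   # ...and workflow B asks before A applies
apply_recommendation(rec_b, 8192)  # B consumes the space first

try:
    apply_recommendation(rec_a, 8192)  # A's recommendation is now stale
except RuntimeError as err:
    print(f"workflow A failed: {err}")
```

A similar timing gap explains the Test 2 failure: there the recommendation counted on space that the Storage vMotion had not yet released when the clone started.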

Test 3

Test description: How does SDRS initial placement balance VMs across datastores?

Test prerequisites:
  • Storage Space threshold is 1 GB
  • I/O latency threshold is left at the default of 15 ms.
  • Each "empty" datastore has a real capacity of 9.75 GB, of which 8.89 GB is actually free because 882 MB is already used.
  • A PowerCLI script is used to provision multiple VMs. The script is available here.
Test steps:
  • Run the PowerCLI script to create 50 virtual machines with the following specification (1 vCPU, 512 MB RAM, 1 GB thick disk) in the datastore cluster with SDRS enabled.
  • Observe the behavior
Test expectations:
  • We have a datastore cluster of 5 datastores, each with 8.89 GB (9,103 MB) of available storage.
  • Each VM we deploy takes 1000 MB.
  • The VMs are deployed powered off, so the swap file does not need to be considered.
  • Therefore we expect to end up with 45 VMs balanced in round-robin fashion across the 5 datastores.
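
The expected count of 45 VMs follows from integer division of the free space per datastore by the per-VM footprint. A minimal Python check, using the 9,103 MB and 1000 MB figures stated above:

```python
FREE_MB_PER_DATASTORE = 9103  # 8.89 GB available on each empty datastore
VM_FOOTPRINT_MB = 1000        # per-VM disk footprint used in this test
DATASTORE_COUNT = 5
VMS_REQUESTED = 50

vms_per_datastore, leftover_mb = divmod(FREE_MB_PER_DATASTORE, VM_FOOTPRINT_MB)
vms_placed = min(vms_per_datastore * DATASTORE_COUNT, VMS_REQUESTED)
vms_failed = VMS_REQUESTED - vms_placed

print(f"{vms_per_datastore} VMs per datastore, {leftover_mb} MB left on each")
print(f"{vms_placed} VMs placed, {vms_failed} VMs fail")
```

The 1000 MB figure matters: with 1024 MB per disk only 8 VMs would fit per datastore, because 9 × 1024 = 9216 MB exceeds 9,103 MB.
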
Test screenshots:
Single datastore capacity
PowerCLI Automated Provisioning.
Datastore free space after automatic sequential provisioning

Test Result:
  • 45 VMs were successfully provisioned, and VMs 46 through 50 failed with "Insufficient disk space on datastore 'Datastore1'." This was the expected behavior.
  • The VMs were placed on the datastores as follows:
  • Datastore 1: TEST-05, TEST-06, TEST-15, TEST-20, TEST-21, TEST-30, TEST-31, TEST-40, TEST-45 
  • Datastore 2: TEST-04, TEST-10, TEST-11, TEST-19, TEST-25, TEST-26, TEST-35, TEST-36, TEST-44
  • Datastore 3: TEST-03, TEST-09, TEST-14, TEST-18, TEST-24, TEST-29, TEST-34, TEST-39, TEST-43
  • Datastore 4: TEST-02, TEST-08, TEST-13, TEST-17, TEST-23, TEST-28, TEST-33, TEST-38, TEST-42
  • Datastore 5: TEST-01, TEST-07, TEST-12, TEST-16, TEST-22, TEST-27, TEST-32, TEST-37, TEST-41
Test Summary: The test passed as expected. Only a few details are worth mentioning.
  • I expected the VMs to be distributed evenly in strict round-robin order. Recall that we are using an artificial sequence of provisioning 1 GB vDisks per VM, so I would expect VMs TEST-01, TEST-06, TEST-11, TEST-16, TEST-21, TEST-26, TEST-31, TEST-36, TEST-41 on Datastore 5, with similar numbering on the other datastores. The observed order is slightly different, but at the end of the day it doesn't seem to be a big deal.
  • Note that different provisioning runs can end up with slightly different VM placement. I suspect this is because other factors (I/O load, storage usage trend) are also considered by the SDRS algorithm.
  • Free space ends at 103 MB on every datastore, even though the Storage Space threshold is set to 1 GB. This is expected behavior: the Storage Space threshold is just a threshold (soft limit) that SDRS uses for balancing and defragmentation.
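
For comparison, the "ideal" round-robin layout expected above can be reproduced with a simple greedy rule: place each VM on the datastore with the most free space. This is a hypothetical Python model, not the actual SDRS algorithm (which, as noted, also weighs I/O load and usage trends). The tie-break toward the highest-numbered datastore is an assumption, chosen only because TEST-01 landed on Datastore 5.

```python
from typing import Dict, List, Tuple


def greedy_place(vm_count: int, free_mb: Dict[str, int],
                 vm_mb: int) -> Tuple[Dict[str, List[str]], Dict[str, int]]:
    """Place VMs one by one on the datastore with the most free space.
    Ties go to the lexically largest name ('Datastore 5' beats 'Datastore 1')."""
    free = dict(free_mb)
    placement: Dict[str, List[str]] = {name: [] for name in free}
    for i in range(1, vm_count + 1):
        vm = f"TEST-{i:02d}"
        target = max(free, key=lambda name: (free[name], name))
        if free[target] < vm_mb:
            continue  # no datastore can hold this VM: provisioning fails
        free[target] -= vm_mb
        placement[target].append(vm)
    return placement, free


datastores = {f"Datastore {n}": 9103 for n in range(1, 6)}
placement, remaining = greedy_place(50, datastores, 1000)
print(placement["Datastore 5"])
print(remaining)
```

Under this model Datastore 5 receives TEST-01, TEST-06, ..., TEST-41 and every datastore ends with 103 MB free. The real run matched the per-datastore count (9 VMs each) and the remaining space, but not the exact ordering.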

Test 4

Test description: Will a new VM be provisioned to the datastore with the largest free space?

Test prerequisites:
  • Storage Space threshold is 1 GB
  • I/O latency threshold is kept on default 15 ms.
  • One datastore (Datastore1) is "empty": it has a real capacity of 9.5 GB, of which 8.64 GB is free because 882 MB is used.
  • One datastore (Datastore2) has 5.71 GB of free capacity.
  • All other datastores (Datastore3, Datastore4, Datastore5) are almost full, with only 848 MB free.
Test steps:
  • Use the vSphere Web Client to provision one VM with a 2 GB disk into the datastore cluster.
  • Observe the behavior. We are interested in where the new VM will be placed.
Test expectations:
  • We expect the new VM disk to be placed on Datastore1 because it has the largest free (available) space.

Test screenshots:
Datastore cluster capacity before VM provisioning.
Datastore cluster capacity after VM provisioning.
Test Result:
  • The new virtual machine was provisioned to Datastore1, which had the largest available storage capacity.
Test Summary: Initial placement behaves as expected: the new VM is placed on the datastore with the least used space. However, this test covered only a single VM. Provisioning multiple VMs can behave differently because of other SDRS calculation factors (I/O load, capacity usage trend), the particular provisioning workflow, and the exact timing of when an SDRS recommendation is requested versus when datastore space is actually consumed for the next recommendation.

Next steps

See the blog post "Storage DRS Design Considerations".

And as always, any comment is appreciated.
