Thursday, December 21, 2017

SDRS Initial Placement - interim storage lease between recommendation and provisioning

Every day we learn something new. In the past, I blogged about SDRS behavior on these blog posts


Recently (a few months ago), I was informed about an interesting SDRS behavior that is not exposed through the standard GUI or the advanced settings but is available through the API. This functionality was not very well known even within VMware, so I have decided to blog about it.

Long story short ...

A vSphere API call for SDRS Initial Placement can lease the recommended storage resources for some time.

What does it mean? Right after making its recommendations, SDRS can lease the storage space on the recommended datastores, creating an interim reservation for whoever is, most probably, going to do the provisioning. By default, SDRS does not lease storage space on recommended datastores, so in some situations the free space can be taken by somebody else between the recommendation and the provisioning, and the provisioning fails. I have simulated such a situation in Test #3 of the test plan available here. Such situations are not very common with manual provisioning, but the probability is higher when automated provisioning is in use, so you can experience such issues in environments with VMware vRealize Automation (vRA) or vCloud Director (vCD).

And now the secret I did not know ... SDRS has had a solution for such issues since vSphere 5.1. When somebody (vRA, vCD, or anybody else who wants to deploy a VM) asks for an SDRS recommendation via an API call, that call can include a specific parameter (resourceLeaseDurationSec) which instructs vSphere to block the recommended storage space on the datastores exclusively for the provisioning of that specific SDRS recommendation. It is worth mentioning that the lease is released immediately after provisioning. The time defined in resourceLeaseDurationSec is therefore the maximum reservation time, which applies only when the requester changes their mind and decides not to deploy the VM. This avoids unnecessary storage space reservations.
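
To make the lease lifecycle more tangible, here is a minimal toy model in Python. It is only a sketch of the behavior described above, not the vSphere API: the ToyDatastore class, its methods, and the race() helper are hypothetical names I made up for illustration, and time is passed in explicitly as a number of seconds.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDatastore:
    """Hypothetical stand-in for a datastore; NOT a vSphere API object."""
    capacity_gb: int
    used_gb: int = 0
    # lease_id -> (reserved size in GB, expiry time in seconds)
    leases: dict = field(default_factory=dict)

    def free_gb(self, now: float) -> int:
        """Free space after subtracting used space and unexpired leases."""
        reserved = sum(size for size, expiry in self.leases.values() if expiry > now)
        return self.capacity_gb - self.used_gb - reserved

    def recommend(self, lease_id: str, size_gb: int, now: float, lease_sec: int = 0) -> bool:
        """Recommend this datastore; hold the space only if lease_sec > 0."""
        if self.free_gb(now) < size_gb:
            return False
        if lease_sec > 0:
            self.leases[lease_id] = (size_gb, now + lease_sec)
        return True

    def provision(self, lease_id: str, size_gb: int, now: float) -> bool:
        """Provision a disk; a matching lease is released immediately."""
        lease = self.leases.pop(lease_id, None)
        held = lease is not None and lease[1] > now
        if not held and self.free_gb(now) < size_gb:
            return False  # no valid lease and the space is gone
        self.used_gb += size_gb
        return True


def race(lease_sec: int) -> tuple:
    """Two requesters ask for 50 GB each before either one provisions."""
    ds = ToyDatastore(capacity_gb=100, used_gb=40)
    ok_a = ds.recommend("A", 50, now=0, lease_sec=lease_sec)
    ok_b = ds.recommend("B", 50, now=1, lease_sec=lease_sec)
    prov_a = ds.provision("A", 50, now=2) if ok_a else False
    prov_b = ds.provision("B", 50, now=3) if ok_b else False
    return ok_a, ok_b, prov_a, prov_b


if __name__ == "__main__":
    print("no lease:  ", race(0))
    print("120 s lease:", race(120))
```

Without a lease, both requesters in this model get a recommendation and the second provisioning fails. With a lease, the second requester is turned away already at recommendation time, and the first one provisions safely.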
 
If you want to know the details, check the API documentation. Here is what the vSphere API documentation says about placeSpec.resourceLeaseDurationSec:

"Resource lease duration in seconds. If the duration is within bounds, Storage DRS will hold onto resources needed for applying recommendations generated as part of that call. Only initial placement recommendations generated by storage DRS can reserve resources this way."

The parameter resourceLeaseDurationSec is used in StoragePlacementSpec, which encapsulates all of the inputs passed to the VcStorageResourceManager method recommendDatastores. VcStoragePlacementSpec is documented here.

So that sounds good, right? Well, there is one issue with this approach. SDRS can give the provisioning application several recommendations (multiple datastores), which would lead to blocking more storage space than is really needed. VMware engineering is aware of this issue and is currently working, at least with the vRA BU, to solve it. As far as I know, the final solution will be a special SDRS setting to return a single recommendation. However, this is planned as a specific integration optimization between SDRS and vRA provisioning.
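
A quick back-of-the-envelope sketch of that over-reservation, in Python. It assumes that every recommended datastore holds a lease for the full disk size, which is my simplification; the function names are hypothetical and the numbers are made up.

```python
def leased_gb(disk_gb: int, recommended_datastores: int) -> int:
    """Space held in total, ASSUMING each recommendation leases the full disk size."""
    return disk_gb * recommended_datastores


def over_reserved_gb(disk_gb: int, recommended_datastores: int) -> int:
    """Space blocked beyond what the single provisioned disk really needs."""
    return leased_gb(disk_gb, recommended_datastores) - disk_gb


if __name__ == "__main__":
    # A 200 GB disk with three recommendations versus a single one.
    print(over_reserved_gb(200, 3))
    print(over_reserved_gb(200, 1))
```

Under this assumption, a single recommendation blocks exactly what is needed, which is why a setting that returns only one recommendation removes the problem.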

UPDATE: details about the special SDRS setting are described at https://www.vcdx200.com/2018/06/undocumented-sdrs-advanced-options.html

The challenge with vRA storage (vSphere vDisk) provisioning and SDRS is depicted in the figure below


To be honest, there is another design consideration and potential risk associated with this solution. If resourceLeaseDurationSec is used and an external application (vRA, vCloud Director, or other) uses it incorrectly, it can eventually block storage space in the datastore cluster and cause a Denial of Service (DoS). Incorrect usage would be to ask SDRS for recommendations that block the recommended storage space and then not provision anything. The storage would then stay blocked for the defined time and would not be available for other provisioning until the lease expires.
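
The DoS scenario can be sketched with a few lines of Python. Again, this is a toy model under my own assumptions and not vSphere code: free_gb() and request_lease() are hypothetical helpers, leases are plain (size, expiry) tuples, and the capacity, sizes, and lease duration are made-up numbers.

```python
def free_gb(capacity_gb: int, leases: list, now: float) -> int:
    """Capacity minus every lease that has not yet expired."""
    return capacity_gb - sum(size for size, expiry in leases if expiry > now)


def request_lease(capacity_gb: int, leases: list, size_gb: int,
                  now: float, lease_sec: int) -> bool:
    """Hold size_gb for lease_sec seconds if enough space is free right now."""
    if free_gb(capacity_gb, leases, now) < size_gb:
        return False
    leases.append((size_gb, now + lease_sec))
    return True


if __name__ == "__main__":
    capacity, leases = 100, []
    # A misbehaving client asks for five 20 GB placements and never provisions.
    for t in range(5):
        request_lease(capacity, leases, 20, now=t, lease_sec=3600)
    # A legitimate 10 GB request is refused while the leases are alive ...
    print(request_lease(capacity, leases, 10, now=10, lease_sec=0))
    # ... and succeeds only after the last lease has expired.
    print(request_lease(capacity, leases, 10, now=3605, lease_sec=0))
```

The longer the lease duration the client asks for, the longer the datastore cluster stays blocked in this model, so the lease is best kept as short as the provisioning workflow allows.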
