Sunday, December 22, 2019

How to remove VMFS datastore and reuse local disks for vSAN

I'm upgrading the hardware in my home lab to leverage vSAN. I have 4x Dell PowerEdge R620 servers, each with 2x 500 GB SATA disks but no SSDs for cache. Cost is always the constraint in any home lab, but I recently found an M.2 NVMe PCIe adapter for M.2 NVMe SSDs in my local computer shop. The total cost of 1x M.2 NVMe PCIe adapter + 1x 512 GB M.2 NVMe SSD is just $100.




Such a hardware upgrade, for only $400, would give me a vSAN datastore with almost 4 TB of raw space: a 4-node HYBRID vSAN where each node has 1x NVMe disk as the cache disk and 2x 500 GB SATA disks as capacity disks. After disk formatting, the raw space will probably be 4 TB - 10%, so about 3.6 TB. Subtracting 25% for slack space and an additional 25% for RAID 5 protection leaves roughly 2 TB of usable space, which is still a pretty good deal.
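The estimate above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the 10% format overhead, 25% slack space, and 25% RAID 5 overhead are the rough figures used in this post, not exact vSAN sizing rules, and the function name is made up for illustration.

```python
def vsan_capacity_estimate(nodes, disks_per_node, disk_gb,
                           format_overhead=0.10, slack=0.25, raid5_overhead=0.25):
    """Return (raw_gb, formatted_gb, usable_gb) using the post's rough percentages."""
    # Raw capacity counts only the capacity disks, not the cache disks.
    raw_gb = nodes * disks_per_node * disk_gb
    # Space left after the assumed on-disk format overhead.
    formatted_gb = raw_gb * (1 - format_overhead)
    # Apply slack space first, then the RAID 5 overhead on what remains.
    usable_gb = formatted_gb * (1 - slack) * (1 - raid5_overhead)
    return raw_gb, formatted_gb, usable_gb


raw, formatted, usable = vsan_capacity_estimate(nodes=4, disks_per_node=2, disk_gb=500)
print(f"raw: {raw} GB, after format: {formatted:.0f} GB, usable: {usable:.0f} GB")
```

For my four hosts this gives 4000 GB raw, about 3600 GB after format, and about 2025 GB usable, which matches the numbers above.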

The issue I'm describing in this blog post usually happens in environments where local disks are used as backing storage for local VMFS datastores. Local VMFS datastores work perfectly fine until you want to remove the VMFS datastore and reuse the local disks, for example for vSAN. That was exactly the case in my home lab, where I have four ESXi hosts, each with 2x 500 GB SATA disks, with local VMFS datastores on the two disks in each ESXi host.

When I tried to remove a local datastore (ESX22-Local-SATA-01), it failed with the following error message:

The resource 'Datastore Name: ESX22-Local-SATA-01 VMFS uuid: 5c969e10-1d37088c-3a57-90b11c142bbc' is in use.




Why is the datastore in use? Well, there can be several reasons. They were all described very well back in 2014 in the Virten blog post "Cannot remove datastore * because file system is busy."

Here is Virten's LUN removal checklist:
  • No virtual machine, template, snapshot or CD/DVD image resides on the datastore
  • The datastore is not part of a Datastore Cluster
  • Storage I/O Control is disabled for the datastore
  • The datastore is not used for vSphere HA heartbeat
  • The LUN is not used as an RDM
  • The Datastore is not used as a scratch location
  • The Datastore is not used as VMkernel Dump file location (/vmkdump/)
  • The Datastore is not used as active vsantraced location (/vsantrace/)
  • The Datastore is not used to store VM swap files.
The root cause of my issue was the scratch location. I blogged about this topic back in 2012 in "Set the Scratch Partition from the vSphere Client".

When you have another datastore available on the ESXi host, the solution is very easy: simply change the scratch location. It is much trickier if you do not have any alternative datastore. Fortunately, in my home lab I have three Synology NAS boxes serving shared datastores over NFS and iSCSI, so the fix was quick. If you need to do this on more than a few ESXi hosts, a PowerCLI script can be handy.
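As an alternative to PowerCLI, here is a minimal Python sketch that only generates the commands to run in each host's ESXi shell; it does not connect to anything. It assumes the scratch location is controlled by the ScratchConfig.ConfiguredScratchLocation advanced setting and that the change takes effect after a host reboot. The host names, the datastore name, and the .locker-<host> directory naming are hypothetical placeholders.

```python
import shlex

# Advanced option that holds the persistent scratch location on ESXi.
SCRATCH_OPTION = "/ScratchConfig/ConfiguredScratchLocation"


def scratch_commands(hosts, datastore):
    """Build, per host, the shell commands that create a scratch directory
    on `datastore` and point the host's scratch location at it."""
    commands = {}
    for host in hosts:
        # One directory per host so hosts sharing a datastore do not collide.
        path = f"/vmfs/volumes/{datastore}/.locker-{host}"
        commands[host] = [
            f"mkdir -p {shlex.quote(path)}",
            f"esxcli system settings advanced set -o {SCRATCH_OPTION} -s {shlex.quote(path)}",
        ]
    return commands


# Hypothetical host and datastore names; replace them with your own.
plan = scratch_commands(["esx21", "esx22"], "NFS-SYNOLOGY-01")
for host, cmds in plan.items():
    print(f"# {host}")
    print("\n".join(cmds))
```

Generating the commands first, instead of executing them directly, lets you review what will change on each host before you touch it.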

If you do not have any other datastore and you need to remove the VMFS datastore, you have two options:

  1. Boot the computer into an alternative system (Linux, FreeBSD, etc.) and destroy the MBR or GPT partition table on the particular disk device, with something like gpart destroy -F /dev/ad0 in FreeBSD.
  2. Physically remove the disk from the computer. When you boot it up, ESXi should automatically fall back to a temporary scratch location (assuming you don't have any other datastores available on that box). You can then reinsert the disk and cleanly remove the datastore from the ESXi host.
