Tuesday, June 15, 2021

vSphere 7 - ESXi boot media partition layout changes

VMware vSphere 7 is a major product release with a lot of design and architectural changes. Among these changes, VMware also reviewed and changed the layout of ESXi 7 storage partitions on boot devices. This change has some design implications, which I try to cover in this blog post.

Note: Please be aware that almost all information in this blog post is sourced from external resources such as VMware documentation, VMware KB articles, VMware blog posts, and VMware community blog posts.

Let's start with ESXi 7 Storage Requirements

Here is the list of boot device storage requirements from VMware documentation - source [2]:
  • Installing ESXi 7.0 requires a boot device that is a minimum of 8 GB for USB or SD devices, and 32 GB for other device types.
  • Upgrading to ESXi 7.0 requires a boot device that is a minimum of 4 GB. 
  • When booting from a local disk, SAN or iSCSI LUN, a 32 GB disk is required to allow for the creation of system storage volumes, which include a boot partition, boot banks, and a VMFS-L based ESX-OSData volume. 
  • The ESX-OSData volume takes on the role of the legacy /scratch partition, locker partition for VMware Tools, and core dump destination.
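
The requirements above can be turned into a quick sanity check. The following is only a sketch in POSIX shell: the function names min_boot_device_gb and boot_device_ok are made up for this illustration (they are not VMware tools), and the sizes are simply the minimums quoted from [2].

```shell
# Minimum boot device size in GB, per the requirements quoted from [2].
# $1 = operation:   install | upgrade
# $2 = device type: usb | sd | other (local disk, SAN or iSCSI LUN, ...)
min_boot_device_gb() {
    case "$1:$2" in
        upgrade:*)              echo 4 ;;
        install:usb|install:sd) echo 8 ;;
        install:*)              echo 32 ;;
        *) echo "unknown operation: $1" >&2; return 1 ;;
    esac
}

# Succeeds when a device of $3 GB meets the minimum for operation $1 on device type $2.
boot_device_ok() {
    need="$(min_boot_device_gb "$1" "$2")" || return 1
    [ "$3" -ge "$need" ]
}

boot_device_ok install sd 16 && echo "16 GB SD card: meets the install minimum"
boot_device_ok install other 16 || echo "16 GB local disk: below the 32 GB install minimum"
```

Keep in mind that these are only the documented minimums; the rest of this post explains why a larger and more durable device is the better design choice.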

Key changes between ESXi 6 and ESXi 7

Here are the key boot media partitioning changes between ESXi 6 and ESXi 7:
  • larger system boot partition
  • larger boot banks
  • introducing ESX OSData (ROM-data, RAM-data)
    • consolidation of coredump, tools and scratch into a single VMFS-L based ESX-OSData volume
    • coredumps default to a file in ESX-OSData
  • variable partition sizes based on boot media capacity

The biggest change to the partition layout is the consolidation of VMware Tools Locker, Core Dump and Scratch partitions into a new ESX-OSData volume (based on VMFS-L). This new volume can vary in size (up to 138GB). [4]

Official support for specifying the size of ESX-OSData was added in ESXi 7.0 Update 1c with a new ESXi kernel boot option called systemMediaSize, which takes one of four values [4]:

  • min = 25GB
  • small = 55GB
  • default = 138GB (default behavior)
  • max = Consumes all available space
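
As a small illustration, the helper below builds the boot option string and rejects anything other than the four values listed above. The function name system_media_size_opt is hypothetical; only the option name systemMediaSize and its values come from [4]. I'm assuming the resulting string is appended to the installer's kernel boot options (for example after pressing Shift+O at the installer boot prompt), so verify the exact procedure against [4] for your build.

```shell
# Build the kernel boot option, accepting only the four values listed in [4].
system_media_size_opt() {
    case "$1" in
        min|small|default|max) echo "systemMediaSize=$1" ;;
        *) echo "unsupported systemMediaSize value: $1" >&2; return 1 ;;
    esac
}

# Example: the text to append to the installer's kernel boot options.
system_media_size_opt min
```

The example call prints systemMediaSize=min.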

What is the ESX-OSData partition?

ESX-OSData is a new partition that stores ESXi configuration, system state, and system or agent virtual machines. The OSData partition is divided into two sections:

  1. ROM-data
  2. RAM-data

ROM-data is not read-only, as the name might imply; it is the section for data written to disk infrequently. Examples of such data are VMtools ISOs, ESXi configuration, core dumps, etc.

RAM-data is for frequently written data like logs, VMFS global traces, vSAN EPD and traces, and live system state files.

How has the partition layout changed?

Depicted below are the partition layout in vSphere 6.x and the consolidated partition layout in vSphere 7 [1].

Partition size variations

Partition sizes vary with the size of the boot device. The only fixed size is that of the system boot partition, which is always 100 MB. All other variations are depicted in the picture below [1].

Note: If you use USB or SD storage devices, the ESX-OSData partition is created on an additional storage device such as an HDD or SSD. When an additional storage device is not available, ESX-OSData is created on the USB or SD device, but it is used only to store ROM-data, while RAM-data is stored on a RAM disk. [1]

What design options do I have? 

ESX-OSData is used as the unified location to store Scratch, Core Dump, and ProductLocker data. By default, it is located on the boot media partition (ESX-OSData), but there are advanced settings that allow these types of data to be relocated to an external location.

Design Option #1 - Changing ScratchPartition location

In ESXi 7.0, a VMFS-L based ESX-OSData volume (where logs, coredumps and configuration are stored) replaces the traditional scratch partition. During upgrade, the configured scratch partition is converted to ESX-OSData. The settings described in VMware KB 1033696 [7] are still applicable for cases where you want to point the scratch path to another location. The relevant ESXi advanced setting is ScratchConfig.ConfiguredScratchLocation. I wrote a blog post about changing the scratch location here.
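
For illustration, here is a sketch of what the change could look like from the ESXi shell, assuming the vim-cmd syntax described in KB 1033696 [7]. The datastore name datastore1 and the host name esx01 are placeholders, and the run wrapper with DRY_RUN is my own addition so that the commands are only printed unless you explicitly set DRY_RUN=0 on a host. Treat it as an untested sketch, not a verified procedure.

```shell
# DRY_RUN=1 (the default here) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

set_scratch_location() {
    # $1 = datastore name, $2 = host name (both placeholders)
    scratch_dir="/vmfs/volumes/$1/.locker-$2"
    # Create a per-host scratch directory on the datastore.
    run mkdir -p "$scratch_dir"
    # Point the advanced setting at the new directory.
    run vim-cmd hostsvc/advopt/update ScratchConfig.ConfiguredScratchLocation string "$scratch_dir"
    # Show the configured value for verification.
    run vim-cmd hostsvc/advopt/view ScratchConfig.ConfiguredScratchLocation
}

set_scratch_location datastore1 esx01
```

As far as I understand the KB, the new scratch location takes effect only after the host is rebooted, and each host should get its own directory.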

Design Option #2 - Create a core dump file on a datastore

The core dump location can also be changed. To create a core dump file on a datastore, see KB article 2077516 [8].
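
A minimal sketch of the esxcli calls involved, assuming the esxcli system coredump file namespace described in the KB [8]. The datastore name datastore1 and the file name esx01-coredump are placeholders; the run wrapper with DRY_RUN is my own addition so that nothing is executed unless DRY_RUN=0 is set on a host.

```shell
# DRY_RUN=1 (the default here) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

configure_coredump_file() {
    # $1 = datastore name, $2 = dump file name (both placeholders)
    # Create the dump file on the datastore.
    run esxcli system coredump file add -d "$1" -f "$2"
    # Let ESXi pick and activate a suitable configured dump file.
    run esxcli system coredump file set --smart --enable true
    # List dump files to check which one is active and configured.
    run esxcli system coredump file list
}

configure_coredump_file datastore1 esx01-coredump
```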

Design Option #3 - Changing ProductLocker location

To change the productLocker location from boot media to a directory on a datastore, see VMware KB article 2129825 [10].

Applying all three options above can significantly reduce I/O operations to boot media with lower endurance, such as USB flash drives or SD cards. However, the hardware industry has improved over the last years, and nowadays we have new boot media options such as SATA-DOM, M.2 slots for SSDs, or low-cost NVMe (PCIe SSD).

Note: I have not tested the above design options in my lab; I'm assuming they work as expected based on the VMware KBs referred to in each option.
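
A sketch of how the setting could be changed from the ESXi shell, assuming the advanced option /UserVars/ProductLockerLocation and that the VMware Tools files have already been copied to the target directory. The path /vmfs/volumes/datastore1/productLocker is a placeholder, and the run wrapper with DRY_RUN is my own addition so that the commands are only printed by default. Check the KB [10] for the supported procedure on your ESXi version.

```shell
# DRY_RUN=1 (the default here) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

set_product_locker() {
    # $1 = directory on a datastore holding the VMware Tools files (placeholder)
    run esxcli system settings advanced set -o /UserVars/ProductLockerLocation -s "$1"
    # Show the option afterwards to verify the configured value.
    run esxcli system settings advanced list -o /UserVars/ProductLockerLocation
}

set_product_locker /vmfs/volumes/datastore1/productLocker
```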

Other known problems you can observe when using USB or SD media

There are other known issues with using USB or SD as boot media, but some of these issues are already addressed or will be addressed in future patches, as USB and SD media are officially supported.
 
 I'm aware of these issues:
  • ESXi hosts experiences All Paths Down events on USB based SD Cards while using the vmkusb driver [5] [15]
    • Luciano Patrao blogged about this (or a similar) issue at [14] and found a workaround to use until the final VMware fix, which should be released in ESXi 7.0 U3. Luciano's workaround is to:
      1. Log in to the ESXi console (SSH or DCUI).
      2. Execute the command "esxcfg-rescan -d vmhba32" several times until it finishes without an error.
      3. Wait a few minutes between reruns of the command. Be patient and try again after 2 to 5 minutes.
      4. After all errors are gone and the command finishes without any error, you should see in the logs that “mpx.vmhba32:C0:T0:L0” was mounted in rw mode, and you should be able to do some work on the ESXi host again.
      5. If you still have some issues, restart the management agents:
        • /etc/init.d/hostd restart
        • /etc/init.d/vpxa restart   
      6. After this, you should be able to migrate your VMs to another ESXi host and reboot this one. That holds until it breaks again, for example when someone tries to use VMtools.
  • VMFS-L Locker partition corruption on SD cards in ESXi 7.0 U1 and U2 [6] (should be fixed in a future ESXi patch)
  • High frequency of read operations on VMware Tools image may cause SD card corruption [12]
    • This issue was addressed in ESXi 6.7 U3: changes were made to reduce the number of read operations sent to the SD card, and an advanced parameter was introduced that allows you to migrate the VMware Tools image to a ramdisk on boot. This way, the information is read only once from the SD card per boot cycle.
      • However, it seems that the problem reoccurred in ESXi 7.x, because the ToolsRamdisk option is not available with ESXi 7.0.x releases [13]
    • The other vSphere design solution is, IMHO, changing the ProductLocker location as mentioned above, because the VMtools image is then no longer located on boot media.
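
The rescan-and-wait part of Luciano's workaround (steps 2 and 3 above) can be sketched as a small retry loop. This is my own illustration, not part of [14]: the function name rescan_until_ok, the 180-second pause and the limit of 10 attempts are assumptions, and the adapter name vmhba32 is taken from the workaround and may differ on your host. The block only defines the function; nothing is executed until you call it.

```shell
# Command to retry; vmhba32 comes from the workaround in [14] and may differ per host.
RESCAN_CMD="${RESCAN_CMD:-esxcfg-rescan -d vmhba32}"
# Pause between attempts in seconds (assumed value inside the suggested 2 to 5 minutes).
WAIT_SECONDS="${WAIT_SECONDS:-180}"
# Upper bound on attempts (an assumption, not from the source).
MAX_ATTEMPTS="${MAX_ATTEMPTS:-10}"

rescan_until_ok() {
    attempt=1
    while [ "$attempt" -le "$MAX_ATTEMPTS" ]; do
        # Stop as soon as the rescan finishes without an error.
        if $RESCAN_CMD; then
            echo "rescan succeeded on attempt $attempt"
            return 0
        fi
        echo "rescan failed on attempt $attempt" >&2
        # Wait before the next attempt, but not after the last one.
        if [ "$attempt" -lt "$MAX_ATTEMPTS" ]; then
            sleep "$WAIT_SECONDS"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}
```

On an affected host you would run rescan_until_ok from an SSH session and, if problems persist afterwards, continue with the management agent restart from step 5.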

Conclusion

ESXi 7 uses the ESX-OSData partition for various logging and debugging files. In addition, if vSAN and/or NSX is enabled on ESXi, there are additional trace files leading to even higher I/O. This ESXi system behavior requires higher endurance of boot media than in the past.

If you are defining a new hardware specification, it is highly recommended to use larger boot media (~150 GB or more) based on NAND flash technology and connected through modern buses like M.2 or PCI-e. When larger boot media is in use, ESXi 7 will do all the magic required for correct partitioning of ESX boot media.

In the case of existing hardware and no budget for an additional hardware upgrade, you can still use SD cards or USB drives, but you should carefully design the boot media layout and consider relocating Scratch, Core Dump, and ProductLocker to external locations to mitigate the risk of boot media failure.

I hope this write-up helps. If you have any other findings or comments, do not hesitate to let me know via the comments below the post, Twitter, or email.

Sources:

2 comments:

Georgi said...

hi there, do you know if vmware will be deprecating stateless auto-deployed hosts? thats how it looks from the article for diskless hosts and the documentation for auto deploy for vsphere 7 just lists how to move FROM stateless to the other two modes, stateful or stateless caching. thanks in advance

David Pasek said...

Hi.
I cannot share any internal plans or roadmap publicly on the internet. I recommend contacting your VMware representative (SE, TAM, CSM, PM, etc.) and asking for the product roadmap. You will need a signed NDA before the roadmap is delivered to you.

However, I can share some facts.
Fact #1: vSphere Lifecycle Management (vLCM) requires stateful hosts with a local disk.
Fact #2: VMware highly recommends using vLCM for vSphere 7.

Make your own decision about your next server specification.
If I needed to buy a new host today, I would order it with durable boot media of 250 GB or more. This is just my personal opinion and your mileage may vary.