Thursday, March 03, 2022

How to get vSAN Health Check state in a machine-friendly format

I have a customer with dozens of vSAN clusters managed and monitored by vRealize Operations (aka vROps). vROps has a management pack for vSAN, but it does not provide all the features my customer expects for day-to-day operations. vSAN has a great feature called vSAN Skyline Health, which is essentially a test framework that periodically checks the health of the vSAN environment. Unfortunately, vSAN Skyline Health is not integrated with vROps, which may or may not change in the future. Nevertheless, my customer has to operate the vSAN infrastructure today, so we are investigating options for building a custom integration between vSAN Skyline Health and vROps.

The first problem to solve is how to get the vSAN Skyline Health status in a machine-friendly format. vSAN can be managed via ESXCLI, so that is a natural place to start.

Using ESXCLI output

Many ESXCLI commands generate output that you might want to consume in your own application. You can run esxcli with the --formatter option and feed the resulting output to a custom parser script.
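A minimal Python sketch of that idea, assuming it runs on a machine where esxcli is on the PATH (for example, the ESXi shell) with Python 3.5 or later. The helper names build_esxcli_cmd and run_esxcli are hypothetical; they only wrap the standard subprocess.run call.

```python
import subprocess
from typing import List

# The two formatters used in this post; esxcli may accept others.
SUPPORTED_FORMATTERS = ("keyvalue", "xml")


def build_esxcli_cmd(formatter: str, *args: str) -> List[str]:
    """Build an esxcli argument list with an explicit output formatter."""
    if formatter not in SUPPORTED_FORMATTERS:
        raise ValueError("unsupported formatter: " + formatter)
    return ["esxcli", "--formatter=" + formatter, *args]


def run_esxcli(formatter: str, *args: str) -> str:
    """Run esxcli and return its stdout as text.

    Raises subprocess.CalledProcessError if esxcli exits with a non-zero code.
    """
    result = subprocess.run(
        build_esxcli_cmd(formatter, *args),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode bytes to str
        check=True,
    )
    return result.stdout
```

For example, run_esxcli("xml", "vsan", "health", "cluster", "list") returns the raw XML text, ready to be handed to a parser.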

The following ESXCLI commands return the vSAN Health Check status in the default, key-value, and XML formats, respectively:

esxcli vsan health cluster list
esxcli --formatter=keyvalue vsan health cluster list
esxcli --formatter=xml vsan health cluster list

The --formatter option returns the output in machine-friendly formats suitable for automated processing.
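As an illustration, here is a sketch of a parser for the --formatter=xml output. It assumes the usual esxcli XML layout (a namespaced <output> document containing <structure> elements, each with <field name="..."> children that wrap a typed value such as <string>) and flat structures without nesting. The field names themselves depend on the ESXi version, so inspect your own output before relying on specific keys; parse_esxcli_xml is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def _local(tag: str) -> str:
    """Strip a namespace prefix such as '{http://...}field' from a tag."""
    return tag.rsplit("}", 1)[-1]


def parse_esxcli_xml(xml_text: str) -> List[Dict[str, str]]:
    """Turn esxcli --formatter=xml output into one dict per <structure>."""
    root = ET.fromstring(xml_text)
    records = []
    for elem in root.iter():
        if _local(elem.tag) != "structure":
            continue
        record = {}
        for field in elem:
            if _local(field.tag) != "field":
                continue
            # The value is the text of the first child (<string>, <integer>, ...).
            value_elem = next(iter(field), None)
            text = value_elem.text if value_elem is not None else None
            record[field.get("name", "")] = (text or "").strip()
        records.append(record)
    return records
```

Matching on the local tag name keeps the parser independent of the exact namespace URI in the output.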

To get a detailed description of a particular health check, use the following command:

esxcli vsan health cluster get -t "vSAN: MTU check (ping with large packet size)"

The -t option specifies the name of the vSAN Health Check test to show.
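One way to combine the two commands is to fetch details only for the tests that are not green. The sketch below assumes the list output has already been parsed into dicts; the keys TestName and Health are hypothetical placeholders for whatever field names your esxcli version emits. Building an argument list (instead of a shell string) means the test name, with its spaces and parentheses, needs no extra quoting.

```python
from typing import Dict, Iterable, List

# Hypothetical field names; check them against your own esxcli output.
NAME_FIELD = "TestName"
HEALTH_FIELD = "Health"


def detail_commands(records: Iterable[Dict[str, str]]) -> List[List[str]]:
    """Return one 'esxcli vsan health cluster get -t <name>' argv per non-green test."""
    commands = []
    for rec in records:
        status = rec.get(HEALTH_FIELD, "").strip().lower()
        # Only the first word is compared, in case the status carries extra text.
        if status.split(" ", 1)[0] != "green":
            commands.append(
                ["esxcli", "vsan", "health", "cluster", "get", "-t", rec[NAME_FIELD]]
            )
    return commands
```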

Example output of a single vSAN Health Check:

[root@esx11:~] esxcli vsan health cluster get -t "vSAN: MTU check (ping with large packet size)"

vSAN: MTU check (ping with large packet size) green
Performs a ping test with large packet size from each host to all other hosts.
Ask VMware: http://www.vmware.com/esx/support/askvmware/index.php?eventtype=com.vmware.vsan.health.test.largepin...
Only failed pings
From Host        To Host          To Device  Ping result
--------------------------------------------------------
Ping results
From Host        To Host          To Device  Ping result
----------------------------------------------------------------------
192.168.162.111  192.168.162.114  vmk0       green
192.168.162.111  192.168.162.113  vmk0       green
192.168.162.111  192.168.162.112  vmk0       green
192.168.162.112  192.168.162.111  vmk0       green
192.168.162.112  192.168.162.113  vmk0       green
192.168.162.112  192.168.162.114  vmk0       green
192.168.162.113  192.168.162.114  vmk0       green
192.168.162.113  192.168.162.112  vmk0       green
192.168.162.113  192.168.162.111  vmk0       green
192.168.162.114  192.168.162.111  vmk0       green
192.168.162.114  192.168.162.112  vmk0       green
192.168.162.114  192.168.162.113  vmk0       green
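The default (human-readable) output of a detailed test can also be parsed when needed. The sketch below handles only the "Ping results" table of the MTU check shown above, assuming each row consists of four whitespace-separated columns; other tests print different tables, so treat parse_ping_results and failed_pings as hypothetical, test-specific helpers.

```python
from typing import List, NamedTuple


class PingResult(NamedTuple):
    """One row of the 'Ping results' table."""
    from_host: str
    to_host: str
    to_device: str
    result: str


def parse_ping_results(text: str) -> List[PingResult]:
    """Extract the rows listed under the 'Ping results' heading."""
    rows: List[PingResult] = []
    in_section = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "Ping results":
            in_section = True
            continue
        if not in_section:
            continue  # skip everything before the full results table
        if not stripped:
            if rows:
                break  # a blank line after the rows ends the table
            continue
        if stripped.startswith("From Host") or set(stripped) == {"-"}:
            continue  # column header and separator line
        parts = stripped.split()
        if len(parts) == 4:
            rows.append(PingResult(*parts))
    return rows


def failed_pings(rows: List[PingResult]) -> List[PingResult]:
    """Return the rows whose result is anything other than green."""
    return [row for row in rows if row.result != "green"]
```

Splitting on whitespace makes the parser indifferent to how wide the columns are padded.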

Conclusion

This quick exercise shows how to retrieve the vSAN Skyline Health status programmatically via ESXCLI. The output can then be parsed and pushed through the vROps REST API into vSAN Cluster objects as metrics. There is also a PowerShell/PowerCLI way to leverage ESXCLI for custom automation; however, it is out of the scope of this blog post.
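To hint at the vROps side, here is a sketch that converts test statuses into a metrics payload. It assumes the "stat-content" body layout used by the vROps Suite API for adding stats to a resource (POST /suite-api/api/resources/{id}/stats); verify the endpoint and schema against the API reference of your vROps version. The numeric encoding (green=0, yellow=1, red=2), the vSANHealth prefix, and the key sanitizing are hypothetical choices made only so that vROps can chart and alert on the values.

```python
import re
import time
from typing import Dict, Optional

# Hypothetical numeric encoding so vROps can chart and alert on the status.
STATUS_VALUES = {"green": 0, "yellow": 1, "red": 2}


def stat_key(test_name: str, prefix: str = "vSANHealth") -> str:
    """Build a metric key; '|' separates metric groups in vROps stat keys."""
    safe = re.sub(r"[^A-Za-z0-9]+", "_", test_name).strip("_")
    return prefix + "|" + safe


def build_stat_payload(statuses: Dict[str, str],
                       timestamp_ms: Optional[int] = None) -> dict:
    """Build a 'stat-content' body with one metric per health test."""
    ts = timestamp_ms if timestamp_ms is not None else int(time.time() * 1000)
    stat_content = []
    for test_name, status in sorted(statuses.items()):
        # Unknown status strings are reported as -1 rather than dropped.
        value = STATUS_VALUES.get(status.strip().lower(), -1)
        stat_content.append({
            "statKey": stat_key(test_name),
            "timestamps": [ts],
            "data": [value],
        })
    return {"stat-content": stat_content}
```

The resulting dict can be serialized with json.dumps and sent by an authenticated HTTP client; the resource ID of the vSAN Cluster object has to be looked up first.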
