Wednesday, April 06, 2016

ESXi host vCPU/pCPU reporting via PowerCLI to LogInsight

Some time ago I had a discussion with one of my customers about how to achieve a 1:1 vCPU/pCPU ratio on their Tier 1 cluster. Unfortunately, there is no out-of-the-box vSphere policy to achieve it. You can try to use vSphere HA Cluster admission control with advanced settings to meet such a requirement, but it is based on CPU reservations in MHz, so the settings would be tricky anyway and would carry additional risks, for example after a physical server hardware replacement.

At the end of the day we agreed that the goal can be achieved by a Monitoring and Capacity Planning process. One could probably leverage VMware vRealize Operations Manager (aka vROps) or a similar monitoring platform, but because my customer does not have vROps and I'm not a vROps expert, I realized there is a very simple alternative.

Let's leverage PowerShell/PowerCLI to report the vCPU/pCPU ratio of ESXi hosts.

As you can see in the script below, it is a pretty easy task to prepare a PowerCLI report; however, the question is how to visualize it and send alerts when a threshold is exceeded.

And that's another simple idea. Why not leverage vRealize LogInsight?

All my readers most probably know what VMware's LogInsight (LI) is, but just in case: LI is a highly available and scalable syslog server appliance whose main business value is its excellent reporting capabilities from unstructured data. I don't want to describe LogInsight in this blog post, but another interesting feature of LI is that, besides syslog messages, it also accepts JSON messages sent via API. For more details look here.

So the whole solution is conceptually pretty easy. Below is the high-level process.

  1. PowerCLI: Go through each ESXi host and calculate the vCPU/pCPU ratio
  2. PowerCLI: Compose a message including the vCPU/pCPU ratio together with additional context information like timestamp, cluster name, ESXi name, and the number of vCPUs and pCPUs
  3. PowerCLI: Send the message to LogInsight via REST API
  4. LogInsight: Prepare custom analytics and create a dashboard
  5. LogInsight: Create an alert to send an e-mail message or trigger a web hook when the threshold is exceeded

The latest script version is at GitHub. Below is the complete PowerCLI script ...

 #################################  
 # vCenter Server configuration  
 #  
 $vcenter = "vc01.home.uw.cz"  
 $vcenteruser = "readonly"  
 $vcenterpw = "readonly"  
 $loginsight = "192.168.4.51"  
 #################################  
   
 $o = Add-PSSnapin VMware.VimAutomation.Core  
 $o = Set-PowerCLIConfiguration -InvalidCertificateAction Ignore -Confirm:$false  
   
 #################################  
 # Connect to vCenter Server  
 $vc = connect-viserver $vcenter -User $vcenteruser -Password $vcenterpw  
   
 #################################  
 # Send Message to LogInsight  
 function Send-LogInsightMessage ([string]$ip, [string]$message)  
 {  
  $uri = "http://" + $ip + ":9000/api/v1/messages/ingest/1"  
  $content_type = "application/json"  
  # Let ConvertTo-Json escape quotes and backslashes in the message text  
  $body = @{ messages = @(@{ text = $message }) } | ConvertTo-Json -Depth 3 -Compress  
  $r = Invoke-RestMethod -Uri $uri -ContentType $content_type -Method Post -Body $body  
 }  
   
 #################################  
 # Count vCPU/pCPU Ratio  
 foreach ($esx in (Get-VMHost | Sort-Object Name)) {  
  $pCPUs = $esx.NumCpu  
  $vCPUs = ($esx | get-vm | Measure-Object -Sum NumCPU).Sum  
  $CPU_ratio = $vCPUs / $pCPUs  
  $date = (Get-Date).ToUniversalTime()  
  $cluster_name = get-cluster -VMHost $esx  
   
  # $($esx.Name) is needed so that the Name property, not the whole object, is expanded inside the string  
  $message = "UTC date time: $date Cluster: $cluster_name ESX name: $($esx.Name) pCPUs: $pCPUs vCPUs: $vCPUs vCPU/pCPU ratio: $CPU_ratio"  
  Write-Output $message  
  Send-LogInsightMessage $loginsight $message  
 }  

 disconnect-viserver -Server $vc -Force -Confirm:$false  

The PowerCLI script running in my home lab generates the messages shown below ...

 PS C:\Users\Administrator\Documents\scripts> .\Cluster_hosts_vCPU_pCPU_report.ps1  
 UTC date time: 04/06/2016 12:49:32 Cluster: Cluster ESX name: esx01.home.uw.cz pCPUs: 2 vCPUs: 18 vCPU/pCPU ratio: 9  
 UTC date time: 04/06/2016 12:49:33 Cluster: Cluster ESX name: esx02.home.uw.cz pCPUs: 2 vCPUs: 12 vCPU/pCPU ratio: 6  
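
The messages above are plain text, so LogInsight has to extract the ratio from the text before it can chart it. As far as I understand the ingestion API, a message can also carry explicit name/content fields next to the text. Below is a minimal sketch of how such a JSON body could be built. The function name New-LogInsightPayload and the field names are my own assumptions for illustration, and the "fields" layout should be verified against the API documentation of your LogInsight version.

```powershell
# Hypothetical helper (not part of the original script): builds a JSON body
# that carries the ratio both as text and as explicit fields.
# ASSUMPTION: the ingestion API accepts "fields" as name/content pairs.
function New-LogInsightPayload {
    param(
        [string]$Cluster,
        [string]$EsxName,
        [int]$pCPUs,
        [int]$vCPUs
    )

    $ratio = $vCPUs / $pCPUs

    $payload = @{
        messages = @(
            @{
                text   = "Cluster: $Cluster ESX name: $EsxName pCPUs: $pCPUs vCPUs: $vCPUs vCPU/pCPU ratio: $ratio"
                fields = @(
                    @{ name = 'cluster';         content = $Cluster },
                    @{ name = 'esx_name';        content = $EsxName },
                    @{ name = 'pcpus';           content = "$pCPUs" },
                    @{ name = 'vcpus';           content = "$vCPUs" },
                    @{ name = 'vcpu_pcpu_ratio'; content = "$ratio" }
                )
            }
        )
    }

    # -Depth 5 keeps the nested arrays and hashtables from being flattened to strings
    return ($payload | ConvertTo-Json -Depth 5 -Compress)
}

# Example with the values of the first host from the output above
$body = New-LogInsightPayload -Cluster 'Cluster' -EsxName 'esx01.home.uw.cz' -pCPUs 2 -vCPUs 18
Write-Output $body
```

The resulting string can be passed as -Body to Invoke-RestMethod in the same way the script does.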

I use scheduled tasks to send these messages periodically to LogInsight. You can see the LogInsight messages in the screenshot below ...

LogInsight Interactive Analysis
It is very simple to create a dashboard from the analytics ...

LogInsight vCPU/pCPU Dashboard

And the last task is to create an alert in LogInsight that fires when the vCPU/pCPU ratio is higher than 1. If you want to be informed a little bit earlier, you can set the alert to fire when the ratio is higher than 0.8 ...
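
The same two thresholds can also be mirrored on the PowerCLI side, for example to append a status keyword to the message that a LogInsight query can match on. This is only a sketch under my own assumptions: the function name Get-CpuRatioStatus and the keywords OK, WARNING and EXCEEDED are hypothetical and are not part of the script above.

```powershell
# Hypothetical helper: classifies a host by its vCPU/pCPU ratio.
# The defaults mirror the thresholds mentioned above (0.8 for an early
# warning, 1 for the 1:1 limit).
function Get-CpuRatioStatus {
    param(
        [int]$pCPUs,
        [int]$vCPUs,
        [double]$WarnAt = 0.8,
        [double]$LimitAt = 1.0
    )

    $ratio = $vCPUs / $pCPUs

    if ($ratio -gt $LimitAt)    { return 'EXCEEDED' }
    elseif ($ratio -gt $WarnAt) { return 'WARNING' }
    else                        { return 'OK' }
}

# 18 vCPUs on 2 pCPUs gives a ratio of 9, which is above the 1:1 limit
Get-CpuRatioStatus -pCPUs 2 -vCPUs 18
# 14 vCPUs on 16 pCPUs gives 0.875, which is above 0.8 but not above 1
Get-CpuRatioStatus -pCPUs 16 -vCPUs 14
# 12 vCPUs on 16 pCPUs gives 0.75, which is below both thresholds
Get-CpuRatioStatus -pCPUs 16 -vCPUs 12
```

The actual alerting still happens in LogInsight; the keyword only makes the alert query simpler to write.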


Pretty easy, right?
Hope this helps the broader VMware community.

And as always, any comments and thoughts are very welcome.
