Friday, March 23, 2018

Deploying vCenter High Availability with network addresses in separate subnets

VMware vCenter High Availability is a very interesting feature introduced in vSphere 6.5. It provides higher availability of the vCenter service by having three vCenter nodes (Active/Passive/Witness) serving a single vCenter instance.

This is how the official vCenter HA documentation describes it:
vCenter High Availability (vCenter HA) protects vCenter Server Appliance against host and hardware failures. The active-passive architecture of the solution can also help you reduce downtime significantly when you patch vCenter Server Appliance.
After some network configuration, you create a three-node cluster that contains Active, Passive, and Witness nodes. Different configuration paths are available.   
The last sentence is very true. The simplest vCenter HA deployment is within a single SSO domain and a single datacenter with two Layer 2 networks, one for management and a second for the heartbeat. Such a design can be deployed in a fully automated manner; you just need to provide a dedicated network (portgroup/VLAN) for the heartbeat and use three IP addresses from a separate heartbeat subnet. Easy. But is that what you expect from vCenter HA? To be honest, the much more attractive use case is to spread the vCenter HA nodes across three datacenters to keep vSphere management up and running even if one of the datacenters experiences an issue. Conceptually, it is depicted in the figure below.

Conceptual vCenter HA Design
In this particular concept, I have used embedded PSCs for simplicity; this way vCenter HA also increases the availability of the PSC services. The most interesting challenge in this concept is networking, so let's look at the intended network logical design.

vCenter HA - networking logical design
Networking logical design:

  • Each vCenter Server Appliance node has two NICs
  • One NIC is connected to the management network and the second NIC to the heartbeat network
  • The Layer 2 management network (VLAN 4) is stretched across datacenters A and B because the vCenter IP address must keep working in datacenter B after a VCHA fail-over without human intervention.
  • In each datacenter there is an independent heartbeat network (VCHA-HB-A, VCHA-HB-B, VCHA-HB-C) with a different IP subnet, so that Layer 2 is not stretched across datacenters, especially not to datacenter C where the witness resides. This requires specific static routes in each vCenter Server Appliance node to achieve IP reachability over the heartbeat network (a minimal sketch follows after this list).
  • Specific VCHA TCP/UDP ports must be allowed among the VCHA nodes across the heartbeat network.
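For illustration, the static routes can be persisted in the heartbeat NIC's systemd-networkd configuration file. The sketch below is for a node in datacenter A and assumes the heartbeat NIC is eth1, the heartbeat subnets are 192.168.101.0/24 (A), 192.168.102.0/24 (B) and 192.168.103.0/24 (C), and 192.168.101.1 is the heartbeat gateway in datacenter A; all names and addresses are illustrative, the exact procedure is documented in KB 2148442.

# /etc/systemd/network/10-eth1.network (node in datacenter A; file name and values are illustrative)
[Match]
Name=eth1

[Network]
Address=192.168.101.11/24

# Static routes to the heartbeat subnets in datacenters B and C
[Route]
Destination=192.168.102.0/24
Gateway=192.168.101.1

[Route]
Destination=192.168.103.0/24
Gateway=192.168.101.1

After editing the file, restart networking with "systemctl restart systemd-networkd".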
Helpful documents:

Implementation Notes: 

Note 1:
VMware KB 2148442 (Deploying vCenter High Availability with network addresses in separate subnets) is very important for deploying such a design, but one piece of information is missing there. After cloning the vCenter Server Appliances, you have to go to the Passive node and configure on eth0 the same IP address that is used on the Active node. The configuration is in the file /etc/systemd/network/10-eth0.network.manual
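For illustration, the manually configured eth0 file on the Passive node could look similar to the sketch below; the address, prefix, and gateway are purely illustrative and must match whatever the Active node uses on eth0.

# /etc/systemd/network/10-eth0.network.manual (Passive node; values are illustrative)
[Match]
Name=eth0

[Network]
# Same management IP as the Active node's eth0
Address=192.168.4.50/24
Gateway=192.168.4.1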
Note 2:
In case of a badly destroyed VCHA cluster, use the following commands to destroy VCHA from the command line:
cd /etc/systemd/network
mv 10-eth0.network.manual 20-eth0.network
destroy-vcha
reboot
The solution was found at https://communities.vmware.com/thread/552084
Link to the official documentation (Resolving Failover Failures) - https://docs.vmware.com/en/VMware-vSphere/6.5/com.vmware.vsphere.avail.doc/GUID-FE5106A8-5FE7-4C38-91AA-D7140944002D.html

Note 3:
In case you see the MethodFault.summary error message during the finalization process, it is because a hostname mismatch was detected. The hostname assigned to the Passive node must be the same as the hostname of the Active node. The solution was found at https://www.altaro.com/vmware/how-to-deploy-a-vcenter-ha-cluster-part-2/ but it is also written in KB https://kb.vmware.com/kb/2148442
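A quick way to spot such a mismatch is to compare the hostname reported by both nodes from the appliance shell, for example with the standard systemd tool (assuming it is available in your appliance build):

# Run on both the Active and the Passive node; the reported hostnames should match
hostnamectl status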

Friday, March 09, 2018

How to check I/O device on VMware HCL

VMware's Hardware Compatibility List (HCL) of supported I/O devices is available here:
https://www.vmware.com/resources/compatibility/search.php?deviceCategory=io

VMware HCL for I/O devices

The best way to identify an I/O device is by its VID (Vendor ID), DID (Device ID), SVID (Sub-Vendor ID), and SSID (Sub-Device ID). VID, DID, SVID, and SSID can simply be entered into the VMware HCL and you will find out whether the device is supported and what capabilities have been tested. You can also find the supported firmware and driver versions.

To get these identifiers, you have to log in to ESXi via SSH and use the command "vmkchdev -l". This command shows VID:DID SVID:SSID for PCI devices, and you can use grep to filter just the VMware NICs (aka vmnic):
vmkchdev -l | grep vmnic
You should get output similar to this:


[dpasek@esx01:~] vmkchdev -l | grep vmnic
0000:02:00.0 14e4:1657 103c:22be vmkernel vmnic0
0000:02:00.1 14e4:1657 103c:22be vmkernel vmnic1
0000:02:00.2 14e4:1657 103c:22be vmkernel vmnic2
0000:02:00.3 14e4:1657 103c:22be vmkernel vmnic3
0000:05:00.0 14e4:168e 103c:339d vmkernel vmnic4
0000:05:00.1 14e4:168e 103c:339d vmkernel vmnic5
0000:88:00.0 14e4:168e 103c:339d vmkernel vmnic6
0000:88:00.1 14e4:168e 103c:339d vmkernel vmnic7

So, in the case of vmnic4:
  • VID:DID SVID:SSID
  • 14e4:168e 103c:339d
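If you want just the identifiers on one line per NIC, a small one-liner like the following can help; it is only a sketch and assumes awk is available in the ESXi shell, which it normally is.

vmkchdev -l | grep vmnic | awk '{print $5": "$2" "$3}'
# vmnic0: 14e4:1657 103c:22be
# vmnic4: 14e4:168e 103c:339d
# ... (one line per vmnic)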


The same applies to HBAs and disk controllers. For HBAs and local disk controllers, use
vmkchdev -l | grep vmhba
This is the output from my Intel NUC in my home lab:

[root@esx02:~] vmkchdev -l | more
0000:00:00.0 8086:0a04 8086:2054 vmkernel
0000:00:02.0 8086:0a26 8086:2054 vmkernel
0000:00:03.0 8086:0a0c 8086:2054 vmkernel
0000:00:14.0 8086:9c31 8086:2054 vmkernel vmhba32
0000:00:16.0 8086:9c3a 8086:2054 vmkernel
0000:00:19.0 8086:1559 8086:2054 vmkernel vmnic0
0000:00:1b.0 8086:9c20 8086:2054 vmkernel
0000:00:1d.0 8086:9c26 8086:2054 vmkernel
0000:00:1f.0 8086:9c43 8086:2054 vmkernel
0000:00:1f.2 8086:9c03 8086:2054 vmkernel vmhba0
0000:00:1f.3 8086:9c22 8086:2054 vmkernel

vmhba0 is the local disk controller
vmhba32 is the USB storage controller
vmnic0 is the network interface
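If you prefer esxcli, the same PCI identifiers, together with vendor/device names and the loaded driver module, can be cross-checked with the command below; the exact field names may differ slightly between ESXi versions.

esxcli hardware pci list | more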

Hope this helps.