Friday, March 23, 2018

Deploying vCenter High Availability with network addresses in separate subnets

VMware vCenter High Availability (VCHA) is a very interesting feature included in vSphere 6.5. In general, it increases the availability of the vCenter service by running three vCenter nodes (active, passive, and witness) that together deliver a single vCenter service.

The official vCenter HA documentation describes it as follows:
vCenter High Availability (vCenter HA) protects vCenter Server Appliance against host and hardware failures. The active-passive architecture of the solution can also help you reduce downtime significantly when you patch vCenter Server Appliance.
After some network configuration, you create a three-node cluster that contains Active, Passive, and Witness nodes. Different configuration paths are available.   
The last sentence is very true. The simplest VCHA deployment is within a single SSO domain and a single datacenter with two Layer 2 networks, one for management and the second for the heartbeat. Such a design can be deployed in a fully automated manner: you just need to provide a dedicated network (portgroup/VLAN) for the heartbeat and use 3 IP addresses from the separate heartbeat subnet. Easy. But is that what you expect from vCenter HA? To be honest, the much more attractive use case is to spread the vCenter HA nodes across three datacenters to keep vSphere management up and running even when one of the two datacenters experiences an issue. Conceptually, it is depicted in the figure below.

Conceptual vCenter HA Design
In this particular concept, I use embedded PSCs for simplicity, and vCenter HA then increases the availability of the PSC services as well. The most interesting challenge in this concept is networking, so let's look at the intended logical network design.

vCenter HA - networking logical design
Networking logical design:

  • Each vCenter Server Appliance node has two NICs
  • One NIC is connected to the management network and the second NIC to the heartbeat network
  • The Layer 2 management network (VLAN 4) is stretched across datacenters A and B because the vCenter IP address must work without human intervention in datacenter B after a VCHA fail-over.
  • Each datacenter has an independent heartbeat network (VCHA-HB-A, VCHA-HB-B, VCHA-HB-C) with its own IP subnet, so that Layer 2 is not stretched across datacenters, especially not to datacenter C where the witness is located. This requires specific static routes in each vCenter Server Appliance node to provide IP reachability over the heartbeat network.
  • The specific VCHA TCP/UDP ports must be allowed among the VCHA nodes across the heartbeat network.
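
To make the static route requirement more concrete, here is a minimal sketch that generates the systemd-networkd [Route] sections a node in one datacenter would need to reach the heartbeat subnets of the other two. The subnets and gateway addresses are hypothetical placeholders, not values from this design, and the function name route_stanzas is only for illustration. In practice such sections would presumably go into the systemd-networkd file of the heartbeat NIC on each node, so check KB 2148442 for the exact procedure.

```python
from ipaddress import ip_address, ip_network

# Hypothetical heartbeat subnets and gateways, one per datacenter.
# These values are assumptions for illustration only.
HEARTBEAT = {
    "A": {"subnet": "192.168.11.0/24", "gateway": "192.168.11.1"},
    "B": {"subnet": "192.168.12.0/24", "gateway": "192.168.12.1"},
    "C": {"subnet": "192.168.13.0/24", "gateway": "192.168.13.1"},
}


def route_stanzas(local_dc, heartbeat=HEARTBEAT):
    """Build [Route] sections for a node located in local_dc.

    Every remote heartbeat subnet is routed via the gateway of the
    local heartbeat subnet.
    """
    local = heartbeat[local_dc]
    gateway = ip_address(local["gateway"])
    if gateway not in ip_network(local["subnet"]):
        raise ValueError("gateway must be inside the local heartbeat subnet")

    stanzas = []
    for dc, cfg in sorted(heartbeat.items()):
        if dc == local_dc:
            continue  # the local subnet is directly connected
        destination = ip_network(cfg["subnet"])
        stanzas.append(
            f"[Route]\nDestination={destination}\nGateway={gateway}\n"
        )
    return "\n".join(stanzas)


if __name__ == "__main__":
    for dc in HEARTBEAT:
        print(f"# node in datacenter {dc}, heartbeat NIC")
        print(route_stanzas(dc))
```

Each node ends up with two routes, one per remote heartbeat subnet, both pointing to the gateway of its own heartbeat network. The management NIC keeps the default gateway, so heartbeat traffic never leaves over the management network.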
Helpful documents:

Implementation Notes: 

Note 1:
VMware KB 2148442 (Deploying vCenter High Availability with network addresses in separate subnets) is essential for deploying such a design, but one piece of information is missing there. After cloning the vCenter Server Appliances, you have to go to the passive node and configure eth0 with the same IP address you use on the active node. The configuration is in the file /etc/systemd/network/10-eth0.network.manual
Note 2:
If the VCHA cluster is badly broken, use the following commands to destroy VCHA from the command line:
    cd /etc/systemd/network
    mv 10-eth0.network.manual 20-eth0.network
    destroy-vcha
    reboot
The solution was found at https://communities.vmware.com/thread/552084
Link to the official documentation (Resolving Failover Failures) - https://docs.vmware.com/en/VMware-vSphere/6.5/com.vmware.vsphere.avail.doc/GUID-FE5106A8-5FE7-4C38-91AA-D7140944002D.html

Note 3:
If you see the error message MethodFault.summary during the finalization process, it is because a hostname mismatch was detected. The hostname assigned to the Passive node must be the same as the hostname of the Active node. The solution was found at https://www.altaro.com/vmware/how-to-deploy-a-vcenter-ha-cluster-part-2/ and is also described in KB https://kb.vmware.com/kb/2148442

2 comments:

Chris Butler said...

Hi David. Our network team is attempting to remove all layer 2 stretched VLANs across our on-campus data centers. Is there any way besides VXLAN overlay to accomplish VCSA HA without stretching a layer 2 VLAN?

Thanks,
Chris

David Pasek said...

Hi Chris,

VMware KB 2148442 states that ...

The Failover IP (Passive node vCenter's Management Network) can differ from the Active when it is in different networks (subnets/datacenter/building/site).

See VMware KB "Deploying vCenter High Availability with network addresses in separate subnets (2148442)"
https://kb.vmware.com/s/article/2148442 for further details. It states that the two static IPs for the vCenter Server management network (Active and Passive) must be mapped to the same vCenter Server Appliance FQDN in the DNS server.

To be honest, I did not test this particular setup because my customer will have a stretched vSphere cluster across datacenters A and B and therefore needs a stretched management VLAN anyway. They will have Cisco ACI, which supports L2 over L3, so we should be OK. Nevertheless, the scenario with two different IP addresses is interesting. I think this scenario is possible only when everything is integrated with vCenter via FQDN; otherwise the vCenter IP address cannot be changed. It would be nice to test it. I hope I will find some time to perform additional tests.

Cheers.