Tuesday, August 15, 2017

NSX Basic Concepts, Tips and Tricks

NSX and Network Teaming

There are multiple options for achieving network teaming from ESXi to the physical network. For more information see my other blog post "Back to the basics - VMware vSphere networking".

In a nutshell, there are generally three supported methods to connect NSX VTEP(s) to the physical network:
  1. Explicit failover - only a single physical NIC is active at any given time, therefore no load balancing at all
  2. LACP - a single aggregated virtual interface where load balancing is done by a hashing algorithm
  3. Switch-independent teaming achieved by multiple VTEPs, where each VTEP is bound to a different ESXi pNIC.
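The hash-based load balancing of option 2 can be sketched in a few lines. This is purely an illustrative model, not the actual switch or vSwitch algorithm; the function name, IPs and uplink names are made up:

```python
import zlib

def pick_uplink(src_ip, dst_ip, uplinks):
    """Illustrative flow hash: the same flow always lands on the same uplink."""
    flow_hash = zlib.crc32(f"{src_ip}->{dst_ip}".encode())
    return uplinks[flow_hash % len(uplinks)]

uplinks = ["vmnic0", "vmnic1"]
# One flow is pinned to one member link; different flows may spread out,
# which is why per-flow (not per-packet) balance is what LACP hashing gives you.
assert pick_uplink("10.0.0.1", "10.0.0.2", uplinks) == \
       pick_uplink("10.0.0.1", "10.0.0.2", uplinks)
```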
Let's assume we have switch-independent teaming with multiple independent uplinks to the physical network. Now the question is how to check the VM vNIC to ESXi host pNIC mapping. I'm aware of at least four methods:
  1. ESXTOP
  2. ESXCLI
  3. NSX Controller
  4. NSX Manager
1/ ESXTOP method
  • ssh to ESXi
  • run esxtop
  • Press key [n] to switch to network view
  • Check the TEAM-PNIC column – it should show a different vmnic (ESXi pNIC) for each VM
2/ ESXCLI method
  • ssh to ESXi
  • Use the command “esxcli network vm list” and locate the World ID of the VM
  • Use “esxcli network vm port list -w <World ID>” and check the “Team Uplink” value. It should be a different vmnic (ESXi pNIC) for each VM
3/ NSX Controller method
  • Identify the MAC address of the VM
  • Log in to the NSX Controller nodes (ssh or console) one by one
  • Use the command “show control-cluster logical-switches mac-table <VNI>” to show MAC address to VTEP mappings. I assume a multi-VTEP configuration where each VTEP is statically bound to a particular ESXi pNIC (vmnic)
4/ NSX Manager method
  • Identify the MAC address of the VM
  • Log in to NSX Manager (ssh or console)
  • Go through all controllers and show the MAC address table, which also tells you behind which VTEP a particular MAC address sits
  • i) show controller list all
  • ii) show logical-switch controller controller-1 vni 10001 mac
  • iii) show logical-switch controller controller-2 vni 10001 mac
  • iv) show logical-switch controller controller-3 vni 10001 mac
The appropriate method is typically chosen based on the administrator's role and Role Based Access Control. A vSphere Administrator will probably use esxtop or esxcli, while a Network Administrator will use NSX Manager or a Controller.

Distributed Logical Router (DLR)

The DLR is a virtual router distributed across multiple ESXi hosts. You can imagine it as a chassis with multiple line cards. The chassis is virtual (software based) and the line cards are software modules spread across multiple ESXi hosts (physical x86 servers).

The basic concept of the DLR is that every routing decision is made locally: the NSX DLR always performs routing on the DLR instance running in the kernel of the ESXi host where the workload that initiates the communication runs. When VM traffic needs to be routed to another logical switch, it first comes to the DLR on the same ESXi host where the VM is running. Each DLR "line card" module (ESXi host) has all logical switches (VXLANs) connected locally, so the DLR forwards the packet to the appropriate destination logical switch. If the target VM runs on another ESXi host, the packet is encapsulated on the local ESXi host and decapsulated on the target ESXi host.
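The "route at the source host" behavior can be captured in a toy model. The data structures and names below are entirely hypothetical, just to illustrate the concept:

```python
# Toy model: every ESXi host runs the same DLR instance, so the L3 hop
# always happens on the host where the source VM runs.
vm_host = {"web01": "esxi-a", "db01": "esxi-b"}  # hypothetical inventory

def routing_host(src_vm, dst_vm):
    # The destination is irrelevant to *where* routing happens:
    # the DLR kernel module on the source VM's host makes the decision.
    return vm_host[src_vm]

def needs_vxlan_encap(src_vm, dst_vm):
    # After local routing, VXLAN encapsulation is only needed when the
    # destination VM lives on a different ESXi host.
    return vm_host[src_vm] != vm_host[dst_vm]
```

So traffic from web01 to db01 is routed on esxi-a and then encapsulated toward esxi-b; return traffic is routed on esxi-b.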

It is good to know that the DLR always uses the same MAC address as the default gateway address on all logical switches. This MAC address is called the VMAC. It is the MAC address used for DLR logical L3 interfaces (LIFs) connected to logical switches (VXLANs).

However, there must be some coordination between the multiple DLR "line card" modules (ESXi hosts), therefore each DLR module must also have a physical MAC address. This MAC address is called the PMAC.

To show the DLR PMAC and VMAC, run the following command on an ESXi host:
net-vdr -l -C

Distributed Logical Firewall (DFW) - firewall rules

The NSX Distributed Firewall applies firewall rules directly to VM vNICs. The vNIC uses a concept of slots where different services are bound and chained together. The NSX DFW sits in slot 2 and, for example, a third-party firewall sits in slot 4.

The DFW firewall rules are applied automatically on each vNIC, so the question is how to double-check what rules are in effect at the vNIC level.

There are two methods to check it:
  1. ESXi commands
  2. NSX Manager commands
1/ ESXi method
  • ssh to ESXi
  • Use the command “summarize-dvfilter”, locate the VM of interest, and find its dvfilter name in slot 2, used by the agent vmware-sfw
  • A grep command can help here ... "summarize-dvfilter | grep -A 10 <VM name>"
  • The filter name should look similar to nic-24565940-eth0-vmware-sfw.2
  • Now you can list the firewall rules with the command "vsipioctl getfwrules -f nic-24565940-eth0-vmware-sfw.2"
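As a small aid when reading the summarize-dvfilter output, note that the slot number is encoded at the end of the dvfilter name (the first example name below is the one from the output above; the second is a made-up name for illustration):

```python
def filter_slot(filter_name):
    """Extract the slot number from a dvfilter name, e.g.
    'nic-24565940-eth0-vmware-sfw.2' -> 2 (slot 2 = NSX DFW)."""
    return int(filter_name.rsplit(".", 1)[1])

assert filter_slot("nic-24565940-eth0-vmware-sfw.2") == 2
# Hypothetical third-party service filter sitting in slot 4:
assert filter_slot("nic-12345678-eth0-serviceinstance.4") == 4
```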

2/ NSX Manager method (https://kb.vmware.com/kb/2125482)
  • Log in to the NSX Manager with the admin credentials
  • To display a summary of dvfilter information, run the command "show dfw host host-id summarize-dvfilter"
  • To display detailed information about a vNIC, run the command "show dfw host host-id vnic"
  • To display the rules configured on the filter, run the command "show dfw host host-id vnic vnic-id filter filter-name rules"
  • To display the addrsets configured on the filter, run the command "show dfw host host-id vnic vnic-id filter filter-name addrsets"
And again, the appropriate method is typically chosen based on the administrator's role and Role Based Access Control.

Distributed Logical Firewall (DFW) - third party integration and availability considerations

The NSX Distributed Firewall supports integration with third-party solutions. This integration is also called service chaining. The third-party solution is hooked to a particular vNIC slot, and usually some selected, or potentially all (not recommended), traffic is redirected to the third-party solution's agent, which runs on each ESXi host as a special virtual machine. The third-party solution can inspect the traffic and allow or deny it. However, what happens when the agent VM is not available? It is easy to test: power off the agent VM and see what happens. The behavior depends on the service's failOpen/failClosed policy. You can check the policy setting as depicted in the screenshot below ...

Service failOpen/failClosed policy
If failOpen is set to false, then virtual machine traffic will be dropped when the agent is unavailable. This has a negative impact on availability but a positive impact on security. If failOpen is set to true, then VM traffic will be allowed and everything works even when the agent is not available. In such a situation the security policy cannot be enforced, so there is a potential security risk. This is a typical design decision point, where the decision depends on customer specific requirements.
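The resulting behavior can be summarized in a few lines. This is a sketch of the decision logic only, not the actual implementation, and the function name is mine:

```python
def dfw_redirect_action(agent_available, fail_open):
    """What happens to traffic redirected to a third-party agent VM."""
    if agent_available:
        return "inspect"  # normal case: the agent inspects the traffic
    # Agent VM down: the policy decides between availability and security.
    return "pass" if fail_open else "drop"

assert dfw_redirect_action(True, False) == "inspect"
assert dfw_redirect_action(False, True) == "pass"   # availability wins
assert dfw_redirect_action(False, False) == "drop"  # security wins
```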

Now the question is how the failOpen setting can be changed. My understanding is that it depends on the third-party solution. Here is a link to the TrendMicro how-to - "Set vNetwork behavior when appliances shut down"

Monday, August 14, 2017

Remote text based console to ESXi over IPMI SOL

I have just bought another server for my home lab. I already have 6 Intel NUCs, but a lot of RAM is needed for a full VMware SDDC with all products like LogInsight, vROps, vRNI, vRA, vRO, ... but that's another story.

Anyway, I decided to buy a used Dell rack server (PowerEdge R810) with 256 GB RAM, mainly because of the amount of RAM but also because all Dell servers since the 9th generation support IPMI, which is very useful. The server can be remotely managed (power on, power off, etc.) over IPMI, and it also supports SOL, which stands for Serial-over-LAN, for server consoles. IPMI SOL is an inexpensive alternative to the iDRAC Enterprise virtual console.

You can read more about IPMI at the links below.

So, if you follow the instructions on the links above, you will be able to use IPMI SOL to see and manage the server during the boot process and change, for example, BIOS settings. I have tested it and it works like a charm. You see the boot progress, and you can go into the BIOS and change anything you want. Console redirection works and the keyboard can be used to control the server during POST. However, after the server POST phase and the boot loading of ESXi, the ESXi console was unfortunately not redirected to SOL. I think it is because the ESXi DCUI is not a pure text-based console. Instead, it is a graphics mode simulating text mode. A graphics-mode console cannot, for obvious reasons, be transferred over IPMI SOL.

So there is another possibility. The ESXi Direct Console (aka DCUI) can be redirected to a serial port. The setup procedure is nicely described in the documentation here. It is done via the ESXi host advanced setting "VMkernel.Boot.tty2Port" set to the value "com2". It is worth mentioning that server console redirection and ESXi DCUI redirection cannot be done at the same time, for obvious reasons. So I unconfigured server console redirection and configured ESXi DCUI redirection. It worked great, but the keyboard was not working. It is pretty useless to see the ESXi DCUI without the possibility to use it, right? To be honest, I do not know why my keyboard did not work over IPMI SOL.
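For reference, the same boot setting can also be changed from the ESXi shell, a sketch assuming a recent esxcli; verify the setting name on your ESXi version before relying on it:

```shell
# Redirect the DCUI to the second serial port (takes effect after a reboot).
esxcli system settings kernel set -s tty2Port -v com2
```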

So what is the conclusion? Unfortunately, I have hit another AHA effect ...
"Aha, IPMI SOL will not help me too much with remote access to ESXi DCUI console."
And as always, any feedback or tips and tricks are more than welcome as comments to this blog post.

Update: I have just found and bought a very cheap iDRAC Enterprise Remote Access Card on eBay, which supports remote virtual console and media. So, it is a hardware workaround for my software problem :-)

iDrac6 on Ebay