Monday, December 28, 2015

VMware NSX useful resources

I'm trying to deep dive into VMware Network Virtualization (NSX) and I have decided to collect all useful resources I will find during my journey.

There are two NSX flavors. NSX-v and NSX-T. It is good to read NSX-v vs NSX-T: Comprehensive Comparison to understand differences.

Fundamentals
NSX-V Design
NSX-T Design
NSX-V Operations
NSX-T Operations
NSX Automation
NSX in Home Lab
NSX Advanced
NSX Security
NSX Dynamic Routing


Networking 


NSX Tutorial


Other lists of resources
  • Rene Van Den Bedem (@vcdx133) :  NSX Link-O-Rama - great list of resources gathered by Rene
Tools
  • ARKIN - network visibility and analytics
This list will be continuously updated.
If you know any other useful NSX resource don't hesitate to write a comment with link.

Thursday, December 17, 2015

End to End QoS solution for Vmware vSphere with NSX on top of Cisco UCS

I'm engaged on a private cloud project where end to end network QoS is required to achieve some guarantees for particular network traffics.  These traffics are
  • FCoE Storage
  • vSphere Management
  • vSphere vMotion
  • VM production
  • VM guest OS agent based backup <== this is the most complex requirement in context of QoS
Compute and Network Infrastructure is based on
  • CISCO UCS
  • CISCO Nexus 7k and 
  • VMware NSX. 
More specifically following hardware components has to be used:
  • CISCO UCS Mini Chassis 5108 with Fabric Interconnects 6324 
  • CISCO UCS servers B200 M4 with virtual interface card VIC1340 (2x10Gb ports - each port connected to different fabric interconnect)
  • CISCO Nexus 7k
Customer is also planning to use NSX security (micro segmentation) and vRealize Automation for automated VM provisioning.

So now we understand the overall concept and we can consider how to achieve end to end network QoS to differentiate required network traffics. 

Generally we can leverage Layer 2 QoS 802.1p (aka CoS - Class of Service ) or Layer 3 QoS (aka DSCP - Differentiated Services Code Point). However, to achieve end to end QoS on Cisco UCS we have to use CoS because it is the only QoS method available inside Cisco UCS blade system to guarantee bandwidth in shared 10Gb NIC (better to say CNA) ports.

The most important design decision point is where we will do CoS marking to differentiate required network traffics. Following two options are generally available:
  1. CoS marking only in Cisco USC (hardware based marking) 
  2. CoS marking on vSphere DVS portgroups (software based marking)
Lets deep dive into available options.

OPTION 1 - UCS Hardware CoS marking

Option 1 is depicted on figure below. Please, click on figure to understand where CoS marking and bandwidth management is done.


Following bullets describes key ideas of option 1:
  • Management and vMotion traffics are consolidated on the same pair of 10Gb adapter ports (NIC-A1 and NIC-B1) together with FCoE traffic. Active / Standby teaming is used for vSphere Management and vMotion portgroups where each traffic is by default active on different UCS fabric.
  • VTEP and Backup traffics are consolidated on the same pair of 10Gb adapter ports (NIC-A2 and NIC-B2). Active / Standby teaming is used for NSX VTEP and backup portgroup.  Each traffic is by default active on different UCS fabric. 
  • Multiple VTEPs and Active / Active teaming for backup portgroup can be considered for higher network performance if necessary.
  • All vmkernel interfaces should be configured consistently across all ESXi hosts to use the same ethernet fabric in non-degraded state and achieve optimal east-west traffic.
Option 1 implications:
  • Two Virtual Machine's vNICs has to be used because one will be used for production traffic (VXLAN) and second one for backup traffic (VLAN backed portgroup with CoS marking).
OPTION 2 - vSphere DVS CoS marking:

Option 2 is depicted on figure below. Please, click on figure to understand where CoS marking and bandwidth management is done.

Following bullets describes key ideas of Option 2 and difference against Option 1:
  • Management and vMotion traffics are also consolidated on the same pair of 10Gb adapter ports (NIC-A1 and NIC-B1) together with FCoE traffic. The difference against Option 1 is usage of single CISCO vNIC and CoS marking in DVS portgroups. Active / Standby teaming is used for vSphere Management and vMotion portgroups where each traffic is by default active on different UCS fabric.
  • VTEP and Backup traffics are consolidated on the same pair of 10Gb adapter ports (NIC-A2 and NIC-B2). Active / Standby teaming is used for NSX VTEP and backup portgroup.  The difference against Option 1 is usage of single CISCO vNIC and CoS marking in DVS portgroups. Each traffic (VXLAN, Backup) is by default active on different UCS fabric. 
  • Multiple VTEPs and Active / Active teaming for backup portgroup can be considered for higher network performance if necessary.
  • All vmkernel interfaces should be configured consistently across all ESXi hosts to use the same ethernet fabric in non-degraded state and achieve optimal east-west traffic.
  • FCoE traffic is marked automatically by UCS for any vHBA. This is the only hardware based CoS marking.
Option 2 implications:
  • Two Virtual Machine's vNICs has to be used because one will be used for production traffic (VXLAN) and second one for backup traffic (VLAN backed portgroup with CoS marking).

Both options above requires two vNICs per VM. It is introducing several challenges some of them listed below:
  1. More IP addresses required
  2. Default IP gateway is used for production network, therefore, the backup network cannot be routed without static routes inside guest OS. 
Do we have any possibility to achieve QoS differentiation between VM traffic and In-Guest backup traffic with single vNIC per VM? 

We can little bit enhance Option 2. Enhanced solution is depicted in the figure below. Please, click on figure to understand where CoS marking and bandwidth management is done.


So where is the enhancement? We can leverage enhanced conditional CoS marking in each DVS portgroup used as NSX virtual wire (aka NSX Logical Switch or VXLAN). If IP traffic is targeted to the backup server we will mark it as CoS 4 (backup) else we will mark it a CoS 0 (VM traffic). 

You can argue VXLAN is L2 over L3 and thus L2 traffic where we did a conditional CoS marking will be encapsulated into L3 traffic (UDP) in VTEP interface and we will lose CoS tag. However, that's not the case. VXLAN protocol is designed to keep L2 CoS tags by copying inner Ethernet header into outer Ethernet header. Therefore virtual overlay CoS tag is kept even in physical network underlay and it can be leveraged in Cisco UCS bandwidth management (aka DCB ETS - Enhanced Transmission Selection) to guarantee bandwidth for particular CoS traffics.

The enhanced Option 2 seems to me as the best design decision for my particular use case and specific requirements.

I hope it makes sense and someone else find it useful. However, I share it with IT community for broader review. Any comments or the thoughts are very welcome.