Monday, June 29, 2015

DELL Force10 : Virtual Routing and Forwarding (VRF)

VRF Overview

Virtual Routing and Forwarding (VRF) allows a physical router to partition itself into multiple Virtual Routers (VRs). The control and data planes are isolated in each VR so that traffic does NOT flow across VRs. In other words, VRF allows multiple instances of a routing table to co-exist within the same router at the same time.

DELL OS 9.7 supports up to 64 VRF instances. The number of supported instances may increase in future versions, so check the current documentation for the authoritative number.

VRF Use Cases

VRF improves network functionality by allowing network paths to be segmented without using multiple devices. Because traffic is segmented, VRF also increases network security and can eliminate the need for encryption and authentication.

Internet service providers (ISPs) often take advantage of VRF to create separate virtual private networks (VPNs) for customers; VRF is also referred to as VPN routing and forwarding.

VRF acts like a logical router; while a physical router may include many routing tables, a VRF instance uses only a single routing table. VRF uses a forwarding table that designates the next hop for each data packet, a list of devices that may be called upon to forward the packet, and a set of rules and routing protocols that govern how the packet is forwarded. These VRF forwarding tables prevent traffic from being forwarded outside a specific VRF path and also keep out traffic that should remain outside the VRF path.

VRF uses interfaces to distinguish routes for different VRF instances. Interfaces in a VRF can be either physical (Ethernet port or port channel) or logical (VLANs). You can configure identical or overlapping IP subnets on different interfaces if each interface belongs to a different VRF instance.
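For illustration, here is a minimal sketch of overlapping subnets in two VRFs, assuming Dell OS 9.x CLI and hypothetical VRF instances tenant-1 and tenant-2 that have already been created (the configuration steps themselves are shown in the next section):
interface vlan 10
  ip vrf forwarding tenant-1
  ip address 192.168.10.1/24
interface vlan 20
  ip vrf forwarding tenant-2
  ip address 192.168.10.1/24
The same subnet can be configured twice because each VLAN interface belongs to a different VRF instance with its own routing table.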

VRF Configuration

First of all, you have to enable the VRF feature.
conf
feature vrf
The next step is to create an additional VRF instance.
ip vrf tenant-1
The vrf-id is assigned automatically; however, if you want to configure the vrf-id explicitly, you can do so with an additional parameter. In the example below we use vrf-id 1.
ip vrf tenant-1 1
We are almost done. The last step is to assign interfaces to the particular VRF. You can assign the following interface types:
  • Physical Ethernet interfaces (in L3 mode)
  • Port-Channel interfaces (static and dynamic/LACP)
  • VLAN interfaces
  • Loopback interfaces 
Below is an example of how to assign VLAN 100 to the VRF instance tenant-1.
interface vlan 100
  ip vrf forwarding tenant-1
Configuration is pretty easy, right?
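If everything went well, you can verify the VRF instances and their interface assignments. A minimal sketch, assuming Dell OS 9.x CLI (exact output format depends on your release):
show ip vrf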

Working in particular VRF instance
When you want to configure, show, or troubleshoot something in a particular VRF instance, you have to explicitly specify which VRF you want to work in.

So, for example, when you want to ping from the tenant-1 VRF instance, you have to use the following command:
ping vrf tenant-1 192.168.1.1
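The same pattern applies to other show and troubleshooting commands. For example (a hedged sketch; exact command forms can differ between Dell OS 9 releases):
show ip route vrf tenant-1
show arp vrf tenant-1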
Conclusion

VRF is a great technology for L3 multi-tenancy. DELL Network Operating System 9 supports VRF, so you can design interesting network solutions with it.

Saturday, June 20, 2015

DELL Compellent Best Practices for Virtualization

All DELL Compellent Best Practices have been moved here.

The most interesting best practice document for me is "Dell Storage Center Best Practices with VMware vSphere 6.x".

VMware HA Error During VLT Failure

I received the following message in my mailbox ...
Hi.
I have a customer that has been testing Force10 VLT with peer routing and VMWare and has encountered the warning message on all hosts during failover of the switches (S4810’s) only when the primary VLT node is failed
“vSphere HA Agent, on this host couldn’t not reach isolation address 10.100.0.1”
Does this impact HA at all?  Is there a solution?
Thanks
Paul 

Force10 is the legacy product name of DELL S-series datacenter networking. Force10 S4810s are datacenter L3 switches. If you don't know what Force10 VLT is, look here. Generally, it is something like Cisco virtual Port Channel (vPC), Juniper MC-LAG, Arista MLAG, etc.

I think my answer can be valuable for the broader networking and virtualization community, so here it is ...

First of all let’s make some assumptions:
  • Force10 VLT is used for multi-chassis LAG capability
  • Force10 VLT peer routing is enabled in the VLT domain to achieve L3 routing redundancy
  • 10.100.0.1 is the IP address of the VLAN interface on the Force10 S4810 (primary VLT node), and this particular VLAN is used for vSphere management.
  • 10.100.0.2 is the IP address on the Force10 S4810 secondary VLT node.
  • vSphere 5.x or above is used.

Root cause with explanation:
When the primary Force10 VLT node is down, a ping to 10.100.0.1 doesn't work, because peer routing is an ARP proxy at L2. The secondary node will forward traffic on behalf of the primary node at L2, but 10.100.0.1 doesn't answer at L3, therefore ICMP doesn't work.

The VMware HA Cluster (vSphere 5 and above) uses network and storage heartbeat mechanisms. The network mechanism uses the two probe algorithms listed below.
  1. ESXi hosts in the cluster send heartbeats to each other. This should keep working during a primary VLT node failure.
  2. ESXi hosts also ping the HA isolation addresses (the default HA isolation address is the default gateway, therefore 10.100.0.1 in your particular case). This doesn't work during a primary VLT node failure.

That's the reason the VMware HA Cluster logs a warning about this situation.

Is there any impact?
There is no impact on the HA Cluster because:
  • It is just an informative message, because algorithm (1) works correctly and there is still network visibility among the ESXi hosts in the cluster.
  • From vSphere 5 and above, there is also the storage heartbeat mechanism, which can mitigate a loss of network visibility among the ESXi hosts in the cluster.

Are there any potential improvements?
Yes, there are. You can configure multiple HA isolation addresses to mitigate the unavailability of the default gateway. In your particular case, I would recommend using two IP addresses (10.100.0.1 and 10.100.0.2), because at least one VLT node will always be available.
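As a sketch, one possible way to do this is with the cluster-level vSphere HA advanced options documented in the VMware KB referenced below:
das.usedefaultisolationaddress = false
das.isolationaddress0 = 10.100.0.1
das.isolationaddress1 = 10.100.0.2
With both VLT node addresses configured, the isolation check succeeds as long as at least one VLT node is up.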


For more information on how to configure multiple HA isolation addresses, look at http://kb.vmware.com/kb/1002117

Monday, June 15, 2015

No data visibility for vSphere Admin

Recently I did a very quick (time-constrained) conceptual/logical design exercise for a customer who had a virtualization-first strategy and was willing to virtualize his Tier 1 business-critical applications. One of his requirements was to preclude data visibility for VMware vSphere admins.

I quickly thought about how to fulfill this particular requirement, and my first general answer was ENCRYPTION. The customer asked me to tell him more about the encryption possibilities, and I listed the following options.

Option 1/ Encryption in the Guest OS 

Product examples: Microsoft BitLocker, HyTrust, SafeNet, etc.
A very nice comparison of disk encryption software is here.

Option 2/ Application level encryption

Product examples: Transparent Data Encryption (TDE) in SQL Server 2008 and higher, Oracle Transparent Data Encryption, etc.

Option 3/ Encryption in the Fibre Channel SAN

Examples are the Brocade SAN Encryption Solution or the Cisco MDS 9000 Family Storage Media Encryption.

Option 4/ Encryption in the Disk Array

Data is encrypted behind the storage controllers, usually leveraging Self-Encrypting Drives (SED).

The next logical question was ... what is the performance impact?
My quick answer was that there is definitely a performance overhead with software encryption, but no performance overhead with hardware encryption, as it is offloaded to special ASICs.

Hmm... Right, the more appropriate answer would be that hardware solutions are designed to have no or negligible performance impact. I always recommend testing before any real use in production, but that's what hardware vendors claim, at least in their white papers. Specifically, in option (3) storage I/O has to be redirected to the encryption module/appliance in the SAN, where the added latency should be an order of magnitude less than a typical I/O response time; therefore the impact on storage latency should theoretically be none or negligible.

However, the problem with my recommended options is not the performance claim.
The problem is that only options 1 and 2 are applicable to fulfill the customer's requirement, because options 3 and 4 do encryption and decryption at lower levels, and the data is decrypted and visible at the vSphere layer. Therefore a vSphere admin would still have visibility into the data.

Options 1 and 2 definitely have some performance overhead, nowadays generally somewhere between 20% and 100% depending on the software solution, CPU family, encryption algorithm strength, encryption key length, etc.

For completeness, let's say that options 3 and 4 are good for different use cases.

  • Option 3 can help you secure data from a storage admin who does not have access to the SAN network, or from someone with physical access to the disks.
  • Option 4 can help you secure data on disks against theft of the physical storage or the disks themselves.

It is worth saying that security is always a trade-off.

Software-based solutions have some negative impact on performance, a medium negative impact on price, and also a negative impact on manageability. The performance of software-based solutions can be significantly improved by leveraging the AES hardware offload (AES-NI) in modern Intel processors, and the performance overhead will be further mitigated year by year.
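As a rough, hedged illustration on a Linux guest (assuming OpenSSL is installed), you can check whether the CPU exposes AES-NI and compare the software-only and the hardware-accelerated AES code paths:
grep -o -m1 aes /proc/cpuinfo     # prints "aes" if AES-NI is exposed to the guest
openssl speed aes-256-cbc         # legacy code path, software AES only
openssl speed -evp aes-256-cbc    # EVP code path, uses AES-NI when available
The throughput difference between the two openssl runs gives a rough idea of how much the hardware offload helps.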

Pure hardware-based solutions are not applicable options for our specific requirement, but even if they were applicable and had no or negligible impact on performance, there are drawbacks like a huge impact on cost and also some impact on scalability and manageability.

Conclusion
I was very quick during my consulting and didn't realize which options really fulfill the specific customer's requirement. I often say that I don't trust anybody, not even myself. This was exactly the case, unfortunately :-(

Time-constrained consulting usually doesn't offer the best results. Good architecture needs some time for review and a better comparison of options :-)