Sunday, March 15, 2015

DELL Force10 : mVLT – Ethernet Loop Free Topology Design

Last week I received the following question from one of my readers:
I came to your blog post http://blog.igics.com/2014/05/dell-force10-vlt-virtual-link-trunking.html and I am really happy that you shared this information with us. However I was wondering if you have tested a scenario with 4 S4810 with VLT configured on 2 x 2 and connected together (somewhere called mLAG). How do you continue to add VLT couples to the setup? I would be really happy if you could provide any info regarding such setup.
So let’s take a deep dive into the VLT port-channel between two Force10 VLT domains, also known as mVLT. Please note that VLT can be configured not only between two Force10 VLT domains but also between a Force10 VLT domain and another multi-chassis port-channel technology, for instance CISCO virtual Port Channel (vPC). However, this blog post focuses on the single-vendor solution: mVLT on DELL S-Series switches (previously known as Force10 S-Series).

If you are not familiar with DELL Force10 VLT technology, read my previous blog post, where VLT is described in detail. It is really important to understand VLT before you try to understand mVLT (Multi-domain VLT). By the way, mVLT is called eVLT (Enhanced VLT) in Force10 documentation, so it might be a little bit confusing. Anyway, mVLT is nothing more than a regular virtual port channel (VLT) between two VLT domains. Therefore mVLT is quite a good term, if you ask me.

mVLT Logical Design
The mVLT logical design is pretty straightforward. The requirement is a stretched L2 across two datacenters without any loops. Such a topology is often called a loop-free topology, and it is depicted in the figure below from the spanning tree (STP) point of view.


However, we would like to have hardware and link redundancy. Therefore a multi-chassis port-channel technology (Force10 VLT in our particular case) is used to keep a simple loop-free topology from the spanning tree point of view, but with switch unit and physical link redundancy. The Force10 mVLT solution is logically depicted in the figure below.


Please note that each VLT domain acts as a single logical switch in spanning tree.

DELL highly recommends using four links between VLT domains because of higher redundancy and optimal data flow. However, sometimes you are constrained by the number of links available between sites. A two-link DCI is also a supported design, but it is not recommended: link redundancy is obviously lower, and therefore there is a higher probability of communication over the VLTi, which adds a hop and therefore latency. The two-link mVLT DCI, also known as the square design, is depicted in the figure below.
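To make the redundancy argument concrete, here is a minimal Python sketch. It is only a toy model and rests on assumptions: the switch names are borrowed from the configurations later in this post, the four-link design is assumed to be a full mesh between the two domains, and the square design is assumed to connect A to A and B to B. It counts how many switches lose their direct DCI link, and so have to send inter-site traffic over the VLTi, after the worst single link failure.

```python
from itertools import product

# Switch names borrowed from the configurations in this post.
DCA = ["DCA-SWCORE-A", "DCA-SWCORE-B"]
DCB = ["DCB-SWCORE-A", "DCB-SWCORE-B"]

# Assumed cabling: the four-link design is a full mesh between the domains,
# the two-link (square) design connects A to A and B to B.
FULL_MESH = set(product(DCA, DCB))
SQUARE = {("DCA-SWCORE-A", "DCB-SWCORE-A"), ("DCA-SWCORE-B", "DCB-SWCORE-B")}


def switches_without_dci(links):
    """Return the switches that have no direct link to the other site."""
    connected = {end for link in links for end in link}
    return sorted(set(DCA + DCB) - connected)


def worst_single_failure(links):
    """Max number of switches left without a direct DCI link after one link fails."""
    return max(len(switches_without_dci(links - {failed})) for failed in links)


print("four links:", worst_single_failure(FULL_MESH))
print("two links: ", worst_single_failure(SQUARE))
```

In this model a single link failure in the four-link design leaves every switch with a direct path to the other site, while in the square design the switches on both ends of the failed link have to detour over the VLTi.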


Even though the topology is loop free and, from a logical point of view, we have just one switch in each datacenter, spanning tree protocol should be enabled and configured just in case of human error or a VLT domain failure or split. Rapid Spanning Tree Protocol (RSTP) is good enough and is therefore used later in the physical configurations.
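The configurations later in this post use shorter RSTP timers than the standard defaults (hello-time 1, max-age 6, forward-delay 4). If you tune them differently, it may help to check them against the timer relationships defined in IEEE 802.1D. The following Python sketch does just that; the function name is my own, and whether a particular switch OS enforces these relationships at configuration time is not something this sketch claims.

```python
def rstp_timers_consistent(hello_time, max_age, forward_delay):
    """Check the IEEE 802.1D timer relationships (all values in seconds).

    2 * (forward_delay - 1) >= max_age
    max_age >= 2 * (hello_time + 1)
    """
    return (2 * (forward_delay - 1) >= max_age) and (max_age >= 2 * (hello_time + 1))


# Timer values used in the switch configurations in this post.
print(rstp_timers_consistent(hello_time=1, max_age=6, forward_delay=4))

# IEEE 802.1D default values, for comparison.
print(rstp_timers_consistent(hello_time=2, max_age=20, forward_delay=15))
```

Both sets of values satisfy the relationships. Lowering forward-delay to 3 while keeping max-age 6 would not.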

mVLT Physical Design and Configuration
The physical design below shows the connectivity of four (2 x 2) Force10 S4810 switches leveraging four links for the DCI port-channel (mVLT).


The physical design for a two-link DCI is depicted in the following schema.


Switch configuration snippets for the four-link mVLT are listed below for completeness. The two-link DCI is just a variation of the same configuration, so you can simply reuse and slightly change the four-link configuration.

DCA-SWCORE-A – acts as primary Root Bridge in RSTP in case of loop
!
hostname DCA-SWCORE-A
!
protocol spanning-tree rstp
 no disable
 hello-time 1
 max-age 6
 forward-delay 4
 bridge-priority 4096
!
vlt domain 1
 peer-link port-channel 128
 back-up destination 172.16.201.2
 primary-priority 1
 system-mac mac-address 02:00:00:00:00:01
 unit-id 0
 peer-routing
!
 proxy-gateway lldp
  peer-domain-link port-channel 127
!
interface TenGigabitEthernet 0/46
 no ip address
 mtu 12000
 port-channel-protocol LACP
  port-channel 127 mode active
 dampening 10 100 1000 60
 no shutdown
!
interface TenGigabitEthernet 0/47
 no ip address
 mtu 12000
 port-channel-protocol LACP
  port-channel 127 mode active
 dampening 10 100 1000 60
 no shutdown
!
interface fortyGigE 0/56
 no ip address
 mtu 12000
 no shutdown
!
interface fortyGigE 0/60
 no ip address
 mtu 12000
 no shutdown
!
interface ManagementEthernet 0/0
 ip address 172.16.201.1/24
 no shutdown
!
interface Port-channel 127
 description "mVLT - interconnect link"
 no ip address
 mtu 12000
 switchport
 vlt-peer-lag port-channel 127
 no shutdown
!
interface Port-channel 128
 description "VLTi - interconnect link"
 no ip address
 mtu 12000
 channel-member fortyGigE 0/56,60
 no shutdown
!

DCA-SWCORE-B  – acts as secondary Root Bridge in RSTP in case of loop
!
hostname DCA-SWCORE-B
!
protocol spanning-tree rstp
 no disable
 hello-time 1
 max-age 6
 forward-delay 4
 bridge-priority 8192
!
vlt domain 1
 peer-link port-channel 128
 back-up destination 172.16.201.1
 primary-priority 8192
 system-mac mac-address 02:00:00:00:00:01
 unit-id 1
 peer-routing
!
 proxy-gateway lldp
  peer-domain-link port-channel 127
!
interface TenGigabitEthernet 0/46
 no ip address
 mtu 12000
 port-channel-protocol LACP
  port-channel 127 mode active
 dampening 10 100 1000 60
 no shutdown
!
interface TenGigabitEthernet 0/47
 no ip address
 mtu 12000
 port-channel-protocol LACP
  port-channel 127 mode active
 dampening 10 100 1000 60
 no shutdown
!
interface fortyGigE 0/56
 no ip address
 mtu 12000
 no shutdown
!
interface fortyGigE 0/60
 no ip address
 mtu 12000
 no shutdown
!
interface ManagementEthernet 0/0
 ip address 172.16.201.2/24
 no shutdown
!
interface Port-channel 127
 description "mVLT - interconnect link"
 no ip address
 mtu 12000
 switchport
 vlt-peer-lag port-channel 127
 no shutdown
!
interface Port-channel 128
 description "VLTi - interconnect link"
 no ip address
 mtu 12000
 channel-member fortyGigE 0/56,60
 no shutdown
!
DCB-SWCORE-A – acts as tertiary Root Bridge in RSTP in case of loop
!
hostname DCB-SWCORE-A
!
protocol spanning-tree rstp
 no disable
 hello-time 1
 max-age 6
 forward-delay 4
 bridge-priority 12288
!
vlt domain 2
 peer-link port-channel 128
 back-up destination 172.16.202.2
 primary-priority 1
 system-mac mac-address 02:00:00:00:00:02
 unit-id 0
 peer-routing
!
 proxy-gateway lldp
  peer-domain-link port-channel 127
!
interface TenGigabitEthernet 0/46
 no ip address
 mtu 12000
 port-channel-protocol LACP
  port-channel 127 mode active
 dampening 10 100 1000 60
 no shutdown
!
interface TenGigabitEthernet 0/47
 no ip address
 mtu 12000
 port-channel-protocol LACP
  port-channel 127 mode active
 dampening 10 100 1000 60
 no shutdown
!
interface fortyGigE 0/56
 no ip address
 mtu 12000
 no shutdown
!
interface fortyGigE 0/60
 no ip address
 mtu 12000
 no shutdown
!
interface ManagementEthernet 0/0
 ip address 172.16.202.1/24
 no shutdown
!
interface Port-channel 127
 description "mVLT - interconnect link"
 no ip address
 mtu 12000
 switchport
 vlt-peer-lag port-channel 127
 no shutdown
!
interface Port-channel 128
 description "VLTi - interconnect link"
 no ip address
 mtu 12000
 channel-member fortyGigE 0/56,60
 no shutdown
!

DCB-SWCORE-B – acts as quaternary Root Bridge in RSTP in case of loop
!
hostname DCB-SWCORE-B
!
protocol spanning-tree rstp
 no disable
 hello-time 1
 max-age 6
 forward-delay 4
 bridge-priority 16384
!
vlt domain 2
 peer-link port-channel 128
 back-up destination 172.16.202.1
 primary-priority 8192
 system-mac mac-address 02:00:00:00:00:02
 unit-id 1
 peer-routing
!
 proxy-gateway lldp
  peer-domain-link port-channel 127
!
interface TenGigabitEthernet 0/46
 no ip address
 mtu 12000
 port-channel-protocol LACP
  port-channel 127 mode active
 dampening 10 100 1000 60
 no shutdown
!
interface TenGigabitEthernet 0/47
 no ip address
 mtu 12000
 port-channel-protocol LACP
  port-channel 127 mode active
 dampening 10 100 1000 60
 no shutdown
!
interface fortyGigE 0/56
 no ip address
 mtu 12000
 no shutdown
!
interface fortyGigE 0/60
 no ip address
 mtu 12000
 no shutdown
!
interface ManagementEthernet 0/0
 ip address 172.16.202.2/24
 no shutdown
!
interface Port-channel 127
 description "mVLT - interconnect link"
 no ip address
 mtu 12000
 switchport
 vlt-peer-lag port-channel 127
 no shutdown
!
interface Port-channel 128
 description "VLTi - interconnect link"
 no ip address
 mtu 12000
 channel-member fortyGigE 0/56,60
 no shutdown
!
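The bridge priorities in the four configurations above (4096, 8192, 12288, 16384) define the order in which the switches would take over the root bridge role. Here is a minimal Python sketch of that election. It is simplified: each switch is treated as an independent bridge, which is the fallback situation the priorities are meant for, and the MAC addresses are hypothetical placeholders used only as tie-breakers.

```python
# Bridge priorities taken from the four configurations above.
# The MAC addresses are hypothetical placeholders (tie-breakers only).
BRIDGES = {
    "DCA-SWCORE-A": (4096, "02:aa:00:00:00:01"),
    "DCA-SWCORE-B": (8192, "02:aa:00:00:00:02"),
    "DCB-SWCORE-A": (12288, "02:aa:00:00:00:03"),
    "DCB-SWCORE-B": (16384, "02:aa:00:00:00:04"),
}

# Bridge priority is configurable in steps of 4096.
assert all(priority % 4096 == 0 for priority, _ in BRIDGES.values())


def elect_root(bridges, alive):
    """The bridge with the lowest (priority, MAC) bridge ID becomes root."""
    return min(alive, key=lambda name: bridges[name])


def failover_order(bridges):
    """Order in which bridges become root as each current root fails in turn."""
    alive = set(bridges)
    order = []
    while alive:
        root = elect_root(bridges, alive)
        order.append(root)
        alive.remove(root)
    return order


print(failover_order(BRIDGES))
```

The resulting order matches the primary, secondary, tertiary and quaternary roles given in the configuration headings.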

Conclusion

Force10 mVLT is a great technology for loop-free L2 network topologies. It can be leveraged for local loop-free topologies inside a single datacenter or as an L2 extension between datacenters. However, our networks are usually built to support IP traffic, therefore L3 considerations have to be addressed as well. Just think about default IP gateway behavior and the potential DCI trombone. That’s where other VLT features, peer-routing and proxy-gateway, come into play and mitigate the DCI trombone issue. You can see these features configured in the VLT configurations above. But that’s another topic for another blog post.

To be absolutely honest, I personally don't recommend L2 interconnects between datacenters without a good justification. I strongly recommend L3 datacenter interconnects, and when stretched L2 is needed, some network overlay technology can be leveraged. L3 guarantees independent availability zones and splits the L2 failure domain. On the other hand, such a network overlay needs some other bits and pieces, which in some cases increase complexity and cost. Therefore mVLT can be seriously considered for cost-effective datacenter L2 extensions. That's a typical "it depends" scenario where these two design options have to be compared and the final decision clearly justified.

If you want to know more about these technologies or use cases just ask and we can go deeper or broader. And as always any feedback and/or comment is highly appreciated.

19 comments:

Anonymous said...

Thank you very much for this post David.

Anonymous said...

Hello David,

What mac-address should be placed in the vlt domain config?
Is it just a formal mac-address, or something else?

Thanks

David Pasek said...

The command "system-mac mac-address" is optional; without it, Dell Networking OS automatically creates a VLT-system MAC address used for internal system operations.

Explicit configuration minimizes the time required for the VLT system to synchronize the default MAC address of the VLT domain on both peer switches when one peer switch reboots.

The VLT-system MAC address is used just for internal system operations, therefore any MAC address can be used.

Read my blog post "Locally Administered Address Ranges" at http://blog.igics.com/2014/05/locally-administered-address-ranges.html to decide which MAC addresses to choose.
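As a quick illustration, a MAC address is locally administered when the second least significant bit of its first octet is set, and it is unicast when the least significant bit of that octet is clear. This minimal Python sketch (the helper names are my own) checks the system-mac values used in the configurations above.

```python
def parse_mac(mac):
    """Parse a colon-separated MAC address into a list of six integers."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError(f"not a valid MAC address: {mac!r}")
    return octets


def is_locally_administered(mac):
    """True when the U/L bit (0x02 in the first octet) is set."""
    return bool(parse_mac(mac)[0] & 0x02)


def is_unicast(mac):
    """True when the I/G bit (0x01 in the first octet) is clear."""
    return not (parse_mac(mac)[0] & 0x01)


# system-mac values from the VLT domain 1 and VLT domain 2 configurations.
for mac in ("02:00:00:00:00:01", "02:00:00:00:00:02"):
    print(mac, is_locally_administered(mac), is_unicast(mac))
```

Both addresses are locally administered unicast addresses, so they cannot clash with a vendor-assigned (burned-in) address.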

Unknown said...

Hi David, why do we need mVLT if both VLT domains are already loop free? VLT domain 1 and VLT domain 2 are seen as a single switch. So what is stopping us from just creating a simple VLT LAG between the two?

David Pasek said...

VLT is nothing more than Dell terminology for LAG. More precisely, Multi-Chassis LAG (aka MLAG).

A single VLT domain is still two switches in the network, but in the STP topology they appear as a single logical switch (single node).

When you have two VLT domains, they appear in the STP topology as two logical switches (two nodes).

mVLT is nothing more than VLT (aka MLAG) between these two logical switches (four physical chassis, two in each VLT domain).

To make it more complicated, Dell/Force10 documentation also uses the term eVLT. mVLT and eVLT are two names for the same thing.

Now, Dell has an enhancement on mVLT for L3 traffic optimization which can be useful for some use cases. Dell calls it mVLT Proxy Gateway. It is based on ProxyARP. That's the only difference between normal VLT (MLAG) and mVLT (MLAG).

When you interconnect a Force10 VLT domain with a Cisco vPC domain (a pair of Nexus switches), the interconnect is VLT (Dell MLAG) on the Dell side and vPC (Cisco MLAG) on the Cisco side.

Does it make sense now?

Unknown said...

So if I don't need mVLT Proxy Gateway, can I use VLT instead of mVLT between two two-switch VLT domains?

David Pasek said...

Short answer is yes.

Longer answer:

mVLT Proxy Gateway is an L3 function. So if you don't need it, don't use it.

If you are interested only in L2, you can use just a multi-chassis LAG (port-channel) between two VLT domains. You call it VLT between two VLT domains. It is perfectly fine. Just FYI, that is exactly what mVLT means in Dell terminology. And to confuse it even more, in Dell documentation you can also see the term eVLT, which is just another name for mVLT.

VLT stands for "virtual link trunking", where trunk means LAG (port-channel). Because Virtual Port-Channel (vPC) is a Cisco term, I assume that Dell (actually the Force10 company) used the term trunk, which HP/3COM use for port-channel. In my opinion, the most precise term would be MLAG (multi-chassis link aggregation), but it is also used by some other vendors.

mVLT stands for "multi-domain VLT"
eVLT stands for "Enhanced VLT"

Don't blame me for these different terms from different vendors. It seems to me that each vendor wants to be somehow unique, which confuses all practitioners in the field.

Hope now it is absolutely clear.

Unknown said...

Hi David, yes I know the terminology is a bit confusing. I just wanted to confirm the benefits of using mVLT over VLT for connecting two VLT domains. I guess at the end of the day it doesn't really matter. I can set up an mVLT and if I don't need L3 I just won't be using any of the mVLT enhancements.

Thanks again for the comprehensive answers!

David Pasek said...

Yes, Nick. That's an absolutely fair statement.

Unknown said...

Hi David,

Do you have any insight into how LACP restores the connection in an mVLT topology whenever an incident such as a reboot or hardware failure occurs on one of the VLT peers? I have a similar mVLT/eVLT setup with PVST, and during my tests I've seen 6-10 lost ping replies, whereas I was expecting 2-3 at most, especially in a full mesh mVLT topology.

Thanks,

Anonymous said...

David, thanks a lot for this.
Question on VLANs: for example, VLAN 2 has port-channel 127 untagged. Is there no way to "add" port-channel 127 to another VLAN?

Thanks
Frank

David Pasek said...

@Anonymous: Of course, there is a way to manage VLANs on Port-channel 127. You can manage VLAN tagging as usual: go to the config mode of the particular VLAN and configure the interface there as tagged or untagged. See http://blog.igics.com/2015/07/dell-force10-interface-configuration.html for further information.

Anonymous said...

Hey Dave,
thanks a lot for responding so fast.
I'll have a look :-)
Frank

Anonymous said...

Hi David, can we use this kind of architecture to create a disaster recovery solution? I wonder if we can stretch layer 2 between two datacenters and, using peer-routing and proxy-gateway, achieve HA for VMs. I'm not sure which address I should use as the default gateway for a VM.

Peter

David Pasek said...

Hi Peter. Yes, you can, but there are some buts as always: you have to know what the goal is, what you are addressing, and how the technology really works.

It also depends if you want it as DR or cross-site/geo HA.

DR = recoverability
HA = availability
these two design qualities are different as I'm trying to explain here
https://www.slideshare.net/davidpasek/metro-cluster-high-availability-or-srm-disaster-recovery-69964166


Btw, DR is not only about technical aspects but mainly about the recoverability process in a particular organization.

Anyway, from a technical point of view, here are a few buts ...

BUT #1
Stretched L2 is not the best approach for two DR sites because you join two fault domains together. Yes, one L2 is IMHO a single fault domain because of spanning tree, broadcast storms, unknown unicast flooding, hair-pinning (aka tromboning), etc.

BUT #2
I designed and tested mVLT as an HA/DR solution for one customer a few years ago, and during pre-production tests I realized that the dynamic mVLT configuration works nicely, but when both core switches where the default gateway is configured fail, routing does not work even though you can ping the default gateway IPs. I have been told that a static mVLT configuration would solve it, but I never tested it.

BUT #3
mVLT is a proxy ARP solution, so you have 4 IP addresses (one per switch) which can be used as a default gateway for end-points. Let's say you use IP addresses 192.168.100.1 (SW-A1), 192.168.100.2 (SW-A2), 192.168.100.3 (SW-B1) and 192.168.100.4 (SW-B2) on your four switches, and 192.168.100.1 is configured on end-point devices as the DEFAULT GW. When SW-A1 is down, the other three switches can do the work on behalf of SW-A1 for running devices, but not for newly spun-up devices. Why? Existing devices already know the MAC address of the default gateway (192.168.100.1), so they know which MAC address to send traffic to, and the other three switches can forward it on behalf of SW-A1. A newly booted device, however, first has to resolve the MAC address of 192.168.100.1, and this is not what mVLT does. It does not reply to those ARP requests.
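The behavior described in BUT #3 can be expressed as a tiny Python toy model. It is only a sketch of the reasoning above, not of how the switches are implemented: the IP addresses and switch names come from the example, while the MAC labels and function names are hypothetical.

```python
# Gateway IPs and switch names from the example above.
GATEWAYS = {
    "192.168.100.1": "SW-A1",
    "192.168.100.2": "SW-A2",
    "192.168.100.3": "SW-B1",
    "192.168.100.4": "SW-B2",
}

# Hypothetical MAC labels, one per switch.
MACS = {name: f"mac-of-{name}" for name in GATEWAYS.values()}


def arp_resolve(gateway_ip, switches_up):
    """Only the switch that owns the IP answers ARP for it (per BUT #3)."""
    owner = GATEWAYS[gateway_ip]
    return MACS[owner] if owner in switches_up else None


def can_forward(gateway_mac, switches_up):
    """Any surviving switch forwards traffic sent to a known gateway MAC."""
    return gateway_mac is not None and bool(switches_up)


all_up = set(GATEWAYS.values())

# A running device resolved its default gateway while SW-A1 was still up.
cached_mac = arp_resolve("192.168.100.1", all_up)

# SW-A1 fails.
survivors = all_up - {"SW-A1"}

print("running device:", can_forward(cached_mac, survivors))
print("new device:    ", can_forward(arp_resolve("192.168.100.1", survivors), survivors))
```

In this model the running device keeps its connectivity because it still holds the cached MAC address, while the newly booted device never gets an ARP reply for 192.168.100.1 and therefore has nothing to send traffic to.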

CONCLUSION
You can use it for DR and HA, but you must know which scenarios this technology addresses out of the box and which scenarios have to be addressed by some other orchestration tool or technology. I personally believe Dell Force10 for L2 + VMware NSX for L3 services is a better way to go, but even this combination must be tested carefully.

David Pasek said...

NSX would give you L2 over L3 between sites and BGP or OSPF for L3 fail-over.

If you cannot use NSX, I believe another possible solution is to use mVLT for L2 and physically separate upstream routers with VRRP on top of the Force10 switches. Unfortunately, VRRP within Force10 mVLT was not supported back in the day. To be honest, I do not know what the latest status is.

David.


Anonymous said...

Hi, how would you connect 4 or more VLTi domains together? Do they all share the same port channel, or is there one port channel for each pair?

David Pasek said...

Hi Anonymous,

I think you mean - "how to connect 4 or more VLT domains". VLTi is just an interconnect between two switch chassis.

Each VLT domain is a single logical switch from the logical topology point of view, therefore there is one port-channel (mVLT in Force10 terminology) between each two VLT domains.

Hope this helps.