Friday, September 11, 2020

Datacenter Network Topology - Dell OS10 MultiDomain VLT

Yesterday, I got the following e-mail from one of my blog readers ...

Hello David,

Let me introduce myself: I work in a medium-size company and we began to sell Dell Networking gear to go along with VxRail. We do small deployments, not the big stuff with spine/leaf L3 BGP, you name it. For a customer, I had to implement this solution. Sadly, we are having a bad time with STP, as you can see in the design.

 

Customer design with STP challenge

Is there a way to be loop-free? I thought about Multi-Domain VLT LAG, but it looks like it is not supported in OS10.

I wonder how you would do this. Is SmartFabric the answer?
Thank you

Well, first of all, thanks for the question. If you ask me, it all boils down to the specific design factors: use cases, requirements, constraints, and assumptions.

So let's write down the design factors.

Requirements:

  • Multi-site deployment
  • A small deployment with a single VLT domain per site.
  • Robust L2 networking for VxRail clusters

Constraints:

  • Dell Networking hardware with OS10
  • Networking for VMware vSphere/vSAN (VxRail)

Assumptions:

  • No more than a single VLT domain per site is required
  • No vSphere/vSAN (VxRail) clusters are stretched across sites

Any unfulfilled assumption is a potential risk. If an assumption turns out not to hold, the design should be reviewed and potentially reworked to fulfill the design factors.

Now, let's think about network topology options we have. 

The reader asked if DellEMC SmartFabric can help him. Well, SmartFabric can be an option, as it is a leaf-spine fabric fully managed by an external SmartFabric Orchestrator, something like Cisco ACI / APIC. SmartFabric uses EVPN, BGP, VXLAN, etc. for multi-rack deployments. I do not know the latest details, but AFAIK it was not multi-site ready a few months ago, so the latest SmartFabric features should be validated with DellEMC. Anyway, SmartFabric can do L2 over L3 if you need to stretch L2 across racks, and eventually it should be possible to stretch L2 even across sites.

However, because our design targets a small deployment, I think leaf-spine is overkill here, and I always prefer the KISS (Keep It Simple, Stupid) approach.

So, here are the two final network topology options I would consider and compare.

OPTION 1: Stretched L2 Loop-Free across sites 
OPTION 2: L3 across sites with L2/L3 boundary in TOR access switches 


Option 1 - Stretched L2 Loop-Free across sites


Option 2 - L3 across sites with L2/L3 boundary in TOR access switches 
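
Before comparing them, it is worth noting that both options are built from the same per-site building block: the two TOR access switches form one VLT domain, and every upstream or downstream device connects to both VLT peers with a LAG terminated as a VLT port-channel, so no link has to be blocked by STP. Below is a minimal OS10 sketch of that building block; the interface names, IP address, VLAN IDs, and VLT MAC are hypothetical examples, and the exact syntax should be validated against the OS10 configuration guide for your release.

    ! VLT domain on the first TOR switch of a site (the VLT peer mirrors this configuration)
    vlt-domain 1
     backup destination 192.168.100.2
     discovery-interface ethernet1/1/29-1/1/30
     vlt-mac 44:38:39:ff:00:01
     peer-routing
    !
    ! Example LAG towards an upstream/downstream device (router, server, another switch),
    ! terminated on both VLT peers as one VLT port-channel
    interface port-channel 10
     description example-vlt-lag
     switchport mode trunk
     switchport trunk allowed vlan 10,20,30
     vlt-port-channel 10
     no shutdown

STP should stay enabled as a safety net against cabling mistakes, but in a pure VLT port-channel topology it should not need to block any link.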

So let's compare these two options. 

Option 1 - Stretched L2 Loop-Free across sites 

Benefits

  • Simplicity
  • Stretched L2 across sites allows workload (device, VM, container, etc.) migrations across sites without an L2-over-L3 network overlay (NSX, SmartFabric, etc.) and without re-IP.

Drawbacks

  • Topology is not scalable to more TOR access switches (VLT domains), but this is acceptable given the design factors
  • Topology optimally requires 8 links across sites; optionally, this can be reduced to 4 links
  • Only two routers in total, one per site
  • Stretched L2 topology across sites also extends the L2 network fault-domain across sites, therefore broadcast storms, unknown unicast flooding, and STP challenges are potential risks
  • This topology has an L3 traffic trombone by design (see https://blog.ipspace.net/2011/02/traffic-trombone-what-it-is-and-how-you.html). This drawback can be accepted or mitigated by NSX distributed routing.
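
To make Option 1 more concrete, here is a hedged OS10 sketch of the two pieces specific to it: the inter-site links bundled into a single VLT port-channel (so the stretched L2 stays loop-free without STP blocking), and, assuming the routers also run OS10, VRRP between the two routers (one per site) providing the default gateway for a stretched VLAN. VLAN IDs, addresses, and interface numbers are hypothetical examples.

    ! On each site's TOR VLT pair: all inter-site links form one VLT port-channel (L2 trunk)
    interface port-channel 100
     description inter-site-trunk
     switchport mode trunk
     switchport trunk allowed vlan 10,20,30
     vlt-port-channel 100
     no shutdown
    !
    ! On each site's router: VRRP over the stretched VLAN provides a single virtual gateway
    ! (10.0.10.2 on the site A router, 10.0.10.3 on the site B router, virtual IP 10.0.10.1)
    interface vlan 10
     ip address 10.0.10.2/24
     vrrp-group 10
      priority 200
      virtual-address 10.0.10.1
     no shutdown

Because both ends of the inter-site bundle terminate on a VLT pair, the 8 (or 4) physical links appear as one logical link, so there is no loop for STP to break.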

OPTION 2 - L3 across sites with L2/L3 boundary in TOR access switches 

Benefits

  • Better scalability, because additional VLT domains (TOR access switches) can be connected to the core routers. However, this benefit is not required by the design factors above.
  • Topology optimally requires 4 links across sites; optionally, this can be reduced to 2 links, which is less than Option 1 requires.
  • Each site is a local fault-domain from the L2 networking point of view, as the L2 fault-domain is not stretched across sites. L2 faults (STP issues, broadcast storms, unknown unicast flooding, etc.) are isolated within the site.

Drawbacks

  • More complex routing configuration with ECMP and a dynamic routing protocol such as iBGP or OSPF
  • Four routers in total, two per site
  • L3 across sites restricts workload (device, VM, container, etc.) migrations between sites: they require either an L2-over-L3 network overlay (NSX, SmartFabric, etc.) or changing the IP address of the migrated workload.
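
For Option 2, the TOR VLT pair becomes the L2/L3 boundary: the VxRail VLANs and their gateways (VRRP between the two VLT peers) stay local to the site, while routed point-to-point uplinks with a dynamic routing protocol connect the sites. The sketch below assumes OSPF, though iBGP would work as well; all interface names, addresses, and IDs are hypothetical examples and should be validated against the OS10 documentation.

    ! Routed point-to-point uplink from a TOR switch towards a core router
    interface ethernet1/1/25
     no switchport
     ip address 10.255.0.1/31
     ip ospf 1 area 0.0.0.0
     no shutdown
    !
    ! OSPF process; with two equal-cost uplinks per switch, ECMP spreads inter-site traffic
    router ospf 1
     router-id 10.255.255.1
    !
    ! Site-local VxRail gateway: VRRP between the two TOR VLT peers (.2 and .3), virtual IP .1
    interface vlan 10
     ip address 10.1.10.2/24
     vrrp-group 10
      virtual-address 10.1.10.1
     no shutdown

With peer-routing enabled in the VLT domain, both TOR peers should forward routed traffic for the virtual gateway address, so VRRP mainly provides the shared gateway IP rather than strict active/standby forwarding.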

Conclusion and Design Decision

Both considered design options are L2 loop-free topologies, and I believe they fit all the design factors defined above. If you do not agree, please write a comment, because anybody can make an error in a design or fail to foresee all situations until the architecture is implemented and validated.

If I had to make the final design decision, it would depend on two other factors:
  • Do I have VMware NSX in my toolbox or not?
  • What is the skillset level of the network operators (dynamic routing, ECMP, VRRP) responsible for the operation?

If I did not have NSX and the network operators preferred Routing High Availability (VRRP) over Dynamic Routing with ECMP (high availability + scalability + performance), I would decide to implement Option 1.

In the case of NSX and a willingness to use dynamic routing with ECMP, I would decide to implement Option 2.

The reader mentioned in his question that his company does not use spine/leaf L3 BGP, therefore Option 1 is probably a better fit for him.
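
Whichever option gets implemented, the health of each VLT domain is worth checking before handing the environment over. These are standard OS10 show commands (domain ID 1 is just the example used in the sketches above):

    show vlt 1
    show vlt 1 vlt-port-detail
    show vlt 1 mismatch

The first command shows the domain and peer status, the second shows the state of every VLT port-channel on both peers, and the third highlights configuration mismatches between the VLT peers.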

Disclaimer: I had no chance to test and validate any of the design options considered above, therefore, if you have any real experience, please speak out loudly in the comments.
