Tuesday, April 09, 2019

What NSX-T Manager appliance size is good for your environment?

In NSX-T 2.4, NSX Manager and NSX Controller are still logically separated but physically integrated within a single virtual appliance, which can be clustered as a 3-node management/controller cluster. So the first typical question during an NSX-T design workshop, or before an NSX-T implementation, is which NSX-T Manager appliance size is right for my environment.

The NSX-T 2.4 documentation (NSX Manager VM System Requirements) documents the following NSX Manager appliance sizes.

Appliance Size          | Memory | vCPU | Disk Space | VM Hardware Version
NSX Manager Extra Small | 8 GB   | 2    | 200 GB     | 10 or later
NSX Manager Small VM    | 16 GB  | 4    | 200 GB     | 10 or later
NSX Manager Medium VM   | 24 GB  | 6    | 200 GB     | 10 or later
NSX Manager Large VM    | 48 GB  | 12   | 200 GB     | 10 or later
The same documentation section states that
  • The NSX Manager Extra Small VM resource requirements apply only to the Cloud Service Manager.
  • The NSX Manager Small VM appliance size is suitable for lab and proof-of-concept deployments.
So for on-prem NSX-T production usage, you can use the Medium or Large size. But which one? The NSX Manager VM System Requirements section offers no further information to support your design or implementation decision. However, another part of the documentation (Overview of NSX-T Data Center) states that
  • The NSX Manager Medium appliance is targeted for deployments up to 64 hosts.
  • The NSX Manager Large appliance is targeted for larger-scale environments.

Conclusion

Long story short, only the Medium and Large sizes are targeted at on-prem NSX-T production usage. The Medium size should be used in environments with up to 64 ESXi hosts; for larger environments, the Large size is the way to go.
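
For quick reference, here is a minimal Python sketch that encodes the documented sizing guidance. The 64-host threshold and the size names come from the documentation quoted above; the function name and parameters are mine and purely illustrative.

```python
# Minimal sketch of the documented NSX Manager sizing guidance.
# The 64-host threshold and size names come from the NSX-T 2.4 docs quoted
# above; the function itself is illustrative only.

def nsx_manager_size(host_count: int, production: bool = True) -> str:
    """Suggest an NSX Manager appliance size for an on-prem environment."""
    if not production:
        return "Small"    # lab and proof-of-concept deployments
    if host_count <= 64:
        return "Medium"   # 24 GB RAM, 6 vCPU, 200 GB disk
    return "Large"        # 48 GB RAM, 12 vCPU, 200 GB disk


print(nsx_manager_size(40))    # Medium
print(nsx_manager_size(200))   # Large
```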

Hope this helps with your NSX-T Plan, Design, and Implement exercise.

Friday, April 05, 2019

vSAN: Number of required ESXi hosts

As you have found this article, I assume you already know what vSAN is. For those who are new to vSAN, below is the definition from https://searchvmware.techtarget.com/definition/VMware-VSAN-VMware-Virtual-SAN
VMware vSAN (formerly Virtual SAN) is a hyper-converged, software-defined storage (SDS) product developed by VMware that pools together direct-attached storage devices across a VMware vSphere cluster to create a distributed, shared data store. The user defines the storage requirements, such as performance and availability, for virtual machines (VMs) on a VMware vSAN cluster and vSAN ensures that these policies are administered and maintained.
VMware vSAN aggregates local or direct-attached storage devices to create a single storage pool shared across all ESXi hosts in the vSAN (aka vSphere) cluster. vSAN eliminates the need for external shared storage and simplifies storage configuration and virtual machine provisioning. Data is protected across ESXi hosts, or, to be more accurate, across failure domains; but let's assume we stick with the vSAN default failure domain, which is the ESXi host.

vSAN is policy-based storage, and the policy dictates how data will be made redundant, distributed, reserved, and so on. You can treat a policy as a set of requirements you define; the storage system will then try to deploy and operate each storage object in compliance with those requirements. If the requirements defined in a policy cannot be satisfied, the object cannot be deployed, or, if it is already deployed, it falls into a non-compliant state and is therefore at risk.

vSAN is object storage; therefore, each object is composed of multiple components.

Let's start with RAID-1. For RAID-1, components can be replicas or witnesses.
Replicas are components containing the data.
Witnesses are components containing just metadata, used to avoid a split-brain scenario.

Object components are depicted in the screenshot below, where you can see three objects
  1. VM Home
  2. VM Swap
  3. VM Disk
where each object has two components (data replicas) and one witness (a component containing just metadata).
vSAN Components

The key concept of data redundancy is FTT, the number of failures to tolerate. To tolerate failures, vSAN supports two methods of distributing data across vSAN nodes (that is, ESXi hosts), often referenced as the FTM (Failure Tolerance Method). FTM can be
  • RAID-1 (aka Mirroring)
  • RAID-5/6 (aka Erasure Coding)
As data is distributed across nodes, not disks, to achieve redundancy, I'd rather call it RAIN than RAID. Anyway, vSAN terminology uses RAID, so let's stick with RAID.

In the table below, you can see how many hosts you need to achieve a particular FTT with FTM RAID-1 (Mirroring):

FTT | Replicas | Witness components | Minimum # of hosts
0   | 1        | 0                  | 1
1   | 2        | 1                  | 3
2   | 3        | 2                  | 5
3   | 4        | 3                  | 7
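
The RAID-1 rows follow a simple pattern: replicas = FTT + 1, witness components = FTT, and the minimum number of hosts = 2 × FTT + 1. Below is a minimal Python sketch of that pattern; the function name is mine, just for illustration.

```python
# Sketch of the RAID-1 (Mirroring) pattern from the table above:
# replicas = FTT + 1, witnesses = FTT, minimum hosts = 2 * FTT + 1.

def raid1_layout(ftt: int) -> dict:
    return {
        "replicas": ftt + 1,
        "witnesses": ftt,
        "min_hosts": 2 * ftt + 1,
    }

for ftt in range(4):
    print(ftt, raid1_layout(ftt))
# 0 {'replicas': 1, 'witnesses': 0, 'min_hosts': 1}
# 1 {'replicas': 2, 'witnesses': 1, 'min_hosts': 3}
# 2 {'replicas': 3, 'witnesses': 2, 'min_hosts': 5}
# 3 {'replicas': 4, 'witnesses': 3, 'min_hosts': 7}
```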

In the table below, you can see how many hosts you need to achieve a particular FTT with FTM RAID-5/6 (Erasure Coding):
FTT | Erasure coding | Redundancy    | Minimum # of hosts
0   | None           | No redundancy | 1
1   | RAID-5         | 3D+1P         | 4
2   | RAID-6         | 4D+2P         | 6
3   | N/A            | N/A           | N/A
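
For erasure coding, the minimum number of hosts equals the number of data plus parity components (3+1 for RAID-5, 4+2 for RAID-6), and the capacity overhead is much lower than with mirroring. Below is a rough Python sketch comparing raw-capacity multipliers; it deliberately ignores witness and metadata overhead, so treat the numbers as a simplification.

```python
# Rough sketch: raw capacity consumed per 1 unit of usable data for each FTM,
# ignoring witnesses and other metadata overhead.

def capacity_multiplier(ftt: int, ftm: str) -> float:
    if ftm == "RAID-1":
        return float(ftt + 1)   # full copies: FTT=1 -> 2x, FTT=2 -> 3x
    if ftm == "RAID-5" and ftt == 1:
        return 4 / 3            # 3 data + 1 parity -> ~1.33x, minimum 4 hosts
    if ftm == "RAID-6" and ftt == 2:
        return 6 / 4            # 4 data + 2 parity -> 1.5x, minimum 6 hosts
    raise ValueError("combination not supported by vSAN")

print(capacity_multiplier(1, "RAID-1"))  # 2.0
print(capacity_multiplier(1, "RAID-5"))  # 1.333...
print(capacity_multiplier(2, "RAID-6"))  # 1.5
```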

Design consideration: 
The numbers of ESXi hosts above are minimums. What does that mean? In case of longer ESXi host maintenance or a long-lasting server failure, vSAN will not be able to rebuild the components from the affected ESXi node somewhere else. That's the reason why at least one additional ESXi host is highly recommended. Without one additional ESXi host, there can be situations where your data is not redundant and therefore unprotected.
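
To turn that recommendation into numbers, here is a tiny Python sketch that simply adds one spare host on top of the minimums from the tables above; the names and the default of one spare host are mine, for illustration only.

```python
# Rule of thumb from the design consideration above: plan for at least one host
# more than the minimum, so vSAN can rebuild components elsewhere during longer
# maintenance windows or a prolonged host failure.

MIN_HOSTS = {
    ("RAID-1", 1): 3, ("RAID-1", 2): 5, ("RAID-1", 3): 7,
    ("RAID-5", 1): 4, ("RAID-6", 2): 6,
}

def recommended_hosts(ftm: str, ftt: int, spare_hosts: int = 1) -> int:
    return MIN_HOSTS[(ftm, ftt)] + spare_hosts

print(recommended_hosts("RAID-1", 1))   # 4
print(recommended_hosts("RAID-6", 2))   # 7
```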

I wrote this article mainly for myself, to use as a quick reference during conversations with customers. Hope you find it useful as well.