Tuesday, July 29, 2014

Force10 VLT - Design Verification Test Plan


One of my philosophical rules is "Trust, but verify". A design verification test plan is a good way to confirm how the system you have designed actually behaves. A typical design verification test plan contains usability, performance and reliability tests.

A Force10 VLT domain configuration is actually a two-node cluster (the system) providing L2/L3 network services. Which network services your VLT domain should provide depends on customer requirements. However, a typical VLT customer requirement is high availability: eliminating network downtime when a system component fails or is maintained by an administrator. Planning and executing reliability tests is a good way to verify that the customer's high availability requirements have been achieved.
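
For context, below is a minimal FTOS configuration sketch of such a VLT domain on one node; the peer node needs a mirrored configuration with its own unit-id. The port-channel numbers, interface names, backup destination IP and priority value are illustrative assumptions only, not values from a specific design, so verify the exact commands against the FTOS configuration guide for your release.

    ! VLTi (ISL) between the two VLT nodes - a static port-channel
    interface Port-channel 127
     description VLTi to the peer node
     channel-member fortyGigE 0/56
     channel-member fortyGigE 0/60
     no shutdown
    !
    ! VLT domain definition
    vlt domain 1
     peer-link port-channel 127
     ! backup link: IP heartbeat to the peer's out-of-band management address
     back-up destination 10.0.1.2
     ! role election priority for this node
     primary-priority 1
     unit-id 0
    !
    ! downstream virtual link trunk (VLT port-channel) towards system A or B
    interface Port-channel 10
     switchport
     vlt-peer-lag port-channel 10
     no shutdown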

Below are some reliability tests I think are worth executing. When my gear is back in my lab, I'll try to find some time to run the tests described below and publish the real test results.
 
If you know about other tests which make sense to perform, please don't be shy, leave a comment and I'll run them for you.

Test #1
Description: 
Simulate the impact of a VLT domain secondary node failure on Ethernet traffic. How long (in ms) is traffic disrupted?
Tasks:
Use systems A and B, both connected to the VLT domain via VLT port-channels
Ping from system A to system B at least 10 times per second
Power off the secondary VLT node
Measure the network disruption (see the ping sketch below)
Expected Results:
The failover should be sub-second.
Test Result:
TBD
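
Note on measuring disruption (this applies to all tests in this plan): the "at least 10x per second" ping can be generated from a Linux host with a 100 ms interval, as sketched below. The target IP is just a placeholder for system B, and on many Linux distributions intervals below 0.2 s require root privileges.

    # send one ICMP echo request every 100 ms until interrupted
    ping -i 0.1 192.168.10.20
    # after the failover, stop the ping (Ctrl+C) and multiply the number of
    # lost packets by ~100 ms to get a rough estimate of the disruption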

Test #2
Description: 
Simulate the impact of a VLT domain primary node failure on Ethernet traffic. How long (in ms) is traffic disrupted?
Tasks:
Use systems A and B, both connected to the VLT domain via VLT port-channels
Ping from system A to system B at least 10 times per second
Power off the primary VLT node
Measure the network disruption
Expected Results:
The failover should be sub-second.
Test Result:
TBD

Test #3
Description: 
Simulate the failure of one link in the VLTi (ISL) port-channel.
Tasks:
Use systems A and B, both connected to the VLT domain via VLT port-channels
Ping from system A to system B at least 10 times per second
Pull out one cable participating in the VLTi static port-channel
Measure the network disruption
Expected Results:
The VLT domain should keep working without any traffic impact.
Test Result:
TBD

Test #4
Description: 
Simulate the failure of all links in the VLTi (ISL) port-channel.
Tasks:
Use systems A and B, both connected to the VLT domain via VLT port-channels
Ping from system A to system B at least 10 times per second
Pull out all cables participating in the VLTi static port-channel
Measure the network disruption
Expected Results:
The backup link should act as an arbiter. The VLT domain should keep working, but in split-brain mode, and only the primary VLT node should handle the traffic.
Test Result:
TBD

Test #5
Description: 
Simulate a VLT domain backup link failure. The backup link is configured as an IP heartbeat over the out-of-band management network.
Tasks:
Use systems A and B, both connected to the VLT domain via VLT port-channels
Ping from system A to system B at least 10 times per second
Pull out the cable participating in the backup link
Measure the network disruption
Expected Results:
All traffic should keep working correctly, but VLT should report a backup link failure.
Test Result:
TBD

Test #6
Description: 
Simulate a single link failure on a virtual link trunk (aka VLT, a virtual port-channel).
Tasks:
Use systems A and B, both connected to the VLT domain via VLT port-channels
Ping from system A to system B at least 10 times per second
Pull out one cable participating in the VLT port-channel
Measure the network disruption
Expected Results:
The port-channel should survive this failure.
Test Result:
TBD

These six tests should verify the basic high availability and resiliency of a Force10 VLT cluster.

All failures should be reported via SNMP and/or syslog to a central monitoring system, provided it is configured properly. That moves us toward usability tests ... but that's another set of tests ...

And please remember that TOO MUCH TESTING WOULD NEVER BE ENOUGH :-)

Wednesday, July 23, 2014

CISCO UDLD alternative on Force10

I've been asked by a DELL System Engineer whether we support CISCO's UDLD feature, because it was required in an RFI. Well, the DELL Force10 Operating System has a similar feature solving the same problem, and it is called FEFD.

Here is the explanation from the FTOS 9.4 Configuration Guide ...

FEFD (Far-end failure detection) is supported on the Force10 S4810 platform. FEFD is a protocol that senses remote data link errors in a network. FEFD responds by sending a unidirectional report that triggers an echoed response after a specified time interval. You can enable FEFD globally or locally on an interface basis. Disabling the global FEFD configuration does not disable the interface configuration.

Figure caption: Configuring Far-End Failure Detection

The report consists of several packets in SNAP format that are sent to the nearest known MAC address. In the event of a far-end failure, the device stops receiving frames and, after the specified time interval, assumes that the far-end is not available. The connecting line protocol is brought down so that upper layer protocols can detect the neighbor unavailability faster.
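
For reference, a FEFD configuration sketch on FTOS might look like the one below. Treat the exact keywords as my assumption based on the documentation and verify them in the FTOS command reference for your release; the interface name and interval value are illustrative only.

    ! enable FEFD globally with a 3 second interval
    fefd-global mode normal
    fefd-global interval 3
    !
    ! or enable FEFD only on a specific interface
    interface TenGigabitEthernet 0/4
     fefd mode aggressive
    !
    ! verify FEFD status (EXEC mode)
    show fefd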

Update 2015-05-20:
If I understand it correctly, the main purpose of CISCO's UDLD is to detect potential unidirectional links and mitigate the risk of a loop in the network, because STP cannot help in this scenario. Force10 has another feature to prevent a loop in such a situation - STP loop guard.

The STP loop guard feature provides protection against Layer 2 forwarding loops (STP loops) caused by a hardware failure, such as a cable failure or an interface fault. When a cable or interface fails, a participating STP link may become unidirectional (STP requires links to be bidirectional) and an STP port does not receive BPDUs. When an STP blocking port does not receive BPDUs, it transitions to a Forwarding state. This condition can create a loop in the network.
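
As a sketch only (I have not verified the exact syntax, so treat it purely as an assumption and check the STP chapter of the FTOS configuration guide for the keyword matching your STP variant), loop guard is enabled per interface, for example:

    ! enable loop guard on an RSTP-enabled port (interface name is illustrative)
    interface TenGigabitEthernet 0/8
     spanning-tree rstp loopguard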

Sunday, July 13, 2014

Heads Up! VMware virtual disk IOPS limit bad behavior in VMware ESX 5.5

I've been informed about strange behavior of VM virtual disk IOPS limits by one of my customers, for whom I recently did a vSphere design. If you don't know how VM vDisk IOPS limits can be useful in some scenarios, read my other blog post - "Why use VMware VM virtual disk IOPS limit?". And because I designed this technology for some of my customers, they are heavily impacted by the bad vDisk IOPS limit behavior in ESX 5.5.

I've tested VM IOPS limits in my lab to see it for myself. Fortunately, I have two labs: an older vSphere 5.0 lab with Fibre Channel Compellent storage and a newer vSphere 5.5 lab with EqualLogic iSCSI storage. First of all, let's look at how it works in ESX 5.0. The behavior is the same in ESX 5.1, and it makes perfect sense.

By default, VM vDisks don't have limits, as seen in the next screenshot.


When I run IOmeter with a single worker (thread) against the unlimited vDisk, I can achieve 4,846 IOPS. That's what the datastore (physical storage) is able to give to a single thread.


When I run IOmeter with two workers (threads) against the unlimited vDisk, I can achieve 7,107 IOPS. That's OK, because all shared storage arrays implement algorithms to limit per-thread performance. That's actually a protection against a single thread consuming all the storage performance.


Now let's try to set the SIOC limit to 200 IOPS on both vDisks of the VM, as depicted in the picture below.


With the settings above, the workload generated by a single IOmeter worker is limited to 400 IOPS (2 x 200) for the whole VM, because all limit values are consolidated per virtual machine per LUN. For more info, look at http://kb.vmware.com/kb/1038241. So it behaves as expected: IOmeter IOPS oscillated between 330 and 400, as you can see in the picture below.
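
If I recall correctly, this per-vDisk limit is also stored as a VM advanced parameter named sched.<device>.throughputCap. For a VM with two disks on the scsi0 controller it would appear in the VMX roughly as sketched below; the device numbers and the exact value format are assumptions I have not verified, so set the limit through the vSphere Client rather than by editing the VMX.

    sched.scsi0:0.throughputCap = "200"
    sched.scsi0:1.throughputCap = "200"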


We can observe similar behavior with two workers.


So in the ESX 5.0 lab everything works as expected. Now let's move to the other lab, where I have vSphere 5.5. There is iSCSI storage there, so first of all we will run IOmeter without vDisk IOPS limits to see the maximal performance we can get. In the picture below, we can see that a single thread is able to get 1,741 IOPS.


... and two workers can get 3,329 IOPS.


So let's set the vDisk IOPS limit to 200 IOPS on both vDisks of the VM, as in the ESX 5.0 test. This VM also has two disks. With these settings, the workload generated by a single IOmeter worker should again be limited to 400 IOPS (2 x 200) for the whole VM. But unfortunately it is not limited, and it can get 2,000 IOPS. That is strange and, in my opinion, bad behavior.

 

But even worse behavior can be observed when there are more threads. In the examples below, you can see the behavior with two and four workers (threads). The VM gets really poor performance.



The ESX 5.5 VM vDisk behavior is really strange, and because all typical OS storage workloads (even OS booting) are multi-threaded, the VM vDisk IOPS limit technology is effectively unusable. My customer has opened a support request, so I believe it is a bug and VMware Support will help escalate it to VMware engineering.

UPDATE 2014-07-14 (Workaround): 
I tweeted about this issue and Duncan immediately pointed me in the right direction. He revealed the secret to me ... ESXi has two disk schedulers, the old one and a new one (aka mClock). ESXi 5.5 uses the new one (mClock) by default. If you switch back to the old one, the disk scheduler behaves as expected. Below is the setting to switch back to the old one.

Go to ESX Host Advanced Settings and set Disk.SchedulerWithReservation=0

This will switch back to the old disk scheduler.
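
If you prefer the command line, the same advanced option can also be changed with esxcli. Below is a sketch; I have not verified it on ESXi 5.5 myself, so treat the option path as an assumption, and note that a host reboot may be required before the scheduler change takes effect.

    # check the current value of the advanced option
    esxcli system settings advanced list -o /Disk/SchedulerWithReservation
    # switch back to the old (legacy) disk scheduler
    esxcli system settings advanced set -o /Disk/SchedulerWithReservation -i 0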

Kudos to Duncan.

Switching back to the old scheduler is a good workaround which will probably appear in a VMware KB, but there is definitely some reason why VMware introduced the new disk scheduler in 5.5. I hope we will get more information from VMware engineering, so stay tuned for more details ...

UPDATE 2015-12-3:
Here are some references to more information about disk schedulers in ESXi 5.5 and above ...

ESXi 5.5
http://www.yellow-bricks.com/2014/07/14/new-disk-io-scheduler-used-vsphere-5-5/
http://cormachogan.com/2014/09/16/new-mclock-io-scheduler-in-vsphere-5-5-some-details/
http://anthonyspiteri.net/esxi-5-5-iops-limit-mclock-scheduler/

ESXi 6
http://www.cloudfix.nl/2015/02/02/vsphere-6-mclock-scheduler-reservations/

Saturday, July 12, 2014

Why use VMware VM virtual disk IOPS limit?

What is the VM IOPS limit? Here is the explanation from the VMware documentation ...
When you allocate storage I/O resources, you can limit the IOPS that are allowed for a virtual machine. By default, these are unlimited. If a virtual machine has more than one virtual disk, you must set the limit on all of its virtual disks. Otherwise, the limit will not be enforced for the virtual machine. In this case, the limit on the virtual machine is the aggregation of the limits for all virtual disks.
I really like this feature because the VM vDisk IOPS limit is an excellent mechanism to protect the physical storage back-end against being overloaded by disk-intensive VMs, and it allows you to set up a fair use policy for storage performance. Somebody could argue for using the VM disk shares mechanism instead. Yes, that's of course possible as well, and it can be complementary. However, with a shares-based fair use policy, your users will get high performance at the beginning, when the back-end storage has a lot of available performance, but their performance will decrease over time as more VMs use that particular datastore. It means that performance is not predictable and users may complain.

Let's do a simple IOPS limit example. You have datastores provisioned on a storage pool with automated storage tiering which can serve up to 25,000 IOPS, and you have 100 virtual disks (vDisks) there. Setting a 250 IOPS limit on each virtual disk ensures that even if all VMs use all their IOPS, the back-end datastores will not be overloaded (100 x 250 = 25,000 IOPS). I agree it is a very strict limitation, and VMs cannot use more IOPS even when performance is available in the physical storage. But this is a business problem, and the best vDisk limiting policy depends on your business model and company strategy. Below are two business models for virtual disk performance limits I've already used on some of my vSphere projects:

  • Service catalog strategy
  • Capacity/performance ratio strategy

The service catalog strategy allows customers (internal or external) to increase or decrease vDisk IOPS as needed and, of course, pay for it accordingly.

The capacity/performance ratio strategy is to calculate the ratio between physical storage capacity and performance and use the same ratio for vDisks. So if you have storage with 50 TB of capacity and 25,000 front-end IOPS, you have 0.5 IOPS per 1 GB. You should define and apply some overbooking ratio because you use shared storage. Let's use a 3:1 ratio, and we get 150 IOPS for a 100 GB vDisk; the calculation is written out below.
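
Written out with the numbers from this example, the calculation is:

    physical capacity          : 50 TB = 50,000 GB
    front-end performance      : 25,000 IOPS
    capacity/performance ratio : 25,000 / 50,000 = 0.5 IOPS per GB
    with 3:1 overbooking       : 0.5 x 3 = 1.5 IOPS per GB
    limit for a 100 GB vDisk   : 100 x 1.5 = 150 IOPS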

To be honest, I prefer the service catalog strategy, as it is what the real world needs: each workload is different, and a service catalog gives you a better way to define vDisks that match the workloads in your particular environment.

Summary
The VM vDisk IOPS limit approach is useful in environments where you want guaranteed and long-term predictable storage performance (response time) for VM vDisks. Please be aware that even this approach is not totally fair, because IOPS reality is much more complex and the total number of IOPS on back-end storage is not the static number we used in our example. In real physical storage, the number of front-end IOPS you can get from the storage is a function of several parameters like IO size, read/write ratio, RAID type, workload type (sequential or random), cache hit ratio, automated storage tiering algorithms, etc ...

I hope VMware VVOLs will move this approach to the next level in the future. However, the vDisk IOPS limit is a technology we can use today.

Monday, July 07, 2014

How social media and community sharing help enterprise customers


I'm always happy when someone finds my blog article or shared document useful. Here is one example of recent email communication from a DELL customer who Googled my DELL OME (OpenManage Essentials is basic systems management for DELL hardware infrastructure) document explaining network communication flows among DELL OpenManage components.

All personal information is anonymized, so the customer's real name is changed to Mr. Customer.


VCDX Defense Timer

If you are preparing for the VCDX and you want to do a VCDX mock defense, you can use the exact timer which is used during the real VCDX defense.

The timer is available online at https://vcdx.vmware.com/vcdx-timer

Good luck with your VCDX journey!!!