Sunday, September 15, 2013

High latency on vSphere datastore backed by NFS

Last week one of my customers experienced high latency on a vSphere datastore backed by an NFS mount. The usual root cause of high latency is too few disk spindles behind the datastore, but that was not the case here.

NFS datastore for vSphere
Although NFS has always been regarded as a lower storage tier, VMware and NFS vendors have worked very hard on NFS improvements in recent years. Another plus for NFS nowadays is that 10Gb Ethernet is already a commodity, which helps NFS significantly because it doesn't support multipathing (aka MPIO) as FC or iSCSI do. On the other hand, NFS is obviously another abstraction layer in the vSphere storage stack, and details like the NFS client implementation, Ethernet/IP queue management, QoS, and so on can impact the whole solution. Therefore, when someone tells me "NFS for vSphere", I'm always cautious. Don't get me wrong, I really like abstraction, layering, unification and simplification, but they must not have any negative influence on stability and performance.

I don't want to discuss the advantages and disadvantages of particular protocols, as they depend on the requirements of the particular environment and on what someone wants to achieve. By the way, I recently prepared a protocol comparison as a design decision for another customer here, so you can check it out and comment on it there.

In this case the customer had a really good reason to use NFS, but the latency issue is a potential show stopper.

I have to say that I also had a bad NFS experience back in 2010, when I was designing and implementing Vblock0 for one customer. Vblock0 used EMC Celerra, therefore NFS or iSCSI were the only options. NFS was the better choice because of the Celerra iSCSI implementation (that's another topic). We were not able to get disk response times below 30 ms, so in the end NFS (EMC Celerra) was used as Tier 3 storage and the customer bought additional block storage (EMC Clariion) for Tier 1. That is history, because I was implementing the then-new vSphere 4.1 and SIOC had just been introduced, without broad knowledge about SIOC benefits, especially for NFS.

A lot of things have changed with NFS since then, so that is just history and the field experience of one engagement. Let's go back to today's high latency problem on NFS and the troubleshooting steps we took with this customer.

TROUBLESHOOTING

Environment overview
The customer has vSphere 5.0 (Enterprise Plus) Update 2 patched to the latest version (ESXi build 1254542).
NFS storage is a NetApp FAS with the latest ONTAP version (NetApp Release 8.2P2 7-Mode).
Compute is based on Cisco UCS and networking on top of UCS is based on Nexus 5500.

Step 1/ Check SIOC or MaxQueueDepth
I told the customer about the known NFS latency issue documented in KB article 2016122 and broadly discussed in Cormac Hogan's blog post here. Based on community reports and my own experience, my hypothesis is that the problem is not related only to NetApp storage but is most probably an ESXi NFS client issue. This is just my opinion without any proof.

Enabling SIOC or setting /NFS/MaxQueueDepth to 64 is the workaround documented in the KB article mentioned earlier. Therefore I asked them whether SIOC was enabled, as we had discussed during the Plan & Design phase. The answer was yes, it is.

Hmm. Strange.

Step 2/ NetApp firmware
Yes. This customer has a NetApp filer, and the KB article has an update comment saying that the latest NetApp firmware solves this issue. The customer has the latest 8.2 firmware, which should fix the issue. But evidently it doesn't help.

Hmm. Strange.

Step 3/ Open support case with NetApp and VMware
I suggested opening a support case and continuing with troubleshooting in parallel.

I don't know why, but customers in the Czech Republic are shy about using the support line, even though they are paying a significant amount of money for it. But it is what it is, and this customer didn't engage VMware or NetApp support either and continued with troubleshooting. OK, I understand we can solve everything by ourselves, but why not ask for help? That's more a social than a technical question, and I would like to know whether this administrator behavior is a global habit or some special habit here in Central Europe. Don't be shy and speak out in the comments, even about this more social subject.

Step 4/ Go deeper in SIOC troubleshooting

Check if storageRM (Storage Resource Management) is running
/etc/init.d/storageRM status
Enable advanced logging in Software Advanced Settings -> Misc -> Misc.SIOControlLogLevel = 7
The default is 0 and 7 is the maximum value.

The customer found a strange log message in "/var/log/storagerm.log"
Open /vmfs/volumes/ /.iorm.sf/slotsfile (0x10000042, 0x0) failed: permission denied 
There is no VMware KB article for it, but Frank Denneman blogged about it here.

So the customer is experiencing the same issue as Frank did in his lab.
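If you want to check several hosts or log bundles for this message, a small script can do the searching. The following is only a minimal Python sketch: it matches the failure message quoted above, but the function name, the sample datastore name and the assumption that each message sits on a single log line are mine, not something documented by VMware.

```python
import re

# Pattern for the failure message quoted above. The exact layout of the rest
# of the log line is an assumption, so only the stable parts are matched.
SLOTSFILE_ERROR = re.compile(
    r"Open\s+(?P<path>.*?\.iorm\.sf/slotsfile)\s.*failed:\s*permission denied",
    re.IGNORECASE,
)


def find_slotsfile_errors(log_text):
    """Return the slotsfile paths that could not be opened, in log order."""
    paths = []
    for line in log_text.splitlines():
        match = SLOTSFILE_ERROR.search(line)
        if match:
            paths.append(match.group("path"))
    return paths


if __name__ == "__main__":
    # Hypothetical sample; on a real host you would read /var/log/storagerm.log.
    sample = (
        "Open /vmfs/volumes/DATASTORE/.iorm.sf/slotsfile "
        "(0x10000042, 0x0) failed: permission denied\n"
    )
    for path in find_slotsfile_errors(sample):
        print("SIOC cannot open:", path)
```

An empty result does not prove SIOC is healthy, it only means this particular message was not found in the text you passed in.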

The solution is to change the *nix file permissions, as Frank was instructed by VMware Engineering (that's the beauty of having direct access to engineering) ...

chmod 755 /vmfs/volumes/DATASTORE/.iorm.sf/slotsfile

Changes take effect immediately and you can check it in "/var/log/storagerm.log"
...
DATASTORE: read 2406 KB in 249 ops, wrote 865 KB in 244 ops avgReadLatency
1.85, avgWriteLatency 1.42, avgLatency 1.64 iops = 116.59, throughput =
773.65 KBps
...
Advanced logging can be disabled in Software Advanced Settings -> Misc -> Misc.SIOControlLogLevel = 0

After this, normalized latency is between 5 and 7 ms, which is quite normal.
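To watch these numbers over time instead of reading the log by eye, the statistics entry can be parsed. Below is a minimal Python sketch that assumes the entry looks exactly like the excerpt above (including the wrapping across lines); the function name and the dictionary keys are hypothetical, and I have not verified the format against other ESXi builds.

```python
import re

NUMBER = r"(\d+(?:\.\d+)?)"
LATENCY = re.compile(r"\b(avgReadLatency|avgWriteLatency|avgLatency)\s+" + NUMBER)
IOPS = re.compile(r"\biops\s*=\s*" + NUMBER)
THROUGHPUT = re.compile(r"\bthroughput\s*=\s*" + NUMBER + r"\s*KBps")


def parse_storagerm_stats(entry):
    """Parse one statistics entry from storagerm.log into a dictionary."""
    # The entry may be wrapped over several lines, so collapse whitespace first.
    text = " ".join(entry.split())
    stats = {"datastore": text.split(":", 1)[0]}
    for name, value in LATENCY.findall(text):
        stats[name] = float(value)
    iops = IOPS.search(text)
    if iops:
        stats["iops"] = float(iops.group(1))
    throughput = THROUGHPUT.search(text)
    if throughput:
        stats["throughput_kbps"] = float(throughput.group(1))
    return stats


if __name__ == "__main__":
    entry = """DATASTORE: read 2406 KB in 249 ops, wrote 865 KB in 244 ops avgReadLatency
1.85, avgWriteLatency 1.42, avgLatency 1.64 iops = 116.59, throughput =
773.65 KBps"""
    print(parse_storagerm_stats(entry))
```

Keep in mind that this kind of logging is only written while Misc.SIOControlLogLevel is raised, so it is a troubleshooting aid and not permanent monitoring.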

Incident solved ... waiting for other incidents :-)

Problem management continues ...

Lessons learned from this case
SIOC is an excellent VMware technology that helps with datastore-wide performance fairness. In this example its dynamic queue management helped us significantly with NFS response times.

However, even an excellent technology can have bugs ...

SIOC is available only to customers with Enterprise Plus licenses.

Customers with lower licenses have to use a static queue value (/NFS/MaxQueueDepth) of 64, or even less, based on response times. BTW, the default NFS.MaxQueueDepth value is 4294967295. I understand NFS.MaxQueueDepth as the NFS counterpart of Disk.SchedNumReqOutstanding for block devices. The default value of Disk.SchedNumReqOutstanding is 32, which helps with sharing LUN queues that usually have a queue depth of 256. That is OK for usual situations, but if you have more disk-intensive VMs per LUN, then this parameter can be tuned. This is where SIOC helps us with dynamic queue management, even across ESX hosts sharing the same device (LUN, datastore).

For a deep-dive explanation of Disk.SchedNumReqOutstanding I suggest reading Jason Boche's blog post here.

Static queue management brings significant operational overhead and maybe other issues we don't know about right now. So go with SIOC if you can; if you have an enterprise environment, consider upgrading to Enterprise Plus. If you still have response time issues, troubleshoot SIOC and check that it does what it is supposed to do.

Anyway, it would be nice if VMware could improve NFS behavior. SIOC is just one of two workarounds we can use to mitigate the risk of high latency on NFS datastores.

Unfortunately the customer didn't engage VMware Global Support Organization, therefore nobody at VMware knows about this issue and nobody can write a new KB article or update the existing one. I'll try to make some noise on social networks to help highlight the problem.

5 comments:

Bojan Jovanovic said...

Hi David & Happy New Year!

Awesome article. Thanks a lot for sharing your experience. I've one question for you, related to SIOC and NFS:

When enabling SIOC on an NFS datastore, it automatically takes care of the variable NFS.MaxQueueDepth. AFAIK it sets the value to 256, and as soon as the latency threshold is reached it automatically drops to a lower value (e.g. 4).

Do you know, if there is a possibility to modify the "default" SIOC - NFS.MaxQueueDepth value (256) to 64?

I would like to reduce my NFS disconnects with the NFS.MaxQueueDepth value 64, while still having the SIOC in place, taking care of the latency.

Best regards,
Bojan

David Pasek said...

Hi Bojan,
thanks for the positive comment.

Here is the answer to your question.

Default value for NFS.MaxQueueDepth is 4294967295.

To check the value you can look at the ESX host advanced settings in the vSphere Client (C# or Web).

Or if you like CLI you can use command
esxcfg-advcfg -g /NFS/MaxQueueDepth
or my preferred esxcli variant

~ # esxcli system settings advanced list -o /NFS/MaxQueueDepth
Path: /NFS/MaxQueueDepth
Type: integer
Int Value: 4294967295
Default Int Value: 4294967295
Min Value: 1
Max Value: 4294967295
String Value:
Default String Value:
Valid Characters:
Description: Maximum per-Volume queue depth

To set NFS.MaxQueueDepth to 64 you can use the command
esxcfg-advcfg -s 64 /NFS/MaxQueueDepth
or
esxcli system settings advanced set -i 64 -o /NFS/MaxQueueDepth

To verify you can show current advanced setting

~ # esxcli system settings advanced list -o /NFS/MaxQueueDepth
Path: /NFS/MaxQueueDepth
Type: integer
Int Value: 64
Default Int Value: 4294967295
Min Value: 1
Max Value: 4294967295
String Value:
Default String Value:
Valid Characters:
Description: Maximum per-Volume queue depth

Hope this helps,
David.

David Pasek said...

... I guess you can adjust /NFS/MaxQueueDepth with SIOC enabled, but I didn't test it so I'm not 100% sure.

On the other hand, I believe you don't need to adjust /NFS/MaxQueueDepth if SIOC is enabled, because it will do the job for you even from the default value 4294967295.

Adjusting /NFS/MaxQueueDepth is more important for VMware users without an Enterprise Plus license because they cannot use SIOC.

Bojan Jovanovic said...

Thanks a lot for the fast response.

I'm actually aware of the default value of 4294967295, but as soon as I enable SIOC it automatically sets the NFS.MaxQueueDepth value to 256 and coordinates the lower values if needed.

My goal is to have the best of both workarounds:
1. to have NFS.MaxQueueDepth on max. 64 with SIOC
2. to have SIOC in place to take care of the latency

The question is, is there a way to modify the "default" SIOC - NFS.MaxQueueDepth value 256?

I know, it's definitely a deep-dive question :)

Best,
Bojan

David Pasek said...

Now I see what you want to achieve. To be honest, I don't have time to experiment with such a config in my lab. Moreover, I don't see added value in combining both settings, because VMware developers should know why they set NFS.MaxQueueDepth to 256 in SIOC mode and lower it automatically when response time increases. And don't forget you can configure your own SIOC latency threshold for a particular datastore.
If you really want to go deeper and have a real need for that, try asking @CormacJHogan on Twitter. Cormac moved from VMware Technical Marketing to the VMware Engineering Integration Team or so. Maybe it is an interesting topic for him. Thanks for being my reader and let me know if you find something interesting. David.