Thursday, December 21, 2017

SDRS Initial Placement - interim storage lease between recommendation and provisioning

Every day we learn something new. In the past, I have blogged about SDRS behavior in several blog posts.


Recently (a few months ago), I was informed about an interesting SDRS behavior which is not exposed through the standard GUI or advanced settings but is available through the API. This functionality was not very well known even within VMware, so I decided to blog about it.

Long story short ...

The vSphere API call for SDRS initial placement can lease the recommended storage resources for some time.

What does it mean? Right after generating recommendations, SDRS can lease the storage space on the recommended datastores, creating an interim reservation for whoever is, most probably, going to do the provisioning. By default, SDRS does not lease storage space on recommended datastores, so you can observe provisioning failures in some situations. I have simulated such a situation in Test #3 of the test plan available here. Such situations are not very common with manual provisioning, but the probability is higher when automated provisioning is in use, so you can experience such issues in environments with VMware vRealize Automation (vRA) or vCloud Director (vCD).

And now the secret I did not know ... SDRS has had a solution for such issues since vSphere 5.1. When somebody (vRA, vCD, or anybody else who wants to deploy a VM) asks for an SDRS recommendation via an API call, the call can include a specific parameter (resourceLeaseDurationSec) which instructs vSphere to block the recommended storage space on the datastores exclusively for the provisioning of that specific SDRS recommendation. It is worth mentioning that the lease is released immediately after provisioning, so the time defined in resourceLeaseDurationSec is actually the maximum reservation time, just in case whoever wanted to do the provisioning changes their mind and decides not to deploy the VM. This avoids unnecessary storage space reservations.
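To make the lease semantics more tangible, here is a minimal Python sketch. It is only a toy model of the idea, not the vSphere API: the Datastore class and its methods (free_gb, reserve, provision) are hypothetical names invented for illustration, the lease_sec argument merely plays the role of resourceLeaseDurationSec, and the current time is passed in explicitly so the example stays deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class Datastore:
    """Toy datastore that tracks provisioned space and time-limited leases."""
    capacity_gb: int
    used_gb: int = 0
    # lease_id -> (reserved size in GB, absolute expiry time in seconds)
    leases: dict = field(default_factory=dict)

    def free_gb(self, now: float) -> int:
        """Free space after subtracting used space and unexpired leases."""
        active = sum(size for size, expiry in self.leases.values() if expiry > now)
        return self.capacity_gb - self.used_gb - active

    def reserve(self, lease_id: str, size_gb: int, now: float, lease_sec: float) -> bool:
        """Hold space for a recommendation for at most lease_sec seconds."""
        if size_gb > self.free_gb(now):
            return False
        self.leases[lease_id] = (size_gb, now + lease_sec)
        return True

    def provision(self, lease_id: str, now: float) -> bool:
        """Consume a lease: the reservation is released and becomes used space."""
        size_gb, expiry = self.leases.pop(lease_id, (0, 0.0))
        if expiry <= now:
            return False  # unknown or expired lease, nothing was held for us
        self.used_gb += size_gb
        return True


ds = Datastore(capacity_gb=100)
ds.reserve("vm-a", 60, now=0, lease_sec=120)              # recommendation with a lease
blocked = ds.reserve("vm-b", 60, now=10, lease_sec=120)   # competing request is refused
ds.provision("vm-a", now=30)                              # lease turns into real usage
```

In this model a lease that is never provisioned simply stops counting once its expiry time has passed, which mirrors the "maximum reservation time" behavior described above.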
 
If you want to know the details, check the API documentation. Here is what the vSphere API documentation says about placeSpec.resourceLeaseDurationSec:
"Resource lease duration in seconds. If the duration is within bounds, Storage DRS will hold onto resources needed for applying recommendations generated as part of that call. Only initial placement recommendations generated by storage DRS can reserve resources this way."

The resourceLeaseDurationSec parameter is used in StoragePlacementSpec, which encapsulates all of the inputs passed to the VcStorageResourceManager method recommendDatastores. VcStoragePlacementSpec is documented here.

So that sounds good, right? Well, there is one issue with this approach. SDRS can give the provisioning application multiple recommendations (multiple datastores), which would lead to blocking more storage space than is really needed. VMware engineering is aware of this issue and at the moment is working, at least with the vRA BU, to solve it. As far as I know, the final solution will be a special SDRS setting to return a single recommendation. However, this is planned as a specific integration optimization between SDRS and vRA provisioning.

UPDATE: details about the special SDRS setting are described at https://www.vcdx200.com/2018/06/undocumented-sdrs-advanced-options.html

The challenge with vRA storage (vSphere vDisk) provisioning and SDRS is depicted in the figure below.


To be honest, there is another design consideration and potential risk associated with this solution. If resourceLeaseDurationSec is used incorrectly by an external application (vRA, vCloud Director, or other), it can eventually block storage space in the datastore cluster and cause a Denial of Service (DoS). Incorrect usage would be to ask SDRS for recommendations, which blocks the recommended storage space, but then not provision anything. The storage would remain blocked for the defined time and would not be available for other provisioning requests until the lease expires.

Tuesday, December 19, 2017

What ESXi command will create kernel panic and result in a PSOD?

This is a very short post, but I want to publish it at least for myself so I can find this trick more quickly next time.

Sometimes, especially during testing of vSphere HA, it can be useful to simulate a PSOD (Purple Screen of Death). I did some googling and found the article "What ESXi command will create kernel panic and result in a PSOD?". Long story short, a PSOD can be triggered by the following ESXi console command:
vsish -e set /reliability/crashMe/Panic 1
Of course, you have to SSH to the particular ESXi host before you can run the command above.

All credit goes to the IT Pro Today article available at http://www.itprotoday.com/virtualization/q-what-esxi-command-will-create-kernel-panic-and-result-psod

Sunday, December 17, 2017

No Storage, No vSphere, No Datacenter

In the past, I have had a lot of discussions with different customers and partners about various storage issues with VMware vSphere. The problem was always identified as a physical storage or SAN issue, and the VMware support recommendation was to contact the particular storage vendor. That recommendation was always true and correct. However, such storage issues always have a catastrophic, or at least huge, impact not only on the virtualized workloads running on the impacted datastore but also on the manageability of VMware vSphere. Intensive ESXi logging affects the hostd and vpxa services, and it ends with the ESXi host disconnecting from vCenter and with very slow direct manageability of the ESXi host. Such issues should be resolved by fixing the storage issue, but in the meantime vSphere admins have no visibility into part of, or even the whole, vSphere environment. They therefore usually restart the impacted ESXi hosts, which has a negative impact on the availability of VMs running even on datastores that are not impacted. Users usually classify such situations as a whole-datacenter outage. You can imagine how heated the discussions between VMware and storage teams are in such situations, and I often use the generic expression ...
"NO STORAGE,  NO DATACENTER"
Well, there is no doubt that storage is the most important piece of the datacenter. The VMware ESXi hypervisor is usually just a storage I/O passthrough component with some additional intelligence like
  • native storage multipathing (NMP), 
  • fair storage I/O scheduling (SIOC), 
  • I/O filtering (VAIO),
  • etc. 

And probably due to such additional intelligence, VMware customers usually expect that VMware vSphere will do some magic to mitigate physical storage or SAN-related issues. First of all, it is logical and obvious that VMware vSphere cannot solve issues in the physical infrastructure. However, there can be specific scenarios in which the storage device is not available through one path but is available via another path.

In this blog post, I would like to share my recent findings about storage issues and VMware native multipathing. Let's start with a visualization of storage multipathing over a Fibre Channel SAN. Usually, there are two independent SANs (A and B). Each ESXi HBA is connected to a different SAN. From the storage point of view, each storage controller (two storage controllers are depicted in the figure below) is connected to a different SAN through different storage front-end ports. An HBA port is a storage initiator, and storage front-end ports are usually storage targets.

The I/O sent from ESXi hosts to their assigned logical unit numbers (LUNs) travels through a specific route that starts with an HBA and ends at a LUN. This route is referred to as a path. Each host, in a properly designed infrastructure, should have more than one path to each LUN. VMware generally recommends four storage paths, but the optimal number of paths depends on the particular storage architecture. In the figure above, we have the following four paths to LUN 1:
  • vmhba0:C0:T0:L1
  • vmhba0:C0:T2:L1
  • vmhba1:C0:T1:L1
  • vmhba1:C0:T3:L1
Note: The storage system usually exports multiple LUNs with additional paths, but we use a single LUN (LUN 1) here for simplicity.
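The path names above follow the pattern adapter:Channel:Target:LUN. As a small illustration, the following Python sketch splits such names into their components and groups the targets by HBA. The StoragePath type and the parse_path helper are hypothetical names used only for this example.

```python
import re
from typing import NamedTuple


class StoragePath(NamedTuple):
    adapter: str  # HBA name, e.g. "vmhba0"
    channel: int  # C = channel
    target: int   # T = target (storage front-end port)
    lun: int      # L = LUN number


_PATH_RE = re.compile(r"^(vmhba\d+):C(\d+):T(\d+):L(\d+)$")


def parse_path(name: str) -> StoragePath:
    """Split a path name such as 'vmhba0:C0:T0:L1' into its four parts."""
    match = _PATH_RE.match(name)
    if match is None:
        raise ValueError(f"not a path name: {name!r}")
    adapter, channel, target, lun = match.groups()
    return StoragePath(adapter, int(channel), int(target), int(lun))


names = ["vmhba0:C0:T0:L1", "vmhba0:C0:T2:L1", "vmhba1:C0:T1:L1", "vmhba1:C0:T3:L1"]
paths = [parse_path(n) for n in names]

# Group the storage targets reachable through each HBA.
targets_by_adapter = {}
for p in paths:
    targets_by_adapter.setdefault(p.adapter, []).append(p.target)
```

Grouping by adapter makes it easy to see that each HBA reaches LUN 1 through two different storage targets.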

The ESXi host sees LUN 1 as four independent devices (volumes), but because ESXi has a native multipathing driver, these four devices are identified as the same LUN (LUN 1 in our case). ESXi therefore automatically collapses these four devices into a single device with four independent paths. Storage I/Os to such a device are distributed across multiple paths based on the multipathing policy. ESXi has three native multipathing policies:
  • fixed (FIXED), 
  • most recently used (aka MRU) and 
  • round robin (RR). 
The multipathing policy type dictates how multiple I/Os are distributed across the available paths, but once a single I/O is sent through a particular path, it sticks to that path until the path is claimed as dead. A single I/O flow is depicted below.

The most commonly used SCSI commands are
  • Inquiry (Requests general information of the target device)
  • Test Unit Ready, aka TUR (Checks whether the target device is ready for the transfer operation)
  • Read (Transfers data from the SCSI target device)
  • Write (Transfers data to the SCSI target device)
  • Request Sense (Requests the sense data of the last command)
  • Read Capacity (Requests the storage capacity information)
If the LUN accepts the SCSI command, everything is great and shiny. However, when a LUN at the end of a storage path experiences problems, the ESXi host sends the Test Unit Ready (TUR) command to the storage target (the particular storage front-end port) to confirm that the path to the LUN is down before initiating a path failover. As long as ESXi receives some TUR response from the storage system, it considers the path up and running and repeatedly retries the operation without triggering a failover, even if the TUR returns error responses and the LUN is effectively not ready. A typical TUR SCSI command response should be "TEST_UNIT_READY", but in case of a problem the storage system returns one of the following responses:
  • SCSI_HOST_BUS_BUSY 0x02
  • SCSI_HOST_SOFT_ERROR 0x0b
  • SCSI_HOST_RETRY 0x0c

That particular I/O flow happens over a single selected path, and VMware native multipathing will not try another path even if there is some probability that the LUN could be ready via another path. Let me say it again. The default behavior is that ...
... the storage path does not fail over when the path to the target is up and sending responses back to the initiator, even if the LUN is not available for whatever reason.
The reason for such conservative vSphere behavior is that an enterprise storage system and SAN should simply work, and storage vendors claim storage availability higher than 99.999%. Multipathing usually solves problems with the path to the storage system (to the storage target ports) but not problems on the storage system itself (LUN unavailability for an unknown reason). I personally believe the physical storage system has other ways to tell the ESXi host that a particular path is not available at the moment, and to instruct the ESXi multipathing driver not to issue I/Os via that path when necessary and when the storage system has no other way to transfer I/O to the LUN. However, the reality is that on some storage systems a LUN is not available through one path (TUR returns errors) but works via another path. This is a typical interoperability issue. However, I have just been informed that there is a way to resolve this interoperability issue. You can use the enable_action_OnRetryErrors option.
What is the advanced option enable_action_OnRetryErrors?
This option allows the ESXi host to mark a problematic path as dead. After marking the path as dead, the host can trigger a failover and use an alternative working path. I assume that if the LUN is not available via any path, all paths will be claimed as dead until the LUN works again. See VMware KB 2106770 (Storage path does not fail over when TUR command repeatedly returns retry requests) for instructions on how to enable or disable the option.
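The difference between the default behavior and the behavior with the option enabled can be sketched as a toy Python model. This is only my simplified illustration of the logic described above, not the actual ESXi multipathing implementation: the select_path function and all of its parameters are hypothetical, and a path is reduced to a name plus a flag saying whether TUR succeeds on it.

```python
def select_path(paths, tur_ok, dead, action_on_retry_errors=False):
    """Return the path that serves the I/O, or None if the I/O stays stuck.

    paths  -- ordered list of path names
    tur_ok -- dict: path name -> True if the LUN reports ready via that path
    dead   -- set of path names already marked dead (updated in place)
    """
    for path in paths:
        if path in dead:
            continue
        if tur_ok[path]:
            return path
        # The target still answers TUR, but only with retry/error responses.
        if not action_on_retry_errors:
            # Default: the path counts as up, so the host keeps retrying
            # on the same path and never fails over.
            return None
        # Option enabled: mark the path dead and try the next one.
        dead.add(path)
    return None


paths = ["vmhba0:C0:T0:L1", "vmhba1:C0:T1:L1"]
tur_ok = {"vmhba0:C0:T0:L1": False, "vmhba1:C0:T1:L1": True}

dead_default = set()
stuck = select_path(paths, tur_ok, dead_default)

dead_enabled = set()
chosen = select_path(paths, tur_ok, dead_enabled, action_on_retry_errors=True)
```

In the default case the I/O never leaves the first path even though the second path would work. With the option enabled, the first path ends up in the dead set and the I/O is served via the second path.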

Now you can ask when a storage path claimed as dead will become active again once the LUN is back and available. All paths claimed as dead are periodically evaluated. The Fibre Channel path state is evaluated at a fixed interval, or when there is an I/O error and TUR returns nothing, which is not our case here. The path evaluation interval is defined in seconds via the advanced configuration option Disk.PathEvalTime. The default value is 300 seconds. This means that the path state is evaluated every 5 minutes unless an error is reported sooner on that path, in which case the path state might change depending on the interpretation of the reported error. However, I have been told that this standard disk path evaluation DOES NOT return a path to the active state when it was claimed as dead by the OnRetryErrors action. My understanding of the reason for this behavior is that the storage path had errors, so it is better not to put the path back into production, to avoid a flip-flop situation.

Let me stress again: such intelligent and proactive failover behavior based on TUR responses is not the default, at least not in vSphere 6.5 and below. There are rumors that this may change in the next vSphere release, but there is no official messaging so far. I personally think that more intelligent behavior is better for VMware customers, who usually expect such cleverness from vSphere and are negatively surprised by how vSphere behaves when there are storage issues on some paths. So intelligent and proactive failover behavior based on TUR responses can add cleverness to VMware vSphere native multipathing. However, it is important to say that it would help only with a few specific behaviors or misbehaviors of some storage systems, and the basic rule is still valid ... "NO STORAGE,  NO DATACENTER".

Disclaimer: This is my current understanding of how vSphere ESXi handles storage I/O, based on my long experience in the field, tests in the lab, design and implementation projects, and what I have read in the documentation, VMware KBs, and books. If you want to know more, please check the relevant references below and do your own research. I do not know whether my understanding of this topic is complete or whether I have misunderstood something. Therefore, please share any feedback in the comments and we can discuss it further, because only deep, constructive discussions lead to further knowledge.

References:

Tuesday, December 12, 2017

Start order of software services in VMware vCenter Server Appliance 6.5 U1

In the past, I documented the start order of services in VMware vCenter Server Appliance 6.0 U2.

Back then, I simply stopped all services in VCSA, started them again, and documented the order.

The commands to do that are
service-control --stop --all
service-control --start --all

I did the same in vCenter Server Appliance 6.5 U1. The services started in the following order ...
  1. lwsmd (Likewise Service Manager)
  2. vmafdd (VMware Authentication Framework)
  3. vmdird (VMware Directory Service)
  4. vmcad (VMware Certificate Service)
  5. vmware-sts-idmd (VMware Identity Management Service)
  6. vmware-stsd (VMware Security Token Service)
  7. vmdnsd (VMware Domain Name Service)
  8. vmware-psc-client (VMware Platform Services Controller Client)
  9. vmon (VMware Service Lifecycle Manager)
I was very surprised that there were no other services like vmware-vpostgres, vpxd, etc. I found out that the rest of the VCSA services are started by the vmon service. To understand their start order, we have to stop these services and start them again:
/usr/lib/vmware-vmon/vmon-cli --batchstop ALL 
/usr/lib/vmware-vmon/vmon-cli --batchstart ALL
vmon-cli does not report anything to standard output, but it is very verbose in the log file located at /var/log/vmware/vmon/vmon-syslog.log, so a grep of the log helps to understand the start order of vmon-controlled services.

 root@vc01 [ /var/log/vmware/vmon ]# grep "Executing op START on service" vmon-sys  
 17-12-12T09:44:23.639142+00:00 notice vmon Executing op START on service eam...  
 17-12-12T09:44:23.643113+00:00 notice vmon Executing op START on service cis-license...  
 17-12-12T09:44:23.643619+00:00 notice vmon Executing op START on service rhttpproxy...  
 17-12-12T09:44:23.644161+00:00 notice vmon Executing op START on service vmonapi...  
 17-12-12T09:44:23.644704+00:00 notice vmon Executing op START on service statsmonitor...  
 17-12-12T09:44:23.645413+00:00 notice vmon Executing op START on service applmgmt...  
 17-12-12T09:44:26.076456+00:00 notice vmon Executing op START on service sca...  
 17-12-12T09:44:26.139508+00:00 notice vmon Executing op START on service vsphere-client...  
 17-12-12T09:44:26.199049+00:00 notice vmon Executing op START on service cm...  
 17-12-12T09:44:26.199579+00:00 notice vmon Executing op START on service vsphere-ui...  
 17-12-12T09:44:26.200095+00:00 notice vmon Executing op START on service vmware-vpostgres...  
 17-12-12T09:45:33.427357+00:00 notice vmon Executing op START on service vpxd-svcs...  
 17-12-12T09:45:33.431203+00:00 notice vmon Executing op START on service vapi-endpoint...  
 17-12-12T09:46:54.874107+00:00 notice vmon Executing op START on service vpxd...  
 17-12-12T09:47:28.148275+00:00 notice vmon Executing op START on service sps...  
 17-12-12T09:47:28.169502+00:00 notice vmon Executing op START on service content-library...  
 17-12-12T09:47:28.176130+00:00 notice vmon Executing op START on service vsm...  
 17-12-12T09:47:28.195833+00:00 notice vmon Executing op START on service updatemgr...  
 17-12-12T09:47:28.206981+00:00 notice vmon Executing op START on service pschealth...  
 17-12-12T09:47:28.220975+00:00 notice vmon Executing op START on service vsan-health...  


  1. eam (VMware ESX Agent Manager)
  2. cis-license (VMware License Service)
  3. rhttpproxy (VMware HTTP Reverse Proxy)
  4. vmonapi
  5. statsmonitor
  6. applmgmt
  7. sca (VMware Service Control Agent)
  8. vsphere-client
  9. cm (VMware Component Manager)
  10. vsphere-ui
  11. vmware-vpostgres (VMware Postgres)
  12. vpxd-svcs
  13. vapi-endpoint
  14. vpxd (VMware vCenter Server)
  15. sps (VMware vSphere Profile-Driven Storage Service)
  16. content-library
  17. vsm
  18. updatemgr
  19. pschealth
  20. vsan-health
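The numbered list above was derived from the grep output by hand. A minimal Python sketch of the same extraction could look like this. The start_order function is a hypothetical helper name, and the sample lines are copied from the log excerpt above. Keeping only the first START of each service is my own choice for the case where the log contains several restarts.

```python
import re

# Matches lines such as "... vmon Executing op START on service eam..."
_START_RE = re.compile(r"Executing op START on service ([\w-]+)\.\.\.")


def start_order(log_lines):
    """Return service names in the order of their first START operation."""
    order = []
    for line in log_lines:
        match = _START_RE.search(line)
        if match and match.group(1) not in order:
            order.append(match.group(1))
    return order


sample = [
    "17-12-12T09:44:23.639142+00:00 notice vmon Executing op START on service eam...",
    "17-12-12T09:44:23.643113+00:00 notice vmon Executing op START on service cis-license...",
    "17-12-12T09:44:26.200095+00:00 notice vmon Executing op START on service vmware-vpostgres...",
    "17-12-12T09:46:54.874107+00:00 notice vmon Executing op START on service vpxd...",
]

services = start_order(sample)
```

On a real appliance the lines would be read from /var/log/vmware/vmon/vmon-syslog.log instead of the sample list.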
I hope it helps other folks in the VMware community.


Friday, December 01, 2017

vSphere Switch Independent Teaming or LACP?

I have answered this question a lot of times during the last couple of years, so I have finally decided to write a blog post on this topic. Unfortunately, the answer always depends on specific factors (requirements and constraints) of the particular environment, so do not expect a short answer. Instead of a simple answer, I will compare LBT and LACP.

I assume you (my reader) are familiar with LACP, but do you know what LBT is? If not, here is a short explanation.
VMware LBT (load based teaming) is an advanced switch independent teaming policy available on VMware DVS. It pins each VM vNIC to a particular physical uplink in round-robin fashion, but if the network traffic on a particular physical NIC is higher than 75% of its total bandwidth over 30 seconds, LBT initiates rebalancing across the available physical uplinks (physical NICs of the ESXi host) to avoid network congestion on that uplink.
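The 75% rule can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the real LBT algorithm: the function names are hypothetical, the load values stand for average traffic measured over the 30-second window, and the choice to move the lightest vNIC to the least loaded uplink is my own simplification.

```python
def uplink_load(pinning, load_mbps, uplinks):
    """Sum the load of all vNICs pinned to each uplink."""
    totals = {u: 0.0 for u in uplinks}
    for vnic, uplink in pinning.items():
        totals[uplink] += load_mbps[vnic]
    return totals


def rebalance(pinning, load_mbps, uplinks, link_mbps, threshold=0.75):
    """Return a new pinning after moving one vNIC off each saturated uplink."""
    new_pinning = dict(pinning)
    for uplink in uplinks:
        totals = uplink_load(new_pinning, load_mbps, uplinks)
        if totals[uplink] <= threshold * link_mbps:
            continue
        candidates = [v for v, u in new_pinning.items() if u == uplink]
        if len(candidates) < 2:
            continue  # a single vNIC cannot be split across uplinks
        # Assumption: move the lightest vNIC to the least loaded uplink.
        mover = min(candidates, key=lambda v: load_mbps[v])
        target = min(uplinks, key=lambda u: totals[u])
        new_pinning[mover] = target
    return new_pinning


uplinks = ["vmnic0", "vmnic1"]
pinning = {"vm1": "vmnic0", "vm2": "vmnic1", "vm3": "vmnic0"}  # round-robin pinning
load_mbps = {"vm1": 6000, "vm2": 1000, "vm3": 2500}            # 30-second averages

# vmnic0 carries 8500 Mbps, which is above 75% of a 10 Gbps link.
result = rebalance(pinning, load_mbps, uplinks, link_mbps=10000)
```

Note that a vNIC is always moved as a whole, which is why a single VM vNIC can never use more than the bandwidth of one physical NIC with this kind of teaming.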
If you are not familiar with basic VMware vSphere networking read my previous blog post "Back to the basics - VMware vSphere networking" before continuing.

What we are doing here is a comparison of switch independent teaming and LACP. LACP is a capability of the VMware Distributed Virtual Switch (VDS), so I assume you have a vSphere Enterprise Plus license and a VDS. If you have a VDS, I also assume that you are already considering LBT, as it is the best choice among the switch independent teaming algorithms available on VDS.

LBT versus LACP comparison

Option 1: Switch Independent Teaming (LBT - Load Based Teaming)
Option 2: LACP

LBT advantages
  • Fully independent of upstream physical switches
  • Simple configuration
  • Beacon probing can be used.  Note: Beacon probing requires at least 3 physical NICs.
LBT disadvantages
  • A single VM cannot handle more traffic than the bandwidth of a single physical NIC.
  • Traffic is load-balanced across uplinks from the ESXi perspective (egress traffic), but only at VM NIC granularity, and returning traffic (ingress traffic) is forwarded over the same link as the egress traffic.
LACP advantages
  • One of the main LACP advantages is the continuous heartbeat between the two sides of the link (ESXi physical NIC port and switch port). VMware's LACP sends LACPDUs every 30 seconds, but it can be reconfigured to fast mode, in which LACPDUs are exchanged every 1 second. This improves failover in case of link failure and also helps when link status (up/down) detection does not work well.
  • A single VM can, in theory, handle more traffic than a single physical NIC because of the load-balancing algorithm.
  • Traffic can be load-balanced from both sides of the link (virtual link channel, port-channel, etc.): from the ESXi perspective by ESXi, and from the switch perspective by the load-balancing set on the switch side. Proper configuration on both sides is required.
LACP disadvantages
  • ESXi Network Dump Collector does not work if the Management vmkernel port has been configured to use EtherChannel/LACP
  • VMware vSphere beacon probing cannot be used
  • LACP is not supported with software iSCSI port binding.
  • LACP support settings are not available in host profiles.

CONCLUSION AND ANSWER

So which option is better? Well, it depends.

When you do not have direct or indirect control of the physical network infrastructure, switch independent teaming is generally a much simpler and safer solution, so LBT is the better choice.

If you trust your network vendor's LACP implementation and you have some control over, or trust in, your physical switch configuration, LACP is the better choice. The reasons are LACPDU heartbeating and multiple load-balancing hash algorithms, which can, in theory, improve network bandwidth for a single VM's network traffic and can be configured on both sides of the link channel. Another advantage is that LACP works better with multi-chassis LAG (MLAG) technologies like Cisco vPC, Dell Force10 VLT, Arista MLAG, etc. Generally, multi-chassis LAG "orphan ports" (ports without LACP) are not recommended by MLAG switch vendors because they do not have control of the end-point.

So the final decision is, as always, up to you, but this blog post should help you make the right decision for your specific environment.

Any other opinions, advantages, disadvantages, and ideas are welcome, so do not hesitate to write a comment.

****************************************************************

References to other resources:

[1] Check "Limitations of the LACP Support on a vSphere Distributed Switch" in the documentation here.


FAQ related to LBT and LACP comparison

Q: VMware's LACP sends LACPDUs every 30 seconds. Is there any way to configure the LACPDU frequency to 1 second?

A: Yes.

You can use the command "esxcli network vswitch dvs vmware lacp timeout set". It allows you to set advanced timeout settings for LACP.

Description:
set ... Set long/short timeout for vmnics in one LACP LAG
Cmd options:
-l|--lag-id= The ID of LAG to be configured. (required)
-n|--nic-name= The nic name. If it is set, then only this vmnic in the lag will be configured.
-t|--timeout Set long or short timeout: 1 for short timeout and 0 for long timeout. (required)
-s|--vds= The name of VDS. (required)

Relevant blog post on this topic "VMware vSphere DVS LACP timers".

Q: Does ESXi have a way to display the LACP settings of an established LACP session on a particular ESXi host? Something like "show lacp" on a Cisco switch?

A: Yes. You can use the command "esxcli network vswitch dvs vmware lacp status get". It should be equivalent to "show lacp" on a Cisco physical switch.

Q: How does VMware vSwitch beacon probing work?
A: Read the following blog posts
Q: What is the beacon probing interval?
A: 1 second

Q: Does ESXi beacon probing send beacons to every VLAN?
A: Yes, but only to VLANs (portgroups) where at least one VM is connected. It does not make sense to test for failures on VLANs where nothing is connected.


Friday, November 03, 2017

VMware vSphere DVS LACP timers

I have a customer who was planning a migration from Nexus 1000V (N1K) to VMware Distributed Virtual Switch (aka DVS). I assisted their network team in testing DVS functionality, and all was nice and shiny. However, they had a few detailed LACP-related questions because they would like to use LACP against Cisco vPC. I would like to highlight two questions for which I did not find any information in the official documentation.

Q1: VMware's LACP sends LACPDUs every 30 seconds. Is there any way to configure the LACPDU frequency to 1 second?

A1: The short answer is yes. It is possible to reconfigure the LACPDU interval from 30 seconds (normal) to 1 second (fast).

Long answer ... Link Aggregation Control Protocol (LACP) allows the two members of a link aggregation to exchange information about that aggregation. This information is packetized in Link Aggregation Control Protocol Data Units (LACPDUs). For further detail about LACP timers, see the blog post "LACP timer and what it means". In short, LACP timers can be set to "rate fast" - 1 second, or "rate normal" - 30 seconds.

The default value for the VMware DVS LACP rate is 30 seconds (normal). There is an esxcli command to configure the shorter LACP timer (1 second). See the command help below.

esxcli network vswitch dvs vmware lacp timeout set
It allows set advanced timeout settings for LACP
Description:
set Set long/short timeout for vmnics in one LACP LAG
Cmd options:
-l|--lag-id= The ID of LAG to be configured. (required)
-n|--nic-name= The nic name. If it is set, then only this vmnic in the lag will be configured.
-t|--timeout Set long or short timeout: 1 for short timeout and 0 for a long timeout. (required)
-s|--vds= The name of VDS. (required)
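If you need to apply this setting on several hosts, it may help to generate the command line programmatically. The following Python sketch only builds the argument list from the options documented in the help above. The helper name lacp_timeout_cmd and the example values (DVS01, LAG ID 1, vmnic0) are assumptions for illustration, and the sketch does not execute anything on a host.

```python
import shlex


def lacp_timeout_cmd(vds, lag_id, fast, nic=None):
    """Build the esxcli argument list for setting the LACP timeout.

    fast=True selects the short timeout (1), fast=False the long one (0).
    """
    argv = [
        "esxcli", "network", "vswitch", "dvs", "vmware", "lacp", "timeout", "set",
        f"--vds={vds}",
        f"--lag-id={lag_id}",
        f"--timeout={1 if fast else 0}",
    ]
    if nic is not None:
        # Optional: restrict the change to one vmnic in the LAG.
        argv.append(f"--nic-name={nic}")
    return argv


cmd = lacp_timeout_cmd("DVS01", 1, fast=True, nic="vmnic0")
command_line = " ".join(shlex.quote(arg) for arg in cmd)
print(command_line)
```

The printed line can then be pasted into an SSH session on the ESXi host or passed to whatever remote execution tooling you already use.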

Q2: Is there a way to display the LACP settings of an established LACP session on a particular ESXi host? Something like "show lacp" on a Cisco switch?

A2: Yes. There is an esxcli command ... esxcli network vswitch dvs vmware lacp status get ... which is equivalent to "show lacp" on a Cisco physical switch.


UPDATE 2019-03-26: I have been informed that the LACP timer settings do not persist across an ESXi reboot. I'm trying to get more information internally within VMware about this unexpected behavior.

Sunday, September 24, 2017

How to downsize vCenter Server Appliance 6.5 storage?

Last week I was asked by a partner how to downsize vCenter Server Appliance (VCSA) 6.5 storage.

Well, let's start with upsizing. Adding CPU and RAM resources is very easy. VCSA 6.5 supports CPU Hot Add and Memory Hot Plug, so you do not even need to shut down VCSA to increase CPU and RAM resources.

CPU Hot Add and RAM Hot Plug
Storage expansion, though, is a little bit more difficult. You still do not have to shut down VCSA because a virtual disk can be hot-extended. However, after a disk is extended you have to grow the disk partitions within the operating system, Photon OS in this particular case. William Lam wrote a blog post about it here. Generally, you have to run the script /usr/lib/applmgmt/support/scripts/autogrow.sh within the VCSA shell, so it is not rocket science; you just need to know which script to execute.

So upsizing is easily doable. But what about downsizing? VCSA 6.5 supports CPU Hot Remove, but RAM cannot be reduced while the VM is running. Therefore, to downsize RAM you have to shut down VCSA, decrease the memory resources, and power the VM on again. Not a big deal, just a short downtime, so it can be done during a maintenance window. But storage downsizing is not possible. Well, in theory it is possible, but decreasing the size of disk partitions and of the virtual disk itself is definitely not supported, because it is hard and also very risky, as you do not know where the data are located within the disk.

Warning: I have been told that the downsizing method described below does not work and that the restore options only allow you to choose a bigger form factor than that of the originally backed-up VCSA. Unfortunately, I have the smallest form factor in my home lab, so I cannot verify it, but I believe him. Sorry for the misleading idea. There might, however, be some unsupported method to tweak the backup files so that they appear to be a backup of a smaller VCSA form factor.

We have another downsizing option for VCSA 6.5. You are most probably aware that VCSA 6.5 introduced application-based backup, where the vCenter inventory, identity, and even the database are backed up to a remote location via the following protocols: HTTP, HTTPS, SCP, FTP, FTPS. A VCSA 6.5 backup is restored by running a new VCSA deployment in restore mode, with a previously created backup used as the restore point. The nice thing is that during the VCSA restore you can choose a different VCSA form factor with a different storage footprint. So this is a potential way to downsize vCenter Server Appliance 6.5 storage in a supported way. The only assumption is that your backup data will fit into the new storage size.

If you plan such a downsizing exercise, please do not forget to keep your original vCenter Server Appliance somewhere so you have a simple way to roll back.

Hope this is helpful and informative.

Tuesday, September 19, 2017

What is the difference between VMware vRealize Suite and vCloud Suite

Several times I have been asked by my customers what the difference is between VMware vRealize Suite and vCloud Suite. Both are actually license packaging suites. VMware vCloud Suite is a superset of VMware vRealize Suite. In other words, vCloud Suite includes everything in vRealize Suite plus vSphere infrastructure (ESXi Enterprise Plus licenses).

VMware vRealize Suite is a purpose-built management solution for the heterogeneous data center and the hybrid cloud. It is designed to deliver and manage infrastructure and applications to increase business agility while maintaining IT control. It provides the most comprehensive management stack for private and public clouds, multiple hypervisors, and physical infrastructure.

A comparison of vRealize Suite editions is available here and visualized in the figure below.

vRealize Suite editions comparison
More specifically, vRealize Suite 2017 includes the following components:
  • vSphere Replication 6.5.1
  • vSphere Data Protection 6.1.5
  • vSphere Big Data Extensions 2.3.2
  • vRealize Orchestrator Appliance 7.3.0
  • vRealize Suite Lifecycle Manager 1.0
  • vRealize Operations Manager 6.6.1
  • vRealize Business for Cloud 7.3.1
  • vRealize Log Insight 4.5.0
  • vRealize Automation 7.3.0
  • vRealize Code Stream Management Pack for IT DevOps 2.2.1
VMware vCloud Suite brings together VMware’s industry-leading vSphere hypervisor with vRealize Suite, the complete cloud management solution.

vCloud Suite 2017 currently includes vRealize Suite 2017, vSphere Enterprise Plus, and NSX-V:
  • Everything included in vRealize Suite 2017
  • plus vSphere Enterprise Plus Infrastructure for vCloud Suite (ESXi licenses only, vCenter is not included)
      • ESXi 6.5 U1
  • plus NSX-V
      • NSX-V 6.3.3
What's new in 2017 Suites?
  1. VMware has introduced new overall manager called vRealize Suite Lifecycle Manager to simplify deployment and on-going management of the vRealize products.
  2. NSX-V (NSX for vSphere) is included in vCloud Suite 2017 ???
Please note that vCenter Server is not included in vCloud Suite, so an additional license for vCenter Server is required.

Hope this helps to understand VMware license packaging.

...........................................................................................

UPDATE 2017-09-20:  The note about NSX-V licensing in the vCloud Suite 2017 release announcement is a little bit misleading. The VMware website here states:
NSX is available as an optional component that can be purchased with vCloud Suite. 
So, VMware vCloud Suite 2017 customers are not entitled to use full NSX-V. It is just an entitlement to use NSX Manager for AV (the replacement for vCNS Endpoint). NSX Manager for AV was available for free even before vCloud Suite 2017, so that's nothing new. And of course, any VMware customer can purchase full NSX-V to extend their vSphere infrastructure for network virtualization and datacenter modernization.

Sunday, September 17, 2017

CLI for VMware Virtual Distributed Switch - implementation procedure

Some time ago I blogged about Perl scripts emulating well-known physical network switch CLI commands (show mac-address-table and show interface status) for VMware Distributed Virtual Switch (aka VDS). See the blog post here: "CLI for VMware Virtual Distributed Switch".

Now it is time to operationalize it. My scripts are written in Perl and leverage the vSphere Perl SDK, which VMware distributes as vCLI. vCLI is available for Linux and Windows. I personally prefer Linux over Windows; therefore, the implementation procedure below is for CentOS 7.

Implementation procedure for CentOS 7

1/ Install CentOS 7 (minimal OS installation)

2/ Install Perl 

yum install perl

3/ Install vCLI Prerequisite Software for Linux Systems with Internet Access

yum install e2fsprogs-devel libuuid-devel openssl-devel perl-devel
yum install glibc.i686 zlib.i686
yum install perl-XML-LibXML libncurses.so.5 perl-Crypt-SSLeay

4/ Install the vCLI Package on a Linux System with Internet Access

Download vSphere Perl SDK (Use link here)
tar -zxvf VMware-vSphere-CLI-6.X.X-XXXXX.x86_64.tar.gz
sudo vmware-vsphere-cli-distrib/vmware-install.pl

5/ Install VDSCLI scripts

# Create directory for vdscli
mkdir vdscli

# Change directory to vdscli
cd vdscli

# Install wget to be able to download VDSCLI files
yum install wget

# Download vdscli.pl
wget https://raw.githubusercontent.com/davidpasek/vdscli/master/vdscli.pl

# Download supporting shell wrapper for show mac-address-table
wget https://raw.githubusercontent.com/davidpasek/vdscli/master/show-mac-address-table.sh

# Download supporting shell wrapper for show interface status
wget https://raw.githubusercontent.com/davidpasek/vdscli/master/show-interface-status.sh

# Change mode of all files in VDSCLI directory to allow execution
chmod 755 *

# Edit shell wrappers with your specific vCenter hostname and credentials

It is highly recommended to create a dedicated read-only (AD or vSphere SSO) account for VDSCLI, as depicted in the screenshot below

vSphere SSO account for VDSCLI

6/ VDSCLI validation

# You must be in vdscli directory
# Run command below
./show-mac-address-table.sh

You should get the MAC address table of the VMware distributed virtual switch, something like the screenshot below

Output from ./show-mac-address-table.sh 

So now we have Perl scripts to get information from VMware Distributed Virtual Switch. So far so good. However, we would like to have an interactive CLI with the same user experience as on a physical switch CLI, right? For the interactive CLI I have decided to use Python ishell (https://github.com/italorossi/ishell).

7/ iShell installation

# Install python
yum install python

# Install and upgrade pip which is Python package manager
yum install epel-release
yum install python-pip
pip install --upgrade pip

# We need gcc therefore we install all development tools
yum group install "Development Tools"
yum install python-devel
yum install ncurses-devel

# Now we can finally install ishell
pip install ishell

8/ Interactive VDSCLI validation

# You must be in vdscli directory
# Run command below
./vdscli-ishell.py

You should be able to use an interactive shell with just two show commands. See the screenshot below to get an impression of how it works.

VDSCLI Interactive Shell

9/ Remote SSH or Telnet

The last step is to expose the VDSCLI interactive shell over SSH or Telnet. I will show you how to do it for SSH, but if you enable Telnet on your Linux server it will work as well.

Let's add a dedicated OS user
adduser admin
passwd admin

You have to copy all vdscli scripts to the home directory of the newly created user (/home/admin). The user is admin in our case, but you can use any username you want.

We have to add our interactive shell /home/admin/vdscli-ishell.py to /etc/shells because only programs configured there can be used as shells.

chsh admin
and use /home/admin/vdscli-ishell.py as a shell
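
The /etc/shells registration described above is not shown as a command, so here is a minimal sketch of it in Python. The function name register_shell and its parameters are hypothetical helpers, not part of VDSCLI; the demo works on a temporary file, and pointing it at the real /etc/shells would require root privileges.

```python
import tempfile
from pathlib import Path


def register_shell(shell_path, shells_file="/etc/shells"):
    """Append shell_path to shells_file unless it is already listed.

    Returns True when a line was added, False when nothing changed.
    """
    shells = Path(shells_file)
    text = shells.read_text() if shells.exists() else ""
    if shell_path in (line.strip() for line in text.splitlines()):
        return False
    # Start on a fresh line in case the file lacks a trailing newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with shells.open("a") as handle:
        handle.write(prefix + shell_path + "\n")
    return True


if __name__ == "__main__":
    # Demo against a throwaway copy so the real /etc/shells is not touched.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "shells"
        demo.write_text("/bin/sh\n/bin/bash\n")
        print(register_shell("/home/admin/vdscli-ishell.py", demo))  # adds the line
        print(register_shell("/home/admin/vdscli-ishell.py", demo))  # already present
        print(demo.read_text())
```

Making the helper idempotent means it can be re-run safely, for example from a provisioning script, without duplicating the entry.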

CONCLUSION

So in the end you can simply ssh to the Linux system we have built and immediately use the CLI for VMware Distributed Switch, as depicted in the screenshot below

VDSCLI Interactive Shell over SSH
And that was the goal. Hope somebody else in the VMware community will find it useful.

Friday, September 01, 2017

ESXi Physical NIC Capabilities for NSX VTEP

NSX VTEP encapsulation benefits significantly from physical NIC offload capabilities. In this blog post, I will show how to identify NIC capabilities.

Check NIC type and driver

esxcli network nic get -n vmnic4
[dpasek@esx01:~] esxcli network nic get -n vmnic4
   Advertised Auto Negotiation: false
   Advertised Link Modes: 10000BaseT/Full
   Auto Negotiation: false
   Cable Type: FIBRE
   Current Message Level: 0
   Driver Info: 
         Bus Info: 0000:05:00.0
         Driver: bnx2x
         Firmware Version: bc 7.13.75
         Version: 2.713.10.v60.4
   Link Detected: true
   Link Status: Up 
   Name: vmnic4
   PHYAddress: 1
   Pause Autonegotiate: false
   Pause RX: true
   Pause TX: true
   Supported Ports: FIBRE
   Supports Auto Negotiation: false
   Supports Pause: true
   Supports Wakeon: false
   Transceiver: internal
   Virtual Address: 00:50:56:59:d8:8c
   Wakeon: None
[dpasek@esx01:~]

esxcli software vib list | grep bnx2x
[dpasek@esx01:~] esxcli software vib list | grep bnx2x
net-bnx2x                      2.713.10.v60.4-1OEM.600.0.0.2494585   QLogic     VMwareCertified   2017-05-10  
[dpasek@esx01:~]

Driver parameters can be listed by command …
esxcli system module parameters list -m bnx2x
[dpasek@esx01:~] esxcli system module parameters list -m bnx2x
Name                                  Type          Value  Description                                                                                                                                                                                                                                                                                    
------------------------------------  ------------  -----  -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
RSS                                   int                  Controls the number of queues in an RSS pool. Supported Values 2-4.                                                                                                                                                                                                                            
autogreeen                            uint                  Set autoGrEEEn (0:HW default; 1:force on; 2:force off)                                                                                                                                                                                                                                        
bnx2x_vf_passthru_wait_event_timeout  uint                 For debug purposes, set the value timeout value on VF OP to complete in ms                                                                                                                                                                                                                     
debug                                 uint                  Default debug msglevel                                                                                                                                                                                                                                                                        
debug_unhide_nics                     int                  Force the exposure of the vmnic interface for debugging purposes[Default is to hide the nics]1.  In SRIOV mode expose the PF                                                                                                                                                                   
disable_feat_preemptible              int                  For debug purposes, disable FEAT_PREEMPTIBLE when set to value of 1                                                                                                                                                                                                                            
disable_fw_dmp                        int                  For debug purposes, disable firmware dump  feature when set to value of 1                                                                                                                                                                                                                      
disable_iscsi_ooo                     uint                  Disable iSCSI OOO support                                                                                                                                                                                                                                                                     
disable_rss_dyn                       int                  For debug purposes, disable RSS_DYN feature when set to value of 1                                                                                                                                                                                                                             
disable_tpa                           uint                  Disable the TPA (LRO) feature                                                                                                                                                                                                                                                                 
disable_vxlan_filter                  int                  Enable/disable vxlan filtering feature. Default:1, Enable:0, Disable:1                                                                                                                                                                                                                         
dropless_fc                           uint                  Pause on exhausted host ring                                                                                                                                                                                                                                                                  
eee                                                        set EEE Tx LPI timer with this value; 0: HW default; -1: Force disable EEE.                                                                                                                                                                                                                    
enable_default_queue_filters          int                  Allow filters on the default queue. [Default is disabled for non-NPAR mode, enabled by default on NPAR mode]                                                                                                                                                                                   
enable_geneve_ofld                    int                  Enable/Disable GENEVE offloads. 1: [Default] Enable GENEVE Offloads. 0: Disable GENEVE Offloads.                                                                                                                                                                                               
enable_live_grcdump                   int                  Enable live GRC dump 0x0: Disable live GRC dump, 0x1: Enable Parity/Live GRC dump [Enabled by default], 0x2: Enable Tx timeout GRC dump, 0x4: Enable Stats timeout GRC dump                                                                                                                    
enable_vxlan_ofld                     int                  Allow vxlan TSO/CSO offload support.[Default is enabled, 1: enable vxlan offload, 0: disable vxlan offload]                                                                                                                                                                                    
heap_initial                          int                  Initial heap size allocated for the driver.                                                                                                                                                                                                                                                    
heap_max                              int                  Maximum attainable heap size for the driver.                                                                                                                                                                                                                                                   
int_mode                              uint                  Force interrupt mode other than MSI-X (1 INT#x; 2 MSI)                                                                                                                                                                                                                                        
max_agg_size_param                    uint                 max aggregation size                                                                                                                                                                                                                                                                           
max_vfs                               array of int         Number of Virtual Functions: 0 = disable (default), 1-64 = enable this many VFs                                                                                                                                                                                                                
mrrs                                  int                   Force Max Read Req Size (0..3) (for debug)                                                                                                                                                                                                                                                    
multi_rx_filters                      int                  Define the number of RX filters per NetQueue: (allowed values: -1 to Max # of RX filters per NetQueue, -1: use the default number of RX filters; 0: Disable use of multiple RX filters; 1..Max # the number of RX filters per NetQueue: will force the number of RX filters to use for NetQueue
native_eee                            int                                                                                                                                                                                                                                                                                                                 
num_queues                            int                   Set number of queues (default is as a number of CPUs)                                                                                                                                                                                                                                         
num_queues_on_default_queue           int                  Controls the number of RSS queues ( 1 or more) enabled on the default queue. Supported Values 1-7, Default=4                                                                                                                                                                                   
num_rss_pools                         int                  Control the existance of an RSS pool. When 0,RSS pool is disabled. When 1, there will be an RSS pool (given that RSS>0).                                                                                                                                                                       
poll                                  uint                  Use polling (for debug)                                                                                                                                                                                                                                                                       
pri_map                               uint                  Priority to HW queue mapping                                                                                                                                                                                                                                                                  
psod_on_panic                         int                   PSOD on panic                                                                                                                                                                                                                                                                                 
rss_on_default_queue                  int                  RSS feature on default queue on eachphysical function that is an L2 function. Enable=1, Disable=0. Default=0                                                                                                                                                                                   
skb_mpool_initial                     int                  Driver's minimum private socket buffer memory pool size.                                                                                                                                                                                                                                       
skb_mpool_max                         int                  Maximum attainable private socket buffer memory pool size for the driver.                                                                                                                                                                                                                      

To get the currently configured driver parameter values
esxcfg-module --get-options bnx2x 
[dpasek@esx01:~] esxcfg-module -g bnx2x
bnx2x enabled = 1 options = ''

HCL device identifiers

vmkchdev -l | grep vmnic
[dpasek@esx01:~] vmkchdev -l | grep vmnic
0000:02:00.0 14e4:1657 103c:22be vmkernel vmnic0
0000:02:00.1 14e4:1657 103c:22be vmkernel vmnic1
0000:02:00.2 14e4:1657 103c:22be vmkernel vmnic2
0000:02:00.3 14e4:1657 103c:22be vmkernel vmnic3
0000:05:00.0 14e4:168e 103c:339d vmkernel vmnic4
0000:05:00.1 14e4:168e 103c:339d vmkernel vmnic5
0000:88:00.0 14e4:168e 103c:339d vmkernel vmnic6
0000:88:00.1 14e4:168e 103c:339d vmkernel vmnic7
So, in the case of vmnic4 the identifiers are
  • VID:DID SVID:SSID
  • 14e4:168e 103c:339d
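
When the same lookup has to be repeated for many NICs or hosts, the identifiers can be extracted programmatically. The following is a minimal Python sketch that assumes the five-column layout shown in the vmkchdev output above; the function name parse_vmkchdev is a hypothetical helper, and the sample text is copied from that output.

```python
def parse_vmkchdev(output):
    """Map each vmnic name to its PCI identifiers from `vmkchdev -l` output."""
    devices = {}
    for line in output.splitlines():
        fields = line.split()
        # Assumed layout: <PCI address> <VID:DID> <SVID:SSID> <owner> <device name>
        if len(fields) != 5 or not fields[4].startswith("vmnic"):
            continue
        vid, did = fields[1].split(":")
        svid, ssid = fields[2].split(":")
        devices[fields[4]] = {"vid": vid, "did": did, "svid": svid, "ssid": ssid}
    return devices


# Two lines copied from the output above.
sample = """0000:02:00.0 14e4:1657 103c:22be vmkernel vmnic0
0000:05:00.0 14e4:168e 103c:339d vmkernel vmnic4"""

ids = parse_vmkchdev(sample)
print(ids["vmnic4"])
```

The resulting VID, DID, SVID and SSID values are the ones to enter when searching the hardware compatibility list for the device.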

Check TSO configuration

esxcli network nic tso get
[dpasek@esx01:~] esxcli network nic tso get
NIC     Value
------  -----
vmnic0  on   
vmnic1  on   
vmnic2  on   
vmnic3  on   
vmnic4  on   
vmnic5  on   
vmnic6  on   
vmnic7  on   

esxcli system settings advanced list -o /Net/UseHwTSO
[dpasek@esx01:~] esxcli system settings advanced list -o /Net/UseHwTSO
   Path: /Net/UseHwTSO
   Type: integer
   Int Value: 1
   Default Int Value: 1
   Min Value: 0
   Max Value: 1
   String Value: 
   Default String Value: 
   Valid Characters: 
   Description: When non-zero, use pNIC HW TSO offload if available

If you want to disable TSO, use the following commands …
esxcli network nic software set --ipv4tso=0 -n vmnicX
esxcli network nic software set --ipv6tso=0 -n vmnicX

Guest OS TSO settings in Linux can be changed with the command (use on or off) …
ethtool -K ethX tso off

Check LRO configuration

esxcli system settings advanced list -o /Net/TcpipDefLROEnabled
[dpasek@esx01:~] esxcli system settings advanced list -o /Net/TcpipDefLROEnabled
   Path: /Net/TcpipDefLROEnabled
   Type: integer
   Int Value: 1
   Default Int Value: 1
   Min Value: 0
   Max Value: 1
   String Value: 
   Default String Value: 
   Valid Characters: 
   Description: LRO enabled for TCP/IP

vmxnet settings can be validated by command …
esxcli system settings advanced list -o /Net/Vmxnet3HwLRO
[dpasek@esx01:~] esxcli system settings advanced list -o /Net/Vmxnet3HwLRO
   Path: /Net/Vmxnet3HwLRO
   Type: integer
   Int Value: 1
   Default Int Value: 1
   Min Value: 0
   Max Value: 1
   String Value: 
   Default String Value: 
   Valid Characters: 
   Description: Whether to enable HW LRO on pkts going to a LPD capable vmxnet3

Guest OS LRO settings in Linux can be changed with the command (use on or off) …
ethtool -K ethX lro off

Check Checksum offload configuration

ESX settings
esxcli network nic cso get
[dpasek@esx01:~] esxcli network nic cso get
NIC     RX Checksum Offload  TX Checksum Offload
------  -------------------  -------------------
vmnic0  on                   on                 
vmnic1  on                   on                 
vmnic2  on                   on                 
vmnic3  on                   on                 
vmnic4  on                   on                 
vmnic5  on                   on                 
vmnic6  on                   on                 
vmnic7  on                   on                 
[dpasek@esx01:~]

The following command can be used to disable CSO for a specific pNIC:
esxcli network nic cso set -e 0 -n vmnicX



Check VXLAN offloading

vsish -e get /net/pNics/vmnic4/properties
[dpasek@esx01:~] vsish -e get /net/pNics/vmnic4/properties
properties {
   Driver Name:bnx2x
   Driver Version:2.713.10.v60.4
   Driver Firmware Version:bc 7.13.75
   System Device Name:vmnic4
   Module Interface Used By The Driver:vmklinux
   Device Hardware Cap Supported:: 0x493c032b -> VMNET_CAP_SG VMNET_CAP_IP4_CSUM VMNET_CAP_HIGH_DMA VMNET_CAP_TSO VMNET_CAP_HW_TX_VLAN VMNET_CAP_HW_RX_VLAN VMNET_CAP_SG_SPAN_PAGES VMNET_CAP_IP6_CSUM VMNET_CAP_TSO6 VMNET_CAP_TSO256k VMNET_CAP_ENCAP VMNET_CAP_GENEVE_OFFLOAD VMNET_CAP_SCHED
   Device Hardware Cap Activated:: 0x403c032b -> VMNET_CAP_SG VMNET_CAP_IP4_CSUM VMNET_CAP_HIGH_DMA VMNET_CAP_TSO VMNET_CAP_HW_TX_VLAN VMNET_CAP_HW_RX_VLAN VMNET_CAP_SG_SPAN_PAGES VMNET_CAP_IP6_CSUM VMNET_CAP_TSO6 VMNET_CAP_TSO256k VMNET_CAP_SCHED
   Device Software Cap Activated:: 0x30800000 -> VMNET_CAP_RDONLY_INETHDRS VMNET_CAP_IP6_CSUM_EXT_HDRS VMNET_CAP_TSO6_EXT_HDRS
   Device Software Assistance Activated:: 0 -> No matching defined enum value found.
   PCI Segment:0
   PCI Bus:5
   PCI Slot:0
   PCI Fn:0
   Device NUMA Node:0
   PCI Vendor:0x14e4
   PCI Device ID:0x168e
   Link Up:1
   Operational Status:1
   Administrative Status:1
   Full Duplex:1
   Auto Negotiation:0
   Speed (Mb/s):10000
   Uplink Port ID:0x0400000a
   Flags:: 0x41e0e -> DEVICE_PRESENT DEVICE_OPENED DEVICE_EVENT_NOTIFIED DEVICE_SCHED_CONNECTED DEVICE_USE_RESPOOLS_CFG DEVICE_RESPOOLS_SCHED_ALLOWED DEVICE_RESPOOLS_SCHED_SUPPORTED DEIVCE_ASSOCIATED
   Network Hint:
   MAC address:9c:dc:71:db:d0:38
   VLanHwTxAccel:1
   VLanHwRxAccel:1
   States:: 0xff -> DEVICE_PRESENT DEVICE_READY DEVICE_RUNNING DEVICE_QUEUE_OK DEVICE_LINK_OK DEVICE_PROMISC DEVICE_BROADCAST DEVICE_MULTICAST
   Pseudo Device:0
   Legacy vmklinux device:1
   Respools sched allowed:1
   Respools sched supported:1
}

VXLAN offload capability is called 'VMNET_CAP_ENCAP'. That's what you need to look for.
vsish -e get /net/pNics/vmnic4/properties | grep VMNET_CAP_ENCAP
[dpasek@esx01:~] vsish -e get /net/pNics/vmnic4/properties | grep VMNET_CAP_ENCAP
   Device Hardware Cap Supported:: 0x493c032b -> VMNET_CAP_SG VMNET_CAP_IP4_CSUM VMNET_CAP_HIGH_DMA VMNET_CAP_TSO VMNET_CAP_HW_TX_VLAN VMNET_CAP_HW_RX_VLAN VMNET_CAP_SG_SPAN_PAGES VMNET_CAP_IP6_CSUM VMNET_CAP_TSO6 VMNET_CAP_TSO256k VMNET_CAP_ENCAP VMNET_CAP_GENEVE_OFFLOAD VMNET_CAP_SCHED
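
Note that the grep above matches only the "Supported" line: in the vmnic4 output, VMNET_CAP_ENCAP is listed under "Device Hardware Cap Supported" but not under "Device Hardware Cap Activated". A minimal Python sketch for comparing the two lists follows. The function name parse_hw_caps is a hypothetical helper, the sample is abridged from the output above, and the parsing assumes the "label:: 0x... -> names" layout shown there.

```python
def parse_hw_caps(properties_text):
    """Return (supported, activated) sets of hardware capability names."""
    caps = {"Supported": set(), "Activated": set()}
    for line in properties_text.splitlines():
        line = line.strip()
        for kind in caps:
            prefix = "Device Hardware Cap " + kind
            if line.startswith(prefix) and "->" in line:
                # Everything after "->" is a space-separated list of capability names.
                caps[kind] = set(line.split("->", 1)[1].split())
    return caps["Supported"], caps["Activated"]


# Abridged from the vmnic4 properties above (only a few capabilities kept).
sample = """
   Device Hardware Cap Supported:: 0x493c032b -> VMNET_CAP_SG VMNET_CAP_TSO VMNET_CAP_ENCAP VMNET_CAP_GENEVE_OFFLOAD VMNET_CAP_SCHED
   Device Hardware Cap Activated:: 0x403c032b -> VMNET_CAP_SG VMNET_CAP_TSO VMNET_CAP_SCHED
   Device Software Cap Activated:: 0x30800000 -> VMNET_CAP_RDONLY_INETHDRS
"""

supported, activated = parse_hw_caps(sample)
print("ENCAP supported:", "VMNET_CAP_ENCAP" in supported)
print("ENCAP activated:", "VMNET_CAP_ENCAP" in activated)
print("Supported but not activated:", sorted(supported - activated))
```

Checking both sets separates what the NIC and driver can do from what is currently switched on for that uplink.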

Check VMDq (NetQueue)

esxcli network nic queue filterclass list
This esxcli command shows information about the filters supported per vmnic and used by NetQueue.
[dpasek@esx01:~] esxcli network nic queue filterclass list
NIC     MacOnly  VlanOnly  VlanMac  Vxlan  Geneve  GenericEncap
------  -------  --------  -------  -----  ------  ------------
vmnic0    false     false    false  false   false         false
vmnic1    false     false    false  false   false         false
vmnic2    false     false    false  false   false         false
vmnic3    false     false    false  false   false         false
vmnic4     true     false    false  false   false         false
vmnic5     true     false    false  false   false         false
vmnic6     true     false    false  false   false         false
vmnic7     true     false    false  false   false         false
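
To check many hosts, the filter table can be turned into a lookup structure. This is a minimal Python sketch assuming the column layout shown above (a header row, a dashed separator, then one row per NIC); the function name parse_filterclass is a hypothetical helper and the sample rows are copied from the output above.

```python
def parse_filterclass(table_text):
    """Parse `esxcli network nic queue filterclass list` output into {nic: {filter: bool}}."""
    rows = [line.split() for line in table_text.splitlines() if line.strip()]
    header = rows[0]  # NIC MacOnly VlanOnly VlanMac Vxlan Geneve GenericEncap
    nics = {}
    for fields in rows[1:]:
        if fields[0].startswith("-"):
            continue  # skip the dashed separator row
        nics[fields[0]] = {
            name: value == "true" for name, value in zip(header[1:], fields[1:])
        }
    return nics


# Header, separator and two rows copied from the output above.
sample = """NIC     MacOnly  VlanOnly  VlanMac  Vxlan  Geneve  GenericEncap
------  -------  --------  -------  -----  ------  ------------
vmnic0    false     false    false  false   false         false
vmnic4     true     false    false  false   false         false"""

filters = parse_filterclass(sample)
print("MacOnly filter:", [nic for nic, f in filters.items() if f["MacOnly"]])
print("Vxlan filter:", [nic for nic, f in filters.items() if f["Vxlan"]])
```

On this host only the MacOnly filter class is reported for vmnic4 to vmnic7, and no NIC reports a Vxlan filter.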

Dynamic NetQ

The following command will output the queues for all vmnics in your ESXi host.
esxcli network nic queue count get
[dpasek@esx01:~] esxcli network nic queue count get 
NIC     Tx netqueue count  Rx netqueue count
------  -----------------  -----------------
vmnic0                  1                  1
vmnic1                  1                  1
vmnic2                  1                  1
vmnic3                  1                  1
vmnic4                  8                  5
vmnic5                  8                  5
vmnic6                  8                  5
vmnic7                  8                  5

It is possible to disable NetQueue at the ESXi host level using the following command:
esxcli system settings kernel set --setting="netNetqueueEnabled" --value="false"

VXLAN PERFORMANCE

RSS can help when VXLAN is used because VXLAN traffic can be distributed among multiple hardware queues. NICs that offer RSS reach a throughput of around 9 Gbps, while NICs that do not reach only around 6 Gbps. Therefore, the right choice of physical NIC is critical.