Sunday, November 29, 2020

Virtual Machine Advanced Configuration Options

First and foremost, it is worth mentioning that it is definitely not recommended to change any advanced settings unless you know what you are doing and are fully aware of all potential impacts. VMware default settings are the best for general use and cover the majority of use cases. However, when you have specific requirements, you might need to tune the VM and change some advanced virtual machine configuration options. In this blog post, I try to document the advanced configuration options I have found useful in specific design decisions.

Time synchronization

  • time.synchronize.tools.startup
    • Description: Synchronize guest time with the host when the VMware Tools service starts (typically at guest OS boot)
    • Type: Boolean
    • Values:
      • true / 1 (default)
      • false / 0
  • time.synchronize.restore
    • Description: Synchronize guest time with the host after reverting the virtual machine to a snapshot
    • Type: Boolean
    • Values:
      • true / 1 (default)
      • false / 0
  • time.synchronize.shrink
    • Description: Synchronize guest time with the host after shrinking a virtual disk
    • Type: Boolean
    • Values:
      • true / 1 (default)
      • false / 0
  • time.synchronize.continue
    • Description: Synchronize guest time with the host when the virtual machine continues after a snapshot operation
    • Type: Boolean
    • Values:
      • true / 1 (default)
      • false / 0
  • time.synchronize.resume.disk
    • Description: Synchronize guest time with the host when the virtual machine resumes from suspend or after a migration
    • Type: Boolean
    • Values:
      • true / 1 (default)
      • false / 0
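For completeness, below is a minimal sketch of how these keys could be pushed into a VM's extraConfig with pyVmomi. The vCenter address, credentials, and VM name are placeholders, the lookup assumes the VM's DNS name matches its inventory name, and the snippet is an illustration rather than a production script. The values are strings; "FALSE"/"TRUE" or "0"/"1" both work.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Placeholders - replace with your vCenter and VM details
VC_HOST, VC_USER, VC_PASS = "vcenter.lab.local", "administrator@vsphere.local", "***"
VM_NAME = "test-vm-01"

# Keys controlling the one-off time synchronization events listed above
TIME_SYNC_KEYS = [
    "time.synchronize.tools.startup",
    "time.synchronize.restore",
    "time.synchronize.shrink",
    "time.synchronize.continue",
    "time.synchronize.resume.disk",
]

si = SmartConnect(host=VC_HOST, user=VC_USER, pwd=VC_PASS,
                  sslContext=ssl._create_unverified_context())  # lab only, no cert validation
try:
    content = si.RetrieveContent()
    vm = content.searchIndex.FindByDnsName(None, VM_NAME, True)

    # Set all one-off synchronization events to FALSE (i.e., disable them)
    spec = vim.vm.ConfigSpec(
        extraConfig=[vim.option.OptionValue(key=k, value="FALSE") for k in TIME_SYNC_KEYS]
    )
    vm.ReconfigVM_Task(spec)
finally:
    Disconnect(si)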


Ethernet

Isolation

With the isolation option, you can restrict file operations between the virtual machine and the host system, and between the virtual machine and other virtual machines.

VMware virtual machines can work both in a vSphere environment and on hosted virtualization platforms such as VMware Workstation and VMware Fusion. Certain virtual machine parameters do not need to be enabled when you run a virtual machine in a vSphere environment. Disable these parameters to reduce the potential for vulnerabilities.

The following advanced settings are Booleans (true/false) with a default value of false. You can disable the corresponding functionality by changing the value to true; a scripted example of applying these settings follows the list.

  • isolation.tools.unity.push.update.disable
  • isolation.tools.ghi.launchmenu.change
  • isolation.tools.memSchedFakeSampleStats.disable
  • isolation.tools.getCreds.disable
  • isolation.tools.ghi.autologon.disable
  • isolation.bios.bbs.disable
  • isolation.tools.hgfsServerSet.disable
  • isolation.tools.vmxDnDVersionGet.disable
  • isolation.tools.diskShrink.disable
  • isolation.tools.guestDnDVersionSet.disable
  • isolation.tools.unityActive.disable
  • isolation.tools.diskWiper.disable
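The same reconfiguration approach shown in the time-synchronization sketch above works for the isolation settings. Below is a minimal sketch, assuming a vm object obtained exactly as in that earlier example; treat it as an illustration of the mechanism, not a hardening baseline.

from pyVmomi import vim

# Set the listed isolation.* options to "TRUE" to disable the corresponding functionality
ISOLATION_KEYS = [
    "isolation.tools.unity.push.update.disable",
    "isolation.tools.ghi.launchmenu.change",
    "isolation.tools.memSchedFakeSampleStats.disable",
    "isolation.tools.getCreds.disable",
    "isolation.tools.ghi.autologon.disable",
    "isolation.bios.bbs.disable",
    "isolation.tools.hgfsServerSet.disable",
    "isolation.tools.vmxDnDVersionGet.disable",
    "isolation.tools.diskShrink.disable",
    "isolation.tools.guestDnDVersionSet.disable",
    "isolation.tools.unityActive.disable",
    "isolation.tools.diskWiper.disable",
]

spec = vim.vm.ConfigSpec(
    extraConfig=[vim.option.OptionValue(key=k, value="TRUE") for k in ISOLATION_KEYS]
)
vm.ReconfigVM_Task(spec)   # 'vm' looked up as in the previous sketch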

Snapshots

Remote Display

Tuesday, November 24, 2020

vSAN 7 Update 1 - What's new in Cloud Native Storage

vSAN 7 U1 also comes with new features in the Cloud Native Storage area, so let's look at what's new.

PersistentVolumeClaim expansion

Kubernetes v1.11 offered volume expansion by editing the PersistentVolumeClaim object. Please note that volume shrink is not supported and the expansion must be done offline. Online expansion is not supported in U1 but is planned on the roadmap.
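As an illustration, a PVC is grown by patching its requested storage size. The sketch below uses the official Kubernetes Python client; the PVC name, namespace, and new size are made up for the example, and the StorageClass backing the PVC must have allowVolumeExpansion enabled.

from kubernetes import client, config

config.load_kube_config()          # or config.load_incluster_config() when running inside a pod
core_v1 = client.CoreV1Api()

# Grow the volume by patching the PVC's requested storage size (shrinking is not supported)
patch = {"spec": {"resources": {"requests": {"storage": "20Gi"}}}}
core_v1.patch_namespaced_persistent_volume_claim(
    name="demo-pvc", namespace="default", body=patch
)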

Static Provisioning in Supervisor Cluster

This feature allows exposing an existing storage volume within a K8s cluster integrated into the vSphere Hypervisor Cluster (aka Supervisor Cluster, vSphere with K8s, Project Pacific).

vVols Support for vSphere K8s and TKG Service

Supporting external storage deployments on vK8s and TKG using vVols.

Data Protection for Modern Applications

vSphere 7.0 U1 comes with support for Dell PowerProtect and Velero backup for Pacific Supervisor and TKG clusters. With Velero, the only option is to initiate snapshots from the Supervisor Velero plugin and store them on S3.


vSAN Direct

vSAN Direct is a feature introducing Directly Attached Storage (typically physical HDDs) for object storage solutions running on top of vSphere.


There is no shared vSAN datastore like a typical vSAN cluster has. Instead, vSAN Direct datastores allow physical disks to be connected directly to virtual appliances or containers running on top of the vSphere/vSAN cluster, which provide object storage services while bypassing the traditional vSAN data path.

Hope you find it useful.

Monday, November 23, 2020

Why HTTPS is faster than HTTP?

Recently, I was planning, preparing, and executing a network performance test plan, including TCP, UDP, HTTP, and HTTPS throughput benchmarks. The intention of the test plan was a network throughput comparison between two particular NICs:

  • Intel X710
  • QLogic FastLinQ QL41xxx

There was a reason for such an exercise (reproduction of a specific NIC driver behavior) and I will probably write another blog post about it, but today I would like to raise another topic. During the analysis of the test results, I observed very interesting HTTPS throughput results in comparison to HTTP throughput. These results were observed on both types of NICs, therefore it should not be a benefit of specific NIC hardware or a driver.

Here is the Test Lab Environment:

  • 2x ESXi hosts
    • Server Platform: HPE ProLiant DL560 Gen10
    • CPU: Intel Cascade Lake based Xeon
    • BIOS: U34 | Date (ISO-8601): 2020-04-08
    • NIC1: Intel X710, driver i40en version: 1.9.5, firmware 10.51.5
    • NIC2: QLogic QL41xxx, driver qedentv version: 3.11.16.0, firmware mfw 8.52.9.0 storm 
    • OS/Hypervisor: VMware ESXi 6.7.0 build-16075168 (6.7 U3)
  • 1x Physical Switch
    • 10Gb switch ports  <<  network bottleneck on purpose, because the customer is using 10Gb switch ports as well

Below are the observed interesting HTTP and HTTPS results.

HTTP


HTTPS


OBSERVATION, EXPLANATION, AND CONCLUSION

We have observed

  • HTTP throughput between 5 and 6 Gbps
  • HTTPS throughput between 8 and 9 Gbps

which means roughly 50% higher throughput of HTTPS over HTTP. Normally, we would expect HTTP transfers to be faster than HTTPS, as HTTPS requires encryption, which should end up with some CPU overhead. The encryption overhead is debatable, but nobody would expect HTTPS to be significantly faster than HTTP, right? That's the reason I was asking myself,

why did HTTPS outperform HTTP in the HPE lab with the latest Intel CPUs?

Here is my process of troubleshooting the "issue", or better said, the root cause analysis.
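One quick check I would start with on any web-server host is whether the TLS cipher work is being hardware accelerated. Below is a small sketch (my own illustration, not the exact procedure I used) that runs openssl speed with and without the EVP interface; a large gap between the two runs suggests AES-NI / crypto offload is in play.

import subprocess

# Compare AES-256-CBC throughput via the EVP interface (hardware accelerated when AES-NI
# is available) against the plain software implementation.
for cmd in (["openssl", "speed", "-evp", "aes-256-cbc"],
            ["openssl", "speed", "aes-256-cbc"]):
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(" ".join(cmd))
    print(result.stdout.strip().splitlines()[-1])   # last line holds the throughput summary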

Conclusion

  • In my home lab, I have old Intel CPU models (Intel Xeon CPU E5-2620 0 @ 2.00GHz); that's the reason the HTTP and HTTPS throughputs are identical there.
  • In the HPE test lab, there are the latest Intel CPU models, therefore HTTPS can be offloaded and the client/server communication can leverage the asynchronous advantages for web servers using Intel® QuickAssist Technology, introduced with the Intel Xeon E5-2600 v3 product family.
  • It is worth mentioning that it is not only about CPU hardware acceleration, but also about software, which must be written in a form that can leverage hardware acceleration for a positive impact on performance. This is the case with OpenSSL 1.1.0 and NGINX 1.10, which boost HTTPS server efficiency.

Lesson learned

When you are virtualizing network functions, it is worth considering the latest CPUs, as they can have a significant impact on overall system performance and throughput. It does not matter whether such network function virtualization is done by VMware NSX or by other virtualization or containerization platforms.

Investigation continues

To be honest, I do not know if I really fully understand the root cause of such behavior. I still wonder why HTTPS is 50% faster than HTTP, and if CPU offloading is the only factor for such performance gain.

I'll try to run the test plan on other hardware platforms, compare the results, and do some further research to understand it in more depth. Unfortunately, I do not have direct access to the latest x86 servers of other vendors, so it can take a while. If you have access to some modern x86 hardware and want to run my test plan by yourself, you can download the test plan document from here. If you invest some time into the testing, please share your results in the comments below this article or simply send me an e-mail.

Hope this blog post is informative, and as always, any comment or idea is very welcome. 

Saturday, November 21, 2020

Understanding vSAN Architecture Components for better troubleshooting

VMware vSAN is becoming more and more popular, and thus it is more often used as primary storage in data centers and server rooms. Sometimes, as with any IT technology, it is necessary to do some troubleshooting. Understanding the architecture and the component interactions is essential for effective troubleshooting of vSAN. Over the years, I have collected some vSAN architectural information into a slide deck I made available at https://www.slideshare.net/davidpasek/vsan-architecture-components

The slide deck contains slides with the following sections ...

vSAN Terminology

  • CMMDS - Cluster Monitoring, Membership, and Directory Service
  • CLOMD - Cluster Level Object Manager Daemon
  • OSFSD - Object Storage File System Daemon
  • CLOM - Cluster Level Object Manager
  • OSFS - Object Storage File System
  • RDT - Reliable Datagram Transport
  • VSANVP - Virtual SAN Vendor Provider
  • SPBM - Storage Policy-Based Management
  • UUID - Universally unique identifier
  • SSD - Solid-State Drive
  • MD - Magnetic disk
  • VSA - Virtual Storage Appliance
  • RVC - Ruby vSphere Console

Architecture components
  • CMMDS
    • Cluster Monitoring, Membership, and Directory Service
  • CLOM
    • Cluster Level Object Manager Daemon
  • DOM
    • Distributed Object Manager
    • Each object in a vSAN cluster has a DOM owner and a DOM client
  • LSOM
    • Local Log Structured Object Manager
    • LSOM works with local disks
  • RDT
    • Reliable Datagram Transport
Components interaction



Architecture & I/O Flow




Troubleshooting tools
  • RVC
    • vsan.observer
    • vsan.disks_info
    • vsan.disks_stats
    • vsan.disk_object_info
    • vsan.cmmds_find
  • ESXCLI
    • esxcli vsan debug disk list
  • Objects tools
    • /usr/lib/vmware/osfs/bin/objtool
How to use vSAN Observer

  • SSH to a system where you have RVC available. It can be, for example, VCSA or HCIBench
    • ssh root@[IP-ADDRESS-OF-VCSA]
  • Run the RVC command-line interface and connect to the vCenter where you have a vSphere cluster with the vSAN service enabled. RVC requires the password of an administrator in your vSphere domain.
    • rvc administrator@[IP-ADDRESS-OF-VCSA]
  • Start vSAN Observer on your vSphere cluster with vSAN service enabled
    • vsan.observer -r /localhost/[vDatacenter]/computers/[vSphere & vSAN Cluster]
  • Go to vSAN Observer web interface
    • vSAN Observer is available at https://[IP-ADDRESS-OF-VCSA]:8010
The slide deck includes a little more info, so download it from https://www.slideshare.net/davidpasek/vsan-architecture-components

If you have to troubleshoot vSAN, I highly recommend following the process documented at "Troubleshooting vSAN Performance".

Hope it helps the broader VMware community.

If you know some other detail or troubleshooting tool, please leave a comment below this post.

Thursday, November 05, 2020

NSX-T Edge Node performance profiles

It is good to know that the NSX-T Edge Node has multiple performance profiles. These profiles change the number of vCPUs dedicated to DPDK and thus leave more or fewer vCPUs for other services such as the Load Balancer (LB):

  • default (best for L2/L3 traffic)
  • LB TCP (best for L4 traffic)
  • LB HTTP (best for HTTP traffic)
  • LB HTTPS (best for HTTPS traffic)

Now you might ask how to choose the Load Balancer performance profile. SSH to the Edge Node and use the CLI.

 nsx-edgebm3> set load-balancer perf-profile
   http    Performance profile type argument
   https   Performance profile type argument
   l4      Performance profile type argument

Note: You may be prompted to restart the dataplane or reboot the Edge Node if the profile change alters the number of cores used by the LB.

To go back to the default profile:

 nsx-edgebm3> clear load-balancer perf-profile

Changing from L4 to HTTP helped me achieve ~3x higher HTTP throughput through the L7 NSX-T load balancer. Hope this helps someone else as well.

Tuesday, September 22, 2020

vSAN - vLCM Capable ReadyNode

VMware vSphere Lifecycle Manager (aka vLCM) is one of the very interesting features in vSphere 7.  vLCM is a powerful new approach to simplified consistent lifecycle management for the hypervisor and the full stack of drivers and firmware for the servers powering your data center.

There are only a few server vendors who have implemented firmware management with vLCM.

At the moment of writing this article, these vendors are:

  • Dell and HPE for vSphere 7.0
  • Dell, HPE, Lenovo for vSphere 7.0 Update 1

Recently I have got the following question from one of my customers.

"Where I can find official information about certified vLCM server vendors?"

It is a very good question. I would expect such information in the VMware Compatibility Guides (VCG); however, there is no such information in the "Systems / Servers" VCG, but you can find it in the "vSAN" VCG.



The vSAN VCG contains "vSAN ReadyNodes Additional Features", where one of the features is "vLCM Capable ReadyNode". So there you can find the server vendors that have successfully implemented firmware management integration with vLCM, but this information is available only for vSAN ReadyNodes. I can imagine that in the future, the vLCM capability may or may not be available even for standard servers and not only for vSAN ReadyNodes.

Friday, September 11, 2020

Datacenter Network Topology - Dell OS10 MultiDomain VLT

Yesterday, I got the following e-mail from one of my blog readers ...

Hello David,

Let me introduce myself, I work in medium size company and we began to sell Dell Networking stuff to go along with VxRail. We do small deployments, not the big stuff with spine/leaf L3 BGP, you name it. For a Customer, I had to implement this solution. Sadly, we are having a bad time with STP as you can see on the design.

 

Customer design with STP challenge

Is there a way to be loop-free ? I thought about Multi Domain VLT LAG but it looks like it is not supported in OS10. 

I wonder how you would do this. Is SmartFabric the answer ?
Thank you

Well, first of all, thanks for the question. If you ask me, it all boils down to specific design factors - use cases, requirements, constraints, assumptions.

So let's write down design factors

Requirements:

  • Multi-site deployment
  • A small deployment with a single VLT domain per site.
  • Robust L2 networking for VxRail clusters

Constraints:

  • Dell Networking hardware with OS10
  • Networking for VMware vSphere/vSAN (VxRail)

Assumptions:
  • No more than a single VLT domain per site is required
  • No vSphere/vSAN (VxRail) Clusters are Stretched across sites
Any unfulfilled assumption is a potential risk. In the case of an unfulfilled assumption, the design should be reviewed and potentially redesigned to fulfill the design factors.

Now, let's think about network topology options we have. 

The reader has asked if DellEMC SmartFabric can help him. Well, SmartFabric can be an option, as it is a leaf-spine fabric fully managed by the external SmartFabric Orchestrator, something like Cisco ACI / APIC. SmartFabric uses EVPN, BGP, VXLAN, etc. for multi-rack deployments. I do not know the latest details, but AFAIK, it was not multi-site ready a few months ago. The latest SmartFabric features should be validated with DellEMC. Anyway, SmartFabric can do L2 over L3 if you need to stretch L2 across racks. Eventually, it should be possible to stretch L2 even across sites.

However, because our design is targeted at a small deployment, I think leaf-spine is overkill here, and I always prefer the KISS (Keep It Simple, Stupid) approach.

So, here are two final options of network topology I would consider and compare.

OPTION 1: Stretched L2 Loop-Free across sites 
OPTION 2: L3 across sites with L2/L3 boundary in TOR access switches 


 Option 1 Stretched L2 Loop-Free across sites


Option 2 - L3 across sites with L2/L3 boundary in TOR access switches 

So let's compare these two options. 

Option 1 - Stretched L2 Loop-Free across sites 

Benefits

  • Simplicity
  • Stretched L2 across sites allows workload (device, VM, container, etc.) migrations across sites without L2 over L3 network overlay (NSX, SmartFabric, etc.) and re-IP.

Drawbacks

  • The topology is not scalable to more TOR access switches (VLT domains), but this is OK with respect to the design factors above
  • The topology optimally requires 8 links across sites. Optionally, this can be reduced to 4 links.
  • Only two routers. One per site.
  • Stretched L2 topology across sites also extends L2 network fault-domain across sites, therefore broadcast storms, unknown unicast flooding, and potential STP challenges are the potential risks.
  • This topology has L3 trombone by design - https://blog.ipspace.net/2011/02/traffic-trombone-what-it-is-and-how-you.html. This drawback can be accepted or mitigated by NSX distributed routing.

OPTION 2 - L3 across sites with L2/L3 boundary in TOR access switches 

Benefits

  • Better scalability, because other VLT domains (TOR access switches) can be connected to core routers. However, this benefit is not required by the design factors above. 
  • The topology optimally requires 4 links across sites. Optionally, this can be reduced to 2 links. This is less than Option 1 requires.
  • Each site is a local fault domain from the L2 networking point of view, as the L2 fault domain is not stretched across sites. L2 faults (STP, broadcast storms, unknown unicast flooding, etc.) are isolated within the site.

Drawbacks

  • More complex routing configuration with ECMP and dynamic routing protocol like iBGP or OSPF
  • Four routers. Two per site.
  • L3 topology across sites restricts workload (device, VM, container, etc.) migrations across sites without L2 over L3 network overlay (NSX, SmartFabric, etc.) or changing the IP address of migrated workload.

Conclusion and Design Decision

Both considered design options are L2 loop-free topologies, and I hope they fit all design factors defined above. If you do not agree, please write a comment, because anybody can make an error in a design or not foresee all situations until the architecture design is implemented and validated.

If I had to make a final design decision, it would depend on two other factors:
  • Do I have VMware NSX in my toolbox or not?
  • What is the skillset level of network operators (Dynamic Routing, ECMP, VRRP) responsible for the operation?
If I did not have NSX and the network operators preferred routing high availability (VRRP) over dynamic routing with ECMP (high availability + scalability + performance), I would decide to implement Option 1.

In the case of NSX and willingness to use dynamic routing with ECMP, I would decide to implement Option 2.

The reader mentioned in his question that his company does not use spine/leaf L3 BGP, therefore Option 1 is probably a better fit for him.

Disclaimer: I had no chance to test and validate any of the design options considered above; therefore, if you have any real experience, please speak out loudly in the comments.

Tuesday, September 01, 2020

Why NUMA matters?

This is a very short blog post because more and more VMware customers and partners are asking me the same question ... 

"Why NUMA matters?"

If you want to know more, I highly recommend reading Frank Denneman's detailed blog posts or books about NUMA; however, the numbers below are worth 1000 words.

Local memory access latency is ~ 75 ns.

Remote memory access latency is ~ 132 ns.

Local memory access (~75 ns) has roughly 40% lower latency than remote memory access (~132 ns). Such a difference in performance is worth incorporating NUMA considerations into your data center infrastructure design.

If you prefer a comprehensive presentation, Frank Denneman is scheduled to speak at VMworld 2020 about NUMA in the session "60 Minutes of NUMA [HCP2453]".


Wednesday, August 26, 2020

iSCSI Best Practices - 2020 review

I have just listened to the Virtually Speaking podcast episode "Back to Basics: iSCSI". Back in 2014, I wrote a blog post about iSCSI Best Practices, but it was about general iSCSI best practices for any operating system or hypervisor. All these old best practices should still be considered in a full-stack design, but four design considerations were highlighted in the above podcast. These four are:

  1. Jumbo Frames - more details in my blog post about iSCSI Best Practices
  2. iSCSI Port Binding - more details here at VMware KB https://kb.vmware.com/s/article/2038869
  3. Delayed ACK - more details in my blog post and at VMware KB https://kb.vmware.com/s/article/1002598
  4. NoOp timeout - more details in the article at https://www.jacobhopkinson.com/2019/05/10/iscsi-a-25-second-pause-in-i-o-during-a-single-link-loss-what-gives/
 
Update 2020-06-29:
Consider using a custom-named iSCSI IQN. See the justification at
 
Hope this info helps other folks in the VMware community.

Wednesday, August 12, 2020

Could not connect to one or more vCenter Server Systems: https://vCenterFQDN:443/sdk

When I logged in to the vCenter 7 vSphere Client in my home lab, I experienced the message
"Could not connect to one or more vCenter Server Systems: https://vCenterFQDN:443/sdk"
Below is the screenshot from vSphere Client ...



The message is very clear, but such an issue can be caused by various reasons, therefore vpxd.log in the vCenter Server Appliance should be checked to identify the specific reason for the unavailability of the vCenter Server service, which provides the API endpoint for other services.
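For a quick orientation in a large vpxd.log, I find it handy to aggregate the error lines first. A minimal sketch follows; the log path is the usual VCSA location, so adjust it if your deployment differs.

import re
from collections import Counter

errors = Counter()
with open("/var/log/vmware/vpxd/vpxd.log") as log:
    for line in log:
        if " error " in line:
            # Strip the timestamp and thread id so identical messages are grouped together
            errors[re.sub(r"^\S+\s+error\s+vpxd\[\d+\]\s*", "", line).strip()] += 1

for message, count in errors.most_common(5):
    print(f"{count:6d}  {message[:120]}")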

In my particular case, I have seen the following log messages ...

2020-08-05T00:07:38.663Z error vpxd[18559] [Originator@6876 sub=HTTP session map] Out of HTTP sessions: Limited to 2000
2020-08-05T00:07:38.746Z error vpxd[16483] [Originator@6876 sub=HTTP session map] Out of HTTP sessions: Limited to 2000
2020-08-05T00:07:38.821Z error vpxd[16578] [Originator@6876 sub=HTTP session map] Out of HTTP sessions: Limited to 2000
2020-08-05T00:07:38.999Z error vpxd[16549] [Originator@6876 sub=HTTP session map] Out of HTTP sessions: Limited to 2000
2020-08-05T00:07:39.045Z error vpxd[16506] [Originator@6876 sub=HTTP session map] Out of HTTP sessions: Limited to 2000
2020-08-05T00:07:39.122Z error vpxd[16503] [Originator@6876 sub=HTTP session map] Out of HTTP sessions: Limited to 2000
2020-08-05T00:07:39.311Z error vpxd[16553] [Originator@6876 sub=HTTP session map] Out of HTTP sessions: Limited to 2000

This means that something exhausted the maximum number (2,000) of HTTP sessions the vCenter daemon (vpxd) accepts. Now the question is, who is the troublemaker? The other error messages appearing in vpxd.log were about an invalid login of hmsuser:

2020-08-05T00:00:39.651Z info vpxd[16492] [Originator@6876 sub=Default opID=3ff7839f] [VpxLRO] -- ERROR lro-16 -- SessionManager -- vim.SessionManager.impersonateUser: vim.fault.InvalidLogin:
--> Result:
--> (vim.fault.InvalidLogin) {
-->    faultCause = (vmodl.MethodFault) null,
-->    faultMessage = <unset>
-->    msg = ""
--> }
--> Args:
-->
--> Arg userName:
--> "hmsuser"
--> Arg locale:
--> "en"

Who is this hmsuser? Well, HMS is the management service used by vSphere Replication, and it was the culprit. After powering off the vSphere Replication appliance, I was able to log in to vCenter again. I had no time for further problem management, but because this is a lab environment, I will most probably install a brand new vSphere Replication appliance on the latest version.

Hope this can help other folks in the VMware community.
 

Monday, July 06, 2020

vCenter Server Appliance - Update installation is in progress

A few days ago, I updated my home-lab VCSA from vCenter Server 7.0 GA (15952498) to vCenter Server 7.0.0a (16189094). Everything seemed OK from the vCenter (vSphere Client) perspective. I was seeing vCenter build 16189207 there, which corresponds to VCSA build 16189094.

The only problem I had was that I was not able to log in to the VCSA VAMI.


After user authentication into VAMI, I was getting the message "Update installation is in progress". During problem isolation, I tested the REST API and was getting the same message through the Appliance REST API.

To be honest, I had no idea how to check the VCSA update status and move it forward, therefore I started troubleshooting in the log files.

In /var/log/vmware/applmgmt/vami.log, I had regularly repeating messages ... "Executing VMware vCenter Server SNMP post upgrade actions..."

The next step was parsing PatchRunner.log. There was nothing remarkable except info about the patch stageDir located at "/storage/core/software-update/updates/7.0.0.10300/patch_runner", where I had two empty directories and the file "patch_phase_context.json".

Normally, the directory /storage/core/software-update/updates/ is empty; however, I still had there a subdirectory with the patch … /storage/core/software-update/updates/7.0.0.10300

I tried to delete the whole directory /storage/core/software-update/updates/7.0.0.10300 but it did not help.

It seemed that the post-upgrade actions (specifically VMware vCenter Server SNMP) could not finish.

I HAD NO IDEA ... I WAS STUCK ... so the next step was research on the internal VMware Slack channel. I did not find anything, therefore I asked for help. In a few days, somebody else hit the same issue in his home lab and got a tip from the VMware Engineering guys.

The trick was in file /etc/applmgmt/appliance/software_update_state.conf

I had there the following content

{
    "state": "INSTALL_IN_PROGRESS",
    "version": "7.0.0.10300",
    "latest_query_time": "2020-06-12T17:30:46Z",
    "operation_id": "/storage/core/software-update/install_operation"
}

so the trick was to change the content into

{
    "state": "UP_TO_DATE"
}

After the change, I was able to log in to VAMI and use it as usual. Well, almost as usual.
I realized that I was not able to start another update, as the [STAGE ONLY] and [STAGE AND INSTALL] buttons were greyed out.



Nevertheless, this is only a GUI problem and the VCSA CLI update procedure works like a charm, so the update can be done through an SSH session with the command software-packages install --url or software-packages install --iso

Another positive thing is that I was able to perform a vCenter native backup, so if needed, I have the option to do a VCSA backup and restore, which would probably solve this cosmetic GUI issue.

Disclaimer: This exercise was done in my home lab. If you experience a similar issue in a production system, please contact VMware support before making any changes to the VCSA.

Monday, June 22, 2020

What's new in VxRail 7.0

This is a very short blog post about VxRail 7.0 which has been launched today. First of all, VxRail naming has been aligned with vSphere versioning, hence VxRail 7.0. Here is the summary of the announcement:

  • VxRail 7.0 includes the vSphere 7.0 and vSAN 7.0
  • Customers can now run vSphere Kubernetes on the Dell Tech Cloud Platform, VMware Cloud Foundation 4.0 on VxRail 7.0.
  • With a more accessible Consolidated Architecture, Dell Technologies Cloud Platform can now be deployed starting with a 4-node configuration.
  • Brand-new Dell EMC VxRail D Series – the most extreme yet. The D560/D560F is a ruggedized, durable platform that delivers the full power of VxRail for workloads at the edge, in challenging environments, or for space-constrained areas.
  • Even more platform flexibility with the new VxRail E Series model based on, for the first time, AMD EPYC processors.
  • The single socket, 1U nodes offer dual-socket performance making them ideal platforms for desktop VDI, analytics, and computer-aided design.
  • Enhanced operational benefits with new automation and self-service features enabling customers to schedule and run upgrade health checks in advance of upgrades with VxRail HCI System Software.
  • The addition of Intel® Optane™ DC Persistent Memory to the E560 and P570 platforms offers high performance and significantly increased memory capacity with data persistence.
  • The latest NVIDIA® Quadro RTX™ 6000 and 8000 GPUs to the V570F bringing the most significant advancement in computer graphics in over a decade to professional workflows.
If you were unable to attend the event, you can always visit the VxRail event page, where you can watch it OnDemand!

Monday, May 11, 2020

Undocumented HA Advanced Option - das.restartVmsWithoutResourceChecks

Some time ago, a colleague of mine (@stan_jurena) was challenged by a VMware customer who experienced an APD (All Paths Down) storage situation in the whole HA cluster and expected that the VMs would be killed by the VMware hypervisor (ESXi) because of the HA cluster APD response setting "Power off and restart VMs - Aggressive restart policy". To be honest, I had the same expectation. However, after a discussion with VMware engineering, we were told that the primary role of the HA cluster is to keep VMs up and running, so the "Aggressive restart policy" will restart VMs only under certain conditions, which are much better described in the vSphere Client 7 UI. See the screenshot below.


APD Aggressive restart policy
A VM will be powered off if HA determines the VM can be restarted on a different host, or if HA cannot detect the resources on other hosts because of network connectivity loss (network partition).

So, what does it mean? The Aggressive restart policy is the same as the Conservative one, but extended for the situation when there is network partitioning. This can be helpful when you have IP storage and experience IP network issues, but it does not help when you have a dedicated Fibre Channel SAN and the storage is unavailable for the whole vSphere cluster.

We explained to VMware engineering that there are situations when it is much better to kill all VMs than to keep compute (VMs) running without available storage. Based on these discussions, a Feature Request was created, internally named the "super aggressive" APD option. I'm happy to see that it was implemented and released in vSphere 7 as the vSphere advanced option
das.restartVmsWithoutResourceChecks = false (default) / true (super aggressive)
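The option can be set in the vSphere Client (HA advanced options on the cluster) or scripted. Below is a minimal pyVmomi sketch; the vCenter connection details and the cluster inventory path are placeholders, so treat it as an illustration under those assumptions.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.lab.local", user="administrator@vsphere.local", pwd="***",
                  sslContext=ssl._create_unverified_context())   # lab only, no cert validation
try:
    content = si.RetrieveContent()
    cluster = content.searchIndex.FindByInventoryPath("MyDatacenter/host/MyCluster")

    # Add the HA (DAS) advanced option to the existing cluster configuration
    spec = vim.cluster.ConfigSpecEx(
        dasConfig=vim.cluster.DasConfigInfo(
            option=[vim.option.OptionValue(key="das.restartVmsWithoutResourceChecks",
                                           value="true")]
        )
    )
    cluster.ReconfigureComputeResource_Task(spec, True)   # True = modify (merge) the config
finally:
    Disconnect(si)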
I think this advanced option will be very useful for infrastructure architects / technical designers who have a good justification to use it. Here are my typical justifications:
  • When the storage subsystem is unavailable for some time, the Linux operating system switches the file system to read-only mode, which has a negative impact on running applications. Such a situation typically leads to a server restart anyway.
  • When you have an OS/application clustering solution (for example MSCS) on top of vSphere clustering, with one application node on one vSphere cluster and another application node on a different vSphere cluster, you prefer to kill the VM (app node) on the problematic cluster (without available storage) so the service fails over to the app node (VM) running on the healthy cluster.
Hope this makes sense.

Please leave a comment if you find this advanced option useful. VMware Engineering might consider adding this option to the GUI, based on feedback from vSphere architects / technical designers.

References
  • Duncan Epping wrote the blog post about it here.
  • For other "Advanced configuration options for VMware High Availability in vSphere 5.x and 6.x" check VMware KB 2033250.


Friday, May 08, 2020

CPU capacity planning and sizing

During infrastructure capacity planning and sizing, the technical designer has to calculate CPU, RAM, Storage, and Network resource requirements. Recently, I had an interesting discussion with my colleagues on how to estimate CPU requirements for application workload.

Each computer application requires some CPU resources for computational tasks and additional resources for I/O tasks. It is obvious that the computational tasks require CPU cycles, however, it is not so obvious that there are CPU cycles associated also with I/O. In other words, each I/O requires some CPU resources. It does not matter if it is memory, storage, or network I/O.

For example, a generally accepted rule of thumb in the networking is that
1 Hertz of CPU processing is required to send or receive 1 bit/s of TCP/IP.
[Source: VMware vSphere 6.5 Host Resources Deep Dive]

This would mean 2.5 Gb/s would require ~2.5 GHz of CPU, thus ~100% of one CPU core @ 2.5 GHz.

It would be nice to have a similar rule of thumb for storage I/O. I did quick research (googling) but was not able to find any information about CPU requirements for storage I/O. So I did a quick test in my home lab and started a synthetic random workload (4KB I/O) on a 4-vCPU VM on an ESXi host with CPUs at 2 GHz, where I was able to see 5,000 IOPS at 8.5% CPU utilization. With 4 vCPUs at 2 GHz (8 GHz in total), 8.5% utilization corresponds to roughly 680 MHz, which means one 4KB I/O requires approximately 136 kHz.

4KB I/O on 4x vCPU VM with pCPU @ 2 GHz 
I did another test with 512 B I/Os, where 1 IOPS required 114 kHz.
And for a 64KB I/O size, 1 IOPS required 161 kHz.

0.5 KB I/O => 512 Bytes per I/O (4,096 bits) = 114 kHz
4 KB I/O => 4,096 Bytes per I/O (32,768 bits) = 136 kHz
64 KB I/O => 65,536 Bytes per I/O (524,288 bits) = 161 kHz

Based on my observations, it is difficult to define a rule of thumb per bit/s or byte/s of storage throughput; instead, I would define the CPU requirement per storage I/O operation.

Based on multiple assessments of real datacenter environments, I would say that the typical average storage I/O size is around 40-50 KB, therefore here is my rule of thumb:
1 storage I/O requires ~150 kHz of CPU processing
This would mean 10,000 IOPS would require ~ 1.5 GHz, thus 60% of one CPU Core @ 2.5 GHz.
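To make the arithmetic explicit, here is the whole rule-of-thumb calculation in a few lines of Python; the workload figures are just example inputs.

# Example inputs - replace with your own workload profile
network_gbps = 2.5        # TCP/IP throughput to be handled
storage_iops = 10_000     # storage I/O operations per second
core_ghz     = 2.5        # clock speed of one physical core

HZ_PER_BIT_PER_S   = 1    # ~1 Hz per 1 bit/s of TCP/IP (Host Resources Deep Dive)
KHZ_PER_STORAGE_IO = 150  # ~150 kHz per storage I/O (rule of thumb above)

net_ghz     = network_gbps * 1_000_000_000 * HZ_PER_BIT_PER_S / 1e9
storage_ghz = storage_iops * KHZ_PER_STORAGE_IO * 1_000 / 1e9

print(f"Network : {net_ghz:.2f} GHz (~{net_ghz / core_ghz:.0%} of a {core_ghz} GHz core)")
print(f"Storage : {storage_ghz:.2f} GHz (~{storage_ghz / core_ghz:.0%} of a {core_ghz} GHz core)")

With the example inputs, this prints ~2.5 GHz for the network workload and ~1.5 GHz for the storage workload, matching the numbers above.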

Please be aware that this is a very simplified calculation, but it clearly shows that storage workload is always associated with CPU requirements, and it can help with capacity planning and infrastructure sizing.

What do you think about this calculation?
Do you observe different numbers?
Would you calculate it differently?
You can leave the comment below the article.

Sunday, May 03, 2020

vSphere 7 - Storage Requirements for the vCenter Server Appliance

I have upgraded vSphere in my home lab and realized that VCSA 7.0 storage requirements increased significantly.

Here are the requirements of vCenter Server Appliance 6.7


Here are the requirements of vCenter Server Appliance 7.0


You can see the difference for yourself. VCSA 7.0 requires roughly 30%-60% more storage than VCSA 6.7. It is good to know, especially for home labs where hardware resources are limited, or during logical designs of new environments where you do some math to plan hardware requirements.

Monday, April 20, 2020

What's New in vSAN 7

vSAN 7.0 introduces the following new features and enhancements.

vSphere Lifecycle Manager (vLCM).

vLCM enables simplified, consistent lifecycle management for your ESXi hosts. It uses a desired-state model that provides lifecycle management for the hypervisor and the full stack of drivers and firmware. vLCM reduces the effort to monitor compliance for individual components and helps maintain a consistent state for the entire cluster. In vSAN 7.0, this solution supports Dell and HPE ReadyNodes.

Integrated File Services. 

vSAN native File Service delivers the ability to leverage vSAN clusters to create and present NFS (v4.1 and v3) file shares. vSAN File Service extends vSAN capabilities to files, including availability, security, storage efficiency, and operations management.

Native support for NVMe hotplug.

This feature delivers a consistent way of servicing NVMe devices and provides operational efficiency for select OEM drives.

I/O redirect based on capacity imbalance with stretched clusters.

vSAN redirects all VM I/O from a capacity-strained site to the other site, until the capacity is freed up. This feature improves the uptime of your VMs.

Skyline integration with vSphere health and vSAN health. 

Joining forces under the Skyline brand, Skyline Health for vSphere and vSAN are available in the vSphere Client, enabling a native, in-product experience with consistent proactive analytics.

Remove EZT for a shared disk. 

vSAN 7.0 eliminates the prerequisite that shared virtual disks using the multi-writer flag must also use the eager zero thick format.

Support vSAN memory as a metric in performance service. 

vSAN memory usage is now available within the vSphere Client and through the API.

Visibility of vSphere Replication objects in vSAN capacity view. 

vSphere replication objects are visible in vSAN capacity view. Objects are recognized as vSphere replica type, and space usage is accounted for under the Replication category.

Support for large capacity drives. 

Enhancements extend support for 32TB physical capacity drives and extend the logical capacity to 1PB when deduplication and compression are enabled.

Immediate repair after a new witness is deployed. 

When vSAN performs a replacement witness operation, it immediately invokes a repair object operation after the witness has been added.

vSphere with Kubernetes integration. 

CNS is the default storage platform for vSphere with Kubernetes. This integration enables various stateful containerized workloads to be deployed on vSphere with Kubernetes Supervisor and Guest clusters on vSAN, VMFS and NFS datastores.

File-based persistent volumes. 

Kubernetes developers can dynamically create shared (Read/Write/Many) persistent volumes for applications. Multiple pods can share data. vSAN native File Services is the foundation that enables this capability.

vVol support for modern applications. 

You can deploy modern Kubernetes applications to external storage arrays on vSphere using the CNS support added for vVols. vSphere now enables unified management for Persistent Volumes across vSAN, NFS, VMFS, and vVols.

vSAN VCG notification service.

You can subscribe to vSAN HCL components such as vSAN ReadyNode, I/O controller, drives (NVMe, SSD, HDD) and get notified through email about any changes. The changes include firmware, driver, driver type (async/inbox), and so on. You can track the changes over time with new vSAN releases.

Thursday, April 16, 2020

Logical design - storage performance sizing

Storage performance is always a kind of magic, because multiple factors come into play and not all disks are equal. However, in a logical design, we have to do some math, because capacity (and performance) planning is a very important part of it.

How do I do it? I do the math with some performance assumptions.

Here are assumptions about various disk type performance I use for my capacity planning exercises.

The numbers below are estimates for random I/O with a 64KB I/O size.

Mechanical hard drives
SAS 15k - 200 IOPS
SATA 7k - 80 IOPS

Read Intensive Solid-state disks (SSD)
SATA Read Intensive SSD - 5,000 IOPS (read) / 1,500 IOPS (write)
SAS Read Intensive SSD - 10,000 IOPS (read) / 2,000 IOPS (write)
NVMe Read Intensive SSD - 30,000 IOPS (read) / 2,500 IOPS (write)

Mixed Used Solid-state disks (SSD)
SATA Mixed Used SSD - 5,000 IOPS (read) / 1,800 IOPS (write)
SAS Mixed Used SSD - 12,500 IOPS (read) / 5,000 IOPS (write)
NVMe Mixed Used SSD - 45,000 IOPS (read) / 10,000 IOPS (write)

Write Intensive Solid-state disks (SSD)
SAS Write Intensive SSD - 12,500 IOPS / 7,500 IOPS (write)

SSD assumptions are based on hardware vendors' spec sheets. One of these spec sheets is available here https://www.slideshare.net/davidpasek/dell-power-edge-ssd-performance-specifications

So with these assumptions, the performance math is relatively simple.

Let's have for example 4x SAS Read Intensive SSD within a disk group.
Such a disk group should have an aggregated read performance of 4 x 10,000 IOPS = 40,000 IOPS.

As we see in the performance numbers above, there is a significant performance difference between SSD read and write.

For our SAS Read Intensive SSD we have 10,000 IOPS for 100% read but only 2,000 IOPS for 100% write, so we have to normalize these numbers based on the expected read/write ratio. If the planned storage workload is 70% read and 30% write, we can assume a single SSD disk will give us 0.7 x 10,000 + 0.3 x 2,000 = 7,000 + 600 IOPS, so 7,600 IOPS in total.

Storage is typically protected by some RAID protection, where the write penalty comes into play. The write penalty is the number of backend I/O operations required for a single frontend write I/O operation.

Here are the write penalties for various RAID protections:
RAID 0 (no protection) - write penalty 1
RAID 1 (mirror) - write penalty 2
RAID 5 (erasure coding / single parity) - write penalty 4
RAID 6 (erasure coding / double parity) - write penalty 6

So, let's calculate the write penalty and write overhead.

If the planned storage workload is 70% read and 30% write, and we have a total aggregated normalized performance of 30,400 IOPS (4 x 7,600), we have to split the available performance into a READ bucket and a WRITE bucket.

In our example scenario, we have
READ bucket (70%) - 21,280 IOPS
WRITE bucket (30%) - 9,120 IOPS

Now we have to apply the write penalty to the write bucket. So let's say we would like to have RAID 5 protection; therefore, the 9,120 IOPS available on the backend can handle only 2,280 write IOPS coming from the frontend.

Based on these calculations, a RAID 5 protected disk group of 4 Read Intensive SSD disks should be able to handle 23,560 IOPS (21,280 + 2,280) of front-end storage workload. Please note that the considered workload pattern is random, with a 64KB I/O size and a read/write ratio of 70/30. A worked version of this calculation is sketched below.
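Here is the same disk-group calculation expressed in a few lines of Python, so it is easy to play with different disk counts, read/write ratios, and RAID levels; the inputs mirror the example above.

# Example: 4x SAS Read Intensive SSD, 70/30 read/write, RAID 5
disks               = 4
read_iops_per_disk  = 10_000   # 100% read performance of one disk
write_iops_per_disk = 2_000    # 100% write performance of one disk
read_ratio          = 0.7
write_ratio         = 0.3
raid_write_penalty  = 4        # RAID 5

# Normalize one disk for the expected read/write mix, then aggregate the disk group
disk_iops  = read_ratio * read_iops_per_disk + write_ratio * write_iops_per_disk   # 7,600
group_iops = disks * disk_iops                                                     # 30,400

# Split the backend capability into read and write buckets and apply the write penalty
read_bucket    = group_iops * read_ratio                   # 21,280
write_bucket   = group_iops * write_ratio                  #  9,120
frontend_write = write_bucket / raid_write_penalty         #  2,280

print(f"Estimated front-end performance: {read_bucket + frontend_write:,.0f} IOPS")   # 23,560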

Do not forget that this is just logical planning and estimation; every physical system can introduce additional overhead. In real systems, you can have bottlenecks not considered in this simplified calculation. Examples of such bottlenecks can be
  • storage controller, driver, firmware
  • low queue depth in the storage path (controller, switch, expander, disk), not allowing enough I/O parallelism
  • network or other bus latency
Therefore, any design should always be tested after implementation, and the performance results should be validated against the expected numbers.

Are you doing similar design exercises? Any comment or suggestion is always welcome and appreciated.

Saturday, March 21, 2020

What's new in VMware vSphere 7

vSphere 7 has been announced and will be GA and available to download into our labs very soon. Let's briefly summarize what's new in vSphere 7 and put some links to other resources.

vSphere with Kubernetes

Project Pacific evolved into Integrated Kubernetes and Tanzu. vSphere has been transformed in order to support both VMs and containers. Tanzu Kubernetes Grid Service is how customers can run fully compliant and conformant Kubernetes with vSphere. However, when complete conformance with the open-source project isn’t required, the vSphere Pod Service can provide optimized performance and improved security through VM-like isolation. Both of these options are available through VMware Cloud Foundation 4.

The important takeaway is that Kubernetes is now built into vSphere which allows developers to continue using the same industry-standard tools and interfaces they’ve been using to create modern applications. vSphere Admins also benefit because they can help manage the Kubernetes infrastructure using the same tools and skills they have developed around vSphere.

Improved DRS

DRS used to focus on the cluster state and the algorithm would recommend a vMotion when it would benefit the balance of the cluster as a whole. This meant that DRS used to achieve cluster balance by using a cluster-wide standard deviation model. The new DRS logic computes a VM DRS score on the hosts and moves the VM to a host that provides the highest VM DRS score. This means DRS cares less about the ESXi host utilization and prioritizes the VM “happiness”. The VM DRS score is also calculated every minute and this results in a much more granular optimization of resources.


Another new feature is "DRS Scalable Shares". Scalable Shares solves a problem many have been facing over the last decade or so, which is that DRS does not take the number of VMs in the pool into account when it comes to allocating resources.

Refactored vMotions

Improvements in live migrations of monster workloads. Monster VMs with a large memory & CPU footprint, like SAP HANA and Oracle database backends, had challenges being live-migrated using vMotion. The performance impact during the vMotion process and the potentially long stun-time during the switchover phase meant that customers were not comfortable using vMotion for these large workloads. With vSphere 7, we are bringing back that capability as we have greatly improved the vMotion logic.

How was the improvement achieved?
  • Multi-threading
  • A dedicated vCPU is used for page tracing which means that the VM and its applications can keep working while the vMotion processes are occurring. Prior to vSphere 7, page tracing occurred on all vCPUs within a VM, which could cause the VM and its workload to be resource-constrained by the migration itself. 

Assignable Hardware

There is a new framework called Assignable Hardware that was developed to extend support for vSphere features when customers utilize hardware accelerators. It introduces vSphere DRS (for initial placement of a VM in a cluster) and vSphere High Availability (HA) support for VM’s equipped with a passthrough PCIe device or a NVIDIA vGPU. Related to Assignable Hardware is the new Dynamic DirectPath I/O which is a new way of configuring passthrough to expose PCIe devices directly to a VM. The hardware address of a PCIe device is no longer directly mapped to the configuration (vmx) file of a virtual machine. Instead, it is now exposed as a PCIe device capability to the VM.

Together, Dynamic DirectPath I/O, NVIDIA vGPU, and Assignable Hardware are a powerful new combination unlocking some great new functionality. For example, let’s look at a VM that requires an NVIDIA V100 GPU. Assignable Hardware will now interact with DRS when that VM is powered on (initial placement) to find an ESXi host that has such a device available, claim that device, and register the VM to that host. If there is a host failure and vSphere HA kicks in, Assignable Hardware also allows for that VM to be restarted on a suitable host with the required hardware available.


Bitfusion

Bitfusion stays in vSphere 7 as a Tech Preview feature. It allows us to leverage hardware accelerators (GPUs) across an infrastructure (over network fabric) and integrate it with evolving technologies such as FPGAs and custom ASICs using the same infrastructure. This is actually the first implementation of the software-defined composable infrastructure within VMware SDDC stack, therefore it is a very promising and very needed technology for modern applications such as ML/AI workloads.


Precision Time Protocol (PTP)

Precision Time Protocol is helpful for financial and scientific applications requiring sub-millisecond accuracy. PTP requires VM hardware version 17, and it must be enabled as both an in-guest device and an ESXi service. Thus, you have to choose between NTP and PTP.


VM Template Management (Content Library)

VM template check-in and check-out operations with a versioning feature. The Content Library also supports controlled replication into remote locations. With these vSphere 7 improvements, the Content Library is now a mature and very useful tool for VM template management.


vSphere Lifecycle Manager (vLCM)

The desired state of the ESXi host image (drivers & firmware) and host configuration is assigned to vSphere clusters. It requires integration with hardware vendor system management tools like Dell OMIVV (OpenManage Integration for VMware vCenter) or HPE OneView for VMware vCenter.


vSphere Update Planner

Update Planner is part of vLCM and it monitors current interoperability based on VMware HCL.


vCenter Server Profiles

Export / import of the VCSA (vCenter) configuration. This is good for effective management of many vCenters, but please do NOT expect export/import of vCenter inventory objects like clusters, VM folders, resource pools, virtual switches, etc. This is export/import of the VCSA configuration only.

VCSA multihoming

VCSA now supports multiple (up to 4) vNICs. The first vNIC (vNIC0) is for management, the second (vNIC1) is dedicated to vCenter Server HA, and the other vNICs can be used for other purposes like backup.

vCenter and SSO Architecture

vCenter Server Appliance (VCSA) with an embedded Platform Services Controller (PSC). An external PSC is not supported anymore, which leads to a simpler SSO topology.

Simplified Certificate Management

Much simpler SSL certificate management with fewer certificates to manage. For example, vCenter has only two SSL certificates: a Machine SSL certificate and a Certificate Authority certificate. vSphere 7 introduced some vSphere Client UI improvements and also a REST API for certificate management in environments with more vCenters to manage. This is, of course, beneficial for environments implemented based on VMware Validated Designs (VVD) or on VMware Cloud Foundation (VCF), which is the automated implementation of VVD.


Identity Federation

vCenter is not the key identity management system anymore. The vSphere Client uses external authentication providers to optimize IDM integration in customers' environments. The first implementation supports only Microsoft Active Directory Federation Services (ADFS); however, VMware SSO still exists, therefore customers can choose whether to use the brand new Identity Federation or keep the existing AD/LDAP authentication through VMware SSO.



vSphere Trust Authority (vTA)

In vSphere 7, vCenter is not the trusted authority anymore. vSphere 7 introduces vTA, which creates a hardware root of trust using a separate ESXi host cluster.


vSGX - Support of Intel Software Guard Extensions (SGX)

vSphere 7 introduces support for Intel Software Guard Extensions. I blogged about SGX a few years ago in the post Intel Software Guard Extensions (SGX) in VMware VM. Intel SGX allows applications to work with the hardware to create a secure enclave that cannot be viewed by the guest OS or the hypervisor. With SGX, applications can move sensitive logic and storage into this enclave. SGX is an Intel-only feature; AMD has SEV, which is a different approach.


vSphere 7 Configuration Maximums

Hosts per single vCenter: 2,500
Powered-on VMs on single vCenter: 30,000

Hosts per SSO domain (vCenters in linked mode): 15,000
Powered-on VMs per SSO domain (vCenters in linked mode): 150,000

vCenter Server Latency - vCenter <-> vCenter: 150 ms
vCenter Server Latency - vCenter <-> ESXi: 150 ms
vCenter Server Latency - vSphere Client (web browser) <-> vCenter: 100 ms

The improvements between vSphere 6.7 and 7 are clearly visible in the figure below.


For further configuration maximums, look at https://configmax.vmware.com/

Skyline Health for vSphere 7

Skyline Health for vSphere 7 is the unified health check tool for vSphere, which works exactly like Skyline Health for vSAN, available since vSphere 6.7 U3. It brings into infrastructure operations an approach similar to what developers do in agile development methods - automated testing. You can think about it as a set of health check tests continually verifying that everything works as expected.


NVMe over Fabric


In vSphere 7, VMware added support for shared NVMe storage using NVMeoF. For external connectivity, NVMe over Fibre Channel and NVMe over RDMA (RoCE v2) are supported.


Conclusion

vSphere 7 is another major vSphere release. For those who have worked with VMware virtual infrastructures for ages (see the old ESX 3i below), it is amazing how far the VMware virtualization platform (vSphere 7, ESXi 7) has evolved and what is possible nowadays.

Good old ESXi from Virtual Infrastructure 3, from around 2006 :-)
Nowadays, there are totally different reasons to upgrade to the latest vSphere version compared to the old days of server consolidation, TCO reduction, and better manageability. The top reasons to upgrade to vSphere 7 are:
  • Scalability: The fastest path to the Hybrid/Multi-Cloud and increase scalability through leveraging HCI (Hyper-Converged Infrastructure) 
  • Security: Infrastructure security, secure audits, and account management
  • Performance: maximize performance and efficiency
  • Manageability: Reduce complexity, simplify software patching and hardware upgrades, proactive support technology and services
The new vSphere 7 features and the incorporation of containers (Kubernetes) into a single platform are another step toward VMware's vision to run any app on any cloud. On vSphere 7, you can run:
  • monster workloads such as SAP HANA
  • traditional applications in virtual machines
  • modern distributed applications (Cloud Native Applications, CNA) containerized and orchestrated by Kubernetes
This is a great message to all of us who have invested a lot of time (years) to learn, test, design, implement, and operate VMware technologies. I can honestly say ... I LOVE VMWARE ...