Sunday, April 06, 2025

Network throughput and CPU efficiency of FreeBSD 14.2 and Debian 12.10 in VMware

I'm a long-time FreeBSD user (since FreeBSD 2.2.8, back in 1998), and for all these (27) years I have lived with the impression that FreeBSD has the best TCP/IP network stack in the industry. 

Recently, I was blogging about testing the network throughput of a 10 Gb line, where I used a default installation of FreeBSD 14.2 with iperf and realized that I need at least 4, but preferably 8, vCPUs in a VMware virtual machine to achieve more than 10 Gb/s of network throughput. A colleague of mine told me that he does not see such huge CPU requirements on Debian, and that information definitely caught my attention. That's why I decided to test it.

TCP throughput tests were performed between two VMs on a single VMware ESXi host, therefore the network traffic does not need to traverse the physical network.

The physical server I use for these tests has an Intel Xeon E5-2680 v4 CPU @ 2.40 GHz. This CPU was introduced by Intel in 2016, so it is not the latest CPU technology, but both operating systems get the same conditions.

VMs were provisioned on the VMware ESXi 8.0.3 hypervisor, which is the latest version at the time of writing this article.

VM hardware used for the iperf tests:

  • 1 vCPU (artificially limited by hypervisor to 2000 MHz)
  • 2 GB RAM
  • vNIC type is vmxnet3 
I run iperf -s on VM01 and iperf -c [IP-OF-VM01] -t600 -i5 on VM02 and watch the results.

Test results of FreeBSD 14.2

I can achieve 1.34 Gb/s without Jumbo Frames enabled. 
This is 1.5 Hz for 1 bit/s (2 GHz / 1.34 Gb)
During network test without Jumbo Frames enabled, iperf client consumes ~40% CPU usage and server also ~40% CPU usage. 

Test results of Debian 12.10

I can achieve 9.5 Gb/s
This is 0.21 Hz for 1 bit/s (2 GHz / 9.5 Gb). 
During network test, iperf client consumes ~50% CPU usage and server ~60% CPU usage. There is no difference when Jumbo Frames are enabled.

Comparison of default installations

The network throughput of a default installation of Debian 12.10 is 7x better than that of a default installation of FreeBSD 14.2. We can also say that Debian requires 7x fewer CPU cycles per bit/s.

FreeBSD Network tuning

On Debian, open-vm-tools 12.2.0 is installed automatically as part of the default installation.

FreeBSD does not install open-vm-tools automatically, but the vmxnet driver is included in the kernel, therefore open-vm-tools should not be necessary. Anyway, I installed open-vm-tools and explicitly enabled vmxnet in rc.conf, but there was no improvement in network throughput, which confirms that open-vm-tools is not necessary for optimal vmxnet networking.

So this is not it. What else can we do to improve network throughput?

Network Buffers

We can try increasing the network buffers.

What is the default setting of kern.ipc.maxsockbuf?

root@VM-CUST-0001-192-168-1-11:~ # sysctl -a | grep kern.ipc.maxsockbuf
kern.ipc.maxsockbuf: 2097152

What is the default setting of net.inet.tcp.sendspace?

root@VM-CUST-0001-192-168-1-11:~ # sysctl -a | grep net.inet.tcp.sendspace
net.inet.tcp.sendspace: 32768

What is the default setting of net.inet.tcp.recvspace?

root@VM-CUST-0001-192-168-1-11:~ # sysctl -a | grep net.inet.tcp.recvspace
net.inet.tcp.recvspace: 65536

Let's increase these values in /etc/sysctl.conf

# Increase maximum buffer size
kern.ipc.maxsockbuf=8388608

# Increase send/receive buffer sizes
net.inet.tcp.sendspace=4194304
net.inet.tcp.recvspace=4194304

and reboot the system.
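
Alternatively, the new values can be applied at runtime with sysctl to check the effect before a reboot (a quick sketch; note that the buffer sizes only apply to connections opened after the change):

sysctl kern.ipc.maxsockbuf=8388608
sysctl net.inet.tcp.sendspace=4194304
sysctl net.inet.tcp.recvspace=4194304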

When I test iperf with these deeper network buffers, I can achieve 1.2 Gb/s, which is even slightly worse than the throughput with default settings (1.34 Gb/s) and far behind the Debian throughput (9.5 Gb/s). Tuning the network buffers does not help, so I reverted the settings to the defaults.

Jumbo Frames

We can try enabling Jumbo Frames.

I have Jumbo Frames enabled on the physical network, so I can try enabling Jumbo Frames in FreeBSD and test the impact on network throughput.

Jumbo Frames are enabled in FreeBSD with the following command:

ifconfig vmx0 mtu 9000

We can test whether Jumbo Frames work between VM01 and VM02 with a ping that uses an 8972-byte payload and the don't-fragment flag (8972 B ICMP payload + 8 B ICMP header + 20 B IP header = 9000 B, which matches the MTU).

ping -s 8972 -D [IP-OF-VM02]

iperf test result: 
I can achieve 5 Gb/s with Jumbo Frames enabled. 
This is 0.4 Hz for 1 bit/s (2 GHz / 5 Gb)
iperf client consumes ~20% CPU usage and server also ~20% CPU usage

When I test iperf with Jumbo Frames enabled, I can achieve 5 Gb/s, which is significantly (3.7x) higher than the throughput with default settings (1.34 Gb/s), but it is still less than the Debian throughput (9.5 Gb/s) with default settings (MTU 1,500). It is worth mentioning that Jumbo Frames helped not only with higher throughput but also with lower CPU usage.

I have also tested iperf throughput on Debian with Jumbo Frames enabled and, interestingly enough, I got the same throughput (9.5 Gb/s) as I was able to achieve without Jumbo Frames, so increasing the MTU on Debian did not have any positive impact on network throughput or CPU usage.

I reverted the MTU settings to the default (MTU 1,500) and tried another performance tuning option.

Enable TCP Offloading

We can enable TCP Offloading capabilities. TXCSUM, RXCSUM, TSO4, and TSO6 are enabled by default, but LRO (Large Receive Offload) is not enabled.

Let's enable LRO and test the impact on iperf throughput.

ifconfig vmx0 txcsum rxcsum tso4 tso6 lro
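
You can confirm the flags took effect by checking the interface options; LRO should now appear in the options list (a quick check, assuming the interface is vmx0):

ifconfig vmx0 | grep options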

iperf test result:
I can achieve 7.29 Gb/s with LRO enabled and standard MTU 1,500  
This is 0.27 Hz for 1 bit/s (2 GHz / 7.29 Gb)
iperf client consumes ~20% CPU usage and server also ~25% CPU usage

When I test iperf with LRO enabled, I can achieve 7.29 Gb/s, which is significantly better than the throughput with default settings (1.34 Gb/s) and even better than the Jumbo Frames impact (5 Gb/s). But it is still less than the Debian throughput (9.5 Gb/s) with default settings.

Combination of TCP Offloading (LRO) and Jumbo Frames

What if the impact of LRO and Jumbo Frames is combined?

ifconfig vmx0 mtu 9000 txcsum rxcsum tso4 tso6 lro
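
To keep this combination persistent across reboots, the same options can go into /etc/rc.conf (a sketch; the IP address and netmask are placeholders for your own addressing):

ifconfig_vmx0="inet [IP-OF-VM01] netmask 255.255.255.0 mtu 9000 txcsum rxcsum tso4 tso6 lro"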

iperf test result:
I can achieve 8.9 Gb/s with Jumbo Frames and LRO enabled. 
This is 0.22 Hz for 1 bit/s (2 GHz / 8.9 Gb)
During network test with Jumbo Frames and LRO enabled, iperf client consumes ~25% CPU usage and server also ~30% CPU usage. 

Conclusion

Network throughput

Network throughput within a single VLAN between two VMs with default installations of Debian 12.10 is almost 10 Gb/s (9.5 Gb/s) with ~50% usage of a single CPU @ 2 GHz.

Network throughput within a single VLAN between two VMs with default installations of FreeBSD 14.2 is 1.34 Gb/s with ~40% usage of a single CPU @ 2 GHz.

Debian 12.10 default installation has 7x higher throughput than default installation of FreeBSD 14.2.

Enabling LRO without Jumbo Frames increases FreeBSD network throughput to 7.29 Gb/s.

Enabling Jumbo Frames on FreeBSD increases throughput to 5 Gb/s. Enabling Jumbo Frames in the Debian configuration does not improve throughput.

The combination of Jumbo Frames and LRO increases FreeBSD network throughput to 8.9 Gb/s, which is close to the 9.5 Gb/s of the default Debian system, but still a lower result than the network throughput on Debian.

CPU usage

In terms of CPU, Debian uses ~50% CPU on the iperf client and ~60% on the iperf server.

FreeBSD with LRO and without Jumbo Frames uses ~20% CPU on the iperf client and ~25% on the iperf server. When LRO is used in combination with Jumbo Frames, it uses ~25% CPU on the iperf client and ~30% on the iperf server, but it can achieve ~20% higher throughput.

Which system has the better networking stack?

Debian can achieve higher throughput even without Jumbo Frames (9.5 Gb/s vs 7.29 Gb/s), but at the cost of higher CPU usage (50/60% vs 20/25%). When Jumbo Frames can be enabled, the throughput is similar (9.5 Gb/s vs 8.9 Gb/s), but with significantly higher CPU usage on Debian (50/60% vs 25/30%). 

Key findings

Debian has all TCP offloading capabilities (LRO, TXCSUM, RXCSUM, TSO) enabled in the default installation. Disabled LRO in the default FreeBSD installation is the main reason why FreeBSD has poor VMXNET3 network throughput out of the box. When LRO is enabled, the FreeBSD network throughput is pretty decent but still lower than Debian's. Jumbo Frames are another help for FreeBSD but do not help Debian at all, which is interesting. The combination of LRO and Jumbo Frames boosts FreeBSD network performance to 8.9 Gb/s, but Debian can achieve 9.5 Gb/s without Jumbo Frames. I will try to open a discussion about this behavior in the FreeBSD and Linux forums to understand further details. I do not understand why enabling Jumbo Frames on Debian does not have a positive impact on network throughput and CPU usage.
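
For the record, the offload state on the Debian side can be inspected with ethtool (a sketch; ens192 is a placeholder for the vmxnet3 interface name on your system):

# Show checksum, TSO and LRO offload state of the vmxnet3 interface
ethtool -k ens192 | grep -E 'checksumming|tcp-segmentation-offload|large-receive-offload'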

 

Sunday, March 30, 2025

Network benchmark (iperf) of 10Gb Data Center Interconnect

I wanted to test a 10 Gb Ethernet link I got as a data center interconnect between two data centers. I generally do not trust anything I have not tested.

If you want to test something, it is important to have a good testing methodology and toolset.

Toolset

OS: FreeBSD 14.2 is IMHO the best x86-64 operating system in terms of networking. Your mileage may vary.

Network benchmark testing tool: IPERF (iperf2) is a well-known tool to benchmark network performance and bandwidth.

Hypervisor: VMware ESXi 8.0.3 is a best-in-class hypervisor for testing various virtual machines.

Methodology

I used two Virtual Machines. In the end, I will test network throughput between two VMs, one at each end of the network link (DC Interconnect). However, before the final test (Test 4) of the DC interconnect throughput, I will test network throughput (Test 1) within the same VM to measure localhost throughput, (Test 2) between VMs within a single hypervisor (ESXi) host to avoid using the physical network, and (Test 3) between VMs across two hypervisors (ESXi) within a single VLAN in one data center to test local L2 throughput.

Results

Test 1: Network throughput within the same VM to test localhost throughput

VMware Virtual Machines have the following hardware specification:

  • 8 vCPU (INTEL XEON GOLD 6544Y @ 3.6 GHz)
  • 8 GB RAM
  • 8 GB vDisk
  • 1 vNIC (vmxnet) 
1 iperf connection / 2 CPU Threads (-P not specified, default setting in use)
iperf server command: iperf -s
iperf client command: iperf -c localhost -t 60
Network Throughput: 75.4Gb/s - 83Gb/s
CPU usage on server/client: 23%
MEM usage on server/client: ~500MB 
 
2 iperf connections / 4 CPU Threads (-P 2)
iperf server command: iperf -s
iperf client command: iperf -c localhost -P 2 -t 60
Network Throughput: 90.8Gb/s - 92Gb/s
CPU usage on server/client: 28%
MEM usage on server/client: ~500MB

4 iperf connections / 8 CPU Threads (-P 4)
iperf server command: iperf -s
iperf client command: iperf -c localhost -P 4 -t 60
Network Throughput: 88.5Gb/s - 89.1Gb/s
CPU usage on server/client: 29%
MEM usage on server/client: ~500MB 
 
8 iperf connections / 16 CPU Threads (-P 8)
iperf server command: iperf -s
iperf client command: iperf -c localhost -P 8 -t 60
Network Throughput: 91.6Gb/s - 95.3Gb/s
CPU usage on server/client: 30%
MEM usage on server/client: ~500MB 
 
Tests with Higher TCP Window Size (800 kB)
 
1 iperf connection / 2 CPU Threads (-P not specified, default setting in use -w 800k)
iperf server command: iperf -s
iperf client command: iperf -c localhost -w 800k -t 60
Network Throughput: 69.8Gb/s - 81.0Gb/s
CPU usage on server/client: 28%
MEM usage on server/client: ~500MB
 
2 iperf connections / 4 CPU Threads (-P 2 -w 800k)
iperf server command: iperf -s
iperf client command: iperf -c localhost -P 2 -w 800k -t 60
Network Throughput: 69.8Gb/s - 69.9Gb/s
CPU usage on server/client: 28%
MEM usage on server/client: ~500MB
 
4 iperf connections / 8 CPU Threads (-P 4 -w 800k)
iperf server command: iperf -s
iperf client command: iperf -c localhost -P 2 -w 800k -t 60
Network Throughput: 69.2Gb/s - 70.0Gb/s
CPU usage on server/client: 28%
MEM usage on server/client: ~500MB
 
8 iperf connections / 16 CPU Threads (-P 8 -w 800k)
iperf server command: iperf -s
iperf client command: iperf -c localhost -P 8 -w 800k -t 60
Network Throughput: 72.6Gb/s - 74.0Gb/s
CPU usage on server/client: 28%
MEM usage on server/client: ~500MB

Test 2: Network throughput between VMs within hypervisor (no physical network)

VMware Virtual Machines with the following hardware specification:

  • 8 vCPU (INTEL XEON GOLD 6544Y @ 3.6 GHz)
  • 8 GB RAM
  • 8 GB vDisk
  • 1 vNIC (vmxnet) 
1 iperf connection / 2 CPU Threads (-P not specified, default setting in use)
iperf server command: iperf -s
iperf client command: iperf -c 10.202.201.6 -t 60
Network Throughput: 6.5Gb/s - 6.71Gb/s
CPU usage on server: 70%
CPU usage on client: 30-50%
MEM usage on server/client: ~500MB 
 
2 iperf connections / 4 CPU Threads (-P 2)
iperf server command: iperf -s
iperf client command: iperf -c 10.202.201.6 -P 2 -t 60
Network Throughput: 8.42Gb/s -8.62Gb/s
CPU usage on server: ~33%
CPU usage on client: ~30%
MEM usage on server/client: ~500MB
 
4 iperf connections / 8 CPU Threads (-P 4)
iperf server command: iperf -s
iperf client command: iperf -c 10.202.201.6 -P 4 -t 60
Network Throughput: 19.5Gb/s - 20.2Gb/s
CPU usage on server: 85%
CPU usage on client: 48%
MEM usage on server/client: ~500MB
 
 

8 iperf connections / 16 CPU Threads (-P 8)
iperf server command: iperf -s
iperf client command: iperf -c 10.202.201.6 -P 8 -t 60
Network Throughput: 17.1Gb/s - 18.4Gb/s
CPU usage on server: ~85%
CPU usage on client: ~30%
MEM usage on server/client: ~500MB
 

Tests with Higher TCP Window Size (800 kB)
 
1 iperf connection / 2 CPU Threads (-P not specified, default setting in use -w 800k)
iperf server command: iperf -s
iperf client command: iperf -c 10.202.201.6 -w 800k -t 60
Network Throughput: 6.57Gb/s - 6.77Gb/s
CPU usage on server: 24%
CPU usage on client: 24%
MEM usage on server/client: ~500MB
 
2 iperf connections / 4 CPU Threads (-P 2 -w 800k)
iperf server command: iperf -s
iperf client command: iperf -c 10.202.201.6 -P 2 -w 800k -t 60
Network Throughput: 7.96Gb/s -8.0Gb/s
CPU usage on server: ~30%
CPU usage on client: ~28%
MEM usage on server/client: ~500MB
 
4 iperf connections / 8 CPU Threads (-P 4 -w 800k)
iperf server command: iperf -s
iperf client command: iperf -c 10.202.201.6 -P 4 -w 800k -t 60
Network Throughput: 15.8Gb/s -18.8Gb/s
CPU usage on server: ~85%
CPU usage on client: ~40%
MEM usage on server/client: ~500MB
 
8 iperf connections / 16 CPU Threads (-P 8 -w 800k)
iperf server command: iperf -s
iperf client command: iperf -c 10.202.201.6 -P 8 -w 800k -t 60
Network Throughput: 19.1Gb/s - 22.8Gb/s
CPU usage on server: ~98%
CPU usage on client: ~30%
MEM usage on server/client: ~500MB
 

Test 3: Network throughput between VMs across two hypervisors within VLAN (25Gb switch ports) in one DC

VMware Virtual Machines have the following hardware specification:

  • 8 vCPU (INTEL XEON GOLD 6544Y @ 3.6 GHz)
  • 8 GB RAM
  • 8 GB vDisk
  • 1 vNIC (vmxnet) - connected to 25Gb physical switch ports
1 iperf connection / 2 CPU Threads (-P not specified, default setting in use)
iperf server command: iperf -s
iperf client command: iperf -c 10.202.201.6 -t 60
Network Throughput: 6.1Gb/s - 6.34Gb/s
CPU usage on server: 23%
CPU usage on client: 17%
MEM usage on server/client: ~500MB 
 
2 iperf connections / 4 CPU Threads (-P 2)
iperf server command: iperf -s
iperf client command: iperf -c 10.202.201.6 -P 2 -t 60
Network Throughput: 9.31Gb/s -10.8Gb/s
CPU usage on server: ~43%
CPU usage on client: ~30%
MEM usage on server/client: ~500MB
 
4 iperf connections / 8 CPU Threads (-P 4)
iperf server command: iperf -s
iperf client command: iperf -c 10.202.201.6 -P 4 -t 60
Network Throughput: 19.5Gb/s - 20.2Gb/s
CPU usage on server: 85%
CPU usage on client: 48%
MEM usage on server/client: ~500MB

8 iperf connections / 16 CPU Threads (-P 8)
iperf server command: iperf -s
iperf client command: iperf -c 10.202.201.6 -P 8 -t 60
Network Throughput: 17.1Gb/s - 18.4Gb/s
CPU usage on server: ~80%
CPU usage on client: ~50%
MEM usage on server/client: ~500MB

Tests with Higher TCP Window Size (800 kB)
 
1 iperf connection / 2 CPU Threads (-P not specified, default setting in use -w 800k)
iperf server command: iperf -s
iperf client command: iperf -c 10.202.201.6 -w 800k -t 60
Network Throughput: 6.11Gb/s - 6.37Gb/s
CPU usage on server: 16%
CPU usage on client: 22%
MEM usage on server/client: ~500MB
 
2 iperf connections / 4 CPU Threads (-P 2 -w 800k)
iperf server command: iperf -s
iperf client command: iperf -c 10.202.201.6 -P 2 -w 800k -t 60
Network Throughput: 9.81Gb/s -10.9Gb/s
CPU usage on server: ~39%
CPU usage on client: ~25%
MEM usage on server/client: ~500MB
 
4 iperf connections / 8 CPU Threads (-P 4 -w 800k)
iperf server command: iperf -s
iperf client command: iperf -c 10.202.201.6 -P 4 -w 800k -t 60
Network Throughput: 16.5Gb/s -19.8Gb/s
CPU usage on server: ~85%
CPU usage on client: ~40%
MEM usage on server/client: ~500MB
 
8 iperf connections / 16 CPU Threads (-P 8 -w 800k)
iperf server command: iperf -s
iperf client command: iperf -c 10.202.201.6 -P 8 -w 800k -t 60
Network Throughput: 17.7Gb/s - 18.2Gb/s
CPU usage on server: ~80%
CPU usage on client: ~50%
MEM usage on server/client: ~500MB

Test 4: Network throughput between VMs across two hypervisors across two interconnected VLANs across two DCs

VMware Virtual Machines have the following hardware specification:

  • 8 vCPU (INTEL XEON GOLD 6544Y @ 3.6 GHz)
  • 8 GB RAM
  • 8 GB vDisk
  • 1 vNIC (vmxnet) 
iperf server command: iperf -s
iperf client command: iperf -c 10.202.201.6 -P 4 -t 60
Network Throughput: 9.74 Gb/s

Conclusion

Network throughput requires CPU cycles, therefore the number of CPU cores matters.
 
The iperf client by default uses one connection for generating network traffic, where each connection uses 2 vCPUs (hyper-threading threads). In such a default configuration, I was able to achieve ~6.65 Gb/s in a VM with at least 2 vCPUs, which is not enough to test a 10 Gb/s data center interconnect. 
 
By using the parameter -P 4, four parallel iperf client connections are initiated, where each iperf connection uses 2 vCPUs (hyper-threading threads), therefore it can leverage all 8 vCPUs we have in the testing VM.
 
By using the parameter -P 8 in the VM, eight parallel iperf client connections are initiated, where each iperf client connection uses 2 vCPUs (hyper-threading threads), therefore it could leverage 16 vCPUs; but as we use only 8 vCPUs in our test machine, it only puts more stress on the existing CPUs and therefore it can have a negative impact on overall network throughput.
 
The best practice is to use -P 4 for the iperf client on a machine with 8 CPUs, as the iperf client connections can be balanced across all 8 available CPUs. If you have more CPUs available, the -P parameter should be half the number of available CPUs (see the sketch after the list below).
  • A 1 vCPU VM can achieve network traffic up to 5.83 Gb/s. During such network traffic, the CPU is fully used (100% usage) and the maximum single iperf connection throughput of 6.65 Gb/s cannot be achieved due to the CPU constraint.
  • A 2 vCPU VM can achieve network traffic up to 6.65 Gb/s. During such network traffic, the CPU is fully used (100% usage).
  • A 4 vCPU VM with -P 2 is necessary to achieve network traffic up to 10 Gb/s.
  • An 8 vCPU VM with -P 4 is necessary to achieve network traffic over 10 Gb/s. These 8 threads can generate 20 Gb/s, which is good enough to test my 10 Gb/s data center interconnect. 
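
On FreeBSD, the -P value can be derived from the number of CPUs reported by the system (a small sketch; [IP-OF-SERVER] is a placeholder for the iperf server address):

# Use half of the available CPUs as the number of parallel iperf connections
P=$(( $(sysctl -n hw.ncpu) / 2 ))
iperf -c [IP-OF-SERVER] -P $P -t 60
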
Another iperf parameter which in theory could improve network throughput is -w, which defines the TCP Window Size. iperf by default uses a TCP Window Size between 32 kB and 64 kB. Increasing the TCP Window Size to 800 kB (-w 800k) can slightly improve (~10%) performance under higher CPU stress (-P 8 = 8 processes / 16 threads) across VMs. However, the higher TCP Window Size (-w 800k) has a negative impact (in some cases almost 30%) on localhost network throughput.

What real network throughput did I measure during this testing exercise? 

Localhost network throughput is significantly higher than network throughput across Virtual Machines or across the physical network and servers. We can achieve between 75 Gb/s and 95 Gb/s on localhost. The network traffic does not need to cross virtual and physical hardware, and it is logical that virtual and physical hardware introduce some bottlenecks.
 
Network throughput between VMs within a single hypervisor can achieve 6.5 Gb/s with a single connection and two threads, up to 22.8 Gb/s with eight parallel connections / sixteen threads and a higher TCP Window Size (800 kB), and 20.2 Gb/s with four parallel connections / eight threads and the default TCP Window Size. 

Network throughput between VMs within a VLAN (25 Gb switch ports) in one data center can achieve up to 20.2 Gb/s (four parallel connections / eight threads and the standard TCP Window Size).
 
If you need higher throughput than 20 Gb/s between VMware virtual machines, more CPU cores and special performance tuning of the vNIC/vmxnet driver would be needed. Such performance tuning would include enabling Jumbo Frames in the guest OS (MTU 9,000, ifconfig_vmx0="inet <IP> netmask <NETMASK> mtu 9000"), increasing network buffers in the FreeBSD kernel (kern.ipc.maxsockbuf, net.inet.tcp.sendspace, net.inet.tcp.recvspace), enabling TCP offloading (ifconfig_vmx0="inet <IP> netmask <NETMASK> mtu 9000 txcsum rxcsum tso4 tso6 lro"), tuning interrupt moderation, and using multiple queues aka RSS (net.inet.rss.enabled=1, net.inet.rss.bits=4). Fortunately, 20 Gb/s of throughput is good enough to test my 10 Gb data center interconnect. 
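
As a summary of the tuning options mentioned above, a FreeBSD starting point could look like the sketch below (placeholder addressing; the buffer values are examples, and interrupt moderation / RSS tuning are not covered here):

# /etc/rc.conf - Jumbo Frames and TCP offloading (including LRO) on the vmxnet3 interface
ifconfig_vmx0="inet <IP> netmask <NETMASK> mtu 9000 txcsum rxcsum tso4 tso6 lro"

# /etc/sysctl.conf - larger network buffers
kern.ipc.maxsockbuf=8388608
net.inet.tcp.sendspace=4194304
net.inet.tcp.recvspace=4194304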

Network throughput between VMs across the 10 Gb data center interconnect can achieve 9.74 Gb/s (four iperf connections / eight vCPUs in use). A TCP throughput of 9.74 Gb/s over a 10 Gb/s data center Ethernet interconnect is an acceptable result.

Thursday, March 20, 2025

VMware PowerCLI (PowerShell) on Linux

VMware PowerCLI is a very handy and flexible automation tool allowing automation of almost all VMware features. It is based on Microsoft PowerShell. I do not have any Microsoft Windows system in my home lab, but I would like to use Microsoft PowerShell. Fortunately, Microsoft PowerShell Core is available for Linux. Here is my latest runbook on how to leverage PowerCLI on a Linux management workstation using Docker application packaging.

Install Docker in your Linux Workstation

This is out of scope of this runbook.

Pull the official and verified Microsoft PowerShell image

sudo docker pull mcr.microsoft.com/powershell:latest

Now you can run the PowerShell container interactively (-i) with an allocated pseudo-TTY (-t). The --rm option stands for "automatically remove the container when it exits".

List container images

sudo docker image ls

Run powershell container

sudo docker run --rm -it mcr.microsoft.com/powershell

You can skip the explicit image pull and just run the PowerShell container; the image will be pulled automatically on the first run.
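
Note that with --rm the container is discarded on exit, so PowerCLI has to be reinstalled on every run. An optional workaround (a sketch, not required for this runbook) is to mount a host directory as the user module path so installed modules survive container restarts:

sudo docker run --rm -it -v ~/powershell-modules:/root/.local/share/powershell/Modules mcr.microsoft.com/powershell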

Install PowerCLI in PowerShell

Install-Module -Name VMware.PowerCLI -Scope CurrentUser -Force

Allow Untrusted Certificates

Set-PowerCLIConfiguration -InvalidCertificateAction Ignore -Confirm:$false

Now you can connect to vCenter and list VMs

Connect-VIServer -Server <vcenter-server> -User <username> -Password <password>

Get-VM

Saturday, March 15, 2025

How to update ESXi with unsupported CPU?

I have old, unsupported servers in my lab running ESXi 8.0.3. In such a configuration, you cannot update ESXi with the default GUI procedure.

vSphere Cluster Update doesn't allow remediation

ESXi host shows unsupported CPU

The solution is to allow legacy CPUs and update ESXi from the shell with esxcli.

Allow legacy CPU

The option allowLegacyCPU is not available in the ESXi GUI (DCUI or vSphere Client). It must be enabled using the ESXi shell or SSH. Below is the command to allow legacy CPUs.

esxcli system settings kernel set -s allowLegacyCPU -v TRUE

You can verify it with the following command:

esxcli system settings kernel list | grep allowLegacyCPU

If the procedure above fails, the other option is to edit the file /bootbank/boot.cfg and add allowLegacyCPU=true to the end of the kernelopt line.

In my case, it looks like this:

kernelopt=autoPartition=FALSE allowLegacyCPU=true

After modifying /bootbank/boot.cfg, the ESXi configuration should be saved to make the change persistent across reboots.

 /sbin/auto-backup.sh

A reboot of ESXi is obviously required to make the kernel option active.

reboot

After the reboot, you can follow the standard system update procedure using the ESXCLI method as documented below.

ESXi update procedure (ESXCLI method)

  1. Download the appropriate ESXi offline depot. You can find the URL of the depot in the Release Notes of the particular ESXi version. You will need Broadcom credentials to download it from the Broadcom support site.
  2. Upload the ESXi offline depot to a datastore (leveraging the Datastore File Browser, scp, WinSCP, etc.)
    • in my case /vmfs/volumes/vsanDatastore/TMP
  3. List profiles in ESXi depot
    • esxcli software sources profile list -d /vmfs/volumes/vsanDatastore/TMP/VMware-ESXi-8.0U3d-24585383-depot.zip 
  4. Update ESXi to the particular profile with the --no-hardware-warning option
    • esxcli software profile update -d /vmfs/volumes/vsanDatastore/TMP/VMware-ESXi-8.0U3d-24585383-depot.zip -p ESXi-8.0U3d-24585383-no-tools --no-hardware-warning
  5. Reboot ESXi
    •   reboot
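
Before and after the update, it is worth checking which build is actually running (a quick sanity check):

esxcli system version get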

Hope this helps other folks in their home labs with unsupported CPUs.

Friday, February 07, 2025

Broadcom (VMware) Useful Links for Technical Designer and/or Architect

A lot of URLs have changed after the Broadcom acquisition of VMware. That's the reason I have started to document some of the links useful to me.

VMware Product Configuration Maximums - https://configmax.broadcom.com (aka https://vmware.com/go/hcl)

Network (IP) ports Needed by VMware Products and Solutions - https://ports.broadcom.com/

VMware Compatibility Guide - https://compatibilityguide.broadcom.com/ (aka https://www.vmware.com/go/hcl)

VMware Product Lifecycle - https://support.broadcom.com/group/ecx/productlifecycle (aka https://lifecycle.vmware.com/)

Product Interoperability Matrix - https://interopmatrix.broadcom.com/Interoperability

VMware Hands-On Lab - https://labs.hol.vmware.com/HOL/catalog

Broadcom (VMware) Education / Learning - https://www.broadcom.com/education

VMware Validated Solutions - https://vmware.github.io/validated-solutions-for-cloud-foundation/

If you are an independent consultant and have to open a support ticket related to VMware Education or Certification, you can use the form at https://broadcomcms-software.wolkenservicedesk.com/web-form  

VMware Health Analyzer

 Do you know any other helpful link? Use comments below to let me know. Thanks.

Tuesday, February 04, 2025

How is my Microsoft Windows OS syncing the time?

This is a very short post with the procedure for checking time synchronization of a Microsoft Windows OS in a VMware virtual machine.

There are two ways time can be synchronized:

  1. via NTP 
  2. via VMware Tools with ESXi host where VM is running 

The command w32tm /query /status shows the current configuration of time sync.

 Microsoft Windows [Version 10.0.20348.2582]  
 (c) Microsoft Corporation. All rights reserved.  
 C:\Users\david.pasek>w32tm /query /status  
 Leap Indicator: 0(no warning)  
 Stratum: 6 (secondary reference - syncd by (S)NTP)  
 Precision: -23 (119.209ns per tick)  
 Root Delay: 0.0204520s  
 Root Dispersion: 0.3495897s  
 ReferenceId: 0x644D010B (source IP: 10.77.1.11)  
 Last Successful Sync Time: 2/4/2025 10:14:10 AM  
 Source: DC02.example.com  
 Poll Interval: 7 (128s)  
 C:\Users\david.pasek>   

If the Windows OS is joined to Active Directory (as in my case), it synchronizes time with AD via NTP by default. This is visible in the output of the w32tm /query /status command.
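
If you want to double-check the configured time source or force an immediate synchronization, w32tm can do that as well (a sketch; the resync requires an elevated command prompt):

w32tm /query /source
w32tm /resync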

You are dependent on the Active Directory Domain Controllers, therefore the correct time on the Domain Controllers is crucial. I blogged about how to configure time in a virtualized Active Directory Domain Controller back in 2011. It is a very old post, but it should still work.

To check if VMware Tools is syncing time with the ESXi host, use the following command:

 C:\>"c:\Program Files\VMware\VMware Tools\VMwareToolboxCmd.exe" timesync status  
 Disabled  

VMware Tools time sync is disabled by default, which follows the VMware best practice. It is highly recommended not to synchronize time with the underlying ESXi host and to leverage NTP sync over the network with a trusted time provider. This will help you in case someone makes a configuration mistake and time is not configured properly on a particular ESXi host.  
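
If you ever find VMware Tools time sync enabled and want to follow this best practice, it can be switched off with the same toolbox command (the enable variant exists as well):

 C:\>"c:\Program Files\VMware\VMware Tools\VMwareToolboxCmd.exe" timesync disable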

Hope you find this useful.

Friday, December 20, 2024

CPU cycles required for general storage workload

I recently published a blog post about CPU cycles required for network and VMware vSAN ESA storage workload. I realized it would be nice to test and quantify CPU cycles needed for general storage workload without vSAN ESA backend operations like RAID/RAIN and compression.

Performance testing is always tricky as it depends on the guest OS, firmware, drivers, and application, but we are not looking for exact numbers, and approximations are good enough for a general rule of thumb that helps a designer during capacity planning. 

My test environment was an old Dell PowerEdge R620 (Intel Xeon CPU E5-2620 @ 2.00GHz), with ESXi 8.0.3 and Windows Server 2025 in a Virtual Machine (2 vCPU @ 2 GHz, 1x para-virtualized SCSI controller/PVSCSI, 1x vDisk). The storage subsystem was a VMware VMFS datastore on a local consumer-grade NVMe disk (Kingston SNVS1000GB flash).

Storage tests were done using the good old Iometer.

The test VM had a total CPU capacity of 4 GHz (4,000,000,000 Hz, aka CPU clock cycles per second).

Below are some test results to help me define another rule of thumb.

TEST - 512 B, 100% read, 100% random - 4,040 IOPS @ 2.07 MB/s @ avg response time 0.25 ms

  • 15.49% CPU = 619.6 MHz
  • 619.6 MHz  (619,600,000 CPU cycles) is required to deliver 2.07 MB/s (16,560,000 b/s)
    • 37.42 Hz to read 1 b/s
    • 153.4 KHz for reading 1 IOPS (512 B, random)
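
As a sanity check of the per-bit and per-IOPS numbers, the derivation for this first test can be reproduced with a bit of shell arithmetic (a sketch using bc):

# 619,600,000 Hz / 16,560,000 b/s = ~37.4 Hz per bit/s
echo "scale=2; 619600000 / 16560000" | bc
# 619,600,000 Hz / 4,040 IOPS = ~153,366 Hz = ~153.4 kHz per IOPS
echo "619600000 / 4040" | bc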

TEST - 512 B, 100% write, 100% random - 4,874 IOPS @ 2.50 MB/s @ avg response time 0.2 ms

  • 19.45% CPU = 778 MHz
  • 778 MHz  (778,000,000 CPU cycles) is required to deliver 2.50 MB/s (20,000,000 b/s)
    • 38.9 Hz to write 1 b/s
    • 159.6 KHz for writing 1 IOPS (512 B, random)

TEST - 4 KiB, 100% read, 100% random - 3,813 IOPS @ 15.62 MB/s @ avg response time 0.26 ms

  • 13.85% CPU = 554.0 MHz
  • 554.0 MHz  (554,000,000 CPU cycles) is required to deliver 15.62 MB/s (124,960,000 b/s)
    • 4.43 Hz to read 1 b/s
    • 145.3 KHz for 1 reading IOPS (4 KiB, random)

TEST - 4 KiB, 100% write, 100% random - 4,413 IOPS @ 18.08 MB/s @ avg response time 0.23 ms

  • 21.84% CPU = 873.6 MHz
  • 873.6 MHz  (873,600,000 CPU cycles) is required to deliver 18.08 MB/s (144,640,000 b/s)
    • 6.039 Hz to write 1 b/s
    • 197.9 KHz for writing 1 IOPS (4 KiB, random)

TEST - 32 KiB, 100% read, 100% random - 2,568 IOPS @ 84.16 MB/s @ avg response time 0.39 ms

  • 10.9% CPU = 436 MHz
  • 436 MHz  (436,000,000 CPU cycles) is required to deliver 84.16 MB/s (673,280,000 b/s)
    • 0.648 Hz to read 1 b/s
    • 169.8 KHz for reading 1 IOPS (32 KiB, random)

TEST - 32 KiB, 100% write, 100% random - 2,873 IOPS @ 94.16 MB/s @ avg response time 0.35 ms

  • 14.16% CPU = 566.4 MHz
  • 566.4 MHz  (566,400,000 CPU cycles) is required to deliver 94.16 MB/s (753,280,000 b/s)
    • 0.752 Hz to write 1 b/s
    • 197.1 KHz for writing 1 IOPS (32 KiB, random)

TEST - 64 KiB, 100% read, 100% random - 1,826 IOPS @ 119.68 MB/s @ avg response time 0.55 ms

  • 9.06% CPU = 362.4 MHz
  • 362.4 MHz  (362,400,000 CPU cycles) is required to deliver 119.68 MB/s (957,440,000 b/s)
    • 0.37 Hz to read 1 b/s
    • 198.5 KHz for reading 1 IOPS (64 KiB, random)

TEST - 64 KiB, 100% write, 100% random - 2,242 IOPS @ 146.93 MB/s @ avg response time 0.45 ms

  • 12.15% CPU = 486.0 MHz
  • 486.0 MHz  (486,000,000 CPU cycles) is required to deliver 146.93 MB/s (1,175,440,000 b/s)
    • 0.41 Hz to write 1 b/s
    • 216.7 KHz for writing 1 IOPS (64 KiB, random)

TEST - 256 KiB, 100% read, 100% random - 735 IOPS @ 192.78 MB/s @ avg response time 1.36 ms

  • 6.66% CPU = 266.4 MHz
  • 266.4 MHz  (266,400,000 CPU cycles) is required to deliver 192.78 MB/s (1,542,240,000 b/s)
    • 0.17 Hz to read 1 b/s
    • 362.4 KHz for reading 1 IOPS (256 KiB, random)

TEST - 256 KiB, 100% write, 100% random - 703 IOPS @ 184.49 MB/s @ avg response time 1.41 ms

  • 7.73% CPU = 309.2 MHz
  • 309.2 MHz  (309,200,000 CPU cycles) is required to deliver 184.49 MB/s (1,475,920,000 b/s)
    • 0.21 Hz to write 1 b/s
    • 439.9 KHz for writing 1 IOPS (256 KiB, random)

TEST - 256 KiB, 100% read, 100% seq - 2784 IOPS @ 730.03 MB/s @ avg response time 0.36 ms

  • 15.26% CPU = 610.4 MHz
  • 610.4 MHz  (610,400,000 CPU cycles) is required to deliver 730.03 MB/s (5,840,240,000 b/s)
    • 0.1 Hz to read 1 b/s
    • 219.25 KHz for reading 1 IOPS (256 KiB, sequential)

TEST - 256 KiB, 100% write, 100% seq - 1042 IOPS @ 273.16 MB/s @ avg response time 0.96 ms

  • 9.09% CPU = 363.6 MHz
  • 363.6 MHz  (363,600,000 CPU cycles) is required to deliver 273.16 MB/s (2,185,280,000 b/s)
    • 0.17 Hz to write 1 b/s
    • 348.4 KHz for writing 1 IOPS (256 KiB, sequential)

TEST - 1 MiB, 100% read, 100% seq - 966 IOPS @ 1013.3 MB/s @ avg response time 1 ms

  • 9.93% CPU = 397.2 MHz
  • 397.2 MHz  (397,200,000 CPU cycles) is required to deliver 1013.3 MB/s (8,106,400,000 b/s)
    • 0.05 Hz to read 1 b/s
    • 411.18 KHz for reading 1 IOPS (1 MiB, sequential)

TEST - 1 MiB, 100% write, 100% seq - 286 IOPS @ 300.73 MB/s @ avg response time 3.49 ms

  • 10.38% CPU = 415.2 MHz
  • 415.2 MHz  (415,200,000 CPU cycles) is required to deliver 300.73 MB/s (2,405,840,000 b/s)
    • 0.17 Hz to write 1 b/s
    • 1.452 MHz for writing 1 IOPS (1 MiB, sequential)

Observations

We can see that the CPU cycles required to read 1 b/s vary based on I/O size, Read/Write, and Random/Sequential pattern.

  • Small I/O (512 B, random) can consume almost 40 Hz to read or write 1 b/s. 
  • Normalized I/O (32 KiB, random) can consume around 0.7 Hz to read or write 1 b/s
  • Large I/O (1 MiB, sequential) can consume around 0.1 Hz to read or write 1 b/s
If we use the same approach as for vSAN and average 32 KiB I/O (random) and 1 MiB I/O (sequential), we can define the following rule of thumb 
"0.5 Hz of general purpose x86-64 CPU (Intel Sandy Bridge) is required to read or write 1 bit/s from local NVMe flash disk"

If we compare it with the 3.5 Hz rule of thumb for vSAN ESA RAID-5 with compression, we can see the vSAN ESA requires 7x more CPU cycles, but it makes perfect sense because vSAN ESA does a lot of additional processing on the backend. Such processing mainly involves data protection (RAID-5/RAIN-5) and compression.  

I was curious how many CPU cycles a non-redundant storage workload requires, and the observed numbers IMHO make sense.

Hope this helps others during infrastructure design exercises.