Thursday, May 16, 2019

The SPECTRE story continues ... now it is MDS

Last year (2018) started with the shocking Intel CPU vulnerabilities Spectre and Meltdown, and two days ago another SPECTRE variant was published, known as Microarchitectural Data Sampling or MDS. It was obvious from the beginning that this was just the start and that security experts and researchers would find other vulnerabilities over time. All these vulnerabilities exploit speculative execution and are collectively known as SPECTRE variants.

Here is the timeline of particular SPECTRE variant vulnerabilities along with VMware Security Advisories.

2018-01-03 - Spectre (bounds-check bypass and branch target injection) / Meltdown (rogue data cache load) - VMSA-2018-0002.3
2018-05-21 - Speculative Store Bypass (SSB) - VMSA-2018-0012.1
2018-08-14 - L1 Terminal Fault - VMSA-2018-0020
2019-05-14 - Microarchitectural Data Sampling (MDS) - VMSA-2019-0008

I published several blog posts about SPECTRE topics in the past

The last two vulnerabilities, "L1 Terminal Fault (aka L1TF)" and "Microarchitectural Data Sampling (aka MDS)", are related to Intel CPU Hyper-Threading. As per the statement here, AMD is not vulnerable.

When we are talking about L1TF and MDS, a typical question from my customers with Intel CPUs is whether they are safe when Hyper-Threading is disabled in the BIOS. The answer is yes, but you would have to power cycle the physical system to reconfigure the BIOS settings, which can be pretty annoying and time-consuming in larger environments. That's why VMware recommends leveraging the SDDC concept and making the change in software, through ESXi hypervisor advanced settings. It is obviously much easier to set the two ESXi advanced settings VMkernel.Boot.hyperthreadingMitigation and VMkernel.Boot.hyperthreadingMitigationIntraVM to true, which disables hyper-threading in the ESXi CPU scheduler without the need for a physical server power cycle. You can do it with a PowerCLI one-liner in a few minutes, which is much more flexible than BIOS changes.
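Before changing anything in a larger environment, it helps to know which hosts actually deviate from the desired values of the two advanced settings. Here is a minimal Python sketch of that audit step. The host names and the `current` inventory are hypothetical placeholders, and reading or writing the settings on real hosts (for example with PowerCLI) is deliberately left out of the sketch.

```python
# Desired values for the two ESXi advanced settings named in the text.
DESIRED = {
    "VMkernel.Boot.hyperthreadingMitigation": True,
    "VMkernel.Boot.hyperthreadingMitigationIntraVM": True,
}


def hosts_needing_change(inventory, desired=DESIRED):
    """Return {host: {setting: desired_value}} for every deviating host.

    `inventory` maps a host name to a dict of its current setting values.
    A setting missing from a host's dict is treated as needing a change.
    """
    pending = {}
    for host, settings in inventory.items():
        diff = {
            name: value
            for name, value in desired.items()
            if settings.get(name) != value
        }
        if diff:
            pending[host] = diff
    return pending


# Hypothetical inventory, e.g. exported from vCenter beforehand.
current = {
    "esx01.example.com": {
        "VMkernel.Boot.hyperthreadingMitigation": True,
        "VMkernel.Boot.hyperthreadingMitigationIntraVM": True,
    },
    "esx02.example.com": {
        "VMkernel.Boot.hyperthreadingMitigation": False,
        "VMkernel.Boot.hyperthreadingMitigationIntraVM": True,
    },
}

for host, changes in hosts_needing_change(current).items():
    print(host, changes)
```

With the sample inventory, only the second host is reported, together with the one setting that still has to be flipped.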

So that's it from the security point of view but what about performance?

It is simple. When Hyper-Threading is disabled, you lose the CPU performance benefit of Hyper-Threading technology, which can be somewhere between 5 and 20% and depends heavily on the type of workload. Let's be absolutely clear here. Until the issue is addressed inside the CPU hardware architecture, there will always be a tradeoff between security and performance. If I understand Intel's messaging correctly, the first hardware fix for Hyper-Threading is implemented in the Cascade Lake family. You can double-check it yourself here ...
Side Channel Mitigation by Product CPU Model
https://www.intel.com/content/www/us/en/architecture-and-technology/engineering-new-protections-into-hardware.html
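To put the 5 to 20% range into capacity terms, here is a small Python sketch. It assumes the percentage is the throughput uplift Hyper-Threading adds on top of the non-HT baseline, which is an assumption and not a vendor statement. The function name and the sample throughput figure are made up for illustration.

```python
def throughput_without_ht(throughput_with_ht, ht_uplift):
    """Estimate throughput after Hyper-Threading is disabled.

    `ht_uplift` is the assumed fractional gain HT gives over the non-HT
    baseline (0.05 to 0.20 for the range quoted in the text).
    """
    if ht_uplift < 0:
        raise ValueError("ht_uplift must be non-negative")
    return throughput_with_ht / (1.0 + ht_uplift)


# Hypothetical host that handles 1000 transactions/s with HT enabled.
for uplift in (0.05, 0.20):
    remaining = throughput_without_ht(1000.0, uplift)
    lost_pct = 100.0 * (1000.0 - remaining) / 1000.0
    print(f"uplift {uplift:.0%}: about {remaining:.0f} tx/s, {lost_pct:.1f}% lost")
```

Under this assumption, the capacity lost relative to the HT-enabled state is slightly smaller than the uplift itself, because the loss is measured against the larger number.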

You can get hyper-threading performance back, but only in VMware vSphere 6.7 U2. VMware vSphere 6.7 U2 includes new scheduler options that protect against the L1TF vulnerability while retaining as much performance as possible. The new scheduler introduces the ESXi advanced setting VMkernel.Boot.hyperthreadingMitigationIntraVM, which you can set to FALSE (this is the default) to leverage Hyper-Threading benefits within a virtual machine while still isolating VMs from each other when VMkernel.Boot.hyperthreadingMitigation is set to TRUE. This option is not available in older ESXi hypervisors and there are no plans to backport it. For further info, read the paper "Performance of vSphere 6.7 Scheduling Options".
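The interaction of the two settings can be summarised as a small decision function. This is a sketch of the behaviour as described in this post, not VMware's scheduler logic. The function name and the returned labels are hypothetical.

```python
def ht_sharing_policy(mitigation, intra_vm):
    """Describe who may share a physical core's hyper-threads.

    mitigation -> VMkernel.Boot.hyperthreadingMitigation
    intra_vm   -> VMkernel.Boot.hyperthreadingMitigationIntraVM
    """
    if not mitigation:
        # Scheduler mitigation is off: any vCPUs may share a core.
        return "unrestricted"
    if intra_vm:
        # Both settings TRUE: hyper-threading is not used by the scheduler.
        return "no-sharing"
    # Mitigation TRUE, IntraVM FALSE (vSphere 6.7 U2 per the post):
    # vCPUs of the same VM may share a core, different VMs are isolated.
    return "same-vm-only"


# Print the policy for all four combinations of the two settings.
for mitigation in (False, True):
    for intra_vm in (False, True):
        print(mitigation, intra_vm, ht_sharing_policy(mitigation, intra_vm))
```

Note that the IntraVM setting only matters once the main mitigation setting is TRUE. With mitigation off, the sketch ignores it.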

By the way, last year I spent a significant amount of time testing the performance impact of the SPECTRE and MELTDOWN remediations. If you want to check the results of the performance tests of the Spectre/Meltdown 2018 variants along with the conclusion, you can read my document published on SlideShare. It would be cool to perform the same tests for L1TF and MDS, but that would require additional time and effort. I'm not going to do so unless sponsored by some of my customers. But anybody can do it themselves, as the test plan is described in the document below.


