Thursday, April 16, 2020

Logical design - storage performance sizing

Storage performance is always a bit of magic because multiple factors come into play and not all disks are equal. Nevertheless, in logical design we have to do some math, because capacity (and performance) planning is a very important part of it.

How do I do it? I do the math with some performance assumptions.

Here are the performance assumptions for various disk types that I use for my capacity planning exercises.

The numbers below are estimates for random I/O with a 64 KB I/O size.

Mechanical hard drives
SAS 15k - 200 IOPS
SATA 7k - 80 IOPS

Read Intensive Solid-state disks (SSD)
SATA Read Intensive SSD - 5,000 IOPS (read) / 1,500 IOPS (write)
SAS Read Intensive SSD - 10,000 IOPS (read) / 2,000 IOPS (write)
NVMe Read Intensive SSD - 30,000 IOPS (read) / 2,500 IOPS (write)

Mixed Use Solid-state disks (SSD)
SATA Mixed Use SSD - 5,000 IOPS (read) / 1,800 IOPS (write)
SAS Mixed Use SSD - 12,500 IOPS (read) / 5,000 IOPS (write)
NVMe Mixed Use SSD - 45,000 IOPS (read) / 10,000 IOPS (write)

Write Intensive Solid-state disks (SSD)
SAS Write Intensive SSD - 12,500 IOPS (read) / 7,500 IOPS (write)

The SSD assumptions are based on hardware vendors' spec sheets. One of these spec sheets is available here: https://www.slideshare.net/davidpasek/dell-power-edge-ssd-performance-specifications
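For the calculations later in this post, it is handy to have these assumptions in one place. Here is a small Python sketch capturing them as a lookup table. The names and structure are just my illustration, the numbers are the estimates listed above, and for mechanical drives I assume the same figure for read and write.

  # Assumed per-disk IOPS for random 64KB I/O: (read IOPS, write IOPS)
  # HDD rows use a single estimate, so read and write are assumed equal.
  DISK_IOPS = {
      "SAS 15k HDD":  (200, 200),
      "SATA 7k HDD":  (80, 80),
      "SATA RI SSD":  (5_000, 1_500),
      "SAS RI SSD":   (10_000, 2_000),
      "NVMe RI SSD":  (30_000, 2_500),
      "SATA MU SSD":  (5_000, 1_800),
      "SAS MU SSD":   (12_500, 5_000),
      "NVMe MU SSD":  (45_000, 10_000),
      "SAS WI SSD":   (12_500, 7_500),
  }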

So with these assumptions, the performance math is relatively simple.

Let's take, for example, 4x SAS Read Intensive SSDs within a disk group.
Such a disk group should have an aggregated read performance of 4 x 10,000 IOPS = 40,000 IOPS.

As we see in the performance numbers above, there is a significant performance difference between SSD read and write.

For our SAS Read Intensive SSD we have 10,000 IOPS for 100% read but only 2,000 IOPS for 100% write, so we have to normalize these numbers based on the expected read/write ratio. If the planned storage workload is 70% read and 30% write, we can assume a single SSD will give us 7,000 + 600 IOPS, so 7,600 IOPS in total.
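The same normalization can be written as a tiny Python helper. This is just a sketch with my own function name; the inputs are the SAS Read Intensive SSD numbers from the example above.

  def normalized_disk_iops(read_iops, write_iops, read_ratio):
      # Weight the 100% read and 100% write figures by the expected read/write ratio
      return read_iops * read_ratio + write_iops * (1 - read_ratio)

  # SAS Read Intensive SSD with a 70/30 read/write workload
  print(normalized_disk_iops(10_000, 2_000, 0.70))   # 7600.0 IOPS per disk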

Storage is typically protected by some RAID scheme, and that is where the write penalty comes into play. The write penalty is the number of backend I/O operations required for a single frontend write I/O operation.

Here are the write penalties for various RAID protections:
RAID 0 (no protection) - write penalty 1
RAID 1 (mirror) - write penalty 2
RAID 5 (erasure coding / single parity) - write penalty 4
RAID 6 (erasure coding / double parity) - write penalty 6
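Expressed in code, the write penalty simply divides the backend write IOPS available to the frontend. This is a sketch with my own names; RAID 0 is taken as penalty 1, i.e. one backend I/O per frontend write.

  # Backend I/O operations needed per single frontend write
  WRITE_PENALTY = {"RAID 0": 1, "RAID 1": 2, "RAID 5": 4, "RAID 6": 6}

  def frontend_write_iops(backend_write_iops, raid_level):
      # The backend write budget shrinks by the write penalty of the chosen RAID level
      return backend_write_iops / WRITE_PENALTY[raid_level]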

So, let's apply the write penalty and calculate the write overhead.

If the planned storage workload is 70% read and 30% write and we have a total aggregated normalized performance of 30,400 IOPS (4 x 7,600), we have to split the available performance into a READ bucket and a WRITE bucket.

In our example scenario, we have
READ bucket (70%) - 21,280 IOPS
WRITE bucket (30%) - 9,120 IOPS

Now we have to apply the write penalty to the WRITE bucket. Let's say we would like to have RAID 5 protection; therefore, the 9,120 IOPS available on the backend can handle only 2,280 write IOPS (9,120 / 4) coming from the frontend.

Based on these calculations, a RAID 5 protected disk group of 4 SAS Read Intensive SSDs should be able to handle 23,560 IOPS (21,280 + 2,280) of front-end storage workload. Please note that the considered workload pattern is random, with a 64 KB I/O size and a read/write ratio of 70/30.
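Putting the whole example together in one short Python sketch (the variable names are mine; the inputs are the 4x SAS Read Intensive SSD, RAID 5, 70/30 scenario described above):

  disks, read_iops, write_iops = 4, 10_000, 2_000   # 4x SAS Read Intensive SSD
  read_ratio, write_penalty = 0.70, 4               # 70/30 workload, RAID 5

  per_disk = read_iops * read_ratio + write_iops * (1 - read_ratio)   # 7,600 IOPS
  aggregate = disks * per_disk                                        # 30,400 IOPS

  read_bucket = aggregate * read_ratio             # 21,280 IOPS, reads pass through 1:1
  write_bucket = aggregate * (1 - read_ratio)      # 9,120 backend write IOPS
  frontend_writes = write_bucket / write_penalty   # 2,280 frontend write IOPS

  print(read_bucket + frontend_writes)             # 23,560 frontend IOPS in total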

Do not forget that this is just logical planning and estimation; every physical system can introduce additional overhead. In real systems, you can hit bottlenecks not considered in this simplified calculation. Examples of such bottlenecks are
  • the storage controller, driver, or firmware
  • a low queue depth in the storage path (controller, switch, expander, disk), not allowing I/O parallelism
  • network or other bus latency
Therefore, any design should always be tested after implementation and the performance results validated against the expected numbers.

Are you doing similar design exercises? Any comment or suggestion is always welcome and appreciated.
