A sought-after feature in our cloud has been support for Hadoop/Spark workloads. So far our I/O performance hasn't been anything to write home (or to the internet) about, but let's hope that will change soon.
What we want to provide is temporary data-processing VMs capable of high IOPS. On the hardware side we have two-socket servers with 256 GB of memory and six 400 GB SSDs in a RAID 0 configuration.
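RAID 0 is where the IOPS come from: consecutive chunks of the logical device are striped round-robin across all member disks, so parallel random I/O is spread over all six SSDs, at the cost of any redundancy (acceptable for temporary data). A minimal Python sketch of the arithmetic, assuming a hypothetical 512 KiB chunk size since the post doesn't state the actual stripe settings:

```python
# Sketch of RAID 0 striping arithmetic for six 400 GB disks.
# Assumption: a hypothetical 512 KiB chunk (stripe unit); real settings may differ.
NUM_DISKS = 6
DISK_SIZE_GB = 400
CHUNK = 512 * 1024  # bytes per chunk (hypothetical)


def raid0_capacity_gb(num_disks: int, disk_size_gb: int) -> int:
    # No parity or mirroring, so usable capacity is simply the sum of the disks.
    return num_disks * disk_size_gb


def locate(offset: int, num_disks: int = NUM_DISKS, chunk: int = CHUNK) -> tuple[int, int]:
    """Map a logical byte offset to (disk index, byte offset on that disk)."""
    chunk_no, within = divmod(offset, chunk)   # which chunk, and where inside it
    stripe, disk = divmod(chunk_no, num_disks)  # chunks rotate across the disks
    return disk, stripe * chunk + within


if __name__ == "__main__":
    print(raid0_capacity_gb(NUM_DISKS, DISK_SIZE_GB))  # 2400
    # Chunks 0..5 land on disks 0..5; chunk 6 wraps back to disk 0.
    for i in range(7):
        print(i, locate(i * CHUNK))
```

The trade-off is visible in `raid0_capacity_gb`: all 2400 GB are usable, but losing any one disk loses the whole array.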
We were wondering how best to use these servers and their disks in OpenStack. One option would be to go with the default QCOW2 file-backed VM images, but that sounds like driving a Ferrari with snow chains on, so no.
We wanted to find out whether raw image-backed VMs are fast enough, or whether LVM has significant advantages over them. If we use LVM, we don't want to use plain LVM volumes: sanitizing the environment between customers would mean zeroing every disk after its VM is terminated, which takes time, and there are better options. The better options are sparse and thin volumes. Sparse volumes are supported in OpenStack, but we've had issues with them previously. Thin volumes seem like the way to go, and we've had good experiences with them in general. To make this work in OpenStack, I wrote a few-line patch to the nova code so that it creates all LVM volumes as thin volumes. (And no, you don't want it; it falls more into the "hack" category.)
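Why thin volumes make the zeroing pass unnecessary can be illustrated with a toy Python model. This is not dm-thin's real metadata format or nova's code. It assumes the pool zeroes a block when it first provisions it to a volume, which is the default zeroing mode described in the lvmthin man page. Deleting a volume only returns blocks to the pool, and the next customer never sees the stale bytes:

```python
# Toy model of an LVM thin pool (not dm-thin's real metadata format).
# Assumption: the pool zeroes a block when it first provisions it to a
# volume, as lvmthin's default zeroing mode does.
BLOCK = 4  # bytes per block; tiny so the demo output stays readable


class ThinPool:
    def __init__(self, num_blocks: int):
        self.data = [bytes(BLOCK)] * num_blocks  # physical blocks
        self.free = list(range(num_blocks))      # unallocated physical blocks

    def allocate(self) -> int:
        idx = self.free.pop()
        self.data[idx] = bytes(BLOCK)  # zero on provisioning, not on delete
        return idx

    def release(self, idx: int) -> None:
        # Only bookkeeping: the stale bytes stay on the physical block.
        self.free.append(idx)


class ThinVolume:
    def __init__(self, pool: ThinPool):
        self.pool = pool
        self.mapping: dict[int, int] = {}  # virtual block -> physical block

    def write(self, vblock: int, offset: int, payload: bytes) -> None:
        if offset + len(payload) > BLOCK:
            raise ValueError("write must fit inside one block")
        if vblock not in self.mapping:
            self.mapping[vblock] = self.pool.allocate()  # first write provisions
        pblock = self.mapping[vblock]
        buf = bytearray(self.pool.data[pblock])
        buf[offset:offset + len(payload)] = payload
        self.pool.data[pblock] = bytes(buf)

    def read(self, vblock: int) -> bytes:
        # Never-written blocks read as zeros without touching the pool.
        if vblock not in self.mapping:
            return bytes(BLOCK)
        return self.pool.data[self.mapping[vblock]]

    def delete(self) -> None:
        # Deleting a volume just returns its blocks: no zeroing pass.
        for pblock in self.mapping.values():
            self.pool.release(pblock)
        self.mapping.clear()


if __name__ == "__main__":
    pool = ThinPool(num_blocks=1)
    old = ThinVolume(pool)    # previous customer's disk
    old.write(0, 0, b"SECR")
    old.delete()
    print(pool.data[0])       # b'SECR' (stale data still on the SSD)
    new = ThinVolume(pool)    # next customer's disk
    print(new.read(0))        # b'\x00\x00\x00\x00'
    new.write(0, 0, b"x")     # partial write: block is zeroed first
    print(new.read(0))        # b'x\x00\x00\x00'
```

The cost of sanitizing moves from "zero the whole disk at teardown" to "zero each block the first time it is actually used", which is only paid for blocks the next VM writes.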
Non-scientific tests, but tests nonetheless
So we ran a test on two compute hosts. One had raw file-backed VMs on an XFS filesystem; the other used thin volumes for its VMs. Each compute host ran 4 VMs, and each VM ran fio randrw tests with a total job size of 64 GB at 1 to 64 threads, using the fio flag --bsrange=4k-1M. Each test was run 4 times per VM, giving a total of 16 results for each compute host.
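The test matrix can be sketched as a Python helper that builds the fio command line for each thread count. Several details are assumptions rather than what was actually run: the thread counts are taken as powers of two, "total job size" is read as 64 GB split evenly across threads (fio's size option applies per job), the scratch directory is hypothetical, and options the post doesn't mention (such as the I/O engine or direct I/O) are left at fio's defaults:

```python
import shlex

TOTAL_MIB = 64 * 1024                     # 64 GB per test, assumed split across threads
THREAD_COUNTS = [1, 2, 4, 8, 16, 32, 64]  # assumed steps within "1 to 64 threads"
VMS_PER_HOST = 4
RUNS_PER_VM = 4


def fio_args(numjobs: int, directory: str = "/mnt/scratch") -> list[str]:
    """Build a fio command line for one randrw run (directory is hypothetical)."""
    return [
        "fio",
        "--name=randrw",
        "--rw=randrw",
        "--bsrange=4k-1M",                   # flag from the post
        "--thread",                          # run jobs as threads, not processes
        f"--numjobs={numjobs}",
        f"--size={TOTAL_MIB // numjobs}M",   # per-job size so the total stays 64 GB
        f"--directory={directory}",          # each job gets its own file here
        "--group_reporting",                 # one aggregated result per run
    ]


if __name__ == "__main__":
    for n in THREAD_COUNTS:
        print(shlex.join(fio_args(n)))
    # 4 VMs x 4 runs = 16 results per compute host for each thread count
    print(VMS_PER_HOST * RUNS_PER_VM)  # 16
```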
We averaged the results for each thread count. And, well, see for yourself.
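The averaging step is simple to reproduce. A minimal Python sketch, assuming each fio run has already been reduced to a single number (for example IOPS pulled from fio's --output-format=json output; which metric was plotted is an assumption here), grouped by host and thread count:

```python
from collections import defaultdict
from statistics import mean


def average_by_threads(results):
    """Average fio results per (host, thread count).

    results: iterable of (host, threads, value) tuples, one per fio run,
    where value is whatever metric was extracted (e.g. IOPS; assumed).
    """
    groups = defaultdict(list)
    for host, threads, value in results:
        groups[(host, threads)].append(value)
    # Sort so the output is ordered by host, then by thread count.
    return {key: mean(values) for key, values in sorted(groups.items())}
```

With 4 VMs and 4 runs each, every (host, threads) key averages 16 values, and the two hosts' curves can then be plotted against thread count.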
I think we'll go with LVM thin volumes.
Geek. Product Owner @CSCfi