Intel Optane SSD/IMDT Microbenchmark

Symas Corp., September 2018

As noted above, the Optane SSDs also support a feature known as IMDT (Intel Memory Drive Technology), which makes the Optane SSD appear as RAM to the OS. Now we're going to examine the performance of the previous Optane SSD workloads with increased amounts of RAM, and compare that to the performance of IMDT.

First we'll continue using the 300GB databases as we operated on them before, with only 64GB of system RAM available. Then we'll increase system RAM to 128GB and retest, and then retest again with the full 192GB installed in the machine. All of these tests use the ext4 filesystem with journaling disabled. As before, the test uses a single writer with an increasing number of readers, up to the maximum number of hardware threads in the system: the number of reader threads is set to 1, 2, 4, 8, 16, 32, and 63 for each successive run. All of the threads operate on randomly selected records in the database. Additional details of the test setup are available on the previous page. Thanks again to Packet and Intel for providing the test system and all their assistance.
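The actual driver code is linked at the end of this report; as a rough sketch of the workload shape, assuming LMDB's C API, a readers-plus-writer run looks something like the following. The record count, key format, map size, op count, and path are illustrative assumptions, not the real driver.

/* Sketch of the readers-plus-single-writer workload against LMDB's C API.
 * Illustrative only: the record count, key format, map size, op count, and
 * path are assumptions; the real driver is the code linked at the end of
 * this report. Error handling is omitted for brevity. */
#include <lmdb.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_RECORDS 75000000UL   /* roughly 300GB of 16B keys + 4000B values */
#define VALUE_SIZE  4000

static MDB_env *env;
static MDB_dbi  dbi;
static volatile int done;        /* set by the writer when it finishes */

/* Reader threads: one read-only transaction per randomly selected key. */
static void *reader(void *arg)
{
    unsigned seed = (unsigned)(size_t)arg;
    char keybuf[17];
    while (!done) {
        MDB_txn *txn;
        MDB_val key, data;
        snprintf(keybuf, sizeof(keybuf), "%016lu",
                 (unsigned long)rand_r(&seed) % NUM_RECORDS);
        key.mv_size = 16;
        key.mv_data = keybuf;
        mdb_txn_begin(env, NULL, MDB_RDONLY, &txn);
        mdb_get(txn, dbi, &key, &data);
        mdb_txn_abort(txn);      /* read-only txns are simply discarded */
    }
    return NULL;
}

/* The single writer: one randomly selected record overwritten per txn. */
static void writer(long nops)
{
    unsigned seed = 1;
    char keybuf[17], *val = calloc(1, VALUE_SIZE);
    for (long i = 0; i < nops; i++) {
        MDB_txn *txn;
        MDB_val key = { 16, keybuf }, data = { VALUE_SIZE, val };
        snprintf(keybuf, sizeof(keybuf), "%016lu",
                 (unsigned long)rand_r(&seed) % NUM_RECORDS);
        mdb_txn_begin(env, NULL, 0, &txn);
        mdb_put(txn, dbi, &key, &data, 0);
        mdb_txn_commit(txn);
    }
    done = 1;
    free(val);
}

int main(int argc, char **argv)
{
    int i, nreaders = argc > 1 ? atoi(argv[1]) : 1;   /* 1,2,4,...,63 */
    pthread_t tid[64];
    MDB_txn *txn;

    mdb_env_create(&env);
    mdb_env_set_mapsize(env, 1ULL << 40);             /* 1TB map, > DB size */
    mdb_env_open(env, "/mnt/optane/db", 0, 0664);     /* path is illustrative */
    mdb_txn_begin(env, NULL, 0, &txn);
    mdb_dbi_open(txn, NULL, 0, &dbi);
    mdb_txn_commit(txn);

    for (i = 0; i < nreaders; i++)
        pthread_create(&tid[i], NULL, reader, (void *)(size_t)(i + 1));
    writer(1000000);
    for (i = 0; i < nreaders; i++)
        pthread_join(tid[i], NULL);
    mdb_env_close(env);
    return 0;
}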

LMDB at 64GB, 128GB, and 192GB RAM

The graphs show a pretty consistent story: more RAM gives faster throughput, as expected. The performance gain isn't linear; there's a much larger gain in read performance going from 128GB to 192GB than from 64GB to 128GB. That's also normal; once the amount of RAM encompasses the B+tree's interior/branch page working set, reads go much faster.

The Optane SSD is generally faster than the Flash SSD. Above 8 reader threads, the readers' bandwidth utilization starts to choke out writers on the Optane SSD, and its write performance drops below the Flash SSD's. The implication here is that the Optane SSDs favor reads at heavier load levels.

It's worth pointing out that up to the 8 readers/1 writer load level, the Optane SSD is ~3.5x faster than the Flash SSD for reads, and ~2x faster for writes. For mixed workloads up to 90/10 read/write, the Optane SSD is definitely worth looking into. For systems that experience high random write rates, the Optane SSD is clearly better than Flash. A throughput advantage of this magnitude means LMDB users can hit their SLAs with half as much RAM as they would otherwise need.

Or, in summary:

RocksDB at 64GB, 128GB, and 192GB RAM

RocksDB read throughput shows some benefit from increased RAM, but write speeds are basically unchanged. Aside from a noticeable drop in write throughput on the 128GB/2 readers test (apparently due to an extremely slow compaction phase), there's not much to see on the write side.
Note that the RocksDB cache size is set to 32GB, 64GB, and 96GB, respectively, for these tests. (Obviously not a concern for LMDB, since it requires no cache tuning.)
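The cache size is the only tuning that changes between these runs. As a rough sketch of what that setting controls, assuming the RocksDB C API (rocksdb/c.h) rather than whatever the driver itself uses, and with an illustrative helper name and path:

/* Sketch: sizing the RocksDB block cache via the C API (rocksdb/c.h).
 * The helper name and path are illustrative; 32GB/64GB/96GB were the cache
 * sizes used for the 64GB/128GB/192GB RAM runs. Error handling omitted. */
#include <rocksdb/c.h>
#include <stddef.h>

rocksdb_t *open_with_cache(const char *path, size_t cache_bytes)
{
    char *err = NULL;
    rocksdb_options_t *opts = rocksdb_options_create();
    rocksdb_block_based_table_options_t *table_opts =
        rocksdb_block_based_options_create();

    /* LRU block cache holding uncompressed data blocks */
    rocksdb_cache_t *cache = rocksdb_cache_create_lru(cache_bytes);
    rocksdb_block_based_options_set_block_cache(table_opts, cache);
    rocksdb_options_set_block_based_table_factory(opts, table_opts);
    rocksdb_options_set_create_if_missing(opts, 1);

    rocksdb_t *db = rocksdb_open(opts, path, &err);
    return err ? NULL : db;
}

/* e.g. open_with_cache("/mnt/flash/rocksdb", 96UL << 30) for the 192GB run */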

Or, in summary:

IMDT Comparison

Next the machine is reconfigured to use the Optane SSDs with IMDT, and rebooted. On reboot the system believes it has 1.4TB of RAM. At this point only the Flash SSD can be used for database storage, since the Optane SSDs are all acting as RAM. First, a comparison of Flash throughput with 192GB RAM vs. throughput with 1.4TB IMDT:


With a peak of over 2.1M read ops per second, it's over 3x faster than the 192GB RAM test. IMDT indeed boosts the read performance to the level of a fully in-memory DB. Write throughput is still limited by the speed of the underlying Flash SSD.


The peak read throughput here increases by only 61%. That's still a significant improvement for RocksDB, but nothing special as DBs go. Write throughput is slower at all load levels using IMDT.

Larger DBs

With 1.4TB RAM available, we proceeded to test new DBs of size 400GB, 800GB, and 1.2TB to see if they still behaved as fully in-memory DBs.

We also had to abandon RocksDB for these tests because it was so slow that loading each new DB took too long. Only results for LMDB are presented below.
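The load phase simply fills each DB with its full record count before the random-access runs begin. As a rough sketch, assuming a sequential fill in sorted key order (so MDB_APPEND applies) against LMDB's C API; the batch size, map size, and path are illustrative, and the actual loader is part of the driver code linked at the end of this report:

/* Sketch of the sequential load phase for the large DBs. Illustrative only:
 * batch size, map size, and path are assumptions; the actual loader is part
 * of the driver code linked at the end of this report. */
#include <lmdb.h>
#include <stdio.h>
#include <stdlib.h>

#define VALUE_SIZE 4000
#define BATCH      1000              /* records per write transaction */

int load(const char *path, unsigned long nrecords)
{
    MDB_env *env;
    MDB_dbi dbi;
    MDB_txn *txn;
    char keybuf[17], *val = calloc(1, VALUE_SIZE);

    mdb_env_create(&env);
    mdb_env_set_mapsize(env, 2ULL << 40);            /* 2TB map */
    mdb_env_open(env, path, 0, 0664);
    mdb_txn_begin(env, NULL, 0, &txn);
    mdb_dbi_open(txn, NULL, 0, &dbi);
    mdb_txn_commit(txn);

    for (unsigned long i = 0; i < nrecords; i += BATCH) {
        mdb_txn_begin(env, NULL, 0, &txn);
        for (unsigned long j = i; j < i + BATCH && j < nrecords; j++) {
            MDB_val key = { 16, keybuf }, data = { VALUE_SIZE, val };
            snprintf(keybuf, sizeof(keybuf), "%016lu", j);
            /* MDB_APPEND applies because keys arrive in sorted order */
            mdb_put(txn, dbi, &key, &data, MDB_APPEND);
        }
        mdb_txn_commit(txn);
    }
    mdb_env_close(env);
    free(val);
    return 0;
}

/* e.g. load("/mnt/flash/db-400g", 100000000UL) for the 400GB DB */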


DB        Wall Time    User CPU   Sys CPU    CPU%  DB Size (KB)  Vol CS  Invol CS  FS In  FS Out      Write Amp
400GB     24:18.73     03:45.88   19:18.94   94    403354944     13799   1538      16     806726520   8.0672652
800GB     40:53.68     06:02.74   34:37.74   99    806709744     255     2137      968    1613419544  8.06709772
1200GB    01:06:24.00  09:33.75   56:38.41   98    1210064592    2579    8744      0      2420133840  8.0671128

The load speed scales essentially linearly with DB size. It's also interesting to note that the FS In column is effectively zero for each of these loads, which means no metadata was ever forced out of cache and needed to be reread; i.e., the loads are behaving as expected for a fully in-memory DB. Also, the write amplification factor is essentially constant, which is typical for LMDB. (It's really logN with a very large base, so it grows very slowly as DB sizes increase.) The 25% filesystem-metadata overhead we saw previously is absent here. Again, this is in line with a fully in-memory workload; everything fit into memory, so there were no extraneous metadata flushes from the ext4 fs driver. However, compared to the 13:21.04 it took to load a 300GB DB without IMDT, there's clearly some heavy overhead for IMDT to support writes here.
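To make the logN point concrete, assume a branch fanout on the order of b ≈ 150 (16-byte keys in 4KB branch pages; an assumed figure, not a measurement):

\[
\mathrm{depth} \approx \log_b N, \qquad
\log_{150}\!\left(3\times 10^{8}\right) = \frac{\ln(3\times 10^{8})}{\ln 150} \approx \frac{19.5}{5.0} \approx 3.9
\]
\[
\Delta\mathrm{depth} = \log_b 3 = \frac{\ln 3}{\ln b} \approx 0.2 \ \text{levels when } N \text{ triples}
\]

So going from 100M to 300M records leaves the tree at roughly four levels, and the number of pages rewritten per update barely changes.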


All the DBs still use 16-byte keys and 4000-byte values: 100,000,000 records for the 400GB DB, 200,000,000 for the 800GB DB, and 300,000,000 for the 1200GB DB.
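As a quick sanity check on those figures, the raw data volume works out to roughly the nominal DB sizes (the on-disk sizes in the table above are a bit larger, presumably due to page and metadata overhead):

\[
10^{8} \times (16 + 4000)\,\mathrm{B} \approx 402\,\mathrm{GB}, \qquad
2\times 10^{8} \times 4016\,\mathrm{B} \approx 803\,\mathrm{GB}, \qquad
3\times 10^{8} \times 4016\,\mathrm{B} \approx 1.2\,\mathrm{TB}
\]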

Performance for the 400GB DB looks good. Read throughput peaks at almost 1.7M reads/sec.

Performance with 800GB is slightly slower, but still good. Read throughput peaks at almost 1.2M reads/sec.

Performance with 1200GB is significantly slower. The numbers are what we'd expect from a typical larger-than-RAM execution, peaking at only 86K reads/sec. The results are so far out of line with the previous two tests that we're not very confident in their reliability. We encourage you to rerun and verify these results for yourself.


Using the IMDT with Optane SSDs can significantly boost performance with larger DBs, making a system with limited RAM perform as if it had much more physical RAM. There are limits though, and it appears that exceeding about a 4:1 ratio of Optane:RAM will nullify the performance benefits. (Note that this 4:1 ratio is based on uncertain results from the 1200GB test, so take it with a grain of salt. Your mileage will certainly vary as far as the ratio of Optane:RAM. The Setup Guide shows that configs up to 8:1 are supported.)


The files used to perform these tests are all available for download.

255306652 Aug 27 04:01 imdt1.tgz Command scripts, output, atop records
411606600 Sep 23 20:01 imdt2.tgz Command scripts, output, atop records
LibreOffice spreadsheet with tabulated results here.
The source code for the benchmark drivers is all on GitHub.
We invite you to run these tests yourself and report your results back to us.