Symas Corp., February 2015
Some of the DB engines also have particular compression algorithms statically built into their code bases. For example, TokuDB has its own QuickLZ and LZMA implementations. These built-in compressors are tested as well, for comparison purposes.
Due to the large volume of test data here, each DB engine's results are
presented on a separate page.
Compression makes the most sense for larger-than-memory workloads, for two reasons: it allows more records to be held in memory at once, and the reduced size potentially speeds up I/O operations. (In aggregate, the transfer time per original uncompressed byte decreases.)
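As a rough illustration of the claim about transfer time, the following Python sketch models the time needed to move one original (uncompressed) byte over a link. The function name, the optional CPU-cost term, and all of the numbers are assumptions chosen for illustration; none of them are measurements from these tests.

```python
def seconds_per_original_byte(link_bytes_per_s, compression_ratio=1.0,
                              codec_bytes_per_s=None):
    """Simple model: time to move one uncompressed byte over a link.

    compression_ratio is original size / compressed size (1.0 = no compression).
    codec_bytes_per_s, if given, adds the CPU time to (de)compress one original
    byte, assuming it is not overlapped with the transfer (a pessimistic choice).
    """
    if link_bytes_per_s <= 0 or compression_ratio <= 0:
        raise ValueError("bandwidth and ratio must be positive")
    # Only 1/ratio compressed bytes cross the link per original byte.
    transfer = 1.0 / (link_bytes_per_s * compression_ratio)
    cpu = 0.0 if codec_bytes_per_s is None else 1.0 / codec_bytes_per_s
    return transfer + cpu


# Hypothetical figures: a 100 MB/s device, a 2:1 ratio, a 400 MB/s codec.
plain = seconds_per_original_byte(100e6)
packed = seconds_per_original_byte(100e6, 2.0)
packed_with_cpu = seconds_per_original_byte(100e6, 2.0, 400e6)
print(plain, packed, packed_with_cpu)
```

Under these assumed figures compression still wins after paying the codec cost, but the model also shows the limit: once the codec is slower than the link, the CPU term dominates and compression no longer helps.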
Back in the 1980s, when Howard was developing file compression algorithms, compression was all about saving storage space: cramming as much as possible into archives stored on floppy disks. These days, with hard drives going for around $30/TB, space isn't much of a concern any more; it's all about transfer speed, whether from memory to disk or across a network link.
The small in-memory tests conducted here give an idea of the absolute maximum speed of the respective compressors, but they don't really reflect how disk-bound workloads will be affected. That will come in a future benchmarking effort.
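To make concrete what an in-memory compressor measurement looks like, here is a minimal Python sketch that times a single compression pass over a buffer held entirely in RAM. It uses Python's standard zlib module purely as a stand-in; the helper name, the payload, and the compression level are assumptions, and this is not the benchmark driver used for these tests.

```python
import time
import zlib


def measure_in_memory(payload, level=6):
    """Time one zlib compression pass over an in-memory buffer."""
    start = time.perf_counter()
    packed = zlib.compress(payload, level)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return {
        "ratio": len(payload) / len(packed),               # original / compressed
        "compress_mb_per_s": len(payload) / elapsed / 1e6,
        "roundtrip_ok": zlib.decompress(packed) == payload,
    }


if __name__ == "__main__":
    # Hypothetical, highly repetitive payload; real records compress less well.
    sample = b"key=value;" * 100_000
    print(measure_in_memory(sample))
```

No disk or network is involved, so the reported throughput is an upper bound on what the compressor can deliver, which is why such numbers say little about disk-bound workloads.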
The source code for the benchmark drivers is all on GitHub. We invite you to run these tests yourself and report your results back to us.
The software versions we used:
violino:/home/software/leveldb> g++ --version
g++ (Ubuntu 4.8.2-19ubuntu1) 4.8.2
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

violino:/home/software/leveldb> git log -1 --pretty=format:"%H %ci" master
e353fbc7ea81f12a5694991b708f8f45343594b1 2014-05-01 13:44:03 -0700

violino:/home/software/basho_leveldb> git log -1 --pretty=format:"%H %ci" develop
d1a95db0418d4e17223504849b9823bba160dfaa 2014-08-21 15:41:50 -0400

violino:/home/software/db-5.3.21> ls -l README
-rw-r--r-- 1 hyc hyc 234 May 11 2012 README

violino:/home/software/HyperLevelDB> git log -1 --pretty=format:"%H %ci" master
02ad33ccecc762fc611cc47b26a51bf8e023b92e 2014-08-20 16:44:03 -0400

violino:~/OD/mdb> git log -1 --pretty=format:"%H %ci"
a054a194e8a0aadfac138fa441c8f67f5d7caa35 2014-08-24 21:18:03 +0100

violino:/home/software/rocksdb> git log -1 --pretty=format:"%H %ci"
7e9f28cb232248b58f22545733169137a907a97f 2014-08-29 21:21:49 -0700

violino:/home/software/ft-index> git log -1 --pretty=format:"%H %ci" master
f17aaee73d14948962cc5dea7713d95800399e65 2014-08-30 06:35:59 -0400

violino:/home/software/wiredtiger> git log -1 --pretty=format:"%H %ci"
1831ce607baf61939ddede382ee27e193fa1bbef 2014-08-14 12:31:38 +1000