Sequence Compression Benchmark


Benchmark machine

Which compressor/dataset combinations were tested?

Each setting of each compressor is tested on every test dataset, except when it's difficult or impossible due to compressor limitations:

Benchmark process

The entire benchmark is orchestrated by a Perl script. This script loads the lists of compressor settings and test data, and proceeds to test each combination whose measurements are still missing from the output directory. For each such combination (of compressor setting and test dataset), the following steps are performed:

  1. Compression is performed by piping the test data into the compressor. Compressed size and compression time are recorded. For compressed formats consisting of multiple files, the sizes of all files are summed.
  2. If compression time did not exceed 10 seconds, 9 more compression runs are performed, recording compression times. Compressed data from the previous run is deleted before each subsequent compression run.
  3. The next set of compression runs is performed to measure peak memory consumption. This set consists of the same number of runs as in steps 1-2 (either 1 or 10 runs). That is, for fast compressors and for small data, the measurement is repeated 10 times.
  4. A decompression test run is performed. In this run, decompressed data is piped to the md5sum -b - command. The resulting MD5 checksum is compared with that of the original file. On any mismatch, this combination of compressor setting and dataset is disqualified and its measurements are discarded.
  5. Decompression time is measured. This time decompressed data is piped to /dev/null.
  6. If decompression completed within 10 seconds, 9 more decompression runs are performed and timed.
  7. Peak decompression memory is measured. The number of runs is the same as in steps 5-6.
  8. The measurements are stored to a file. All compressed and temporary files are removed.
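
The round-trip check in step 4 can be illustrated with a short Python sketch. This is not the benchmark's Perl script: the function names md5_of_stream and matches_original are hypothetical, and the sketch computes the checksum with Python's hashlib instead of piping to md5sum.

```python
import hashlib


def md5_of_stream(stream, chunk_size=1 << 20):
    """Return the hex MD5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_original(original_path, decompressed_stream):
    """Compare the MD5 of the original file with that of decompressed data.

    Hypothetical helper: returns True when the round trip is lossless,
    False when the combination should be disqualified.
    """
    with open(original_path, "rb") as original:
        expected = md5_of_stream(original)
    return md5_of_stream(decompressed_stream) == expected
```

In practice the decompressed stream would be the standard output of the decompressor process, for example the stdout attribute of a subprocess.Popen object opened with stdout=subprocess.PIPE.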

How was time measured?

Wall clock time was measured using Perl's Time::HiRes module (gettimeofday and tv_interval subroutines). The resulting time was recorded with millisecond precision.
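
For readers more familiar with Python, a roughly equivalent measurement can be sketched as below. This is an illustration only, not the benchmark's code: it uses time.perf_counter in place of Time::HiRes, and the helper name timed_call is hypothetical.

```python
import time


def timed_call(action):
    """Run action() once and return elapsed wall-clock seconds.

    The result is rounded to 3 decimal places, i.e. millisecond precision.
    """
    start = time.perf_counter()
    action()
    elapsed = time.perf_counter() - start
    return round(elapsed, 3)
```

Here action would typically be a function that launches the compression or decompression command and waits for it to finish.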

How was the peak memory measured?

First, the compression command is stored in a temporary shell script file. Then it is executed via GNU Time, as /usr/bin/time -v >output.txt. The "Maximum resident set size" value is extracted from the output. The value 1638 is then subtracted from it, and the result is stored as the peak memory measurement. 1638 is the average "Maximum resident set size" reported by GNU Time, measured in the same way, for an empty script.
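
The extraction and baseline correction can be sketched in Python as follows. This is a minimal illustration, not the benchmark's implementation: the function name is hypothetical, and it assumes the verbose GNU Time report is already available as a string. GNU Time labels this line "Maximum resident set size (kbytes)", so the result is in kilobytes.

```python
import re

# Baseline from the text: average value measured for an empty script.
EMPTY_SCRIPT_BASELINE = 1638


def peak_memory_from_report(report, baseline=EMPTY_SCRIPT_BASELINE):
    """Extract "Maximum resident set size" from a GNU Time -v report
    and subtract the empty-script baseline."""
    match = re.search(r"Maximum resident set size \(kbytes\):\s*(\d+)", report)
    if match is None:
        raise ValueError("no 'Maximum resident set size' line in report")
    return int(match.group(1)) - baseline
```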

Why not measure memory consumption and time simultaneously?

Because measuring memory makes the task noticeably slower, especially for very fast tasks. Of course the downside of separate measurement is that it takes twice as long, but we decided that accurate timing results are worth it.

What measurements are collected for each test?

In cases where 10 values are collected, the average value is used by the benchmark website.

How are the other numbers computed?

Why not always perform the same number of runs in all cases?

A variable number of runs is the only way to have both accurate measurements and large test data (under the constraints of using one test machine and running the benchmark within a reasonable time).

On one hand, the benchmark takes a lot of time, so much that some compressors can't be tested at all on datasets larger than 10 MB in reasonable time. Therefore, repeating every measurement 10 times is impractical, or it would mean restricting the test data to small datasets only.

On the other hand, measurements are slightly noisy. The shorter the measured time, the noisier its measurement. Thus for very quick runs, multiple runs allow for substantial noise suppression. For longer runs it does not make much difference, because the relative error is already small.

Using a threshold of 10 seconds seems a reasonable compromise between suppressing noise and including larger test data (and slow compressors).
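
The run-count rule can be sketched in a few lines of Python. This is a simplified illustration, not the actual benchmark script: the names measure_with_repeats and reported_value are hypothetical, and run_once stands for any function that performs one run and returns its time in seconds.

```python
def measure_with_repeats(run_once, threshold_seconds=10.0, extra_runs=9):
    """Perform one run, then 9 more only if the first did not exceed
    the threshold. Returns the list of measured times (1 or 10 values)."""
    first = run_once()
    times = [first]
    if first <= threshold_seconds:
        times.extend(run_once() for _ in range(extra_runs))
    return times


def reported_value(times):
    """Average of the collected values (a single value is returned as is)."""
    return sum(times) / len(times)
```

With this rule, a run taking 0.5 seconds is measured 10 times and averaged, while a run taking 12 seconds is measured once.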

Are there other ways to reduce measurement noise?

Other ways that we are using:

A further improvement could be achieved by using multiple machines to collect a larger sample. We may explore this in the future.

Is the benchmark script available?

Yes, here:

It's provided for reference only; use at your own risk.