Sequence Compression Benchmark


Unfortunately, some potentially useful compressors are incomplete in terms of supported input or interface. To make them usable for typical biological sequence compression tasks, we had to write special wrappers for many of them. These wrappers are provided below, along with other required supporting tools.

Using wrappers is not an ideal solution:

  1. Wrappers are not distributed with the original compressors, which means that those compressors are still incomplete on their own. (Someone planning to use those compressors would need to download our wrapper or write their own.)
  2. Wrappers are written in Perl, introducing a dependency.
  3. Wrappers are most probably slower than native support for the required functionality in the compressors would be.
  4. We spent minimal effort developing these wrappers, doing only the necessary steps in the simplest ways.
  5. We did not benchmark wrapped compressors without the wrappers. Therefore our results are not representative of each compressor on its own.

On the other hand, without wrappers those compressors are often useless for practical tasks. Only through wrappers are they able to compress and decompress actual relevant sequence datasets in the required streaming mode.
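As a rough illustration of what "adding streaming mode" can mean, the sketch below adapts a tool that only accepts file paths so that it can be driven from input and output streams. It is not one of the Perl wrappers from this page: the function name wrap_stream and the "{in}" / "{out}" placeholders are hypothetical, and the approach (buffering the whole stream in a temporary file) is only an assumed, simplest-possible strategy.

```python
import os
import shutil
import subprocess
import tempfile


def wrap_stream(cmd_template, in_stream, out_stream):
    """Run a file-only tool on data from in_stream, writing its result to out_stream.

    cmd_template is a list of command-line arguments in which the
    hypothetical placeholders "{in}" and "{out}" stand for the input
    and output file paths.
    """
    with tempfile.TemporaryDirectory() as tmp:
        in_path = os.path.join(tmp, "input")
        out_path = os.path.join(tmp, "output")

        # Buffer the incoming stream in a temporary file the tool can open.
        with open(in_path, "wb") as f:
            shutil.copyfileobj(in_stream, f)

        # Substitute the real paths and run the tool to completion.
        cmd = [
            arg.replace("{in}", in_path).replace("{out}", out_path)
            for arg in cmd_template
        ]
        subprocess.run(cmd, check=True)

        # Copy the tool's output file to the outgoing stream.
        with open(out_path, "rb") as f:
            shutil.copyfileobj(f, out_stream)
```

In a command-line wrapper, in_stream and out_stream would typically be sys.stdin.buffer and sys.stdout.buffer. The temporary file costs extra disk I/O, which is one reason a wrapper of this kind is likely slower than native streaming support.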

Functionality added by the wrappers, to those compressors missing it, includes:

You are welcome to use the wrappers and utilities from this page, or to adapt them to your needs. However, please be aware that they are provided with no guarantees. We made these tools only to make the benchmark possible; they do not have production-level quality, features, performance, or safety. Use them entirely at your own risk.

Let us know if you have any questions or suggestions regarding the wrappers.

Wrapper scripts

All wrappers are in the public domain.

Utilities written in Perl

This tool is in the public domain.

Utilities written in C

All these utilities are in the public domain.

Compressor fixes

LFastqC. A minimal fix that makes it usable (many other problems are worked around in the wrapper). It removes one of the tar steps and waits for the compression and decompression sub-tasks to complete before processing their output.
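The second part of the fix follows a general pattern: start the sub-tasks, wait until every one of them has exited, and only then read what they produced. The sketch below shows that pattern in Python. It is not the actual LFastqC patch, and the function name run_subtasks_then_collect and its arguments are hypothetical.

```python
import subprocess


def run_subtasks_then_collect(commands, output_paths):
    """Start all sub-tasks, wait for every one to finish, then read their outputs."""
    # Launch the sub-tasks so that they run in parallel.
    procs = [subprocess.Popen(cmd) for cmd in commands]

    # Block until each sub-task has exited. Reading any earlier could
    # find a missing or partially written output file.
    exit_codes = [proc.wait() for proc in procs]
    for cmd, code in zip(commands, exit_codes):
        if code != 0:
            raise RuntimeError(f"sub-task failed with exit code {code}: {cmd!r}")

    # All sub-tasks are done, so their output files are now complete.
    results = []
    for path in output_paths:
        with open(path, "rb") as f:
            results.append(f.read())
    return results
```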

Shared under the Apache License 2.0, the same license used in the LFastqC GitHub repository.