Our benchmark explores a specific topic: comparison of reference-free compressors of FASTA-formatted sequence data, running on the Linux command line, streaming uncompressed data on input and output. Within this scope, we have tried our best to evaluate a comprehensive array of compressors on a broad range of data.
However, you may need a compressor for a different kind of data, or have a question that our benchmark does not answer. On this page we have assembled links to some other projects you may find useful.
These links were selected and annotated from the viewpoint of a potential benchmark user, that is, someone using the benchmark data to select a compressor.
Suggestions for more or better links are welcome!
Biological data compression
Order: new to old
Hernaez et al. (2019) Genomic Data Compression. Annual Review of Biomedical Data Science, 2, 19-37.
- Paper (not open access): https://www.annualreviews.org/doi/abs/10.1146/annurev-biodatasci-072018-021229
Review of compression methods and tools for genomics. Includes no benchmark data.
Numanagic et al. (2016) Comparison of high-throughput sequencing data compression tools. Nature Methods, 13, 1005-1008.
- Paper (not open access): https://www.nature.com/articles/nmeth.4037
- Online data: https://sfu-compbio.github.io/compression-benchmark-data/
- Tools and data: https://github.com/sfu-compbio/compression-benchmark
Benchmark of compressors for FASTQ and SAM/BAM data. Made by the authors of SCALCE (a FASTQ compressor). The online data consist of filterable tables; the only graphical charts are in the paper's Supplementary Information.
Deorowicz and Grabowski (2013) Data compression for sequencing data. Algorithms for Molecular Biology, 8, 25.
Review focused on FASTQ compression. Includes no benchmark data.
General data compression
Squash Compression Benchmark
Benchmark of compression libraries, by Evan Nemerson. Includes a variety of data types (though not DNA, RNA or protein) and many compressors, and uses multiple test machines. Advanced presentation. It has not been updated for a while, so the machines and software are somewhat dated.
Large Text Compression Benchmark
Benchmark of compressors on English texts, by Matt Mahoney (author of zpaq, zpipe and fastqz).
Benchmark of compression libraries, by Hamid Buzidi (author of LzTurbo).
Provides downloadable files with very large tables. By an anonymous author.