Sequence Compression Benchmark

Criteria

We include compressors suitable for practical tasks of sequence comparison. Specifically, we benchmark lossless reference-free compression of well-formed FASTA files. We use all specialized sequence compressors that we could find and get to work. For general-purpose compressors we use only the major ones, in terms of performance, historical importance, or popularity. Suggestions for adding more compressors are welcome!

Any included compressors must:

Be available and free to use (at least for academic purposes)
Work on a typical modern Linux machine
Have a command-line interface

Below is the list of all tested compressors with brief comments. However, please check the benchmark data for a more complete picture. Better yet, install and evaluate any promising compressors on your own machine and with your own data.

The Missing Compressors page lists compressors that are not included.

Jump to:

Specialized compressors: 2bit ac alapy beetl blast dcom dlim dnax dsrc fastqz fqs fqzcomp geco geco2 geco3 gtz harc jarvis kic leon lfastqc lfqc mfc minicom naf nuht pfish quip spring uht xm

General-purpose compressors: bcm brieflz brotli bsc bzip2 cmix copy gzip lizard lz4 lzop lzturbo nakamichi pbzip2 pigz snzip xz zpaq zpipe zstd

Specialized compressors

2bit

Paper: W. James Kent (2002) "BLAT - The BLAST-Like Alignment Tool", Genome Research, 12(4), 656-664, https://doi.org/10.1101/gr.229202
About: https://genome.ucsc.edu/goldenpath/help/twoBit.html
Sources: http://hgdownload.soe.ucsc.edu/admin/
Binaries: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/
Format: http://genome.ucsc.edu/FAQ/FAQformat.html#format7

2bit is a database format used by BLAT: https://genome.ucsc.edu/FAQ/FAQblat. It used to be limited to 4 GB of input, but support for longer input was recently added via the "-long" switch.

Version tested: "faToTwoBit" and "twoBitToFa" binaries dated 2018-11-07, from UCSC: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/.

Comments: No support for RNA or protein sequences. Requires a wrapper to preserve sequence names, line lengths and IUPAC characters. Non-free to use: "Blat source and executables are freely available for academic, nonprofit and personal use. Commercial licensing information is available on the Kent Informatics website." - from FAQ: https://genome.ucsc.edu/FAQ/FAQblat
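
The need for a wrapper follows from the encoding itself: two bits can distinguish only four symbols. Below is a minimal Python sketch of 2-bit packing, assuming the T=00, C=01, A=10, G=11 mapping with the first base in the most significant bits, as described on the UCSC format page linked above. The function names are hypothetical and this is not the UCSC implementation.

```python
# Illustrative only: hypothetical helpers, not code from faToTwoBit.
CODES = {"T": 0, "C": 1, "A": 2, "G": 3}  # assumed mapping, see lead-in
BASES = "TCAG"


def pack_2bit(seq):
    """Pack an uppercase ACGT-only string into bytes, 4 bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            # Raises KeyError for anything outside A, C, G, T.
            byte = (byte << 2) | CODES[base]
        # Left-align a short final chunk, padding with zero bits.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack_2bit(data, length):
    """Unpack bytes produced by pack_2bit; 'length' trims the padding."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:length])


assert unpack_2bit(pack_2bit("ACGTA"), 5) == "ACGTA"
```

In this sketch any symbol without a 2-bit code (for example the IUPAC code "R") cannot be packed, and sequence names and line lengths are not represented at all, so a wrapper has to store them separately.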

ac

Paper: Diogo Pratas, Morteza Hosseini, Armando J. Pinho (2018) "Compression of Amino Acid Sequences", In International Conference on Practical Applications of Computational Biology & Bioinformatics, May 2018, pp. 105-113, Springer, Cham.
Paper: Morteza Hosseini, Diogo Pratas, Armando J. Pinho (2019) "AC: A Compression Tool for Amino Acid Sequences", Interdisciplinary Sciences: Computational Life Sciences, 11, 68-76, https://doi.org/10.1007/s12539-019-00322-1
GitHub: https://github.com/cobilab/ac

Version tested: 1.1, 2020-01-29, commit fc136fc, built from source.

alapy

Homepage: http://alapy.com/services/alapy-compressor/
GitHub: https://github.com/ALAPY/alapy_arc

"ALAPY Compressor - is a cross-platform software tool used for efficient compression of NGS data. Latest version utilizes lossless compression algorithm developed by our data scientists for fastq files and optimized for the latest sequencing machines from Illumina."

Version tested: 1.3.0, 2017-07-25, binary from GitHub.

Comments: Closed source and non-free. Limited to one instance at a time. alapy-b (alapy_arc -l b) performs nearly identically to fastqz-slow (fastqz c).

beetl

Paper: Markus J. Bauer, Anthony J. Cox, Giovanna Rosone (2011) "Lightweight BWT construction for very large string collections" In Combinatorial Pattern Matching (Lecture Notes in Computer Science 6661), Springer Berlin, 219-231, https://doi.org/10.1007/978-3-642-21458-5_20
Paper: Anthony J. Cox, Markus J. Bauer, Tobias Jakobi, Giovanna Rosone (2012) "Large-scale compression of genomic sequence databases with the Burrows-Wheeler transform" Bioinformatics, 28(11), 1415-1419, https://doi.org/10.1093/bioinformatics/bts173
GitHub: https://github.com/BEETL/BEETL

BEETL: Burrows-Wheeler Extended Tool Library, from Illumina.

Version tested: commit 327cc65, 2019-11-14, built from source.

Comments: Requires sequences of identical length. Works only on short sequences. Not a complete compressor: it only computes the BWT, which then has to be compressed with another compressor (zstd in this benchmark).
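
To illustrate the two-stage idea (BWT first, then a separate compressor), here is a rough Python sketch. It uses a naive sort-all-rotations BWT on a single string, which is not the lightweight collection algorithm BEETL implements, and it substitutes zlib from the standard library for zstd. All function names are hypothetical.

```python
import zlib


def bwt(text, sentinel="$"):
    """Naive BWT: sort all rotations of text+sentinel, take the last column.

    Assumes the sentinel does not occur in text and sorts before every
    other symbol (true for "$" versus uppercase letters).
    """
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last, sentinel="$"):
    """Naive inverse: repeatedly prepend the last column and re-sort."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    row = next(r for r in table if r.endswith(sentinel))
    return row[:-1]


def bwt_then_compress(text):
    """Second stage: hand the transformed text to a general compressor.

    zlib is used here only because it ships with Python; the benchmark
    itself uses zstd for this step.
    """
    return zlib.compress(bwt(text).encode("ascii"), 9)


reads = "ACGTACGTACGT"
packed = bwt_then_compress(reads)
assert inverse_bwt(zlib.decompress(packed).decode("ascii")) == reads
```

The naive version needs memory and time far beyond what is practical for real read sets; it is meant only to show where the second compressor fits in.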

blast

Paper: Stephen F. Altschul, Warren Gish, Webb Miller, Eugene W. Myers, David J. Lipman (1990) "Basic local alignment search tool" Journal of Molecular Biology, 215(3), 403-410, https://doi.org/10.1016/S0022-2836(05)80360-2
BLAST Homepage: https://blast.ncbi.nlm.nih.gov/
BLAST FTP: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/

Database format of BLAST, the most popular homology search tool.

Version tested: "convert2blastmask", "makeblastdb" and "blastdbcmd" binaries from BLAST 2.8.1+, 2018-11-26, 64-bit Linux binaries from FTP: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.8.1/ncbi-blast-2.8.1+-x64-linux.tar.gz

Comments: Does not preserve line length, does not support RNA. It's a multi-file format: even tiny data is represented in several files when converted to blast format.

dcom

Paper: Pinghao Li, Shuang Wang, Jihoon Kim, Hongkai Xiong, Lucila Ohno-Machado, Xiaoqian Jiang (2013) "DNA-COMPACT: DNA COMpression Based on a Pattern-Aware Contextual Modeling Technique" PLOS ONE, 8(11), e80377, https://doi.org/10.1371/journal.pone.0080377
Sourceforge: https://sourceforge.net/projects/dnacompact/

DNA-COMPACT is a DNA-only compressor. It can compress with and without reference. (Only compression without reference was tested here).

Version tested: Built from latest public source on Sourceforge (https://sourceforge.net/projects/dnacompact/files/), dated 2013-08-29.

Comments: Does not support FASTA format, expects a single nameless DNA sequence as input. Basic functionality has to be added via a wrapper. This includes support for FASTA input and output, for 'N' and IUPAC codes, and for masked sequence. Creates a temporary file 'tmphuff.txt' in the current directory. This causes problems when running multiple DNA-COMPACT compression tasks in parallel: some of them crash and produce corrupted compressed files. Needs 2 decompression commands. Fails with "Segmentation fault" unless "ulimit -s unlimited" is run beforehand. Freezes during decompression on a 333 bp poly-A repeat sequence.
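
One possible way to avoid the clash over a fixed temporary file name is to give every task its own working directory. The Python sketch below shows the idea under that assumption. It is not necessarily how this benchmark handles the problem, and the helper name and its parameters are hypothetical.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_isolated(cmd, input_path, output_name):
    """Run cmd inside a private temporary directory.

    The input file is copied into the directory first, so a tool that
    writes fixed-name scratch files into its current directory cannot
    collide with other tasks running in parallel. Returns the bytes of
    the file the tool is expected to leave behind as output_name.
    """
    with tempfile.TemporaryDirectory() as work:
        work = Path(work)
        shutil.copy(input_path, work / Path(input_path).name)
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(cmd, cwd=work, check=True)
        return (work / output_name).read_bytes()
```

The command passed as cmd would refer to the input by its bare file name, since it runs with the temporary directory as its current directory. The directory and any leftover scratch files are removed when the function returns.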

dlim

Paper: Monzoorul Haque Mohammed, Anirban Dutta, Tungadri Bose, Sudha Chadaram, Sharmila S. Mande (2012) "DELIMINATE - a fast and efficient method for loss-less compression of genomic sequences" Bioinformatics, 28(19), 2527-2529, https://doi.org/10.1093/bioinformatics/bts467

Version tested: Version 1.3c, binary received from authors by email.

Comments: No website - the only way to obtain this compressor is by contacting the authors. Closed source. Not free to use: "Kindly note that this tool is free for academic use. In case you plan to use it commercially, kindly get in touch with [authors]" - email from one of the authors. Creates temporary files in the current directory, which causes crashes when running parallel compression tasks. Has no streaming mode. Relies on "7za" binary.

dnax

Paper: Giovanni Manzini, Marcella Rastero (2004) "A simple and fast DNA compressor" Software - Practice and Experience, 34, 1397-1411, https://doi.org/10.1002/spe.619

Version tested: dnaX 0.1.0, source received from authors by email and built using bundled makefile.

Comments: No website - the only way to obtain this compressor is by contacting the authors. Creates temporary files in the current directory. Has no streaming mode. Does not support FASTA format, N, IUPAC codes, mask, RNA and protein sequences. Memory consumption grows proportionally to data size. Does not support data larger than 2 GB. Crashes on some data (reported to author). dnax-0 (dna0) freezes while decompressing a poly-A repeat of length 100. dnax-1 (dna1) freezes while decompressing a poly-A repeat of length 333. dnax-1 and dnax-2 corrupt data when compressing poly-A repeat of length 333.

dsrc

Paper: Sebastian Deorowicz and Szymon Grabowski (2011) "Compression of DNA sequence reads in FASTQ format" Bioinformatics, 27(6), 860-862, https://doi.org/10.1093/bioinformatics/btr014
Paper: Lukasz Roguski, Sebastian Deorowicz (2014) "DSRC 2—Industry-oriented compression of FASTQ files" Bioinformatics, 30(15), 2213-2215, https://doi.org/10.1093/bioinformatics/btu208
GitHub: https://github.com/refresh-bio/DSRC
Homepage: http://sun.aei.polsl.pl/dsrc/

Version tested: "2.02 @ 30.09.2014", commit 5eda82c, 2015-06-04, built with make -f Makefile.c++11 bin.

Comments: Corrupts data if input contains non-ACGT characters (such as "H") (Issue #24). Crashes on input containing single read (Issue #26). May corrupt data if it contains repetitive sequence (Issue #27).

fastqz

Paper: James K. Bonfield, Matthew V. Mahoney (2013) "Compression of FASTQ and SAM Format Sequencing Data" PLoS ONE, 8(3), e59190, https://doi.org/10.1371/journal.pone.0059190
GitHub: https://github.com/fwip/fastqz
Homepage: http://mattmahoney.net/dc/fastqz/

Version tested: 1.5, 2012-03-15, obtained from GitHub mirror, commit 39b2bbc, built after changing -lpthread to -pthread in Makefile.

Comments: Compressed format is not a single file. It's tuned to a specific distribution of qualities.

fqs

Paper: Sebastian Deorowicz (2020) "FQSqueezer: k-mer-based compression of sequencing data" Scientific Reports, 10, 578, https://doi.org/10.1038/s41598-020-57452-6
GitHub: https://github.com/refresh-bio/fqsqueezer

"We present FQSqueezer, a novel compression algorithm for sequencing data able to process single- and paired-end reads of variable lengths. It is based on the ideas from the famous prediction by partial matching and dynamic Markov coder algorithms known from the general-purpose-compressors world."

Version tested: FQSqueezer 0.1, commit 5741fc5, 2019-05-17.

fqzcomp

Paper: James K. Bonfield, Matthew V. Mahoney (2013) "Compression of FASTQ and SAM Format Sequencing Data" PLoS ONE, 8(3), e59190, https://doi.org/10.1371/journal.pone.0059190
GitHub: https://github.com/jkbonfield/fqzcomp

Version tested: 4.6, commit 96f2f61, 2019-12-02.

Comments: fqzcomp -s9 crashes during compression (Issue #2).

geco

Paper: Diogo Pratas, Armando J. Pinho, Paulo J. S. G. Ferreira (2016) "Efficient compression of genomic sequences" Data Compression Conference 2016 (DCC-2016), Snowbird, UT, 231-240.
GitHub: https://github.com/cobilab/geco
Homepage: http://bioinformatics.ua.pt/software/geco/

Version tested: GeCo v.2.1, 2016-12-24, built from source, commit 5569304.

Comments: Compresses only DNA sequence. Does not support FASTA format, N, IUPAC codes, mask.
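
A compressor with these limitations can still be tested on FASTA data if a wrapper separates out what the compressor cannot handle. The Python sketch below shows one way such a split could work for a single sequence: the case mask and the non-ACGT symbols go into side streams, leaving an uppercase ACGT-only string. This is an assumed design for illustration, with hypothetical function names, and not a description of the wrapper actually used in the benchmark.

```python
def split_streams(seq):
    """Split a sequence into (acgt_only, mask_runs, exceptions).

    mask_runs holds alternating run lengths, starting with an uppercase
    run (possibly of length 0). exceptions holds (position, symbol) for
    every symbol that is not A, C, G or T after uppercasing.
    """
    core, mask_runs, exceptions = [], [], []
    in_lower, run = False, 0
    for pos, ch in enumerate(seq):
        if ch.islower() != in_lower:
            mask_runs.append(run)
            run, in_lower = 0, not in_lower
        run += 1
        up = ch.upper()
        if up in "ACGT":
            core.append(up)
        else:
            exceptions.append((pos, up))
    mask_runs.append(run)
    return "".join(core), mask_runs, exceptions


def join_streams(core, mask_runs, exceptions):
    """Inverse of split_streams: rebuild the original sequence."""
    exc = dict(exceptions)
    bases = iter(core)
    total = len(core) + len(exceptions)
    chars = [exc[p] if p in exc else next(bases) for p in range(total)]
    out, pos, lower = [], 0, False
    for run in mask_runs:
        piece = "".join(chars[pos:pos + run])
        out.append(piece.lower() if lower else piece)
        pos += run
        lower = not lower
    return "".join(out)


original = "ACGTnnnnacgtRYACGT"
assert join_streams(*split_streams(original)) == original
```

Only the first returned stream would be passed to the DNA-only compressor. Sequence names and line lengths would need their own streams in the same manner.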

geco2

Paper: Diogo Pratas, Morteza Hosseini, Armando J. Pinho (2019) "GeCo2: An Optimized Tool for Lossless Compression and Analysis of DNA Sequences" International Conference on Practical Applications of Computational Biology & Bioinformatics. Springer, Cham, 2019
GitHub: https://github.com/cobilab/geco2

Version tested: GeCo2 v.1.1, 2019-02-02, built from source, commit 062a8c0.

Comments: Compresses only DNA sequence. Does not support FASTA format, N, IUPAC codes, mask.

geco3

Paper: Milton Silva, Diogo Pratas, Armando J. Pinho (2020) "Efficient DNA sequence compression with neural networks" GigaScience, 9(11), giaa119, https://doi.org/10.1093/gigascience/giaa119
GitHub: https://github.com/cobilab/geco3

Version tested: GeCo3 v.1.0, 2020-06-12, built from source, commit a5cc883.

Comments: Compresses only DNA sequence. Does not support FASTA format, N, IUPAC codes, mask.

gtz

Paper: Yuting Xing, Gen Li, Zhenguo Wang, Bolun Feng, Zhuo Song, Chengkun Wu (2017) "GTZ: a fast compression and cloud transmission tool optimized for FASTQ files" BMC Bioinformatics, 18(Suppl 16), 549, https://doi.org/10.1186/s12859-017-1973-5
GitHub: https://github.com/Genetalks/gtz

Version tested: GTX.Zip PROFESSIONAL-2.1.3-V-2020-03-18 07:11:20, binary from https://gtz.io/gtz_latest.run

Comments: Does not accept data from standard input during compression. Refuses to compress if output file name does not end with ".gtz". According to EULA it phones home. GTX.Zip Professional expires and stops working 6 months after installation, rendering it useless for reproducible experiments (Issue #20). Installation script modifies user's .bashrc file without asking or notifying the user (Issue #19). Closed source and non-free.

harc

Paper: Shubham Chandak, Kedar Tatwawadi, Tsachy Weissman (2018) "Compression of genomic sequencing reads via hash-based reordering: algorithm and analysis" Bioinformatics, 34(4), 558-567, https://doi.org/10.1093/bioinformatics/btx639
GitHub: https://github.com/shubhamchandak94/HARC

Version tested: HARC commit cf35caf, 2019-10-04, built from source.

Comments: Has to be run from its own source directory. Recompiles itself on every run, which would be problematic when running multiple harc compression tasks in parallel. Uses 7z and bsc binaries. GitHub repository has no license.

jarvis

Paper: Diogo Pratas, Morteza Hosseini, Jorge M. Silva, Armando J. Pinho (2019) "A Reference-Free Lossless Compression Algorithm for DNA Sequences Using a Competitive Prediction of Two Classes of Weighted Models" Entropy 2019, 21, 1074, https://doi.org/10.3390/e21111074
GitHub: https://github.com/cobilab/jarvis

JARVIS appears to be a further development of GeCo and GeCo2.

Version tested: JARVIS v.1.1, 2019-04-30, built from source, commit d7daef5

Comments: Compresses only DNA sequence. Does not support FASTA format, N, IUPAC codes, mask.

kic

Paper: Yeting Zhang, Khyati Patel, Tony Endrawis, Autumn Bowers, Yazhou Sun (2016) "A FASTQ compressor based on integer-mapped k-mer indexing for biologist" Gene, 579(1), 75-81, https://doi.org/10.1016/j.gene.2015.12.053
Homepage: http://www.ysunlab.org/kic.jsp

Version tested: KIC 0.2, 2015-11-25, binary from homepage: http://www.ysunlab.org/dist/kic.V0.2.zip.

Comments: Closed source. Uses 4 cores by default; trying to change to 1 core with "-n 1" always crashes.

leon

Paper: Gaetan Benoit, Claire Lemaitre, Dominique Lavenier, Erwan Drezen, Thibault Dayris, Raluca Uricaru, Guillaume Rizk (2015) "Reference-free compression of high throughput sequencing data with a probabilistic de Bruijn graph" BMC Bioinformatics, 16, 288, https://doi.org/10.1186/s12859-015-0709-7
GitHub: https://github.com/GATB/leon
Homepage: http://gatb.inria.fr/software/leon/

Version tested: Leon 1.0.0, 2016-02-27, Linux binary from GitHub: https://github.com/GATB/leon/releases.

Comments: Slows down massively as read length increases. Does not support IUPAC codes. Has no streaming mode. Crashes when compressing sequence with 1-character name (Issue #6). Uses current directory for temporary files (Issue #7). Generates broken read numbers when decompressing archive with no headers (Issue #8). Crashes on some large data (Issue #9). Does not allow specifying output file name.

lfastqc

Paper: Sultan Al Yami, Chun-Hsi Huang (2019) "LFastqC: A lossless non-reference-based FASTQ compressor" PLoS One, 14(11), e0224806, https://doi.org/10.1371/journal.pone.0224806
GitHub: https://github.uconn.edu/sya12005/LFastqC

Version tested: LFastqC commit 60e5fda, 2019-02-28, with necessary fixes.

Comments: Works only when executed from its directory. Expects input in its directory. Uses current directory for temporary files, instead of TMPDIR. Fails during tar step. Expects sequence names to have identical length. During decompression it attempts to read incomplete sequence data while it's still being written by MFCompressD. Uses " | grep Hello" to silence console output of compressors that it uses. Not free since it depends on non-free MFCompress. Compression fails on 2.76 GB Helicobacter dataset and on larger data.

lfqc

Paper: Marius Nicolae, Sudipta Pathak, Sanguthevar Rajasekaran (2015) "LFQC: a lossless compression algorithm for FASTQ files" Bioinformatics, 31(20), 3276-3281, https://doi.org/10.1093/bioinformatics/btv384
GitHub: https://github.com/mariusmni/lfqc

Version tested: LFQC commit 59f56e0, 2016-01-06, with added fix from Issue #4. Also, parallel processing of names, sequences and qualities in lfqc.rb is changed to sequential to fix compression failures.

Comments: Corrupts data with irregularly formatted read names (Issue #5). Critical Issue #4 is closed but not fixed. Compression fails due to race condition (fixed by disabling parallel compression of names, sequences and qualities). Has to be run from its source directory. Uses zpaq with 4 threads, with no option to disable multithreading.

mfc

Paper: Armando J. Pinho, Diogo Pratas (2014) "MFCompress: a compression tool for FASTA and multi-FASTA data" Bioinformatics, 30(1), 117-118, https://doi.org/10.1093/bioinformatics/btt594
Homepage: http://bioinformatics.ua.pt/software/mfcompress/

Version tested: MFCompress 1.01, 2013-09-03, 64-bit Linux binary from homepage: http://bioinformatics.ua.pt/software/mfcompress/.

Comments: Supports only DNA data. Has no streaming mode. Not free to use: "available for non-commercial use. For other uses, please send an email to [author's email]" - homepage.

minicom

Paper: Yuansheng Liu, Zuguo Yu, Marcel E. Dinger, Jinyan Li (2019) "Index suffix-prefix overlaps by (w, k)-minimizer to generate long contigs for reads compression" Bioinformatics, 35(12), 2066-2074, https://doi.org/10.1093/bioinformatics/bty936
GitHub: https://github.com/yuansliu/minicom

"Minicom is a tool for compressing short reads in FASTQ. The minicom program is written in C++11 and works on Linux. It is availble under an open-source license."

The main minicom program is a shell script. It calls other tools, including bsc, 7z, md5sum, head, cp, mv, mkdir, tar, rm, make, as well as their own C++ code, which is recompiled on every run (this is where "make" comes in).

Version tested: commit 2360dd9, 2019-09-09.

Comments: Does not reproduce the original FASTQ file during decompression, only the sequence. Corrupts data with 5.6 GB input (Issue #3). Recompiles its C++ code for every run. Has to be run from within its directory. Automatically names output files. Has no streaming mode.

naf

Paper: Kirill Kryukov, Mahoko Takahashi Ueda, So Nakagawa, Tadashi Imanishi (2019) "Nucleotide Archival Format (NAF) enables efficient lossless reference-free compression of DNA sequences" Bioinformatics, 35(19), 3826-3828, https://doi.org/10.1093/bioinformatics/btz144
GitHub: https://github.com/KirillKryukov/naf
Homepage: http://kirill-kryukov.com/study/naf/

Version tested: 1.3.0, 2021-05-17, built from source, obtained from GitHub: https://github.com/KirillKryukov/naf.

nuht

Paper: Sultan Alyami, Chun-Hsi Huang (2019) "Nongreedy Unbalanced Huffman Tree Compressor for Single and Multifasta Files" Journal of Computational Biology, 26(0), 1-9, https://doi.org/10.1089/cmb.2019.0249
Homepage: https://github.uconn.edu/sya12005/Non-Greedy-Unbalanced-Huffman-Tree-Compressor-for-single-and-multi-fasta-files

Version tested: commit 08a42a8, 2018-09-26, Linux binary.

Comments: Paper is not open access. Closed source. Uses 30 times as much memory as the input size. Auto-names output files.

pfish

GitHub: https://github.com/alexholehouse/pufferfish

Version tested: Pufferfish v.1.0 alpha, 2012-04-11, built from source, commit f1ddc4a.

Comments: Does not support FASTA format. Leaks memory during decompression (Issue #2). Fails on large data, such as 33 GB salamander genome (Issue #1).

quip

Paper: Daniel C. Jones, Walter L. Ruzzo, Xinxia Peng, Michael G. Katze (2012) "Compression of next-generation sequencing reads aided by highly efficient de novo assembly" Nucleic Acids Research, 40(22), e171, https://doi.org/10.1093/nar/gks754
GitHub: https://github.com/dcjones/quip
Homepage: https://homes.cs.washington.edu/~dcjones/quip/

Version tested: Quip 1.1.8-8-g9165bb5, 2017-12-17, built from source, commit 9165bb5. Only compression without assembly is tested here.

Comments: Does not support non-standard sequence characters. Crashes during decompression if compressed file name does not end with ".quip", also if the compressed file is not in current directory.

spring

Paper: Shubham Chandak, Kedar Tatwawadi, Idoia Ochoa, Mikel Hernaez, Tsachy Weissman (2019) "SPRING: a next-generation compressor for FASTQ data" Bioinformatics, 35(15), 2674-2676, https://doi.org/10.1093/bioinformatics/bty1015
GitHub: https://github.com/shubhamchandak94/Spring

Version tested: SPRING commit 6536b1b, 2019-11-28, built from source.

Comments: Paper is not open access. GitHub repository has no license.

uht

Paper: Anas Al-Okaily, Badar Almarri, Sultan Al Yami, Chun-Hsi Huang (2017) "Toward a Better Compression for DNA Sequences Using Huffman Encoding" Journal of Computational Biology, 24(4), 280-288, https://doi.org/10.1089/cmb.2016.0151
GitHub: https://github.com/aalokaily/Unbalanced-Huffman-Tree

Version tested: UHT binaries from 2016-12-27, downloaded from GitHub: https://github.com/aalokaily/Unbalanced-Huffman-Tree.

Comments: Closed source. Does not support masked sequence. Fails on 245 MB dataset and larger datasets.

xm

Paper: Minh Duc Cao, Trevor I. Dix, Lloyd Allison, Chris Mears (2007) "A simple statistical algorithm for biological sequence compression" Data Compression Conference, 2007 (DCC'07), Snowbird, UT, pp.43-52
XM GitHub: https://github.com/mdcao/xm
JAPSA GitHub: https://github.com/mdcao/japsa

Version tested: 3.0, commit 9b9ea57, 2019-01-07.

Comments: May corrupt data (Issue #29).

General-purpose compressors

bcm

GitHub: https://github.com/encode84/bcm
Homepage: http://compressme.net/

Version tested: BCM 1.30, 2018-01-21, commit 24b6017, built with: g++ -o bcm -O3 -march=native -ffast-math -s bcm.cpp divsufsort.c.

Comments: No streaming mode.

brieflz

GitHub: https://github.com/jibsen/brieflz

Version tested: BriefLZ 1.3.0, 2020-02-15, commit 0ab07a5, built from source using: mkdir build; cd build; cmake -DCMAKE_BUILD_TYPE=Release ..; cmake --build . --config Release.

Comments: No streaming mode. Unpredictable compression speed when using "--optimal" setting (Issue #11).

brotli

GitHub: https://github.com/google/brotli

Version tested: 1.0.7, 2018-10-23.

bsc

GitHub: https://github.com/IlyaGrebnov/libbsc
Homepage: http://libbsc.com/

Version tested: 3.1.0, 2016-01-01, commit 3dea347, built from source using bundled makefile.

Comments: No streaming mode.

bzip2

SourceForge: https://sourceforge.net/projects/bzip2/files/
Archive: https://web.archive.org/web/20180801004107/http://www.bzip.org/

Version tested: 1.0.6, 2010-09-06

cmix

GitHub: https://github.com/byronknoll/cmix
Homepage: https://www.byronknoll.com/cmix.html

Version tested: 17, 2019-03-24.

Comments: No streaming mode. Because of compression speed of less than 1 kB/s, it is currently benchmarked only on data smaller than 10 MB.

copy

"Copy" compressors don't compress the data, but make its exact uncompressed duplicate. Such processes tested here include the "cat" command, and the "-0" mode of pigz. They are included for control.
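
The role of a "copy" control can be shown with a small round-trip harness: the same check that is applied to a real compressor is applied to a command that only passes data through. The Python sketch below is an illustration with a hypothetical function name, not the benchmark's own harness. It assumes both commands read standard input and write standard output, so it does not apply to compressors without a streaming mode.

```python
import subprocess


def roundtrip_ok(data, compress_cmd, decompress_cmd):
    """Pipe data through compress_cmd, then decompress_cmd.

    Returns (lossless, packed_size): whether the restored bytes equal
    the input, and how many bytes the compression step produced.
    """
    packed = subprocess.run(
        compress_cmd, input=data, stdout=subprocess.PIPE, check=True
    ).stdout
    restored = subprocess.run(
        decompress_cmd, input=packed, stdout=subprocess.PIPE, check=True
    ).stdout
    return restored == data, len(packed)
```

With ["cat"] given as both commands, the packed size equals the input size and the elapsed time reflects only process start-up and piping, which is the baseline that the control is there to provide.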

gzip

Homepage: https://www.gzip.org/
GNU: https://www.gnu.org/software/gzip/

Version tested: 1.6, 2013-06-09, default install that came with the OS (Ubuntu).

lizard

GitHub: https://github.com/inikep/lizard

Version tested: 1.0.0, commit dda3b33, 2019-03-08.

lz4

GitHub: https://github.com/lz4/lz4
Homepage: https://lz4.github.io/lz4/

Version tested: LZ4 1.9.1, 2019-04-24.

lzop

Homepage: https://www.lzop.org/

Version tested: 1.04, 2017-08-10.

lzturbo

Homepage: https://sites.google.com/site/powturbo/

Version tested: 1.2, 2014-08-11.

Comments: Closed source.

nakamichi

Homepage: http://www.sanmayce.com/Nakamichi/
Homepage: http://www.satanichi.net/

Version tested: Nakamichi 2020-May-09 (archived), built from source, using command: gcc -O3 -static -msse4.1 -fomit-frame-pointer Nakamichi_Ryuugan-ditto-1TB_btree.c -o nakamichi -D_N_XMM -D_N_prefetch_4096 -D_N_alone -DHashInBITS=24 -DHashChunkSizeInBITS=24 -DRAMpoolInKB=5120 -DBtreeHEURISTIC -D_POSIX_ENVIRONMENT_ -DLongestLineInclusive=128 -DSpeedUpBuilding=32 -DLITE.

Comments: Requires a massive amount of memory. Does not support streaming for compression. Fills console with ASCII art and irrelevant texts. Creates multiple log files. Refuses to decompress any files with names not ending with ".Nakamichi". Version 2020-May-09 is already unavailable on both homepages, so I mirror it here in minimal form. Due to slowness it is currently only tested on datasets smaller than 200 MB.

pbzip2

Homepage: https://launchpad.net/pbzip2/

Version tested: 1.1.13, 2015-12-18.

pigz

Homepage: https://zlib.net/pigz/

Version tested: 2.4, 2017-12-26.

snzip

GitHub: https://github.com/kubo/snzip

Based on the Snappy compression library.

Version tested: 1.0.4, 2016-10-02.

xz

Homepage: https://tukaani.org/xz/

Based on the LZMA algorithm.

Version tested: 5.2.2, 2015-09-29.

zpaq

Homepage: http://www.mattmahoney.net/dc/zpaq.html

Version tested: 7.15, 2016-08-17.

Comments: No streaming mode.

zpipe

ZPAQ homepage: http://www.mattmahoney.net/dc/zpaq.html
zpipe documentation: http://mattmahoney.net/dc/zpipedoc.html

Version tested: 2.01, 2010-12-23, built from source (http://mattmahoney.net/dc/zpipe.201) with libzpaq 4.00, 2011-11-13 (http://mattmahoney.net/dc/libzpaq400.zip), built with: g++ -o zpipe -O3 -march=native -ffast-math -s zpipe.cpp libzpaq.cpp.

zstd

GitHub: https://github.com/facebook/zstd
Homepage: https://facebook.github.io/zstd/

Version tested: 1.5.0, 2021-05-15, built from source using bundled makefile.