Sequence Compression Benchmark

Comparing 2 settings of 1 compressor
Compressor | Size (B) | C-Time (s) | D-Time (s) | Data Size (MB) | Dataset Name
lz4-1 | 34,595 | 0.003 | 0.003 | 0.051 | Gordonia phage GAL1 GCF_001884535.1
lz4-1 | 291,466 | 0.005 | 0.004 | 0.522 | WS1 bacterium JGI 0000059-K21 GCA_000398605.1
lz4-1 | 879,699 | 0.013 | 0.007 | 1.712 | Astrammina rara GCA_000211355.2
lz4-1 | 3,307,691 | 0.033 | 0.017 | 5.809 | Nosema ceranae GCA_000988165.1
lz4-1 | 5,270,836 | 0.044 | 0.019 | 9.217 | Cryptosporidium parvum Iowa II GCA_000165345.1
lz4-1 | 7,472,181 | 0.058 | 0.022 | 13.14 | Spironucleus salmonicida GCA_000497125.1
lz4-1 | 13,423,456 | 0.089 | 0.033 | 23.67 | Tieghemostelium lacteum GCA_001606155.1
lz4-1 | 21,188,658 | 0.133 | 0.045 | 36.92 | Fusarium graminearum PH-1 GCF_000240135.3
lz4-1 | 28,505,226 | 0.190 | 0.062 | 56.15 | Salpingoeca rosetta GCA_000188695.1
lz4-1 | 31,576,010 | 0.127 | 0.048 | 67.61 | PDB
lz4-1 | 21,301,037 | 0.111 | 0.043 | 73.24 | Homo sapiens GRCh38 peptides all
lz4-1 | 60,365,902 | 0.357 | 0.106 | 106.4 | Chondrus crispus GCA_000350225.2
lz4-1 | 98,275,713 | 0.241 | 0.084 | 122.4 | NCBI Virus RefSeq Protein
lz4-1 | 135,876,424 | 0.715 | 0.226 | 245.3 | Mitochondrion
lz4-1 | 139,160,535 | 0.589 | 0.170 | 276.6 | UniProtKB Reviewed (Swiss-Prot)
lz4-1 | 124,509,283 | 0.823 | 0.240 | 340.4 | UCSC hg38 7way knownCanonical-exonNuc
lz4-1 | 196,078,670 | 1.138 | 0.292 | 341.0 | Kappaphycus alvarezii GCA_002205965.2
lz4-1 | 205,271,119 | 1.131 | 0.347 | 481.8 | NCBI Virus Complete Nucleotide Human
lz4-1 | 240,852,296 | 1.411 | 0.414 | 610.3 | SILVA 132 LSURef
lz4-1 | 243,200,456 | 1.788 | 0.538 | 968.8 | UCSC hg38 20way knownCanonical-exonNuc
lz4-1 | 513,758,708 | 3.134 | 0.785 | 1,008 | Strongylocentrotus purpuratus GCF_000002235.4
lz4-1 | 507,192,808 | 2.959 | 0.809 | 1,109 | SILVA 132 SSURef Nr99
lz4-1 | 290,916,868 | 1.920 | 0.637 | 1,215 | Influenza
lz4-1 | 1,531,097,737 | 7.730 | 2.287 | 2,756 | Helicobacter
lz4-1 | 1,278,910,482 | 7.681 | 2.154 | 3,282 | SILVA 132 SSURef
lz4-1 | 1,780,025,708 | 10.61 | 2.573 | 3,313 | Homo sapiens GCA_000001405.28
lz4-1 | 6,909,717,305 | 35.56 | 10.33 | 13,409 | Picea abies GCA_900067695.1
lz4-2 | 34,595 | 0.003 | 0.003 | 0.051 | Gordonia phage GAL1 GCF_001884535.1
lz4-2 | 291,466 | 0.007 | 0.004 | 0.522 | WS1 bacterium JGI 0000059-K21 GCA_000398605.1
lz4-2 | 879,699 | 0.015 | 0.007 | 1.712 | Astrammina rara GCA_000211355.2
lz4-2 | 3,307,691 | 0.031 | 0.014 | 5.809 | Nosema ceranae GCA_000988165.1
lz4-2 | 5,270,836 | 0.044 | 0.019 | 9.217 | Cryptosporidium parvum Iowa II GCA_000165345.1
lz4-2 | 7,472,181 | 0.058 | 0.022 | 13.14 | Spironucleus salmonicida GCA_000497125.1
lz4-2 | 13,423,456 | 0.095 | 0.033 | 23.67 | Tieghemostelium lacteum GCA_001606155.1
lz4-2 | 21,188,658 | 0.140 | 0.046 | 36.92 | Fusarium graminearum PH-1 GCF_000240135.3
lz4-2 | 28,505,226 | 0.198 | 0.061 | 56.15 | Salpingoeca rosetta GCA_000188695.1
lz4-2 | 31,576,010 | 0.134 | 0.046 | 67.61 | PDB
lz4-2 | 21,301,037 | 0.122 | 0.043 | 73.24 | Homo sapiens GRCh38 peptides all
lz4-2 | 60,365,902 | 0.361 | 0.107 | 106.4 | Chondrus crispus GCA_000350225.2
lz4-2 | 98,275,713 | 0.247 | 0.084 | 122.4 | NCBI Virus RefSeq Protein
lz4-2 | 135,876,424 | 0.717 | 0.223 | 245.3 | Mitochondrion
lz4-2 | 139,160,535 | 0.593 | 0.172 | 276.6 | UniProtKB Reviewed (Swiss-Prot)
lz4-2 | 124,509,283 | 0.823 | 0.235 | 340.4 | UCSC hg38 7way knownCanonical-exonNuc
lz4-2 | 196,078,670 | 1.138 | 0.302 | 341.0 | Kappaphycus alvarezii GCA_002205965.2
lz4-2 | 205,271,119 | 1.132 | 0.346 | 481.8 | NCBI Virus Complete Nucleotide Human
lz4-2 | 240,852,296 | 1.412 | 0.416 | 610.3 | SILVA 132 LSURef
lz4-2 | 243,200,456 | 1.800 | 0.532 | 968.8 | UCSC hg38 20way knownCanonical-exonNuc
lz4-2 | 513,758,708 | 3.143 | 0.799 | 1,008 | Strongylocentrotus purpuratus GCF_000002235.4
lz4-2 | 507,192,808 | 2.976 | 0.808 | 1,109 | SILVA 132 SSURef Nr99
lz4-2 | 290,916,868 | 1.919 | 0.633 | 1,215 | Influenza
lz4-2 | 1,531,097,737 | 7.797 | 2.275 | 2,756 | Helicobacter
lz4-2 | 1,278,910,482 | 7.641 | 2.127 | 3,282 | SILVA 132 SSURef
lz4-2 | 1,780,025,708 | 10.41 | 2.584 | 3,313 | Homo sapiens GCA_000001405.28
lz4-2 | 6,909,717,305 | 35.39 | 10.84 | 13,409 | Picea abies GCA_900067695.1
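
Derived measures such as compression ratio and speed are not listed in the table but follow from its columns. The sketch below is illustrative only: the Row class is a hypothetical helper, and it assumes that "MB" in the Data Size column means 10**6 bytes and that Size is the compressed size in bytes. Because Data Size is rounded to four significant digits, the derived values are approximate.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One row of the results table (hypothetical helper, not part of the benchmark)."""
    compressor: str
    size_b: int      # compressed size in bytes
    c_time_s: float  # compression time in seconds
    d_time_s: float  # decompression time in seconds
    data_mb: float   # uncompressed dataset size in MB (assumed 10**6 bytes)
    dataset: str

    @property
    def ratio(self) -> float:
        """Uncompressed size divided by compressed size."""
        return self.data_mb * 1e6 / self.size_b

    @property
    def c_speed_mb_s(self) -> float:
        """Compression speed in MB of uncompressed input per second."""
        return self.data_mb / self.c_time_s

    @property
    def d_speed_mb_s(self) -> float:
        """Decompression speed in MB of uncompressed output per second."""
        return self.data_mb / self.d_time_s


# Values copied from the lz4-1 row for Homo sapiens GCA_000001405.28.
human = Row("lz4-1", 1_780_025_708, 10.61, 2.573, 3313, "Homo sapiens GCA_000001405.28")
print(f"ratio {human.ratio:.3f}, "
      f"compression {human.c_speed_mb_s:.0f} MB/s, "
      f"decompression {human.d_speed_mb_s:.0f} MB/s")
```

For this row lz4-1 shrinks the human genome by a factor of roughly 1.86, compressing at about 312 MB/s and decompressing at about 1,288 MB/s.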

Step 1. Select test data

Genomes (less repetitive)
Other datasets (more repetitive)
Aggregate results from multiple datasets using: sum / average
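
The choice between "sum" and "average" can change a compressor's aggregate score. The sketch below assumes, without confirmation from the page, that "sum" totals sizes over the selected datasets before computing one ratio, while "average" takes the mean of the per-dataset ratios. The function names are hypothetical, and the data are the first three lz4-1 rows of the table above.

```python
from statistics import mean

# (compressed size in bytes, uncompressed size in MB) for lz4-1,
# copied from the first three rows of the table above.
rows = [
    (34_595, 0.051),   # Gordonia phage GAL1
    (291_466, 0.522),  # WS1 bacterium JGI 0000059-K21
    (879_699, 1.712),  # Astrammina rara
]


def ratio(size_b: float, data_mb: float) -> float:
    """Compression ratio, assuming 1 MB = 10**6 bytes."""
    return data_mb * 1e6 / size_b


def aggregate_sum(rows) -> float:
    """Assumed 'sum' mode: total the sizes first, then take a single ratio."""
    total_size = sum(size for size, _ in rows)
    total_data = sum(data for _, data in rows)
    return ratio(total_size, total_data)


def aggregate_average(rows) -> float:
    """Assumed 'average' mode: mean of the per-dataset ratios."""
    return mean(ratio(size, data) for size, data in rows)


print(f"sum: {aggregate_sum(rows):.3f}, average: {aggregate_average(rows):.3f}")
```

Under these assumptions the sum-based ratio (about 1.895) exceeds the average (about 1.737): summing weights each dataset by its size, and here the largest dataset, Astrammina rara, also compresses best. Averaging gives every dataset equal weight.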

Step 2. Select compressors to compare

Compare:
Sequence compressors
General-purpose compressors
Copy (no compression)
Wrappers
Include compressors
Use results from tests
Only best setting(s) in terms of
Sort by
Reverse sort order
Show only top entries
Link speed: Mbit/s (for estimating transfer time)
Show all values relative to
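
The link speed setting suggests an estimate of how long it takes to move compressed data over a network. The sketch below is a hedged illustration, not the benchmark's code: it assumes transfer time is the compressed size in bits divided by the link speed, that the end-to-end time is compression + transfer + decompression, and that "relative to" divides each value by that of a chosen reference entry. The function names and the 100 Mbit/s example speed are hypothetical.

```python
def transfer_time_s(size_b: int, link_mbit_s: float) -> float:
    """Seconds to send size_b bytes over a link of link_mbit_s megabits per second."""
    return size_b * 8 / (link_mbit_s * 1e6)


def total_time_s(size_b: int, c_time_s: float, d_time_s: float,
                 link_mbit_s: float) -> float:
    """Assumed end-to-end time: compress, transfer the compressed file, decompress."""
    return c_time_s + transfer_time_s(size_b, link_mbit_s) + d_time_s


def relative(values: dict[str, float], reference: str) -> dict[str, float]:
    """Express every value as a multiple of the reference entry's value."""
    base = values[reference]
    return {name: value / base for name, value in values.items()}


# Homo sapiens GCA_000001405.28 rows from the table above;
# 100 Mbit/s is an arbitrary example link speed.
SIZE_B = 1_780_025_708
t1 = total_time_s(SIZE_B, 10.61, 2.573, link_mbit_s=100)
t2 = total_time_s(SIZE_B, 10.41, 2.584, link_mbit_s=100)
print(relative({"lz4-1": t1, "lz4-2": t2}, reference="lz4-1"))
```

At this speed the transfer term (about 142 s) dominates. Since lz4-1 and lz4-2 produce identical compressed sizes for every dataset in the table, their end-to-end times differ only through C-Time and D-Time.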

Select individual compressors:
Select individual compressor settings:
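
The "Only best setting(s) in terms of", "Sort by", "Reverse sort order" and "Show only top entries" options in Step 2 amount to filtering and ranking the per-setting results. The sketch below is a minimal illustration under stated assumptions: setting names have the form "<compressor>-<level>", lower values are better (true for sizes and times), and the helper names are hypothetical. The data are the Picea abies rows of the table above.

```python
def compressor_of(setting: str) -> str:
    """'lz4-1' -> 'lz4'; assumes the level follows the last hyphen."""
    return setting.rsplit("-", 1)[0]


def best_settings(results: dict[str, dict[str, float]], metric: str) -> list[str]:
    """For each compressor, keep the setting with the lowest value of metric."""
    best: dict[str, str] = {}
    for setting, measures in results.items():
        comp = compressor_of(setting)
        if comp not in best or measures[metric] < results[best[comp]][metric]:
            best[comp] = setting
    return sorted(best.values())


def top_entries(results: dict[str, dict[str, float]], sort_by: str, n: int,
                reverse: bool = False) -> list[str]:
    """Sort settings by one measure (ascending unless reverse) and keep the first n."""
    ranked = sorted(results, key=lambda s: results[s][sort_by], reverse=reverse)
    return ranked[:n]


# Picea abies GCA_900067695.1 rows from the table above (times in seconds).
results = {
    "lz4-1": {"c_time": 35.56, "d_time": 10.33},
    "lz4-2": {"c_time": 35.39, "d_time": 10.84},
}
print(best_settings(results, "c_time"))  # ['lz4-2']
print(best_settings(results, "d_time"))  # ['lz4-1']
```

Which setting counts as "best" depends on the chosen measure: on this dataset lz4-2 compresses slightly faster, while lz4-1 decompresses faster.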

Step 3. Configure output

Table

Column chart

Scatterplot

Columns to show:

Value to plot:
Scale: linear / logarithmic
Chart size: x pixels
Highlight specialized vs general-purpose compressors
X axis:
Fixed range: ..
linear / logarithmic
Y axis:
Fixed range: ..
linear / logarithmic