Sequence Compression Benchmark

wrap-uht has no data for 18 of the selected datasets, so these datasets were removed from the selection:

PDB,
Homo sapiens GRCh38 peptides all,
NCBI Virus RefSeq Protein,
Mitochondrion,
UniProtKB Reviewed (Swiss-Prot),
UCSC hg38 7way knownCanonical-exonNuc,
Kappaphycus alvarezii GCA_002205965.2,
NCBI Virus Complete Nucleotide Human,
SILVA 132 LSURef,
UCSC hg38 20way knownCanonical-exonNuc,
Strongylocentrotus purpuratus GCF_000002235.4,
SILVA 132 SSURef Nr99,
Influenza,
Helicobacter,
NCBI SARS-CoV-2 random-100k,
SILVA 132 SSURef,
Homo sapiens GCA_000001405.28,
Picea abies GCA_900067695.1

Comparing 2 settings of 2 compressors

Step 1. Select test data

Genomes (less repetitive)
Other datasets (more repetitive)
Aggregate results from multiple datasets using: sum / average
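The choice between sum and average can change a ranking, because the two weight datasets differently. The sketch below uses a compression ratio to show the difference. It assumes hypothetical per-dataset (original size, compressed size) pairs. It does not claim to be how the benchmark site computes its aggregates.

```python
from statistics import mean

# Hypothetical per-dataset results: (original_bytes, compressed_bytes).
results = {
    "dataset_a": (1_000_000, 250_000),
    "dataset_b": (4_000_000, 500_000),
}

def ratio_of_sums(res):
    """Compression ratio of all datasets combined: total input / total output."""
    total_in = sum(orig for orig, _ in res.values())
    total_out = sum(comp for _, comp in res.values())
    return total_in / total_out

def mean_of_ratios(res):
    """Unweighted average of the per-dataset compression ratios."""
    return mean(orig / comp for orig, comp in res.values())

print(ratio_of_sums(results))   # 5,000,000 / 750,000, about 6.67
print(mean_of_ratios(results))  # (4.0 + 8.0) / 2 = 6.0
```

Summing lets large datasets dominate the result. Averaging gives every dataset equal weight regardless of its size.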

Step 2. Select compressors to compare

Compare:
Sequence compressors
General-purpose compressors
Copy (no compression)
Wrappers
Include compressors
Include compressors
Use results from tests
Only best setting(s) in terms of
Sort by
Reverse sort order
Show only top entries
Link speed: Mbit/s (for estimating transfer time)
Show all values relative to
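The link-speed and "relative to" options involve simple arithmetic, shown in the sketch below. It makes two assumptions that the page does not confirm. First, 1 Mbit/s is taken as 10^6 bits per second. Second, the end-to-end time is taken as compression time plus transfer time plus decompression time. The function names are hypothetical.

```python
def transfer_seconds(size_bytes: int, link_mbit_per_s: float) -> float:
    """Seconds needed to send size_bytes over a link of link_mbit_per_s (1 Mbit = 10**6 bits, assumed)."""
    return size_bytes * 8 / (link_mbit_per_s * 1_000_000)

def end_to_end_seconds(compressed_bytes: int, link_mbit_per_s: float,
                       compress_s: float, decompress_s: float) -> float:
    """Assumed model: compress, send the compressed bytes, then decompress."""
    return compress_s + transfer_seconds(compressed_bytes, link_mbit_per_s) + decompress_s

def relative_to(values: dict, reference: str) -> dict:
    """Divide every value by the value of the chosen reference compressor."""
    base = values[reference]
    return {name: v / base for name, v in values.items()}

# 125 MB over a 100 Mbit/s link takes 10 s; adding 2.0 s + 1.5 s gives 13.5 s.
print(transfer_seconds(125_000_000, 100))
print(end_to_end_seconds(125_000_000, 100, 2.0, 1.5))
print(relative_to({"compressor_a": 50.0, "compressor_b": 100.0}, "compressor_b"))
```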

Select individual compressors:
Select individual compressor settings:

Step 3. Configure output

Table

Column chart

Scatterplot

Columns to show:

Value to plot:
Scale: linear / logarithmic
Chart size: x pixels
Highlight specialized vs general-purpose compressors
X axis:
Fixed range: ..
linear / logarithmic
Y axis:
Fixed range: ..
linear / logarithmic