bsc-m03 is an experimental block-sorting compressor based on the M03 context-aware compression algorithm invented by Michael Maniscalco:
- Michael Maniscalco, "M03: A solution for context based blocksort (BWT) compression", 2004
- Jurgen Abel, "Post BWT stages of the Burrows-Wheeler compression algorithm", 2010
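To make "block sorting" concrete, here is a minimal sketch of the Burrows-Wheeler transform (BWT) and its inverse. It is a naive illustration that sorts all rotations of the block, and it is not how bsc-m03 computes the transform; the function names `bwt` and `inverse_bwt` are hypothetical and chosen only for this example.

```python
def bwt(data):
    """Naive BWT: sort all rotations, return (last column, primary index)."""
    n = len(data)
    if n == 0:
        return b"", 0
    # Sort rotation start positions by the rotation each one denotes.
    order = sorted(range(n), key=lambda i: data[i:] + data[:i])
    # The last column holds the byte that precedes each sorted rotation.
    last = bytes(data[(i - 1) % n] for i in order)
    # The primary index is the row that holds the original block.
    return last, order.index(0)


def inverse_bwt(last, primary):
    """Rebuild the original block from the last column and primary index."""
    n = len(last)
    if n == 0:
        return b""
    # A stable sort of last-column positions by byte value links each row
    # of the first column to the matching row of the last column.
    nxt = sorted(range(n), key=lambda i: last[i])
    out = bytearray()
    row = primary
    for _ in range(n):
        row = nxt[row]
        out.append(last[row])
    return bytes(out)


transformed, primary = bwt(b"banana")
print(transformed, primary)               # b'nnbaaa' 3
print(inverse_bwt(transformed, primary))  # b'banana'
```

The transform groups bytes that share a following context, which is what context-aware post-BWT stages such as M03 then model. Production compressors build the same output from a suffix array rather than from explicit rotations.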
The bsc-m03 compressor is also a practical implementation of Compression via Substring Enumeration for byte-oriented sources:
- Danny Dube, Vincent Beaudoin, "Lossless Data Compression via Substring Enumeration", 2010
- Takahiro Ota, Hiroyoshi Morita, Akiko Manada, "Compression by Substring Enumeration with a Finite Alphabet Using Sorting", 2018
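As a rough illustration of the idea behind substring enumeration, the sketch below counts substrings of a block read as a circular string and checks the consistency identity that such coders rely on: the count of a substring `w` equals the sum of the counts of its one-byte left extensions and also the sum of the counts of its one-byte right extensions. This is a simplified model under that assumption, not the bsc-m03 coder, and the helper names `circular_counts` and `counts_are_consistent` are hypothetical.

```python
from collections import Counter


def circular_counts(data, k):
    """Count every length-k substring of `data` read as a circular string."""
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(data)")
    # Append the first k-1 bytes so substrings can wrap around the end.
    wrapped = data + data[:k - 1]
    return Counter(wrapped[i:i + k] for i in range(n))


def counts_are_consistent(data, k):
    """Check C(w) == sum_a C(a+w) == sum_b C(w+b) for all w of length k."""
    shorter = circular_counts(data, k)
    longer = circular_counts(data, k + 1)
    by_suffix = Counter()  # sums over left extensions a+w
    by_prefix = Counter()  # sums over right extensions w+b
    for s, c in longer.items():
        by_suffix[s[1:]] += c
        by_prefix[s[:-1]] += c
    return by_suffix == shorter and by_prefix == shorter


print(circular_counts(b"abab", 2))  # each of b'ab' and b'ba' occurs twice
print(counts_are_consistent(b"abracadabra", 3))  # True
```

Because the counts of longer substrings are constrained by the counts of shorter ones, an encoder only needs to transmit the information that these constraints leave undetermined.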
Copyright (c) 2021-2024 Ilya Grebnov
bsc-m03 is released under the GNU General Public License.
- 2023-05-08 : Version 0.5.5
  - Fixed segmentation fault on Unix based systems.
- 2022-11-27 : Version 0.5.0
  - Compression ratio improvements.
- 2022-11-20 : Version 0.4.0
  - Compression ratio improvements.
- 2022-11-10 : Version 0.3.0
  - Compression ratio improvements.
- 2022-01-08 : Version 0.2.1
  - Performance improvements.
- 2022-01-05 : Version 0.2
  - Memory usage improvements.
  - Compression ratio improvements.
- 2021-12-07 : Version 0.1.1 - 0.1.2
  - Minor compression ratio improvements.
- 2021-12-03 : Version 0.1.0
  - Initial public release of the bsc-m03.
| File name | Input size (bytes) | Output size (bytes) | Bits per symbol |
|---|---|---|---|
| bib | 111261 | 24479 | 1.760 |
| book1 | 768771 | 203745 | 2.120 |
| book2 | 610856 | 138870 | 1.819 |
| geo | 102400 | 52465 | 4.099 |
| news | 377109 | 105621 | 2.241 |
| obj1 | 21504 | 9775 | 3.637 |
| obj2 | 246814 | 68003 | 2.204 |
| paper1 | 53161 | 14957 | 2.251 |
| paper2 | 82199 | 22594 | 2.199 |
| pic | 513216 | 44424 | 0.692 |
| progc | 39611 | 11257 | 2.274 |
| progl | 71646 | 13512 | 1.509 |
| progp | 49379 | 9248 | 1.498 |
| trans | 93695 | 15310 | 1.307 |
| File name | Input size (bytes) | Output size (bytes) | Bits per symbol |
|---|---|---|---|
| alice29.txt | 152089 | 38562 | 2.028 |
| asyoulik.txt | 125179 | 35889 | 2.294 |
| cp.html | 24603 | 6872 | 2.235 |
| fields.c | 11150 | 2685 | 1.926 |
| grammar.lsp | 3721 | 1120 | 2.408 |
| kennedy.xls | 1029744 | 57440 | 0.446 |
| lcet10.txt | 426754 | 94823 | 1.778 |
| plrabn12.txt | 481861 | 129770 | 2.154 |
| ptt5 | 513216 | 44424 | 0.692 |
| sum | 38240 | 11426 | 2.390 |
| xargs.1 | 4227 | 1585 | 3.000 |
| File name | Input size (bytes) | Output size (bytes) | Bits per symbol |
|---|---|---|---|
| bible.txt | 4047392 | 698395 | 1.380 |
| E.coli | 4638690 | 1126125 | 1.942 |
| world192.txt | 2473400 | 376173 | 1.217 |
| File name | Input size (bytes) | Output size (bytes) | Bits per symbol |
|---|---|---|---|
| dickens | 10192446 | 2199344 | 1.726 |
| mozilla | 51220480 | 15589159 | 2.435 |
| mr | 9970564 | 2156826 | 1.731 |
| nci | 33553445 | 1126386 | 0.269 |
| ooffice | 6152192 | 2503991 | 3.256 |
| osdb | 10085684 | 2223002 | 1.763 |
| reymont | 6627202 | 958772 | 1.157 |
| samba | 21606400 | 3794300 | 1.405 |
| sao | 7251944 | 4649723 | 5.129 |
| webster | 41458703 | 6253627 | 1.207 |
| xml | 5345280 | 357958 | 0.536 |
| x-ray | 8474240 | 3681388 | 3.475 |
| File name | Input size (bytes) | Output size (bytes) | Bits per symbol |
|---|---|---|---|
| chr22.dna | 34553758 | 7206269 | 1.668 |
| etext99 | 105277340 | 21422251 | 1.628 |
| gcc-3.0.tar | 86630400 | 10046880 | 0.928 |
| howto | 39422105 | 7504315 | 1.523 |
| jdk13c | 69728899 | 2612434 | 0.300 |
| linux-2.4.5.tar | 116254720 | 16351863 | 1.125 |
| rctail96 | 114711151 | 9707347 | 0.677 |
| rfc | 116421901 | 14871775 | 1.022 |
| sprot34.dat | 109617186 | 17157222 | 1.252 |
| w3c2 | 104201579 | 5598687 | 0.430 |
| File name | Input size (bytes) | Output size (bytes) | Bits per symbol |
|---|---|---|---|
| A10.jpg | 842468 | 823533 | 7.820 |
| AcroRd32.exe | 3870784 | 1555832 | 3.216 |
| english.dic | 465211 | 145096 | 2.495 |
| FlashMX.pdf | 4526946 | 3712716 | 6.561 |
| FP.LOG | 20617071 | 502648 | 0.195 |
| MSO97.DLL | 3782416 | 1878076 | 3.972 |
| ohs.doc | 4168192 | 803171 | 1.542 |
| rafale.bmp | 4149414 | 745470 | 1.437 |
| vcfiu.hlp | 4121418 | 604165 | 1.173 |
| world95.txt | 2988578 | 442271 | 1.184 |
| File name | Input size (bytes) | Output size (bytes) | Bits per symbol |
|---|---|---|---|
| enwik8 | 100000000 | 20263925 | 1.621 |
| enwik9 | 1000000000 | 160018905 | 1.280 |
| File name | Input size (bytes) | Output size (bytes) | Bits per symbol |
|---|---|---|---|
| dblp.xml | 296135874 | 21926695 | 0.592 |
| dna | 403927746 | 86414423 | 1.711 |
| english.1024MB | 1073741824 | 193810792 | 1.444 |
| pitches | 55832855 | 16984071 | 2.434 |
| proteins | 1184051855 | 304486803 | 2.057 |
| sources | 210866607 | 29749020 | 1.129 |
| File name | Input size (bytes) | Output size (bytes) | Bits per symbol |
|---|---|---|---|
| cere | 461286644 | 8576879 | 0.149 |
| coreutils | 205281778 | 4293243 | 0.167 |
| einstein.de.txt | 92758441 | 132286 | 0.011 |
| einstein.en.txt | 467626544 | 336029 | 0.006 |
| Escherichia_Coli | 112689515 | 7928044 | 0.563 |
| influenza | 154808555 | 1760692 | 0.091 |
| kernel | 257961616 | 2955825 | 0.092 |
| para | 429265758 | 10730998 | 0.200 |
| world_leaders | 46968181 | 518220 | 0.088 |
| fib41 | 267914296 | 83 | 0.000 |
| rs.13 | 216747218 | 86 | 0.000 |
| tm29 | 268435456 | 158 | 0.000 |
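
The "Bits per symbol" column in the tables above can be reproduced from the two size columns. The sketch below assumes that a symbol is one input byte, which matches the listed values; the function name `bits_per_symbol` is hypothetical.

```python
def bits_per_symbol(input_size, output_size):
    """Average number of output bits spent per input byte."""
    return 8 * output_size / input_size


# Rows taken from the tables above: (file name, input bytes, output bytes).
rows = [
    ("bib", 111261, 24479),
    ("kennedy.xls", 1029744, 57440),
    ("enwik9", 1000000000, 160018905),
]
for name, input_size, output_size in rows:
    print(f"{name}: {bits_per_symbol(input_size, output_size):.3f}")
# bib: 1.760
# kennedy.xls: 0.446
# enwik9: 1.280
```

Values that print as 0.000, such as for fib41, are not exactly zero; they are simply smaller than the three decimal places shown.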