Libdeflate port #25


Open · wants to merge 18 commits into master
Conversation

@springmeyer springmeyer commented Apr 3, 2018

This ports gzip-hpp to use libdeflate. It is experimental for now: the goals are to test in production and to investigate the potential speed improvements of using libdeflate.

If this proves viable (fast and robust), this is exciting because:

  • currently node.js statically links zlib
  • this forces all node c++ addons (that get dynamically loaded into node) to use ABI compatible zlib (otherwise weird things can happen like symbol lookup error node-zipfile#63)
  • because libdeflate is a totally different API, if we use it, we could drop using zlib and then the problem of having node.js force a specific zlib version on us goes away!

Anyway, here are the first promising numbers. Locally I see (with this PR; smaller numbers are faster):

Run on (4 X 3500 MHz CPU s)
2018-04-03 15:43:13
----------------------------------------------------------------------------
Benchmark                                     Time           CPU Iterations
----------------------------------------------------------------------------
BM_compress                             1128545 ns    1118127 ns        661
BM_decompress                            202429 ns     201630 ns       3209
BM_compress_class                        778036 ns     777355 ns        979
BM_compress_class_no_reallocations       755331 ns     754569 ns        932
BM_decompress_class                      173623 ns     173489 ns       4284
BM_decompress_class_no_reallocations     162725 ns     162567 ns       4446

And with master:

Run on (4 X 3500 MHz CPU s)
2018-04-03 15:43:52
----------------------------------------------------------------------------
Benchmark                                     Time           CPU Iterations
----------------------------------------------------------------------------
BM_compress                             2124502 ns    2109067 ns        326
BM_decompress                            321913 ns     317480 ns       2122
BM_compress_class                       2181514 ns    2148765 ns        340
BM_compress_class_no_reallocations      2115537 ns    2102435 ns        329
BM_decompress_class                      317198 ns     315435 ns       2180
BM_decompress_class_no_reallocations     327788 ns     326020 ns       2243

TODO:

  • Find a better way than std::size_t uncompressed_size_guess = size * 3; to figure out the uncompressed size to allocate.
  • Is the same level of compression being used with libdeflate?
  • Is Z_DEFAULT_COMPRESSION producing roughly the same sizes?
  • Do the speed differences seen locally between zlib and libdeflate persist in production systems?

if (compressor_)
{
libdeflate_free_compressor(compressor_);
}
springmeyer (author):

Noting that this approach (initialize the C struct pointer in the constructor and free the memory in the destructor) applies RAII (https://en.wikipedia.org/wiki/Resource_acquisition_is_initialization) to avoid a memory leak.

-#pragma GCC diagnostic ignored "-Wold-style-cast"
-if (deflateInit2(&deflate_s, level_, Z_DEFLATED, window_bits, mem_level, Z_DEFAULT_STRATEGY) != Z_OK)
+std::size_t max_compressed_size = libdeflate_gzip_compress_bound(compressor_, size);
+// TODO: sanity check this before allocating
springmeyer (author):

The reason for this comment is fear/lack of knowledge on my part. What happens if libdeflate_gzip_compress_bound is buggy and returns a really massive value? Is that possible? Would we end up trying to allocate so much memory the machine would crumble? Probably not possible, but I've also not looked inside libdeflate yet to figure out how much to worry about this.

@springmeyer springmeyer mentioned this pull request Jul 14, 2018
springmeyer pushed a commit to mapbox/vtcomposite that referenced this pull request Jul 28, 2018
@springmeyer springmeyer requested a review from artemp September 26, 2018 04:53
@@ -56,7 +56,7 @@ class Decompressor
 // https://github.com/kaorimatz/libdeflate-ruby/blob/0e33da96cdaad3162f03ec924b25b2f4f2847538/ext/libdeflate/libdeflate_ext.c#L340
 // https://github.com/ebiggers/libdeflate/commit/5a9d25a8922e2d74618fba96e56db4fe145510f4
 std::size_t actual_size;
-std::size_t uncompressed_size_guess = size * 3;
+std::size_t uncompressed_size_guess = size * 4;
 output.resize(uncompressed_size_guess);
springmeyer (author):

This probably needs to be rewritten to avoid guessing the size up front. @artemp could you tackle that?

artemp commented Sep 26, 2018:

`std::size_t uncompressed_size_guess = size * 4;` is a good starting point, but it definitely needs a way to dynamically increase capacity when size * 4 is not large enough. The approach from the Ruby link looks reasonable; here is my patch:

diff --git a/include/gzip/decompress.hpp b/include/gzip/decompress.hpp
index badd315..627e670 100644
--- a/include/gzip/decompress.hpp
+++ b/include/gzip/decompress.hpp
@@ -57,17 +57,23 @@ class Decompressor
         // https://github.com/ebiggers/libdeflate/commit/5a9d25a8922e2d74618fba96e56db4fe145510f4
         std::size_t actual_size;
         std::size_t uncompressed_size_guess = size * 4;
-        output.resize(uncompressed_size_guess);
-        enum libdeflate_result result = libdeflate_gzip_decompress(decompressor_,
-                                                                   data,
-                                                                   size,
-                                                                   static_cast<void*>(&output[0]),
-                                                                   uncompressed_size_guess, &actual_size);
-        if (result == LIBDEFLATE_INSUFFICIENT_SPACE)
+        output.reserve(uncompressed_size_guess);
+        enum libdeflate_result result;
+        for (;;)
         {
-            throw std::runtime_error("no space: did not succeed");
+            result = libdeflate_gzip_decompress(decompressor_,
+                                                data,
+                                                size,
+                                                const_cast<char*>(output.data()),
+                                                output.capacity(), &actual_size);
+            if (result != LIBDEFLATE_INSUFFICIENT_SPACE)
+            {
+                break;
+            }
+            output.reserve((output.capacity() << 1) - output.size());
         }
-        else if (result == LIBDEFLATE_SHORT_OUTPUT)
+
+        if (result == LIBDEFLATE_SHORT_OUTPUT)
         {
             throw std::runtime_error("short output: did not succeed");
         }

Couple notes:

  • Use std::vector::reserve() to increase capacity
  • Use std::vector::data() to access a pointer to the first element (C++11); it is valid to call on empty containers, while &vec[0] is not safe for empty containers and needs an extra size check.

/cc @springmeyer

springmeyer (author):

@artemp great fix - can you please commit directly to this branch? Two concerns to keep in mind:

  • Can you think of a situation where result stays == LIBDEFLATE_INSUFFICIENT_SPACE forever, so the loop runs infinitely? Can we protect against that by checking for other error states inside the loop?
  • I wonder if we should avoid calling output.reserve, because sometimes we may over-reserve, and the extra cost of over-reserving might outweigh the benefit of reserving when the guess is close. This concern comes from seeing how reserve is expensive and, when wrong, hurts perf over at Optimize coalesce carmen-cache#126

artemp commented Sep 27, 2018:

@springmeyer -

Can you think of a situation where result stays == LIBDEFLATE_INSUFFICIENT_SPACE forever, so the loop runs infinitely? Can we protect against that by checking for other error states inside the loop?

Good point, I'll add a conditional exception in the loop if the allocation size gets too big.

I wonder if we should avoid calling output.reserve, because sometimes we may over-reserve, and the extra cost of over-reserving might outweigh the benefit of reserving when the guess is close. This concern comes from seeing how reserve is expensive and, when wrong, hurts perf over at mapbox/carmen-cache#126

Let me investigate and fix this ^

I'm going to push updates and continue work on this branch

springmeyer (author):

I'm going to push updates and continue work on this branch

👍 I think you'll likely need an easy way to test larger, varied files and a CLI would allow that. So could you finish #29 to aid your testing of this branch?

artemp left a comment:

Looking good overall, and faster according to make bench, but see my comments with proposed changes. /cc @springmeyer

@@ -130,7 +130,7 @@ TEST_CASE("test decompression size limit")
                  std::istreambuf_iterator<char>());
 stream.close();

-std::size_t limit = 20 * 1024 * 1024; // 20 Mb
+std::size_t limit = 500 * 1024 * 1024; // 500 Mb
artemp:

@springmeyer - I've changed the logic to validate the output buffer size rather than the input, which makes more sense in my opinion.

springmeyer (author):

👌

 struct libdeflate_decompressor* decompressor_ = nullptr;

 public:
-Decompressor(std::size_t max_bytes = 1000000000) // by default refuse operation if compressed data is > 1GB
+Decompressor(std::size_t max_bytes = 2147483648u) // by default refuse operation if required uutput buffer is > 2GB
springmeyer (author):

@artemp typo: uutput

 // https://github.com/kaorimatz/libdeflate-ruby/blob/0e33da96cdaad3162f03ec924b25b2f4f2847538/ext/libdeflate/libdeflate_ext.c#L340
 // https://github.com/ebiggers/libdeflate/commit/5a9d25a8922e2d74618fba96e56db4fe145510f4
 std::size_t actual_size;
-std::size_t uncompressed_size_guess = size * 4;
+std::size_t uncompressed_size_guess = std::min(size * 4, max_);
 output.reserve(uncompressed_size_guess);
springmeyer (author):

What happens when size * 4 will not fit within std::numeric_limits<std::size_t>::max()?

springmeyer (author):

@artemp all the changes look good. I made a few minor comments inline.

One other high level thought: check out #27 which may be of interest as you find bugs or find ways to improve the code in general (non-specific to this libdeflate branch). If you do have generally applicable changes I think it would be ideal to keep track of these and attempt to land them in a separate PR.


artemp commented Sep 27, 2018

@springmeyer - thanks for the feedback, I'll reply asap. Going through #27 looks like a good plan.

springmeyer (author):

@artemp I noticed that the -Weffc++ warning that is used downstream is triggering due to code in this branch. See:

In file included from ../src/vtcomposite.cpp:7:0:
/home/travis/build/mapbox/vtcomposite/../gzip-hpp/include/gzip/compress.hpp:13:7: error: ‘class gzip::Compressor’ has pointer data members [-Werror=effc++]
 class Compressor
       ^~~~~~~~~~
/home/travis/build/mapbox/vtcomposite/../gzip-hpp/include/gzip/compress.hpp:13:7: error:   but does not override ‘gzip::Compressor(const gzip::Compressor&)’ [-Werror=effc++]
/home/travis/build/mapbox/vtcomposite/../gzip-hpp/include/gzip/compress.hpp:13:7: error:   or ‘operator=(const gzip::Compressor&)’ [-Werror=effc++]
In file included from ../src/vtcomposite.cpp:8:0:
/home/travis/build/mapbox/vtcomposite/../gzip-hpp/include/gzip/decompress.hpp:13:7: error: ‘class gzip::Decompressor’ has pointer data members [-Werror=effc++]
 class Decompressor
       ^~~~~~~~~~~~
/home/travis/build/mapbox/vtcomposite/../gzip-hpp/include/gzip/decompress.hpp:13:7: error:   but does not override ‘gzip::Decompressor(const gzip::Decompressor&)’ [-Werror=effc++]
/home/travis/build/mapbox/vtcomposite/../gzip-hpp/include/gzip/decompress.hpp:13:7: error:   or ‘operator=(const gzip::Decompressor&)’ [-Werror=effc++]

at https://travis-ci.org/mapbox/vtcomposite/jobs/438830669#L986.

@artemp could you fix this warning/error? Perhaps it could be fixed by making the class noncopyable? You should be able to replicate locally with g++ and adding -Weffc++ (note: clang++ does not do anything with this option).

springmeyer (author):

@artemp one more need that I noticed: we have a downstream consumer mapbox-maps that needs to support zlib-coded data as well as gzip-coded data. So, could you:

  • add a test for zlib-coded data (we must be missing one?)
  • ensure that this branch works to decompress it. This will likely require dynamically switching between libdeflate_gzip_decompress or libdeflate_zlib_decompress (or maybe libdeflate_deflate_decompress_ex?)


artemp commented Oct 9, 2018

@artemp I noticed that the -Weffc++ warning that is used downstream is triggering due to code in this branch. ... could you fix this warning/error? Perhaps it could be fixed by making the class noncopyable?

Yep, just deleting copy constructor and copy assignment operator would do. Done in b4684ca


artemp commented Oct 9, 2018

@artemp one more need that I noticed: we have a downstream consumer mapbox-maps that needs to support zlib-coded data as well as gzip-coded data. So, could you:

* [ ]  add a test for zlib-coded data (we must be missing one?)

* [ ]  ensure that this branch works to decompress it. This will likely require dynamically switching between `libdeflate_gzip_decompress` or `libdeflate_zlib_decompress` (or maybe `libdeflate_deflate_decompress_ex`?)

@springmeyer - I have no idea why this module is called gzip-hpp.

I'm not buying 'for historical reasons'; let's make it right now and avoid perpetual confusion!
deflate-hpp? libdeflate-hpp? libcoagulate? libvilipend? Anything better? Just let's not call apples oranges.

Once we're on the same page I'll add support for handling zlib (de)compression. In terms of APIs, I see two high-level approaches. The first one is the correct one, and the second is a 'let's-make-life-easier-by-hiding-the-implementation-switch-inside-a-class' one. :)

  • Let the user decide (the user is smart)

// This is not working code, just an illustration
if (gzip) 
{
     deflate::gzip_decompressor obj{};
     // or
     deflate::decompressor<GZIP> obj{};
     // do the work
}
else if (zlib)
{
    deflate::zlib_decompressor obj{};
    // or
    deflate::decompressor<ZLIB> obj{};
    // do the work
}

This is how libdeflate is designed ^ offering maximum flexibility.

  • The second one
deflate::compressor obj{};
// do the work

Wondering if name choices here can be better:

Compressor::compress(...);
Decompressor::decompress(...);

?
