Document on-disk representation of bitshuffled data #148

@graeme-winter

I got some way towards reverse-engineering the format so that I can do the bitshuffle independently of lz4 in my application, but kept stubbing my toes. Some clear documentation on how the format is used would be very useful for non-canonical implementations.

For example, it would appear that the on-disk representation takes the form of

BE uint32_t compressed_block_size <compressed_block>
BE uint32_t compressed_block_size <compressed_block>
BE uint32_t compressed_block_size <compressed_block>
...

where each <compressed_block> is the result of compressing 8192 bytes of input. After the full blocks there is a smaller partial block, and finally what looks like a short run of verbatim, uncompressed bytes at the end, which is some residual. I could try compressing and then unpacking arbitrary bit patterns to resolve this, but a canonical definition of the on-disk format (beyond, of course, reading the source code) would be a useful addition to this library.
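To make the layout I think I am seeing concrete, here is a minimal Python sketch that walks a buffer of that shape. It is based only on the observations above, not on the library source. The function name `split_blocks` is my own, and the caller has to supply the number of length-prefixed blocks (`n_blocks`), since I have not worked out how that count is recorded. The payloads are returned still compressed; no lz4 or bitshuffle calls are made.

```python
import struct


def split_blocks(buf: bytes, n_blocks: int):
    """Split buf into n_blocks length-prefixed payloads plus trailing bytes.

    Assumed layout (from observation only): each block is a big-endian
    uint32 byte count followed by that many bytes of compressed data;
    whatever follows the last block is treated as a verbatim residual.
    """
    blocks = []
    offset = 0
    for _ in range(n_blocks):
        if offset + 4 > len(buf):
            raise ValueError("truncated block header")
        # ">I" is a big-endian unsigned 32-bit integer
        (size,) = struct.unpack_from(">I", buf, offset)
        offset += 4
        if offset + size > len(buf):
            raise ValueError("truncated block payload")
        blocks.append(buf[offset:offset + size])
        offset += size
    # Anything left over is the (apparently uncompressed) tail
    return blocks, buf[offset:]


# Synthetic stream: two fake "compressed" payloads and a 2-byte residual
payloads = [b"\x01" * 5, b"\x02" * 3]
stream = b"".join(struct.pack(">I", len(p)) + p for p in payloads) + b"\xff\xee"
blocks, residual = split_blocks(stream, n_blocks=2)
```

If the real format differs (for instance if there is a header before the first block), the starting offset and the block count would need adjusting, which is exactly the sort of thing a written specification would settle.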
