
Inaccurate claims in readme about competing formats #15

@kentonv


In the readme, the comparisons against Avro, Protobuf, JSON, and FlatBuffers each list "Safely handle untrusted data" as a benefit of NoProto. I can't speak for Avro or FlatBuffers, but Protobuf is regularly used to handle untrusted data, and JSON is used by basically everyone to handle untrusted data. So these bullet points seem incorrect.

Later on, the readme says this:

When to use Flatbuffers / Bincode / CapN Proto

If you can safely compile all your data types into your application, all the buffers/data is trusted, and you don't intend to mutate buffers after they're created, Bincode/Flatbuffers/CapNProto is a better choice for you.

I can't speak for FlatBuffers or Bincode, but Cap'n Proto is explicitly designed for, and heavily used in, sandboxing scenarios involving untrusted data. In particular, Cap'n Proto was originally built to serve as the protocol that Sandstorm.io uses to communicate with untrusted apps running in secure sandboxes. It is believed to be secure for such use cases.

The statement also seems to imply that Cap'n Proto can only operate on schemas known at compile time. This is incorrect: Cap'n Proto features a mechanism for loading schemas dynamically and using them to operate on types with a reflection-like API. Incidentally, Protobuf features a very similar API. Practical experience has shown that these features are almost never used: people almost always know their schemas at compile time, and the benefits of compile-time type checking and optimizations are large. But dynamic schema loading can be useful in some niche scenarios.
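To make "loading schemas dynamically" concrete, here is a toy sketch, assuming a flat layout of fixed-width fields packed in declaration order. The schema is ordinary runtime data rather than generated code, and fields are read and written by name. The helpers `load_schema`, `get_field`, and `set_field` are hypothetical and this is not Cap'n Proto's (or Protobuf's) real schema representation, wire format, or API; it only illustrates the reflection-like idea.

```python
import struct

# Hypothetical runtime schema: field name -> (byte offset, struct format code).
# NOT Cap'n Proto's schema or wire format; purely illustrative.
def load_schema(fields):
    """Build a schema from (name, fmt) pairs, packing fields in order, little-endian."""
    schema = {}
    offset = 0
    for name, fmt in fields:
        schema[name] = (offset, fmt)
        offset += struct.calcsize("<" + fmt)
    return schema, offset  # offset is now the total message size


def get_field(schema, buf, name):
    # unpack_from raises struct.error if buf is too short for the field.
    offset, fmt = schema[name]
    return struct.unpack_from("<" + fmt, buf, offset)[0]


def set_field(schema, buf, name, value):
    offset, fmt = schema[name]
    struct.pack_into("<" + fmt, buf, offset, value)


# The field list could just as well come from a file read at runtime.
schema, size = load_schema([("id", "Q"), ("score", "d"), ("flags", "I")])
msg = bytearray(size)
set_field(schema, msg, "id", 42)
set_field(schema, msg, "score", 0.5)
print(get_field(schema, msg, "id"), get_field(schema, msg, "score"))
```

The trade-off the paragraph describes is visible here: every access goes through a dictionary lookup and a format string, and a typo in a field name only fails at runtime, whereas compiled accessors catch it at build time and can be optimized.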

Finally, regarding "mutation": Cap'n Proto actually supports mutation, with caveats. Fixed-width fields can be mutated in place, but variable-width data requires new allocations, which are always added to the end of the message. Removing an object leaves a zeroed "hole" in the data, which can only be reclaimed by making a full-tree copy. In principle, a Cap'n Proto implementation could use a complex memory allocation algorithm that reuses holes in place for later allocations of the same or smaller size, but at present no implementation bothers to attempt this. This problem is generally fundamental to zero-copy formats, due to the need to allocate contiguous memory for each message. If NoProto has solved it somehow, it would be great to see some explanation of how; I couldn't immediately find it in the docs.
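The mutation trade-off above can be modeled with a toy append-only buffer. This is a minimal sketch under stated assumptions: the `Arena` class and its methods are hypothetical, objects are flat byte ranges rather than a pointer tree, and nothing here reflects Cap'n Proto's actual segment layout. It shows the three cases described: a same-size write happens in place, a resize is appended at the end and leaves a zeroed hole, and only a full copy reclaims the wasted space.

```python
class Arena:
    """Toy append-only message buffer (hypothetical; not Cap'n Proto's real layout).

    Every object is a byte range inside one contiguous bytearray.
    """

    def __init__(self):
        self.buf = bytearray()
        self.objects = {}  # name -> (offset, length)

    def alloc(self, name, data):
        # New allocations always go at the end of the buffer.
        offset = len(self.buf)
        self.buf += data
        self.objects[name] = (offset, len(data))

    def write_in_place(self, name, data):
        # Fixed-width update: overwrite the existing bytes, no growth.
        offset, length = self.objects[name]
        if len(data) != length:
            raise ValueError("in-place writes must keep the same size")
        self.buf[offset:offset + length] = data

    def remove(self, name):
        # Zero the old range; the space is not reused, leaving a hole.
        offset, length = self.objects.pop(name)
        self.buf[offset:offset + length] = bytes(length)

    def replace(self, name, data):
        # Variable-width update: drop the old range and append the new data.
        self.remove(name)
        self.alloc(name, data)

    def wasted_bytes(self):
        live = sum(length for _, length in self.objects.values())
        return len(self.buf) - live

    def compact(self):
        # Reclaim holes by copying every live object, in offset order, into a fresh buffer.
        fresh = Arena()
        for name, (offset, length) in sorted(self.objects.items(), key=lambda kv: kv[1][0]):
            fresh.alloc(name, bytes(self.buf[offset:offset + length]))
        return fresh


a = Arena()
a.alloc("count", (7).to_bytes(4, "little"))
a.alloc("name", b"alice")
a.write_in_place("count", (8).to_bytes(4, "little"))  # fixed-width: buffer stays 9 bytes
a.replace("name", b"alice-longer")                    # variable-width: 5-byte hole, 12 bytes appended
print(len(a.buf), a.wasted_bytes())
b = a.compact()
print(len(b.buf), b.wasted_bytes())
```

Reusing holes would require a free-list style allocator inside the buffer, which is the "complex memory allocation algorithm" the paragraph mentions; the sketch deliberately omits it, matching the behavior described for current implementations.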
