Description
A note for the community
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Use Cases
Thank you for this software!
When sources or sinks make use of protobuf encoding/decoding, the ability to decode protowire is missing.
When serializing protobuf messages, the official Go library suggests prefixing each one with a varint length, treating the message like another nested message (though without a tag).
Some tools, such as ClickHouse, use length-prefixed messages (e.g. when consuming from Kafka):
ClickHouse inputs and outputs protobuf messages in the length-delimited format. It means before every message should be written its length as a varint. See also how to read/write length-delimited protobuf messages in popular languages.
I would like to suggest adding such a framing option.
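To make the requested framing concrete, here is a minimal Python sketch of varint length-delimited framing as described above. The helper names `encode_varint` and `frame`, and the sample payloads, are hypothetical and only stand in for serialized messages; this is not Vector code.

```python
def encode_varint(value):
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def frame(payload):
    """Prefix an already-serialized message with its length as a varint."""
    return encode_varint(len(payload)) + payload


# Hypothetical payload standing in for a serialized abc.ABC message.
small_message = b"\x08\x01"

print(frame(small_message).hex())   # 020801
print(encode_varint(300).hex())     # ac02 (lengths above 127 need two bytes)
print(frame(b"").hex())             # 00 (an empty message is still one frame)
```

Note that an empty (all-default) message still produces a one-byte frame, so a reader can tell it apart from "no message".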
Attempted Solutions
Currently, Vector offers two ways of decoding protobuf with framing: byte or length_delimited.
In certain cases where the source uses byte framing (e.g. the buffer of a socket, or file sources), there is a risk that a protobuf message is "cut" or skipped (for example, two messages arrive batched, only the first one is decoded, and the rest is discarded).
Furthermore, a default/zero-length protobuf message would be missed.
The length_delimited setting is not necessarily standard for protobuf and is not backward-compatible with a varint prefix.
sources:
  example:
    type: socket
    mode: unix_stream
    path: "mysock.socket"
    decoding:
      codec: protobuf
      protobuf:
        desc_file: "abc.desc"
        message_type: "abc.ABC"
    framing:
      method: length_delimited # needs a uint32 prefix
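The incompatibility between the two prefixes can be sketched in Python as follows. The big-endian byte order of the uint32 prefix is an assumption made for illustration (the issue only states that a uint32 prefix is needed), and the payload is a hypothetical stand-in for a serialized message.

```python
import struct


def encode_varint(value):
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


# Hypothetical serialized message: field 1, string "abc".
payload = b"\x0a\x03abc"

# Fixed-width prefix (assumed big-endian uint32): always 4 bytes.
uint32_framed = struct.pack(">I", len(payload)) + payload

# Varint prefix: 1 byte for any length below 128.
varint_framed = encode_varint(len(payload)) + payload

print(uint32_framed.hex())  # 000000050a03616263
print(varint_framed.hex())  # 050a03616263

# A reader expecting a uint32 prefix misreads the varint-framed stream:
# it consumes the varint and the first payload bytes as the length.
(misread_length,) = struct.unpack(">I", varint_framed[:4])
print(misread_length == len(payload))  # False
```

Because the prefix widths differ, a producer writing varint prefixes cannot be consumed by a reader configured for fixed-width prefixes, and vice versa.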
Unfortunately, it's not possible to work around this with a "wrapper" protobuf message, since the tag (1 in the example below) must be encoded as well, as a varint:
message DEF {
  repeated ABC abc = 1;
}
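To illustrate why the wrapper does not match a bare length prefix, the following Python sketch hand-encodes one element of the repeated field and compares it with a bare varint-prefixed message. The helper names and the payload are hypothetical; the tag layout (field number shifted left by 3, combined with wire type 2 for length-delimited fields) follows the protobuf wire format.

```python
def encode_varint(value):
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def encode_wrapper_element(payload, field_number=1):
    """Encode one element of `repeated ABC abc = 1;` inside DEF."""
    tag = (field_number << 3) | 2  # wire type 2 = length-delimited
    return encode_varint(tag) + encode_varint(len(payload)) + payload


# Hypothetical payload standing in for a serialized ABC message.
payload = b"\x08\x01"

wrapped = encode_wrapper_element(payload)
bare = encode_varint(len(payload)) + payload

print(wrapped.hex())  # 0a020801 (tag byte 0x0a comes first)
print(bare.hex())     # 020801   (length prefix only)
```

The extra tag byte in front of every element is what makes a stream of bare length-prefixed messages undecodable as a single DEF message.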
Proposal
My suggestion for sources and sinks would be one of the following.
Either having the protobuf decoder assume it will read a varint and treat it as a length. That said, I am not sure whether this could work as a one-to-many way of decoding messages (plus waiting for the rest of the bytes).
sources:
  example:
    ...
    decoding:
      codec: protobuf
      protobuf:
        desc_file: "abc.desc"
        message_type: "abc.ABC"
        protowire: true
    framing:
      method: byte
or having a proper varint method in framing:
sources:
  example:
    ...
    decoding:
      codec: protobuf
      protobuf:
        desc_file: "abc.desc"
        message_type: "abc.ABC"
    framing:
      method: varint
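As a rough illustration of what a varint framer would need to do, here is a minimal Python sketch of a streaming decoder that buffers partial input, emits every complete frame found in a chunk, and keeps zero-length messages. The class name `VarintFrameDecoder` and the 10-byte prefix limit are assumptions for this sketch, not a description of how Vector would implement it.

```python
class VarintFrameDecoder:
    """Split a byte stream into varint length-prefixed frames."""

    MAX_PREFIX_BYTES = 10  # assumed cap: a 64-bit varint needs at most 10 bytes

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk):
        """Add bytes from the source and return all complete frames."""
        self._buf.extend(chunk)
        frames = []
        while True:
            parsed = self._read_prefix()
            if parsed is None:
                break  # length prefix itself is incomplete
            length, header_size = parsed
            end = header_size + length
            if len(self._buf) < end:
                break  # wait for the rest of the message bytes
            frames.append(bytes(self._buf[header_size:end]))
            del self._buf[:end]
        return frames

    def _read_prefix(self):
        value = 0
        for i, byte in enumerate(self._buf[: self.MAX_PREFIX_BYTES]):
            value |= (byte & 0x7F) << (7 * i)
            if not byte & 0x80:
                return value, i + 1
        if len(self._buf) >= self.MAX_PREFIX_BYTES:
            raise ValueError("varint length prefix is too long")
        return None


# Three hypothetical messages in one stream: 2 bytes, empty, 3 bytes.
stream = b"\x02\x08\x01" + b"\x00" + b"\x03\x0a\x01a"

decoder = VarintFrameDecoder()
first = decoder.feed(stream[:2])   # message cut mid-way: nothing emitted yet
second = decoder.feed(stream[2:])  # remainder: all three frames come out
print(first)
print(second)
```

This covers the three cases raised earlier in the issue: a message cut across reads is held until complete, batched messages are all emitted, and an empty message is returned as an empty frame rather than dropped.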
Thank you!
References
No response
Version
vector 0.36.1 (2857180 2024-03-11 14:32:52.417737479)