Introduce Dedicated Wire Protocol Subsystem for Message Parsing and Serialization #298
Motivation: The Current State
Currently, the responsibility for parsing and serializing PostgreSQL wire protocol messages is distributed across various parts of the application. This leads to ad-hoc handling in places like the backend and frontend logic, where we often resort to low-level byte manipulations directly on streams or buffers.
For example, in `admin/backend.rs`, we check `message.code() != 'Q'` to validate message types, i.e. raw byte inspections that mix protocol decoding with business logic. Similarly, scattered checks that peek at individual bytes (e.g., `if buf[0] == b'Z'`) for `ReadyForQuery` messages appear in connection handling, making the code harder to maintain, debug, and extend.
While this approach has worked for our needs so far, it scatters protocol knowledge throughout the codebase, increasing the risk of inconsistencies and edge-case bugs (like malformed packets), and making it tougher for new contributors to grasp the flow.
We should rein in complexity where we can.
The Proposed Change: A Dedicated `wire_protocol` Module
I believe it's a natural evolution for PgDog to introduce a dedicated subsystem focused solely on translating raw TCP byte streams into a stream of structured wire messages (and vice versa). This PR adds a new `wire_protocol` module that centralizes all protocol parsing and serialization logic.
Note that this PR focuses solely on creating the module and defining the message structures; no existing code has been migrated to use these messages yet. That's planned for follow-up PRs.
Key components:
- `FrontendProtocolMessage` and `BackendProtocolMessage` enums to represent all supported messages in a type-safe way. This builds toward a split `ProtocolMessage` abstraction for bidirectional handling.
- `to_bytes()` and `from_bytes()`, ensuring consistent encoding/decoding without leaking byte-level details elsewhere.
- `frontend`, `backend`, `shared_property_types`, and helpers, covering messages like Startup, Query, Bind, Authentication, etc.

Importantly, the PostgreSQL wire protocol is a finite, well-established standard. Once this subsystem is fully implemented, it should be largely "done", requiring zero future changes beyond occasional updates for new protocol versions or extensions.
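To make the shape of the key components concrete, here is a minimal sketch, not the PR's actual code. It assumes one variant per enum (`Query` and `ReadyForQuery`), owned payloads, and hypothetical helpers `frame`, `split_frame`, and `ParseError`; only the enum names and `to_bytes()`/`from_bytes()` come from the description above. The framing follows the standard tagged-message layout: one type byte, a big-endian Int32 length that counts itself but not the type byte, then the payload.

```rust
/// Hypothetical, heavily reduced stand-in for the PR's frontend enum.
#[derive(Debug, PartialEq)]
enum FrontendProtocolMessage {
    /// Simple query: 'Q', Int32 length, NUL-terminated SQL string.
    Query(String),
}

/// Hypothetical, heavily reduced stand-in for the PR's backend enum.
#[derive(Debug, PartialEq)]
enum BackendProtocolMessage {
    /// ReadyForQuery: 'Z', Int32 length (5), one transaction status byte.
    ReadyForQuery(u8),
}

/// Assumed error type; the real module may model errors differently.
#[derive(Debug, PartialEq)]
enum ParseError {
    Incomplete,
    UnexpectedCode(u8),
    Malformed,
}

/// Builds one tagged message: type byte, length (payload + 4), payload.
fn frame(code: u8, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(5 + payload.len());
    out.push(code);
    out.extend_from_slice(&(payload.len() as u32 + 4).to_be_bytes());
    out.extend_from_slice(payload);
    out
}

/// Splits one tagged message into (type byte, payload).
fn split_frame(buf: &[u8]) -> Result<(u8, &[u8]), ParseError> {
    if buf.len() < 5 {
        return Err(ParseError::Incomplete);
    }
    let len = u32::from_be_bytes([buf[1], buf[2], buf[3], buf[4]]) as usize;
    if len < 4 {
        return Err(ParseError::Malformed);
    }
    if buf.len() < 1 + len {
        return Err(ParseError::Incomplete);
    }
    Ok((buf[0], &buf[5..1 + len]))
}

impl FrontendProtocolMessage {
    fn to_bytes(&self) -> Vec<u8> {
        match self {
            Self::Query(sql) => {
                let mut payload = sql.as_bytes().to_vec();
                payload.push(0); // NUL terminator
                frame(b'Q', &payload)
            }
        }
    }

    fn from_bytes(buf: &[u8]) -> Result<Self, ParseError> {
        let (code, payload) = split_frame(buf)?;
        match code {
            b'Q' => {
                let (nul, body) = payload.split_last().ok_or(ParseError::Malformed)?;
                if *nul != 0 {
                    return Err(ParseError::Malformed);
                }
                // Assumption: the query text is valid UTF-8.
                let sql = std::str::from_utf8(body).map_err(|_| ParseError::Malformed)?;
                Ok(Self::Query(sql.to_owned()))
            }
            other => Err(ParseError::UnexpectedCode(other)),
        }
    }
}

impl BackendProtocolMessage {
    fn to_bytes(&self) -> Vec<u8> {
        match self {
            Self::ReadyForQuery(status) => frame(b'Z', &[*status]),
        }
    }

    fn from_bytes(buf: &[u8]) -> Result<Self, ParseError> {
        let (code, payload) = split_frame(buf)?;
        match (code, payload) {
            (b'Z', [status]) => Ok(Self::ReadyForQuery(*status)),
            (b'Z', _) => Err(ParseError::Malformed),
            (other, _) => Err(ParseError::UnexpectedCode(other)),
        }
    }
}
```

With something like this in place, a caller would match on `BackendProtocolMessage::ReadyForQuery(_)` instead of checking `buf[0] == b'Z'`, and a truncated buffer surfaces as `ParseError::Incomplete` rather than an out-of-bounds index.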
Benefits
- Instead of checks like `if message[7] == -1` (e.g., for null indicators in parameter values), we can work with expressive types like `BindFrame { parameters: Vec<Parameter> }`, where `Parameter::Binary` or `Parameter::Text` clearly convey intent. This reduces errors and makes intent obvious at a glance.
- Outside the `wire_protocol` module, we deal only in high-level messages; inside, we handle bytes. No more half-parsed buffers floating around. This minimizes partial states and simplifies testing (e.g., unit tests for individual message roundtrips are now straightforward).
- By using `bytes::Bytes` and borrowing where possible, we avoid unnecessary allocations in hot paths, though I've erred on the side of borrowing for now (more on this below).
- Handling of `;`-separated queries could be done at this level: we could deal with `Q "SELECT 1;SELECT 2;"` here and not leak further into the code that we don't support double queries. (I don't know.)

Overall, this feels like a confident step toward making PgDog more robust and developer-friendly, without overcomplicating things.
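As an illustration of the first benefit, the sketch below decodes a single Bind parameter value into a typed `Parameter`. It is a hedged example under stated assumptions, not the PR's implementation: the `Null` variant, the `ParseError` type, and the `parse_parameter` function are hypothetical, and text values are assumed to be UTF-8 (the protocol itself uses the client encoding). What it relies on from the protocol is that each parameter value is an Int32 length followed by that many bytes, with a length of -1 meaning NULL, and that format code 0 means text and 1 means binary.

```rust
/// Typed view of one Bind parameter value. `Text` and `Binary` are named in
/// the description; `Null` is a hypothetical variant added for this sketch.
#[derive(Debug, PartialEq)]
enum Parameter<'a> {
    Null,
    Text(&'a str),
    Binary(&'a [u8]),
}

/// Assumed error type for this sketch.
#[derive(Debug, PartialEq)]
enum ParseError {
    Incomplete,
    Malformed,
}

/// Decodes one parameter value from the front of `buf` and returns it
/// together with the remaining bytes. `format` is the parameter's format
/// code: 0 for text, 1 for binary.
fn parse_parameter(buf: &[u8], format: i16) -> Result<(Parameter<'_>, &[u8]), ParseError> {
    if buf.len() < 4 {
        return Err(ParseError::Incomplete);
    }
    let (len_bytes, rest) = buf.split_at(4);
    let len = i32::from_be_bytes([len_bytes[0], len_bytes[1], len_bytes[2], len_bytes[3]]);

    // The one place where the magic number lives: -1 is the NULL indicator.
    if len == -1 {
        return Ok((Parameter::Null, rest));
    }
    if len < 0 {
        return Err(ParseError::Malformed);
    }
    let len = len as usize;
    if rest.len() < len {
        return Err(ParseError::Incomplete);
    }

    let (value, rest) = rest.split_at(len);
    let parameter = match format {
        // Assumption: text parameters are valid UTF-8.
        0 => Parameter::Text(std::str::from_utf8(value).map_err(|_| ParseError::Malformed)?),
        1 => Parameter::Binary(value),
        _ => return Err(ParseError::Malformed),
    };
    Ok((parameter, rest))
}
```

Application code can then write `matches!(param, Parameter::Null)` and never needs to know which byte offset carries the length or what -1 means on the wire.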
Notes and Caveats
- The message types currently borrow their data (`&'a [u8]`, etc.) to minimize allocations. If this causes lifetime headaches or perf issues down the line, switching to owned data (e.g., `Vec<u8>` for payloads) should be a straightforward refactor. Happy to iterate based on feedback!
- … `ProtocolMessage` is used in application logic.

Next Steps: Building the Full Pipeline
This subsystem lays the foundation for a structured protocol processing pipeline, enabling more sophisticated query routing, sharding, and interception in PgDog. Future work will integrate it into a multi-stage flow:
I'd love feedback on this approach—does it align with where we want to take PgDog? If there are better patterns or oversights, I'm all ears! 🚀
I did this overnight with a very tired brain, so I might not have been operating at full mental capacity.
A lot of this might be cope for my smaller working memory, but I can't be the only one who would benefit from not interrupting my code scans to go find out what `message.byte()[7] == -1` means conceptually.