Generate payload for multiple datagrams at once (#609)
Force-pushed from 0148c52 to 423532e.
Performance analysis for using memmove: a tiny benchmark on Zen 3 (Ryzen 7 5700G) tells us, for each copy size and method, how many clocks are needed.

Assume we are building 4 datagrams at once. If we interpret these numbers naively, the copying overhead of using memmove appears significant. However, if we convert these numbers to per-byte overhead, the difference is 0.026 clock / byte, which is pretty small, if not negligible. To paraphrase, the difference is small, and there are unknowns that cause hesitation to change the API.

note a: To emulate the use case, we measured the throughput of memmove doing backward copies with tiny distances between the destination and the source addresses.
…frame space can be allocated speculatively
FWIW we did try this; however, it turned out to be slower, most likely due to the overhead of …
```c
/* If only a STREAM frame was to be built but `on_send_emit` returned BLOCKED, we might have built zero frames. Assuming
 * that it is rare to see BLOCKED, send a PADDING-only packet (TODO skip sending the packet at all) */
if (s->dst == s->dst_payload_from)
    *s->dst++ = QUICLY_FRAME_TYPE_PADDING;
```
This is ugly, though when h2o is the application, it might not matter in practice due to the low probability of `on_send_emit` returning BLOCKED.
Do we want to fix the TODO before merging this PR?
Up until now, the `on_send_emit` callback has been invoked for each STREAM frame being built. This has become a bottleneck, due to two reasons, one of them being the cost of issuing a `pread` for each call to `on_send_emit`.

To mitigate the issue, this PR refactors the `quicly_send_stream` function to generate STREAM frames for as many as 10 packets at once. This PR calls the `on_send_emit` callback that already exists, and scatters the data being read by calling `memmove`.

There are two alternatives that we might consider, one of them being a vectored read (`readv`) into buffers that match the payload sections of the multiple STREAM frames being generated.

It might turn out that we'd want to try these alternatives, but they require changes to the API. Therefore, as the first cut, we are trying the approach using `memmove`.