-
Notifications
You must be signed in to change notification settings - Fork 15
Introduce TraceData to unify text and binary data #1247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Also move Span structures to v04, to make space for v1 spans (which will eventually become the new default). TraceData is also going to be used in the V1 implementation, to carry around byte arrays and strings alike, separate from the indexed offsets into the big vector. Signed-off-by: Bob Weinand <[email protected]>
BenchmarksComparisonBenchmark execution time: 2025-10-01 12:00:29 Comparing candidate commit 2f92752 in PR branch Found 1 performance improvements and 0 performance regressions! Performance is the same for 52 metrics, 2 unstable metrics. scenario:benching deserializing traces from msgpack to their internal representation
CandidateCandidate benchmark detailsGroup 1
Group 2
Group 3
Group 4
Group 5
Group 6
Group 7
Group 8
Group 9
Group 10
Group 11
Group 12
Group 13
Group 14
Group 15
BaselineOmitted due to size. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1247 +/- ##
==========================================
- Coverage 71.66% 71.58% -0.09%
==========================================
Files 355 357 +2
Lines 56312 56252 -60
==========================================
- Hits 40358 40266 -92
- Misses 15954 15986 +32
🚀 New features to boost your workflow:
|
Artifact Size Benchmark Reportaarch64-alpine-linux-musl
aarch64-unknown-linux-gnu
libdatadog-x64-windows
libdatadog-x86-windows
x86_64-alpine-linux-musl
x86_64-unknown-linux-gnu
|
datadog-trace-utils/src/span/mod.rs
Outdated
} | ||
/// TraceData implementation using `Bytes` and `BytesString`. | ||
#[derive(Default, Debug, Clone, PartialEq, Serialize)] | ||
pub struct TinyData; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know about the name. If you have better suggestions, feel free.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think BytesData
is better.
datadog-trace-utils/src/span/mod.rs
Outdated
/// Trait representing a tuple of (Text, Bytes) types used for different underlying data structures. | ||
/// Note: The functions are internal to the msgpack decoder and should not be used directly: they're | ||
/// only exposed here due to the inavailability of min_specialization in stable Rust. | ||
pub trait TraceData: Default + Debug + Clone + PartialEq + Serialize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the Clone
and PartialEq
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PartialEq is necessary because v04::Span is PartialEq. And yeah, generic types need to have all these derives of the context they are used in, otherwise #[derive] doesn't work.
I have no idea though why I added Clone. Removing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, Clone is needed for some cases where we have to clone spans (e.g. benchmarks and some tests).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, would be nice to mark PartialEq
and Clone
as test convenience things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How? Like by a comment? or a cfg_attr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if there's a best practice for conditional trait requirements. You can of course make some other trait which is defined twice behind a #[cfg(test)]
gates, one with and one without the extra requirements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's important enough to warrant conditional trait requirements. Was just an observation that it's only used from tests as far as I can see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, cfg_attr doesn't work anyway, that'd need at least a feature flag. That's ugly.
06bacc7
to
9bbe018
Compare
Signed-off-by: Bob Weinand <[email protected]>
9bbe018
to
d207172
Compare
Signed-off-by: Bob Weinand <[email protected]>
Signed-off-by: Bob Weinand <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the TraceData trait, it's a nice addition and makes even more sense with V1.
The span::Span
was supposed to be a common interface for spans regardless of the encoding. It made sense with v04 and v05 as they have basically the same fields but since v1 has different fields (e.g. attributes vs meta/metrics) we probably won't be able to share a common representation. This also means we will have to reimplement all trace utils for v1 span.
Regarding the changes to the decoder I'm not a fan of having unsafe functions implemented on the SpanBytes. Also I think these changes should be addressed in a separate PR as they're not tied to the addition of TraceData.
state.serialize_field("array_value", &wrapped_value)?; | ||
} | ||
} | ||
fn get_mut_slice(buf: &mut Self::Bytes) -> &mut &'static [u8]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like these functions should be implemented on the type of the buffer rather than on the TraceData type. Maybe in Buffer(B)
B shouldn't be T::Bytes but a different type that implements a trait to be deserialized to Span<(T::Text,T::Bytes)>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's mostly for convenience for now, so that Buffer -> T::Text works trivially. I wanted to not overcomplicate it, unless there's a real need for that. But it should not be overly problematic to change that in future if needed (e.g. you could then add a BufferData trait which holds (Out: TraceData, In: BufferInput) or such). Thanks to this already being T, it would be then a simple find T: TraceData
& replace by a new T: BufferData
operation at most places and returning T::Out::Text/Bytes then.
/// `T` is the type used to represent strings in the span, it can be either owned (e.g. BytesString) | ||
/// or borrowed (e.g. &str). To define a generic function taking any `Span<T>` you can use the | ||
/// [`SpanValue`] trait: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc needs to be updated to match the change to TraceData
|
||
/// Internal Buffer used to wrap msgpack data for decoding. | ||
/// Provides a couple accessors to extract data from the buffer. | ||
pub struct Buffer<T: TraceData>(T::Bytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't we use &[u8]
for the buffer ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to at least store the Bytes reference to construct the Bytes/BytesString object.
use zstd::stream::write::Encoder; | ||
|
||
#[derive(Debug, Clone)] | ||
#[derive(Debug)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes to send_data are not linked to TraceData. Could you put them in a separate PR ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are, because it relied on Clone on the Span, and TraceData is not meant to be Clone outside of testing. (but these changes are also relatively minor)
}) | ||
#[inline] | ||
fn get_mut_slice<'b>(buf: &'b mut &'a [u8]) -> &'b mut &'static [u8] { | ||
unsafe { std::mem::transmute::<&'b mut &[u8], &'b mut &'static [u8]>(buf) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is unsafe and should at least be marked as such.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the generic variant on the trait probably should be.
My plan is to have a projection with some functions which is compatible with all versions of the spans. Meaning that adding a float attribute will reroute to metrics, everything else to string. That's why moving the span to v04 (and add a new span to v1), then we can add a common projection which routes accordingly, agnostic to the actual backend span used. I definitely want to avoid re-implementing everything. |
They sort of are, because the point of this PR is to also make use of TraceData, especially the meta_struct part making use of TraceData::Bytes. |
Also move Span structures to v04, to make space for v1 spans (which will eventually become the new default). TraceData is also going to be used in the V1 implementation, to carry around byte arrays and strings alike, separate from the indexed offsets into the big vector.
This trivially allows directly working on tinybytes data or u8 slices, depending on the input, rather than requiring a copy-conversion at the end for tinybytes input. This is a major performance penalty (benchmarks say about 20% gain - I'm really confused why the regression introduced by #1004 was deemed acceptable).
This also addresses "APMSP-1941 - Replace
Bytes
with a wrapper that borrows the underlying".Note that this PR partially reverts #1139 by @shreyamalpani - but instead provides the ability to pass an endpoint directly when sending.
This was done by the deliberate choice to remove Clone from Span, which is quite an expensive operation and should simply never be done. Especially the goal is for v1 spans to possibly store the backing storage for strings alongside the actual TracerPayload (which v1 has), at which point such an operation would be prohibitively expensive.