Skip to content

CP-308811: Add an option to limit the span depth in tracing #6607

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

snwoods
Copy link
Contributor

@snwoods snwoods commented Jul 28, 2025

Adds a new span.depth key to the trace context baggage, and a configurable max_span_depth. This defaults to 100 and so will not limit traces (the traces I've seen with the most depth are ~40 depth e.g. https://jaeger.kfd.eng.citrite.net/trace/ea5ddca5509b3ae1102bc7279092652d), but is useful when wanting to analyse large traces which can often become slow if all the spans are recorded in a trace.

This isn't perfect, the span.depth seems to get lost sometimes between xapi and xenops, resulting in a greater depth than that listed, but I have created ticket CP-308999 for this and this works well enough to greatly reduce the number of spans in a trace when needed, which is the intention. As an example, a host evacuate trace with max_span_depth 10 goes down to ~1000 spans rather than the 34k+ withou a depth limit.

let new_baggage =
baggage_of ctx
|> Option.value ~default:[]
|> List.remove_assoc k
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we manipulate this a lot we might want to replace this with a StringMap in the future, for now it is probably ok if we only have 1 or 2 keys

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be called update? Always try to remove the key if it exists first and add the new k/v pair.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that update makes more sense here :)

if not t.enabled then
ok_none (* Do not start span if the max depth has been reached *)
else if parent_depth >= Atomic.get Spans.max_depth then (
debug "Max_span_depth limit reached, not creating span" ;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be useful to add the parent id and name here, so we can identify what is missing.

@@ -599,6 +599,7 @@ let initialise_observer ~__context component =
initialise_observer_component ~__context component

let initialise ~__context =
Tracing.Spans.set_max_depth !Xapi_globs.max_span_depth ;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need a similar initialization in xenopsd?
If the xapi/xenopsd propagation is fixed then the depth limit might be reached inside xenopsd instead, and currentlythat would always default to 100?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See how set_max_spans is implemented, introduce a new Observer.set_... function and call it from initialize_observer_meta.

let new_baggage =
baggage_of ctx
|> Option.value ~default:[]
|> List.remove_assoc k
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be called update? Always try to remove the key if it exists first and add the new k/v pair.

@@ -230,6 +232,20 @@ module TraceContext = struct

let baggage_of ctx = ctx.baggage

let baggage_depth_of ctx =
Option.bind (baggage_of ctx) (List.assoc_opt depth_key)
|> Option.value ~default:"1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is 1 a good default? Is the default not unlimited depth? Or has 1 special meaning? It seems there is quite a bit of int/string conversion and I assume that is because the this is a string/string map. If we need more structured data, the whole baggage could be a JSON value.

Do we have to do this conversion at every function boundary because we need to update the depth that is passed on?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is 1 a good default?

Everything ought to have a depth baggage, which is derived from the parent's depth + 1. If this doesn't exist, it must be a new root span (no parent) and should have depth 1. If there were another situation where the depth was empty, we wouldn't know any other value to pick anyway and choosing 1 errs on the side of keeping spans rather than throwing them away.

Do we have to do this conversion at every function boundary because we need to update the depth that is passed on?

Yes, it's really a limitation of the Span.attributes which are string * string. It would be nice for depth to keep it as an int to avoid these conversions but I think other baggage values can use string.

Using JSON for the bagage (and attributes) does sound like a good future improvement, as that's what it is trying to represent really, and is how it ends up when it's exported to jaeger/bugtool

@snwoods snwoods marked this pull request as draft August 4, 2025 10:25
@snwoods
Copy link
Contributor Author

snwoods commented Aug 4, 2025

Moved to draft as the spans our still not always uploading properly. This appears to be caused by too many spans being submitted at once, which I'm fixing by submitting the spans in batches of 500.

@lindig
Copy link
Contributor

lindig commented Aug 6, 2025

I would be good to structure large numbers like in 1_000_000 to make them easier to read.

@snwoods snwoods force-pushed the private/stevenwo/CP-308811 branch from 3d39324 to fe85fe3 Compare August 6, 2025 16:26
snwoods added 2 commits August 6, 2025 17:39
Adds a new span.depth key to the trace context baggage, and a
configurable max_span_depth. This defaults to 100 and so will not limit
traces, but is useful when wanting to analyse large traces which can
often become slow if all the traces are recorded.

Signed-off-by: Steven Woods <[email protected]>
Http exporting appears to get overwhelmed when too many spans are
exported at the same time. This adds the option to export a smaller
chunk of spans at a time. This also reduces the size of the file exports
as we only check for max file size after exporting all of the finished
spans.

Signed-off-by: Steven Woods <[email protected]>
@snwoods snwoods force-pushed the private/stevenwo/CP-308811 branch from fe85fe3 to 3943fb9 Compare August 6, 2025 16:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants