-
Notifications
You must be signed in to change notification settings - Fork 292
CP-308811: Add an option to limit the span depth in tracing #6607
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
let new_baggage = | ||
baggage_of ctx | ||
|> Option.value ~default:[] | ||
|> List.remove_assoc k |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we manipulate this a lot we might want to replace this with a StringMap in the future, for now it is probably ok if we only have 1 or 2 keys
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be called update? Always try to remove the key if it exists first and add the new k/v pair.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that update makes more sense here :)
ocaml/libs/tracing/tracing.ml
Outdated
if not t.enabled then | ||
ok_none (* Do not start span if the max depth has been reached *) | ||
else if parent_depth >= Atomic.get Spans.max_depth then ( | ||
debug "Max_span_depth limit reached, not creating span" ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be useful to add the parent id and name here, so we can identify what is missing.
@@ -599,6 +599,7 @@ let initialise_observer ~__context component = | |||
initialise_observer_component ~__context component | |||
|
|||
let initialise ~__context = | |||
Tracing.Spans.set_max_depth !Xapi_globs.max_span_depth ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need a similar initialization in xenopsd?
If the xapi/xenopsd propagation is fixed then the depth limit might be reached inside xenopsd instead, and currentlythat would always default to 100?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See how set_max_spans is implemented, introduce a new Observer.set_... function and call it from initialize_observer_meta.
let new_baggage = | ||
baggage_of ctx | ||
|> Option.value ~default:[] | ||
|> List.remove_assoc k |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be called update? Always try to remove the key if it exists first and add the new k/v pair.
@@ -230,6 +232,20 @@ module TraceContext = struct | |||
|
|||
let baggage_of ctx = ctx.baggage | |||
|
|||
let baggage_depth_of ctx = | |||
Option.bind (baggage_of ctx) (List.assoc_opt depth_key) | |||
|> Option.value ~default:"1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is 1 a good default? Is the default not unlimited depth? Or has 1 special meaning? It seems there is quite a bit of int/string conversion and I assume that is because the this is a string/string map. If we need more structured data, the whole baggage could be a JSON value.
Do we have to do this conversion at every function boundary because we need to update the depth that is passed on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is 1 a good default?
Everything ought to have a depth baggage, which is derived from the parent's depth + 1. If this doesn't exist, it must be a new root span (no parent) and should have depth 1. If there were another situation where the depth was empty, we wouldn't know any other value to pick anyway and choosing 1 errs on the side of keeping spans rather than throwing them away.
Do we have to do this conversion at every function boundary because we need to update the depth that is passed on?
Yes, it's really a limitation of the Span.attributes which are string * string. It would be nice for depth to keep it as an int to avoid these conversions but I think other baggage values can use string.
Using JSON for the bagage (and attributes) does sound like a good future improvement, as that's what it is trying to represent really, and is how it ends up when it's exported to jaeger/bugtool
Moved to draft as the spans our still not always uploading properly. This appears to be caused by too many spans being submitted at once, which I'm fixing by submitting the spans in batches of 500. |
I would be good to structure large numbers like in |
3d39324
to
fe85fe3
Compare
Adds a new span.depth key to the trace context baggage, and a configurable max_span_depth. This defaults to 100 and so will not limit traces, but is useful when wanting to analyse large traces which can often become slow if all the traces are recorded. Signed-off-by: Steven Woods <[email protected]>
Http exporting appears to get overwhelmed when too many spans are exported at the same time. This adds the option to export a smaller chunk of spans at a time. This also reduces the size of the file exports as we only check for max file size after exporting all of the finished spans. Signed-off-by: Steven Woods <[email protected]>
fe85fe3
to
3943fb9
Compare
Adds a new span.depth key to the trace context baggage, and a configurable max_span_depth. This defaults to 100 and so will not limit traces (the traces I've seen with the most depth are ~40 depth e.g. https://jaeger.kfd.eng.citrite.net/trace/ea5ddca5509b3ae1102bc7279092652d), but is useful when wanting to analyse large traces which can often become slow if all the spans are recorded in a trace.
This isn't perfect, the span.depth seems to get lost sometimes between xapi and xenops, resulting in a greater depth than that listed, but I have created ticket CP-308999 for this and this works well enough to greatly reduce the number of spans in a trace when needed, which is the intention. As an example, a host evacuate trace with max_span_depth 10 goes down to ~1000 spans rather than the 34k+ withou a depth limit.