
💡 [REQUEST] - Better logging and tracing #475

@lvaylet

Summary

Following up on #441, it appears some of the telemetry required to troubleshoot random issues might be missing. Should we take this opportunity to rethink the metrics, logs and traces collected by the SLO Generator?

Basic Example

I am a huge fan of Chapter 4 in the excellent Zero to Production in Rust. The whole chapter is about Telemetry. The author starts with basic logging, then attaches Request IDs to every log (so he can correlate entries that show up in a random order in the logging service), then ultimately decides to use traces to track individual requests (to get the context automatically, without adding it explicitly). I feel like the same principle can be applied to each request to the SLO Generator API, or to each request to a backend/exporter. Traces could replace or extend the existing logs, and make troubleshooting much easier without having to enable the (very verbose) Debug mode with DEBUG=1.
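To make the "Request ID on every log entry" step concrete, here is a minimal sketch using only the Python standard library. It is not taken from the SLO Generator code base: `request_id_var`, `RequestIdFilter`, `build_logger`, `handle_request` and the log format are all hypothetical names chosen for illustration. The idea is that a `contextvars.ContextVar` carries the ID for the duration of a request, and a logging filter copies it onto every record, so call sites do not have to pass it explicitly.

```python
import contextvars
import io
import logging
import uuid

# Holds the ID of the request currently being handled ("-" outside any request).
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Copy the current request ID onto every log record."""

    def filter(self, record):
        record.request_id = request_id_var.get()
        return True  # never drop the record, only enrich it


def build_logger(stream):
    """Return a logger whose entries are all tagged with the request ID."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s [%(request_id)s] %(message)s")
    )
    handler.addFilter(RequestIdFilter())
    logger = logging.getLogger("slo_generator_sketch")  # hypothetical logger name
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle_request(logger, slo_name, request_id=None):
    """Hypothetical request handler: all entries it logs share one request ID."""
    request_id = request_id or uuid.uuid4().hex
    token = request_id_var.set(request_id)
    try:
        logger.info("Computing SLO %s", slo_name)
        logger.info("Querying backend")
        logger.info("Exporting report")
    finally:
        # Restore the previous value so the ID does not leak into other requests.
        request_id_var.reset(token)
    return request_id
```

Calling `handle_request(build_logger(sys.stderr), "availability")` would emit three entries sharing one ID, which can then be filtered on in the logging service. Traces go one step further by also recording timing and parent/child relationships, which this sketch does not attempt.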

This would also be a great opportunity to migrate to a vendor-agnostic stack like OpenTelemetry for metrics, logs and traces, with all of this data exported to stdout/stderr and/or to the OpenTelemetry Collector over the OpenTelemetry Protocol (OTLP). On GCP, Cloud Run supports sidecars for such a model, and the OpenTelemetry Collector can easily export to Cloud Operations.

Screenshots

No response

Drawbacks

This might require a significant rework, as well as the approval of existing users who rely on the logs themselves or on log-based metrics extracted from log entries with regular expressions.
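To illustrate the risk for users of regex-based log metrics, here is a small sketch. The log line format and the pattern below are hypothetical and are not the SLO Generator's actual output. They only show how a change at the start of a line can silently break an extraction rule, while appending new fields at the end may keep it working.

```python
import re

# Hypothetical pattern a user might have configured for a log-based metric.
LEGACY_PATTERN = re.compile(r"^INFO - (?P<slo>[\w-]+) \| SLI: (?P<sli>[\d.]+)")


def extract_sli(line):
    """Return the SLI value captured by the legacy pattern, or None if no match."""
    match = LEGACY_PATTERN.search(line)
    return float(match.group("sli")) if match else None


# Hypothetical log lines, before and after adding a request ID.
legacy_line = "INFO - my-slo | SLI: 0.9991"
prefixed_line = "INFO [req-1] - my-slo | SLI: 0.9991"  # ID inserted near the start
suffixed_line = "INFO - my-slo | SLI: 0.9991 | request_id=req-1"  # ID appended

print(extract_sli(legacy_line))
print(extract_sli(prefixed_line))
print(extract_sli(suffixed_line))
```

With this particular pattern, only the prefixed variant stops matching, which suggests that any format change would need to be checked against the patterns users actually rely on.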

Unresolved questions

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
