Description
Component(s)
No response
Is your feature request related to a problem? Please describe.
There have been multiple regressions in which SDKs emitted, or the collector received, unexpected metric names, and these were not caught during testing. Some, like 12918, have already been addressed by adding tests, but other cases are still causing issues. For example, a recent collector version bump caused the collector to reject incoming metrics from the .NET SDK (description of issue here). At Grafana, this led to an internal incident. Although test suites exist, scenarios like this are hard to capture without full end-to-end testing.
Although the backwards-incompatible change linked above caused an incident, we were able to identify the root cause with the help of OATs (OpenTelemetry Acceptance Tests), an integration testing framework that we developed internally and released as OSS. OATs has worked well for us in checking for compatibility issues in OTel libraries within the context of the LGTM stack and the OTel collector (e.g. for .NET).
Describe the solution you'd like
It would be useful to have a blackbox testing framework for end-to-end (application -> collector -> backend) testing. Although the database level is outside the scope of the OTel ecosystem, the framework should ensure that the collector sends valid data to the eventual backend storage. It could validate the contract at each step that signals pass through on their way to the backend storage. The test setup would also make real requests over a network, so that the data sent over the wire between each step is validated as well.
If it makes sense, we would be happy to offer OATs to do the job - or to shape OATs into a tool that would be beneficial for the community - under the OTel umbrella.
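To make the idea of a per-step contract more concrete, here is a minimal sketch in Python. It is not how OATs works and not an existing OTel API: the `HopResult` class, the `check_contract` function, the hop names, and the metric names are all hypothetical. It assumes the metric names observed at each step have already been captured somehow (for example by a test harness listening on the wire) and only shows the comparison against the expected set.

```python
from dataclasses import dataclass, field


@dataclass
class HopResult:
    """Outcome of comparing one hop's observed metric names to the contract."""
    hop: str
    missing: set = field(default_factory=set)
    unexpected: set = field(default_factory=set)

    @property
    def ok(self) -> bool:
        # A hop passes only if nothing is missing and nothing extra appeared.
        return not self.missing and not self.unexpected


def check_contract(expected, observed_by_hop):
    """Compare the expected metric names with what each hop actually emitted.

    expected: iterable of metric names the test author expects end to end.
    observed_by_hop: dict mapping a hop label to the names seen at that hop.
    """
    expected = set(expected)
    results = []
    for hop, observed in observed_by_hop.items():
        observed = set(observed)
        results.append(HopResult(hop, expected - observed, observed - expected))
    return results


# Hypothetical data: the application emits the expected name, but the
# collector exports it under a different name.
results = check_contract(
    expected={"http.server.request.duration"},
    observed_by_hop={
        "sdk": ["http.server.request.duration"],
        "collector_export": ["http_server_request_duration"],
    },
)

for r in results:
    status = "ok" if r.ok else "FAILED"
    print(r.hop, status, "missing:", r.missing, "unexpected:", r.unexpected)
```

Reporting per hop rather than only at the backend is the point of the sketch: a renamed or dropped metric is attributed to the step where it first diverges from the contract.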
Describe alternatives you've considered
So far we have not been able to find a framework or test suite within the OTel ecosystem that could help proactively detect version incompatibilities such as those described above. However, if something like this already exists, please do let us know.
Additional context
- Recent SIG call where this was discussed
- Recent Slack thread
- This issue, though more specifically related to the .NET SDK, was also brought up here