Add filebeat registry to beat receiver diagnostics #9029
This pull request is now in conflicts. Could you fix it? 🙏
# Conflicts:
#	internal/pkg/agent/application/coordinator/coordinator_test.go
#	internal/pkg/otel/manager/manager.go
💚 Build Succeeded
cc @swiatekm
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
I have already reached out to you, @swiatekm, but for the sake of completeness I am also writing my findings so far here. I compiled and ran this on both macOS and Linux (tested only EDOT embedded mode), and in both cases the diagnostics do not contain a registry.tar.gz under the filestream components when running with the otel runtime. My config:
$ sudo /Library/Elastic/Agent-Development/elastic-agent inspect
agent:
  logging:
    to_stderr: true
  monitoring:
    _runtime_experimental: otel
    enabled: true
inputs:
- _runtime_experimental: otel
  file_identity:
    native: null
  id: filestream-filebeat
  paths:
  - /var/log/system.log
  prospector:
    scanner:
      fingerprint:
        enabled: false
  type: filestream
  use_output: default
- data_stream:
    namespace: default
  id: unique-system-metrics-input
  streams:
  - data_stream:
      dataset: system.cpu
    metricsets:
    - cpu
  - data_stream:
      dataset: system.memory
    metricsets:
    - memory
  - data_stream:
      dataset: system.network
    metricsets:
    - network
  - data_stream:
      dataset: system.filesystem
    metricsets:
    - filesystem
  type: system/metrics
  use_output: default
outputs:
  default:
    hosts:
    - https://my-deployment-c67416.es.us-west2.gcp.elastic-cloud.com:443
    password: <REDACTED>
    preset: balanced
    type: elasticsearch
    username: test
I also see the following warning messages in my logs when I run the diagnostics command:
{"log.level":"warn","@timestamp":"2025-07-18T11:36:00.694Z","log.logger":"otel_manager","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/otel/manager.(*OTelManager).PerformComponentDiagnostics","file.name":"manager/diagnostics.go","file.line":126},"message":"error creating registry tar.gz: cannot stat '/Library/Elastic/Agent-Development/data/elastic-agent-9.2.0-SNAPSHOT-2e48fd/run/filestream-default/registry': 'stat /Library/Elastic/Agent-Development/data/elastic-agent-9.2.0-SNAPSHOT-2e48fd/run/filestream-default/registry: no such file or directory'","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2025-07-18T11:36:00.695Z","log.logger":"otel_manager","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/otel/manager.(*OTelManager).PerformComponentDiagnostics","file.name":"manager/diagnostics.go","file.line":126},"message":"error creating registry tar.gz: cannot stat '/Library/Elastic/Agent-Development/data/elastic-agent-9.2.0-SNAPSHOT-2e48fd/run/filestream-monitoring/registry': 'stat /Library/Elastic/Agent-Development/data/elastic-agent-9.2.0-SNAPSHOT-2e48fd/run/filestream-monitoring/registry: no such file or directory'","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
But I can clearly see that you added an integration test to validate this. So I commented out the t.Run("filebeat process", ...) subtest and ran your integration test in a debugger, with a breakpoint inside the func returned by testDiagnosticsFactory. When the breakpoint hit, I ran the following, and I don't see the expected config:
/tmp/TestBeatDiagnostics798991888/elastic-agent-9.2.0-SNAPSHOT-linux-arm64/elastic-agent inspect
agent:
  logging:
    to_stderr: true
inputs:
- data_stream:
    namespace: default
  id: unique-system-metrics-input
  streams:
  - data_stream:
      dataset: system.cpu
    metricsets:
    - cpu
  - data_stream:
      dataset: system.memory
    metricsets:
    - memory
  - data_stream:
      dataset: system.network
    metricsets:
    - network
  - data_stream:
      dataset: system.filesystem
    metricsets:
    - filesystem
  type: system/metrics
  use_output: default
outputs:
  default:
    api_key: <REDACTED>
    hosts:
    - 127.0.0.1:9200
    preset: balanced
    type: elasticsearch
I will keep investigating in case I am missing something, but @swiatekm, please have a look as well.
@pkoutsovasilis I suspect some funny business with the filebeat receiver not writing the registry when it's supposed to in some circumstances. @leehinman any clue what could be causing this? I'm fairly confident my PR is correct in that it looks for the registry in the right place and includes it in diagnostics if it exists.
It is very likely that the filebeat receiver registry was written to one of the metricbeat receivers' "data" directories. This is because of elastic/beats#44903. To verify, look in the "run" directory and check whether you see something like:
If you do, that is exactly what is going on. If that is happening, this PR is blameless.
I think the integration test is lying, and that I possibly tricked myself every time I tested this locally. The reason is that if you first run filebeat in process mode and don't clear the data afterwards, you'll be left with a state directory that the diagnostics code will happily pick up, even though it wasn't actually written by the beats receiver. This does mean the PR is correct and would work if the path handling in libbeat were correct, but the feature as a whole doesn't work. For now, let's hold off on merging it.
Just for tracking: blocked by #8207.
What does this PR do?
Adds the filebeat registry to filebeat receiver diagnostics. This involves copying the logic from filebeat itself, with some improvements.
Why is it important?
Diagnostics for beats receivers should contain the same information as they do for beats processes.
Checklist
[ ] I have made corresponding changes to the documentation
[ ] I have made corresponding change to the default configuration files
[ ] I have added an entry in ./changelog/fragments using the changelog tool
How to test this PR locally
Run an agent with self-monitoring in otel mode and collect diagnostics from it. You should see a registry.tar.gz file in the filestream-monitoring directory, containing the registry files.
Related issues