Add filebeat registry to beat receiver diagnostics #9029

Open: swiatekm wants to merge 5 commits into main from feat/otel-diagnostics-filebeat

Conversation

swiatekm
Contributor

@swiatekm swiatekm commented Jul 16, 2025

What does this PR do?

Adds the filebeat registry to filebeat receiver diagnostics. This involves copying the logic from filebeat itself, with some improvements.

Why is it important?

Diagnostics for beats receivers should contain the same information as they do for beats processes.

Checklist

  • [x] I have read and understood the pull request guidelines of this project.
  • [x] My code follows the style guidelines of this project
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • [x] I have added tests that prove my fix is effective or that my feature works
  • [ ] I have added an entry in ./changelog/fragments using the changelog tool
  • [x] I have added an integration test or an E2E test

How to test this PR locally

Run an agent with self-monitoring in otel mode and collect diagnostics from it. You should see a registry.tar.gz file in the filestream-monitoring directory, containing the registry files.

agent:
  logging:
    to_stderr: true
  monitoring:
    _runtime_experimental: otel
    enabled: true
inputs:
- data_stream:
    namespace: default
  id: unique-system-metrics-input
  streams:
  - data_stream:
      dataset: system.cpu
    metricsets:
    - cpu
  type: system/metrics
  use_output: default
outputs:
  default:
    api_key: <REDACTED>
    hosts:
    - 127.0.0.1:9200
    type: elasticsearch

Related issues

@swiatekm swiatekm force-pushed the feat/otel-diagnostics-filebeat branch from 6359e78 to 2d56e8e Compare July 16, 2025 16:11
@swiatekm swiatekm added skip-changelog backport-8.19 Automated backport to the 8.19 branch enhancement New feature or request labels Jul 16, 2025
Base automatically changed from feat/otel-diagnostics to main July 17, 2025 08:27
Contributor

mergify bot commented Jul 17, 2025

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b feat/otel-diagnostics-filebeat upstream/feat/otel-diagnostics-filebeat
git merge upstream/main
git push upstream feat/otel-diagnostics-filebeat

@swiatekm swiatekm force-pushed the feat/otel-diagnostics-filebeat branch from 2d56e8e to 2e48fdf Compare July 17, 2025 08:47
@swiatekm swiatekm marked this pull request as ready for review July 17, 2025 09:16
@swiatekm swiatekm requested a review from a team as a code owner July 17, 2025 09:16
@elasticmachine
Collaborator

💚 Build Succeeded

History

cc @swiatekm

@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team label Jul 18, 2025
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

Contributor

@pkoutsovasilis pkoutsovasilis left a comment

I have already reached out to you, @swiatekm, but for the sake of completeness I am also writing up my findings so far here. I compiled and ran this on both macOS and Linux (tested only EDOT embedded mode), and in both cases the diagnostics do not contain a registry.tar.gz under the filestream components when running with the otel runtime. My config:

$ sudo /Library/Elastic/Agent-Development/elastic-agent inspect
agent:
  logging:
    to_stderr: true
  monitoring:
    _runtime_experimental: otel
    enabled: true
inputs:
- _runtime_experimental: otel
  file_identity:
    native: null
  id: filestream-filebeat
  paths:
  - /var/log/system.log
  prospector:
    scanner:
      fingerprint:
        enabled: false
  type: filestream
  use_output: default
- data_stream:
    namespace: default
  id: unique-system-metrics-input
  streams:
  - data_stream:
      dataset: system.cpu
    metricsets:
    - cpu
  - data_stream:
      dataset: system.memory
    metricsets:
    - memory
  - data_stream:
      dataset: system.network
    metricsets:
    - network
  - data_stream:
      dataset: system.filesystem
    metricsets:
    - filesystem
  type: system/metrics
  use_output: default
outputs:
  default:
    hosts:
    - https://my-deployment-c67416.es.us-west2.gcp.elastic-cloud.com:443
    password: <REDACTED>
    preset: balanced
    type: elasticsearch
    username: test

Also, the following warning messages appear in my logs when I issue the diagnostics command:

{"log.level":"warn","@timestamp":"2025-07-18T11:36:00.694Z","log.logger":"otel_manager","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/otel/manager.(*OTelManager).PerformComponentDiagnostics","file.name":"manager/diagnostics.go","file.line":126},"message":"error creating registry tar.gz: cannot stat '/Library/Elastic/Agent-Development/data/elastic-agent-9.2.0-SNAPSHOT-2e48fd/run/filestream-default/registry': 'stat /Library/Elastic/Agent-Development/data/elastic-agent-9.2.0-SNAPSHOT-2e48fd/run/filestream-default/registry: no such file or directory'","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2025-07-18T11:36:00.695Z","log.logger":"otel_manager","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/otel/manager.(*OTelManager).PerformComponentDiagnostics","file.name":"manager/diagnostics.go","file.line":126},"message":"error creating registry tar.gz: cannot stat '/Library/Elastic/Agent-Development/data/elastic-agent-9.2.0-SNAPSHOT-2e48fd/run/filestream-monitoring/registry': 'stat /Library/Elastic/Agent-Development/data/elastic-agent-9.2.0-SNAPSHOT-2e48fd/run/filestream-monitoring/registry: no such file or directory'","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"} 

But I can clearly see that you added an integration test to validate that. So I did the following: I commented out the t.Run("filebeat process"... subtest and ran your integration test with a debugger, with a breakpoint inside the func returned by testDiagnosticsFactory. When the breakpoint hit, I ran the following, and I don't see the expected config:

/tmp/TestBeatDiagnostics798991888/elastic-agent-9.2.0-SNAPSHOT-linux-arm64/elastic-agent inspect
agent:
  logging:
    to_stderr: true
inputs:
- data_stream:
    namespace: default
  id: unique-system-metrics-input
  streams:
  - data_stream:
      dataset: system.cpu
    metricsets:
    - cpu
  - data_stream:
      dataset: system.memory
    metricsets:
    - memory
  - data_stream:
      dataset: system.network
    metricsets:
    - network
  - data_stream:
      dataset: system.filesystem
    metricsets:
    - filesystem
  type: system/metrics
  use_output: default
outputs:
  default:
    api_key: <REDACTED>
    hosts:
    - 127.0.0.1:9200
    preset: balanced
    type: elasticsearch

I will keep looking in case I am missing something, but @swiatekm, please have a look as well.

@swiatekm
Contributor Author

@pkoutsovasilis I suspect some funny business with filebeat receiver not writing the registry when it's supposed to in some circumstances. @leehinman any clue what could be causing this? I'm fairly confident my PR is correct in that it looks for the registry in the right place and includes it in diagnostics if it exists.

@leehinman
Contributor

> @pkoutsovasilis I suspect some funny business with filebeat receiver not writing the registry when it's supposed to in some circumstances. @leehinman any clue what could be causing this? I'm fairly confident my PR is correct in that it looks for the registry in the right place and includes it in diagnostics if it exists.

It is very likely that the filebeat receiver registry was written to the "data" directory of one of the metricbeat receivers. This is because of elastic/beats#44903. To verify, look in the "run" directory and check whether you see something like:

# find .
./beat/metrics-monitoring/registry
./beat/metrics-monitoring/registry/filebeat
./beat/metrics-monitoring/registry/filebeat/meta.json
./beat/metrics-monitoring/registry/filebeat/active.dat
./beat/metrics-monitoring/registry/filebeat/2.json
./beat/metrics-monitoring/registry/filebeat/log.json
...

If so, that is exactly what is going on. Only the filestream-monitoring directory should have a registry/filebeat directory.

If that is happening, this PR is blameless.

@swiatekm
Contributor Author

I think the integration test is lying, and that I possibly tricked myself every time I tested this locally. The reason is that if you first run filebeat in process mode and don't clear the data afterwards, you'll be left with a state directory the diagnostics code will happily pick up, even though it wasn't actually written by the beats receiver. This does mean the PR is correct and would work if the path handling in libbeat were correct, but the feature as a whole doesn't work.

For now, let's hold off on merging it.

@leehinman
Contributor

> For now, let's hold off on merging it.

Just for tracking: blocked by #8207.
