[kernel-signals] Add syscall latency, lock contention, and PSI checks #48705
Draft
scottopell wants to merge 6 commits into q-branch-observer
Add a host-level Pressure Stall Information (PSI) core check that reads
/proc/pressure/{cpu,memory,io} and emits system.pressure.* metrics.
- Parses avg10, avg60, avg300 and total stall microseconds
- Emits both "some" and "full" variants for memory and io
- Gracefully skips on kernels without PSI support (< 4.20)
- Includes unit tests with fixture-based /proc/pressure parsing
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
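The parsing step described in this commit can be sketched as follows. This is a minimal illustration, not the check's actual code: the `psiLine` type and `parsePSILine` function are hypothetical names, and the input is assumed to follow the kernel's documented `some|full avg10=… avg60=… avg300=… total=…` line format.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// psiLine holds one parsed line of a /proc/pressure/{cpu,memory,io} file.
// Hypothetical type for illustration only.
type psiLine struct {
	Kind   string  // "some" or "full"
	Avg10  float64 // percent of time stalled over the last 10s
	Avg60  float64 // same, last 60s
	Avg300 float64 // same, last 300s
	Total  uint64  // cumulative stall time in microseconds
}

// parsePSILine parses a single line such as
// "some avg10=0.00 avg60=0.00 avg300=0.00 total=0".
// Unknown keys are skipped so that extra fields do not break parsing.
func parsePSILine(line string) (psiLine, error) {
	fields := strings.Fields(line)
	if len(fields) < 2 {
		return psiLine{}, fmt.Errorf("too few fields in %q", line)
	}
	out := psiLine{Kind: fields[0]}
	if out.Kind != "some" && out.Kind != "full" {
		return psiLine{}, fmt.Errorf("unexpected line kind %q", out.Kind)
	}
	for _, kv := range fields[1:] {
		key, val, ok := strings.Cut(kv, "=")
		if !ok {
			return psiLine{}, fmt.Errorf("malformed field %q", kv)
		}
		var err error
		switch key {
		case "avg10":
			out.Avg10, err = strconv.ParseFloat(val, 64)
		case "avg60":
			out.Avg60, err = strconv.ParseFloat(val, 64)
		case "avg300":
			out.Avg300, err = strconv.ParseFloat(val, 64)
		case "total":
			out.Total, err = strconv.ParseUint(val, 10, 64)
		}
		if err != nil {
			return psiLine{}, fmt.Errorf("parsing %q: %w", kv, err)
		}
	}
	return out, nil
}

func main() {
	l, err := parsePSILine("some avg10=1.25 avg60=0.50 avg300=0.10 total=123456")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", l)
}
```

A caller would read each file under /proc/pressure, run every line through a parser like this, and treat a missing file as "PSI not supported" rather than as an error.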
Add an eBPF-based kernel lock contention check that attaches to
lock_contention_begin/end tracepoints and measures per-lock hold times.
- eBPF program tracks lock acquire/release timestamps per TID
- System-probe module exposes aggregated lock contention stats
- Agent check queries system-probe and emits ebpf.lock_contention_ns
- Graceful degradation via IgnoreStartupError for missing tracepoints
- Includes per-CPU array optimization and FD mapping diagnostics
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
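The per-TID timestamp pairing mentioned above can be illustrated with a userspace analogue in Go. This is only a sketch of the general technique: the real logic lives in an eBPF program using BPF maps, and the `tracker` type and its methods are hypothetical names that do not appear in the PR.

```go
package main

import "fmt"

// pending records the begin event a thread is currently waiting to pair.
type pending struct {
	lockAddr uint64
	startNs  uint64
}

// tracker pairs begin/end events per thread ID and accumulates the
// elapsed nanoseconds per lock address. Hypothetical type.
type tracker struct {
	inflight map[uint32]pending // tid -> outstanding begin event
	totalNs  map[uint64]uint64  // lock address -> accumulated ns
}

func newTracker() *tracker {
	return &tracker{
		inflight: make(map[uint32]pending),
		totalNs:  make(map[uint64]uint64),
	}
}

// begin stores the timestamp for tid, overwriting any stale entry.
func (t *tracker) begin(tid uint32, lockAddr, nowNs uint64) {
	t.inflight[tid] = pending{lockAddr: lockAddr, startNs: nowNs}
}

// end looks up the matching begin for tid and adds the delta to the
// lock's total. Unmatched end events are ignored.
func (t *tracker) end(tid uint32, nowNs uint64) {
	p, ok := t.inflight[tid]
	if !ok {
		return
	}
	delete(t.inflight, tid)
	if nowNs < p.startNs {
		return // guard against a non-monotonic timestamp
	}
	t.totalNs[p.lockAddr] += nowNs - p.startNs
}

func main() {
	tr := newTracker()
	tr.begin(42, 0xabc, 1000)
	tr.end(42, 1750)
	fmt.Println(tr.totalNs[0xabc])
}
```

Keying the in-flight table by TID works because a thread produces at most one outstanding begin event at a time in this simplified model; nested events would need a stack or a composite key.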
Resolve build issues from cherry-picking lock contention onto the
observer branch:
- Fix WriteAsJSON signature (no request param on this branch)
- Remove noisyneighbor/injector references not present on this branch
- Keep lock_contention_check module registration and config
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…upport
Adds a new system-probe module and agent check that tracks per-syscall
latency using raw tracepoints (sys_enter/sys_exit, kernel >= 4.17, no BTF).
Tracks 17 syscalls: read, write, pread64, pwrite64, poll, select, mmap,
munmap, connect, accept, accept4, futex, epoll_wait, epoll_pwait, clone,
execve, io_uring.
Per-container tagging: the eBPF stats map is keyed by (cgroup_name, slot)
so each container gets its own per-syscall counters. get_cgroup_name() is
called at sys_enter and stored in the tid_entry alongside the timestamp.
The Go probe applies cgroups.ContainerFilter to extract container IDs.
arm64 compat: classify_syscall() now has two arch-guarded switch tables
(bpf_target_x86 / bpf_target_arm64) with correct syscall numbers for each
arch. Unsupported arches get a #error at compile time.
Metrics emitted per (syscall, container_id) tuple:
- system.syscall.latency.total: monotonic total ns
- system.syscall.latency.count: monotonic call count
- system.syscall.latency.max: per-interval max ns (reset each flush)
- system.syscall.latency.slow_count: calls > 1ms
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- Remove ConfigNamespaces field (dropped from module.Factory)
- Fix utils.WriteAsJSON call signature (req moved to first argument)
Applies to both lock_contention_check and syscall_latency_check modules.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Go Package Import Differences
Baseline: e5b320d
Summary
Three new kernel-signal checks feeding the observer's anomaly detection pipeline:
- Syscall latency: tracks read, write, pread64, pwrite64, poll, select, mmap, munmap, connect, accept, accept4, futex, epoll_wait, epoll_pwait, clone, execve, io_uring. Metrics: system.syscall.latency.{total,count,max,slow_count}, tagged with syscall: and container_id:.
- Lock contention: mutex_lock/mutex_unlock probes. Requires kernel ≥ 5.14 + BTF.
- PSI: reads /proc/pressure/{cpu,memory,io} and emits system.pressure.* gauges. Pure Go, no eBPF, available on any kernel with PSI enabled.
Syscall latency design notes
Per-container tagging: the eBPF stats map is keyed by (cgroup_name[128], slot) rather than a flat per-CPU array indexed by slot. get_cgroup_name() is called at sys_enter and stored in tid_entry alongside the timestamp. On the Go side, cgroups.ContainerFilter extracts a container ID from the cgroup leaf name; host-level entries emit without a container_id tag.
arm64 compat: classify_syscall() has two arch-guarded switch tables (bpf_target_x86 / bpf_target_arm64) with correct syscall numbers for each arch (e.g. read is nr 0 on x86_64, nr 63 on arm64). Unsupported arches get a #error at compile time.
Kernel requirement: raw tracepoints (sys_enter/sys_exit) require kernel ≥ 4.17. No BTF required.
Test plan
- go build -tags linux_bpf ./pkg/collector/corechecks/ebpf/... (clean)
- go build -tags linux_bpf ./cmd/system-probe/... (clean)
- TestCgoAlignment_ebpfSyscallStats passes (struct layout verified against C)
- PSI unit tests (pressure_linux_test.go)
- … read, write, futex, epoll_pwait, mmap, munmap, clone, execve, select, poll, connect, accept4 in a 3-second window
🤖 Generated with Claude Code