
fix futexctn tool for symbol resolution and infinite loop #5496

Open
dubeyabhishek wants to merge 2 commits into iovisor:master from dubeyabhishek:master

@dubeyabhishek

Please find detailed descriptions of the "stack symbol resolution" fix and the "infinite loop in map cleanup" fix in the corresponding commit messages.

Abhishek Dubey added 2 commits March 22, 2026 21:03
Preload symbol cache entries in the wait loop of futexctn to handle
processes that exit before print_stack is invoked.

Previously, syms_cache__get_syms was called lazily at print time.
For short-lived processes that exit before Ctrl-C is delivered,
/proc/<tgid>/maps no longer exists at print time, causing
syms__load_pid to return NULL and stack traces to show
"failed to get syms" or "[unknown]" for all frames.

Fix this by preloading the symbol cache in the wait loop throughout
the collection interval. This ensures that symbols are loaded while
the traced process is still alive and remain available in the
cache for symbol resolution even after the process has exited.

Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com>
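A minimal user-space sketch of the preload idea, assuming a toy cache in place of the real `syms_cache` from `libbpf-tools/trace_helpers.c`. All `toy_*` names are hypothetical, and the "symbol table" only counts executable mappings in `/proc/<tgid>/maps`. That is enough to show why the load has to happen while the process is still alive, and why a populated entry keeps working afterwards.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for a per-process symbol table: it only records
 * how many executable mappings /proc/<tgid>/maps listed at load time. */
struct toy_syms {
	int nr_exec_maps;
};

#define TOY_CACHE_MAX 64

struct toy_cache {
	struct { pid_t tgid; struct toy_syms *syms; } data[TOY_CACHE_MAX];
	int nr;
};

/* Parse /proc/<tgid>/maps; returns NULL once the process has exited. */
static struct toy_syms *toy_syms__load_pid(pid_t tgid)
{
	char path[64], line[4096], perms[8];
	struct toy_syms *syms;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/maps", (int)tgid);
	f = fopen(path, "r");
	if (!f)
		return NULL;
	syms = calloc(1, sizeof(*syms));
	while (syms && fgets(line, sizeof(line), f)) {
		/* second field is the permission string, e.g. "r-xp" */
		if (sscanf(line, "%*s %7s", perms) == 1 && strchr(perms, 'x'))
			syms->nr_exec_maps++;
	}
	fclose(f);
	return syms;
}

/* Load on a miss (or after an earlier failed load). A hit never touches
 * /proc, so a cached entry outlives the process it describes. */
static struct toy_syms *toy_cache__get(struct toy_cache *c, pid_t tgid)
{
	int i;

	assert(c != NULL);
	for (i = 0; i < c->nr && c->data[i].tgid != tgid; i++)
		;
	if (i == c->nr) {
		if (c->nr == TOY_CACHE_MAX)
			return NULL;	/* full: this sketch does not evict */
		c->data[c->nr].tgid = tgid;
		c->data[c->nr].syms = NULL;
		c->nr++;
	}
	if (!c->data[i].syms)
		c->data[i].syms = toy_syms__load_pid(tgid);
	return c->data[i].syms;
}

/* The preload step: call this from the wait loop for every tgid currently
 * in the BPF map, so symbols are captured while the process is alive. */
static void toy_cache__preload(struct toy_cache *c, const pid_t *tgids, int n)
{
	for (int k = 0; k < n; k++)
		toy_cache__get(c, tgids[k]);
}
```

Once an entry is populated, later lookups for that tgid return the cached table without re-reading `/proc`. A lookup for a tgid whose `/proc` entry is already gone simply yields NULL, which is the "failed to get syms" case the commit describes.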

Snapshot hists map keys before cleanup to bound the number of
entries to delete.

The cleanup loop iterated the hists map while the BPF program
continued inserting new entries in parallel. Passing a deleted
key back to bpf_map_get_next_key caused the kernel to return
the first available key, which could be a newly inserted entry,
making the loop non-terminating.

Fix this by snapshotting all existing keys into a heap-allocated
array before deletion. The cleanup loop then deletes only the
finite set of snapshotted keys, leaving any new entries inserted
by BPF during cleanup intact for the next interval.

Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com>
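The failure mode and the fix can be modelled without a kernel. The sketch below is a hypothetical user-space stand-in (all `toy_*` names are invented) that mimics the documented hash-map behaviour of `BPF_MAP_GET_NEXT_KEY`: a key that is not in the map makes it return the first key. To keep the demo deterministic, the simulated BPF producer refills a slot the instant one is deleted. This exaggerates the real race but exposes the same non-termination.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy model of a BPF hash map as seen from user space (hypothetical).
 * Slot order stands in for the kernel's hash-bucket order. */
#define TOY_SLOTS 16

struct toy_map {
	bool used[TOY_SLOTS];
	long key[TOY_SLOTS];
	long next_fresh;	/* next key the simulated BPF program inserts */
};

static int toy_find(const struct toy_map *m, long k)
{
	for (int i = 0; i < TOY_SLOTS; i++)
		if (m->used[i] && m->key[i] == k)
			return i;
	return -1;
}

/* Like bpf_map_get_next_key() on a hash map: if *cur is not in the map,
 * the first key is returned. 0 on success, -1 when iteration is done. */
static int toy_get_next_key(const struct toy_map *m, const long *cur, long *next)
{
	int i = toy_find(m, *cur);

	for (i = (i < 0) ? 0 : i + 1; i < TOY_SLOTS; i++) {
		if (m->used[i]) {
			*next = m->key[i];
			return 0;
		}
	}
	return -1;
}

static void toy_insert(struct toy_map *m, long k)
{
	for (int i = 0; i < TOY_SLOTS; i++) {
		if (!m->used[i]) {
			m->used[i] = true;
			m->key[i] = k;
			return;
		}
	}
}

/* Delete k; the simulated BPF program immediately records a new event. */
static void toy_delete(struct toy_map *m, long k)
{
	int i = toy_find(m, k);

	if (i >= 0)
		m->used[i] = false;
	toy_insert(m, m->next_fresh++);
}

/* Old pattern: continue from the key just deleted. Capped at max_iter so
 * the demo ends; returns the number of deletions performed. */
static long toy_cleanup_naive(struct toy_map *m, long max_iter)
{
	long lookup = -1, next, n = 0;	/* -1 is never a real key here */

	while (n < max_iter && !toy_get_next_key(m, &lookup, &next)) {
		toy_delete(m, next);
		lookup = next;
		n++;
	}
	return n;
}

/* Fixed pattern: snapshot the keys (bounded by map capacity), then delete
 * only that finite set. Returns keys deleted, or -1 if malloc fails. */
static long toy_cleanup_snapshot(struct toy_map *m)
{
	long *keys = malloc(TOY_SLOTS * sizeof(*keys));
	long lookup = -1, next, n = 0;

	if (!keys)
		return -1;
	while (n < TOY_SLOTS && !toy_get_next_key(m, &lookup, &next)) {
		keys[n++] = next;	/* nothing deleted yet, so lookup stays live */
		lookup = next;
	}
	assert(n <= TOY_SLOTS);
	for (long i = 0; i < n; i++)
		toy_delete(m, keys[i]);
	free(keys);
	return n;
}
```

Starting from the same full map, only the snapshot version terminates on its own. It deletes exactly the 16 original keys and leaves the freshly recorded ones in place for the next interval.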

Copilot AI left a comment

Pull request overview

This PR aims to fix futexctn issues around user-space stack symbol resolution (especially for short-lived processes) and prevent an infinite loop during histogram map cleanup.

Changes:

  • Updated syms_cache__get_syms() behavior to try to refresh cached symbols on cache hits.
  • Changed futexctn histogram cleanup to snapshot keys before deleting them to avoid looping.
  • Added a symbol “preload” pass intended to populate the symbol cache ahead of stack printing.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File | Description
libbpf-tools/trace_helpers.c | Modifies symbol cache lookup/refresh behavior in syms_cache__get_syms().
libbpf-tools/futexctn.c | Adds symbol-cache preloading and changes map cleanup to delete snapshotted keys.

Comment on lines +808 to +813
	 * for caching symbols appearing after last
	 * pre-loading/caching cycle.
	 */
	syms = syms__load_pid(tgid);
	if (syms)
		syms_cache->data[i].syms = syms;

Copilot AI Mar 31, 2026

syms_cache__get_syms() now calls syms__load_pid() even on a cache hit and overwrites data[i].syms without freeing the previously cached syms. This both defeats caching (it re-reads /proc/<pid>/maps on every call) and introduces an unbounded memory leak when the same tgid is looked up repeatedly.

Consider only loading when data[i].syms is NULL (true cache miss), or if you need a refresh, free the old syms before replacing it and add a guard to avoid reloading on every lookup.

Suggested change
-	 * for caching symbols appearing after last
-	 * pre-loading/caching cycle.
-	 */
-	syms = syms__load_pid(tgid);
-	if (syms)
-		syms_cache->data[i].syms = syms;
+	 * Only (re)load symbols on a true cache miss for this
+	 * tgid. This avoids re-reading /proc/<pid>/maps on
+	 * every lookup and prevents leaking previously cached
+	 * syms by overwriting the pointer without freeing it.
+	 */
+	if (!syms_cache->data[i].syms) {
+		syms = syms__load_pid(tgid);
+		if (syms)
+			syms_cache->data[i].syms = syms;
+	}

Copilot uses AI. Check for mistakes.
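The review's second option (free the old syms before replacing them, and guard against reloading on every lookup) could look roughly like the sketch below. It is hypothetical throughout: the `toy_*` names, the one-second `TOY_REFRESH_SEC` guard and the injectable loader are assumptions, and plain `free()` stands in for whatever destructor the real `struct syms` needs.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-ins for a cached per-tgid symbol table. */
struct toy_syms {
	int generation;	/* which load produced this table */
};

struct toy_entry {
	int tgid;
	struct toy_syms *syms;
	time_t loaded_at;	/* when syms was last (re)loaded */
};

typedef struct toy_syms *(*toy_loader_fn)(int tgid);

#define TOY_REFRESH_SEC 1	/* assumed minimum gap between reloads */

/* Within the refresh window: plain cache hit, no /proc access. Otherwise
 * try to reload; free the old table only after the reload succeeded, so an
 * exited process keeps its last good symbols and nothing leaks. */
static struct toy_syms *toy_entry__get(struct toy_entry *e, toy_loader_fn load,
				       time_t now)
{
	struct toy_syms *fresh;

	assert(e != NULL && load != NULL);
	if (e->syms && now - e->loaded_at < TOY_REFRESH_SEC)
		return e->syms;
	fresh = load(e->tgid);
	if (fresh) {
		free(e->syms);	/* free(NULL) is a no-op on the first load */
		e->syms = fresh;
		e->loaded_at = now;
	}
	return e->syms;
}

/* Demo loaders (hypothetical): one that succeeds, and one that models a
 * process whose /proc/<tgid>/maps is already gone. */
static int toy_nr_loads;

static struct toy_syms *toy_ok_loader(int tgid)
{
	struct toy_syms *s = malloc(sizeof(*s));

	(void)tgid;
	if (s)
		s->generation = ++toy_nr_loads;
	return s;
}

static struct toy_syms *toy_gone_loader(int tgid)
{
	(void)tgid;
	return NULL;
}
```

Passing the current time in explicitly keeps the sketch deterministic. A real caller would pass something like `time(NULL)`. Freeing only after a successful reload matters here: for an exited process the reload fails, and keeping the last good table is what the first commit relies on.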
Comment on lines 401 to +412
	/* main: poll */
	while (1) {
		/*
		 * preload sym cache with symbols from short lived processes
		 * needed later for stack trace printing
		 */
		struct hist_key lookup_key = { .pid_tgid = -1 }, next_key;
		int fd = bpf_map__fd(obj->maps.hists);
		while (!bpf_map_get_next_key(fd, &lookup_key, &next_key)) {
			syms_cache__get_syms(syms_cache, next_key.pid_tgid >> 32);
			lookup_key = next_key;
		}

Copilot AI Mar 31, 2026

The new symbol-preload loop calls syms_cache__get_syms(syms_cache, ...) unconditionally, but syms_cache is only defined and initialized in the !USE_BLAZESYM build. With USE_BLAZESYM enabled this will not compile (undefined identifier), and in that configuration the preload is not needed anyway.

Wrap this preload block in the same #ifndef USE_BLAZESYM (and optionally !env.summary) guard as the syms_cache usage elsewhere.

Comment on lines 401 to 416
	/* main: poll */
	while (1) {
		/*
		 * preload sym cache with symbols from short lived processes
		 * needed later for stack trace printing
		 */
		struct hist_key lookup_key = { .pid_tgid = -1 }, next_key;
		int fd = bpf_map__fd(obj->maps.hists);
		while (!bpf_map_get_next_key(fd, &lookup_key, &next_key)) {
			syms_cache__get_syms(syms_cache, next_key.pid_tgid >> 32);
			lookup_key = next_key;
		}

		sleep(env.interval);
		printf("\n");


Copilot AI Mar 31, 2026

The preload block runs at the top of the polling loop before sleep(env.interval), but print_map() deletes all histogram keys at the end of each iteration. As a result, at the start of the next iteration the map is typically empty, and this preload scan won't see the keys collected during the just-finished sleep interval (which are the ones about to be printed).

If the intent is to cache symbols for the entries that will be printed, move the preload scan to after sleep() and immediately before print_map(), or fold the caching into print_map()'s key-iteration pass (before processes exit).
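The ordering argument can be checked with a small hypothetical model of one polling iteration. All `toy_*` names are invented, and `toy_print_map()` mimics `print_map()` by deleting every key after printing. The model also places the preload scan under `#ifndef USE_BLAZESYM`, as the previous comment suggests. It assumes every recorded process is still alive when the scan runs. In the real tool, a process that exits during `sleep()` would still be missed, so scanning more often than once per interval may be needed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one futexctn polling iteration. map[] holds the
 * tgids BPF recorded this interval; cache[] holds tgids whose symbols were
 * loaded while the process was alive. */
#define TOY_MAX 32

struct toy_state {
	int map[TOY_MAX], map_len;
	int cache[TOY_MAX], cache_len;
};

static bool toy_cached(const struct toy_state *s, int tgid)
{
	for (int i = 0; i < s->cache_len; i++)
		if (s->cache[i] == tgid)
			return true;
	return false;
}

/* Stand-in for the preload scan; compiled out where no syms_cache exists. */
static void toy_preload(struct toy_state *s)
{
#ifndef USE_BLAZESYM
	for (int i = 0; i < s->map_len; i++)
		if (!toy_cached(s, s->map[i]) && s->cache_len < TOY_MAX)
			s->cache[s->cache_len++] = s->map[i];
#else
	(void)s;
#endif
}

/* Stand-in for sleep(env.interval): events are recorded into the map. */
static void toy_sleep_collect(struct toy_state *s, const int *tgids, int n)
{
	assert(n >= 0);
	for (int i = 0; i < n && s->map_len < TOY_MAX; i++)
		s->map[s->map_len++] = tgids[i];
}

/* Stand-in for print_map(): count resolvable entries, then delete all. */
static int toy_print_map(struct toy_state *s)
{
	int resolved = 0;

	for (int i = 0; i < s->map_len; i++)
		if (toy_cached(s, s->map[i]))
			resolved++;
	s->map_len = 0;
	return resolved;
}

/* Order in the PR: preload, then sleep, then print. */
static int toy_iter_pr(struct toy_state *s, const int *tgids, int n)
{
	toy_preload(s);
	toy_sleep_collect(s, tgids, n);
	return toy_print_map(s);
}

/* Order suggested in the review: sleep, then preload, then print. */
static int toy_iter_review(struct toy_state *s, const int *tgids, int n)
{
	toy_sleep_collect(s, tgids, n);
	toy_preload(s);
	return toy_print_map(s);
}
```

In this model, the PR's order resolves none of the recorded stacks on any interval. The scan always runs against a map that `toy_print_map()` has just emptied, while the review's order resolves all of them.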
