Skip to content

fix(guest): bound thread spawns and reduce stack size in vm-agent#153

Open
AprilNEA wants to merge 1 commit intomasterfrom
fix/agent-bounded-threads
Open

fix(guest): bound thread spawns and reduce stack size in vm-agent#153
AprilNEA wants to merge 1 commit intomasterfrom
fix/agent-bounded-threads

Conversation

@AprilNEA
Copy link
Copy Markdown
Member

Summary

  • Add MAX_ACTIVE_CONNECTIONS (64) semaphore to cap concurrent connection threads
  • Reduce thread stack from 8MB to 1MB via thread::Builder::stack_size
  • Prevents guest OOM under exec bursts (e.g., docker health checks)

Test plan

  • docker exec burst doesn't OOM the guest

Closes ABX-238

The accept loops in vm-agent spawned unbounded threads with the default
8 MB stack for every incoming vsock connection.  Under exec bursts
(e.g. Docker health-check storms) this could exhaust guest VM memory.

- Reduce per-thread stack to 1 MB via thread::Builder::stack_size
- Cap active connection-handling threads at 64 per listener using an
  Arc<AtomicUsize> semaphore; excess connections are logged and dropped
- Separate counters for exec (port 52) and file-I/O (port 53) listeners
Copilot AI review requested due to automatic review settings March 31, 2026 12:52
@linear
Copy link
Copy Markdown

linear bot commented Mar 31, 2026

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot wasn't able to review any files in this pull request.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ae59559406

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +785 to +786
handler(conn_fd);
active_clone.fetch_sub(1, Ordering::Relaxed);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Release active slot even if connection handler panics

The active-connection counter is decremented only after handler(conn_fd) returns normally, so any panic inside a handler leaks one slot permanently. This is realistic because the handler path still contains expect/unwrap panics (for example empty start.cmd in handle_piped), and after enough panics the counter reaches 64 and the agent starts dropping all new connections even though no threads are running. Please ensure slot release happens on unwind as well (e.g., guard with Drop or wrap handler execution).

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants