Skip to content

fix(docker): use raw container ID as fallback for networking teardown#155

Open
AprilNEA wants to merge 1 commit intomasterfrom
fix/container-teardown-fallback
Open

fix(docker): use raw container ID as fallback for networking teardown#155
AprilNEA wants to merge 1 commit intomasterfrom
fix/container-teardown-fallback

Conversation

@AprilNEA
Copy link
Copy Markdown
Member

Summary

  • resolve_canonical_from_uri() now returns the raw URI-extracted container ID as fallback instead of None when canonical resolution fails
  • Networking teardown (port forwarding + DNS) is always attempted, even when guest dockerd is unreachable
  • Worst case is a no-op cleanup with a non-matching key, which is far better than leaking resources

Test plan

  • All 54 unit tests and 5 integration tests pass
  • Stop container while guest dockerd is restarting — verify port forwarding cleaned up

Closes ABX-181

When resolve_canonical_from_uri() failed (e.g. guest dockerd unreachable
or container already removed), it returned None, causing stop_container,
kill_container, and remove_container to skip networking teardown entirely.
This leaked port forwarding listeners and DNS entries.

The start path uses canonical_id_or_fallback() which always returns a key,
so the registration key is deterministic. Teardown must use the same
fallback strategy: return the raw URI-extracted ID when canonical
resolution fails. Worst case is a no-op cleanup attempt, which is far
better than leaking resources.
Copilot AI review requested due to automatic review settings March 31, 2026 13:25
@linear
Copy link
Copy Markdown

linear bot commented Mar 31, 2026

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3ec61a2adc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

"Failed to resolve canonical container ID; using raw ID for teardown"
);
None
Some(id)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid raw-ID fallback for rename canonical resolution

Returning Some(id) on canonical lookup failure changes behavior for rename_container, which also calls resolve_canonical_from_uri. In the failure case, a rename requested via short ID can now proceed with a non-canonical key: deregister_dns_by_id misses the original canonical entry, while the post-rename inspect can still succeed with the short ID and register a new DNS entry under that short key. This can leave stale DNS state and makes later cleanup by canonical ID incomplete; fallback-to-raw should be limited to teardown handlers (stop/kill/remove), not rename.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates Docker container lifecycle handling so host-side networking cleanup can proceed even when canonical container ID resolution fails, reducing the risk of leaked port-forwarding and DNS entries (ABX-181).

Changes:

  • resolve_canonical_from_uri() now falls back to the raw URI-extracted container ID when canonical resolution fails.
  • Updated warning message and function documentation to reflect the new fallback behavior.
Comments suppressed due to low confidence (1)

app/arcbox-docker/src/handlers/container.rs:392

  • The doc comment says VM readiness is ensured first, but the code ignores the result of ensure_vm_ready() (let _ = ...). Either handle/log the error (if readiness is required) or update the comment to reflect that readiness is only attempted/best-effort here.
/// Ensures VM readiness first, since [`resolve_canonical_id`] uses
/// `proxy_to_guest` directly (which does not call `ensure_vm_ready`).
async fn resolve_canonical_from_uri(state: &AppState, uri: &Uri) -> Option<String> {
    let id = extract_container_id(uri)?;
    // Ensure the VM is reachable before attempting the inspect call.
    let _ = state.runtime.ensure_vm_ready().await;
    match resolve_canonical_id(state, &id).await {

Comment on lines 388 to 401
async fn resolve_canonical_from_uri(state: &AppState, uri: &Uri) -> Option<String> {
let id = extract_container_id(uri)?;
// Ensure the VM is reachable before attempting the inspect call.
let _ = state.runtime.ensure_vm_ready().await;
match resolve_canonical_id(state, &id).await {
Some(canonical) => Some(canonical),
None => {
tracing::warn!(
container_id = %id,
"Failed to resolve canonical container ID; skipping networking teardown"
"Failed to resolve canonical container ID; using raw ID for teardown"
);
None
Some(id)
}
}
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description says networking teardown is always attempted even when guest dockerd is unreachable, but callers like stop_container/kill_container/remove_container still ?-return on proxy(...) errors and only run teardown on successful responses. If dockerd/connection is down, these handlers will exit before reaching stop_port_forwarding_by_id/deregister_dns_by_id. Consider restructuring those handlers to run teardown in the error path too (using the fallback key) before returning the proxy error, so the stated behavior is actually achieved.

Copilot uses AI. Check for mistakes.
Comment on lines 388 to 400
async fn resolve_canonical_from_uri(state: &AppState, uri: &Uri) -> Option<String> {
let id = extract_container_id(uri)?;
// Ensure the VM is reachable before attempting the inspect call.
let _ = state.runtime.ensure_vm_ready().await;
match resolve_canonical_id(state, &id).await {
Some(canonical) => Some(canonical),
None => {
tracing::warn!(
container_id = %id,
"Failed to resolve canonical container ID; skipping networking teardown"
"Failed to resolve canonical container ID; using raw ID for teardown"
);
None
Some(id)
}
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change introduces an important behavioral contract (fallback to raw URI ID when canonical resolution fails), but there’s no test covering that resolve_canonical_from_uri returns Some(raw_id) when resolve_canonical_id returns None. Consider adding a unit test by factoring the fallback decision into a pure helper, or by allowing AppState/connector to be mocked in tests.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants