fix(docker): use raw container ID as fallback for networking teardown#155
Conversation
When resolve_canonical_from_uri() failed (e.g. guest dockerd unreachable or container already removed), it returned None, causing stop_container, kill_container, and remove_container to skip networking teardown entirely. This leaked port forwarding listeners and DNS entries. The start path uses canonical_id_or_fallback() which always returns a key, so the registration key is deterministic. Teardown must use the same fallback strategy: return the raw URI-extracted ID when canonical resolution fails. Worst case is a no-op cleanup attempt, which is far better than leaking resources.
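The start-path strategy the description refers to can be sketched as follows. This is a minimal, hypothetical sketch (the real `canonical_id_or_fallback()` lives in the handlers module and takes `AppState`); the `resolve` closure stands in for the real `resolve_canonical_id` lookup against guest dockerd.

```rust
// Hypothetical sketch: prefer the canonical container ID, but always fall
// back to the raw URI-extracted ID so the registration key is deterministic.
fn canonical_id_or_fallback(raw_id: &str, resolve: impl Fn(&str) -> Option<String>) -> String {
    match resolve(raw_id) {
        Some(canonical) => canonical,
        // Resolution failed (dockerd unreachable, container already removed):
        // fall back to the raw ID so start and teardown agree on the key.
        None => raw_id.to_string(),
    }
}

fn main() {
    // Resolution succeeds: the canonical ID is used as the key.
    let key = canonical_id_or_fallback("abc123", |_| Some("abc123feedfacecafe".to_string()));
    assert_eq!(key, "abc123feedfacecafe");
    // Resolution fails: worst case is a no-op cleanup under the raw key.
    let key = canonical_id_or_fallback("abc123", |_| None);
    assert_eq!(key, "abc123");
}
```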
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3ec61a2adc
```diff
+                "Failed to resolve canonical container ID; using raw ID for teardown"
             );
-            None
+            Some(id)
```
Avoid raw-ID fallback for rename canonical resolution
Returning Some(id) on canonical lookup failure changes behavior for rename_container, which also calls resolve_canonical_from_uri. In the failure case, a rename requested via short ID can now proceed with a non-canonical key: deregister_dns_by_id misses the original canonical entry, while the post-rename inspect can still succeed with the short ID and register a new DNS entry under that short key. This can leave stale DNS state and makes later cleanup by canonical ID incomplete; fallback-to-raw should be limited to teardown handlers (stop/kill/remove), not rename.
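One way to act on this suggestion is to make the fallback opt-in per call site. This is a hypothetical sketch (names and the `resolve` closure are stand-ins, not the PR's actual API): stop/kill/remove request a best-effort key, while `rename_container` keeps the strict behavior and aborts on resolution failure.

```rust
// Hypothetical sketch: limit the raw-ID fallback to teardown handlers.
#[derive(Clone, Copy, PartialEq)]
enum OnResolveFailure {
    UseRawId, // stop/kill/remove: worst case is a no-op cleanup
    Abort,    // rename: never operate on a non-canonical key
}

fn resolve_key(
    raw_id: &str,
    mode: OnResolveFailure,
    resolve: impl Fn(&str) -> Option<String>,
) -> Option<String> {
    match resolve(raw_id) {
        Some(canonical) => Some(canonical),
        None if mode == OnResolveFailure::UseRawId => Some(raw_id.to_string()),
        None => None,
    }
}

fn main() {
    // Teardown still gets a key when resolution fails.
    assert_eq!(
        resolve_key("abc", OnResolveFailure::UseRawId, |_| None),
        Some("abc".to_string())
    );
    // Rename bails out instead of registering DNS under a short ID.
    assert_eq!(resolve_key("abc", OnResolveFailure::Abort, |_| None), None);
}
```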
Pull request overview
Updates Docker container lifecycle handling so host-side networking cleanup can proceed even when canonical container ID resolution fails, reducing the risk of leaked port-forwarding and DNS entries (ABX-181).
Changes:
- `resolve_canonical_from_uri()` now falls back to the raw URI-extracted container ID when canonical resolution fails.
- Updated the warning message and function documentation to reflect the new fallback behavior.
Comments suppressed due to low confidence (1)
app/arcbox-docker/src/handlers/container.rs:392
- The doc comment says VM readiness is ensured first, but the code ignores the result of `ensure_vm_ready()` (`let _ = ...`). Either handle/log the error (if readiness is required) or update the comment to reflect that readiness is only attempted best-effort here.
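The logging option suggested above could look like this. A minimal sketch under stated assumptions: the closure stands in for `state.runtime.ensure_vm_ready()`, and plain `eprintln!` stands in for the handler's `tracing::warn!`.

```rust
// Hypothetical sketch: surface the readiness error instead of discarding it
// with `let _ = ...`, keeping best-effort semantics but making failures
// observable. Returns the error message (if any) for inspection.
fn ensure_ready_best_effort(ensure_vm_ready: impl Fn() -> Result<(), String>) -> Option<String> {
    match ensure_vm_ready() {
        Ok(()) => None,
        Err(e) => {
            // Readiness stays best-effort; the subsequent inspect call
            // will surface a hard failure on its own.
            eprintln!("VM readiness check failed before inspect: {e}");
            Some(e)
        }
    }
}

fn main() {
    assert_eq!(ensure_ready_best_effort(|| Ok(())), None);
    assert_eq!(
        ensure_ready_best_effort(|| Err("vsock down".to_string())),
        Some("vsock down".to_string())
    );
}
```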
```rust
/// Ensures VM readiness first, since [`resolve_canonical_id`] uses
/// `proxy_to_guest` directly (which does not call `ensure_vm_ready`).
async fn resolve_canonical_from_uri(state: &AppState, uri: &Uri) -> Option<String> {
    let id = extract_container_id(uri)?;
    // Ensure the VM is reachable before attempting the inspect call.
    let _ = state.runtime.ensure_vm_ready().await;
    match resolve_canonical_id(state, &id).await {
```
```diff
 async fn resolve_canonical_from_uri(state: &AppState, uri: &Uri) -> Option<String> {
     let id = extract_container_id(uri)?;
     // Ensure the VM is reachable before attempting the inspect call.
     let _ = state.runtime.ensure_vm_ready().await;
     match resolve_canonical_id(state, &id).await {
         Some(canonical) => Some(canonical),
         None => {
             tracing::warn!(
                 container_id = %id,
-                "Failed to resolve canonical container ID; skipping networking teardown"
+                "Failed to resolve canonical container ID; using raw ID for teardown"
             );
-            None
+            Some(id)
         }
     }
```
The PR description says networking teardown is always attempted even when guest dockerd is unreachable, but callers like stop_container/kill_container/remove_container still ?-return on proxy(...) errors and only run teardown on successful responses. If dockerd/connection is down, these handlers will exit before reaching stop_port_forwarding_by_id/deregister_dns_by_id. Consider restructuring those handlers to run teardown in the error path too (using the fallback key) before returning the proxy error, so the stated behavior is actually achieved.
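The restructuring this comment asks for can be sketched as below. A hypothetical simplification: `proxy` and `teardown_networking` stand in for the real proxy call and the `stop_port_forwarding_by_id`/`deregister_dns_by_id` pair, with the synchronous closures replacing the handlers' async internals.

```rust
// Hypothetical sketch: run networking teardown on both the success and the
// error path, before the proxy error is returned, instead of `?`-returning
// past the cleanup when guest dockerd is unreachable.
fn stop_container(
    key: &str,
    proxy: impl Fn() -> Result<String, String>,
    mut teardown_networking: impl FnMut(&str),
) -> Result<String, String> {
    let result = proxy();
    // Cleanup runs whether or not the proxy call succeeded, so port-forwarding
    // listeners and DNS entries are not leaked when dockerd is down.
    teardown_networking(key);
    result
}

fn main() {
    let mut torn_down = Vec::new();
    // Even when the proxy call fails, teardown ran with the fallback key.
    let res = stop_container(
        "abc123",
        || Err("dockerd unreachable".to_string()),
        |k| torn_down.push(k.to_string()),
    );
    assert!(res.is_err());
    assert_eq!(torn_down, vec!["abc123".to_string()]);
}
```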
This change introduces an important behavioral contract (fallback to raw URI ID when canonical resolution fails), but there’s no test covering that resolve_canonical_from_uri returns Some(raw_id) when resolve_canonical_id returns None. Consider adding a unit test by factoring the fallback decision into a pure helper, or by allowing AppState/connector to be mocked in tests.
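The pure-helper approach suggested here might look like the following. A hypothetical sketch: `fallback_key` is a name invented for illustration, not a function in the PR, and it isolates only the fallback decision so it can be tested without an `AppState` or a live VM.

```rust
// Hypothetical pure helper carrying the fallback contract: teardown always
// gets a key, and the worst case is a no-op cleanup under the raw ID.
fn fallback_key(raw_id: String, canonical: Option<String>) -> Option<String> {
    Some(canonical.unwrap_or(raw_id))
}

#[cfg(test)]
mod tests {
    use super::fallback_key;

    #[test]
    fn returns_canonical_when_resolved() {
        assert_eq!(
            fallback_key("abc".to_string(), Some("abcdef0123".to_string())),
            Some("abcdef0123".to_string())
        );
    }

    #[test]
    fn falls_back_to_raw_id_when_resolution_fails() {
        assert_eq!(fallback_key("abc".to_string(), None), Some("abc".to_string()));
    }
}

fn main() {
    // Mirrors the unit tests: resolved -> canonical key; unresolved -> raw key.
    assert_eq!(
        fallback_key("abc".to_string(), Some("abcdef0123".to_string())),
        Some("abcdef0123".to_string())
    );
    assert_eq!(fallback_key("abc".to_string(), None), Some("abc".to_string()));
}
```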
Summary
- `resolve_canonical_from_uri()` now returns the raw URI-extracted container ID as a fallback instead of `None` when canonical resolution fails

Test plan

Closes ABX-181