[Infra] Agent Loop Termination Does Not Trigger Process Exit (Zombie State) #39

@lhb-goku

Description:
When the agent loop encounters a critical error (e.g., panic, infinite loop detection) and breaks out of its Run() loop, the main Go process keeps running. Because the process never exits, with any exit code, Docker's restart: unless-stopped policy is never triggered.

This leaves the agent in a "Zombie State":
• The container is Up.
• The HTTP server/Gateway is active.
• The Agent is removed from the Router.
• Incoming messages result in agent not found.

Logs:
Agent loop breaks/crashes ...
Later requests fail:
time=2026-03-02T12:25:04.670+07:00 level=WARN msg="inbound: agent not found" agent=default channel=telegram

Root Cause:
• In internal/agent/loop.go (or similar), critical errors use break to exit the loop but do not signal the main application to shut down.

Proposed Solution:
• Implement a Health Check mechanism. If the main agent loop dies, the application should call os.Exit(1).
• Or, propagate the context cancellation up to main.go to trigger a graceful shutdown, allowing the orchestrator (Docker/K8s) to restart the pod/container.
