Skip to content

fix(cron): route cron jobs to correct agent and publish response to channel#1839

Open
securityguy wants to merge 38 commits intosipeed:mainfrom
securityguy:fix/cron-peer-routing
Open

fix(cron): route cron jobs to correct agent and publish response to channel#1839
securityguy wants to merge 38 commits intosipeed:mainfrom
securityguy:fix/cron-peer-routing

Conversation

@securityguy
Copy link
Contributor

Problem

Two separate bugs prevented cron jobs from delivering agent responses to the correct channel:

1. Peer binding never matched for cron jobs

ProcessDirectWithChannel constructed an InboundMessage but never set the Peer field. As a result, bindings that match on peer (the standard pattern for routing messages to a specific agent by channel) never fired — cron jobs always fell through to the default agent regardless of the configured binding.

2. Agent response was silently discarded

Even after routing worked, the agent's response was thrown away. In CronTool.ExecuteJob, the deliver=false branch called ProcessDirectWithChannel and then:

// Response is automatically sent via MessageBus by AgentLoop
_ = response // Will be sent by AgentLoop

This comment is incorrect. ProcessDirectWithChannel bypasses the Run loop entirely — nothing ever published the response to the bus. The deliver=true and command branches already called msgBus.PublishOutbound directly; the deliver=false branch simply forgot to.

Fix

pkg/agent/loop.go — set Peer on the inbound message in ProcessDirectWithChannel so peer-based bindings resolve correctly:

if chatID != "" && chatID != "direct" {
    msg.Peer = bus.Peer{Kind: "channel", ID: chatID}
}

pkg/tools/cron.go — publish the agent response after ProcessDirectWithChannel returns:

if response != "" {
    pubCtx, pubCancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer pubCancel()
    t.msgBus.PublishOutbound(pubCtx, bus.OutboundMessage{
        Channel: channel,
        ChatID:  chatID,
        Content: response,
    })
}

securityguy and others added 30 commits March 15, 2026 20:53
Some providers (via OpenRouter) reject assistant messages with
"content": "" alongside tool_calls. The OpenAI spec permits content to
be absent when tool_calls is set. Switch openaiMessage.Content from
string to *string with omitempty and introduce msgContent() to return
nil when content is empty and tool calls are present.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ields

Some OpenAI-compatible providers (e.g. OpenRouter routing to strict
backends) reject non-standard fields in the request body such as
reasoning_content in messages and extra_content / thought_signature
in tool calls. Add a per-model strict_compat: true config option that
strips these fields before serialization.

Implementation:
- Add StrictCompat bool to config.ModelConfig
- Add WithStrictCompat option to openai_compat.Provider
- Refactor HTTPProvider constructors into a single NewHTTPProviderWithOptions
  using variadic openai_compat.Option, eliminating the growing list of
  named constructors
- Thread StrictCompat through CreateProviderFromConfig via composed options

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When the claude CLI exits with a non-zero status, the previous error
handler only checked stderr. However, the CLI writes its output
(including error details) to stdout, especially when invoked with
--output-format json. This left the caller with only "exit status 1"
and no actionable information.

Now includes both stderr and stdout in the error message so the actual
failure reason is visible in logs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add claude-cli and codex-cli to the supported vendors table and
include vendor-specific configuration examples explaining:
- No API key is required (uses existing CLI subscription)
- The claude-code sentinel model ID skips --model flag so the CLI
  uses its own configured default model

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add channels.telegram_bots config allowing multiple Telegram bot tokens
to be configured, each mapped to a separate channel (e.g. telegram-amber,
telegram-karen). Each channel can be independently bound to an agent via
the bindings config, enabling distinct AI personas behind separate bots.

Backward compatibility is preserved: the existing channels.telegram
single-entry config continues to work unchanged. On load it is normalized
into telegram_bots as an entry with id "default", which produces the
channel name "telegram" so all existing bindings remain valid.

Key changes:
- config: add TelegramBotConfig struct with ChannelName/AsTelegramConfig
  helpers; add TelegramBots field to ChannelsConfig; normalize legacy
  single entry into list on load
- telegram: add NewTelegramChannelFromConfig constructor accepting
  TelegramConfig + explicit channel name (avoids import cycle)
- channels: add TelegramBotFactory registry; add injectChannelDependencies
  helper to eliminate injection code duplication; add duplicate channel
  name guard in initTelegramBot; update initChannels to iterate over
  TelegramBots; add prefix-based rate limit fallback for named bots

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…le.json

Add two disabled example bots (alice, bob) under channels.telegram_bots
and corresponding top-level bindings to illustrate how multiple Telegram
bots map to separate named channels and agents.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds GeminiCliProvider that wraps the Gemini CLI as a subprocess,
following the same pattern as the existing claude-cli and codex-cli
providers.

The provider invokes:
  gemini --yolo --output-format json --prompt ""

with the prompt sent via stdin. The --prompt "" flag enables
non-interactive (headless) mode, reading the full prompt from stdin.

Key details:
- Model sentinel: "gemini-cli" skips --model flag (uses CLI default)
- Explicit model: "gemini-cli/gemini-2.5-pro" passes --model gemini-2.5-pro
- System messages prepended to stdin (no --system-prompt flag in gemini)
- Parses JSON response format: {"response": "...", "stats": {"models": {...}}}
- Token usage summed across all models in stats.models (gemini uses
  multiple internal models per request)
- Tool calls extracted from response text using shared extractToolCallsFromText
- New protocol: "gemini-cli" / alias "geminicli"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add PR sipeed#1633 (gemini-cli provider) to contributions table
- Add configuration guide section covering:
  - claude-cli, codex-cli, and gemini-cli providers with model_list examples
  - Multiple Telegram bots with bindings and per-agent config
  - Agent workspace and personality file notes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds GeminiCliProvider (PR sipeed#1633 against sipeed/picoclaw).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace Amber/Karen with Alice/Bob in all README examples for consistency.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously all agents shared a single LLMProvider instance created from
agents.defaults.model_name. Per-agent model config (agents.list[].model)
only changed the model string passed to Chat() — it never changed which
provider binary was invoked. This caused cross-provider fallback chains
(e.g. gemini-cli falling back to claude-cli) to fail, and made it
impossible to assign different CLI providers to different agents.

Introduces ProviderDispatcher which lazily creates and caches provider
instances keyed by "protocol/modelID". The fallback chain's run closure
now resolves the correct provider via the dispatcher before falling back
to agent.Provider for backward compatibility.

References sipeed#1634

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Brings in ProviderDispatcher fix (PR sipeed#1637 against sipeed/picoclaw).
References sipeed#1634.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ndling

Tool call detection previously relied on a literal strings.Index for
'{"tool_calls"' which failed whenever the LLM returned pretty-printed
JSON (newline after '{') or wrapped the output in markdown code fences.
Arguments typed as a JSON object instead of an encoded string also
caused a silent parse failure and leaked the raw JSON block to the user.

Changes:
- Strip markdown code fences (` + "```json" + ` / ` + "```" + `) before parsing
- Locate JSON candidate via first '{' / last '}' instead of literal match
- Unmarshal directly and check for top-level "tool_calls" key
- Accept arguments as either a JSON-encoded string or a plain JSON object
- Remove dead findMatchingBrace function and its tests
- Publish response.Content to the user immediately when a response
  contains both text and tool calls (previously the text was silently
  discarded into session history)
- Fix pre-existing test bug: TestCreateProvider_GeminiCliDefaultWorkspace
  now clears Agents.Defaults.Workspace before testing the '.' fallback

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
requiresRuntimeProbe and probeLocalModelAvailability handled claude-cli
and codex-cli but not gemini-cli, causing the launcher to report
"default model has no credentials configured" and skip auto-start when
gemini-cli was set as the default model.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
TryAutoStartGateway only checked gateway.cmd, which tracks processes the
launcher itself spawned. A gateway managed externally (e.g. via systemd)
was invisible to this check, causing two problems:
  1. The launcher started a duplicate gateway instance on every launch.
  2. The WebUI showed "Gateway Not Running" even when it was healthy.

Fix: probe the gateway health endpoint in two places:
  - TryAutoStartGateway: skip auto-start if the health endpoint responds.
  - gatewayStatusData: report "running" when the launcher has no owned
    process but the health endpoint is responding. Launcher-owned
    transition states (restarting/error) take precedence over the probe.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Brings in launcher external gateway detection fix (PR sipeed#1811 against sipeed/picoclaw).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The --system-prompt flag exposed agent instructions and tool definitions
in the process argument list (visible via ps to all users on the host)
and risked hitting OS ARG_MAX limits when many tools are registered.

System prompt content is now prepended to the stdin payload, separated
from the user message by a --- delimiter. This is consistent with how
the gemini-cli and codex-cli providers already handle all input.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… allowlist on self-spawn

Previously SubagentManager was initialised with the global provider and
the calling agent's model name string (e.g. "openrouter-gpt-5.4").
When the global provider happened to be claude-cli this caused it to be
invoked with --model openrouter-gpt-5.4, which claude does not recognise,
blowing up every spawn attempt.

Two problems fixed together:

1. Provider dispatch: SubagentManager now holds the ProviderDispatcher and
   the calling agent's model candidates. When a subagent is spawned it
   resolves the correct provider through the same per-candidate dispatch
   used by the main agent loop. When agent_id names a different agent,
   that agent's candidates are resolved via a registry callback so the
   subagent runs with the target agent's configured model (e.g. spawning
   "karen" uses karen's claude-cli, not amber's openrouter).

2. Self-spawn allowlist: the allowlist check previously only ran when
   agent_id was explicitly set. Empty agent_id (self-spawn) now resolves
   to the caller's own ID before the check, so allow_agents: ["karen"]
   on amber correctly rejects an unqualified spawn attempt.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
securityguy and others added 8 commits March 19, 2026 23:10
When a named subagent (e.g. agent_id="karen") completes a task, its
response is now prefixed with the agent's name in bold so the user
can tell which agent produced the result. Self-spawns (no agent_id)
are unaffected.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The list command previously showed only name, ID, schedule, status, and
next run time. Channel, recipient, deliver mode, and message/command were
hidden, making it impossible to tell from the CLI what a job would do or
which agent would handle it.

All payload fields are now displayed. Messages longer than 80 characters
are truncated with an ellipsis. Jobs with a command show the command
instead of a message.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ectly

Cron jobs fired via ProcessDirectWithChannel had no Peer set on the
InboundMessage, so the route resolver never matched peer-based bindings
and always fell through to the default agent.

Setting Peer{Kind: "channel", ID: chatID} when a real chatID is present
means a cron job targeted at a specific Slack channel ID will now be
routed to whichever agent has a matching peer binding for that channel,
consistent with how live inbound messages are routed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When a cron job with deliver=false runs, the agent response was silently
discarded with a misleading comment "Will be sent by AgentLoop". The Run
loop is never involved in cron execution — ProcessDirectWithChannel bypasses
it entirely. As a result, Karen's response to scheduled tasks was computed
but never delivered to Slack (or any other channel).

Fix: explicitly publish the response via msgBus.PublishOutbound after
ProcessDirectWithChannel returns, consistent with how command and
deliver=true jobs already handle their output.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

domain: agent domain: corn go Pull requests that update go code type: bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant