
fix: track agent in boulder state to fix session continuation (fixes #927)#1477

Open
kaizen403 wants to merge 3 commits into code-yeongyu:dev from kaizen403:fix/boulder-agent-tracking

Conversation

Contributor

@kaizen403 kaizen403 commented Feb 4, 2026

Summary

Fixes #927. After a session interruption during /start-work, when the user typed "continue", Prometheus (the planner) was resuming instead of Sisyphus (the executor), causing all subsequent delegate_task calls to get the READ-ONLY directive injected.

Root Cause

The boulder.json file was missing an agent field to specify which agent should resume on continuation. The atlas hook hardcoded agent: "atlas", but this value wasn't being persisted when the boulder state was created.

Changes

  • Add optional agent field to BoulderState interface in types.ts
  • Update createBoulderState() in storage.ts to accept optional agent parameter
  • Update start-work hook to set agent: 'atlas' when creating boulder state
  • Update atlas hook to use stored boulderState.agent (defaults to 'atlas') on continuation
  • Add tests for new agent field functionality
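A minimal sketch of the shape of these changes. Only BoulderState, createBoulderState, and the agent field come from this PR; the other field names here are illustrative placeholders, not the actual definitions:

```typescript
// Illustrative sketch; `plan` and `sessionId` are placeholder fields,
// not the real BoulderState layout.
interface BoulderState {
  plan: string;
  sessionId: string;
  // New optional field: which agent should resume on continuation.
  agent?: string;
}

function createBoulderState(
  plan: string,
  sessionId: string,
  agent?: string,
): BoulderState {
  const state: BoulderState = { plan, sessionId };
  if (agent !== undefined) {
    state.agent = agent;
  }
  return state;
}

// /start-work pins the boulder state to the executor:
const state = createBoulderState("plan.md", "session-1", "atlas");

// On continuation, older boulder.json files without the field
// fall back to "atlas" for backward compatibility:
const resumeAgent = state.agent ?? "atlas";
```

Leaving agent optional means existing boulder.json files deserialize unchanged, and the `?? "atlas"` fallback preserves the old behavior for them.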

Testing

  • Added 2 new tests for createBoulderState with agent field
  • All 59 related tests pass
  • Full test suite: 2099 pass (10 pre-existing failures in MCP OAuth tests)
  • Typecheck passes

Summary by cubic

Persist the active agent in boulder.json so “continue” resumes the correct executor (atlas/sisyphus) instead of the planner (Prometheus). Fixes #927 and prevents READ-ONLY mode from being injected into delegate_task calls.

  • Bug Fixes
    • Added optional agent to BoulderState and createBoulderState(plan, session, agent?)
    • Set agent="atlas" when /start-work initializes boulder state
    • Idle handler now checks the last session agent against boulderState.agent (defaults to "atlas") and uses it for continuation; supports non-Atlas agents
    • Prometheus MD-only: prioritize boulder state agent over message files so sessions resume with the executor even after restarts
    • Added tests for agent field, backward compatibility, non-Atlas continuation, and boulder-over-message priority

Written for commit 38b40bc. Summary will update on new commits.


@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 5 files

Confidence score: 3/5

  • There is a concrete behavior mismatch in src/hooks/atlas/index.ts: continuation accepts boulderState.agent, but idle handler still early-returns unless last message agent is Atlas, which can block non-Atlas agents from continuing.
  • Given the medium severity and user-facing impact (continuations failing for agents like Sisyphus), there’s some merge risk despite being localized.
  • Pay close attention to src/hooks/atlas/index.ts - idle handler gating prevents non-Atlas continuations.
Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/hooks/atlas/index.ts">

<violation number="1" location="src/hooks/atlas/index.ts:571">
P2: Continuation injection now accepts `boulderState.agent`, but the idle handler still returns early unless the last message agent is Atlas. This blocks continuation for non-Atlas agents (e.g., Sisyphus), making the new agent parameter ineffective and preventing session resumption.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@kaizen403
Contributor Author

@code-yeongyu hello again! please review this when you can :3 continuation now checks against boulderState.agent instead of hardcoding atlas

@code-yeongyu
Owner

Hey @kaizen403, thanks for the PR! I have a question about the fix:

/start-work always creates boulder state with agent: "atlas". The existing isCallerOrchestrator() already checks lastAgent === "atlas", and the new logic checks lastAgent === boulderState.agent which is also "atlas". So the behavior should be identical for the current /start-work flow.

Could you clarify the exact reproduction scenario for Issue #927? Specifically:

  1. After session interruption and typing "continue", what does findNearestMessageWithFields() return as the last agent?
  2. Is the issue that the last agent becomes something other than "atlas" after interruption (e.g., due to compaction or message storage state)?

Want to make sure we understand the root cause before merging. Thanks!

…ode-yeongyu#927)

Add 'agent' field to BoulderState to track which agent (atlas) should
resume on session continuation. Previously, when user typed 'continue'
after interruption, Prometheus (planner) resumed instead of Sisyphus
(executor), causing all delegate_task calls to get READ-ONLY mode.

Changes:
- Add optional 'agent' field to BoulderState interface
- Update createBoulderState() to accept agent parameter
- Set agent='atlas' when /start-work creates boulder.json
- Use stored agent on boulder continuation (defaults to 'atlas')
- Add tests for new agent field functionality
Address code review: continuation was blocked unless last agent was Atlas,
making the new agent parameter ineffective. Now the idle handler checks if
the last session agent matches boulderState.agent (defaults to 'atlas'),
allowing non-Atlas agents to resume when properly configured.

- Add getLastAgentFromSession helper for agent lookup
- Replace isCallerOrchestrator gate with boulder-agent-aware check
- Add test for non-Atlas agent continuation scenario
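The boulder-agent-aware check this commit describes could look roughly like the following. shouldContinue is an illustrative name for the idle-handler gate, not the actual hook function:

```typescript
// Sketch of the idle-handler gate, assuming boulderState.agent as in this PR.
type BoulderState = { agent?: string };

function shouldContinue(
  lastSessionAgent: string | undefined,
  boulderState: BoulderState,
): boolean {
  // Old behavior (the bug): return lastSessionAgent === "atlas";
  // New behavior: compare against the agent stored in boulder.json,
  // defaulting to "atlas" for states written before this change.
  const expected = boulderState.agent ?? "atlas";
  return lastSessionAgent === expected;
}
```

With this shape, a boulder state created for a non-Atlas agent (e.g. sisyphus) passes the gate, while legacy states without the field keep the original Atlas-only behavior.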
…files

Root cause fix for issue code-yeongyu#927:
- After /plan → /start-work → interruption, in-memory sessionAgentMap is cleared
- getAgentFromMessageFiles() returns 'prometheus' (oldest message from /plan)
- But boulder.json has agent: 'atlas' (set by /start-work)

Fix: Check boulder state agent BEFORE falling back to message files
Priority: in-memory → boulder state → message files

Test: 3 new tests covering the priority logic
@kaizen403 kaizen403 force-pushed the fix/boulder-agent-tracking branch from 7d97a64 to 38b40bc on February 4, 2026 at 15:57
@kaizen403
Contributor Author

@code-yeongyu you're right. after spending quite a while on this, i found the actual root cause isn't in the atlas hook: it's in prometheus-md-only.

to answer your questions:

  1. after interruption, findNearestMessageWithFields() isn't the problem. it's findFirstMessageWithAgent(), which returns the oldest message (from /plan), not the newest.

  2. yes, after interruption the in-memory map is gone, so it falls back to message files and picks "prometheus" from the first message

i've updated the pr with the actual fix

@kaizen403
Contributor Author

so.. what's happening is when you run /plan then /start-work, the session has messages from both agents. the switch to atlas is saved in memory. when the session crashes, memory is wiped. on "continue", the system checks message files and finds prometheus was first - so it thinks prometheus is running. wrong.

what i fixed:
boulder.json already stores which agent started the work (atlas). i just made prometheus-md-only check boulder.json before falling back to message files.

check order: memory -> boulder.json -> message files

added 3 tests to cover this. all passing
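a rough sketch of that check order (the helper names here are made-up stand-ins, not the real prometheus-md-only functions; agentFromMessageFiles stands in for getAgentFromMessageFiles):

```typescript
// Hypothetical sketch of the agent-resolution priority:
// in-memory map -> boulder.json -> message files.
type BoulderState = { agent?: string } | null;

function resolveAgent(
  sessionAgentMap: Map<string, string>, // in-memory; wiped on crash
  sessionId: string,
  boulderState: BoulderState,
  agentFromMessageFiles: () => string | undefined,
): string | undefined {
  // 1. In-memory map wins while the session is alive.
  const inMemory = sessionAgentMap.get(sessionId);
  if (inMemory) return inMemory;
  // 2. boulder.json survives restarts; /start-work wrote agent: "atlas".
  if (boulderState?.agent) return boulderState.agent;
  // 3. Last resort: message files, which can return the oldest
  //    agent in the session (e.g. "prometheus" from /plan).
  return agentFromMessageFiles();
}
```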

@kaizen403
Contributor Author

appreciate you asking before merging btw. saved us from shipping an incomplete fix @code-yeongyu :3



Development

Successfully merging this pull request may close these issues.

[Bug]: Session interruption causes wrong agent (Prometheus) to resume instead of executor (Sisyphus)