feat(workflows): add loop.command for loading loop prompts from files (#1759)#1789
feat(workflows): add loop.command for loading loop prompts from files (#1759)#1789marc0der wants to merge 12 commits into
Conversation
📝 WalkthroughWalkthroughAdds support for loading loop iteration prompts from command files via ChangesLoop Node Command File Feature
Sequence DiagramsequenceDiagram
participant Loader
participant Validator
participant Executor
participant loadCommandPrompt
participant AI
Loader->>Loader: Parse workflow YAML
Loader->>Validator: Validate schema (exactly one of prompt/command)
Validator->>Validator: Resolve/validate loop.command resource
Executor->>loadCommandPrompt: Load command file once at node start
loadCommandPrompt-->>Executor: Return command text template
loop For each iteration
Executor->>Executor: Substitute variables into template
Executor->>AI: Send substituted prompt
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/workflows/src/dag-executor.ts`:
- Around line 1791-1836: The executeLoopNode function currently can emit a
node_failed event on command-load failure without first emitting node_started
and omits the command name in the workflow event payload; fix by emitting a
node_started workflow event (matching other nodes' shape: type: 'node_started',
runId: workflowRun.id, nodeId: node.id, nodeName: node.id) immediately at the
start of executeLoopNode (before resolving loop.prompt/loop.command), and when
handling promptResult.failure (the branch that logs
'loop_node.command_load_failed'), include the failing command string
(loop.command) in the createWorkflowEvent data alongside the error (e.g., {
error: errMsg, command: loop.command }) and also include it in the emitted
getWorkflowEventEmitter() payload so structured logs and events carry the same
context.
In `@packages/workflows/src/schemas/loop.ts`:
- Around line 49-53: The schema currently trims only for validation but leaves
the stored value untrimmed, causing later resolution to fail; update the
loop.command schema to normalize the input (trim and possibly collapse
whitespace) before validation by using a z.preprocess or z.string().transform
that returns (val as string).trim(), then run isValidCommandName against that
normalized value (referencing the hasCommand check and isValidCommandName usage)
and ensure ctx.addIssue message and path reflect the trimmed value; this will
store the normalized command in the parsed output so downstream
resolution/execution sees the trimmed name.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2f039a14-0167-4790-9413-e40fd652b6e4
📒 Files selected for processing (11)
packages/docs-web/src/content/docs/guides/loop-nodes.mdpackages/web/src/lib/api.generated.d.tspackages/web/src/lib/dag-layout.test.tspackages/web/src/lib/dag-layout.tspackages/workflows/src/dag-executor.test.tspackages/workflows/src/dag-executor.tspackages/workflows/src/loader.test.tspackages/workflows/src/loader.tspackages/workflows/src/schemas/loop.tspackages/workflows/src/validator.test.tspackages/workflows/src/validator.ts
Whitespace-padded values like " my-cmd " previously passed parse-time validation (the superRefine trimmed for isValidCommandName) but were stored untrimmed, so downstream loadCommandPrompt looked up the literal padded filename and failed at runtime with a confusing "not found" diagnostic. Normalize at the Zod schema (z.string().trim()) so the parsed value matches what resolution sees, and the existing parse-time errors remain the actionable surface. Caught by CodeRabbit on PR coleam00#1789. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The structured log on loop.command load failure already carries the failing command, but the workflow event written to the store did not — event-stream consumers (web UI, run inspectors, downstream automation) saw only the error string. Mirror the log context onto the event so both observability paths surface the same diagnostic. Caught by CodeRabbit on PR coleam00#1789. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…executors executeLoopNode never emitted a node_started workflow event — neither on the success path (only the per-iteration loop_iteration_started fires, then node_completed at the end) nor on the new loop.command load-failure path. That breaks the project-wide event-pairing rule (CLAUDE.md: "Always pair _started with _completed or _failed") and was visible to event-stream consumers as a loop node that just appeared in node_failed without warning. Mirror executeBashNode and executeScriptNode: log dag_node_started, write the node_started workflow event (carrying the optional loop.command name in data so the start event captures the same context the failure event does), and emit the in-process WorkflowEmitterEvent. The outer DAG dispatcher already delegates the start event to each per-node executor, so no double emission. Caught by CodeRabbit on PR coleam00#1789. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Actionable comments posted: 0 |
e72e63e to
f673a61
Compare
Whitespace-padded values like " my-cmd " previously passed parse-time validation (the superRefine trimmed for isValidCommandName) but were stored untrimmed, so downstream loadCommandPrompt looked up the literal padded filename and failed at runtime with a confusing "not found" diagnostic. Normalize at the Zod schema (z.string().trim()) so the parsed value matches what resolution sees, and the existing parse-time errors remain the actionable surface. Caught by CodeRabbit on PR coleam00#1789. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The structured log on loop.command load failure already carries the failing command, but the workflow event written to the store did not — event-stream consumers (web UI, run inspectors, downstream automation) saw only the error string. Mirror the log context onto the event so both observability paths surface the same diagnostic. Caught by CodeRabbit on PR coleam00#1789. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…executors executeLoopNode never emitted a node_started workflow event — neither on the success path (only the per-iteration loop_iteration_started fires, then node_completed at the end) nor on the new loop.command load-failure path. That breaks the project-wide event-pairing rule (CLAUDE.md: "Always pair _started with _completed or _failed") and was visible to event-stream consumers as a loop node that just appeared in node_failed without warning. Mirror executeBashNode and executeScriptNode: log dag_node_started, write the node_started workflow event (carrying the optional loop.command name in data so the start event captures the same context the failure event does), and emit the in-process WorkflowEmitterEvent. The outer DAG dispatcher already delegates the start event to each per-node executor, so no double emission. Caught by CodeRabbit on PR coleam00#1789. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…prompt A loop node can now load its iteration prompt from a named command file (`loop.command: <name>`) instead of inlining it as `loop.prompt`. The schema enforces exactly one of the two. The loaded file is read once at node start and reused for every iteration; substitution semantics (`$LOOP_PREV_OUTPUT`, `$LOOP_USER_INPUT`, `$nodeId.output`, etc.) are unchanged. A bad reference fails the node with an actionable error before iteration 1. This mirrors the existing `prompt:` ⇄ `command:` relationship at the node level, so the longest/most-reusable loop prompts (Ralph-style implement loops) can live as Markdown files instead of being inlined in YAML. Refs specs/loop-command.md (coleam00#1759).
Adds Level 3 (resource resolution) checks for loop.command parallel to existing command-node checks: invalid name, unresolved file, and "did you mean…" suggestions, all surfaced before a workflow runs and labelled with field 'loop.command'. Reuses availableCommands already computed at the top of the validator loop. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Run after the workflow Zod schema gained an optional loop.command sibling to loop.prompt (items 1/2/4). Regenerated via `bun --filter @archon/web generate:types` against `bun run dev:server` on port 3090, then `bun x prettier --write` to keep the committed file in repo style and reduce the diff to the actual schema delta. Net effect: components['schemas']['DagNode']['loop'] now has prompt and command both optional, unblocking web-side consumers (next: canvas label for command-backed loops).
`resolveNodeDisplay` now branches inside the `'loop' in dn` block on
`dn.loop?.command`: when a loop node carries `loop.command`, it returns
`{ label: dn.loop.command, nodeType: 'loop' }` (no `promptText`), so the
read-only builder canvas shows the command name as the node's label —
mirroring how `command:` nodes display today. Inline-prompt loops keep
their existing `{ label: 'Loop', promptText }` shape unchanged.
`DagNodeComponent` needs no change: its `'loop'` case in
`getContentPreview` reads `promptText?.split('\n')[0] ?? ''`, which is
empty for command-backed loops (same effective preview as command
nodes, whose label already lives in the header). The `LOOP` badge and
loop stripe stay — only the label text changes — because the node is
still semantically a loop.
Closes the web-side acceptance criterion in specs/loop-command.md:
"the builder canvas labels a command-backed loop by its command name."
…er, unsafe name, ref-scan Lock the loader's loop.command behaviour with five cases inside 'describe(loop node parsing)': cleanly parses loop.command on its own, rejects both-present with an 'exactly one' message that names both fields, rejects neither-present with both alternatives named (so authors discover loop.command exists, not just the legacy loop.prompt), rejects '../escape' with 'invalid command name', and regression-guards the \$nodeId.output ref scanner so a command-backed loop neither crashes nor hides a sibling's reference to its output. Adds 5 tests (126/126 in loader.test.ts; was 121). Pairs with the schema + loader + executor + validator changes from earlier items in specs/loop-command.md.
Mirror the Level-3 command-node coverage for the new loop.command branch: repo-local hit, missing-target with suggestions, unsafe name guard, bundled-default fallback, and home-scope (ARCHON_HOME) resolution. Pure test addition — pins the behaviour landed in the validator change so a refactor cannot silently drop the defense-in-depth isValidCommandName check or the bundled/home resolution paths.
Pin the runtime behaviour of command-backed loop nodes with five tests in the existing `loop node execution` block: - read-once invariant: writes a command file, deletes it synchronously inside iter 1's mock generator, asserts iter 2 still runs from the in-memory template (no node_failed / loop_iteration_failed events). - fail-fast paths: missing target, empty target, and unsafe-name (../escape) each return before any sendQuery call and emit node_failed with the actionable diagnostic. The unsafe-name case bypasses the loop schema's superRefine via an "as unknown as DagNode" cast so the executor's defense-in-depth branch is exercised directly. - substitution: command-file body contains LOOP_PREV_OUTPUT and LOOP_USER_INPUT placeholders; iter 1 substitutes both to empty, iter 2 picks up iter 1's cleaned output for PREV while USER stays empty (non-interactive). Proves the loaded text flows through substituteWorkflowVariables identically to inline loop.prompt. Adds unlinkSync from 'fs' for the mid-generator deletion. bun test packages/workflows/src/dag-executor.test.ts now reports 250/250 (was 245, +5). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds a top-line note pointing readers at both authoring shapes, surfaces `command:` in the Configuration Fields YAML block with the "exactly one" rule inline, and inserts a `### command` subsection that covers repo/home/bundled resolution, command-name safety, load-once-on-node-start semantics, fail-fast on missing/empty/unreadable targets, and parity with inline `prompt` for variable substitution. Worked example mirrors the spec's `archon-ralph-implement` running scenario. Closes the last spec acceptance criterion for `loop.command`.
Whitespace-padded values like " my-cmd " previously passed parse-time validation (the superRefine trimmed for isValidCommandName) but were stored untrimmed, so downstream loadCommandPrompt looked up the literal padded filename and failed at runtime with a confusing "not found" diagnostic. Normalize at the Zod schema (z.string().trim()) so the parsed value matches what resolution sees, and the existing parse-time errors remain the actionable surface. Caught by CodeRabbit on PR coleam00#1789. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The structured log on loop.command load failure already carries the failing command, but the workflow event written to the store did not — event-stream consumers (web UI, run inspectors, downstream automation) saw only the error string. Mirror the log context onto the event so both observability paths surface the same diagnostic. Caught by CodeRabbit on PR coleam00#1789. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…executors executeLoopNode never emitted a node_started workflow event — neither on the success path (only the per-iteration loop_iteration_started fires, then node_completed at the end) nor on the new loop.command load-failure path. That breaks the project-wide event-pairing rule (CLAUDE.md: "Always pair _started with _completed or _failed") and was visible to event-stream consumers as a loop node that just appeared in node_failed without warning. Mirror executeBashNode and executeScriptNode: log dag_node_started, write the node_started workflow event (carrying the optional loop.command name in data so the start event captures the same context the failure event does), and emit the in-process WorkflowEmitterEvent. The outer DAG dispatcher already delegates the start event to each per-node executor, so no double emission. Caught by CodeRabbit on PR coleam00#1789. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The loop.command rebase makes loop.prompt optional on the wire, but the console builder only models prompt-based loops. Coalesce to the empty default so the round-trip stays type-correct. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
dfe5147 to
ac6f103
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/web/src/experiments/console/builder/variants/loop.ts`:
- Around line 22-26: The loop conversion functions loopFromDag and loopToDag
create a lossy round-trip for command-backed loops. In loopFromDag, preserve the
command field from the input instead of collapsing all command-backed loops to
an empty prompt string. In loopToDag, serialize the command field alongside or
instead of prompt based on which one is present. Add an exactly-one-of
constraint to the builder's loop model to enforce that either prompt or command
is specified, but not both and not neither, matching the schema contract used in
workflows.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 85f92223-3a34-4562-baa1-6736b2137f5f
📒 Files selected for processing (12)
packages/docs-web/src/content/docs/guides/loop-nodes.mdpackages/web/src/experiments/console/builder/variants/loop.tspackages/web/src/lib/api.generated.d.tspackages/web/src/lib/dag-layout.test.tspackages/web/src/lib/dag-layout.tspackages/workflows/src/dag-executor.test.tspackages/workflows/src/dag-executor.tspackages/workflows/src/loader.test.tspackages/workflows/src/loader.tspackages/workflows/src/schemas/loop.tspackages/workflows/src/validator.test.tspackages/workflows/src/validator.ts
💤 Files with no reviewable changes (10)
- packages/web/src/lib/api.generated.d.ts
- packages/web/src/lib/dag-layout.test.ts
- packages/web/src/lib/dag-layout.ts
- packages/workflows/src/loader.test.ts
- packages/workflows/src/validator.ts
- packages/workflows/src/loader.ts
- packages/workflows/src/validator.test.ts
- packages/workflows/src/dag-executor.ts
- packages/workflows/src/dag-executor.test.ts
- packages/workflows/src/schemas/loop.ts
✅ Files skipped from review due to trivial changes (1)
- packages/docs-web/src/content/docs/guides/loop-nodes.md
| return { | ||
| prompt: loop.prompt, | ||
| // `prompt` is optional on the wire (a loop may use `command` instead); the | ||
| // builder only models prompt-based loops, so fall back to the empty default. | ||
| prompt: loop.prompt ?? '', | ||
| until: loop.until, |
There was a problem hiding this comment.
Command-backed loops are lossy in builder round-trips
loopFromDag (Line 25) collapses command-backed loops to prompt: '', and loopToDag (Line 41) always serializes prompt while never re-emitting loop.command. This can silently rewrite a valid loop.command node into a prompt-based node on save.
Suggested direction
- export interface LoopNodeData {
- prompt: string;
+ export interface LoopNodeData {
+ prompt?: string;
+ command?: string;
until: string;
max_iterations: number;
fresh_context: boolean;
until_bash?: string;
interactive?: boolean;
gate_message?: string;
} return {
- prompt: loop.prompt ?? '',
+ prompt: loop.prompt,
+ command: loop.command,
until: loop.until,
max_iterations: loop.max_iterations,
fresh_context: loop.fresh_context,
...
}; loop: {
- prompt: data.prompt,
+ ...(data.command ? { command: data.command } : { prompt: data.prompt ?? '' }),
until: data.until,
max_iterations: data.max_iterations,
fresh_context: data.fresh_context,
...
}Also enforce exactly-one-of in the builder model (same schema contract as workflows).
Also applies to: 38-42
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/web/src/experiments/console/builder/variants/loop.ts` around lines
22 - 26, The loop conversion functions loopFromDag and loopToDag create a lossy
round-trip for command-backed loops. In loopFromDag, preserve the command field
from the input instead of collapsing all command-backed loops to an empty prompt
string. In loopToDag, serialize the command field alongside or instead of prompt
based on which one is present. Add an exactly-one-of constraint to the builder's
loop model to enforce that either prompt or command is specified, but not both
and not neither, matching the schema contract used in workflows.
Summary
loop.prompt), with no way to externalise it to a command file — even though every non-loop AI prompt has this escape hatch viacommand:nodes (loadCommandPrompt).archon-ralph-dag.yamlinlines ~460 lines in itsimplementloop for this reason.loop.commandfield that loads the iteration prompt from a command file via the existingloadCommandPromptresolver. Mutually exclusive withloop.prompt— exactly one is required. Schema, executor, validator, web types, canvas label, docs, and tests at every layer.prompt_file:for other node types. No visual loop-node editor (none exists today — builder involvement is limited to API types + canvas label). No DB changes, no new external calls.UX Journey
Before
After
Architecture Diagram
Before
After
Connection inventory:
executeLoopNode(dag-executor.ts)loadCommandPrompt(executor-shared.ts)command:nodes use; read once before iteration looploopNodeConfigSchema(schemas/loop.ts)isValidCommandName(command-validation.ts)validateWorkflowResources(validator.ts)resolveCommand+findSimilarvalidateDagStructure(loader.ts)loop.promptonly$nodeId.outputref scan skips command-loaded text (parse-time can't read it)resolveNodeDisplay(web/dag-layout.ts)dn.loop.commandapi.generated.d.tsloop.{prompt?, command?}Label Snapshot
risk: lowsize: Mworkflows|web|docs|testsworkflows:loopChange Metadata
featureworkflowsLinked Issue
Validation Evidence (required)
bun run validateexit 0 locally; live end-to-end test documented under Human Verification.Security Impact (required)
loop.commandresolves through the sameloadCommandPromptresolvercommand:nodes already use (repo → home → bundled), with the sameisValidCommandNamepath-traversal validation. Nothing new is reachable.Compatibility / Migration
loop.promptkeeps working unchanged.Human Verification (required)
What was personally validated beyond CI:
ralph-wiggumArchon workflow (an autonomous plan/build loop) has twoloop:nodes whose inline prompts run ~55 and ~75 lines. I extracted those bodies into.archon/commands/ralph-plan.mdandralph-build.md, rewired the workflow to useloop.command: ralph-plan/loop.command: ralph-build, hard-reset a throwaway branch to immediately after the spec commit / before the implementation of a real feature (semverish-version-validation), then ran the converted workflow against that state from a local source build of this branch. Both loops loaded their command files, drove real iterations with full variable substitution ($LOOP_PREV_OUTPUT,$LOOP_USER_INPUT,$ARGUMENTS,$nodeId.output), and produced the expected per-iteration behaviour. Noloop.commandfailures, parse errors, or substitution issues observed.prompt-and-commandand neither-defined reject at parse time with field-targeted errors (loader tests). Unsafe command names (e.g.../escape) reject at both parse time and validate time. Missing / empty / unreadable command targets fail the node fast with actionable messages mirroring the command-node failure shape. Read-once invariant proven behaviourally by deleting the source file mid-iteration and confirming subsequent iterations still complete.check:bundledand the embedded-defaults regeneration pipeline are covered by CI.Side Effects / Blast Radius (required)
@archon/workflows(schema, loader, validator, executor) and@archon/web(regenerated API types + canvas label). No backend route, DB, or other-package surface.command:node path so closely that any regression would surface in the existing command-node tests too.node_failedon any runtime resolution failure (same observability shape as a missingcommand:node file).Rollback Plan (required)
loop.commandis opt-in per node. Removing it from a workflow falls back toloop.promptsemantics.loop.commandwould fail at validation time ('<name>' command not found) or with a clearnode_failedevent at runtime.Risks and Mitigations
loadCommandPromptcould change resolution precedence and inadvertently affect command-backed loops differently from command nodes.loop.commandmirror the command-node tests at every layer.command:node behaviour exactly. Documentation inloop-nodes.mdcalls out the fail-fast semantics. Validator surfaces the missing file with "did you mean…" suggestions before the workflow ever runs.Summary by CodeRabbit
loop.command) instead of an inline prompt (loop.prompt); the UI identifies command-backed loops by the command name and uses the same variable substitution as inline prompts.loop.promptorloop.command, reject invalid/missing/unsafe command targets at load time, and load command text once per node (mid-run file edits are ignored).loop.commandrules, precedence, and examples.