
feat: add task lifecycle diagnostics#120

Merged
glittercowboy merged 4 commits into main from codex/turn-lifecycle-daemon
Apr 30, 2026

Conversation

@glittercowboy
Contributor

@glittercowboy glittercowboy commented Apr 30, 2026

Summary

  • Adds local task lifecycle diagnostics and pretty log filtering for daemon logs.
  • Supervises Pi task execution with attempt-aware lifecycle events, deadline hooks, cleanup events, and terminal failure metadata.
  • Exposes attempt-scoped plan capability metadata to the Pi extension.
  • Consumes protocol-go v0.32.0.

Dependency

Verification

  • go test ./internal/pi ./internal/session ./internal/logging
  • go test ./...
  • go build -o gsd-cloud .

Post-merge

  • Push the next daemon release tag and verify the release workflow.

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced logs command with filtering by session, task, and time range; added JSON and formatted timeline output options.
    • Improved lifecycle logging with automatic secret redaction in prompt previews.
    • Added attempt tracking to task execution with better timeout detection and reporting.
  • Chores

    • Updated protocol dependency to latest version.
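
As a rough illustration of the filtering and timeline output described in the release notes, here is a minimal Go sketch. It is not the code from `cmd/logs_filter.go` or `cmd/logs_pretty.go`: the `logEvent` and `logFilter` types, the `filterEvents` and `formatTimelineLine` helpers, and the `task=` label are all hypothetical names chosen for this example. The one behavior taken from the PR is that an empty selector matches everything.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// logEvent is a hypothetical, simplified shape for one structured log line.
type logEvent struct {
	Time      time.Time
	SessionID string
	TaskID    string
	Phase     string
}

// logFilter holds the selectors. A zero value means "do not filter on this field".
type logFilter struct {
	SessionID string
	TaskID    string
	NotBefore time.Time
}

// matches reports whether e passes every selector that is set.
func (f logFilter) matches(e logEvent) bool {
	if f.SessionID != "" && e.SessionID != f.SessionID {
		return false
	}
	if f.TaskID != "" && e.TaskID != f.TaskID {
		return false
	}
	if !f.NotBefore.IsZero() && e.Time.Before(f.NotBefore) {
		return false
	}
	return true
}

// filterEvents returns the events that pass the filter, preserving order.
func filterEvents(events []logEvent, f logFilter) []logEvent {
	var out []logEvent
	for _, e := range events {
		if f.matches(e) {
			out = append(out, e)
		}
	}
	return out
}

// formatTimelineLine renders HH:MM:SS, the phase label, and the task ID if present.
func formatTimelineLine(e logEvent) string {
	parts := []string{e.Time.Format("15:04:05"), e.Phase}
	if e.TaskID != "" {
		parts = append(parts, "task="+e.TaskID)
	}
	return strings.Join(parts, " ")
}

func main() {
	now := time.Now()
	events := []logEvent{
		{Time: now.Add(-2 * time.Hour), SessionID: "s1", TaskID: "t1", Phase: "task_started"},
		{Time: now.Add(-time.Minute), SessionID: "s1", TaskID: "t2", Phase: "prompt_written"},
	}
	recent := logFilter{SessionID: "s1", NotBefore: now.Add(-time.Hour)}
	for _, e := range filterEvents(events, recent) {
		fmt.Println(formatTimelineLine(e))
	}
}
```

Because an unset selector matches every event, a caller that derives a selector at runtime probably needs to treat "nothing found" as its own case rather than passing an empty string through.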

@coderabbitai

coderabbitai Bot commented Apr 30, 2026

📝 Walkthrough

The PR adds lifecycle telemetry infrastructure spanning CLI logging tools, event logging/serialization, and turn-level execution supervision. New code enables structured log filtering and rendering via command-line, lifecycle event construction with prompt preview redaction, executor lifecycle hook callbacks, and turn supervisor orchestration tracking task phases with deadline-based failure detection.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Logging CLI**<br>`cmd/logs.go`, `cmd/logs_filter.go`, `cmd/logs_pretty.go`, `cmd/logs_test.go` | Adds structured log viewing with filtering by session/task/time/level, JSON parsing, and pretty timeline rendering with ANSI coloring. Validates mutual exclusivity of targeting flags and formats timestamps as HH:MM:SS with phase labels and optional metadata (task ID, failure code, retryable marker). |
| **Lifecycle Event Infrastructure**<br>`internal/logging/lifecycle.go`, `internal/logging/lifecycle_test.go` | Defines `TaskLifecycleLogEvent` and `TaskLifecycleLogInput` for structured lifecycle logging. Implements `PromptPreview` to normalize whitespace, redact secrets matching a compiled regex, and truncate by word/character count without exceeding either constraint. |
| **Executor Lifecycle Hooks**<br>`internal/pi/executor.go`, `internal/pi/executor_test.go`, `internal/pi/worker.go` | Adds `LifecycleHooks` callback mechanism to `Executor` and wires invocations for process start, prompt write, first event visibility, and cleanup phases. Extends plan capability environment injection with `GSD_PLAN_CAPABILITY_ID` and `GSD_PLAN_CAPABILITY_ATTEMPT_ID`. Updates `streamPiEvents` signature to accept lifecycle hooks parameter. |
| **Plan Capability Detection**<br>`internal/pi/extension/plan-tools.js` | Extends `hasPlanCapability` gate to require `GSD_PLAN_CAPABILITY_ATTEMPT_ID` environment variable in addition to existing URL/token/expiry checks for plan tool registration. |
| **Session Actor Lifecycle Telemetry**<br>`internal/session/actor.go` | Extends `taskContext` with attempt-aware fields (`AttemptID`, `AttemptNumber`, `TurnKind`, `Deadlines`) and propagates them into protocol messages. Adds lifecycle sink that logs events via `NewTaskLifecycleLog` and optionally sends `protocol.TaskLifecycle` messages to relay. Populates `FailureCode` and `Retryable` from task errors and wires executor through `TurnSupervisor` with lifecycle/tool hooks. |
| **Turn Supervision**<br>`internal/session/turn_supervisor.go`, `internal/session/turn_supervisor_test.go` | Introduces `TurnSupervisor` to orchestrate per-turn lifecycle phases (task started, prompt written, first/visible events, tool start/finish, timeouts). Tracks deadlines (process start, prompt write, first event, tool idle, etc.), emits phase events enriched with attempt/turn metadata, and synthesizes `turnFailureError` with `TurnResult` on timeout or failure. Provides `TurnHooks` callbacks to wire into executor behavior. |
| **Dependency Update**<br>`go.mod` | Updates `github.com/gsd-build/protocol-go` from v0.29.1 to v0.32.0. |
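
The `PromptPreview` behavior summarized in the table (normalize whitespace, redact secrets, then truncate without exceeding either the word or the character limit) can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation in `internal/logging/lifecycle.go`: the lowercase `promptPreview` name, its signature, the `secretPattern` regex, and the `[REDACTED]` marker are all hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"unicode/utf8"
)

// secretPattern is a hypothetical pattern for key/value style secrets.
// The regex compiled in the PR is not shown in the review and may differ.
var secretPattern = regexp.MustCompile(`(?i)\b(api[_-]?key|token|password|secret)\s*[=:]\s*\S+`)

// promptPreview collapses whitespace, redacts matches of secretPattern, and
// keeps whole words until either maxWords or maxChars would be exceeded.
func promptPreview(prompt string, maxWords, maxChars int) string {
	// strings.Fields splits on any run of whitespace, so joining with a
	// single space normalizes newlines, tabs, and repeated spaces.
	normalized := strings.Join(strings.Fields(prompt), " ")
	redacted := secretPattern.ReplaceAllString(normalized, "${1}=[REDACTED]")

	var b strings.Builder
	chars := 0
	for i, word := range strings.Fields(redacted) {
		if i >= maxWords {
			break
		}
		need := utf8.RuneCountInString(word)
		if chars > 0 {
			need++ // account for the separating space
		}
		if chars+need > maxChars {
			break
		}
		if chars > 0 {
			b.WriteByte(' ')
		}
		b.WriteString(word)
		chars += need
	}
	return b.String()
}

func main() {
	fmt.Println(promptPreview("deploy   with token=abc123 \n now please", 10, 100))
}
```

Redacting before truncating is a deliberate choice in this sketch: truncating first could cut a secret in half so that the pattern no longer matches the remaining fragment.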

Sequence Diagram

sequenceDiagram
    participant Client as Caller
    participant TS as TurnSupervisor
    participant Executor as Pi Executor
    participant Sink as Lifecycle Sink
    
    Client->>TS: Run(context, runFunc)
    activate TS
    
    TS->>TS: recordStartTime()
    TS->>Sink: Phase("task_started", fields)
    
    TS->>Client: runFunc(context, TurnHooks)
    activate Client
    
    Client->>Executor: Run(...)
    activate Executor
    
    Executor->>Executor: hooks.PromptWritten()
    Executor->>TS: PromptWritten via TurnHooks
    deactivate Executor
    TS->>TS: scheduleDeadline(FirstEvent)
    TS->>Sink: Phase("prompt_written", fields)
    
    Executor->>Executor: hooks.FirstEventSeen()
    Executor->>TS: FirstEventSeen via TurnHooks
    TS->>Sink: Phase("first_event_seen", fields)
    TS->>TS: scheduleDeadline(ToolIdle)
    
    Executor->>Executor: hooks.ToolStarted()
    Executor->>TS: ToolStarted(id, name) via TurnHooks
    TS->>Sink: Phase("tool_started", fields)
    
    Executor->>Executor: hooks.ToolFinished()
    Executor->>TS: ToolFinished(id, name) via TurnHooks
    TS->>Sink: Phase("tool_finished", fields)
    
    Client-->>TS: return
    deactivate Client
    
    TS->>Sink: Phase("task_timed_out", fields)
    TS->>TS: Result() = {FailureCode, Retryable}
    TS-->>Client: return error
    deactivate TS

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐰 Hops through lifecycles, phase by phase,
Events gathered, redacted in place,
Supervisor watches each turn with care,
Deadlines ticking in the twilight air,
Pretty timelines bloom in color bright!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 5.13% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: add task lifecycle diagnostics' clearly and concisely describes the main feature being introduced across the changeset, adding task lifecycle tracking, supervision, and diagnostic capabilities including pretty log filtering, lifecycle event hooks, and deadline management. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/logs_pretty.go`:
- Around line 47-49: The code currently treats any mode != colorNever the same
and calls colorizePrettyLine(), causing ANSI escapes even when piping; change
the condition around the call in the pretty-printing path so that
colorizePrettyLine(line, event.Phase) is only called when mode==colorAlways OR
when mode==colorAuto AND the output is a TTY (use an isTerminal/isTTY check on
the writer e.g., os.Stdout or the logger output). Update the branch using the
variables mode, colorNever and the function colorizePrettyLine to perform that
TTY check (or introduce a small helper like isTerminal()) so color=auto disables
ANSI when output is not a terminal.

In `@cmd/logs.go`:
- Around line 156-158: When logsOptions.lastTask is true and you call
latestTaskID(lines), handle the empty-string case instead of leaving
filter.TaskID blank: check the return value of latestTaskID(lines) and if it's
an empty string, return an error (or exit with a clear message) indicating "no
task-scoped entries found for --last-task" rather than assigning "" to
filter.TaskID (which causes filterLogLines() to match everything); update the
branch that currently sets filter.TaskID to call latestTaskID(lines) and bail
out with a clear error or non-zero exit when the result is empty.
- Around line 107-116: hasStructuredMode() currently treats color and noColor as
selectors for structured rendering; remove opts.color != "auto" and opts.noColor
from the hasStructuredMode() boolean expression so color flags do not change the
mode, and add validation in the command's option parsing/validation (where
logsOptionsState is validated) to return an error if --color/--no-color is
provided without --pretty (i.e., require opts.pretty when opts.color != "auto"
or opts.noColor is true) so color flags only affect pretty output once
structured/pretty mode is explicitly selected.

In `@internal/session/turn_supervisor.go`:
- Around line 75-125: The one-shot timers started in TurnHooks (the FirstEvent
timer in PromptWritten and the ToolIdle timer in ToolStarted) must be tracked on
the TurnSupervisor (add fields like firstEventTimer and toolIdleTimer) and
cancelled when the turn advances or finishes: start and store the timers instead
of calling time.AfterFunc inline in PromptWritten/ToolStarted, call Stop() (and
drain the channel if needed) from ToolFinished and from the cleanup path just
before run() returns or when the supervisor is marked complete, and also clear
the timer fields when stopped; ensure PromptWritten/ToolStarted/ToolFinished and
the post-run return path consistently stop any outstanding timers so stale
timers cannot call s.timeout after the turn has already completed.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: ef1b9ae3-cc18-4907-bb40-da4259db2122

📥 Commits

Reviewing files that changed from the base of the PR and between 156be4a and 5913ff0.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (14)
  • cmd/logs.go
  • cmd/logs_filter.go
  • cmd/logs_pretty.go
  • cmd/logs_test.go
  • go.mod
  • internal/logging/lifecycle.go
  • internal/logging/lifecycle_test.go
  • internal/pi/executor.go
  • internal/pi/executor_test.go
  • internal/pi/extension/plan-tools.js
  • internal/pi/worker.go
  • internal/session/actor.go
  • internal/session/turn_supervisor.go
  • internal/session/turn_supervisor_test.go

Comment thread cmd/logs_pretty.go
Comment on lines +47 to +49
if mode != colorNever {
    line = colorizePrettyLine(line, event.Phase)
}

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

color=auto currently behaves like always.

Every mode except never runs colorizePrettyLine(), so piping pretty logs to a file or another command still injects ANSI escapes. auto needs a TTY check before deciding whether to colorize.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/logs_pretty.go` around lines 47 - 49, The code currently treats any mode
!= colorNever the same and calls colorizePrettyLine(), causing ANSI escapes even
when piping; change the condition around the call in the pretty-printing path so
that colorizePrettyLine(line, event.Phase) is only called when mode==colorAlways
OR when mode==colorAuto AND the output is a TTY (use an isTerminal/isTTY check
on the writer e.g., os.Stdout or the logger output). Update the branch using the
variables mode, colorNever and the function colorizePrettyLine to perform that
TTY check (or introduce a small helper like isTerminal()) so color=auto disables
ANSI when output is not a terminal.
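
One way the suggested `auto` handling could look is sketched below. This is a sketch under assumptions, not the merged fix: `colorMode`, `shouldColorize`, and `isTerminal` are hypothetical names, and only `colorNever`, `colorAuto`, and `colorAlways` are taken from the review text. The TTY check uses only the standard library (`os.ModeCharDevice`), which is an approximation because other character devices such as `/dev/null` also pass it.

```go
package main

import (
	"fmt"
	"os"
)

// colorMode mirrors the three modes named in the review. The string type and
// values are assumptions made for this sketch.
type colorMode string

const (
	colorAuto   colorMode = "auto"
	colorAlways colorMode = "always"
	colorNever  colorMode = "never"
)

// isTerminal reports whether f looks like a character device. This is a
// stdlib-only approximation of a TTY check.
func isTerminal(f *os.File) bool {
	info, err := f.Stat()
	if err != nil {
		return false
	}
	return info.Mode()&os.ModeCharDevice != 0
}

// shouldColorize returns true for "always", and for "auto" only when the
// output is a terminal. Any other mode disables color.
func shouldColorize(mode colorMode, isTTY bool) bool {
	switch mode {
	case colorAlways:
		return true
	case colorAuto:
		return isTTY
	default:
		return false
	}
}

func main() {
	// Piping this program's output to a file or another command should print false.
	fmt.Println(shouldColorize(colorAuto, isTerminal(os.Stdout)))
}
```

Passing the TTY result in as a boolean keeps the decision testable without a real terminal.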

Comment thread cmd/logs.go
Comment on lines +107 to +116
func (opts logsOptionsState) hasStructuredMode() bool {
    return opts.sessionID != "" ||
        opts.taskID != "" ||
        opts.lastTask ||
        opts.since > 0 ||
        opts.level != "" ||
        opts.pretty ||
        opts.json ||
        opts.color != "auto" ||
        opts.noColor

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Keep color flags from changing the command mode.

hasStructuredMode() treats --color and --no-color as selectors for structured rendering, so gsd-cloud logs --no-color stops tailing the raw file and switches to a one-shot pretty snapshot. These flags should only modify pretty output once structured mode is already selected, or validation should require --pretty.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/logs.go` around lines 107 - 116, hasStructuredMode() currently treats
color and noColor as selectors for structured rendering; remove opts.color !=
"auto" and opts.noColor from the hasStructuredMode() boolean expression so color
flags do not change the mode, and add validation in the command's option
parsing/validation (where logsOptionsState is validated) to return an error if
--color/--no-color is provided without --pretty (i.e., require opts.pretty when
opts.color != "auto" or opts.noColor is true) so color flags only affect pretty
output once structured/pretty mode is explicitly selected.
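
The change described in this comment might be sketched as follows. The field names come from the excerpt above, but the field types, the `validate` method, and the error text are assumptions for illustration. The sketch also assumes the `--color` flag defaults to `"auto"`, as the original comparison implies.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// logsOptionsState uses the field names from the reviewed excerpt. The types
// are guesses for this sketch.
type logsOptionsState struct {
	sessionID string
	taskID    string
	lastTask  bool
	since     time.Duration
	level     string
	pretty    bool
	json      bool
	color     string // assumed to default to "auto"
	noColor   bool
}

// hasStructuredMode no longer looks at the color flags, so they cannot switch
// the command out of raw tailing on their own.
func (opts logsOptionsState) hasStructuredMode() bool {
	return opts.sessionID != "" ||
		opts.taskID != "" ||
		opts.lastTask ||
		opts.since > 0 ||
		opts.level != "" ||
		opts.pretty ||
		opts.json
}

// errColorNeedsPretty is a hypothetical error message.
var errColorNeedsPretty = errors.New("--color and --no-color require --pretty")

// validate is a hypothetical helper that rejects color flags without --pretty.
func (opts logsOptionsState) validate() error {
	if (opts.color != "auto" || opts.noColor) && !opts.pretty {
		return errColorNeedsPretty
	}
	return nil
}

func main() {
	opts := logsOptionsState{color: "auto", noColor: true}
	fmt.Println(opts.hasStructuredMode(), opts.validate())
}
```

Rejecting the flag combination outright is the stricter of the two options the comment offers. The alternative is to accept the flags silently and ignore them outside pretty mode.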

Comment thread cmd/logs.go
Comment on lines +156 to +158
if logsOptions.lastTask {
    filter.TaskID = latestTaskID(lines)
}

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don't fall back to the entire log when --last-task finds nothing.

If latestTaskID(lines) returns "", filter.TaskID stays empty and filterLogLines() passes every event through. On a log file with no task-scoped entries yet, --last-task would silently dump the full log instead of returning no matches or a clear error.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/logs.go` around lines 156 - 158, When logsOptions.lastTask is true and
you call latestTaskID(lines), handle the empty-string case instead of leaving
filter.TaskID blank: check the return value of latestTaskID(lines) and if it's
an empty string, return an error (or exit with a clear message) indicating "no
task-scoped entries found for --last-task" rather than assigning "" to
filter.TaskID (which causes filterLogLines() to match everything); update the
branch that currently sets filter.TaskID to call latestTaskID(lines) and bail
out with a clear error or non-zero exit when the result is empty.
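
A minimal sketch of the empty-result guard follows. The real `latestTaskID` signature is not shown in the review, so the version here is a stand-in that assumes the log is a slice of JSON lines carrying a `taskId` key. The `resolveLastTask` wrapper and the `errNoTaskEntries` error are likewise hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// errNoTaskEntries uses the message wording suggested in the review.
var errNoTaskEntries = errors.New("no task-scoped entries found for --last-task")

// latestTaskID is a stand-in for the real helper. It scans from newest to
// oldest and returns the first non-empty "taskId" (an assumed JSON key).
func latestTaskID(lines []string) string {
	for i := len(lines) - 1; i >= 0; i-- {
		var entry struct {
			TaskID string `json:"taskId"`
		}
		if err := json.Unmarshal([]byte(lines[i]), &entry); err != nil {
			continue // skip lines that are not valid JSON
		}
		if entry.TaskID != "" {
			return entry.TaskID
		}
	}
	return ""
}

// resolveLastTask turns "nothing found" into an explicit error instead of an
// empty task ID, which the filter would treat as "match everything".
func resolveLastTask(lines []string) (string, error) {
	id := latestTaskID(lines)
	if id == "" {
		return "", errNoTaskEntries
	}
	return id, nil
}

func main() {
	_, err := resolveLastTask([]string{`{"msg":"daemon started"}`})
	fmt.Println(err)
}
```

The caller would assign the returned ID to `filter.TaskID` only when the error is nil.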

Comment thread internal/session/turn_supervisor.go
Comment on lines +75 to +125
hooks := TurnHooks{
    PromptWritten: func() {
        s.emit("prompt_written", nil)
        if s.opts.Deadlines.FirstEvent > 0 {
            time.AfterFunc(s.opts.Deadlines.FirstEvent, func() {
                s.mu.Lock()
                shouldTimeout := !s.firstEvent && s.result.FailureCode == ""
                s.mu.Unlock()
                if shouldTimeout {
                    s.timeout("no_first_event_timeout", true)
                }
            })
        }
    },
    FirstEventSeen: func() {
        s.mu.Lock()
        s.firstEvent = true
        s.mu.Unlock()
        s.emit("first_event_seen", nil)
    },
    FirstVisibleEventSeen: func() {
        s.mu.Lock()
        s.firstVisible = true
        s.retrySafe = false
        s.mu.Unlock()
        s.emit("first_visible_event_seen", nil)
    },
    ToolStarted: func(toolCallID string, toolName string) {
        s.mu.Lock()
        s.retrySafe = false
        s.mu.Unlock()
        s.emit("tool_started", map[string]any{"toolCallId": toolCallID, "toolName": toolName})
        if s.opts.Deadlines.ToolIdle > 0 {
            time.AfterFunc(s.opts.Deadlines.ToolIdle, func() {
                s.timeout("tool_idle_timeout", false)
            })
        }
    },
    ToolFinished: func(toolCallID string, toolName string) {
        s.emit("tool_finished", map[string]any{"toolCallId": toolCallID, "toolName": toolName})
    },
}

err := run(ctx, hooks)
s.mu.Lock()
hasFailure := s.result.FailureCode != ""
s.mu.Unlock()
if hasFailure {
    return turnFailureError{result: s.Result()}
}
return err

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Cancel outstanding deadline timers when the turn advances or exits.

ToolStarted() arms a one-shot ToolIdle timer, but ToolFinished() never stops it and Run() never marks the supervisor as complete before returning. That means a turn can start a tool, finish it quickly, keep streaming for longer than ToolIdle, and still get canceled by the stale timer. The same late-fire problem exists for the FirstEvent timer after run() has already returned. Track these timers on TurnSupervisor and stop them on state transitions / final completion before returning.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/session/turn_supervisor.go` around lines 75 - 125, The one-shot
timers started in TurnHooks (the FirstEvent timer in PromptWritten and the
ToolIdle timer in ToolStarted) must be tracked on the TurnSupervisor (add fields
like firstEventTimer and toolIdleTimer) and cancelled when the turn advances or
finishes: start and store the timers instead of calling time.AfterFunc inline in
PromptWritten/ToolStarted, call Stop() (and drain the channel if needed) from
ToolFinished and from the cleanup path just before run() returns or when the
supervisor is marked complete, and also clear the timer fields when stopped;
ensure PromptWritten/ToolStarted/ToolFinished and the post-run return path
consistently stop any outstanding timers so stale timers cannot call s.timeout
after the turn has already completed.
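
The timer tracking suggested here could be sketched as below, shown for the tool-idle timer only. This is a simplified, hypothetical `supervisor` type, not the real `TurnSupervisor`: the `gen`, `done`, `armToolIdle`, `disarmToolIdle`, `fire`, and `finish` names are assumptions. Timers created with `time.AfterFunc` have no channel to drain, and `Stop` cannot recall a callback that has already started, so the sketch adds a generation counter that makes such late callbacks no-ops.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// supervisor is a hypothetical, stripped-down stand-in for TurnSupervisor.
type supervisor struct {
	mu            sync.Mutex
	done          bool
	failureCode   string
	gen           uint64 // bumped whenever the current timer is invalidated
	toolIdleTimer *time.Timer
}

// stopToolIdleLocked invalidates and stops the current timer. Callers hold s.mu.
func (s *supervisor) stopToolIdleLocked() {
	s.gen++ // any callback carrying an older generation becomes a no-op
	if s.toolIdleTimer != nil {
		s.toolIdleTimer.Stop()
		s.toolIdleTimer = nil
	}
}

// armToolIdle would be called from ToolStarted.
func (s *supervisor) armToolIdle(d time.Duration) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.done {
		return
	}
	s.stopToolIdleLocked()
	gen := s.gen
	s.toolIdleTimer = time.AfterFunc(d, func() { s.fire(gen, "tool_idle_timeout") })
}

// disarmToolIdle would be called from ToolFinished.
func (s *supervisor) disarmToolIdle() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.stopToolIdleLocked()
}

// fire records a timeout unless the timer was invalidated, the turn already
// finished, or an earlier failure was recorded.
func (s *supervisor) fire(gen uint64, code string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.done || gen != s.gen || s.failureCode != "" {
		return
	}
	s.failureCode = code
}

// finish marks the turn complete, stops any outstanding timer, and returns
// the recorded failure code (empty on success).
func (s *supervisor) finish() string {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.done = true
	s.stopToolIdleLocked()
	return s.failureCode
}

func main() {
	s := &supervisor{}
	s.armToolIdle(20 * time.Millisecond) // tool starts
	s.disarmToolIdle()                   // tool finishes quickly
	time.Sleep(50 * time.Millisecond)    // turn keeps streaming past the deadline
	fmt.Printf("failure code: %q\n", s.finish())
}
```

The same pattern would apply to the FirstEvent timer, with its own timer field and generation counter.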

@glittercowboy glittercowboy merged commit 42f9c78 into main Apr 30, 2026
2 checks passed
@glittercowboy glittercowboy deleted the codex/turn-lifecycle-daemon branch April 30, 2026 17:40