PROTOCOL.md: 78 additions, 0 deletions
@@ -38,6 +38,11 @@ Dispatch a user message to a session.
| taskId | uuid | |
| sessionId | uuid | |
| channelId | string | Routes stream events back to the correct browser tab |
| attemptId | uuid? | Active task attempt created by the relay. |
| attemptNumber | int? | Monotonically increasing attempt number within the task. |
| leaseExpiresAt | string? | ISO 8601 timestamp at which the active relay lease expires. |
| deadlineProfile | TaskDeadlines? | Deadline profile the daemon uses to supervise the attempt; all values are in milliseconds. |
| turnKind | string? | `user`, `session_title`, `context_stats`, `compact`, or `control`. |
| prompt | string | |
| engine | "pi"? | Optional task execution engine. Empty means `"pi"`. |
| provider | string? | Optional Pi provider id. Empty means `"claude-cli"`. |
@@ -69,8 +74,24 @@ Dispatch a user message to a session.
| Field | Type | Notes |
|---|---|---|
| token | string | Bearer token scoped to one task. |
| id | string? | Capability record identifier. |
| attemptId | uuid? | Attempt that owns this capability. |
| apiBaseUrl | string | Cloud app base URL for `/api/agent-plan/*`. |
| expiresAt | string | ISO timestamp. |
| snapshot | object? | Capability metadata snapshot used by cloud-side authorization. |

`TaskDeadlines`:

| Field | Type | Notes |
|---|---|---|
| processStartMs | int? | Process launch deadline. |
| promptWriteMs | int? | Prompt write deadline. |
| firstEventMs | int? | Deadline for the first parsed runtime event. |
| firstVisibleEventMs | int? | Deadline for the first user-visible runtime event. |
| streamIdleMs | int? | Stream inactivity deadline. |
| toolIdleMs | int? | Tool execution inactivity deadline. |
| userInputMs | int? | User input wait deadline. |
| cleanupTermMs | int? | Grace period for process cleanup. |

`Task.contextRefs` carries project-relative file and folder references selected in the cloud composer. The relay forwards this field only to daemons that advertise `Hello.capabilities.contextRefs`.

@@ -221,6 +242,63 @@ reconnect recovery happens by reloading persisted messages by session and
sequence from the database. There is no daemon ↔ relay ack/replay/WAL handshake
in protocol version 1.

### Task attempt lifecycle

The relay owns task attempts. A dispatched `task` includes the active
`attemptId`, `attemptNumber`, `leaseExpiresAt`, `deadlineProfile`, and
`turnKind`. Daemons echo attempt metadata on task-adjacent frames so the relay
can associate runtime events with the active attempt.
Comment on lines +247 to +250

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Document the new attempt fields on the existing task-adjacent frames too.

This section says attempt metadata is echoed beyond taskLifecycle, but the message tables for stream, taskStarted, taskComplete, taskError, taskCancelled, permissionRequest, and question still describe the pre-attempt shapes. That leaves the authoritative spec behind the Go types.

Based on learnings: "Treat message shape changes as cross-repo work: update PROTOCOL.md, Go types, and tests here first before bumping consumers".

Also applies to: 297-300

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@PROTOCOL.md` around lines 247-250, the protocol docs currently state that
attempt metadata is echoed beyond taskLifecycle but the message tables for
stream, taskStarted, taskComplete, taskError, taskCancelled, permissionRequest,
and question still show the old shapes; update PROTOCOL.md to include the new
attempt fields (attemptId, attemptNumber, leaseExpiresAt, deadlineProfile,
turnKind) on each of those task-adjacent frame descriptions and message tables
(also apply the same fix at the other occurrence around lines 297-300), and
ensure the prose clarifies that these fields are present on every echoed frame
so it matches the Go types and tests before bumping consumers.


`taskLifecycle` is the structured lifecycle frame for attempt diagnostics:

| Field | Type | Notes |
|---|---|---|
| type | "taskLifecycle" | |
| taskId | uuid | |
| attemptId | uuid | |
| attemptNumber | int | |
| sessionId | uuid | |
| channelId | string | |
| phase | string | Lifecycle phase. |
| status | string | Durable attempt status. |
| retryable | boolean? | Terminal retry-safety hint. |
| failureCode | string? | Stable terminal failure code. |
| message | string? | Operator-facing detail. |
| userMessage | string? | User-facing failure detail. |
| observedAt | string | RFC3339 timestamp. |
| deadlineAt | string? | Deadline associated with the phase. |
| pid | int? | Local process id when known. |
| provider | string? | Pi provider id. |
| model | string? | Provider model id. |
| requestId | uuid? | Root correlation id. |
| traceparent | string? | W3C trace context. |
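
For reference, a Go sketch of a struct that mirrors the `taskLifecycle` table above. It is not the repository's `TaskLifecycle` type: `TaskLifecycleFrame` and `decodeLifecycle` are illustrative names, optional columns are modeled with pointers or `omitempty`, and UUIDs are kept as plain strings.

```go
package main

import (
	"encoding/json"
	"time"
)

// TaskLifecycleFrame mirrors the taskLifecycle table column for column.
type TaskLifecycleFrame struct {
	Type          string `json:"type"`
	TaskID        string `json:"taskId"`
	AttemptID     string `json:"attemptId"`
	AttemptNumber int    `json:"attemptNumber"`
	SessionID     string `json:"sessionId"`
	ChannelID     string `json:"channelId"`
	Phase         string `json:"phase"`
	Status        string `json:"status"`
	Retryable     *bool  `json:"retryable,omitempty"` // nil = no hint sent
	FailureCode   string `json:"failureCode,omitempty"`
	Message       string `json:"message,omitempty"`
	UserMessage   string `json:"userMessage,omitempty"`
	ObservedAt    string `json:"observedAt"`
	DeadlineAt    string `json:"deadlineAt,omitempty"`
	PID           *int   `json:"pid,omitempty"`
	Provider      string `json:"provider,omitempty"`
	Model         string `json:"model,omitempty"`
	RequestID     string `json:"requestId,omitempty"`
	Traceparent   string `json:"traceparent,omitempty"`
}

// decodeLifecycle parses a frame and checks that observedAt is RFC3339.
func decodeLifecycle(data []byte) (TaskLifecycleFrame, time.Time, error) {
	var f TaskLifecycleFrame
	if err := json.Unmarshal(data, &f); err != nil {
		return f, time.Time{}, err
	}
	at, err := time.Parse(time.RFC3339, f.ObservedAt)
	return f, at, err
}
```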

Lifecycle phases are `accepted`, `queued`, `started`, `pi_started`,
`prompt_written`, `first_event_seen`, `first_visible_event_seen`, `streaming`,
`tool_started`, `tool_finished`, `waiting_input`, `input_received`,
`cleanup_started`, `cleanup_finished`, `heartbeat`, `retry_scheduled`,
`completed`, `failed`, `canceled`, `timed_out`, and `lost`.

Attempt statuses are `created`, `queued`, `started`, `pi_started`,
`prompt_written`, `first_event_seen`, `first_visible_event_seen`, `streaming`,
`waiting_input`, `tool_running`, `cleanup_started`, `cleanup_finished`,
`completed`, `failed`, `canceled`, `timed_out`, and `lost`.
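
The phase and status vocabularies above can be captured as lookup sets. This is a sketch with hypothetical helper names (`setOf`, `validPhase`, `validStatus`, `isTerminalPhase`). Note that `tool_started` is a phase but not a status, `tool_running` is a status but not a phase, and both lists spell `canceled` with one "l".

```go
package main

// setOf builds a membership set from a list of names.
func setOf(items ...string) map[string]bool {
	m := make(map[string]bool, len(items))
	for _, it := range items {
		m[it] = true
	}
	return m
}

// Lifecycle phases, copied from PROTOCOL.md (21 entries).
var lifecyclePhases = setOf(
	"accepted", "queued", "started", "pi_started",
	"prompt_written", "first_event_seen", "first_visible_event_seen", "streaming",
	"tool_started", "tool_finished", "waiting_input", "input_received",
	"cleanup_started", "cleanup_finished", "heartbeat", "retry_scheduled",
	"completed", "failed", "canceled", "timed_out", "lost",
)

// Durable attempt statuses, copied from PROTOCOL.md (17 entries).
var attemptStatuses = setOf(
	"created", "queued", "started", "pi_started",
	"prompt_written", "first_event_seen", "first_visible_event_seen", "streaming",
	"waiting_input", "tool_running", "cleanup_started", "cleanup_finished",
	"completed", "failed", "canceled", "timed_out", "lost",
)

// Terminal phases, which the relay maps to task aggregate states.
var terminalPhases = setOf("completed", "failed", "canceled", "timed_out", "lost")

func validPhase(p string) bool      { return lifecyclePhases[p] }
func validStatus(s string) bool     { return attemptStatuses[s] }
func isTerminalPhase(p string) bool { return terminalPhases[p] }
```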

The relay filters stale lifecycle frames by the active tuple
`(taskId, attemptId, sessionId, machineId)`. Frames that do not match the active
attempt are ignored for aggregate task state and may still be logged for
diagnostics. Terminal lifecycle phases map to task aggregate states:
`completed`, `failed`, `canceled`, `timed_out`, and `lost`.
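
A sketch of the stale-frame filter under stated assumptions. `attemptKey`, `aggregateState`, and `applyLifecycle` are hypothetical. Because `machineId` is not a `taskLifecycle` field, the sketch assumes the relay fills it in from the daemon connection that delivered the frame. It also copies a terminal phase onto the aggregate under the same name, which is one reading of the mapping above.

```go
package main

// attemptKey is the active tuple the relay compares against.
// MachineID is assumed to come from the delivering connection.
type attemptKey struct {
	TaskID, AttemptID, SessionID, MachineID string
}

// aggregateState is a hypothetical relay-side record for one task.
type aggregateState struct {
	Active attemptKey
	State  string // empty until a terminal phase is applied
}

// applyLifecycle updates the aggregate only for frames from the active
// attempt and reports whether the frame was applied. Stale frames are
// left for the caller to log for diagnostics.
func applyLifecycle(agg *aggregateState, frame attemptKey, phase string) bool {
	if frame != agg.Active {
		return false
	}
	switch phase {
	case "completed", "failed", "canceled", "timed_out", "lost":
		agg.State = phase
	}
	return true
}
```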

Retry safety is attempt-local. A timeout before visible output or side effects
can be marked `retryable`; phases after visible output or tool execution make
automatic retry unsafe unless a higher-level policy explicitly allows it.
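
One way to express the retry-safety rule, assuming the relay tracks which phases an attempt reached. The set of phases treated as "visible output or side effects" is this sketch's interpretation, not a list from the protocol, and `policyAllows` stands in for the unspecified higher-level policy. `unsafeAfter` and `retryableTimeout` are hypothetical names.

```go
package main

// unsafeAfter lists phases that, in this sketch's reading, mean visible
// output or tool side effects may already have happened.
var unsafeAfter = map[string]bool{
	"first_visible_event_seen": true,
	"streaming":                true,
	"tool_started":             true,
	"tool_finished":            true,
	"waiting_input":            true,
}

// retryableTimeout reports whether a timed-out attempt may be retried
// automatically. If any reached phase is unsafe, only an explicit
// higher-level policy (policyAllows) can permit the retry.
func retryableTimeout(reached []string, policyAllows bool) bool {
	for _, p := range reached {
		if unsafeAfter[p] {
			return policyAllows
		}
	}
	return true
}
```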

`turnKind` distinguishes user-visible work (`user`) from local maintenance turns:
session title generation (`session_title`), context stats (`context_stats`),
compaction (`compact`), and daemon control flows (`control`). Attempt fields are
additive JSON fields, so consumers that do not understand them ignore them.
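
Go's `encoding/json` already provides the additive-field behavior: `json.Unmarshal` skips object keys that have no matching struct field. The sketch below decodes a task carrying attempt fields into a hypothetical pre-attempt struct and classifies `turnKind` values. `legacyTask`, `decodeLegacy`, and `isMaintenanceTurn` are illustrative names, and treating an empty or unknown `turnKind` as non-maintenance is an assumption.

```go
package main

import "encoding/json"

// legacyTask models a consumer built before attempt fields existed.
// Unknown keys such as attemptId or turnKind are skipped silently.
type legacyTask struct {
	TaskID string `json:"taskId"`
	Prompt string `json:"prompt"`
}

func decodeLegacy(data []byte) (legacyTask, error) {
	var t legacyTask
	err := json.Unmarshal(data, &t)
	return t, err
}

// isMaintenanceTurn reports whether a turnKind names local maintenance
// work. Empty or unknown values return false (an assumption).
func isMaintenanceTurn(kind string) bool {
	switch kind {
	case "session_title", "context_stats", "compact", "control":
		return true
	}
	return false
}
```

A consumer that decodes with `json.Decoder.DisallowUnknownFields` would reject these frames instead, so that option does not tolerate additive changes.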

### `planningEvent`

Append-only planning journal event sent from a source runtime to the relay.

envelope.go: 22 additions, 0 deletions
@@ -13,6 +13,26 @@ type Envelope struct {
	Payload any
}

func (e Envelope) DecodePayload() (any, error) {
	payload, err := payloadForType(e.Type)
	if err != nil {
		return nil, err
	}
	switch raw := e.Payload.(type) {
	case json.RawMessage:
		if err := json.Unmarshal(raw, payload); err != nil {
			return nil, fmt.Errorf("decode %s: %w", e.Type, err)
		}
	case []byte:
		if err := json.Unmarshal(raw, payload); err != nil {
			return nil, fmt.Errorf("decode %s: %w", e.Type, err)
		}
	default:
		return e.Payload, nil
	}
	return payload, nil
}

// ParseEnvelope reads raw JSON, looks at the type field, and unmarshals
// into the correct concrete struct.
func ParseEnvelope(data []byte) (*Envelope, error) {
@@ -63,6 +83,8 @@ func payloadForType(msgType string) (any, error) {
	switch msgType {
	case MsgTypeTask:
		return &Task{}, nil
	case MsgTypeTaskLifecycle:
		return &TaskLifecycle{}, nil
	case MsgTypeStop:
		return &Stop{}, nil
	case MsgTypePermissionResponse: