This RFC documents the comprehensive architecture for operation cancellation and interruption handling in LobeHub's agent runtime system. It describes the hierarchical operation structure, cancellation propagation mechanism, and interrupt handling behavior across different execution phases.
Motivation
As LobeHub's agent runtime system has evolved with complex multi-step workflows involving LLM calls, tool executions, and human interventions, we needed a robust and consistent cancellation mechanism that:
Properly cleans up pending operations when users cancel mid-execution
Maintains state consistency across the operation tree
Provides clear feedback to users about what was cancelled
Prevents resource leaks from incomplete operations
Follows clear design principles for maintainability
This RFC serves as the comprehensive documentation of our cancellation architecture after implementing and refining the system.
    type OperationStatus =
      | 'pending' // Waiting to start (not currently used)
      | 'running' // Executing
      | 'paused' // Paused (for user intervention scenarios)
      | 'completed' // Successfully completed
      | 'cancelled' // User cancelled
      | 'failed'; // Execution failed
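As a rough illustration of how these statuses might be consumed, the sketch below groups them into terminal and non-terminal states. The helpers `isTerminal` and `canCancel` are hypothetical names introduced here, and the assumption that `completed`, `cancelled`, and `failed` are final states is ours, not something the RFC states.

```typescript
type OperationStatus =
  | 'pending'
  | 'running'
  | 'paused'
  | 'completed'
  | 'cancelled'
  | 'failed';

// Assumption: these three statuses are final and never change again.
const TERMINAL_STATUSES: ReadonlySet<OperationStatus> = new Set<OperationStatus>([
  'completed',
  'cancelled',
  'failed',
]);

// Hypothetical helper: true when the operation has already finished.
function isTerminal(status: OperationStatus): boolean {
  return TERMINAL_STATUSES.has(status);
}

// Hypothetical guard: only operations that have not finished can be cancelled.
function canCancel(status: OperationStatus): boolean {
  return !isTerminal(status);
}
```

A guard like this would keep a late cancel request from overwriting a `completed` or `failed` status.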
Responsibility: Detect cancellation and trigger agent interrupt handling
    while (state.status !== 'done' && state.status !== 'error') {
      // Check if operation has been cancelled
      const currentOperation = get().operations[operationId];
      if (currentOperation?.status === 'cancelled') {
        log('[internal_execAgentRuntime] Operation cancelled, marking state as interrupted');

        // Set state.status to 'interrupted' to trigger agent abort handling
        state = { ...state, status: 'interrupted' };

        // Let agent handle the abort (will clean up pending tools if needed)
        const result = await runtime.step(state, nextContext);
        state = result.newState;

        log('[internal_execAgentRuntime] Operation cancelled, stopping loop');
        break;
      }

      // Execute step
      const result = await runtime.step(state, nextContext);
      // ...
    }
Key Design:
Does NOT directly handle abort logic - Avoids duplicating cleanup code
Responsibility: Unified abort checking and cleanup decision
    async runner(context: AgentRuntimeContext, state: AgentState) {
      // Unified abort check: before all phase handling
      if (state.status === 'interrupted') {
        return this.handleAbort(context, state);
      }

      // ... phase handling
    }

    private handleAbort(context: AgentRuntimeContext, state: AgentState): AgentInstruction {
      const { hasToolsCalling, parentMessageId, toolsCalling } = this.extractAbortInfo(context, state);

      // If there are pending tool calls, clean them up
      if (hasToolsCalling && toolsCalling.length > 0) {
        return {
          type: 'resolve_aborted_tools',
          payload: { parentMessageId, toolsCalling },
        };
      }

      // No tools to clean up, finish directly
      return {
        type: 'finish',
        reason: 'user_requested',
        reasonDetail: 'Operation cancelled by user',
      };
    }
Abort Information Extraction:
Extracts different information based on current phase:
llm_result phase:
Extract toolsCalling from payload
Tools haven't created messages yet
tool_result / tools_batch_result phase:
Find messages with pluginIntervention.status === 'pending' in state.messages
Extract plugin info as toolsCalling
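The phase-dependent extraction described above could look roughly like the sketch below. All types here (`ToolCall`, `ChatMessage`, `AbortContext`, `AbortState`, `AbortInfo`) are simplified stand-ins invented for illustration, and the `plugin` field on messages is an assumption; the real `extractAbortInfo` in `GeneralChatAgent` may differ in shape and detail.

```typescript
// Simplified, hypothetical types for illustration only.
interface ToolCall {
  id: string;
  apiName: string;
}

interface ChatMessage {
  id: string;
  plugin?: ToolCall;
  pluginIntervention?: { status: 'pending' | 'approved' | 'aborted' };
}

interface AbortContext {
  phase: 'init' | 'user_input' | 'llm_result' | 'tool_result' | 'tools_batch_result';
  payload?: { parentMessageId?: string; toolsCalling?: ToolCall[] };
}

interface AbortState {
  messages: ChatMessage[];
}

interface AbortInfo {
  hasToolsCalling: boolean;
  parentMessageId: string;
  toolsCalling: ToolCall[];
}

function extractAbortInfo(context: AbortContext, state: AbortState): AbortInfo {
  const parentMessageId = context.payload?.parentMessageId ?? '';

  if (context.phase === 'llm_result') {
    // Tool messages do not exist yet, so the tool calls come from the payload.
    const toolsCalling = context.payload?.toolsCalling ?? [];
    return { hasToolsCalling: toolsCalling.length > 0, parentMessageId, toolsCalling };
  }

  if (context.phase === 'tool_result' || context.phase === 'tools_batch_result') {
    // Tool messages already exist: collect the ones still marked as pending.
    const toolsCalling = state.messages.flatMap((message) =>
      message.pluginIntervention?.status === 'pending' && message.plugin
        ? [message.plugin]
        : [],
    );
    return { hasToolsCalling: toolsCalling.length > 0, parentMessageId, toolsCalling };
  }

  // Other phases have no tool calls to clean up.
  return { hasToolsCalling: false, parentMessageId, toolsCalling: [] };
}
```

Returning an empty `toolsCalling` list for the remaining phases lets the caller fall through to a plain `finish` instruction.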
Layer 3: Executor
Responsibility: Register cancel handlers and cleanup resources
createToolMessage - "Ensure Complete" Strategy
    onOperationCancel(createToolMsgOpId, async ({ metadata }) => {
      // Wait for message creation to complete
      const createResult = await metadata?.createMessagePromise;

      if (createResult) {
        // Update message to aborted state
        await Promise.all([
          optimisticUpdateMessageContent(msgId, 'Tool execution was cancelled by user.'),
          optimisticUpdateMessagePlugin(msgId, { intervention: { status: 'aborted' } }),
        ]);
      }
    });
Rationale: Message creation is async; when cancelled, it might be in progress. Wait for completion then mark as aborted.
executeToolCall - "Immediate Cleanup" Strategy
    onOperationCancel(executeToolOpId, async () => {
      // Update message to aborted state immediately
      await Promise.all([
        optimisticUpdateMessageContent(toolMessageId, 'Tool execution was cancelled by user.'),
        optimisticUpdateMessagePlugin(toolMessageId, { intervention: { status: 'aborted' } }),
      ]);
    });
Rationale: Message already exists; update state immediately for fast response.
Parent Operation Check
    // Check if parent operation was cancelled while creating message
    const toolOperation = toolOperationId ? get().operations[toolOperationId] : undefined;
    if (toolOperation?.abortController.signal.aborted) {
      log('[call_tool] Parent operation cancelled, skipping tool execution');
      return { events, newState: state };
    }
Layer 4: Tool
Responsibility: Check AbortSignal and stop execution
Sets content: 'Tool execution was aborted by user.'
Sets pluginIntervention: { status: 'aborted' }
Sets state.status = 'done'
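The three updates listed above, which the RFC attributes to the `resolve_aborted_tools` executor, can be sketched as a pure function. The types and the `resolveAbortedTools` name are hypothetical simplifications, and this sketch assumes the tool messages do not exist yet and creates them directly in the aborted state; the real executor works through the store's message actions.

```typescript
// Simplified, hypothetical types for illustration only.
interface ToolCall {
  id: string;
  apiName: string;
}

interface ToolMessage {
  role: 'tool';
  parentId: string;
  toolCallId: string;
  content: string;
  pluginIntervention: { status: 'aborted' };
}

interface AgentState {
  status: 'running' | 'interrupted' | 'done' | 'error';
  messages: ToolMessage[];
}

interface ResolveAbortedToolsPayload {
  parentMessageId: string;
  toolsCalling: ToolCall[];
}

function resolveAbortedTools(
  state: AgentState,
  payload: ResolveAbortedToolsPayload,
): AgentState {
  // One aborted tool message per tool call that never ran.
  const abortedMessages = payload.toolsCalling.map(
    (tool): ToolMessage => ({
      role: 'tool',
      parentId: payload.parentMessageId,
      toolCallId: tool.id,
      content: 'Tool execution was aborted by user.',
      pluginIntervention: { status: 'aborted' },
    }),
  );

  // Return a new state rather than mutating the input, and end the run.
  return {
    ...state,
    messages: [...state.messages, ...abortedMessages],
    status: 'done',
  };
}
```

Because every pending tool call receives a message, the conversation never ends with an assistant tool call that has no matching tool result.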
UI:
Assistant message shows complete content
Tool messages show "Tool execution was aborted by user."
Tool cards show aborted status (gray/disabled style)
Operation completes
Phase 4: tool_result (During Tool Execution)
Interrupt Timing: Tool is executing (e.g., search, code execution)
Behavior (call_tool executor):
Case A: Cancelled During createToolMessage
    // createToolMessage cancel handler executes
    onOperationCancel(createToolMsgOpId, async ({ metadata }) => {
      // Wait for message creation to complete
      const createResult = await metadata?.createMessagePromise;

      // Update message to aborted state
      await optimisticUpdateMessageContent(msgId, 'Tool execution was cancelled by user.');
      await optimisticUpdateMessagePlugin(msgId, { intervention: { status: 'aborted' } });
    });
Result:
Tool message creation completes
Message shows "Tool execution was cancelled by user."
    onOperationCancel(executeToolOpId, async () => {
      // Update message to aborted state
      await optimisticUpdateMessageContent(toolMessageId, 'Tool execution was cancelled by user.');
      await optimisticUpdateMessagePlugin(toolMessageId, { intervention: { status: 'aborted' } });
    });
Result:
Tool stops execution
Tool message updates to aborted state
Returns error event or empty events
Agent runtime loop continues:
Loop detects operation cancelled
Sets state.status = 'interrupted'
Agent handleAbort checks for other pending tools
If yes: execute resolve_aborted_tools
If no: finish directly
UI:
Tool message shows "Tool execution was cancelled by user."
Tool card shows aborted status
If other pending tools exist, they're also marked aborted
Operation completes
Phase 5: tool_result (Tool Complete, Ready to Call LLM)
Interrupt Timing: Tool execution complete, ready to call LLM with tool results
Behavior:
Similar to Phase 3, but tool result messages already exist
Agent handleAbort checks pending tools (if multiple tools, some completed)
    // Periodically check in long-running operations
    async function longRunningTask(abortSignal: AbortSignal) {
      for (const item of items) {
        // Check if cancelled
        if (abortSignal.aborted) {
          log('Task cancelled');
          return;
        }

        // Process item
        await processItem(item);
      }
    }
Completing Operations
    // On success
    get().completeOperation(operationId);

    // On failure
    get().failOperation(operationId, {
      type: 'NetworkError',
      message: 'Failed to fetch data',
    });
Implementation Summary
Problems Solved
✅ Problem 1: Loop level didn't check cancellation status
Solution: Check operation.status === 'cancelled' at while loop start
✅ Problem 2: Cancellation didn't clean up pending tools
Solution: Set state.status = 'interrupted', trigger agent's handleAbort()
✅ Problem 3: Cancellation logic scattered across multiple places
Solution: Agent's unified abort check and handling
This operation cancellation architecture provides:
Clear layered responsibilities - Each layer has well-defined duties
Consistent cancellation behavior - All cancellations follow the same pattern
Proper resource cleanup - No dangling operations or messages
User-friendly feedback - Clear UI indication of what was cancelled
Maintainable design - Easy to understand and extend
The architecture has been battle-tested and proven effective in production. This RFC serves as the definitive documentation for understanding and maintaining the system.
Operation Type Hierarchy
Core Operation Types
Operation Status
Operation Tree Structure
Typical Execution Flow
Example: Simple LLM Chat (No Tool Calls)
Example: Single Tool Call
Cancellation Propagation Mechanism
Cancellation Workflow
When a user cancels an operation:
Propagation Example
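A minimal sketch of how cancellation could propagate through an operation tree, assuming cancellation flows from a parent operation to every child that is still running. The `OperationTree` class, its methods, and the operation names used with it are hypothetical; the actual store keeps operations in `get().operations` and may track relationships differently.

```typescript
type OperationStatus = 'running' | 'completed' | 'cancelled';

interface Operation {
  id: string;
  parentId: string | undefined;
  childIds: string[];
  status: OperationStatus;
  abortController: AbortController;
}

// Hypothetical in-memory operation tree for illustration only.
class OperationTree {
  private readonly operations = new Map<string, Operation>();

  start(id: string, parentId?: string): Operation {
    const operation: Operation = {
      id,
      parentId,
      childIds: [],
      status: 'running',
      abortController: new AbortController(),
    };
    this.operations.set(id, operation);
    if (parentId !== undefined) {
      this.operations.get(parentId)?.childIds.push(id);
    }
    return operation;
  }

  complete(id: string): void {
    const operation = this.operations.get(id);
    if (operation?.status === 'running') operation.status = 'completed';
  }

  // Cancels the operation and, recursively, every child that is still running.
  // Returns the ids that were actually cancelled, parent first.
  cancel(id: string): string[] {
    const operation = this.operations.get(id);
    if (!operation || operation.status !== 'running') return [];

    operation.status = 'cancelled';
    operation.abortController.abort();

    const cancelled = [id];
    for (const childId of operation.childIds) {
      cancelled.push(...this.cancel(childId));
    }
    return cancelled;
  }

  get(id: string): Operation | undefined {
    return this.operations.get(id);
  }
}
```

Operations that already completed are skipped, so a finished LLM call keeps its `completed` status even when its parent is cancelled afterwards.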
Layered Cancellation Handling
Layer 1: Streaming Executor
Responsibility: Detect cancellation and trigger agent interrupt handling
Key Design:
state.status = 'interrupted' - Triggers agent's unified abort handling
runtime.step() - Lets agent execute cleanup logic
handleAbort() - Cleans up pending tools (if any)
Layer 2: Agent (GeneralChatAgent)
Responsibility: Unified abort checking and cleanup decision
Abort Information Extraction:
Extracts different information based on current phase:
llm_result phase:
Extract toolsCalling from payload
tool_result / tools_batch_result phase:
Find messages with pluginIntervention.status === 'pending' in state.messages
Extract plugin info as toolsCalling
Layer 3: Executor
Responsibility: Register cancel handlers and cleanup resources
createToolMessage - "Ensure Complete" Strategy
Rationale: Message creation is async; when cancelled, it might be in progress. Wait for completion then mark as aborted.
executeToolCall - "Immediate Cleanup" Strategy
Rationale: Message already exists; update state immediately for fast response.
Parent Operation Check
Layer 4: Tool
Responsibility: Check AbortSignal and stop execution
Rationale: Relies on parent cancel handler to update message status.
Interrupt Behavior by Phase
Phase 1: init / user_input
Interrupt Timing: User message just submitted, LLM not called yet
Behavior:
Agent detects status === 'interrupted'
Agent calls handleAbort()
No pending tools exist, so it returns a finish instruction
UI:
Phase 2: llm_result (During LLM Streaming)
Interrupt Timing: LLM is streaming output
Behavior (call_llm executor):
Agent handles human_abort phase:
Returns finish instruction
UI:
Phase 3: llm_result (LLM Complete, Ready to Execute Tools)
Interrupt Timing: LLM returned tool calls, but tool messages not created yet
Behavior (streaming executor):
Agent handling (unified abort check in runner):
resolve_aborted_tools executor executes:
Sets content: 'Tool execution was aborted by user.'
Sets pluginIntervention: { status: 'aborted' }
UI:
Phase 4: tool_result (During Tool Execution)
Interrupt Timing: Tool is executing (e.g., search, code execution)
Behavior (call_tool executor):
Case A: Cancelled During createToolMessage
Result:
Case B: Cancelled During executeToolCall
Builtin tool detects abort:
executeToolCall cancel handler executes:
Result:
Agent runtime loop continues:
UI:
Phase 5: tool_result (Tool Complete, Ready to Call LLM)
Interrupt Timing: Tool execution complete, ready to call LLM with tool results
Behavior:
UI:
Design Principles
1. Layered Responsibilities
Streaming Executor Layer
Sets state.status = 'interrupted' when the operation is cancelled
Agent Layer
Unified abort check (state.status === 'interrupted')
Abort information extraction (extractAbortInfo)
Cleanup decision (handleAbort)
Executor Layer
Tool Layer
Checks abortController.signal.aborted
2. Cancellation Strategies
Ensure Complete Strategy (createToolMessage)
Immediate Cleanup Strategy (executeToolCall)
Recursive Cancel Strategy (all parent operations)
3. State Consistency
Operation State
status field
Agent State
status: 'interrupted' triggers abort handling
status: 'done' indicates completion
status: 'waiting_for_human' indicates waiting for approval
Message State
pluginIntervention.status: 'aborted' indicates tool cancelled
content shows cancellation reason
Best Practices
Creating Operations
Registering Cancel Handlers
Checking Abort
Completing Operations
Implementation Summary
Problems Solved
✅ Problem 1: Loop level didn't check cancellation status
Solution: Check operation.status === 'cancelled' at while loop start
✅ Problem 2: Cancellation didn't clean up pending tools
Solution: Set state.status = 'interrupted', trigger agent's handleAbort()
✅ Problem 3: Cancellation logic scattered across multiple places
Solution: Agent's unified abort check and handling
Current Limitations & Future Improvements
Reference Files
Core Files
src/store/chat/slices/operation/types.ts - Operation type definitions
src/store/chat/slices/operation/actions.ts - Operation management logic
src/store/chat/agents/GeneralChatAgent.ts - Agent abort handling
src/store/chat/agents/createAgentExecutors.ts - Executor implementation
src/store/chat/slices/aiChat/actions/streamingExecutor.ts - Streaming executor
Tool Files
src/store/chat/slices/builtinTool/actions/search.ts - Search tool
src/store/chat/slices/builtinTool/actions/interpreter.ts - Interpreter tool
src/store/chat/slices/builtinTool/actions/localSystem.ts - LocalSystem tool
src/store/chat/slices/plugin/actions/pluginTypes.ts - Plugin tools
Test Files
src/store/chat/agents/__tests__/GeneralChatAgent.test.ts - Agent tests
src/store/chat/agents/__tests__/createAgentExecutors/ - Executor tests
src/store/chat/slices/aiChat/actions/__tests__/streamingExecutor.test.ts - Streaming tests
Conclusion
This operation cancellation architecture provides:
The architecture has been battle-tested and proven effective in production. This RFC serves as the definitive documentation for understanding and maintaining the system.