feat(api): add WebSocket mode for Responses API #26
Merged
Add foundational types and infrastructure for WebSocket transport:

- Enable axum "ws" feature in Cargo.toml
- Add WebSocket client/server message types (openai::websocket)
- Add WebSocket stats tracking (connections, requests)
- Extract shared generate_responses_result() for HTTP and WS reuse

https://claude.ai/code/session_017no3LbHZSHrjVi7xqopZpZ
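The stats tracking mentioned above can be pictured as a pair of counters updated on connection and request events. The following is a minimal sketch in Python, not the crate's actual Rust code; the names `WsStats`, `on_connect`, `on_disconnect`, and `on_request` are illustrative:

```python
from dataclasses import dataclass, field
from threading import Lock

@dataclass
class WsStats:
    """Hypothetical WebSocket counters (connections, requests).

    Field names are made up for illustration; they do not come from the crate.
    """
    _lock: Lock = field(default_factory=Lock, repr=False)
    connections: int = 0   # total connections ever accepted
    active: int = 0        # connections currently open
    requests: int = 0      # response.create events served

    def on_connect(self):
        with self._lock:
            self.connections += 1
            self.active += 1

    def on_disconnect(self):
        with self._lock:
            self.active -= 1

    def on_request(self):
        with self._lock:
            self.requests += 1

stats = WsStats()
stats.on_connect()
stats.on_request()
stats.on_request()
stats.on_disconnect()
print(stats.connections, stats.active, stats.requests)  # 1 0 2
```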
Add WebSocket transport for the Responses API endpoint, mirroring OpenAI's WebSocket mode for persistent connections and multi-turn agentic workflows.

Features:

- WebSocket upgrade on GET /openai/v1/responses
- Client sends response.create events, server streams JSON frames
- Connection-local cache for previous_response_id continuations
- Sequential execution (one in-flight response per connection)
- 60-minute connection timeout
- generate=false warmup support
- Error events: previous_response_not_found, connection_limit_reached

Implementation:

- New ws_handler module with WebSocket connection lifecycle
- Refactored response generation into shared function (HTTP + WS)
- WebSocket stats tracking (connections, requests)
- Exposed build_router() for integration testing
- 7 WebSocket integration tests
- Python smoke test example using websocket-client

https://claude.ai/code/session_017no3LbHZSHrjVi7xqopZpZ
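The connection-local previous_response_id cache described above keeps only the most recent response per connection and rejects anything else. A sketch of that behavior (illustrative Python, not the Rust ws_handler; `ConnectionCache` and `PreviousResponseNotFound` are invented names):

```python
class PreviousResponseNotFound(Exception):
    """Models the previous_response_not_found error event."""

class ConnectionCache:
    """Per-connection cache holding only the most recent response.

    Hypothetical model of the behavior described in the commit message;
    the real logic lives in the Rust ws_handler module.
    """
    def __init__(self):
        self._last = None  # (response_id, response) or None

    def store(self, response_id, response):
        # Only the most recent response is kept for this connection.
        self._last = (response_id, response)

    def resolve(self, previous_response_id):
        # A continuation must reference the cached response exactly;
        # any other id triggers a previous_response_not_found error event.
        if self._last is None or self._last[0] != previous_response_id:
            raise PreviousResponseNotFound(previous_response_id)
        return self._last[1]

cache = ConnectionCache()
cache.store("resp_1", {"output": "hello"})
assert cache.resolve("resp_1") == {"output": "hello"}
try:
    cache.resolve("resp_0")
except PreviousResponseNotFound:
    print("previous_response_not_found")  # error event sent to the client
```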
…t format
- Rewrite examples/websocket_client.py to use the official OpenAI Python
SDK (openai[realtime]>=2.22.0) with client.responses.connect() instead
of the raw websocket-client library
- Support both flat and nested response.create client event formats:
flat: {"type": "response.create", "model": "gpt-5", "input": "Hello"}
nested: {"type": "response.create", "response": {"model": "gpt-5", ...}}
- Update server error events to use the flat format expected by the SDK:
{"type": "error", "code": "...", "message": "...", "param": null}
- Update integration tests for new error event format
https://claude.ai/code/session_017no3LbHZSHrjVi7xqopZpZ
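The flat and nested response.create shapes and the flat error event described in this commit can be illustrated with a small normalizer. This is a sketch of the observable behavior, not the server's actual Rust deserializer; `normalize_response_create` and `error_event` are illustrative names:

```python
import json

def normalize_response_create(event: dict) -> dict:
    """Accept both client event shapes and return the response parameters.

    flat:   {"type": "response.create", "model": ..., "input": ...}
    nested: {"type": "response.create", "response": {"model": ..., ...}}
    """
    assert event.get("type") == "response.create"
    if "response" in event:                       # nested (explicit) form
        return dict(event["response"])
    # flat form: everything except "type" is a response parameter
    return {k: v for k, v in event.items() if k != "type"}

def error_event(code: str, message: str) -> str:
    """Flat server error event, the shape the OpenAI SDK expects."""
    return json.dumps(
        {"type": "error", "code": code, "message": message, "param": None}
    )

flat = {"type": "response.create", "model": "gpt-5", "input": "Hello"}
nested = {"type": "response.create",
          "response": {"model": "gpt-5", "input": "Hello"}}
# Both shapes normalize to the same parameters.
assert normalize_response_create(flat) == normalize_response_create(nested)
print(error_event("previous_response_not_found", "no cached response"))
```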
…content events

The OpenAI SDK requires sequence_number on every server event, not just delta events. Also adds item_id to content/text events for correlating events to their parent output item.

Changes:

- ResponsesStreamEvent: all helpers now take a seq parameter
- Stream builder: global monotonically increasing sequence_number across all events (0, 1, 2, ... N)
- content_part.added, output_text.delta, output_text.done, content_part.done, and all reasoning_summary events now include item_id field
- Error SSE events now use the flat format matching the WebSocket error format (code, message, param, sequence_number)

Verified with OpenAI Python SDK: all events parse correctly with continuous sequence_number and item_id fields.

https://claude.ai/code/session_017no3LbHZSHrjVi7xqopZpZ
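The global numbering scheme above can be sketched as a single counter threaded through the event stream, with item_id attached only to content/text events. An illustrative Python model, not the Rust ResponsesStreamEvent builder; `number_events` and the event subset it tags are assumptions:

```python
from itertools import count

def number_events(events, item_id):
    """Attach a globally monotonic sequence_number (0, 1, 2, ...) to every
    server event, and an item_id to content/text events.

    Sketch of the behavior described in the commit; names are illustrative.
    """
    seq = count(0)
    for event in events:
        event = dict(event, sequence_number=next(seq))
        if event["type"].startswith(("response.output_text.",
                                     "response.content_part.")):
            event["item_id"] = item_id
        yield event

raw = [
    {"type": "response.created"},
    {"type": "response.content_part.added"},
    {"type": "response.output_text.delta", "delta": "Hi"},
    {"type": "response.output_text.done", "text": "Hi"},
    {"type": "response.completed"},
]
numbered = list(number_events(raw, item_id="item_0"))
# Continuous numbering across all events, not just deltas:
assert [e["sequence_number"] for e in numbered] == [0, 1, 2, 3, 4]
assert numbered[2]["item_id"] == "item_0"
```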
- Add `annotations: []` to OutputContentPart::OutputText for spec compliance
- Add `logprobs: []` to output_text.delta and output_text.done events
- Rename websocket_client.py to openai_websocket_client.py for clarity
- Update all docs/example references to new filename

https://claude.ai/code/session_017no3LbHZSHrjVi7xqopZpZ
- Update R5.3 to clarify sequence_number is on all events
- Add R5.4 (logprobs field) and R5.5 (annotations field)
- Document flat and nested response.create formats in R10.2
- Fix non-requirements: previous_response_id is cached per WS connection
- Update api.md to show flat format (used by OpenAI SDK)

https://claude.ai/code/session_017no3LbHZSHrjVi7xqopZpZ
chaliy added a commit that referenced this pull request on Mar 20, 2026
## What

Prepare patch release v0.2.3 with version bump and changelog update.

## Why

Release new features and fixes since v0.2.2.

## How

- Bumped version in Cargo.toml from 0.2.2 to 0.2.3
- Updated CHANGELOG.md with v0.2.3 section including all changes since v0.2.2

## Changelog excerpt

### Highlights

- WebSocket mode for Responses API streaming
- OpenAI thinking/reasoning emulation support
- Fixed repository URLs for crates.io listing

### What's Changed

* chore: routine maintenance - update deps and align specs (#30)
* fix: correct repository URLs for crates.io listing (#29)
* chore: add attribution settings and agent guidance for commits/PRs (#28)
* feat: add /ship command for full shipping workflow (#27)
* feat(api): add WebSocket mode for Responses API (#26)
* feat(api): add OpenAI thinking/reasoning emulation (#25)

## Risk

- Low - Standard version bump release

## Checklist

- [x] Tests added or updated
- [x] Backward compatibility considered

Co-authored-by: Claude <noreply@anthropic.com>
## What

Adds WebSocket transport support for the OpenAI Responses API endpoint (/openai/v1/responses), enabling persistent connections for multi-turn agentic workflows.

## Why

WebSocket mode reduces connection overhead for agentic tool-call loops. Instead of opening a new HTTP connection per turn, clients maintain a single WebSocket and send response.create events sequentially. This matches the OpenAI SDK's client.responses.connect() interface.

## How

- WebSocket handler (src/cli/ws_handler.rs): manages upgrade, message dispatch, connection-local response caching, and the 60-minute timeout
- Message types (src/openai/websocket.rs): deserializes both flat (SDK default) and nested response.create formats
- ResponsesTokenStream is reused; events are stripped of SSE envelopes and sent as JSON text frames
- Events carry sequence_number, item_id, logprobs, and annotations per the OpenAI spec
- generate: false warmup: returns a completed response with no output for preflight state preparation
- previous_response_id caching: most recent response cached per connection; a mismatch returns a previous_response_not_found error

## Testing

- 7 WebSocket integration tests (tests/websocket_test.rs): basic response, event sequence validation, JSON-not-SSE verification, multi-turn sequential, previous_response_id caching/errors, invalid message handling
- Python smoke test example (examples/openai_websocket_client.py) using the official OpenAI SDK

## Risk

## Checklist

- Specs updated (specs/responses-api.md, specs/api-endpoints.md)
- Docs updated (docs/api.md, examples/README.md)

https://claude.ai/code/session_017no3LbHZSHrjVi7xqopZpZ