feat(api): add WebSocket mode for Responses API #26
Merged
Add foundational types and infrastructure for WebSocket transport:

- Enable axum "ws" feature in Cargo.toml
- Add WebSocket client/server message types (openai::websocket)
- Add WebSocket stats tracking (connections, requests)
- Extract shared generate_responses_result() for HTTP and WS reuse

https://claude.ai/code/session_017no3LbHZSHrjVi7xqopZpZ
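The stats tracking mentioned above can be pictured as a pair of counters updated on connection and request events. The following is a minimal sketch in Python, not the crate's actual Rust code; the names `WsStats`, `on_connect`, `on_disconnect`, and `on_request` are illustrative:

```python
from dataclasses import dataclass, field
from threading import Lock

@dataclass
class WsStats:
    """Hypothetical WebSocket counters (connections, requests).

    Field names are made up for illustration; they do not come from the crate.
    """
    _lock: Lock = field(default_factory=Lock, repr=False)
    connections: int = 0   # total connections ever accepted
    active: int = 0        # connections currently open
    requests: int = 0      # response.create events served

    def on_connect(self):
        with self._lock:
            self.connections += 1
            self.active += 1

    def on_disconnect(self):
        with self._lock:
            self.active -= 1

    def on_request(self):
        with self._lock:
            self.requests += 1

stats = WsStats()
stats.on_connect()
stats.on_request()
stats.on_request()
stats.on_disconnect()
print(stats.connections, stats.active, stats.requests)  # 1 0 2
```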
Add WebSocket transport for the Responses API endpoint, mirroring OpenAI's WebSocket mode for persistent connections and multi-turn agentic workflows.

Features:

- WebSocket upgrade on GET /openai/v1/responses
- Client sends response.create events, server streams JSON frames
- Connection-local cache for previous_response_id continuations
- Sequential execution (one in-flight response per connection)
- 60-minute connection timeout
- generate=false warmup support
- Error events: previous_response_not_found, connection_limit_reached

Implementation:

- New ws_handler module with WebSocket connection lifecycle
- Refactored response generation into shared function (HTTP + WS)
- WebSocket stats tracking (connections, requests)
- Exposed build_router() for integration testing
- 7 WebSocket integration tests
- Python smoke test example using websocket-client

https://claude.ai/code/session_017no3LbHZSHrjVi7xqopZpZ
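The connection-local previous_response_id cache described above keeps only the most recent response per connection and rejects anything else. A sketch of that behavior (illustrative Python, not the Rust ws_handler; `ConnectionCache` and `PreviousResponseNotFound` are invented names):

```python
class PreviousResponseNotFound(Exception):
    """Models the previous_response_not_found error event."""

class ConnectionCache:
    """Per-connection cache holding only the most recent response.

    Hypothetical model of the behavior described in the commit message;
    the real logic lives in the Rust ws_handler module.
    """
    def __init__(self):
        self._last = None  # (response_id, response) or None

    def store(self, response_id, response):
        # Only the most recent response is kept for this connection.
        self._last = (response_id, response)

    def resolve(self, previous_response_id):
        # A continuation must reference the cached response exactly;
        # any other id triggers a previous_response_not_found error event.
        if self._last is None or self._last[0] != previous_response_id:
            raise PreviousResponseNotFound(previous_response_id)
        return self._last[1]

cache = ConnectionCache()
cache.store("resp_1", {"output": "hello"})
assert cache.resolve("resp_1") == {"output": "hello"}
try:
    cache.resolve("resp_0")
except PreviousResponseNotFound:
    print("previous_response_not_found")  # error event sent to the client
```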
…t format
- Rewrite examples/websocket_client.py to use the official OpenAI Python
SDK (openai[realtime]>=2.22.0) with client.responses.connect() instead
of the raw websocket-client library
- Support both flat and nested response.create client event formats:
flat: {"type": "response.create", "model": "gpt-5", "input": "Hello"}
nested: {"type": "response.create", "response": {"model": "gpt-5", ...}}
- Update server error events to use the flat format expected by the SDK:
{"type": "error", "code": "...", "message": "...", "param": null}
- Update integration tests for new error event format
https://claude.ai/code/session_017no3LbHZSHrjVi7xqopZpZ
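The flat and nested response.create shapes and the flat error event described in this commit can be illustrated with a small normalizer. This is a sketch of the observable behavior, not the server's actual Rust deserializer; `normalize_response_create` and `error_event` are illustrative names:

```python
import json

def normalize_response_create(event: dict) -> dict:
    """Accept both client event shapes and return the response parameters.

    flat:   {"type": "response.create", "model": ..., "input": ...}
    nested: {"type": "response.create", "response": {"model": ..., ...}}
    """
    assert event.get("type") == "response.create"
    if "response" in event:                       # nested (explicit) form
        return dict(event["response"])
    # flat form: everything except "type" is a response parameter
    return {k: v for k, v in event.items() if k != "type"}

def error_event(code: str, message: str) -> str:
    """Flat server error event, the shape the OpenAI SDK expects."""
    return json.dumps(
        {"type": "error", "code": code, "message": message, "param": None}
    )

flat = {"type": "response.create", "model": "gpt-5", "input": "Hello"}
nested = {"type": "response.create",
          "response": {"model": "gpt-5", "input": "Hello"}}
# Both shapes normalize to the same parameters.
assert normalize_response_create(flat) == normalize_response_create(nested)
print(error_event("previous_response_not_found", "no cached response"))
```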
…content events

The OpenAI SDK requires sequence_number on every server event, not just delta events. Also adds item_id to content/text events for correlating events to their parent output item.

Changes:

- ResponsesStreamEvent: all helpers now take a seq parameter
- Stream builder: global monotonically increasing sequence_number across all events (0, 1, 2, ... N)
- content_part.added, output_text.delta, output_text.done, content_part.done, and all reasoning_summary events now include item_id field
- Error SSE events now use the flat format matching the WebSocket error format (code, message, param, sequence_number)

Verified with OpenAI Python SDK: all events parse correctly with continuous sequence_number and item_id fields.

https://claude.ai/code/session_017no3LbHZSHrjVi7xqopZpZ
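The global numbering scheme above can be sketched as a single counter threaded through the event stream, with item_id attached only to content/text events. An illustrative Python model, not the Rust ResponsesStreamEvent builder; `number_events` and the event subset it tags are assumptions:

```python
from itertools import count

def number_events(events, item_id):
    """Attach a globally monotonic sequence_number (0, 1, 2, ...) to every
    server event, and an item_id to content/text events.

    Sketch of the behavior described in the commit; names are illustrative.
    """
    seq = count(0)
    for event in events:
        event = dict(event, sequence_number=next(seq))
        if event["type"].startswith(("response.output_text.",
                                     "response.content_part.")):
            event["item_id"] = item_id
        yield event

raw = [
    {"type": "response.created"},
    {"type": "response.content_part.added"},
    {"type": "response.output_text.delta", "delta": "Hi"},
    {"type": "response.output_text.done", "text": "Hi"},
    {"type": "response.completed"},
]
numbered = list(number_events(raw, item_id="item_0"))
# Continuous numbering across all events, not just deltas:
assert [e["sequence_number"] for e in numbered] == [0, 1, 2, 3, 4]
assert numbered[2]["item_id"] == "item_0"
```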
- Add `annotations: []` to OutputContentPart::OutputText for spec compliance
- Add `logprobs: []` to output_text.delta and output_text.done events
- Rename websocket_client.py to openai_websocket_client.py for clarity
- Update all docs/example references to new filename

https://claude.ai/code/session_017no3LbHZSHrjVi7xqopZpZ
- Update R5.3 to clarify sequence_number is on all events
- Add R5.4 (logprobs field) and R5.5 (annotations field)
- Document flat and nested response.create formats in R10.2
- Fix non-requirements: previous_response_id is cached per WS connection
- Update api.md to show flat format (used by OpenAI SDK)

https://claude.ai/code/session_017no3LbHZSHrjVi7xqopZpZ
chaliy added a commit that referenced this pull request on Mar 20, 2026
## What

Prepare patch release v0.2.3 with version bump and changelog update.

## Why

Release new features and fixes since v0.2.2.

## How

- Bumped version in Cargo.toml from 0.2.2 to 0.2.3
- Updated CHANGELOG.md with v0.2.3 section including all changes since v0.2.2

## Changelog excerpt

### Highlights

- WebSocket mode for Responses API streaming
- OpenAI thinking/reasoning emulation support
- Fixed repository URLs for crates.io listing

### What's Changed

* chore: routine maintenance - update deps and align specs (#30)
* fix: correct repository URLs for crates.io listing (#29)
* chore: add attribution settings and agent guidance for commits/PRs (#28)
* feat: add /ship command for full shipping workflow (#27)
* feat(api): add WebSocket mode for Responses API (#26)
* feat(api): add OpenAI thinking/reasoning emulation (#25)

## Risk

- Low - Standard version bump release

## Checklist

- [x] Tests added or updated
- [x] Backward compatibility considered

Co-authored-by: Claude <noreply@anthropic.com>
## What

Adds WebSocket transport support for the OpenAI Responses API endpoint (/openai/v1/responses), enabling persistent connections for multi-turn agentic workflows.

## Why

WebSocket mode reduces connection overhead for agentic tool-call loops. Instead of opening a new HTTP connection per turn, clients maintain a single WebSocket and send response.create events sequentially. This matches the OpenAI SDK's client.responses.connect() interface.

## How

- WebSocket handler (src/cli/ws_handler.rs): manages upgrade, message dispatch, connection-local response caching, and the 60-minute timeout
- Message types (src/openai/websocket.rs): deserializes both flat (SDK default) and nested response.create formats
- ResponsesTokenStream is reused; events are stripped of SSE envelopes and sent as JSON text frames
- Events carry sequence_number, item_id, logprobs, and annotations per the OpenAI spec
- generate: false warmup: returns a completed response with no output for preflight state preparation
- previous_response_id caching: most recent response cached per connection; a mismatch returns a previous_response_not_found error

## Testing

- 7 WebSocket integration tests (tests/websocket_test.rs): basic response, event sequence validation, JSON-not-SSE verification, multi-turn sequential, previous_response_id caching/errors, invalid message handling
- Python smoke test example (examples/openai_websocket_client.py) using the official OpenAI SDK

## Risk

## Checklist

- Specs updated (specs/responses-api.md, specs/api-endpoints.md)
- Docs updated (docs/api.md, examples/README.md)

https://claude.ai/code/session_017no3LbHZSHrjVi7xqopZpZ