
feat(api): add WebSocket mode for Responses API#26

Merged
chaliy merged 7 commits into main from claude/websocket-mode-integration-BoD6T
Feb 24, 2026

Conversation


@chaliy (Owner) commented Feb 24, 2026

What

Adds WebSocket transport support for the OpenAI Responses API endpoint (/openai/v1/responses), enabling persistent connections for multi-turn agentic workflows.

Why

WebSocket mode reduces connection overhead for agentic tool-call loops. Instead of opening a new HTTP connection per turn, clients maintain a single WebSocket and send response.create events sequentially. This matches the OpenAI SDK's client.responses.connect() interface.
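The multi-turn flow can be sketched as plain event construction (a minimal sketch; `response_create` and the `resp_abc123` id are hypothetical, and the real client would send these frames over an open WebSocket):

```python
def response_create(input_text, model="gpt-5", previous_response_id=None):
    """Build a flat-format response.create client event (the SDK-default shape)."""
    event = {"type": "response.create", "model": model, "input": input_text}
    if previous_response_id is not None:
        event["previous_response_id"] = previous_response_id
    return event

# Turn 1: sent over the already-open WebSocket.
turn1 = response_create("Summarize this repo")

# Turn 2: continues from the server's cached response, no new connection.
turn2 = response_create("Now list the open TODOs",
                        previous_response_id="resp_abc123")
```

Each turn reuses the same connection; only the `previous_response_id` field threads the turns together.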

How

  • WebSocket handler (src/cli/ws_handler.rs): Manages upgrade, message dispatch, connection-local response caching, and 60-minute timeout
  • Wire format types (src/openai/websocket.rs): Deserializes both flat (SDK default) and nested response.create formats
  • Stream reuse: The existing ResponsesTokenStream is reused — events are stripped of SSE envelopes and sent as JSON text frames
  • Protocol compliance: All streaming events include sequence_number, item_id, logprobs, and annotations per the OpenAI spec
  • generate: false warmup: Returns a completed response with no output, so clients can prepare connection state before the first real turn
  • previous_response_id caching: Most recent response cached per connection; mismatch returns previous_response_not_found error
  • Stats tracking: WebSocket requests counted alongside HTTP requests
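The connection-local caching behavior above can be sketched as follows (an illustrative Python model, not the actual Rust implementation; only the most recent response is retained per connection):

```python
class ConnectionCache:
    """Per-connection cache holding only the most recent response."""

    def __init__(self):
        self._last = None  # (response_id, response) or None

    def store(self, response_id, response):
        self._last = (response_id, response)

    def lookup(self, previous_response_id):
        """Return the cached response, or a flat-format error event on mismatch."""
        if self._last is None or self._last[0] != previous_response_id:
            return {"type": "error",
                    "code": "previous_response_not_found",
                    "message": f"previous response {previous_response_id!r} not found",
                    "param": None}
        return self._last[1]
```

Because only one response is cached, a client that skips back more than one turn gets the `previous_response_not_found` error rather than a stale result.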

Testing

  • 7 integration tests (tests/websocket_test.rs): basic response, event sequence validation, JSON-not-SSE verification, multi-turn sequential, previous_response_id caching/errors, invalid message handling
  • Python example (examples/openai_websocket_client.py) using the official OpenAI SDK

Risk

  • Low
  • WebSocket is additive; existing HTTP endpoints are unchanged

Checklist

  • Tests added or updated
  • Backward compatibility considered
  • Specs updated (specs/responses-api.md, specs/api-endpoints.md)
  • Docs updated (docs/api.md, examples/README.md)

https://claude.ai/code/session_017no3LbHZSHrjVi7xqopZpZ

Add foundational types and infrastructure for WebSocket transport:
- Enable axum "ws" feature in Cargo.toml
- Add WebSocket client/server message types (openai::websocket)
- Add WebSocket stats tracking (connections, requests)
- Extract shared generate_responses_result() for HTTP and WS reuse

Add WebSocket transport for the Responses API endpoint, mirroring
OpenAI's WebSocket mode for persistent connections and multi-turn
agentic workflows.

Features:
- WebSocket upgrade on GET /openai/v1/responses
- Client sends response.create events, server streams JSON frames
- Connection-local cache for previous_response_id continuations
- Sequential execution (one in-flight response per connection)
- 60-minute connection timeout
- generate=false warmup support
- Error events: previous_response_not_found, connection_limit_reached
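The 60-minute connection budget can be sketched with a deadline loop (a hedged sketch in Python, not the Rust handler; `receive_event` and `handle_event` are hypothetical placeholders for the real frame I/O):

```python
import asyncio

CONNECTION_TIMEOUT_SECS = 60 * 60  # 60-minute connection timeout

async def connection_loop(receive_event, handle_event,
                          timeout_secs=CONNECTION_TIMEOUT_SECS):
    """Serve one WebSocket connection until the total budget elapses."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_secs
    while True:
        remaining = deadline - loop.time()
        if remaining <= 0:
            return "timeout"
        try:
            event = await asyncio.wait_for(receive_event(), timeout=remaining)
        except asyncio.TimeoutError:
            return "timeout"
        if event is None:  # client closed the connection
            return "closed"
        await handle_event(event)  # sequential: one in-flight response
```

Handling events inline (awaiting `handle_event` before the next receive) is what enforces the one-in-flight-response-per-connection rule.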

Implementation:
- New ws_handler module with WebSocket connection lifecycle
- Refactored response generation into shared function (HTTP + WS)
- WebSocket stats tracking (connections, requests)
- Exposed build_router() for integration testing
- 7 WebSocket integration tests
- Python smoke test example using websocket-client

…t format

- Rewrite examples/websocket_client.py to use the official OpenAI Python
  SDK (openai[realtime]>=2.22.0) with client.responses.connect() instead
  of the raw websocket-client library
- Support both flat and nested response.create client event formats:
  flat: {"type": "response.create", "model": "gpt-5", "input": "Hello"}
  nested: {"type": "response.create", "response": {"model": "gpt-5", ...}}
- Update server error events to use the flat format expected by the SDK:
  {"type": "error", "code": "...", "message": "...", "param": null}
- Update integration tests for new error event format
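Normalizing the two client-event shapes can be sketched like this (illustrative Python; the server-side deserializer is Rust):

```python
def normalize_response_create(event):
    """Accept both the flat (SDK default) and nested response.create
    shapes and return one canonical dict of response parameters."""
    if event.get("type") != "response.create":
        raise ValueError("expected a response.create event")
    if "response" in event:                        # nested form
        return dict(event["response"])
    return {k: v for k, v in event.items() if k != "type"}  # flat form
```

Both shapes collapse to the same parameter dict, so the rest of the pipeline never needs to know which one the client sent.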

…content events

The OpenAI SDK requires sequence_number on every server event, not just
delta events. Also adds item_id to content/text events for correlating
events to their parent output item.

Changes:
- ResponsesStreamEvent: all helpers now take a seq parameter
- Stream builder: global monotonically increasing sequence_number
  across all events (0, 1, 2, ... N)
- content_part.added, output_text.delta, output_text.done,
  content_part.done, and all reasoning_summary events now include
  item_id field
- Error SSE events now use the flat format matching the WebSocket
  error format (code, message, param, sequence_number)

Verified with OpenAI Python SDK: all events parse correctly with
continuous sequence_number and item_id fields.
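The global numbering rule above amounts to a single counter shared across every event in the stream (a sketch of the behavior, not the Rust stream builder):

```python
import itertools

def number_events(events):
    """Attach a monotonically increasing sequence_number (0, 1, 2, ...)
    to every server event, not just delta events."""
    counter = itertools.count()
    for event in events:
        yield {**event, "sequence_number": next(counter)}
```

Because the counter is global rather than per-event-type, SDK clients can detect dropped frames from any gap in the sequence.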

- Add `annotations: []` to OutputContentPart::OutputText for spec compliance
- Add `logprobs: []` to output_text.delta and output_text.done events
- Rename websocket_client.py to openai_websocket_client.py for clarity
- Update all docs/example references to new filename
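The spec-compliance fields land on the event shapes roughly like this (field set inferred from the commit notes above; helper names are hypothetical):

```python
def output_text_delta(item_id, delta, seq):
    """An output_text.delta event with the spec-required fields."""
    return {"type": "response.output_text.delta",
            "item_id": item_id,
            "sequence_number": seq,
            "delta": delta,
            "logprobs": []}      # empty but present, per spec

def output_text_part(text=""):
    """An output_text content part with the annotations field."""
    return {"type": "output_text", "text": text, "annotations": []}
```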

- Update R5.3 to clarify sequence_number is on all events
- Add R5.4 (logprobs field) and R5.5 (annotations field)
- Document flat and nested response.create formats in R10.2
- Fix non-requirements: previous_response_id is cached per WS connection
- Update api.md to show flat format (used by OpenAI SDK)

@chaliy chaliy merged commit 387f40e into main Feb 24, 2026
11 checks passed
@chaliy chaliy deleted the claude/websocket-mode-integration-BoD6T branch February 24, 2026 14:12
@chaliy chaliy mentioned this pull request Mar 20, 2026
chaliy added a commit that referenced this pull request Mar 20, 2026
## What
Prepare patch release v0.2.3 with version bump and changelog update.

## Why
Release new features and fixes since v0.2.2.

## How
- Bumped version in Cargo.toml from 0.2.2 to 0.2.3
- Updated CHANGELOG.md with v0.2.3 section including all changes since
v0.2.2

## Changelog excerpt

### Highlights
- WebSocket mode for Responses API streaming
- OpenAI thinking/reasoning emulation support
- Fixed repository URLs for crates.io listing

### What's Changed
* chore: routine maintenance - update deps and align specs (#30)
* fix: correct repository URLs for crates.io listing (#29)
* chore: add attribution settings and agent guidance for commits/PRs
(#28)
* feat: add /ship command for full shipping workflow (#27)
* feat(api): add WebSocket mode for Responses API (#26)
* feat(api): add OpenAI thinking/reasoning emulation (#25)

## Risk
- Low
- Standard version bump release

## Checklist
- [x] Tests added or updated
- [x] Backward compatibility considered

Co-authored-by: Claude <noreply@anthropic.com>
