
Add Gemini and OpenAI chat providers#40

Open
xPoleStarx wants to merge 1 commit into farzaa:main from xPoleStarx:feature/add-gemini-openai-chat

Conversation

@xPoleStarx

Summary

This PR adds first-class OpenAI and Gemini chat support alongside the existing Claude integration.

The main goal was to keep the macOS app's voice flow unchanged while making the chat layer provider-agnostic. The app can now route multimodal screenshot + transcript requests through the Cloudflare Worker to Claude, OpenAI, or Gemini using a shared request/response shape.
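A minimal sketch of what that shared, provider-agnostic request shape and model-to-provider routing could look like. The names (`ChatRequest`, `providerFor`) and field layout are illustrative assumptions, not the PR's actual identifiers:

```typescript
// Hypothetical shared chat request shape routed through the Worker's /chat route.
type Provider = "anthropic" | "openai" | "gemini";

interface ChatRequest {
  model: string;        // e.g. "claude-sonnet-4-6", "gpt-5.4", "gemini-2.5-flash"
  transcript: string;   // the voice transcript text
  imageBase64?: string; // optional screenshot payload
}

// Route a model id to its provider backend; prefixes are an assumption
// based on the model names listed in this PR.
function providerFor(model: string): Provider {
  if (model.startsWith("claude-")) return "anthropic";
  if (model.startsWith("gpt-")) return "openai";
  if (model.startsWith("gemini-")) return "gemini";
  throw new Error(`Unknown model: ${model}`);
}
```

Keeping the branching in one routing function like this is what lets the macOS client stay unchanged while the Worker fans out to different provider APIs.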

What Changed

  • Added a provider-agnostic chat client on the macOS side
  • Expanded the model picker to support:
    • claude-sonnet-4-6
    • claude-opus-4-6
    • gpt-5.4
    • gemini-2.5-flash
  • Updated CompanionManager to use a model-agnostic chat pipeline instead of a Claude-specific one
  • Reworked the Cloudflare Worker /chat route to support:
    • Anthropic Messages API
    • OpenAI Responses API
    • Gemini generateContent
  • Normalized all provider responses into the SSE shape already expected by the macOS client
  • Updated the standalone OpenAI helper to use the current Responses API
  • Updated docs for the new worker secrets and multi-provider chat architecture
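To illustrate the normalization step, here is a hedged sketch (not the PR's actual code) of wrapping a provider's text output in the Anthropic-style SSE events the macOS client already parses; the helper names are invented for this example:

```typescript
// Serialize one SSE event with a JSON data payload.
function toSSE(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Normalize a provider's response text into the Anthropic-style event
// sequence: a content_block_delta followed by a message_stop.
function normalizeToSSE(text: string): string {
  return (
    toSSE("content_block_delta", {
      type: "content_block_delta",
      index: 0,
      delta: { type: "text_delta", text },
    }) + toSSE("message_stop", { type: "message_stop" })
  );
}
```

Because every provider's output is funneled into this one shape, the client-side SSE parser never needs to know which backend answered.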

Why

The existing app flow was tightly coupled to Claude. This change keeps the user-facing interaction model the same while allowing Clicky to switch between major multimodal providers with minimal client-side branching.

That makes it easier to compare model behavior, keep provider flexibility, and continue evolving the app without hard-coding one vendor into the primary chat path.

Notes

New worker secrets required:

  • OPENAI_API_KEY
  • GEMINI_API_KEY

Existing secrets still required:

  • ANTHROPIC_API_KEY
  • ASSEMBLYAI_API_KEY
  • ELEVENLABS_API_KEY
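One way the Worker could surface a missing secret early, sketched as an illustrative `Env` binding and helper (names are assumptions; set each secret with `wrangler secret put <NAME>`):

```typescript
// Cloudflare Worker Env binding covering the secrets listed above.
interface Env {
  ANTHROPIC_API_KEY: string;
  OPENAI_API_KEY: string;
  GEMINI_API_KEY: string;
  ASSEMBLYAI_API_KEY: string;
  ELEVENLABS_API_KEY: string;
}

// Return the names of any unset secrets so the Worker can fail fast
// with a clear error instead of a cryptic provider 401.
function missingSecrets(env: Partial<Env>): string[] {
  const required: (keyof Env)[] = [
    "ANTHROPIC_API_KEY",
    "OPENAI_API_KEY",
    "GEMINI_API_KEY",
    "ASSEMBLYAI_API_KEY",
    "ELEVENLABS_API_KEY",
  ];
  return required.filter((k) => !env[k]);
}
```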

Testing

I did not run xcodebuild because the repo instructions explicitly say not to run it from the terminal due to TCC permission resets.

Static validation performed:

  • reviewed integration points in the macOS client
  • updated the worker routing logic end-to-end
  • checked staged changes and commit integrity
  • ran git diff --check

Follow-ups

Potential follow-up work, if desired:

  • add provider-specific fallback/error messaging in the UI
  • move more legacy direct-provider helpers behind the worker
  • add a lightweight smoke-test path for worker chat providers
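The smoke-test follow-up could be as small as one function that posts to the `/chat` route and checks for the normalized SSE marker. Everything here is a hypothetical sketch: the URL, payload fields, and function name are placeholders, and `fetch` is injected so the check can run against a stub:

```typescript
// Minimal smoke test for a worker chat provider: POST a tiny request and
// verify the response body contains the normalized SSE delta event.
async function smokeTestChat(
  fetchFn: typeof fetch,
  baseUrl: string,
  model: string,
): Promise<boolean> {
  const res = await fetchFn(`${baseUrl}/chat`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ model, transcript: "ping" }),
  });
  if (!res.ok) return false;
  const body = await res.text();
  return body.includes("content_block_delta");
}
```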

@qodo-ai-reviewer

The Worker buffers the full provider response and emits a single SSE content_block_delta event, so the macOS streaming parser receives only one chunk at the end. This removes progressive delivery and can increase perceived latency (and defeats SSE streaming semantics) compared to a real token/chunk stream.

Severity: remediation recommended | Category: reliability

How to fix: Stream provider output as SSE

Agent prompt to fix - you can give this to your LLM of choice:

Issue description

The /chat route emits a single SSE chunk containing the entire model response, which eliminates true streaming behavior.

Issue Context

CompanionChatAPI.analyzeImageStreaming() is implemented as an SSE line parser that can handle multiple incremental content_block_delta events.

Fix Focus Areas

  • worker/src/index.ts[74-90]
  • worker/src/index.ts[380-406]

What to change

  • Implement per-provider streaming adapters (Anthropic/OpenAI/Gemini) that read each provider’s streaming response and re-emit incremental normalized SSE content_block_delta events.
  • Alternatively, if intentional, document that streaming is “single-chunk SSE” and consider renaming client-side APIs or adjusting UX/timeouts accordingly.
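A sketch of the first option: re-emit each incremental provider chunk as a normalized SSE delta instead of buffering. Extracting text chunks from each provider's wire format is provider-specific, so this assumes that step has already produced an async iterator of plain text; all names here are illustrative:

```typescript
// Serialize one normalized content_block_delta SSE event for a text chunk.
function sseDelta(text: string): string {
  return `event: content_block_delta\ndata: ${JSON.stringify({
    type: "content_block_delta",
    index: 0,
    delta: { type: "text_delta", text },
  })}\n\n`;
}

// Re-emit incremental provider chunks as normalized SSE events, closing
// with a message_stop, rather than buffering the whole response.
async function* toNormalizedSSE(
  chunks: AsyncIterable<string>,
): AsyncGenerator<string> {
  for await (const text of chunks) {
    if (text.length > 0) yield sseDelta(text);
  }
  yield `event: message_stop\ndata: ${JSON.stringify({ type: "message_stop" })}\n\n`;
}
```

Piping a generator like this into the `/chat` response body would preserve the client's existing SSE line parser while restoring progressive delivery.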

Found by Qodo code review.

