
feat: 0.5.0 Capture release — dictation, MCP, personalities #544

Open

jamiepine wants to merge 33 commits into main from feat/capture

Conversation

jamiepine (Owner) commented Apr 24, 2026

Summary

Screenshot 2026-04-23 at 7 26 52 PM

The 0.5.0 Capture release. Voicebox stops being just a voice-cloning studio and becomes a full AI voice studio: your voice goes into your computer through a global hotkey, and any agent's voice comes back out through a voice you own. Three pillars on top of one shared local LLM:

  • Dictation — global hotkey → Whisper → Qwen3 refine → synthetic paste into the focused app, with a floating on-screen pill across recording / transcribing / refining / done.
  • MCP server — voicebox.speak, voicebox.transcribe, voicebox.list_captures, voicebox.list_profiles exposed to Claude Code, Cursor, Cline, Spacebot, etc. via Streamable HTTP plus a stdio shim binary.
  • Personalities — voice profiles get an optional character prompt and two doors: a Compose button (fresh in-character utterance) and a personality: bool on /generate and voicebox.speak that rewrites text through the profile's LLM before TTS.

Everything still runs on-device. One Qwen3 instance powers refinement and personality, so there's one local LLM, not two.

What's in here

Dictation pipeline (Rust + app webview + backend)

  • Global hotkey via keytap — our own cross-platform observe-only tap (CGEventTap on macOS, SetWindowsHookEx on Windows, evdev on Linux). Lazy-spawned only when the user enables it, so the macOS Input Monitoring TCC prompt fires from an explicit gesture instead of app startup. The chord state machine (Momentary push-to-talk + Toggle hands-free + PTT→Toggle upgrade) lives in keytap's ChordMatcher; voicebox is a thin ChordEvent → Tauri event dispatcher.
  • Two modes, two chords. Push-to-talk and toggle, configurable in ChordPicker. Holding PTT then tapping Space upgrades to hands-free mid-recording without a gap. macOS defaults (right-Cmd + right-Option) deliberately avoid Cmd+Opt+I, Cmd+Opt+Esc, Cmd+Opt+Space. Windows defaults (right-Ctrl + right-Shift) route around AltGr collisions on DE/FR/ES layouts.
  • Auto-paste (tauri/src-tauri/src/{clipboard,synthetic_keys,focus_capture,keyboard_layout}.rs): snapshot focused PID at chord-start → activate (cooperative on macOS 14+) → 120ms settle → save full clipboard (every UTI/format) → write text → post Cmd+V/Ctrl+V at HID level with layout-resolved V keycode → 400ms paste-consume → conditional restore. Short-circuits if the focused app is Voicebox itself or Accessibility is untrusted.
  • Floating dictate window (Tauri webview, 420×64, transparent, always-on-top) hosts the pill. Hidden by parking off-screen + click-through to prevent stale invisible windows from stealing clicks.
  • Six-gate readiness (mic, Input Monitoring, Accessibility, hotkey toggle, Whisper model, LLM model). useDictationReadiness + DictationReadinessChecklist surface each unmet gate with a one-click action; useChordSync auto-arms the chord the moment the last gate flips green. Stops the "stuck pill" failure where the chord fires but has nowhere to land.
  • Recording state machine (useCaptureRecordingSession, 328 LOC): recording → transcribing → refining → completed → rest → hidden, with calibrated dwell/fade timers and ref-stable closures so late-arriving mutations still see the right flags.
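The real session hook is a React/TypeScript state machine; a minimal Python sketch of the same idea, a table of legal transitions that rejects out-of-order updates from late-arriving async results, looks like this (the skip-refine and cancel edges are my assumptions, not read from the hook):

```python
from enum import Enum, auto

class PillState(Enum):
    RECORDING = auto()
    TRANSCRIBING = auto()
    REFINING = auto()
    COMPLETED = auto()
    REST = auto()
    HIDDEN = auto()

# Legal next states for each phase. Assumed extras: auto-refine may be
# skipped (TRANSCRIBING -> COMPLETED) and a cancelled recording goes
# straight to HIDDEN.
TRANSITIONS = {
    PillState.RECORDING:    {PillState.TRANSCRIBING, PillState.HIDDEN},
    PillState.TRANSCRIBING: {PillState.REFINING, PillState.COMPLETED},
    PillState.REFINING:     {PillState.COMPLETED},
    PillState.COMPLETED:    {PillState.REST},
    PillState.REST:         {PillState.HIDDEN},
    PillState.HIDDEN:       {PillState.RECORDING},
}

def advance(current: PillState, target: PillState) -> PillState:
    """Move to `target` only if it is a legal next state."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

A late mutation that tries to jump, say, RECORDING → REFINING is rejected instead of corrupting the pill's display state.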

MCP server

  • FastMCP mounted at /mcp (Streamable HTTP). app.py migrated from @app.on_event to lifespan= via compose_lifespan, so FastMCP's session manager nests inside FastAPI's startup/shutdown with correct LIFO teardown (MCP drains in-flight sessions before models unload).
  • Client identity & profile precedence: ClientIdMiddleware reads X-Voicebox-Client-Id into a ContextVar; MCPClientBinding table maps client → profile + default_personality bool; resolve_profile() picks explicit arg → per-client binding → global default. last_seen_at stamped on both /mcp (tool calls) and /speak (REST wrapper) so REST callers show up in the Settings UI.
  • Speak SSE: /events/speak broadcasts speak-start / speak-end. Rust speak_monitor.rs subscribes via reqwest streaming body (CRLF-aware parser, 45s idle timeout catches zombie streams, 500ms→30s escalating backoff) and surfaces the pill window when agent-initiated speech actually starts.
  • Stdio shim: separate voicebox-mcp binary (backend/mcp_shim/, ~18MB, built via python build_binary.py --shim) bridges stdio-only clients to the HTTP endpoint. Wired into tauri.conf.json externalBin.
  • Settings UI: MCPPage.tsx with HTTP / stdio / claude-mcp-add snippets, default-voice picker, per-client bindings table.
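The profile precedence above (explicit arg, then per-client binding, then global default) is simple enough to pin down in a few lines; a hedged sketch with a hypothetical signature (the real resolve_profile() reads the DB, not a dict):

```python
from typing import Optional

def resolve_profile(
    explicit: Optional[str],
    bindings: dict[str, str],      # client_id -> bound profile id
    client_id: Optional[str],
    global_default: Optional[str],
) -> Optional[str]:
    """Precedence: explicit arg > per-client binding > global default."""
    if explicit:
        return explicit
    if client_id and client_id in bindings:
        return bindings[client_id]
    return global_default
```

So a voicebox.speak call with no profile argument from a bound client gets that client's voice, and an unbound caller falls back to the default playback voice.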

Personalities

  • VoiceProfile.personality (free-form text, ≤2000 chars) describes what the voice says and how, orthogonal to how it sounds (the existing preset/cloning metadata).
  • Two surfaces only:
    • Compose — POST /profiles/{id}/compose produces a fresh in-character utterance the user can edit (Dices button in FloatingGenerateBox, temp 0.9).
    • Personality rewrite — personality: true on /generate (and voicebox.speak) routes text through personality.rewrite_as_profile() before TTS, marks the row source = "personality_speak" (Wand2 toggle in FloatingGenerateBox, temp 0.3).
  • Same Qwen3 instance powers refinement and personality; no extra model.

Refinement hardening

  • Deterministic loop collapse before the LLM sees the transcript. Two passes:
    • Word-level: token repeated ≥6 times consecutively (with punctuation/casing normalised) gets dropped. Handles "URL, URL, URL, URL, URL, URL".
    • Character-level: any 2–60 char substring repeating ≥6 times in place gets dropped. Handles multi-word English loops ("thanks for watching" × 6) that the word pass misses, and CJK / Japanese loops ("謝謝觀看" × 7, "ご視聴ありがとうございました" × 6) where text.split() returns one unsplit token.
  • RefinementFlags (smart_cleanup, self_correction, preserve_technical) snapshotted per capture so re-refine is reproducible.
  • Eval harnesses: backend/tests/test_refinement_samples.py (10 transcripts, scores prompt leaks / answer leaks / loop echoes / question-mark survival / substring preservation across every bundled model size) and test_personality_samples.py. Plus test_refinement_collapse.py (17 deterministic unit tests pinning both passes against regression).
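The two collapse passes can be sketched in a few lines of Python. This is a hedged approximation, not the shipped code: in particular, the character pass here allows optional whitespace between repeats so a final copy without its trailing separator still matches, which may differ from the real "immediately after itself" implementation.

```python
import re

def _norm(tok: str) -> str:
    # Normalise punctuation/casing so "URL," and "URL" count as one token.
    return re.sub(r"\W+", "", tok).lower()

def collapse_word_runs(text: str, min_run: int = 6) -> str:
    """Pass 1: a token repeated >= min_run times consecutively collapses
    to one copy; shorter runs (rhetorical repetition) survive."""
    tokens = text.split()
    out, i = [], 0
    while i < len(tokens):
        j = i + 1
        while j < len(tokens) and _norm(tokens[i]) and _norm(tokens[j]) == _norm(tokens[i]):
            j += 1
        out.append(tokens[i])
        i = j if (j - i) >= min_run else i + 1
    return " ".join(out)

def collapse_char_runs(text: str, min_run: int = 6) -> str:
    """Pass 2: any 2-60 char substring repeated >= min_run times in place
    collapses to one copy. The 2-char floor keeps "wooooooow" intact;
    this pass catches multi-word and CJK loops that pass 1 misses."""
    pattern = re.compile(r"(.{2,60}?)(?:\s*\1){%d,}" % (min_run - 1), re.DOTALL)
    return pattern.sub(r"\1", text)

def collapse_loops(text: str, min_run: int = 6) -> str:
    return collapse_char_runs(collapse_word_runs(text, min_run), min_run)
```

Pass 1 kills "URL, URL, URL, URL, URL, URL" while leaving a five-copy "no, no, no, no, no" alone; pass 2 handles "thanks for watching" × 6 and "謝謝觀看" × 7, where split() never produces consecutive identical tokens.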

Capture persistence + UI

  • New Capture model + routes/captures.py (POST /captures upload+STT+optional refine, GET /captures list, POST /captures/{id}/refine).
  • CapturesTab.tsx (834 LOC) — two-pane settings UI: searchable list ↔ detail with a scrubbable WaveSurfer audio player, Raw/Refined transcript toggle, "Play as" voice, Refine / Copy / Delete. Listens for capture:created / capture:updated events from the dictate window so new captures appear instantly.
  • CapturesPage.tsx (607 LOC) — settings page (different from the tab): hotkey toggle, dual ChordPicker, model pickers, auto-paste / auto-refine toggles, inline Accessibility + Input Monitoring notices, live pill preview.
  • useUIStore now persists selectedProfileId via zustand/middleware.persist so reopening the app keeps the last-selected profile.

Database

New tables: Capture, MCPClientBinding, singleton CaptureSettings (chord JSON, model picks, refinement flags, default voice), singleton GenerationSettings. VoiceProfile.personality column. Generation.source column. Idempotent column-existence migrations in backend/database/migrations.py (single-user desktop, no Alembic). MCPClientBinding's legacy default_intent column is dropped on startup; SQLite < 3.35 (Ubuntu 20.04, Debian 11) falls back to leaving the unused column in place rather than crashing — SQLAlchemy only maps declared columns so the stray does no work.
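The idempotent column-existence pattern is small enough to show whole; a sketch with hypothetical names (the real checks live in backend/database/migrations.py):

```python
import sqlite3

def ensure_column(conn: sqlite3.Connection, table: str, column: str, ddl: str) -> None:
    """Add a column only if it is missing: inspect the live schema via
    PRAGMA table_info, then ALTER TABLE. Safe to run on every startup."""
    cols = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column not in cols:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl}")
```

Running it twice is a no-op, which is the whole point: no migration version table, no Alembic, just a startup check per column.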

i18n

app/src/i18n/locales/{en,ja,zh-CN,zh-TW}/translation.json each grow ~375 lines. CapturesTab, CapturesPage, MCPPage, ChordPicker, both gates, the readiness checklist, ProfileCard / ProfileForm, FloatingGenerateBox, GenerationPage sidebar, Sidebar — all wired through t().

Landing + docs

New /capture route (CaptureHero, CaptureSection, CapturesMockup, AgentIntegration, SupportedModels, Personalities). New overview docs: captures, dictation, mcp-server, voice-personalities. Design notes in docs/plans/MCP_SERVER.md and docs/plans/VOICE_IO.md.

Notable design decisions

  • One Qwen3, not two. Refinement and personality share the same loaded model; temperature is the per-task knob (0.3 rewrite / 0.7 respond / 0.9 compose).
  • Loop-collapse before LLM. Small models can't echo what they never see. Dual-pass (word + character) covers every script we support.
  • personality: bool over a tri-state. Earlier iterations exposed intent: "respond" | "rewrite" | "compose" on voicebox.speak. Collapsed to a boolean — respond is dropped (callers that want a reply should call an LLM), compose is a UI-only standalone button (POST /profiles/{id}/compose), rewrite is the only meaning of personality: true. MCPClientBinding.default_intent migrated to default_personality: bool (legacy column dropped when SQLite supports it).
  • Lifespan composition via compose_lifespan. AsyncExitStack enters factories in order (Voicebox startup → MCP startup) and exits LIFO (MCP teardown → _run_shutdown). MCP cancels in-flight sessions before models unload, so an agent mid-voicebox.speak at shutdown gets a clean session-cancelled error instead of crashing on "model unloaded."
  • ContextVar for client_id. Avoids plumbing the request through every service; middleware sets it, tool handlers read it.
  • Idempotent migrations, no Alembic. Single-user desktop app; column-existence checks at startup are fast and reliable. Capability checks (_supports_drop_column) guard DDL that needs a newer SQLite.
  • Extract the chord engine into keytap. Upstream rdev hasn't shipped a release since 2023-06 and its macOS 14 main-thread crash only lives on an unreleased branch. Rather than maintain a git-pinned fork, we wrote keytap from the spec: observe-only cross-platform taps, left/right modifier fidelity, clean shutdown via Drop, Sonoma-safe by design, and a ChordMatcher with both Momentary and Toggle modes. Voicebox's PTT→hands-free upgrade falls out of longest-match resolution plus Toggle sticky-end semantics — voicebox no longer carries 80+ LOC of bespoke chord state machine, and supply-chain risk drops to a regular crates.io dep with semver.
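The ContextVar decision above is a stock Python pattern and easy to demonstrate; a minimal sketch (handler and middleware names here are illustrative, not the real FastAPI wiring):

```python
from contextvars import ContextVar
from typing import Optional

# Middleware sets the id once per request; any tool handler deeper in the
# call stack reads it without the request being threaded through.
client_id_var: ContextVar[Optional[str]] = ContextVar("client_id", default=None)

def middleware(headers: dict) -> None:
    client_id_var.set(headers.get("X-Voicebox-Client-Id"))

def tool_handler() -> str:
    return client_id_var.get() or "anonymous"
```

Because ContextVar is task-local, concurrent MCP sessions each see their own client id rather than whichever request set the variable last.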

Follow-up / known limitations

Not blocking merge, but worth noting:

  • Windows synthetic paste still hardcodes VK_V. SendInput delivers wParam = VK_V to the target regardless of the active layout (the same reason AutoHotkey's Send "^v" works on Dvorak Windows), so empirically this is fine — but if a Windows-on-Dvorak user reports breakage, VkKeyScanExW(L'V', layout) is the parallel fix (~30 lines).
  • speak_monitor reconnects silently after the idle timeout / backoff, so a down backend just leaves the pill offline. No user-facing "backend unreachable" toast; that belongs in a separate server-health surface, not this loop.

Test plan

  • Dictation, macOS — fresh install: enable hotkey, walk all six readiness gates, hold the chord in TextEdit / VS Code / Slack, transcript pastes into the field that had focus at chord-start (not wherever focus drifted).
  • Dictation, macOS, Dvorak — switch Input Source to Dvorak, dictate, paste should land (layout-aware V keycode resolved via TIS + UCKeyTranslate).
  • Dictation, macOS Sonoma+ — cooperative activation (yieldActivationToApplication:activate) brings the target app forward; paste lands in the original window even when Voicebox wasn't frontmost.
  • Dictation, Windows — same base flow; AttachThreadInput foreground-lock dance lands the paste in the original window. BOOL import from windows::core compiles against windows-rs 0.62.
  • PTT → toggle upgrade — hold PTT chord, tap Space, release PTT — recording continues, stops on next press.
  • Auto-paste denied — revoke Accessibility, dictate, transcript still lands in Captures tab; "missing permission" notice shows.
  • Clipboard race — copy text X, dictate, during the paste window Cmd+C something else in the target app → after paste, clipboard holds the newer Cmd+C content, not X. No regression on the quiet case (no outside write → X restored).
  • Loop collapse — feed a Whisper hallucination loop ("URL URL URL…" ×30 OR "thanks for watching" ×6 OR "謝謝觀看" ×7), refine with 0.6B, LLM doesn't echo it. Rhetorical "no, no, no, no, no" (5) preserved.
  • MCP HTTP — claude-mcp-add snippet from Settings → MCP, call voicebox.speak({text, profile}), pill surfaces with agent speech. last_seen_at advances on the binding row.
  • MCP stdio shim — wire stdio config in a stdio-only client, same call path works through the voicebox-mcp binary.
  • REST /speak — curl -H "X-Voicebox-Client-Id: test", binding shows up in the Settings UI as last-seen.
  • Per-client binding — bind Claude Code to "Morgan", call voicebox.speak({text}) with no profile, hear Morgan; override with profile= and hear that voice instead.
  • Personality rewrite — profile with personality, voicebox.speak({text, personality: true}), output is in character. personality: false (or unset with no binding default) is plain TTS.
  • Compose button — profile with personality, click Dices in FloatingGenerateBox, fresh in-character utterance appears in the input.
  • MCP migration — start app on a DB that has the old default_intent column, verify it's dropped (or left in place on SQLite < 3.35 with a warning) and default_personality exists with default false.
  • Lifespan shutdown — with an MCP request mid-generate, Ctrl+C the dev backend; clean shutdown (StreamableHTTP teardown logs before "Voicebox server shutting down"), no "model unloaded" crashes in logs.
  • Speak-SSE dead stream — pause/freeze the backend after connection opens; within ~45s the Rust speak_monitor logs the idle timeout and reconnects with escalating backoff.
  • i18n smoke — switch UI language to ja / zh-CN / zh-TW, walk through Captures + Captures settings + MCP + a profile with personality, no untranslated keys.
  • Capture audio player — open a capture in the detail view, WaveSurfer renders the waveform, scrubbing works, play/pause toggles.
  • Profile persistence — select a profile, restart the app, last-selected profile still selected.
  • Refinement eval harnesses — pytest backend/tests/test_refinement_samples.py and test_personality_samples.py produce reasonable output across bundled model sizes. pytest backend/tests/test_refinement_collapse.py backend/tests/test_client_id_middleware.py — both pass (34 tests).


Summary by CodeRabbit

  • New Features

    • Full Captures workflow: global dictation hotkey/pill, timed recording, auto-paste, upload, inline audio player, refine/retranscribe, play-as voice, retention and compose/rewrite “personality” support.
    • Local MCP server and REST /speak with per-client bindings and speaking-state events; LLM-backed personality compose/rewrite.
  • UI

    • Captures & MCP settings pages, chord picker, readiness checklist, accessibility/input notices, compose/persona controls, capture pill and toolbar.
  • Documentation

    • Extensive guides for dictation, captures, personalities, MCP server, migration.
  • Tests

    • New tests and evaluation scripts for personality and refinement.

jamiepine and others added 22 commits April 22, 2026 18:49
Ships the Capture release end to end. Global-hotkey dictation with
synthetic paste into the focused app on macOS and Windows, an on-screen
pill across recording / transcribing / refining, customizable push-to-
talk and toggle chords, and an accessibility-permission prompt scoped to
Settings → Captures with inline re-check feedback.

Voice profiles gain optional personalities that power compose / rewrite /
respond actions via a local Qwen3 LLM — shared with refinement, so there
is one local LLM in the app, not two.

Refinement hardened with deterministic Whisper-loop collapse before the
LLM sees the transcript, per-capture flag snapshots for re-runs, and a
ten-transcript evaluation harness across every bundled refinement size.

Version bump 0.4.5 → 0.5.0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mounts FastMCP at /mcp (Streamable HTTP) so Claude Code, Cursor,
Windsurf, and the VS Code MCP extensions can call voicebox.speak,
voicebox.transcribe, voicebox.list_captures, and voicebox.list_profiles
against the running Voicebox server.

Backend
- new backend/mcp_server package (tools, middleware, profile resolve,
  pub/sub events); named mcp_server to avoid shadowing the installed mcp
  PyPI package FastMCP imports internally
- app.py migrated from @app.on_event to lifespan= so FastMCP's session
  manager cohabits with Voicebox's startup/shutdown
- new MCPClientBinding table + /mcp/bindings CRUD; ClientIdMiddleware
  reads X-Voicebox-Client-Id into a ContextVar and stamps last_seen_at
- profile resolution precedence: explicit -> per-client binding ->
  capture_settings.default_playback_voice_id
- POST /speak REST wrapper for non-MCP callers (shell, ACP, A2A)
- GET /events/speak SSE broadcasts speak-start / speak-end so the pill
  surfaces agent-initiated speech
- backend/mcp_shim proxy (plain httpx) for stdio-only MCP clients
- PyInstaller spec updates + new --shim build target (~18 MB)

Frontend
- Settings -> MCP page with HTTP / stdio / claude-mcp-add copy snippets,
  default voice picker, per-client bindings table, connection status
- useMCPBindings, useSpeakEvents hooks
- CapturePill gains 'speaking' state; DictateWindow subscribes to SSE
  and emits dictate:show so the Rust side surfaces the pill window

Native
- tauri.conf.json externalBin now includes voicebox-mcp
- show_dictate_window helper + dictate:show listener in main.rs
- (also in this commit: InputMonitoringGate UX, hotkey_monitor tweaks,
  landing footer/navbar updates, new overview docs for captures /
  dictation / mcp-server / voice-personalities)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The pill window now surfaces for agent-initiated speech without main-window
involvement. Rust subscribes to /events/speak via a tokio task + reqwest
streaming body (speak_monitor.rs), shows the pill, and forwards events to
the dictate webview over Tauri's event bus. The pill plays audio via a
plain HTMLAudioElement and emits dictate:hide when playback ends. The
pill stays hidden through the ~1 s generation wait and only surfaces when
audio actually starts, with the counter armed at that moment.

Fixes a shared-dict mutation in mcp_server/events.publish() that caused
the second subscriber (Rust speak_monitor) to receive `event: message`
instead of named speak-start/speak-end frames. Also teaches the speak_monitor
parser to handle CRLF framing (sse-starlette default). Main-window
AudioPlayer now skips autoplay for source in {mcp, rest} to avoid
double-play when both windows are alive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stops the "stuck pill" failure where pressing the chord with missing
STT/LLM models triggers a recording that has nowhere to land. The
hotkey now stays disarmed until every gate (models downloaded, Input
Monitoring + Accessibility granted) is green; the empty-state checklist
in CapturesTab surfaces each unmet gate with a one-click action and
auto-arms the chord once everything turns green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Collapse intent tri-state (respond/rewrite/compose) to `personality: bool` on /generate, /speak, and voicebox.speak. Drop respond entirely; keep compose as a standalone button via /profiles/{id}/compose. Remove /rewrite, /respond, and /speak profile endpoints.
- FloatingGenerateBox: Wand2 persona toggle + Dices compose button appear when the selected profile has a personality. ProfileCard badges Wand2 alongside the effects Sparkles.
- MCP bindings: default_intent column → default_personality: bool. Migration drops the legacy column.
- i18n: en / ja / zh-CN / zh-TW translation files filled out and wired through the capture, server, and profile UI.

```ts
voicebox.speak({
  text: "Deploy complete.",
  profile: "Morgan",
  personality: true, // rewrite through the profile's personality LLM
});
```
BOOL moved from Win32::Foundation to windows::core in 0.62.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
macOS apps match Cmd+V against the layout-translated character via NSMenu
key equivalents, so posting kVK_ANSI_V (= 9, the QWERTY V position) on
Dvorak produces Cmd+. and never triggers Paste. New keyboard_layout
module resolves the active layout's V keycode via
TISCopyCurrentKeyboardLayoutInputSource + UCKeyTranslate, caches it in an
AtomicU16, and refreshes on kTISNotifySelectedKeyboardInputSourceChanged.
All TIS calls run on the main thread (init from Tauri setup; observer
callback delivered to the main runloop); synthetic_keys::send_paste
reads the cached value once per paste. Falls back to kVK_ANSI_V when
resolution fails or the active input source carries no Unicode key
layout data.

Windows is intentionally left on hardcoded VK_V — SendInput delivers
WM_KEYDOWN with wParam = VK_V to the target regardless of the active
layout, which is why `Send "^v"` works for AutoHotkey on Dvorak Windows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… 14+

macOS 14 deprecated NSRunningApplication.activateWithOptions: in favour
of a cooperative-activation pattern: the caller first yields activation
rights to the target, then the target activate()s against the tightened
Sonoma foreground rules. Without the yield, activate() on 14+ sometimes
silently fails or only bounces the dock icon — the exact "paste lands in
the wrong app" symptom we were previously one API break away from.

activate_pid now discovers the 14+ selector via respondsToSelector: and
branches: on 14+ it yieldActivationToApplication:'s from
NSRunningApplication.current then calls -activate on the target; on
11–13 it stays on -activateWithOptions: (still the only option). Both
branches propagate the BOOL return — if activation is refused we error
out before clobbering the clipboard instead of silently proceeding.

The respondsToSelector: result is cached in a OnceLock so the probe
isn't repeated on every paste.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… failure

Two bugs in paste_final_text's clipboard handling:

1. Restore was unconditional. If the user ⌘C'd in the target app during
   the 400 ms paste-consume window — or a clipboard history tool (Paste,
   Pastebot, Maccy) or Universal Clipboard sync snapshotted our staged
   text — the blind restore overwrote their newer content with the
   pre-paste snapshot, silently losing user data.

2. send_paste's errors were propagated with ? before the restore, so a
   CGEventPost / SendInput failure left the user's clipboard stuck on
   the transcript.

Fix folds both into one pattern: capture the post-write change count,
re-read it after paste-consume, restore only when they match (plus treat
a change-count read failure as "unknown, don't overwrite"). Isolate
send_paste's error so the restore runs regardless of paste success, then
propagate the paste error after.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
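The fixed flow is easier to see with the platform calls abstracted away. A Python sketch with injected operations (all names hypothetical; the real code is Rust against NSPasteboard / Win32):

```python
from typing import Callable, Optional

def paste_with_safe_restore(
    write_text: Callable[[str], None],
    send_paste: Callable[[], None],
    change_count: Callable[[], Optional[int]],
    restore_snapshot: Callable[[], None],
    text: str,
) -> None:
    """Stage the transcript, paste, then restore the saved clipboard only
    when the change count is unchanged since our own write. Paste errors
    are isolated so the restore decision runs either way, then re-raised."""
    write_text(text)
    count_after_write = change_count()
    paste_error: Optional[Exception] = None
    try:
        send_paste()
    except Exception as exc:          # don't propagate yet
        paste_error = exc
    count_after_paste = change_count()
    # None means "couldn't read the change count" -> unknown, don't overwrite.
    if count_after_write is not None and count_after_paste == count_after_write:
        restore_snapshot()
    if paste_error is not None:
        raise paste_error
```

If anything else wrote to the clipboard in the window (a user ⌘C, a clipboard manager), the counts diverge and the stale snapshot is never restored over the newer content.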
Upstream Narsil/rdev has shipped no release since 2023-06 (crates.io
still serves 0.5.3), so the Sonoma main-thread fix we depend on — PR
#147, applied at hotkey_monitor.rs:184 — is only reachable via a git
pin. A pin to a third-party repo breaks the build whenever the remote
force-pushes, renames, or is taken down, and Cargo does not durably
cache git-dep archives the way it does crates.io tarballs.

Forking to jamiepine/rdev at the same SHA removes that failure mode
without changing crate behavior and gives us a place to cherry-pick
future OS-compatibility fixes on our own timeline. The SHA was verified
to exist on the fork before re-pinning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two reliability gaps in the /events/speak subscriber:

1. resp.chunk().await had no idle timeout. A backend that accepts the
   TCP connection but stops producing frames (deadlocked SSE endpoint,
   zombie process) would block the task forever without reconnecting.
   The pill window would never surface for agent-initiated speech and
   there would be nothing to log. Backend emits a `:ping` heartbeat
   every 15 s, so 45 s without any data is now treated as a dead
   stream — the task errors out and the reconnect loop takes over.

2. Flat 2 s backoff escalates nowhere. Logs fill with reconnect lines
   when the backend is down for minutes, and a backend that accepts +
   immediately closes connections (no data) spins the loop tightly.
   Backoff now escalates 500 ms → 30 s on unproductive rounds and
   resets only when at least one frame arrives (the connection was
   genuinely productive, not just accepted).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
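The backoff policy reduces to one small function. A sketch under one assumption: the commit gives the 500 ms floor and 30 s ceiling but not the growth factor, so doubling is my guess.

```python
def next_backoff(current_ms: int, productive: bool,
                 base_ms: int = 500, cap_ms: int = 30_000) -> int:
    """Reset to the base only when the last connection delivered at least
    one frame; otherwise escalate (doubling assumed), capped at 30 s."""
    if productive:
        return base_ms
    return min(current_ms * 2, cap_ms)
```

An accept-then-close backend counts as unproductive (zero frames), so the delay keeps growing instead of spinning the reconnect loop tightly.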
The word-level pass catches single-word Whisper loops ("URL URL URL…")
but misses two common hallucination patterns the PR had to claim as
"edge cases":

1. Multi-word English loops — "thanks for watching thanks for watching…"
   × 6 sails through because no two consecutive tokens are identical
   after text.split().
2. CJK loops — "謝謝觀看" × 7 sails through because text.split() returns
   a single unsplit token for the whole loop (no whitespace between
   characters).

Add a character-level second pass: a non-greedy regex finds any 2–60
char substring that repeats min_run+ times immediately after itself and
strips the run. The 2-char floor keeps emphasised single-letter runs
("wooooooow") intact. The 60-char ceiling covers every observed
Whisper tail hallucination ("Please like and subscribe to my
channel.", "Subtitles by the Amara.org community") while staying short
enough that coincidental long-phrase repetition in legitimate speech
doesn't hit the threshold. Whitespace normalisation only runs when the
pass actually stripped something, so untouched transcripts keep their
original spacing.

New test_refinement_collapse.py gives the pre-processor its first
deterministic unit-test coverage: 17 tests pinning the word-level
legacy behaviour plus the new multi-word English / CJK / Japanese /
emphasis-preservation cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SQLite gained ALTER TABLE … DROP COLUMN in 3.35 (Mar 2021). Production
PyInstaller builds bundle Python 3.12 which links to SQLite 3.40+ so
that path is always safe, but a dev running the backend directly on
Ubuntu 20.04 (3.31) or Debian 11 (3.34) would crash on first startup
trying to drop the legacy default_intent column.

Add _supports_drop_column(engine) — returns True on non-SQLite
dialects (Postgres / MySQL have supported DROP COLUMN for decades) and
gates on the runtime sqlite_version for SQLite. When unsupported, log a
warning and leave the unused column in place: SQLAlchemy only maps
declared columns, so a stray default_intent column does no reads or
writes and can't interfere with runtime behaviour.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
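The capability check is one comparison against the runtime library version; a sketch with a hypothetical signature (the real _supports_drop_column takes an engine, not a dialect string):

```python
import sqlite3

def supports_drop_column(
    dialect: str,
    sqlite_version: tuple = sqlite3.sqlite_version_info,
) -> bool:
    """ALTER TABLE ... DROP COLUMN: always available off-SQLite; SQLite
    gained it in 3.35 (Mar 2021), so gate on the linked library version."""
    if dialect != "sqlite":
        return True
    return sqlite_version >= (3, 35, 0)
```

Note the check is against the SQLite library the interpreter links, not the Python version: the same Python can sit on 3.31 (Ubuntu 20.04) or 3.40+ (PyInstaller bundles).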
…g models

The inline lifespan ran _run_shutdown inside the MCP context, so the
TTS / Whisper / LLM models were unloaded *before* FastMCP's __aexit__
got a chance to cancel its in-flight session tasks. Any MCP request
mid-generate at shutdown time would crash on "model unloaded" instead
of receiving a clean session-cancelled error.

Rewire via compose_lifespan (which was already defined in
mcp_server.server for exactly this purpose but never used):
AsyncExitStack enters factories in order and exits in LIFO, so
MCP teardown fires first — cancelling sessions — and _run_shutdown
runs after nothing is holding the models. Smoke test shows the
log order flipped as expected:

  Ready
  StreamableHTTP session manager started
  ... running ...
  StreamableHTTP session manager shutting down   ← was last, now first
  Voicebox server shutting down...               ← was first, now last

As a side benefit, _run_shutdown is now paired with _run_startup via
try/finally inside voicebox_lifespan, so a partial startup (models
half-loaded, MCP __aenter__ fails) still unloads whatever was loaded
instead of leaking it to process exit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
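The LIFO ordering is the whole trick. A minimal compose_lifespan built on AsyncExitStack (the signature is my sketch; the real one lives in mcp_server.server):

```python
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager

def compose_lifespan(*factories):
    """Nest lifespan context managers: enter in order, exit LIFO. With
    (voicebox, mcp), MCP tears down first, so in-flight sessions are
    cancelled before model unload runs."""
    @asynccontextmanager
    async def lifespan(app):
        async with AsyncExitStack() as stack:
            for factory in factories:
                await stack.enter_async_context(factory(app))
            yield
    return lifespan
```

AsyncExitStack also gives the partial-startup property for free: if a later factory's __aenter__ raises, the stack unwinds the factories that did enter.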
POST /speak is a REST wrapper around voicebox.speak for agents that
don't talk MCP (shell scripts, ACP, A2A). It reads X-Voicebox-Client-Id
and uses it for the same per-client profile resolution + default
personality lookup the MCP tool does (speak.py:39-64), so its callers
are first-class clients — but the ClientIdMiddleware only stamped
last_seen_at on /mcp* paths. REST speak callers showed up as "never
seen" in Settings → MCP despite actively acting on their bindings.

Widen the stamp predicate to an explicit ("/mcp", "/speak") prefix
list, and require a path boundary on match so future routes named
/mcpfoo or /speakers don't silently inherit the stamp via the prefix.
New test_client_id_middleware.py pins the scope with 17 parametrised
cases (both the allowed set and the overlap cases that must not match).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
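The boundary-aware predicate is worth pinning down, since a naive startswith is exactly the bug being avoided; a sketch of the pattern (constant and function names are mine):

```python
STAMPED_PREFIXES = ("/mcp", "/speak")

def should_stamp(path: str) -> bool:
    """Stamp last_seen_at only on the allowed prefixes, requiring a path
    boundary so /mcpfoo or /speakers don't inherit the stamp."""
    return any(
        path == p or path.startswith(p + "/")
        for p in STAMPED_PREFIXES
    )
```

The `p + "/"` boundary is the difference between matching /mcp/bindings (intended) and /mcpfoo (not).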
Replace the placeholder fake-waveform + play button in CapturesTab's
audio card with a real CaptureInlinePlayer (wavesurfer.js). The player
renders the actual waveform, lets users scrub through the clip, and
shows a proper current/total timestamp pair in place of the
duration-only label.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wrap useUIStore in zustand/middleware's persist under the key
voicebox-ui. partialize only selectedProfileId so volatile UI state
(dialog open flags, form drafts, engine/voice pickers, sidebar) stays
in-memory as before — but reopening the app no longer loses whichever
profile the user was last working with.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
coderabbitai Bot (Contributor) commented Apr 24, 2026

📝 Walkthrough

Adds Captures/dictation and personality-driven generation, an MCP server (HTTP + stdio shim) with per-client bindings, local Qwen3 LLM backends, capture CRUD/refinement pipelines, extensive Tauri native integrations for hotkeys/permissions/clipboard/focus, many frontend UI pages/components, API surface and DB migrations, and bumps version to 0.5.0.

Changes

Cohort / File(s) Summary
Version & Manifests
.bumpversion.cfg, package.json, app/package.json, landing/package.json, tauri/package.json
Bumped project and package versions to 0.5.0.
Changelog & Docs
CHANGELOG.md, README.md, docs/..., docs/content/docs/...
Large documentation additions and rewrites for Captures, Dictation, Voice Personalities, MCP server, roadmap and overview content.
Frontend entry & routing
app/src/App.tsx, app/src/router.tsx
Route-aware App split (DictateWindow/MainApp) and new /captures route plus /settings/captures and /settings/mcp.
Captures UI & components
app/src/components/...
app/src/components/CapturesTab/CapturesTab.tsx, CapturePill.tsx, CaptureInlinePlayer.tsx, CapturesPage.tsx, DictateWindow.tsx, landing/src/components/CapturesMockup.tsx
New Captures feature: UI, DictateWindow, pill/player components, upload/refine/retranscribe/play-as flows and mockups.
Recording/session hooks
app/src/lib/hooks/...
useCaptureRecordingSession.ts, useAudioRecording.ts
New capture recording session hook, refined audio recording behavior (no timesliced chunks; optional max duration).
Hotkeys, chords & permissions
app/src/components/ChordPicker/..., useChordSync.ts, tauri/src-tauri/src/hotkey_monitor.rs, input_monitoring.rs, AccessibilityGate.tsx, InputMonitoringGate.tsx
Chord picker UI, chord synchronization hook, Rust hotkey monitor, macOS input-monitoring & accessibility permission hooks and notices.
Personality & generation UI
app/src/components/Generation/..., ProfileForm.tsx, ProfileCard.tsx, removed GenerationForm.tsx
Adds personality field to profiles, compose action and persona toggle in generation UI; removes legacy GenerationForm.
API client & types
app/src/lib/api/client.ts, app/src/lib/api/types.ts
New API client methods and types: captures CRUD, composeWithPersonality, settings, MCP bindings, generation/personality flags and capture-related types.
Settings, stores & settings UI
app/src/lib/hooks/useSettings.ts, app/src/stores/*, ServerTab/*, MCPPage.tsx, ServerTab/CapturesPage.tsx
Server-backed capture/generation settings hooks with optimistic updates; MCP bindings UI; store changes (persist config, personality draft) and new settings pages.
MCP server & shim
backend/mcp_server/*, backend/mcp_shim/*, backend/build_binary.py, backend/mcp_server/README.md
FastMCP-based MCP server mounted at /mcp, tools (speak/transcribe/list), ClientId middleware, in-memory events, stdio shim and packaging/build support.
Backend models, migrations & routes
backend/database/models.py, migrations.py, backend/models.py, backend/routes/*, backend/services/*
DB additions (captures, settings, MCP bindings, profile.personality, generation.source), migrations, new REST routes (/captures, /speak, /settings, /llm, /mcp/bindings, /events/speak), capture/refine services, personality and LLM services.
LLM backend & generation
backend/backends/qwen_llm_backend.py, backend/backends/__init__.py, backend/services/generation.py
New Qwen3 LLM backends (PyTorch & MLX), backend registry integration, personality rewrite/compose integration and new sync TTS generation helper.
Tauri native integrations
tauri/src-tauri/src/*.rs, Cargo.toml, capabilities/default.json
Native modules: accessibility, clipboard snapshot/restore, focus capture/restore, hotkey monitor, key-codes/layout, input-monitoring; capability and Cargo updates.
Utilities & i18n
app/src/lib/utils/keyCodes.ts, format.ts, app/src/i18n/locales/*/translation.json
Keycode helpers, date formatting util, large i18n additions for captures/personality/MCP and related UI strings.
Tests & tooling
backend/tests/*, scripts/setup-dev-sidecar.js, backend/requirements.txt
New tests (middleware, refinement/personality), dev sidecar placeholder creation updated, backend deps added (fastmcp, sse-starlette) and PyInstaller/shim build support.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Hotkey as Hotkey Monitor<br/>(Tauri/Rust)
    participant Dictate as DictateWindow<br/>(React)
    participant Backend as Backend<br/>(Python)
    participant STT as Whisper<br/>(STT)
    participant LLM as Qwen3<br/>(LLM)
    participant Clipboard as Clipboard<br/>(Tauri)

    User->>Hotkey: Hold/Toggle chord
    Hotkey->>Dictate: dictate:start (with FocusSnapshot)
    Dictate->>Dictate: Show CapturePill (recording)
    Dictate->>Dictate: Record audio
    User->>Hotkey: Release chord
    Hotkey->>Dictate: dictate:stop
    Dictate->>Backend: POST /captures (upload)
    Backend->>STT: Transcribe audio
    STT-->>Backend: Raw transcript
    alt auto_refine true
        Backend->>LLM: refine/rewrite transcript
        LLM-->>Backend: refined transcript
    end
    Backend-->>Dictate: Capture response (id, text, flags)
    alt allow_auto_paste
        Dictate->>Clipboard: write/paste final text
    end
    Dictate->>Dictate: CapturePill -> completed -> rest
sequenceDiagram
    actor Agent
    participant MCP as MCP Server (FastMCP)
    participant Backend as Backend (Python)
    participant LLM as Qwen3 (LLM)
    participant TTS as TTS Engine
    participant Frontend as DictateWindow (React)
    participant SSE as SSE (/events/speak)

    Agent->>MCP: voicebox.speak(profile?, personality?)
    MCP->>MCP: resolve_profile (explicit → binding → default)
    MCP->>Backend: POST /generate (personality=true)
    Backend->>LLM: rewrite/compose if personality
    LLM-->>Backend: in-character text
    Backend->>TTS: generate audio
    TTS-->>Backend: WAV
    Backend->>Backend: persist generation, publish speak-start
    MCP-->>Agent: generation_id + status URL
    Frontend->>SSE: subscribe
    SSE-->>Frontend: speak-start
    Frontend->>Backend: GET /audio/{id}
    Backend-->>Frontend: stream audio
    Frontend->>Frontend: play audio, show pill
    Backend->>SSE: publish speak-end
    SSE-->>Frontend: speak-end -> cleanup
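The speak flow above is driven by whichever MCP client connects over Streamable HTTP. As a hedged illustration only: the `mcpServers` shape below follows common MCP client configuration conventions, and the port matches the local address mentioned elsewhere in this PR; neither is documented Voicebox configuration.

```json
{
  "mcpServers": {
    "voicebox": {
      "url": "http://127.0.0.1:17493/mcp/"
    }
  }
}
```

Clients that only speak stdio would instead point at the shim binary described in the PR summary.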

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

"I hopped along the hotkey trail,
Captured words in pill and tale,
Profiles donned a playful voice,
Agents spoke — the users rejoice,
On-device, the little rabbit sings: v0.5!" 🐇✨



@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


Reviewed by Cursor Bugbot for commit 687ab2a. Configure here.

// would lose the user's first new keypress.
setCaptured((prevCaptured) => {
const candidate = sortChordKeys(Array.from(next));
return candidate.length >= prevCaptured.length ? candidate : prevCaptured;

ChordPicker prevents saving shorter replacement chords

Medium Severity

The peak-tracking logic in setCaptured uses candidate.length >= prevCaptured.length, which means captured can only ever grow or stay the same length within a picker session. If the user opens the picker with a 3-key chord as initialKeys and wants to replace it with a 2-key chord, the new candidate will always be shorter than prevCaptured (which starts as the 3-key initialKeys), so captured will never update to the shorter chord. The user can only save chords that are at least as long as the existing one.

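The fix the finding points at can be sketched as: track the longest chord only within a single press session, and start a fresh session once all keys are released. This is a minimal illustrative sketch, not the project's ChordPicker code; `makeChordCapture` and its shape are assumptions.

```typescript
type Chord = string[];

// Hypothetical sketch: peak-tracking scoped to one press session, so a
// 2-key chord can replace a 3-key initial chord after a full release.
function makeChordCapture(initialKeys: Chord) {
  let captured: Chord = [...initialKeys];
  let pressPeak: Chord = []; // longest chord seen during the current press

  return {
    onKeysChanged(next: Set<string>): Chord {
      const candidate = [...next].sort();
      if (next.size === 0) {
        // All keys released: commit this press's peak and reset the session.
        if (pressPeak.length > 0) captured = pressPeak;
        pressPeak = [];
      } else if (candidate.length >= pressPeak.length) {
        // Within a press, still ignore partial releases (the original intent).
        pressPeak = candidate;
      }
      return captured;
    },
  };
}
```

Releasing one key of a held chord still doesn't shrink the capture mid-press, which preserves the behavior the original `>=` check was protecting.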

}
}
prevActive.current = current;
}, [activeTasks, queryClient, downloadByModel]);

Unstable Map reference in useEffect dependency array

Low Severity

downloadByModel is a new Map() created on every render, then included in the useEffect dependency array at line 113. Since a new Map is referentially different each time, the effect fires on every render — repeatedly calling invalidateQueries even when no downloads have actually changed. This causes unnecessary query invalidations in a tight loop with the 1-second polling interval.

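The standard remedy for an unstable `Map` in a dependency array is to derive it with `useMemo` so its identity only changes when the source array changes. A minimal sketch, assuming a task shape with `model_name` (the `DownloadTask` fields here are guesses, not the project's type):

```typescript
interface DownloadTask {
  model_name: string;
  progress: number;
}

// Pure derivation that a useMemo body would compute.
function buildDownloadByModel(tasks: DownloadTask[]): Map<string, DownloadTask> {
  return new Map(tasks.map((t) => [t.model_name, t]));
}

// In the component, memoize so the Map's reference is stable across renders
// while activeTasks is unchanged, and the effect stops firing every render:
//   const downloadByModel = useMemo(
//     () => buildDownloadByModel(activeTasks),
//     [activeTasks],
//   );
```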

The six-gate checklist only rendered in the CapturesTab empty state, so
a user already on the settings page had no single surface showing which
gate was red — the inline InputMonitoringNotice covered one, the model
pickers covered another, and Accessibility was only hinted at by the
auto-paste toggle. Mirror the same component into the right sidebar of
the settings page so every gate (STT model, LLM model, Input Monitoring,
Accessibility, plus the hotkey toggle in the main column) is always
visible while the user configures dictation.

New compact prop on DictationReadinessChecklist drops the centered
header and empty-state max-width so it fits the 280 px sidebar next to
the existing About / Differences blocks. Callers in compact mode own
the heading — CapturesPage reuses the existing captures.readiness.title
key (present in en / ja / zh-CN / zh-TW already) as an h3 matching the
sibling sidebar sections.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 10

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
app/src/components/ServerTab/GenerationPage.tsx (1)

64-95: ⚠️ Potential issue | 🔴 Critical

Slider writes are unthrottled and will spam the settings endpoint on every pointer move during drag.

Dragging maxChunkChars or crossfadeMs triggers onValueChange dozens of times per second, each calling update() → mutation.mutate() → apiClient.updateGenerationSettings(patch) with no debounce, throttle, or request coalescing. This produces a stream of unthrottled HTTP PATCH requests. The optimistic onMutate update masks latency but does not prevent the request storm.

Additionally, if any mid-drag request fails and onError reverts the cache to a previous state, a subsequent successful request could overwrite that reversion, leaving the persisted value out of sync with the UI. Add debounce (e.g., 500–1000ms) to the update call or use useDebouncedCallback to coalesce slider changes during drag.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/components/ServerTab/GenerationPage.tsx` around lines 64 - 95, The
Slider onValueChange handlers call update(...) which immediately invokes
mutation.mutate (via apiClient.updateGenerationSettings) on every pointer move;
debounce or coalesce those calls so the server is only patched after pauses
(e.g., 500–1000ms) and only send the latest value. Modify the update function
used by the Slider components (referenced as update and the Slider id props
"maxChunkChars" and "crossfadeMs") to buffer rapid changes and call
mutation.mutate once after a debounce delay (or use useDebouncedCallback),
cancel/replace previous pending debounced calls, and ensure the mutation’s
onMutate/onError handlers correctly reconcile optimistic state with the final
settled value to avoid reverting to stale values.
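The coalescing the finding asks for can be as small as a trailing-edge debounce: only the last value within the window is sent once the user pauses dragging. A generic sketch (the wrapped callback stands in for `mutation.mutate`; none of this is the project's code):

```typescript
// Trailing-edge debounce: each new call resets the timer, so only the final
// value within a burst reaches fn after delayMs of quiet.
function debounce<T>(fn: (value: T) => void, delayMs: number): (value: T) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (value: T) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(value), delayMs);
  };
}
```

In the component this would wrap the settings PATCH, e.g. a debounced `update` created once (via `useMemo` or `useDebouncedCallback`) with a 500–1000 ms delay, so a drag produces one request instead of dozens.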
app/src/components/VoiceProfiles/ProfileForm.tsx (1)

825-835: ⚠️ Potential issue | 🟡 Minor

Discard reset is missing personality (and avatarFile).

The other three form.reset(...) calls for this form (lines 333, 347, 372) all include personality: ''. The discard-draft reset below omits it, so after clicking Discard a draft's personality value stays in the form state (and the rendered textarea) rather than being cleared. Same for avatarFile. Align with the other resets for consistency.

🔧 Proposed fix
                     form.reset({
                       name: '',
                       description: '',
                       language: 'en',
+                      personality: '',
                       sampleFile: undefined,
                       referenceText: '',
+                      avatarFile: undefined,
                     });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/components/VoiceProfiles/ProfileForm.tsx` around lines 825 - 835, The
discard handler currently resets the draft via setProfileFormDraft(null) and
calls form.reset(...) but omits the personality and avatarFile fields, leaving
stale values; update the form.reset call inside the onClick to include
personality: '' and avatarFile: undefined (matching the other resets) so the
textarea and avatar state are cleared, keeping setProfileFormDraft and
setSampleMode('record') as they are.
🧹 Nitpick comments (41)
landing/src/components/Personalities.tsx (1)

175-235: Optional: respect prefers-reduced-motion for the 4.5s auto-cycle.

The section auto-cycles continuously via setInterval, which can be uncomfortable for motion-sensitive visitors. Consider pausing the rotation (and/or skipping the framer-motion entrance) when window.matchMedia('(prefers-reduced-motion: reduce)').matches.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@landing/src/components/Personalities.tsx` around lines 175 - 235, The
auto-rotation in Personalities uses setInterval inside the useEffect which
ignores users' prefers-reduced-motion; update the effect to detect
window.matchMedia('(prefers-reduced-motion: reduce)') and if it matches, do not
start the interval (or clear it immediately), and instead keep idx fixed; also
pass a prop like reducedMotion or prefersReducedMotion into ModeDemo (and any
child components that use framer-motion) so they can skip entrance/animation
when true. Ensure you reference Personalities, its useEffect that creates iv,
and ModeDemo (add a boolean prop) when making the changes.
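The guard itself is one `matchMedia` check. A sketch with the media-query function injected so it is testable outside a browser (`shouldAutoRotate` is an illustrative name, not the component's API):

```typescript
type MediaQuery = (query: string) => { matches: boolean };

// True when the visitor has not asked for reduced motion.
function shouldAutoRotate(matchMedia: MediaQuery): boolean {
  return !matchMedia('(prefers-reduced-motion: reduce)').matches;
}

// In the component's effect, skip starting the interval entirely:
//   if (!shouldAutoRotate(window.matchMedia)) return;
//   const iv = setInterval(() => setIdx((i) => (i + 1) % modes.length), 4500);
//   return () => clearInterval(iv);
```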
app/src/components/ServerSettings/ModelManagement.tsx (2)

86-91: Model descriptions remain hardcoded English while surrounding UI is i18n'd.

These strings (like the pre-existing TTS/Whisper entries) bypass t() so they won't be translated in ja / zh-CN / zh-TW. Consistent with the existing pattern here, so not a regression for this PR — flagging for a future refactor to move MODEL_DESCRIPTIONS behind an i18n key namespace (e.g. models.descriptions.qwen3-0.6b).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/components/ServerSettings/ModelManagement.tsx` around lines 86 - 91,
MODEL_DESCRIPTIONS contains hardcoded English strings (e.g., keys 'qwen3-0.6b',
'qwen3-1.7b', 'qwen3-4b') that bypass i18n; update the code so descriptions are
retrieved via the translation function t() from an i18n namespace (suggested
keys like models.descriptions.qwen3-0.6b etc.) instead of inline text, and
refactor any use-sites in ModelManagement.tsx to call
t(`models.descriptions.${modelKey}`) (or fallback to the English string) so
descriptions are translatable.

420-420: Prefix qwen3- is fine today but narrow — consider explicit allowlist.

If a future model is named e.g. qwen3-tts-* or qwen3-embed-*, it will be silently categorized as a language model. Since the three qwen3 LLM variants are enumerated in MODEL_DESCRIPTIONS just above, you could tighten this by matching against that known set, or by introducing a model.kind === 'llm' field returned by the backend.

♻️ Optional: filter against known LLM ids
-  const llmModels = modelStatus?.models.filter((m) => m.model_name.startsWith('qwen3-')) ?? [];
+  const LLM_MODEL_NAMES = new Set(['qwen3-0.6b', 'qwen3-1.7b', 'qwen3-4b']);
+  const llmModels = modelStatus?.models.filter((m) => LLM_MODEL_NAMES.has(m.model_name)) ?? [];
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/components/ServerSettings/ModelManagement.tsx` at line 420, The
current llmModels filter uses a broad prefix check (llmModels =
modelStatus?.models.filter((m) => m.model_name.startsWith('qwen3-'))), which can
misclassify future qwen3 variants; change the filter to match only the explicit,
known LLM ids from MODEL_DESCRIPTIONS (or another local allowlist) by comparing
model.model_name against that set (e.g. build a Set from
Object.keys(MODEL_DESCRIPTIONS) or the enumerated ids) or, if available, use a
backend-provided discriminator like model.kind === 'llm' instead; update the
llmModels assignment to use that allowlist-based check against
modelStatus.models so only the intended qwen3 LLMs are included.
backend/mcp_shim/__init__.py (1)

6-6: Docstring hardcodes 127.0.0.1:17493.

If the backend port is configurable elsewhere (env var / CLI flag in __main__.py), consider softening the docstring to "the Voicebox server's /mcp/ endpoint" to avoid it drifting from reality later. Minor doc nit.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/mcp_shim/__init__.py` at line 6, The module-level docstring in
backend/mcp_shim/__init__.py currently hardcodes "http://127.0.0.1:17493/mcp/";
update that docstring to a generic description such as "the Voicebox server's
/mcp/ endpoint" (or otherwise reference the configurable backend host/port) so
it doesn't drift if the address is configurable elsewhere (e.g., __main__.py or
env vars); edit the module docstring text in __init__.py to remove the literal
IP:port and replace it with the softer phrasing.
scripts/setup-dev-sidecar.js (1)

369-379: Optional: derive sidecar list from tauri.conf.json instead of duplicating.

The in-code comment already acknowledges this list must be kept in sync with externalBin in tauri.conf.json. Reading and parsing that file would eliminate the drift risk the comment warns about. Not a blocker; fine to keep as-is for now.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/setup-dev-sidecar.js` around lines 369 - 379, Replace the hard-coded
SIDECAR_BASE_NAMES with a dynamically derived list by reading and parsing
tauri.conf.json: in main() (which currently calls getTargetTriple() and loops
over SIDECAR_BASE_NAMES to call createPlaceholderBinary), load and parse
tauri.conf.json, extract the externalBin keys/names from the configuration (or
from build.externalBin depending on schema), map/normalize them to the same
baseName format used today, and iterate that array instead of
SIDECAR_BASE_NAMES; keep getTargetTriple() and createPlaceholderBinary usages
unchanged and add minimal error handling if tauri.conf.json is missing or
malformed.
app/src/router.tsx (1)

116-116: Stale comment: captures is no longer a prototype.

AudioTab has already been removed from the router and sidebar in this PR, so the "prototype — will replace AudioTab once the new flow is ready" wording is misleading. Consider updating or removing the comment.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/router.tsx` at line 116, The inline comment above the captures route
is stale and misleading because AudioTab has already been removed; update or
delete the comment near the captures route declaration in router.tsx so it no
longer says "prototype — will replace AudioTab once the new flow is ready"
(either remove the prototype note or replace it with a brief current description
of the captures route/component and its purpose, referencing the captures
route/component name and AudioTab for clarity).
backend/routes/events.py (1)

35-40: Optional: use the builtin TimeoutError alias (Ruff UP041).

Since Python 3.11, asyncio.TimeoutError is an alias for the builtin TimeoutError. Small cleanup:

-                try:
-                    event = await asyncio.wait_for(queue.get(), timeout=15.0)
-                except asyncio.TimeoutError:
+                try:
+                    event = await asyncio.wait_for(queue.get(), timeout=15.0)
+                except TimeoutError:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/events.py` around lines 35 - 40, The except block catches
asyncio.TimeoutError but Python 3.11 provides a builtin alias, so replace the
exception reference with the builtin TimeoutError in the async generator that
awaits asyncio.wait_for(queue.get(), timeout=15.0); update the except clause
from "except asyncio.TimeoutError:" to "except TimeoutError:" (no other behavior
changes) to satisfy the Ruff UP041 suggestion while keeping the heartbeat yield
logic intact.
landing/src/app/page.tsx (1)

84-88: Nit: hardcoded text-white inside a themed paragraph.

The surrounding subtitle uses text-muted-foreground; text-white on the bolded span will look slightly off if the landing site ever gets a light-mode variant. Consider text-foreground (or a dedicated token) instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@landing/src/app/page.tsx` around lines 84 - 88, The bold span uses a
hardcoded class "text-white" inside a paragraph that uses
"text-muted-foreground", which will break theming; replace the "text-white"
class on the <b> element with a theme-aware token like "text-foreground" (or a
dedicated token) so the emphasized text inherits the correct foreground color in
both light and dark modes and keep the surrounding "text-muted-foreground"
intact.
backend/routes/generations.py (1)

11-14: Move logger below the imports to satisfy E402.

Ruff flags from ..services import … at line 14 as a module-level import after non-import code (the logger = … at line 11). Swap the ordering so all imports sit at the top of the module.

🔧 Proposed fix
-logger = logging.getLogger(__name__)
-
 from .. import models
 from ..services import history, personality, profiles, tts
 from ..database import Generation as DBGeneration, VoiceProfile as DBVoiceProfile, get_db
 from ..services.generation import run_generation
 from ..services.task_queue import cancel_generation as cancel_generation_job, enqueue_generation
 from ..utils.tasks import get_task_manager
+
+logger = logging.getLogger(__name__)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/generations.py` around lines 11 - 14, The logger assignment is
placed before module imports causing an E402 import-order error; move the line
"logger = logging.getLogger(__name__)" so it appears after the import block
(after the "from .. import models" and "from ..services import history,
personality, profiles, tts" lines) so all imports are at the top of the module
and then initialize logger.
app/src/lib/hooks/useSettings.ts (1)

18-99: Clean optimistic-update pattern; consider extracting the shared shape.

Both useCaptureSettings and useGenerationSettings follow identical optimistic-patch + rollback + server-reconcile logic — the only variations are the query key, fetcher, and mutator. If a third settings surface appears (and the MCP/captures trajectory suggests one will), a tiny createSettingsHook({ key, fetch, patch }) factory would eliminate the duplicated branches. Not blocking — fine to defer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/lib/hooks/useSettings.ts` around lines 18 - 99, Both hooks
useCaptureSettings and useGenerationSettings duplicate the same
optimistic-update/rollback/reconcile logic; extract that shared pattern into a
factory like createSettingsHook({ key, queryFn, mutationFn }) which encapsulates
the useQuery + useMutation setup (including cancelQueries, getQueryData,
setQueryData, onMutate/onError/onSettled behavior) and returns { settings,
isLoading, update }; then replace useCaptureSettings to call createSettingsHook
with CAPTURE_SETTINGS_KEY, apiClient.getCaptureSettings,
apiClient.updateCaptureSettings and replace useGenerationSettings to call it
with GENERATION_SETTINGS_KEY, apiClient.getGenerationSettings,
apiClient.updateGenerationSettings so future setting hooks reuse the same
implementation.
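The shared shape the comment describes — optimistic apply, rollback on error, reconcile with the server response — can be captured independently of react-query. A hedged sketch of that core (names like `createOptimisticStore` and `send` are illustrative; the real factory would wrap this in `useQuery`/`useMutation`):

```typescript
// Minimal optimistic-update state machine: patch immediately, replace with
// the server's settled value on success, restore the snapshot on failure.
function createOptimisticStore<S extends object>(initial: S) {
  let current = initial;
  return {
    get: () => current,
    async update(patch: Partial<S>, send: (p: Partial<S>) => Promise<S>): Promise<void> {
      const previous = current;
      current = { ...current, ...patch }; // optimistic apply
      try {
        current = await send(patch);      // reconcile with server truth
      } catch {
        current = previous;               // rollback on failure
      }
    },
  };
}
```

A `createSettingsHook({ key, queryFn, mutationFn })` factory would then only differ per settings surface in the query key and the two API calls.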
backend/mcp_server/server.py (1)

13-13: Prefer collections.abc.Callable (Ruff UP035).

Per static analysis and PEP 585 guidance, import Callable from collections.abc rather than typing.

-from typing import Callable
+from collections.abc import Callable
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/mcp_server/server.py` at line 13, The import currently uses Callable
from typing; update the import to use collections.abc.Callable to satisfy Ruff
UP035/PEP585 guidance—replace the line "from typing import Callable" with an
import of Callable from collections.abc so any references to Callable in this
module continue to work while avoiding the deprecated typing import.
backend/mcp_server/resolve.py (1)

55-57: with_db() name implies a context manager but it isn't.

Returning next(get_db()) produces a bare Session that the caller must remember to .close() — the with_ prefix suggests with with_db() as db: usage which will fail. Consider renaming to new_db_session() (or similar) or converting to an actual @contextmanager.

♻️ Option: context manager form
+from contextlib import contextmanager
+
-def with_db() -> Session:
-    """Utility for tool handlers that aren't managed by FastAPI's Depends."""
-    return next(get_db())
+@contextmanager
+def with_db():
+    """Utility for tool handlers that aren't managed by FastAPI's Depends."""
+    db = next(get_db())
+    try:
+        yield db
+    finally:
+        db.close()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/mcp_server/resolve.py` around lines 55 - 57, The function with_db()
is misleading because it returns a raw Session from next(get_db()) but is named
like a context manager; change it so callers don't leak sessions by either (a)
convert with_db into a real context manager (e.g., decorate with
contextlib.contextmanager or return a contextmanager object that yields the
Session and ensures session.close() after use) using get_db() to obtain and
finally close the session, or (b) rename the function to new_db_session() (or
acquire_db_session()) and document that it returns a plain Session that the
caller must close; update all call sites accordingly (references: with_db,
get_db).
backend/services/settings.py (2)

21-38: _get_or_create_* has a narrow race on first creation.

Two concurrent requests hitting the endpoint before the singleton row exists can both miss the first() check and both db.add(row) → commit; the second loses to a PRIMARY KEY conflict. Under the single-user desktop assumption this is essentially impossible, but a single-row INSERT OR IGNORE (or a try/except IntegrityError that re-queries) would make it bulletproof and cheap.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/services/settings.py` around lines 21 - 38, The
_get_or_create_capture_row and _get_or_create_generation_row functions have a
race when two requests concurrently hit the first() check; fix by performing an
atomic insert-or-ignore or by catching IntegrityError on commit and re-querying:
attempt to create the singleton row (DBCaptureSettings/DBGenerationSettings with
id=SINGLETON_ID), commit in a try block, on IntegrityError rollback and run the
original query again to return the existing row; ensure you import
IntegrityError from sqlalchemy.exc and apply the same pattern to both functions
so the code is race-proof.
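The `INSERT OR IGNORE` option is easy to see with plain sqlite3, stripped of the project's SQLAlchemy layer — table and column names below are illustrative, not the actual schema:

```python
import sqlite3

def get_or_create_settings_row(conn: sqlite3.Connection) -> sqlite3.Row:
    # INSERT OR IGNORE is atomic at the database level: if a concurrent
    # writer created the singleton first, this statement becomes a no-op
    # instead of raising a PRIMARY KEY conflict.
    conn.execute("INSERT OR IGNORE INTO capture_settings (id) VALUES (1)")
    conn.commit()
    return conn.execute("SELECT * FROM capture_settings WHERE id = 1").fetchone()
```

The SQLAlchemy equivalent is the try/except IntegrityError shape the comment describes: attempt the insert, and on conflict roll back and re-query the row the other writer created.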

46-53: value is not None filter prevents clearing nullable columns.

Callers cannot set default_playback_voice_id (or any other nullable field) back to NULL through this API since None values are unconditionally skipped. If clearing isn't a desired operation, document the contract; otherwise use a sentinel (MISSING) so explicit None flows through:

♻️ Option — sentinel-based patch filtering
-def update_capture_settings(db: Session, patch: dict[str, Any]) -> DBCaptureSettings:
-    row = _get_or_create_capture_row(db)
-    for key, value in patch.items():
-        if value is not None and hasattr(row, key):
-            setattr(row, key, value)
+_MISSING = object()
+
+def update_capture_settings(db: Session, patch: dict[str, Any]) -> DBCaptureSettings:
+    row = _get_or_create_capture_row(db)
+    for key, value in patch.items():
+        if value is _MISSING or not hasattr(row, key):
+            continue
+        setattr(row, key, value)
     db.commit()
     db.refresh(row)
     return row

…with callers (Pydantic models) producing _MISSING for unset fields and None for explicit clears.

Also applies to: 61-68

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/services/settings.py` around lines 46 - 53, The current
update_capture_settings function skips any keys whose value is None, which
prevents callers from clearing nullable columns (e.g.,
default_playback_voice_id); change the logic to treat None as a valid explicit
value by either (a) using a sentinel (e.g., _MISSING) such that callers send
_MISSING for absent fields and None for explicit clears and then only skip when
value is _MISSING, or (b) stop filtering on value and instead check key presence
(if key in patch) and hasattr(row, key) then setattr(row, key, value). Apply the
same sentinel/presence-based fix to the analogous code block referenced at lines
61-68 so explicit None values are persisted rather than ignored.
backend/database/migrations.py (1)

269-274: Redundant local import sqlite3.

_supports_drop_column already imports sqlite3; a top-of-file import would remove the duplicate local imports here and at line 283. Not a bug — purely tidiness.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/database/migrations.py` around lines 269 - 274, The local import
sqlite3 inside the _supports_drop_column context is redundant; remove the inline
import statements (both occurrences in this function) and instead add a single
top-level import sqlite3 at the module scope, so functions like
_supports_drop_column and the logger.warning call can use the module-level
sqlite3.sqlite_version without duplicate local imports.
app/src/components/CapturesTab/CaptureInlinePlayer.tsx (2)

90-109: Minor: in-flight load on a previous audioUrl can land after a new load starts.

When audioUrl changes rapidly, the prior ws.load(url).catch(...) may reject with an abort error after the new load has already reset state, briefly flashing the old URL's error. Low severity since WaveSurfer aborts pending loads on load()/destroy(), but worth guarding with an isCurrent flag if it proves visible.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/components/CapturesTab/CaptureInlinePlayer.tsx` around lines 90 -
109, When audioUrl changes fast a previous ws.load(audioUrl) can reject after a
newer load has started causing stale error/state flashes; in the useEffect that
references wavesurferRef and ws.load, introduce a local boolean flag (e.g.
isCurrent) set true at start and set false in the effect cleanup, and only call
setError, setIsLoading, setCurrentTime, setDuration, setIsPlaying (and any other
state updates) inside ws.load.then/catch if isCurrent is still true; ensure you
also guard the try/catch reset block (ws.pause(), ws.seekTo(0)) if needed so
they only affect the current load lifecycle for the current audioUrl.
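Reduced to its bare pattern, the `isCurrent` guard is a cleanup-scoped flag: state setters fire only if the effect that started the load is still the live one. A sketch (here `load` stands in for `ws.load(audioUrl)` and `setError` for the React state setter; none of this is the component's actual code):

```typescript
// Returns the effect cleanup; calling it marks this load as superseded so
// its late resolution/rejection can no longer touch component state.
function startGuardedLoad(
  load: () => Promise<void>,
  setError: (message: string | null) => void,
): () => void {
  let isCurrent = true;
  load()
    .then(() => { if (isCurrent) setError(null); })
    .catch((err) => { if (isCurrent) setError(String(err)); });
  return () => { isCurrent = false; };
}
```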

33-57: Waveform colors are captured once and won't follow theme changes.

cssHsla reads --foreground/--accent at instantiation; switching theme (light ↔ dark) at runtime leaves the existing WaveSurfer instance with stale colors until the component remounts. If runtime theme switching is supported, subscribe to the theme change and call ws.setOptions({ waveColor, progressColor }), or key the component on the theme.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/components/CapturesTab/CaptureInlinePlayer.tsx` around lines 33 - 57,
The WaveSurfer instance in CaptureInlinePlayer captures CSS variables once
inside the useEffect (cssHsla) so it won't update when the app theme changes;
modify the hook to react to theme changes by either adding the theme variable as
a dependency (or subscribing to your theme provider) and when theme changes call
ws.setOptions({ waveColor: cssHsla('--foreground', 0.25), progressColor:
cssHsla('--accent', 1) }) or recreate the WaveSurfer instance, and ensure you
reference the existing ws instance created in the useEffect so colors are
updated at runtime.
backend/routes/llm.py (1)

27-40: Private-method access (backend._is_model_cached) leaks implementation detail.

Reaching into backend._is_model_cached(...) from a route handler couples the HTTP layer to a protected member of the LLM backend. Either promote it to a public is_model_cached(size) on the LLMBackend protocol (so other backends implement it consistently) or fold the "cached?" check into a higher-level service call — e.g. llm.ensure_loaded_or_schedule_download(model_size) returning ("loaded" | "scheduled", task_name) — so the route doesn't need to reason about cache layout at all.
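💡 Illustrative sketch — the protocol method and the `ensure_loaded_or_schedule_download` service call suggested above could look like this. Names here (`QwenBackend` stand-in, the `("loaded" | "scheduled")` return) are assumptions for illustration, not the real backend API:

```python
from typing import Protocol


class LLMBackend(Protocol):
    # Public cache check every backend implements consistently.
    def is_model_cached(self, size: str) -> bool: ...


class QwenBackend:
    """Stand-in backend: the real cache check stays private internally."""

    def __init__(self, cached: set[str]) -> None:
        self._cached = cached

    def _is_model_cached(self, size: str) -> bool:
        return size in self._cached

    # Thin public wrapper the HTTP layer can depend on.
    def is_model_cached(self, size: str) -> bool:
        return self._is_model_cached(size)


def ensure_loaded_or_schedule_download(backend: LLMBackend, size: str) -> str:
    # Route-facing service call: the handler no longer reasons
    # about cache layout, it just branches on "loaded"/"scheduled".
    return "loaded" if backend.is_model_cached(size) else "scheduled"
```

With this shape the route handler reduces to one service call plus (for the "scheduled" branch) handing the download coroutine to the task manager.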

While you're here, the except Exception at line 71-72 also trips Ruff's B904 and spills the backend exception string to the client; chaining with from e preserves the traceback for logs, and the client-facing detail should probably be a generic "LLM generation failed" with the real exception logged server-side:

♻️ Smaller inline tweak for B904 + message hygiene
     except Exception as e:
-        raise HTTPException(status_code=500, detail=str(e))
+        logger.exception("LLM generation failed")
+        raise HTTPException(status_code=500, detail="LLM generation failed") from e

(Requires a module-level logger = logging.getLogger(__name__).)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/llm.py` around lines 27 - 40, The route is reaching into a
protected backend method backend._is_model_cached and also catches Exception
then exposes the raw error; replace this by adding a public
is_model_cached(self, size) to the LLMBackend protocol (or implement a
higher-level llm.ensure_loaded_or_schedule_download(model_size) that returns
("loaded" | "scheduled", task_name)) and update the route to call that public
API instead of backend._is_model_cached; move download logic into a backend/llm
service method (e.g., download_llm_background or
ensure_loaded_or_schedule_download) so the route only starts the background task
via task_manager.start_download and create_background_task with the service
coroutine, and change the exception handling inside the background task to log
the real exception with logger.exception(...) or re-raise using "from e" while
returning/sending a generic client-facing message like "LLM generation failed"
to avoid leaking internals (update
task_manager.error_download(progress_model_name, ...) to receive only sanitized
message while logging the original).
backend/routes/speak.py (1)

66-77: In-function import hints at a circular-import workaround — consider refactoring.

Importing generate_speech inside the handler suggests a circular dependency between routes/speak.py and routes/generations.py. If that's the case, the cleaner fix is to extract the generation orchestration (generate_speech) into a services/ module that both route modules depend on, so both REST surfaces and the MCP tool share one canonical call site and the handler can import it at module top.

Not blocking — the inline import works — but it's a latent source of ordering bugs and complicates the test surface.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/speak.py` around lines 66 - 77, The inline import of
generate_speech inside the handler indicates a circular-import workaround
between routes/speak.py and routes/generations.py; refactor by moving the
orchestration function (generate_speech) into a new services module (e.g.,
services/generation_service or services/speech_service) that exposes the same
function signature (accepting models.GenerationRequest and db) so both
routes/speak.py and routes/generations.py can import it at module top; update
the imports in both route modules to import generate_speech from the new
services module and keep the call site using models.GenerationRequest(...)
unchanged to preserve behavior and tests.
backend/app.py (1)

214-214: application parameter is unused.

_run_startup takes application: FastAPI but never references it. Drop the parameter (and update the call site on line 83) for clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app.py` at line 214, The _run_startup function currently declares an
unused parameter application: FastAPI; remove that parameter from the
_run_startup signature and update any callers to call _run_startup() without
arguments (specifically the place that invoked _run_startup previously using the
application variable). Ensure the function definition matches the new
no-argument call and run tests / linter to confirm no remaining references to
the removed parameter.
backend/tests/test_personality_samples.py (1)

1-23: Filename starts with test_ but this is a manual evaluator, not a pytest test.

Pytest will collect it during test runs. It currently has no top-level side effects and no test_* functions, so collection is harmless today, but a future reader may mistake it for CI coverage of the personality service. Consider renaming (e.g., eval_personality_samples.py) or moving it out of backend/tests/.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/tests/test_personality_samples.py` around lines 1 - 23, The file is
named with a pytest-discovered prefix but is an interactive evaluator, so rename
or relocate it to avoid accidental test collection; specifically, change
backend/tests/test_personality_samples.py to a non-test name like
eval_personality_samples.py (or move it out of backend/tests/) and update any
README/usage comments or scripts that reference the original filename so
invocations (e.g., the top-level docstring examples) still work; ensure no
import paths or tooling expect the old test_ prefix before committing.
app/src/components/DictateWindow/DictateWindow.tsx (1)

190-194: source.onerror silently hopes EventSource recovers — consider a bounded retry cap.

The handler is intentionally empty so EventSource can auto-reconnect on transient drops. If the backend stream is actually gone (generation row deleted, server restart mid-flight), the browser will reconnect forever and the pill sits on "speaking" until dictate:speak-end rescues it. Worth adding either a soft timeout (e.g., dismiss if no completed/failed after N seconds) or a retry counter, so a missed speak-end can't leave the pill stuck permanently.


🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/components/DictateWindow/DictateWindow.tsx` around lines 190 - 194,
The EventSource onerror handler currently does nothing (source.onerror), letting
the browser reconnect indefinitely and potentially leaving the UI stuck; add a
bounded retry/timeout: in the component/function that creates the EventSource
(DictateWindow where source is created), track a retry counter and/or start a
soft timeout when opening the EventSource, increment the counter in
source.onerror and if it exceeds N (e.g., 3) or the timeout elapses without
receiving a terminal event ('completed'/'failed' or the existing
dictate:speak-end), programmatically dismiss the pill by dispatching the same
action used for speak-end (or calling the existing handler that ends speaking),
clear timers and close source; also ensure the 'completed'/'failed' message
handlers and the dictate:speak-end flow clear the timer and reset the retry
counter so normal reconnections don't trigger the fallback.
app/src/components/AccessibilityGate/AccessibilityGate.tsx (1)

105-107: <Trans> path component renders as an unstyled <span>.

components={{ path: <span /> }} means the <path>System Settings → …</path> markers in the translations add no visual affordance (bold, monospace, color) — they just become inert spans. If the intent is to make the settings breadcrumb visually distinct (common in i18n'd instructional copy), apply a class; if there's no intent, drop the wrapper and simplify the translations.

📝 Example
-            <Trans i18nKey="captures.permissions.accessibility.body" components={{ path: <span /> }} />
+            <Trans
+              i18nKey="captures.permissions.accessibility.body"
+              components={{ path: <span className="font-medium text-foreground" /> }}
+            />
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/components/AccessibilityGate/AccessibilityGate.tsx` around lines 105
- 107, The <Trans> usage in AccessibilityGate.tsx (i18nKey
"captures.permissions.accessibility.body") passes components={{ path: <span />
}} which renders unstyled spans for the "path" markers; update the components
prop to either remove the wrapper and adjust the translation to plain text, or
replace the empty span with a styled element (e.g., add a clear visual
affordance via className such as monospace/bold/color) so the settings
breadcrumb is visually distinct; ensure the change targets the <Trans> call in
AccessibilityGate.tsx and the "path" component only.
CHANGELOG.md (1)

62-64: Add shell tag to code fence (MD040).

Same markdownlint nit as the other docs — specify the language for the Claude Code one-liner block.

📝 Proposed fix
-```
+```shell
 claude mcp add voicebox --transport http --url http://127.0.0.1:17493/mcp --header "X-Voicebox-Client-Id: claude-code"
 ```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CHANGELOG.md` around lines 62 - 64, The code fence around the one-line Claude
Code command is missing a language tag (MD040); update the fenced block that
contains "claude mcp add voicebox --transport http --url
http://127.0.0.1:17493/mcp --header \"X-Voicebox-Client-Id: claude-code\"" to
include the shell tag (i.e., change ``` to ```shell) so the block is fenced as a
shell snippet.
backend/mcp_server/context.py (2)

51-52: Redundant __init__ override.

BaseHTTPMiddleware.__init__ already accepts (app, dispatch=None). Overriding just to call super().__init__(app) adds noise without behavior change — drop it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/mcp_server/context.py` around lines 51 - 52, Remove the redundant
__init__ override from the middleware class in context.py: delete the
__init__(self, app: ASGIApp) -> None: super().__init__(app) method so the class
inherits BaseHTTPMiddleware.__init__(app, dispatch=None) unchanged; this removes
noise without changing behavior and relies on the base class's constructor
(identify the class by its use of BaseHTTPMiddleware and method name __init__).

54-101: Sync DB work runs on the event loop in every request.

_stamp_last_seen does a SQLAlchemy SELECT + INSERT/UPDATE + COMMIT synchronously right before returning the response, so every /mcp/* and /speak request blocks the async event loop on SQLite I/O. For a chatty agent (batched voicebox.speak calls) this serializes requests behind the stamp commit and adds tail latency to SSE streams. Since stamping is already best-effort (the broad except Exception swallows everything), fire-and-forget into a worker thread is a clean fit.

⚡ Proposed fix — dispatch stamping off the event loop
+import asyncio
@@
         if client_id and _is_stamped_path(request.url.path):
-            _stamp_last_seen(client_id)
+            # Best-effort; don't block the event loop on SQLite commit.
+            asyncio.create_task(asyncio.to_thread(_stamp_last_seen, client_id))
         return response
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/mcp_server/context.py` around lines 54 - 101, dispatch currently
calls the synchronous _stamp_last_seen on the request path, blocking the async
event loop; change dispatch to schedule stamping off the event loop (e.g., use
asyncio.get_running_loop().run_in_executor or asyncio.to_thread /
loop.run_in_executor) so the DB work runs in a worker thread and is
fire-and-forget. Move or wrap the DB-heavy logic in _stamp_last_seen (or create
a new helper like _stamp_last_seen_blocking) so it contains the imports,
query/insert/commit/rollback/close and retains the existing broad try/except
behavior, and have dispatch call asyncio.create_task or loop.run_in_executor to
invoke that blocking helper when client_id and
_is_stamped_path(request.url.path) are true; ensure dispatch does not await the
task and still resets current_client_id as before.
backend/mcp_server/README.md (1)

41-41: Add language identifiers to fenced code blocks (MD040).

markdownlint flags the `claude mcp add`, `npx @modelcontextprotocol/inspector`, `curl`, and package-tree code blocks. Tag them (`shell` for the three commands, `text` for the tree) to silence the lint and get syntax highlighting in the MDX-rendered docs.

Also applies to: 68-68, 80-80, 89-89

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/mcp_server/README.md` at line 41, The README has multiple fenced code
blocks missing language identifiers causing MD040 warnings; update the fenced
blocks containing the commands "claude mcp add", "npx
`@modelcontextprotocol/inspector`", and the "curl" example to use the shell
language tag (```shell) and update the package-tree block (the ASCII tree) to
use the text language tag (```text); apply the same change to the other flagged
occurrences referenced (the other blocks at the same locations) so all command
blocks are tagged shell and the tree block is tagged text.
app/src/components/CapturesTab/DictationReadinessChecklist.tsx (1)

94-113: downloadByModel in the effect deps runs the effect on every render.

downloadByModel is constructed as a fresh Map on each render, so including it in the dependency array effectively makes the effect run on every render rather than only when activeTasks changes. The inner set-diff still guards against spurious invalidations so this isn't a correctness bug, but drop downloadByModel from deps and recompute the set inside the effect — that keeps the ref-identity semantics clean and makes intent obvious.

♻️ Proposed fix
-  const prevActive = useRef<Set<string>>(new Set());
-  useEffect(() => {
-    const current = new Set(downloadByModel.keys());
-    for (const name of prevActive.current) {
-      if (!current.has(name)) {
-        queryClient.invalidateQueries({ queryKey: ['capture-readiness'] });
-        queryClient.invalidateQueries({ queryKey: ['modelStatus'] });
-        break;
-      }
-    }
-    prevActive.current = current;
-  }, [activeTasks, queryClient, downloadByModel]);
+  const prevActive = useRef<Set<string>>(new Set());
+  useEffect(() => {
+    const current = new Set<string>();
+    for (const dl of activeTasks?.downloads ?? []) {
+      if (dl.status === 'downloading') current.add(dl.model_name);
+    }
+    for (const name of prevActive.current) {
+      if (!current.has(name)) {
+        queryClient.invalidateQueries({ queryKey: ['capture-readiness'] });
+        queryClient.invalidateQueries({ queryKey: ['modelStatus'] });
+        break;
+      }
+    }
+    prevActive.current = current;
+  }, [activeTasks, queryClient]);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/components/CapturesTab/DictationReadinessChecklist.tsx` around lines
94 - 113, downloadByModel is rebuilt each render causing the useEffect
dependency array to trigger every time; remove downloadByModel from the deps and
instead compute a Set of model names inside the effect. Specifically, in the
useEffect that references prevActive/current, use activeTasks (and queryClient)
in the dependency array only, construct const current = new
Set((activeTasks?.downloads ?? []).filter(dl => dl.status ===
'downloading').map(dl => dl.model_name)) inside the effect, keep the same diff
logic that invalidates via queryClient.invalidateQueries(['capture-readiness'])
and ['modelStatus'] when a name disappears, and then assign prevActive.current =
current at the end.
backend/routes/mcp_bindings.py (1)

59-59: Replace deprecated datetime.utcnow() with timezone-aware alternative.

datetime.utcnow() is deprecated as of Python 3.12 and scheduled for removal; it returns a naïve datetime which loses the UTC contract at the type boundary. The same pattern appears in backend/mcp_server/context.py at line 93, so both should be updated together.

🕒 Proposed replacement
-from datetime import datetime
+from datetime import datetime, timezone
@@
-    row.updated_at = datetime.utcnow()
+    row.updated_at = datetime.now(timezone.utc)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/mcp_bindings.py` at line 59, Replace the naïve UTC timestamp
assignment using datetime.utcnow() with a timezone-aware UTC datetime (e.g.,
datetime.now(timezone.utc)); update the import to bring in timezone (or use
datetime.now with datetime.timezone) and set row.updated_at = timezone-aware
datetime in the module where row.updated_at is assigned, and make the same
change to the analogous timestamp assignment in the MCP server context code (the
datetime assignment at the other location) so both places use timezone-aware UTC
datetimes.
app/src/components/ChordPicker/ChordPicker.tsx (1)

62-72: Nit: local t shadows the translation function.

Inside the effect, const t = window.setTimeout(...) shadows const { t } = useTranslation() from line 51. It's harmless today (the effect body doesn't call t(...)), but the shadow is a footgun for future edits. Consider renaming to timeoutId.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/components/ChordPicker/ChordPicker.tsx` around lines 62 - 72, The
effect in ChordPicker.tsx shadows the translation function t (from const { t } =
useTranslation()) by declaring const t = window.setTimeout(...) inside
useEffect; rename that local to a non-conflicting name (e.g., timeoutId) in the
useEffect closure where setTimeout is assigned and cleared (the code that calls
captureRef.current?.focus() and window.clearTimeout), so the translation
function t remains unshadowed for future edits.
app/src/components/CapturesTab/CapturesTab.tsx (1)

116-117: Dead fileInputRef: declared but never triggered.

fileInputRef (line 116) is attached to a hidden <input> (line 322) but no code calls fileInputRef.current?.click() — only uploadInputRef is triggered (via handleUploadClick). The two inputs share the same accept and change handler, so the second is dead UI. Either remove it or wire the intended code path.

🔧 Proposed fix
-  const fileInputRef = useRef<HTMLInputElement>(null);
   const uploadInputRef = useRef<HTMLInputElement>(null);
...
-      <input
-        ref={fileInputRef}
-        type="file"
-        accept={CAPTURE_AUDIO_MIME}
-        onChange={(e) => handleUploadFile(e, 'file')}
-        className="hidden"
-      />

Also applies to: 321-327

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/components/CapturesTab/CapturesTab.tsx` around lines 116 - 117, The
dead ref fileInputRef is never used (only uploadInputRef is clicked), so either
remove the redundant hidden input and fileInputRef declaration, or wire the
intended click path: update the upload trigger (e.g., the button or handler that
currently calls uploadInputRef.current?.click()) to call
fileInputRef.current?.click() (or both) and ensure the shared accept and
onChange handlers (the change handler function and the input props) remain
attached to the active input; also remove fileInputRef and its unused hidden
<input> if you choose the removal option to avoid unused refs.
backend/database/models.py (1)

275-275: Use JSON column type for refinement_flags to match CaptureSettings and eliminate manual serialization.

refinement_flags is stored as Text and requires manual json.loads() / json.dumps() in callers, while CaptureSettings.chord_push_to_talk_keys uses SQLAlchemy's JSON column type. Switching to JSON removes manual serialization boilerplate and makes the intent self-evident.

If adopted, update callers in backend/services/captures.py:

  • Line 36: Change json.loads(row.refinement_flags) to row.refinement_flags (already deserialized by SQLAlchemy)
  • Line 185: Change json.dumps(flags.to_dict()) to flags.to_dict() (JSON column auto-serializes)
Proposed schema change
-    refinement_flags = Column(Text, nullable=True)  # JSON blob
+    refinement_flags = Column(JSON, nullable=True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/database/models.py` at line 275, Change the refinement_flags column
from Text to SQLAlchemy JSON so SQLAlchemy handles (de)serialization: update the
Column definition named refinement_flags in models.py to use JSON(nullable=True)
matching CaptureSettings.chord_push_to_talk_keys; then remove manual
json.loads/json.dumps usage in backend/services/captures.py by replacing
json.loads(row.refinement_flags) with just row.refinement_flags and replacing
json.dumps(flags.to_dict()) with flags.to_dict(), ensuring callers expect native
Python dicts.
backend/tests/test_refinement_samples.py (2)

1-25: This is an eval harness but will be collected by pytest.

Because the filename matches test_*.py, pytest backend/tests will import the module (running sys.path.insert(0, str(REPO_ROOT))) and report zero tests collected from it — harmless but potentially confusing in CI output. The docstring already says "this is an interactive evaluation harness, not a pass/fail unit test".

Consider either renaming (eval_refinement_samples.py), moving out of backend/tests/, or adding collect_ignore = ["test_refinement_samples.py"] to backend/tests/conftest.py so regular test runs don't touch it.
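💡 Minimal sketch of the `collect_ignore` route (a standard pytest conftest hook; the filename is the one from this finding):

```python
# backend/tests/conftest.py
# Keep pytest from importing the interactive eval harness during
# normal test runs; it defines no test_* functions anyway.
collect_ignore = ["test_refinement_samples.py"]
```

This keeps the file where it is while removing it from collection output, at the cost of one more line of conftest state to remember.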

Also applies to: 451-452

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/tests/test_refinement_samples.py` around lines 1 - 25, The test
harness file test_refinement_samples.py is being picked up by pytest because it
matches the test_*.py pattern; either rename or move the file (e.g.,
eval_refinement_samples.py or into a non-tests directory) so it is not
auto-collected, or add a pytest collection exclusion by appending the filename
to collect_ignore in backend/tests/conftest.py (use the collect_ignore
list/variable name) so pytest will skip importing test_refinement_samples.py
during regular test runs.

37-37: Iterable should come from collections.abc (Ruff UP035).

Preferred on Python 3.9+.

♻️ Fix
-from typing import Iterable, Optional
+from collections.abc import Iterable
+from typing import Optional
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/tests/test_refinement_samples.py` at line 37, Replace the
typing-sourced Iterable with the collections.abc one to satisfy Ruff UP035:
update the import so that Iterable is imported from collections.abc while
keeping Optional from typing (i.e. change the current "from typing import
Iterable, Optional" usage to "from collections.abc import Iterable" and "from
typing import Optional" or a combined import statement that reflects that
separation). This targets the import statement referencing Iterable and
Optional.
backend/routes/captures.py (1)

33-36: Chunked read offers no streaming benefit — simplify.

file.read(UPLOAD_CHUNK_SIZE) in a loop followed by b"".join(chunks) is functionally identical to await file.read(), since audio_bytes is held entirely in memory before being passed to create_capture. If you eventually want a real streaming path, it would need to pass an iterator through to the service, not pre-materialize.

♻️ Simplification
-    chunks = []
-    while chunk := await file.read(UPLOAD_CHUNK_SIZE):
-        chunks.append(chunk)
-    audio_bytes = b"".join(chunks)
+    audio_bytes = await file.read()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/captures.py` around lines 33 - 36, The chunked loop reading
into chunks and joining them is unnecessary because audio_bytes is fully
materialized; replace the while chunk := await file.read(UPLOAD_CHUNK_SIZE) /
chunks append / b"".join(chunks) logic with a single await file.read() to
produce audio_bytes and pass that to create_capture (or, if you want true
streaming later, change create_capture to accept and consume an async iterator
instead). Target symbols: file.read, UPLOAD_CHUNK_SIZE, chunks, audio_bytes,
create_capture.
backend/mcp_shim/__main__.py (1)

171-186: One transport hiccup kills the whole MCP session.

If _handle_request raises for any reason (connection reset mid-stream, malformed SSE from the server, httpx.ReadError, etc.), the outer except Exception returns 1 and the client loses its MCP server until the parent process respawns it. Consider catching and logging per-request so a single bad request doesn't take down long-lived sessions.

♻️ Proposed resilience
-                await _handle_request(
-                    client, url, line, forward_headers, session_id
-                )
+                try:
+                    await _handle_request(
+                        client, url, line, forward_headers, session_id
+                    )
+                except Exception as exc:
+                    _err(f"request failed, continuing: {exc!r}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/mcp_shim/__main__.py` around lines 171 - 186, The main loop currently
exits the whole process if any call to _handle_request raises; wrap the
per-request call in its own try/except so transient errors don’t kill the
session: inside the while True loop (around await _handle_request(client, url,
line, forward_headers, session_id)) catch Exception as exc, call _err with a
descriptive message including exc, then continue the loop; keep the existing
outer except for KeyboardInterrupt/SystemExit and the check that
_read_stdin_line() returning None exits with 0.
backend/mcp_server/tools.py (2)

133-141: Check base64 length before decoding.

b64.b64decode allocates the full decoded buffer before the length check on line 138, so a 270 MB base64 payload materializes ~200 MB before being rejected. Gate on the encoded length first (4 base64 chars ≈ 3 raw bytes) to reject cheaply.

♻️ Proposed fix
+        if len(audio_base64) > MAX_TRANSCRIBE_BYTES // 3 * 4 + 4:
+            raise ValueError(
+                f"Audio exceeds {MAX_TRANSCRIBE_BYTES // (1024 * 1024)} MB limit."
+            )
         try:
             raw = b64.b64decode(audio_base64, validate=True)
         except Exception as exc:
             raise ValueError(f"Invalid audio_base64: {exc}") from exc
-        if len(raw) > MAX_TRANSCRIBE_BYTES:
-            raise ValueError(
-                f"Audio exceeds {MAX_TRANSCRIBE_BYTES // (1024 * 1024)} MB limit."
-            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/mcp_server/tools.py` around lines 133 - 141, The code decodes
audio_base64 into raw before checking size, allowing huge memory allocation;
change the logic in the Base64 branch to first compute an estimated decoded
length from len(audio_base64) (use the 4:3 ratio and account for padding) and
compare it to MAX_TRANSCRIBE_BYTES, raising ValueError if too large, and only
then call b64.b64decode(audio_base64, validate=True) to produce raw (and keep
the existing exception handling and subsequent length check/cleanup around raw).

290-296: Replace whisper._is_model_cached() call with the public is_model_cached() API.

The code directly accesses a private backend method. Use the public is_model_cached(hf_repo) function from backend.backends.base instead, similar to how backend/routes/captures.py does it. To refactor, map model_size to the corresponding Hugging Face repository ID (following the pattern in backend implementations), then call the public API.
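💡 Shape of the mapping step — the repo IDs below are placeholders (the real ones live in the backend implementations), and `is_model_cached` is the public helper named above:

```python
# Hypothetical model_size -> Hugging Face repo mapping; real IDs belong
# next to the backend implementations, these are illustrative only.
WHISPER_REPOS = {
    "tiny": "openai/whisper-tiny",
    "base": "openai/whisper-base",
    "small": "openai/whisper-small",
    "large-v3": "openai/whisper-large-v3",
}

def whisper_repo_for(model_size: str) -> str:
    try:
        return WHISPER_REPOS[model_size]
    except KeyError:
        raise ValueError(f"Unknown Whisper model size: {model_size}") from None

# The private check then becomes:
#   is_model_cached(whisper_repo_for(model_size))
```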

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/mcp_server/tools.py` around lines 290 - 296, Current check uses the
private whisper._is_model_cached(model_size); change it to use the public
is_model_cached(hf_repo) API from backend.backends.base by mapping the local
model_size string to the corresponding Hugging Face repo ID (same
mapping/pattern used in backend implementations and backend/routes/captures.py),
then call is_model_cached(mapped_hf_repo) in place of whisper._is_model_cached;
keep the existing conditions using whisper.is_loaded() and whisper.model_size
but replace the private check with the mapped public API call (use the same
mapping logic found near other backend whisper model checks to derive the HF
repo ID).
backend/services/captures.py (1)

90-111: Minor style cleanups flagged by Ruff.

  • Line 90: _WHISPER_NATIVE_FORMATS is a function-local constant; Ruff N806 expects lowercase in function scope. Either hoist it to module scope next to VALID_SOURCES (more useful — it's conceptually a constant and won't be re-allocated on every call), or rename it.
  • Lines 108–111: try/except OSError: pass can be contextlib.suppress(OSError) for readability (SIM105).
♻️ Proposed diff
 VALID_SOURCES = {"dictation", "recording", "file"}
+_WHISPER_NATIVE_FORMATS = (".wav", ".mp3", ".flac", ".ogg")
@@
-    _WHISPER_NATIVE_FORMATS = (".wav", ".mp3", ".flac", ".ogg")
-
     if audio is None or sr is None:
@@
-        audio_path = config.get_captures_dir() / f"{capture_id}.wav"
-        sf.write(str(audio_path), audio, sr, format="WAV")
-        try:
-            raw_path.unlink()
-        except OSError:
-            pass
+        audio_path = config.get_captures_dir() / f"{capture_id}.wav"
+        sf.write(str(audio_path), audio, sr, format="WAV")
+        with contextlib.suppress(OSError):
+            raw_path.unlink()

Add import contextlib at the top.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/services/captures.py` around lines 90 - 111, Move the function-local
constant _WHISPER_NATIVE_FORMATS out to module scope (e.g., next to
VALID_SOURCES) or rename it to lowercase to satisfy Ruff N806, and replace the
try/except OSError: pass block that attempts raw_path.unlink() with
contextlib.suppress(OSError): raw_path.unlink() while adding an import for
contextlib at the top of the module so captures.py uses the suppress pattern for
readability (ensure you update references if you relocate the constant).
app/src/lib/api/types.ts (1)

193-201: Field name inconsistency between refine/retranscribe requests.

CaptureRefineRequest.model_size vs CaptureRetranscribeRequest.model — I understand one targets the LLM and the other Whisper, so the semantics differ, but consumers now have to remember which endpoint uses which key. Consider stt_model on retranscribe to match the backend's retranscribe_capture(stt_model=...) parameter and to parallel llm_model on the response type. Purely a naming nit; skip if the wire format is already locked.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/lib/api/types.ts` around lines 193 - 201, Rename the inconsistent
field on the retranscribe request to match backend parameter naming: change the
CaptureRetranscribeRequest property currently named model to stt_model (type
WhisperModelSize) and update all usages to pass/read stt_model (e.g., where
retranscribe_capture(stt_model=...) is called or the request is constructed);
this parallels the llm model naming used for refine responses and avoids
consumers having to remember different keys. Ensure type references and any
serialization/deserialization that expect model are updated to stt_model.
backend/backends/qwen_llm_backend.py (1)

238-246: Consider clearing MLX's Metal cache on unload.

del self.model; self.model = None drops the Python ref but MLX's Metal allocator may hold the GPU buffers until a subsequent allocation reuses them. For explicit memory recovery (important when users toggle between 0.6B / 1.7B / 4B or unload before switching to a TTS engine), mx.metal.clear_cache() is the complement to empty_device_cache(self.device) in the PyTorch path above.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/qwen_llm_backend.py` around lines 238 - 246, The
unload_model method currently deletes Python references to the MLX
model/tokenizer but doesn't free MLX's Metal GPU buffers; after setting
self.model and self.tokenizer to None in unload_model, call
mx.metal.clear_cache() (or the equivalent MLX API) to explicitly clear Metal
allocator caches so memory is released when switching models or before TTS
usage; update unload_model to import/use mx.metal.clear_cache() and run it after
the deletions and before logging "Qwen3 (MLX) unloaded".

Comment on lines +581 to +613
<div className="inline-flex">
<Button
variant="outline"
size="sm"
onClick={() => handlePlayAs()}
disabled={!playAsVoice || playAsMutation.isPending}
className={cn(
'gap-2 rounded-r-none border-r-0 pr-3 pl-2 transition-colors',
playbackState !== 'idle' &&
'border-accent/50 text-foreground bg-accent/10 hover:bg-accent/15',
)}
>
{playbackState === 'generating' ? (
<>
<Loader2 className="h-3.5 w-3.5 animate-spin" />
{t('captures.actions.playAsGenerating')}
</>
) : playbackState === 'playing' ? (
<>
<Square className="h-3 w-3 fill-current" />
{playAsVoice
? t('captures.actions.playAsStop', { name: playAsVoice.name })
: t('captures.actions.playAsStopFallback')}
</>
) : (
<>
<Volume2 className="h-3.5 w-3.5" />
{playAsVoice
? t('captures.actions.playAs', { name: playAsVoice.name })
: t('captures.actions.playAsFallback')}
</>
)}
</Button>

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -nP --type=ts -C5 'interface PlayerState|playerStore|setAudioWithAutoPlay|clearAudio|stopAudio' -g 'app/src/stores/**'

Repository: jamiepine/voicebox

Length of output: 1987


🏁 Script executed:

#!/bin/bash
# Get the full playerStore to see all methods
cat app/src/stores/playerStore.ts



🏁 Script executed:

#!/bin/bash
# Get handlePlayAs function and surrounding code from CapturesTab
sed -n '290,320p' app/src/components/CapturesTab/CapturesTab.tsx



🏁 Script executed:

#!/bin/bash
# Check if there are any stop/clear methods elsewhere in playerStore
rg -n 'clear|stop|pause' app/src/stores/playerStore.ts -i



🏁 Script executed:

#!/bin/bash
# Check playbackState state variable initialization and transitions
sed -n '1,250p' app/src/components/CapturesTab/CapturesTab.tsx | grep -n "playbackState\|setPlaybackState" | head -20



🏁 Script executed:

#!/bin/bash
# Check the complete button onClick and surrounding context around line 585
sed -n '580,610p' app/src/components/CapturesTab/CapturesTab.tsx



"Stop · {voice}" button re-triggers generation instead of stopping playback.

When playbackState === 'playing', the button renders a Square icon and "Stop · {voice}" label, but its onClick={() => handlePlayAs()} always calls handlePlayAs(), which kicks off another playAsMutation. There is no branch that clears the player or pauses audio when the state is 'playing', so clicking the stop-labeled button will start a brand-new TTS generation rather than halting the current one.

Wire a stop path by checking playbackState in the onClick handler. The usePlayerStore provides a reset() method that clears audioUrl and resets playback state:

🔧 Proposed fix
              <Button
                variant="outline"
                size="sm"
-               onClick={() => handlePlayAs()}
+               onClick={() => {
+                 if (playbackState === 'playing') {
+                   usePlayerStore.setState({ audioUrl: null, isPlaying: false });
+                   setPlaybackState('idle');
+                   return;
+                 }
+                 handlePlayAs();
+               }}
                disabled={!playAsVoice || playAsMutation.isPending}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/components/CapturesTab/CapturesTab.tsx` around lines 581 - 613, The
stop button currently always calls handlePlayAs() which re-triggers
playAsMutation; update the Button onClick to branch on playbackState so that
when playbackState === 'playing' it calls the player reset method from
usePlayerStore (reset()) to stop/clear audio, and otherwise calls handlePlayAs()
to start generation; ensure the handler references playbackState, reset(), and
playAsMutation consistently so clicking the "Stop · {voice}" button halts
playback instead of kicking off a new TTS generation.
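
A minimal sketch of the branch the fix calls for — a pure helper (names hypothetical) that maps the current playback state to the action the click should take, keeping the stop/generate decision testable outside the component:

```typescript
type PlaybackState = "idle" | "generating" | "playing";
type ClickAction = "stop" | "generate";

// When audio is already playing, the split button should stop it;
// in any other state it should kick off a new TTS generation.
function onPlayAsClick(state: PlaybackState): ClickAction {
  return state === "playing" ? "stop" : "generate";
}
```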

Comment thread app/src/stores/uiStore.ts Outdated
Comment thread backend/app.py
Comment on lines +72 to +79
from .mcp_server.server import build_mcp_server, compose_lifespan
from .mcp_server.context import ClientIdMiddleware

# Build the MCP app up-front so we can wire its lifespan into FastAPI's —
# FastMCP's Streamable HTTP transport only works if its session manager
# runs inside the parent ASGI lifespan.
mcp = build_mcp_server()
mcp_app = mcp.http_app(path="/", transport="http")

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

FastMCP http_app path transport parameter Streamable HTTP mount FastAPI lifespan

💡 Result:

FastMCP is a Python library for building MCP (Model Context Protocol) servers. The http_app method creates a Starlette ASGI application for Streamable HTTP transport (default for HTTP). Key elements: - path parameter: Specifies the internal MCP endpoint path within the ASGI app (default /mcp). Example: mcp_app = mcp.http_app(path="/api/mcp/"). When mounting, set path="/" if mounting at /mcp to avoid double-prefixing (e.g., /mcp/mcp). - transport parameter: Primarily used in mcp.run(transport="http") for Streamable HTTP. http_app uses Streamable HTTP by default; legacy SSE via transport="sse". - Streamable HTTP mount in FastAPI: Mount the ASGI app with app.mount("/mcp", mcp_app). Critical: Pass the MCP app's lifespan to FastAPI for session manager initialization. Example: from fastapi import FastAPI from fastmcp import FastMCP from fastmcp.utilities.lifespan import combine_lifespans mcp = FastMCP("MyServer") mcp_app = mcp.http_app(path="/") # Root since mounting at /mcp app = FastAPI(lifespan=mcp_app.lifespan) # Or combine_lifespans(your_lifespan, mcp_app.lifespan) app.mount("/mcp", mcp_app) - FastAPI lifespan: FastAPI uses lifespan for startup/shutdown. MCP's http_app provides a lifespan context manager (initializes session manager). Use combine_lifespans for multiple. Missing lifespan causes "Task group not initialized" errors. Run with uvicorn app:app --host 0.0.0.0 --port 8000. MCP endpoint at http://localhost:8000/mcp. Official docs emphasize: mount prefix in base_url, internal path separate; always handle lifespan when mounting.


🏁 Script executed:

# First, find dependency files to check FastMCP version
fd -e 'requirements.txt' -o -e 'pyproject.toml' -o -e 'poetry.lock' -o -e 'setup.py' | head -20



🏁 Script executed:

# Look for FastMCP version specification
find . -type f \( -name "requirements*.txt" -o -name "pyproject.toml" -o -name "setup.py" \) | xargs grep -l "fastmcp" 2>/dev/null || echo "No matches found"



🏁 Script executed:

# Check the backend/app.py file around the context to understand the full usage
head -90 backend/app.py | tail -25



🏁 Script executed:

# Check FastMCP version in requirements.txt
cat backend/requirements.txt | grep -i fastmcp



🏁 Script executed:

# See the full create_app function to understand lifespan composition
cat -n backend/app.py | sed -n '50,120p'



🏁 Script executed:

# Check what compose_lifespan does
cat -n backend/mcp_server/server.py | head -100



🌐 Web query:

FastMCP 3.0 http_app method signature parameters

💡 Result:

FastMCP 3.0 (current docs/main) http_app parameters are: http_app(self, path: str = "/mcp", *, middleware: list[starlette.middleware.base.Middleware] | None = None, event_store: EventStore | None = None, retry_interval: int | None = None, auth: AuthProvider | None = None, json_response: bool = False, stateless_http: bool = False, debug: bool = False, routes: list[starlette.routing.BaseRoute] | None = None) What they mean (from the StreamableHTTP app factory docs/source, which backs the HTTP ASGI app created by http_app): - path: the URL path used for the StreamableHTTP/MCP endpoint. - middleware: extra Starlette middleware to add. - event_store: optional event store enabling resumability/polling. - retry_interval: milliseconds between reconnect attempts for polling/resumability (only meaningful with an event_store). - auth: optional auth provider. - json_response: use JSON responses instead of SSE-style streaming for StreamableHTTP. - stateless_http: whether to create a new transport per request. - debug: enable debug mode. - routes: additional custom Starlette routes to include with the MCP server. Note: If you just call mcp.http_app without parameters, the HTTP deployment docs show it returns an ASGI app, and you can optionally pass path=... (and middleware=...).


🏁 Script executed:

# Check if there's any other reference to http_app in the codebase to understand usage
rg "http_app" -B 2 -A 2



🏁 Script executed:

# Look for any comments or documentation about the transport parameter in the codebase
rg "transport.*http|http.*transport" -i



Remove the invalid transport="http" parameter from mcp.http_app() — it's not a valid parameter in FastMCP 3.x.

FastMCP 3.x http_app(path="/", ...) doesn't accept a transport parameter; it creates a Streamable HTTP ASGI app by default. The transport kwarg only applies to mcp.run() for CLI deployments. The correct call is mcp.http_app(path="/"). The subsequent mount pattern and lifespan composition (mcp_app.router.lifespan_context) are correct per FastMCP 3.x integration docs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app.py` around lines 72 - 79, The call to mcp.http_app(...) is using
an invalid transport="http" kwarg; remove that parameter and call
mcp.http_app(path="/") instead. Locate the build_mcp_server() instantiation and
the mcp_app = mcp.http_app(...) line, update the call to omit transport so
mcp_app is created via mcp.http_app(path="/"), and keep the existing lifespan
wiring (mcp_app.router.lifespan_context) unchanged.

Comment on lines +104 to +118
with model_load_progress(progress_model_name, is_cached):
logger.info("Loading Qwen3 %s on %s...", model_size, self.device)
with force_offline_if_cached(is_cached, progress_model_name):
self.tokenizer = AutoTokenizer.from_pretrained(repo)
dtype = torch.float16 if self.device in ("cuda", "mps") else torch.float32
self.model = AutoModelForCausalLM.from_pretrained(
repo,
torch_dtype=dtype,
)
self.model.to(self.device)
self.model.eval()

self._current_model_size = model_size
self.model_size = model_size
logger.info("Qwen3 %s loaded successfully", model_size)

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm installed transformers version to decide whether dtype= is preferred.
fd -t f 'pyproject.toml|requirements*.txt|setup.py|setup.cfg' backend | xargs rg -nP 'transformers\b'



🌐 Web query:

transformers AutoModelForCausalLM torch_dtype vs dtype deprecation

💡 Result:

In Hugging Face Transformers, torch_dtype in from_pretrained(...) has been deprecated in favor of a new keyword argument dtype. For causal LMs loaded via AutoModelForCausalLM.from_pretrained, you should use dtype=... (e.g., dtype=torch.float16 or dtype="auto") instead of torch_dtype=.... What this means in practice 1) Preferred API - Use dtype: - model = AutoModelForCausalLM.from_pretrained(repo_id, dtype=torch.float16, device_map="auto") - model = AutoModelForCausalLM.from_pretrained(repo_id, dtype="auto", device_map="auto") - The dtype parameter controls which PyTorch dtype the model weights are instantiated in; dtype="auto" scans config.json for a dtype or legacy torch_dtype entry and loads weights in that format. If config.json lacks this information, Transformers inspects the first floating-point weight in the checkpoint and adopts its data type. 2) Backward compatibility - torch_dtype still works (forwards to dtype) but triggers a deprecation warning: "torch_dtype is deprecated! Use dtype instead!" - Update your code to use dtype to avoid the warning. 3) Timeline - Deprecation introduced in Transformers PR #39782 (July 2025). - Warnings appear when loading models (e.g., in diffusers pipelines using Transformers models). - Transformers v5.0.0rc1+ defaults dtype="auto" for from_pretrained (loads model in saved dtype, not always fp32). Examples import torch from transformers import AutoModelForCausalLM # Recommended model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", dtype=torch.bfloat16) model = AutoModelForCausalLM.from_pretrained("google/gemma-7b", dtype="auto") # Legacy (warns) model = AutoModelForCausalLM.from_pretrained("google/gemma-7b", torch_dtype=torch.float16)


Replace torch_dtype with dtype to avoid deprecation warnings.

In transformers 4.41+, from_pretrained(..., torch_dtype=...) triggers a FutureWarning in favor of dtype=.... Since the project pins transformers up to 4.57.6, this will emit warnings on every model load.

Change line 109 from:

torch_dtype=dtype,

to:

dtype=dtype,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/qwen_llm_backend.py` around lines 104 - 118, The call to
AutoModelForCausalLM.from_pretrained in the model loading block uses the
deprecated torch_dtype kwarg and should use dtype instead to avoid warnings; in
the with model_load_progress(...) / force_offline_if_cached(...) section update
the from_pretrained invocation (referenced as
AutoModelForCausalLM.from_pretrained) to pass the local dtype variable via the
dtype parameter rather than torch_dtype so the model is loaded without
deprecation warnings while keeping tokenizer and device code (self.tokenizer,
self.model.to(self.device), self.model.eval()) unchanged.

Comment on lines +40 to +60
def mount_into(
app: FastAPI,
*,
extra_startup: Callable[[], None] | None = None,
) -> None:
"""Attach the MCP app to ``app`` at ``/mcp`` and install the client-id middleware.

``extra_startup`` — if provided, runs during the FastAPI lifespan. This
is the hook that lets ``app.py`` keep its existing startup/shutdown
bodies while also driving FastMCP's session manager.
"""
mcp = build_mcp_server()
mcp_app = mcp.http_app(path="/", transport="http")

# ClientIdMiddleware must run before FastMCP so the ContextVar is set
# by the time tool handlers execute. Starlette composes middlewares
# outermost-first, so adding here on the parent app is correct.
app.add_middleware(ClientIdMiddleware)
app.mount("/mcp", mcp_app)
app.state.mcp_lifespan = mcp_app.router.lifespan_context
logger.info("MCP: mounted at /mcp (FastMCP %s)", getattr(mcp, "version", ""))

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find callers of mount_into to see whether extra_startup is passed anywhere.
rg -nP --type=py -C3 '\bmount_into\s*\('



🏁 Script executed:

#!/bin/bash
# Search for all calls to mount_into (not just definition)
rg -nP --type=py 'mount_into' | head -20



🏁 Script executed:

#!/bin/bash
# Read the full mount_into and compose_lifespan functions to check if extra_startup is referenced
sed -n '40,80p' backend/mcp_server/server.py



🏁 Script executed:

#!/bin/bash
# Search for mount_into calls in backend/app.py and other files
rg -nP --type=py -B2 -A2 'mount_into\(' backend/



🏁 Script executed:

#!/bin/bash
# Search more broadly for mount_into references without the opening paren
rg -nP --type=py 'mount_into' backend/ | grep -v 'def mount_into\|from.*mount_into\|__all__'



🏁 Script executed:

#!/bin/bash
# Check backend/app.py specifically for mount_into usage
if [ -f backend/app.py ]; then
  echo "=== backend/app.py ==="
  cat backend/app.py | head -100
fi



🏁 Script executed:

#!/bin/bash
# Read the rest of create_app function
sed -n '75,150p' backend/app.py



extra_startup parameter is declared and documented but never invoked.

The docstring states extra_startup "runs during the FastAPI lifespan", but the function body never references it. Either wire it into the lifespan or remove the parameter and its doc paragraph to avoid misleading callers.

🛠️ Option A — remove the dead parameter
-def mount_into(
-    app: FastAPI,
-    *,
-    extra_startup: Callable[[], None] | None = None,
-) -> None:
-    """Attach the MCP app to ``app`` at ``/mcp`` and install the client-id middleware.
-
-    ``extra_startup`` — if provided, runs during the FastAPI lifespan. This
-    is the hook that lets ``app.py`` keep its existing startup/shutdown
-    bodies while also driving FastMCP's session manager.
-    """
+def mount_into(app: FastAPI) -> None:
+    """Attach the MCP app to ``app`` at ``/mcp`` and install the client-id middleware."""

Option B — actually run it (e.g., wrap app.state.mcp_lifespan so extra_startup() is called inside the composed lifespan). The current pattern in create_app uses compose_lifespan for this; match whichever approach is intended here.
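
Option B could look roughly like the following — a framework-free sketch of a compose_lifespan-style helper that runs `extra_startup` before driving the inner (FastMCP) lifespan. All names here are hypothetical stand-ins, not the project's actual API:

```python
import asyncio
import contextlib

@contextlib.asynccontextmanager
async def composed_lifespan(app, inner_lifespan, extra_startup=None):
    # Run the extra startup hook first, then drive the inner lifespan
    # (e.g. FastMCP's session manager) for the lifetime of the app.
    if extra_startup is not None:
        extra_startup()
    async with inner_lifespan(app):
        yield

# Demo with a stand-in inner lifespan that records its phases:
events = []

@contextlib.asynccontextmanager
async def inner(app):
    events.append("inner-start")
    yield
    events.append("inner-stop")

async def main():
    async with composed_lifespan(None, inner, extra_startup=lambda: events.append("extra")):
        events.append("running")

asyncio.run(main())
print(events)  # ['extra', 'inner-start', 'running', 'inner-stop']
```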

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def mount_into(
app: FastAPI,
*,
extra_startup: Callable[[], None] | None = None,
) -> None:
"""Attach the MCP app to ``app`` at ``/mcp`` and install the client-id middleware.
``extra_startup`` — if provided, runs during the FastAPI lifespan. This
is the hook that lets ``app.py`` keep its existing startup/shutdown
bodies while also driving FastMCP's session manager.
"""
mcp = build_mcp_server()
mcp_app = mcp.http_app(path="/", transport="http")
# ClientIdMiddleware must run before FastMCP so the ContextVar is set
# by the time tool handlers execute. Starlette composes middlewares
# outermost-first, so adding here on the parent app is correct.
app.add_middleware(ClientIdMiddleware)
app.mount("/mcp", mcp_app)
app.state.mcp_lifespan = mcp_app.router.lifespan_context
logger.info("MCP: mounted at /mcp (FastMCP %s)", getattr(mcp, "version", ""))
def mount_into(app: FastAPI) -> None:
"""Attach the MCP app to ``app`` at ``/mcp`` and install the client-id middleware."""
mcp = build_mcp_server()
mcp_app = mcp.http_app(path="/", transport="http")
# ClientIdMiddleware must run before FastMCP so the ContextVar is set
# by the time tool handlers execute. Starlette composes middlewares
# outermost-first, so adding here on the parent app is correct.
app.add_middleware(ClientIdMiddleware)
app.mount("/mcp", mcp_app)
app.state.mcp_lifespan = mcp_app.router.lifespan_context
logger.info("MCP: mounted at /mcp (FastMCP %s)", getattr(mcp, "version", ""))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/mcp_server/server.py` around lines 40 - 60, The declared
extra_startup parameter on mount_into is never used; either remove it and its
doc paragraph or wire it into the FastAPI lifespan so it actually runs. To fix
by wiring: when you set app.state.mcp_lifespan =
mcp_app.router.lifespan_context, if extra_startup is not None replace/compose
that lifespan with one that runs extra_startup during startup (use the existing
compose_lifespan helper or the same pattern used in create_app) so mount_into
installs the composed lifespan that calls extra_startup; alternatively delete
the extra_startup parameter, its docstring mention, and any callers relying on
it. Ensure you update the mount_into signature/comments to match the chosen
approach.
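
The middleware-ordering note above (ClientIdMiddleware must set the ContextVar before tool handlers run) boils down to standard contextvars usage. A framework-free sketch, with all names hypothetical:

```python
import contextvars

client_id_var = contextvars.ContextVar("client_id", default=None)

def middleware(headers: dict, next_handler):
    # Stash the header value in a ContextVar so handlers further down
    # the call chain can read it without explicit parameter plumbing.
    token = client_id_var.set(headers.get("X-Voicebox-Client-Id"))
    try:
        return next_handler()
    finally:
        client_id_var.reset(token)

def tool_handler():
    return client_id_var.get()

result = middleware({"X-Voicebox-Client-Id": "abc123"}, tool_handler)
print(result)  # abc123
```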

Comment on lines +120 to +131
# Absolute-path mode: validate and transcribe in place.
if audio_path is not None:
path = Path(audio_path)
if not path.is_absolute():
raise ValueError("`audio_path` must be absolute.")
if not path.is_file():
raise ValueError(f"File not found: {audio_path}")
if path.stat().st_size > MAX_TRANSCRIBE_BYTES:
raise ValueError(
f"File exceeds {MAX_TRANSCRIBE_BYTES // (1024 * 1024)} MB limit."
)
return await _transcribe_file(path, language, model)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# MCP mount and auth middleware
rg -nP --type=py -C4 '(mount|add_middleware|ClientId|CORSMiddleware|allow_origins)' -g '**/mcp_server/**' -g '**/app.py'
# Network-access / host-binding config
rg -nP --type=py -C3 '(host\s*=|uvicorn\.(run|Server)|0\.0\.0\.0|network[_-]?access)'



🏁 Script executed:

#!/bin/bash
# Check default CORS origins
rg -nP --type=py -B2 -A4 'default_origins' backend/app.py

# Check ClientIdMiddleware implementation - is it enforcing auth or just copying?
rg -nP --type=py -A20 'class ClientIdMiddleware' backend/mcp_server/context.py

# Check if there's a network-access feature that gates the host binding
rg -nP --type=py 'network.access|networkAccess' -g '**/*.py'

# Confirm audio_path validation in the tools.py file
head -150 backend/mcp_server/tools.py | tail -50



🏁 Script executed:

#!/bin/bash
# Search for client_id validation in tools
rg -nP --type=py 'current_client_id|CLIENT_ID' backend/mcp_server/tools.py -A3 -B3

# Check if any tool implementation validates or requires a client_id
rg -nP --type=py '@mcp.tool' backend/mcp_server/tools.py -A25 | head -100



Arbitrary local-file read via audio_path without authentication.

voicebox.transcribe accepts any absolute path and will transcribe up to 200 MB of the file's decoded audio into the tool response, with no client authentication required. Combined with the ability to run the backend on 0.0.0.0 (via the --host CLI argument), this becomes a remote "read any audio-decodable file" primitive, scoped only by what the backend's OS user can read.

Notably, unlike voicebox.speak, the transcribe tool does not validate or require a X-Voicebox-Client-Id header — ClientIdMiddleware only copies the header into a ContextVar if present but does not reject unauthenticated requests.

Mitigations worth considering:

  • Confine audio_path to a configured allow-list of roots (e.g. user captures dir, $HOME/Music) — anything outside rejected before stat.
  • Require audio_base64 when the server is exposed on 0.0.0.0, and reject audio_path outright.
  • Gate transcribe on authenticated client bindings like voicebox.speak does.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/mcp_server/tools.py` around lines 120 - 131, The transcribe endpoint
currently allows any absolute audio_path (in the transcribe handler that calls
_transcribe_file) which enables arbitrary local-file reads; restrict this by
validating audio_path.resolve() is under a configured allow-list of root
directories (e.g., AUDIO_ALLOWED_ROOTS or a single AUDIO_BASE_DIR) before stat()
and reject otherwise, and additionally require an authenticated client (check
for the presence of ClientIdMiddleware/ X-Voicebox-Client-Id in the request
context) when audio_path is provided; implement these checks in the same block
that currently validates path.is_absolute() and path.is_file(), returning an
error if the path is outside allowed roots or if the client id is missing (or
alternatively force use of audio_base64 when the server is bound to 0.0.0.0).
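
The allow-list mitigation could be sketched like this (root list and helper name are hypothetical, not the project's API):

```python
from pathlib import Path

def is_under_allowed_root(audio_path: str, allowed_roots: list[Path]) -> bool:
    # Resolve symlinks and `..` before comparing, so a path like
    # /allowed/../etc/passwd cannot sneak past a naive prefix check.
    resolved = Path(audio_path).resolve()
    for root in allowed_roots:
        try:
            resolved.relative_to(root.resolve())
            return True
        except ValueError:
            continue
    return False

roots = [Path.home() / "Music"]  # hypothetical configured allow-list
print(is_under_allowed_root("/etc/passwd", roots))  # False
```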

Comment thread backend/routes/llm.py
Comment on lines +42 to +49
raise HTTPException(
status_code=202,
detail={
"message": f"Qwen3 {model_size} is being downloaded. Please wait and try again.",
"model_name": progress_model_name,
"downloading": True,
},
)

⚠️ Potential issue | 🟠 Major

HTTPException(status_code=202, …) is a semantic anti-pattern — it returns an error-shaped body for a success status.

HTTPException is intended for 4xx/5xx. FastAPI wraps its detail payload as {"detail": {...}}, so callers polling this endpoint receive an error-shape envelope on what is supposed to be a successful "download started, try again later" response. That bakes an awkward contract in: clients have to parse response.json()["detail"]["downloading"] on a 202, which reads like an error.

Return a regular JSONResponse instead so the 202 body is the structured payload directly:

♻️ Proposed fix
-from fastapi import APIRouter, HTTPException
+from fastapi import APIRouter, HTTPException
+from fastapi.responses import JSONResponse
         task_manager.start_download(progress_model_name)
         create_background_task(download_llm_background())
 
-        raise HTTPException(
+        return JSONResponse(
             status_code=202,
-            detail={
+            content={
                 "message": f"Qwen3 {model_size} is being downloaded. Please wait and try again.",
                 "model_name": progress_model_name,
                 "downloading": True,
             },
         )

Note that the return type implicitly changes from models.LLMGenerateResponse to Union[LLMGenerateResponse, JSONResponse], so you may also want to relax the response model via response_model=None on the decorator, or split this into two routes. Either way the 202 case no longer masquerades as an error.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
raise HTTPException(
status_code=202,
detail={
"message": f"Qwen3 {model_size} is being downloaded. Please wait and try again.",
"model_name": progress_model_name,
"downloading": True,
},
)
return JSONResponse(
status_code=202,
content={
"message": f"Qwen3 {model_size} is being downloaded. Please wait and try again.",
"model_name": progress_model_name,
"downloading": True,
},
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/llm.py` around lines 42 - 49, Replace the HTTPException call
that returns a 202 with a normal JSONResponse so the body is a success-shaped
payload; specifically, where HTTPException(status_code=202, detail={...}) is
raised, construct and return fastapi.responses.JSONResponse(status_code=202,
content={"message": f"Qwen3 {model_size} is being downloaded. Please wait and
try again.", "model_name": progress_model_name, "downloading": True}); also
update the route signature/decorator to allow this alternate return (either set
response_model=None on the endpoint decorator or widen the return type from
models.LLMGenerateResponse to Union[models.LLMGenerateResponse, JSONResponse] or
split into two routes) so the 202 path does not rely on HTTPException semantics.

Comment on lines +18 to +36
@router.put("/captures", response_model=models.CaptureSettingsResponse)
async def update_capture_settings_endpoint(
patch: models.CaptureSettingsUpdate,
db: Session = Depends(get_db),
):
return settings_service.update_capture_settings(db, patch.model_dump(exclude_none=True))


@router.get("/generation", response_model=models.GenerationSettingsResponse)
async def get_generation_settings_endpoint(db: Session = Depends(get_db)):
return settings_service.get_generation_settings(db)


@router.put("/generation", response_model=models.GenerationSettingsResponse)
async def update_generation_settings_endpoint(
patch: models.GenerationSettingsUpdate,
db: Session = Depends(get_db),
):
return settings_service.update_generation_settings(db, patch.model_dump(exclude_none=True))

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -nP -C5 'def update_capture_settings|def update_generation_settings' -g 'backend/services/settings*.py'
rg -nP -C3 'default_playback_voice_id' --type=ts -g 'app/src/**'



exclude_none=True + service's if value is not None check together prevent clearing nullable fields.

The UI allows users to clear default_playback_voice_id by sending null (see MCPPage.tsx:108), but the endpoint strips it with exclude_none=True, and the service ignores null values with if value is not None (backend/services/settings.py:49, 64). Users cannot actually clear the field.

Fix: Use model_dump(exclude_unset=True) to preserve explicit null values, then update the service to distinguish between missing keys (don't update) and explicit null (clear the field).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/settings.py` around lines 18 - 36, Change the endpoint payload
serialization to preserve explicit nulls by replacing
model_dump(exclude_none=True) with model_dump(exclude_unset=True) in
update_capture_settings_endpoint and update_generation_settings_endpoint so
client-sent nulls reach the service, and then modify the service methods
settings_service.update_capture_settings and
settings_service.update_generation_settings to differentiate missing keys (do
not change the DB) from explicit null values (clear the nullable field) instead
of using the current if value is not None checks that ignore nulls.
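The service-side half of that fix can be sketched in a few lines of plain Python — a dict stands in for the ORM row, and the field names are illustrative, not the real settings schema:

```python
# Distinguish "key absent from the patch" (leave the row alone) from
# "key present but None" (clear the nullable column). This is what the
# endpoint's switch to model_dump(exclude_unset=True) makes possible:
# explicit nulls survive serialization and reach this function.

NULLABLE_FIELDS = {"default_playback_voice_id"}  # hypothetical nullable column

def apply_patch(row: dict, patch: dict) -> dict:
    for key, value in patch.items():
        if value is None and key not in NULLABLE_FIELDS:
            continue  # ignore nulls on fields that can't be cleared
        row[key] = value  # explicit None clears a nullable field
    return row

row = {"default_playback_voice_id": "voice-a", "stt_model": "small"}
# The client sent an explicit null for the voice id and nothing else.
apply_patch(row, {"default_playback_voice_id": None})
assert row["default_playback_voice_id"] is None
assert row["stt_model"] == "small"  # untouched: key was absent from the patch
```

With the old exclude_none=True serialization, the patch dict above would have arrived empty and the clear request would have been silently dropped.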

Comment on lines +73 to +130
raw_path = config.get_captures_dir() / f"{capture_id}{suffix}"
raw_path.write_bytes(audio_bytes)

# Decode once with librosa — its audioread fallback handles webm/opus
# via ffmpeg, which miniaudio (used inside mlx-audio's whisper) can't.
# The decoded array gives us an accurate duration and becomes the
# canonical WAV we hand to whisper.
try:
audio, sr = load_audio(str(raw_path))
duration_ms = int((len(audio) / sr) * 1000) if sr else None
except Exception as decode_err:
logger.warning(
"Could not decode capture %s (%s): %r", capture_id, suffix, decode_err
)
audio, sr = None, None
duration_ms = None

_WHISPER_NATIVE_FORMATS = (".wav", ".mp3", ".flac", ".ogg")

if audio is None or sr is None:
# Decode failed. Only pass the file straight to whisper if the
# source is a format its miniaudio loader can still read — webm,
# m4a, etc. would just 500 later. Surface a clean error instead.
if suffix not in _WHISPER_NATIVE_FORMATS:
raise ValueError(
f"Could not decode {suffix} audio — the recording may be empty or corrupt"
)
audio_path = raw_path
elif suffix == ".wav":
audio_path = raw_path
else:
# Transcode to WAV so downstream loaders (miniaudio, soundfile) work
# regardless of what format the client shipped.
audio_path = config.get_captures_dir() / f"{capture_id}.wav"
sf.write(str(audio_path), audio, sr, format="WAV")
try:
raw_path.unlink()
except OSError:
pass

whisper = get_whisper_model()
resolved_stt = stt_model or whisper.model_size
transcript = await whisper.transcribe(str(audio_path), language, resolved_stt)

row = DBCapture(
id=capture_id,
audio_path=config.to_storage_path(audio_path),
source=source,
language=language,
duration_ms=duration_ms,
transcript_raw=transcript,
stt_model=resolved_stt,
)
db.add(row)
db.commit()
db.refresh(row)

return _to_response(row)

⚠️ Potential issue | 🟠 Major

Orphan audio files on STT or transcode failure.

raw_path.write_bytes(...) at line 74 (and the transcoded .wav at line 107) lands on disk before db.commit() at line 127. If sf.write, whisper.transcribe, or the commit itself raises, no row is persisted but the audio files remain in data/captures forever. delete_capture can never reach them since there's no DB row, so this accumulates silently over time and especially hurts long-running daemons that hit STT errors (OOM, cancellation, bad input).

🛡️ Proposed fix — track and clean up on failure
     raw_path = config.get_captures_dir() / f"{capture_id}{suffix}"
     raw_path.write_bytes(audio_bytes)
+    written_paths: list[Path] = [raw_path]
+    committed = False
+    try:
+        # ... existing decode / transcode / transcribe / db.add / db.commit ...
+        # after each new file written (e.g. the transcoded .wav), append to written_paths
+        committed = True
+        return _to_response(row)
+    finally:
+        if not committed:
+            for p in written_paths:
+                try:
+                    p.unlink(missing_ok=True)
+                except OSError:
+                    logger.exception("Failed to clean up orphan capture audio %s", p)

(Append audio_path to written_paths right after sf.write at line 107 so the transcoded WAV is also swept up on later failures.)

🧰 Tools
🪛 Ruff (0.15.11)

[warning] 90-90: Variable _WHISPER_NATIVE_FORMATS in function should be lowercase

(N806)


[warning] 108-111: Use contextlib.suppress(OSError) instead of try-except-pass

Replace try-except-pass with with contextlib.suppress(OSError): ...

(SIM105)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/services/captures.py` around lines 73 - 130, The code writes raw_path
and may write a transcoded WAV before DB commit, so failures in sf.write,
whisper.transcribe, or db.commit leak files; fix by tracking all filesystem
writes (append raw_path and any transcoded audio_path to a written_paths list
right after raw_path.write_bytes(...) and after sf.write(...)) and wrap the
remaining logic (transcode, whisper.transcribe, db.add/db.commit) in a
try/except/finally that on exception or on returning without a DB row removes
any paths in written_paths (using unlink with OSError ignored); update
functions/symbols involved: raw_path.write_bytes, sf.write, whisper.transcribe,
db.commit, and ensure cleanup runs before re-raising the original exception.
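The cleanup shape the proposed fix describes — track every file written, sweep them all if anything raises before the commit — can be sketched standalone (save_capture and the commit callable are hypothetical stand-ins for the real decode/transcode/transcribe/db.commit sequence):

```python
import tempfile
from pathlib import Path

def save_capture(captures_dir: Path, capture_id: str, audio_bytes: bytes, commit) -> Path:
    """Write audio to disk, run `commit`, and remove every file we created
    if anything raises before the commit succeeds."""
    written: list[Path] = []
    committed = False
    try:
        raw = captures_dir / f"{capture_id}.webm"
        raw.write_bytes(audio_bytes)
        written.append(raw)  # append after every filesystem write
        commit(raw)          # may raise — e.g. transcode, STT, or db.commit failure
        committed = True
        return raw
    finally:
        if not committed:
            for p in written:
                p.unlink(missing_ok=True)  # sweep orphans; exception still propagates

with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    try:
        save_capture(root, "cap1", b"\x00", commit=lambda p: 1 / 0)
    except ZeroDivisionError:
        pass
    assert not list(root.iterdir())  # no orphan audio left behind
```

The try/finally keeps the happy path unchanged while guaranteeing the sweep runs before the original exception propagates, which is the property the review comment asks for.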

Comment on lines +140 to +150
{
name: 'Qwen 3.5',
author: 'Alibaba',
sizes: ['4B', '2B', '0.8B'],
description:
'Powers transcript cleanup, persona voice replies, and the voice I/O loop. Shares its runtime with the TTS/STT stack — one model cache, one GPU story.',
tags: [
{ icon: Sparkles, label: 'Refinement' },
{ icon: Brain, label: 'Persona replies' },
],
},

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Identify which Qwen LLM the backend actually loads for refinement/personality.
rg -nP --type=py -C3 '\bqwen\w*3(\.5)?\b|Qwen3(\.5)?|qwen3(\.5)?_llm|get_llm_backend' 
rg -nP --type=py -C3 'model_name|MODEL_ID|hf_repo|huggingface' backend/backends backend/services 2>/dev/null
fd -e py -e md -e mdx 'qwen' | head -50

Repository: jamiepine/voicebox

Length of output: 50374


Correct the LLM model name and size options in the landing page to match the backend.

The backend loads Qwen3 (sizes 0.6B, 1.7B, 4B), but the SupportedModels card advertises "Qwen 3.5" (sizes 0.8B, 2B, 4B). These are inconsistent—the landing page will mislead users about model capabilities and available size options. Either update the card to display "Qwen3" with the correct backend sizes, or confirm the backend has moved to Qwen3.5 and update all model references, HuggingFace repo IDs, and backend code accordingly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@landing/src/components/SupportedModels.tsx` around lines 140 - 150, Update
the SupportedModels card to match the backend: replace the 'Qwen 3.5' entry in
SupportedModels.tsx (the object with name: 'Qwen 3.5' and sizes:
['4B','2B','0.8B']) so it either becomes 'Qwen3' with sizes ['0.6B','1.7B','4B']
to match the backend, or, if you intend to standardize on Qwen3.5, update all
backend references (model IDs, loader code) to Qwen3.5 consistently;
specifically modify the model object’s name and sizes arrays in
SupportedModels.tsx to reflect the chosen canonical model and sizes.

jamiepine and others added 2 commits April 23, 2026 19:41
…all green

Two small follow-ups to the sidebar checklist placement. Move it below
the What's different section so the sticky top of the sidebar stays the
page's narrative context (About → differences) and the checklist reads
as a status panel rather than preamble. Gate the whole block on
!readiness.allReady so once every gate is green the sidebar drops back
to just About + What's different — no value in real estate full of
checkmarks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
useCaptureSettings updated its own cache optimistically but never
invalidated ['capture-readiness'], so for up to 5 s (the poll interval)
after switching stt_model or llm_model the checklist kept showing the
previous model's ready/missing state. The backend endpoint resolves
the model live on each call — it was just the frontend cache that
lagged. Invalidate in onSettled only when the patch touched a model
field, so unrelated updates (chord keys, toggles) don't pay for a
refetch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
app/src/components/CapturesTab/DictationReadinessChecklist.tsx (1)

114-125: Unstable downloadByModel in effect deps causes re-run every render.

downloadByModel is a fresh Map instance on every render (built at lines 106–109), so including it in the dep array makes this effect fire on every commit — not just when activeTasks actually changes. The logic is idempotent (when data is unchanged, current equals prevActive.current so no invalidation occurs), but every render still rebuilds the Set and reassigns the ref.

Prefer useMemo for the map, or key the effect to activeTasks alone and derive the key set inside:

♻️ Proposed refactor
-  const downloadByModel = new Map<string, ActiveDownloadTask>();
-  for (const dl of activeTasks?.downloads ?? []) {
-    if (dl.status === 'downloading') downloadByModel.set(dl.model_name, dl);
-  }
+  const downloadByModel = useMemo(() => {
+    const map = new Map<string, ActiveDownloadTask>();
+    for (const dl of activeTasks?.downloads ?? []) {
+      if (dl.status === 'downloading') map.set(dl.model_name, dl);
+    }
+    return map;
+  }, [activeTasks]);

   // When a download disappears from activeTasks, it just finished — refetch
   // readiness immediately so the row flips to ✓ instead of waiting up to 5s
   // for the next readiness poll.
   const prevActive = useRef<Set<string>>(new Set());
   useEffect(() => {
     const current = new Set(downloadByModel.keys());
     for (const name of prevActive.current) {
       if (!current.has(name)) {
         queryClient.invalidateQueries({ queryKey: ['capture-readiness'] });
         queryClient.invalidateQueries({ queryKey: ['modelStatus'] });
         break;
       }
     }
     prevActive.current = current;
-  }, [activeTasks, queryClient, downloadByModel]);
+  }, [downloadByModel, queryClient]);

(activeTasks can be dropped from deps since downloadByModel is now memoized against it.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/components/CapturesTab/DictationReadinessChecklist.tsx` around lines
114 - 125, The effect currently reruns every render because downloadByModel is a
new Map each render; memoize downloadByModel with useMemo (keyed by activeTasks)
or remove downloadByModel from the effect deps and compute the current Set from
activeTasks inside the useEffect instead; update the code around prevActive,
useEffect, and downloadByModel so downloadByModel is stable (useMemo([...],
[activeTasks])) or so the effect depends only on activeTasks and queryClient,
then call queryClient.invalidateQueries(['capture-readiness']) and
queryClient.invalidateQueries(['modelStatus']) when a name was removed as
before.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 044f8174-cd4e-498e-8697-ab56b4b8cb18

📥 Commits

Reviewing files that changed from the base of the PR and between 687ab2a and ef63faf.

📒 Files selected for processing (3)
  • app/src/components/CapturesTab/DictationReadinessChecklist.tsx
  • app/src/components/ServerTab/CapturesPage.tsx
  • app/src/lib/hooks/useSettings.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • app/src/lib/hooks/useSettings.ts
  • app/src/components/ServerTab/CapturesPage.tsx

Two surfaces leaked macOS-specific copy onto other platforms:

1. The Input Monitoring + Accessibility rows in the readiness
   checklist rendered everywhere. On Windows/Linux the Rust permission
   stubs return true, so the rows showed as permanent green checkmarks
   with copy like "macOS allows Voicebox to detect your global
   shortcut." — nonsense when you're on Windows. Gate both rows on a
   userAgent-based isMacOS check so they only render where the
   underlying TCC permission actually exists.

2. The global-shortcut setting description ended with "macOS will ask
   for Input Monitoring permission the first time you turn this on."
   That sentence rendered on every platform. The readiness checklist
   already surfaces the TCC requirement at the right moment on macOS,
   so the description doesn't need the platform note — drop it from
   en / ja / zh-CN / zh-TW.

Other macOS strings (AccessibilityNotice, InputMonitoringNotice, their
"stillMissing" hints) are already gated behind the Rust permission
booleans returning false, which never happens on Windows/Linux, so they
stay inert without further changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
app/src/i18n/locales/en/translation.json (2)

262-264: Trim personalityPlaceholder; it's doing the job of the hint.

The placeholder is ~300 characters and restates information that already lives in personalityHint ("Drives the Compose button and the in-character rewrite toggle on the generate page." ↔ "Leave blank to hide the Compose button and persona toggle on the generate page."). Placeholders are for an example of input; explanatory copy belongs in the hint. Long placeholders also tend to wrap or get truncated in narrower forms and are noisier for screen readers.

♻️ Suggested split
-      "personalityPlaceholder": "Optional. Who this voice is and how they talk. E.g. \"a grumpy pirate who only speaks in nautical metaphors\". Drives the Compose button and the in-character rewrite toggle on the generate page.",
-      "personalityHint": "Leave blank to hide the Compose button and persona toggle on the generate page."
+      "personalityPlaceholder": "e.g. a grumpy pirate who only speaks in nautical metaphors",
+      "personalityHint": "Who this voice is and how they talk. Drives the Compose button and the in-character rewrite toggle on the generate page. Leave blank to hide both."

Apply the same trim to ja, zh-CN, zh-TW.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/i18n/locales/en/translation.json` around lines 262 - 264, The
personalityPlaceholder string is excessively long and duplicates explanatory
text that belongs in personalityHint; replace personalityPlaceholder with a
short example input (e.g., a concise persona snippet) and keep the explanatory
sentence only in personalityHint. Update the JSON key "personalityPlaceholder"
in the English locale (and apply the same trimmed-example change to the
"personalityPlaceholder" entries in the ja, zh-CN, and zh-TW locales) while
leaving "personalityHint" as the place for the longer explanatory copy.

931-933: Inconsistent key naming for Qwen3 size options.

size06 and size17 encode the decimal (0.6 → 06, 1.7 → 17), but size4 breaks that scheme — by the same convention it should be size40. Rename for consistency so sorting/auto-completion align:

Proposed rename
-          "size06": "Qwen3 · 0.6B · 400 MB · {{tail}}",
-          "size17": "Qwen3 · 1.7B · 1.1 GB · {{tail}}",
-          "size4": "Qwen3 · 4B · 2.5 GB · {{tail}}",
+          "size06": "Qwen3 · 0.6B · 400 MB · {{tail}}",
+          "size17": "Qwen3 · 1.7B · 1.1 GB · {{tail}}",
+          "size40": "Qwen3 · 4B · 2.5 GB · {{tail}}",

Update CapturesPage.tsx (line 419) and the other locale files (ja, zh-CN, zh-TW) in lockstep.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/i18n/locales/en/translation.json` around lines 931 - 933, Rename the
inconsistent Qwen3 locale key "size4" to "size40" across all locale files (en,
ja, zh-CN, zh-TW) and update every reference in code (e.g., the options lookup
in CapturesPage.tsx where Qwen3 size keys are used) to use "size40" instead of
"size4" so the decimal-based naming scheme (size06, size17, size40) remains
consistent and sorting/auto-complete work correctly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fdb117ea-54fe-46de-8aab-4a806f37cac8

📥 Commits

Reviewing files that changed from the base of the PR and between ef63faf and 1ca7895.

📒 Files selected for processing (5)
  • app/src/components/CapturesTab/DictationReadinessChecklist.tsx
  • app/src/i18n/locales/en/translation.json
  • app/src/i18n/locales/ja/translation.json
  • app/src/i18n/locales/zh-CN/translation.json
  • app/src/i18n/locales/zh-TW/translation.json
🚧 Files skipped from review as they are similar to previous changes (3)
  • app/src/components/CapturesTab/DictationReadinessChecklist.tsx
  • app/src/i18n/locales/zh-TW/translation.json
  • app/src/i18n/locales/ja/translation.json

Comment on lines +696 to 701
"persona": {
"tooltipActive": "以人物设定朗读",
"tooltipInactive": "以人物设定朗读",
"ariaLabelActive": "以人物设定朗读",
"ariaLabelInactive": "以人物设定朗读"
}

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Compare persona active/inactive strings across locales to confirm whether they should differ.
fd -t f 'translation.json' app/src/i18n/locales | while read -r f; do
  echo "=== $f ==="
  jq '.generation.persona' "$f" 2>/dev/null
done

Repository: jamiepine/voicebox

Length of output: 850


zh-CN persona strings should differentiate between active/inactive states like other locales.

All four keys are identical ("以人物设定朗读"), but the English, Japanese, and Traditional Chinese versions all distinguish between active and inactive states. For example:

  • en: active "Speaking in character" vs. inactive "Speak in character"
  • ja: active "キャラクターとして発話中" vs. inactive "キャラクターとして発話"
  • zh-TW: active "以角色發話中" vs. inactive "以角色發話"

Update the Simplified Chinese (zh-CN) tooltipActive and ariaLabelActive to indicate an ongoing state (e.g., "以人物设定朗读中") while keeping tooltipInactive and ariaLabelInactive as the base form ("以人物设定朗读"), or use a pattern that clearly conveys the distinction.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/i18n/locales/zh-CN/translation.json` around lines 696 - 701, Update
the zh-CN persona strings so active vs inactive states are distinct: change
"persona.tooltipActive" and "persona.ariaLabelActive" from "以人物设定朗读" to an
ongoing-state wording such as "以人物设定朗读中", and leave "persona.tooltipInactive"
and "persona.ariaLabelInactive" as the base form "以人物设定朗读" to match the pattern
used in en/ja/zh-TW.

…state machine

Dep swap:
- Drop the git-pinned jamiepine/rdev fork we were carrying since the
  upstream crate is abandoned.
- Depend on keytap 0.2 from crates.io — our own cross-platform global
  keyboard tap crate. Clean shutdown via Drop, Sonoma-safe by design
  (no TSMGetInputSourceProperty calls off the main thread, so
  `set_is_main_thread(false)` is gone), and properly versioned.

Chord engine rewrite:
- Delete hotkey_monitor.rs's internal Chord state machine (Match enum,
  KeyEvent enum, step()/classify() methods, associated unit tests).
  keytap's ChordMatcher subsumes it: Momentary chord for PTT,
  add_toggle() for Toggle-to-talk, longest-match resolution, sticky-end
  for Toggle. Net: -80 LOC in hotkey_monitor.rs; the remaining module
  is the dispatcher loop + Effect→Tauri translation.
- Preserve the PTT→Toggle "RestartRecording" upgrade signal. keytap
  emits End(PTT)+Start(Toggle) atomically (same Instant) when the held
  set upgrades from a shorter chord to a longer superset. The
  dispatcher peeks at the matcher with a 5 ms recv_timeout after any
  End and coalesces the pair into Effect::RestartRecording so the
  frontend still gets the "discard the transition-moment audio" signal
  instead of an unrelated Stop+Start pair.
- HotkeyMonitor::update_bindings now actually tears down the tap on
  empty bindings instead of leaving an idle CGEventTap around. New
  bindings rebuild the matcher and the dispatcher thread from scratch.

key_codes.rs:
- Rewrite the browser-code → Key table against keytap's cleaner Key
  variant names (`A`..`Z` not `KeyA`..`KeyZ`, `Digit0`..`Digit9` not
  `Num0`..`Num9`, `ArrowUp` not `UpArrow`, `AltLeft`/`AltRight` instead
  of `Alt`/`AltGr`, `Period` not `Dot`, …). On-disk chord string
  format (W3C `KeyboardEvent.code` identifiers) is unchanged, so
  capture_settings rows written before the swap round-trip identically.
  Legacy aliases (`Alt`, `AltGr`, `Num0`, `UpArrow`, `Dot`, …) kept for
  forward-compat on old rows.

main.rs / input_monitoring.rs:
- Update the few doc comments that referenced `rdev::listen` to
  describe keytap's Tap; no behavioural change.
- build_chord_bindings now imports from keytap::Key.
- enable_hotkey / disable_hotkey / update_chord_bindings reach into
  HotkeyMonitor via &mut since apply()/update_bindings() now mutate.

Tests live in keytap now (22 chord-related tests in keytap 0.2,
including the PTT→Toggle upgrade scenario that used to be tested in
hotkey_monitor.rs). Voicebox's hotkey_monitor.rs is thin enough that
local testing would be trivial.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
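The upgrade-coalescing rule the commit describes — an End immediately followed by a Start carrying the same timestamp is the PTT→Toggle upgrade and becomes one RestartRecording effect — can be modeled as a toy Python sketch (simplified tuples, not keytap's real API):

```python
from collections import deque

def dispatch(events: deque) -> list[str]:
    """Translate (kind, chord, time) tuples into effect strings, coalescing
    a same-instant End+Start pair into a single RestartRecording."""
    effects = []
    while events:
        kind, chord, t = events.popleft()
        if kind == "start":
            effects.append(f"StartRecording({chord})")
        elif kind == "end":
            # Peek: a Start with an identical timestamp is the upgrade pair.
            if events and events[0][0] == "start" and events[0][2] == t:
                _, next_chord, _ = events.popleft()
                effects.append(f"RestartRecording({next_chord})")
            else:
                effects.append(f"StopRecording({chord})")
    return effects

# PTT held, Space tapped: End(PTT)+Start(Toggle) arrive at the same instant.
upgrade = deque([("start", "ptt", 0), ("end", "ptt", 5), ("start", "toggle", 5)])
assert dispatch(upgrade) == ["StartRecording(ptt)", "RestartRecording(toggle)"]

# Plain release: no paired Start, so it stays a Stop.
release = deque([("start", "ptt", 0), ("end", "ptt", 9)])
assert dispatch(release) == ["StartRecording(ptt)", "StopRecording(ptt)"]
```

The real dispatcher peeks with a 5 ms recv_timeout rather than on an in-memory queue, but the equality check on the event timestamp is the same discriminator.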
@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
tauri/src-tauri/src/hotkey_monitor.rs (1)

191-227: Optional: convert the same-instant peek to a loop.

process_event recurses when the peek returns an unrelated event. Depth is bounded in realistic keyboard scenarios (1–2), and the per-frame stack is small, so this is not a bug. Still, a small loop would remove the theoretical unbounded-recursion shape and keep the peek/dispatch logic co-located, which reads a bit more naturally for this kind of event pipeline.

♻️ Sketch: iterative peek
 fn process_event(
     app: &AppHandle,
     matcher: &ChordMatcher<ChordAction>,
-    event: ChordEvent<ChordAction>,
+    mut event: ChordEvent<ChordAction>,
 ) {
-    match event {
-        ChordEvent::Start { id, .. } => {
-            apply_effect(app, Effect::StartRecording(id));
-        }
-        ChordEvent::End { id: end_id, time: end_time } => {
-            match matcher.recv_timeout(Duration::from_millis(5)) {
-                Ok(ChordEvent::Start { id: start_id, time: start_time })
-                    if start_time == end_time =>
-                {
-                    apply_effect(app, Effect::RestartRecording(start_id));
-                }
-                Ok(other) => {
-                    apply_effect(app, Effect::StopRecording(end_id));
-                    process_event(app, matcher, other);
-                }
-                Err(_) => {
-                    apply_effect(app, Effect::StopRecording(end_id));
-                }
-            }
-        }
-    }
+    loop {
+        match event {
+            ChordEvent::Start { id, .. } => {
+                apply_effect(app, Effect::StartRecording(id));
+                return;
+            }
+            ChordEvent::End { id: end_id, time: end_time } => {
+                match matcher.recv_timeout(Duration::from_millis(5)) {
+                    Ok(ChordEvent::Start { id: start_id, time: start_time })
+                        if start_time == end_time =>
+                    {
+                        apply_effect(app, Effect::RestartRecording(start_id));
+                        return;
+                    }
+                    Ok(other) => {
+                        apply_effect(app, Effect::StopRecording(end_id));
+                        event = other;
+                        continue;
+                    }
+                    Err(_) => {
+                        apply_effect(app, Effect::StopRecording(end_id));
+                        return;
+                    }
+                }
+            }
+        }
+    }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tauri/src-tauri/src/hotkey_monitor.rs` around lines 191 - 227, The
process_event recursion can be replaced by an iterative loop: in process_event,
when handling ChordEvent::End use a small loop that peeks via
matcher.recv_timeout(Duration::from_millis(5)) and, if you observe a Start with
identical time, emit Effect::RestartRecording(start_id) and break; if you
receive some other event, first emit Effect::StopRecording(end_id) and then set
event = other and continue the loop to handle the newly peeked event (instead of
calling process_event recursively); ensure you still handle the Err(_) case by
emitting Effect::StopRecording(end_id) and breaking. Keep references to
matcher.recv_timeout, apply_effect,
Effect::StartRecording/RestartRecording/StopRecording, and preserve the timing
equality check (start_time == end_time).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 609593e6-8d9c-4fee-ad9b-da1fd47ef38b

📥 Commits

Reviewing files that changed from the base of the PR and between 1ca7895 and c6114b6.

⛔ Files ignored due to path filters (1)
  • tauri/src-tauri/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • tauri/src-tauri/Cargo.toml
  • tauri/src-tauri/src/hotkey_monitor.rs
  • tauri/src-tauri/src/input_monitoring.rs
  • tauri/src-tauri/src/key_codes.rs
  • tauri/src-tauri/src/main.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • tauri/src-tauri/src/input_monitoring.rs
  • tauri/src-tauri/src/key_codes.rs

0.2 read CGEventFlags via CGEventGetIntegerValueField(event, 0x81),
which is not a valid CGEventField id — macOS silently returned 0, so
FlagsChanged events produced no KeyDown / KeyUp for any modifier key
and the PTT / toggle chords never armed on macOS. 0.4 uses the
documented CGEventGetFlags(event) API.

0.3 (tracing / serde / Fn / IntlBackslash) is picked up as a free
consequence; no API surface we depend on changed.
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
tauri/src-tauri/Cargo.toml (1)

52-63: keytap 0.4 swap and inline rationale — LGTM.

The 0.2 → 0.4 bump aligns with the PR commit-message note about the macOS CGEventField modifier-events bug (invalid field id → silent 0 → missed FlagsChanged → PTT/toggle chords never armed). The inline comment is excellent documentation for future maintainers — it explains provenance (own crate, crates.io), what was replaced (git-pinned rdev fork), and the concrete behavioral wins (clean Drop shutdown, no set_is_main_thread(false) / Sonoma layout-translation crash). No further concerns.

One tiny nit you can take or leave: since keytap is a first-party crate you control, consider pinning to a =0.4.x exact version (or at least ~0.4) in the next maintenance pass so a future 0.4.x minor release to crates.io can't silently change chord semantics on users rebuilding from source without a Cargo.lock refresh. Not blocking.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tauri/src-tauri/Cargo.toml` around lines 52 - 63, The comment suggests
pinning the first-party crate dependency to avoid silent semantic changes:
update the dependency entry for keytap (the keytap = "0.4" line in Cargo.toml)
to a more restrictive version requirement (for example use an exact patch pin
like =0.4.x or a tilde requirement ~0.4) so future 0.4.x releases can't change
behavior unexpectedly for users building from source; locate the keytap entry
and replace the loose "0.4" specifier with the chosen pinned form.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tauri/src-tauri/Cargo.toml`:
- Line 16: Add a short note to CONTRIBUTING.md (or the next release notes)
stating that the tauri Cargo feature macos-private-api (used in Cargo.toml via
tauri = { ..., features = ["macos-private-api"] } and by the titleBarStyle:
"Overlay" setting in tauri.conf.json) opts into private NSWindow APIs which will
cause rejection from the Mac App Store; instruct maintainers to remove that
feature gate before attempting App Store distribution and link to the official
Tauri App Store guidance for reference.

📥 Commits

Reviewing files that changed from the base of the PR and between c6114b6 and c7cd7fd.

⛔ Files ignored due to path filters (1)
  • tauri/src-tauri/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (1)
  • tauri/src-tauri/Cargo.toml


[dependencies]
- tauri = { version = "2.0", features = [] }
+ tauri = { version = "2.0", features = ["macos-private-api"] }
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

Tauri macos-private-api feature App Store rejection

💡 Result:

Enabling the macos-private-api feature in Tauri for macOS apps will cause rejection from the Mac App Store. Apple's App Store guidelines prohibit the use of private APIs, and this feature enables private APIs for functionality like transparent windows on WKWebView.

Key evidence:
- Official Tauri code and schema explicitly warn: "WARNING: Using private APIs on macOS prevents your application from being accepted to the App Store." (Tauri commit adding the feature, schema.json, window config docs)
- Tauri core team statements: apps using macOSPrivateApi can be notarized but will be rejected from the App Store. (GitHub issue #463 in tauri-docs, PR #3511)
- Configuration docs state: if tauri.macOSPrivateApi is enabled and used (e.g. transparent background), the app can't be submitted to the Mac App Store. (tauri-docs-wip)
- Related plugins like window-vibrancy confirm private API usage blocks App Store submission, though some anecdotal reports suggest mixed results for specific cases.

To submit to the App Store:
- Set "macOSPrivateApi": false in tauri.conf.json
- Remove the "macos-private-api" feature from Cargo.toml if present
- Avoid features requiring it, like window.transparent: true (without public API alternatives)
- Follow Tauri's App Store guide: enable App Sandbox, provisioning profile, entitlements, etc.

Notarization for Developer ID distribution (outside the App Store) is possible even with private APIs.


🏁 Script executed:

# Find CONTRIBUTING and release notes files
find . -maxdepth 3 -type f \( -name "CONTRIBUTING*" -o -name "CHANGELOG*" -o -name "RELEASE*" -o -name "README*" \) | head -20

Repository: jamiepine/voicebox

Length of output: 221


🏁 Script executed:

# Check the full Cargo.toml context around line 16
cat -n tauri/src-tauri/Cargo.toml | head -30

Repository: jamiepine/voicebox

Length of output: 1158


🏁 Script executed:

# Search for any existing documentation about macos-private-api in the repo
rg -i "macos-private-api|app.store|distribution" --type md --type toml -A 2 -B 2 | head -100

Repository: jamiepine/voicebox

Length of output: 965


🏁 Script executed:

# Check if there are any feature gates or conditional configs related to macos-private-api
rg "macos-private-api" -A 3 -B 3

Repository: jamiepine/voicebox

Length of output: 457


🏁 Script executed:

# Check CONTRIBUTING.md content
cat CONTRIBUTING.md

Repository: jamiepine/voicebox

Length of output: 11269


🏁 Script executed:

# Check if there are any build/distribution docs
find . -type f -name "*.md" | xargs grep -l "App.Store\|distribution\|macos" 2>/dev/null | head -10

Repository: jamiepine/voicebox

Length of output: 96


🏁 Script executed:

# Check if macos-private-api features are actually used in the code
rg "transparent|vibrancy|fullsize|NSWindow" --type ts --type tsx --type rs -i | head -20

Repository: jamiepine/voicebox

Length of output: 90


🏁 Script executed:

# Check tauri.conf.json for any macOS-specific configuration
find . -name "tauri.conf.json" -exec cat {} \; | grep -A 20 -B 5 "macos\|mac"

Repository: jamiepine/voicebox

Length of output: 912


🏁 Script executed:

# Check README for distribution/download info
head -100 README.md | grep -i "download\|distribute\|release\|store"

Repository: jamiepine/voicebox

Length of output: 1042


🏁 Script executed:

# Check if there are any other references to App Store or distribution strategy
rg "App.Store|release|distribution" README.md -i

Repository: jamiepine/voicebox

Length of output: 662


🏁 Script executed:

# Get broader README context for distribution strategy
head -200 README.md

Repository: jamiepine/voicebox

Length of output: 11154


Document that macos-private-api feature blocks Mac App Store distribution.

Enabling tauri/macos-private-api (currently set in Cargo.toml line 16 and used for the titleBarStyle: "Overlay" setting in tauri.conf.json) opts into private NSWindow APIs that are rejected by App Store review per official Tauri warnings. Since Voicebox distributes directly via GitHub releases and voicebox.sh downloads, this is intentional — but add a note to CONTRIBUTING.md or the next release notes making this explicit, so future maintainers don't attempt App Store submission without first removing this feature gate.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tauri/src-tauri/Cargo.toml` at line 16, Add a short note to CONTRIBUTING.md
(or the next release notes) stating that the tauri Cargo feature
macos-private-api (used in Cargo.toml via tauri = { ..., features =
["macos-private-api"] } and by the titleBarStyle: "Overlay" setting in
tauri.conf.json) opts into private NSWindow APIs which will cause rejection from
the Mac App Store; instruct maintainers to remove that feature gate before
attempting App Store distribution and link to the official Tauri App Store
guidance for reference.

useQuery was firing GET /capture/readiness every 5s forever, and also
on every window focus. Once stt.ready and llm.ready are both true the
answer can only change when the user swaps a model in settings, and
useSettings already invalidates the query on that path — the polling
was pure noise.

Gate both refetchInterval and refetchOnWindowFocus on "not fully ready"
so we fall silent once the checklist is green.
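The gate described above reduces to a pure helper over the readiness payload. A minimal sketch, assuming the response shape implied by the commit message (stt.ready / llm.ready); the useQuery wiring shown in the trailing comment follows TanStack Query's option names but is illustrative, not the PR's exact code:

```typescript
// Assumed shape of GET /capture/readiness (field names from the commit message).
interface Readiness {
  stt: { ready: boolean };
  llm: { ready: boolean };
}

// Poll every 5s (and refetch on focus) only until both models are ready;
// after that, settings invalidation is the sole refresh path.
function refetchOptions(data: Readiness | undefined) {
  const fullyReady = !!data && data.stt.ready && data.llm.ready;
  return {
    refetchInterval: fullyReady ? false : 5_000,
    refetchOnWindowFocus: !fullyReady,
  } as const;
}

// Hypothetical wiring (TanStack Query v5 accepts a function for refetchInterval):
// useQuery({
//   queryKey: ['capture-readiness'],
//   queryFn: fetchReadiness,
//   refetchInterval: (query) => refetchOptions(query.state.data).refetchInterval,
// });
```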
- add dark/light/system theme with persisted choice + OS change listener
- restyle stories sidebar (search, item layout, border) to match captures
- move floating generate box to right column of stories, add top fade mask
- story track editor: sticky track labels aligned via flex rows, custom scrollbar with left/right zoom handles
- capture pill light mode pass, fix inline waveform progress color
- pull mcp_server hidden imports into the pyinstaller spec
- notarization doc draft
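The theme bullet above boils down to a resolve step plus an OS listener. A minimal sketch, assuming a `'theme'` localStorage key and class-toggle application (both hypothetical); matchMedia and prefers-color-scheme are the standard browser APIs:

```typescript
// Resolve the persisted choice against the current OS preference.
type Theme = 'dark' | 'light' | 'system';

function resolveTheme(choice: Theme, osPrefersDark: boolean): 'dark' | 'light' {
  if (choice === 'system') return osPrefersDark ? 'dark' : 'light';
  return choice;
}

// Browser wiring (not exercised here): re-resolve whenever the OS flips.
// const mq = window.matchMedia('(prefers-color-scheme: dark)');
// mq.addEventListener('change', (e) => {
//   const stored = (localStorage.getItem('theme') ?? 'system') as Theme; // key is an assumption
//   document.documentElement.classList.toggle('dark', resolveTheme(stored, e.matches) === 'dark');
// });
```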
ListPane is a compound component (Header / TitleRow / Title / Actions /
Search / Scroll) that owns the relative wrapper, faded right divider
(50px top fade), top scroll mask, and absolute-positioned header used by
every list-detail tab. Wires up CapturesTab, StoryList, and EffectsList.
EffectsTab gets -mx-8 / pr-8 to match the edge-to-edge layout used
elsewhere.
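The compound-component shape described above can be sketched without React by attaching the named parts as static properties; rendering is modeled as plain strings here so the sketch stays self-contained, while the real ListPane returns JSX and owns the masks/dividers in CSS:

```typescript
// Each part renders its children; the parent owns the relative wrapper.
type Part = (children?: string) => string;

function makeListPane() {
  const Root = (children: string) => `<div class="relative">${children}</div>`;
  return Object.assign(Root, {
    Header: ((c = '') => `<header class="absolute top-0">${c}</header>`) as Part,
    Title: ((c = '') => `<h2>${c}</h2>`) as Part,
    Search: (() => `<input type="search" />`) as Part,
    Scroll: ((c = '') => `<div class="overflow-y-auto">${c}</div>`) as Part,
  });
}

const ListPane = makeListPane();
const html = ListPane(
  ListPane.Header(ListPane.Title('Captures') + ListPane.Search()) +
  ListPane.Scroll('…rows…'),
);
```

The payoff of the pattern is that every list-detail tab composes the same chrome without prop drilling: consumers write `ListPane.Header`/`ListPane.Scroll` and the parent keeps layout ownership.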

Other changes:
- MCPPage: native <select> → shadcn <Select> for default voice and
  per-binding voice pickers
- Button outline variant: add hover:border-accent
- Drop hover:text-destructive from trailing delete buttons
  (HistoryTable, GpuAcceleration, GpuPage, EffectsChainEditor,
  EffectsDetail)
- HistoryTable empty state moved behind t('history.empty')
- StoryContent scroll padding pt-14 → pt-16
- backend health reports the captures dir
- landing CapturesMockup: "Send to" → "Export" with Download icon
- CHANGELOG: drop [Unreleased] personality section
- Hand /generate ids to the global SSE watcher so playback fires on completion. The mutation onSuccess was checking audio_path on a queued row, which is always empty — autoplay never ran.
- Bind the Play As voice selection to capture_settings.default_playback_voice_id, kept in sync with the Settings → Captures and Settings → MCP pickers. Picking from the split-button dropdown writes back to settings.
- Extract AudioBars from HistoryTable into a shared component; use it for the Play As generating state in place of Loader2.
- Stop the active-state hover from flashing white text when the button is in its lighter accent/10 fill.
- Drop the gradient avatar swatches from the Settings → Captures voice dropdown.
- Backend: when the gen worker exits without writing a terminal status (e.g. SQLite lock racing the failed-status write inside its own exception handler), the cancel endpoint now flips the row to failed instead of 409-ing. Worker also force-fails on its way out as a belt-and-suspenders.
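The cancel fallback in the last bullet is essentially a small status machine. A sketch under assumed names (row shape, status strings, and `workerAlive` are hypothetical; the real backend is Python over SQLite):

```typescript
type Status = 'queued' | 'running' | 'completed' | 'failed' | 'cancelled';

interface GenRow {
  id: string;
  status: Status;
  workerAlive: boolean;
}

const TERMINAL: ReadonlySet<Status> = new Set(['completed', 'failed', 'cancelled']);

// Cancel semantics: a row already in a terminal state still 409s, but a row
// whose worker died without writing a terminal status (e.g. a SQLite lock
// racing the failed-status write) is flipped to failed instead of 409-ing.
function cancel(row: GenRow): { httpStatus: number; row: GenRow } {
  if (TERMINAL.has(row.status)) {
    return { httpStatus: 409, row }; // genuinely finished: nothing to cancel
  }
  if (!row.workerAlive) {
    return { httpStatus: 200, row: { ...row, status: 'failed' } };
  }
  return { httpStatus: 200, row: { ...row, status: 'cancelled' } };
}
```

The worker's own force-fail on exit covers the same hole from the other side, so the row reaches a terminal state even if no cancel request ever arrives.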
