fix(VoiceServer): cross-platform audio playback in playAudio() #1061

Open
MHoroszowski wants to merge 1 commit into danielmiessler:main from MHoroszowski:fix/voice-server-cross-platform-audio

@MHoroszowski

Summary

VoiceServer/server.ts:playAudio() hardcodes /usr/bin/afplay, which is macOS-only. On Linux, every TTS notification fails with ENOENT, so the voice server appears to work but produces no audio. The failure is swallowed by the fire-and-forget curl pattern at the call sites: users see ✅ success in their terminal and hear nothing from their speakers.

This PR makes audio playback cross-platform. It is complementary to #1030 (which covers the desktop-notification half via notify-send) and addresses the audio-playback half of #855; the two PRs touch non-overlapping regions of the file.

Change

Extract player resolution into a small getAudioPlayer() helper, then call it from playAudio():

| Platform | Player | Notes |
| --- | --- | --- |
| darwin | /usr/bin/afplay | unchanged behavior |
| linux + ffplay present | /usr/bin/ffplay | preferred; ffmpeg is widely preinstalled |
| linux + mpg123 present | /usr/bin/mpg123 | lightweight fallback (~500 KB) |
| neither | throws with install hint | actionable error instead of ENOENT |

Volume is preserved across players: afplay -v (0..1 float), ffplay -volume (0..100 int), mpg123 -f (0..32768 PCM scale).
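
As a rough illustration of the player selection and volume scaling described above, here is a minimal sketch. It is not the code in this PR: the `AudioPlayer` shape, the injectable `platform` and `exists` parameters, and the error wording are assumptions made for the example. Only the player paths, flags, and volume ranges come from the description.

```typescript
import { existsSync } from "node:fs";

// Command plus argv needed to play one file at a given volume.
// (Hypothetical shape, introduced for this sketch only.)
interface AudioPlayer {
  command: string;
  args: string[];
}

// `platform` and `exists` are injectable (an assumption of this sketch) so the
// selection logic can be exercised without touching the real filesystem.
function getAudioPlayer(
  file: string,
  volume: number, // 0..1, the scale afplay uses
  platform: string = process.platform,
  exists: (path: string) => boolean = existsSync,
): AudioPlayer {
  const v = Math.min(1, Math.max(0, volume)); // clamp before scaling

  if (platform === "darwin") {
    // afplay takes the 0..1 float directly.
    return { command: "/usr/bin/afplay", args: ["-v", String(v), file] };
  }

  if (platform === "linux") {
    if (exists("/usr/bin/ffplay")) {
      // ffplay expects an integer volume in 0..100.
      const vol = String(Math.round(v * 100));
      return {
        command: "/usr/bin/ffplay",
        args: ["-nodisp", "-autoexit", "-volume", vol, file],
      };
    }
    if (exists("/usr/bin/mpg123")) {
      // mpg123 -f scales PCM samples; 32768 is full volume.
      const scale = String(Math.round(v * 32768));
      return { command: "/usr/bin/mpg123", args: ["-q", "-f", scale, file] };
    }
  }

  // Actionable message instead of a bare ENOENT from spawn().
  throw new Error(
    `No supported audio player found on ${platform}. Install ffmpeg (for ffplay) or mpg123.`,
  );
}
```

A caller such as playAudio() would then pass `command` and `args` to whatever spawn mechanism it already uses, so the macOS invocation stays the same `afplay -v` call as before.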

Why ffplay first, mpg123 second

ffplay ships with ffmpeg which is already a dependency on most modern dev boxes; mpg123 is the well-known minimal fallback called out in #855. Trying both gives users a graceful path on minimal containers/distros without forcing a heavy install.

Test plan

  • Verified on Ubuntu 24.04 / WSL2 on Windows 11 — TTS audio plays through WSLg PulseAudio to Windows speakers with zero additional configuration (no PulseAudio TCP forwarding, no Docker shenanigans)
  • mpg123 fallback path verified directly (mpg123 -q -f 32768 /tmp/voice-*.mp3 → exit 0, audio plays)
  • macOS code path unchanged — same afplay -v invocation
  • No new dependencies; no changes outside playAudio() and the new helper
  • macOS smoke test (would appreciate a maintainer running it before merge — I don't have a Mac handy)

Scope

References

playAudio() hardcoded /usr/bin/afplay, which is macOS-only. On Linux,
every TTS notification fails with ENOENT and the voice server appears
to work but produces no audio (the failure is swallowed by the
fire-and-forget curl pattern used at the call sites).

Extract player resolution into getAudioPlayer():
- darwin           → afplay  (unchanged)
- linux + ffplay   → ffplay -nodisp -autoexit -volume 0..100
- linux + mpg123   → mpg123 -f 0..32768 (PCM scale)
- neither          → throw with an actionable install hint

ffplay is preferred because ffmpeg is widely preinstalled; mpg123 is
the lightweight fallback. Both route through PulseAudio, so this works
on native Linux and on Windows via WSL2 + WSLg out of the box.

Verified on Ubuntu 24.04 / WSL2 (Windows 11): TTS audio plays through
WSLg PulseAudio to Windows speakers with no additional configuration.

Addresses the audio-playback half of danielmiessler#855. Complementary to danielmiessler#1030,
which covers the desktop-notification half (osascript → notify-send)
without overlap.
MHoroszowski added a commit to MHoroszowski/Personal_AI_Infrastructure that referenced this pull request Apr 12, 2026
PAI currently owns the user's ~/.env file via a symlink created during
install. This is a home-directory namespace grab: ~/.env is a conventional,
user-owned file that many shells and tools look for. If the user installs
any other tool that expects to read ~/.env, it either collides with PAI's
secrets or is silently overwritten on the next PAI install. This is the
wrong shape.

The cause is that VoiceServer hardcodes `join(homedir(), '.env')` as the
only place it looks for ELEVENLABS_API_KEY. The installer created the
~/.env symlink to make that hardcoded read resolve to the real secrets
file at ~/.config/PAI/.env (which is already the XDG-compliant, correct
canonical location).

This change:

- VoiceServer/server.ts: load ~/.config/PAI/.env first (XDG canonical
  location), then optionally overlay from ~/.env if the user has chosen
  to put PAI-relevant keys there. Values in ~/.env win on key collisions,
  preserving the "explicit user override" mental model. The error message
  when ELEVENLABS_API_KEY is missing now points at the canonical path.

- PAI-Install/engine/actions.ts: stop creating the ~/.env symlink.
  ~/.claude/.env stays symlinked (that path is PAI's own namespace, so
  the symlink is safe and the hooks' existing reads continue working
  unchanged). The new comment explains why we no longer touch ~/.env.

Backward compatibility:

- Existing installs that already have the ~/.env symlink continue to
  work — both paths point at the same real file, so loading both is a
  no-op.
- Existing installs that already have a real ~/.env with PAI values in
  it continue to work — loadEnvFile() reads both paths and either one
  (or both) can contain the keys.
- New installs after this change will have a real file at
  ~/.config/PAI/.env and will NOT touch ~/.env. Users own ~/.env.
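
The load order described above (canonical file first, ~/.env overlaid on top) can be sketched as follows. This is an illustration under stated assumptions, not the PR's loadEnvFile(): the names `parseEnv`, `loadEnvFiles`, and `readIfPresent` are hypothetical, and the parser handles only simple KEY=VALUE lines.

```typescript
import { existsSync, readFileSync } from "node:fs";

// Parse simple KEY=VALUE lines; blank lines and # comments are skipped.
function parseEnv(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before '=' (or no '=' at all)
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes.
    const first = value[0];
    if (
      value.length >= 2 &&
      (first === '"' || first === "'") &&
      value[value.length - 1] === first
    ) {
      value = value.slice(1, -1);
    }
    out[key] = value;
  }
  return out;
}

// Return the file contents, or undefined if the file does not exist.
function readIfPresent(path: string): string | undefined {
  return existsSync(path) ? readFileSync(path, "utf8") : undefined;
}

// Merge env files in order: later paths win on key collisions, so listing
// ~/.env last gives it "explicit user override" priority.
function loadEnvFiles(
  paths: string[],
  read: (path: string) => string | undefined = readIfPresent,
): Record<string, string> {
  const merged: Record<string, string> = {};
  for (const p of paths) {
    const text = read(p);
    if (text !== undefined) Object.assign(merged, parseEnv(text));
  }
  return merged;
}
```

Called with `[join(homedir(), ".config", "PAI", ".env"), join(homedir(), ".env")]`, this yields the behaviour in the backward-compatibility notes: when ~/.env is still a symlink to the canonical file, the second read simply reapplies the same values.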

Related to the VoiceServer cross-platform audio work in danielmiessler#1061 and the
Pushcut notification channel in danielmiessler#1062.
MHoroszowski added a commit to MHoroszowski/Personal_AI_Infrastructure that referenced this pull request Apr 15, 2026
… port check

Extends the getAudioPlayer() pattern from danielmiessler#1061 to cover three more macOS-
only assumptions that fail silently or visibly on Linux and WSL2:

1. sendNotification() hardcoded /usr/bin/osascript with an AppleScript
   `display notification` call. On Linux this is ENOENT; on WSL2 it is
   also ENOENT and the user loses every desktop banner.
2. The hardcoded ~/Library/Logs/pai-voice-server.log path appears in six
   different scripts (install/uninstall/start/stop/status and the menubar
   BitBar indicator). On Linux it writes into a non-standard location
   inside $HOME that XDG-aware tools never discover.
3. status.sh and stop.sh/uninstall.sh gate all port-check logic on `lsof`,
   which is not installed by default on many Linux and container images.
   When lsof is absent the "is port 8888 in use?" check silently returns
   no, even when pai-voice is actively listening.

Changes:

- server.ts: add three helpers next to getAudioPlayer().
  * isWSL()              — single source of truth for WSL1/WSL2 detection
                           via /proc/version, short-circuits on non-Linux.
  * getLogPath()         — darwin unchanged; linux/wsl uses
                           ${XDG_DATA_HOME:-$HOME/.local/share}/pai/logs/…
  * getNotificationCmd() — mirrors getAudioPlayer() shape.
      darwin → /usr/bin/osascript (literal AppleScript preserved
               byte-identically so macOS behavior is unchanged).
      wsl2   → wsl-notify-send if present, else powershell.exe with
               BurntToast (if the module is importable) or a bare
               [Windows.UI.Notifications.ToastNotificationManager]
               one-liner as the final fallback.
      linux  → /usr/bin/notify-send.
  Call site inside sendNotification() routes through the new helper. The
  darwin branch produces argv identical to the pre-refactor literal, so
  macOS is an obvious-by-inspection no-op.

- lib/platform.sh (new): shared shell helpers sourced by every script.
  * pai_is_wsl           — matches isWSL() in TS (single detector).
  * pai_log_path         — matches getLogPath() in TS.
  * pai_port_pids PORT   — cascades lsof → ss → netstat, printing one
                           PID per line; returns 1 if nothing listens.
  POSIX-leaning bash, side-effect free on source, no mkdir, no exit.

- install.sh, uninstall.sh, status.sh, start.sh, stop.sh,
  menubar/pai-voice.5s.sh: source lib/platform.sh, replace literal
  LOG_PATH assignments with "$(pai_log_path)", and swap lsof-only port
  checks for pai_port_pids so the scripts work when lsof is unavailable.
  Darwin launchctl logic is untouched; only the log-path string and the
  port-check call site change on the macOS flow.
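
For orientation, the two simplest helpers listed above might look roughly like this. It is a sketch, not the code in the commit: the injectable parameters are assumptions added so the logic can be tested in isolation, and the exact /proc/version match used by the real isWSL() is not shown in this message (a case-insensitive "microsoft" match is assumed here).

```typescript
import { readFileSync } from "node:fs";
import { homedir } from "node:os";
import { posix } from "node:path";

// WSL detection via /proc/version; short-circuits on non-Linux platforms
// so /proc is never read on macOS.
function isWSL(
  platform: string = process.platform,
  readVersion: () => string = () => readFileSync("/proc/version", "utf8"),
): boolean {
  if (platform !== "linux") return false;
  try {
    return /microsoft/i.test(readVersion());
  } catch {
    return false; // /proc/version unreadable: treat as plain Linux
  }
}

// darwin keeps ~/Library/Logs; Linux and WSL use
// ${XDG_DATA_HOME:-$HOME/.local/share}/pai/logs/.
function getLogPath(
  platform: string = process.platform,
  env: Record<string, string | undefined> = process.env,
  home: string = homedir(),
): string {
  const file = "pai-voice-server.log";
  if (platform === "darwin") {
    return posix.join(home, "Library", "Logs", file);
  }
  // `||` mirrors the shell's `:-` expansion: unset OR empty falls back.
  const dataHome = env["XDG_DATA_HOME"] || posix.join(home, ".local", "share");
  return posix.join(dataHome, "pai", "logs", file);
}
```

Keeping the shell helpers (pai_is_wsl, pai_log_path) behaviourally identical to these is what lets the scripts and the server agree on where the log file lives.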

Verified on Ubuntu 24.04 / WSL2:
- bun bundles server.ts cleanly.
- bash -n passes on every modified script.
- pai_is_wsl returns 0 inside WSL2 and stays false on pure Linux
  (no /proc/version microsoft match).
- pai_log_path resolves to /home/$USER/.local/share/pai/logs/pai-voice-server.log.
- pai_port_pids 8888 returns the live PID via lsof, ss (lsof masked),
  and netstat (lsof + ss masked) — all three branches confirmed against
  a running pai-voice.service.
- getNotificationCmd() on WSL2 selects the powershell.exe branch when
  wsl-notify-send is absent; powershell.exe returns exit 0 and fires a
  toast via BurntToast/WinRT.
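
pai_port_pids itself is a shell helper, but the idea behind its `ss` fallback can be illustrated in TypeScript for readers more familiar with server.ts. The function below is hypothetical and assumes output in the usual `ss -ltnp` column layout (State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port, Process); it is not taken from lib/platform.sh.

```typescript
// Extract the PIDs of processes listening on `port` from `ss -ltnp` output.
function pidsListeningOn(ssOutput: string, port: number): number[] {
  const pids = new Set<number>();
  for (const line of ssOutput.split("\n")) {
    const fields = line.trim().split(/\s+/);
    // Skip the header row and anything that is not a listening socket.
    if (fields.length < 5 || fields[0] !== "LISTEN") continue;
    // fields[3] is the local address, e.g. 0.0.0.0:8888 or [::]:8888.
    const local = fields[3] ?? "";
    if (!local.endsWith(`:${port}`)) continue;
    // The process column looks like users:(("bun",pid=1234,fd=13)).
    const re = /pid=(\d+)/g;
    let m: RegExpExecArray | null;
    while ((m = re.exec(line)) !== null) {
      pids.add(Number(m[1]));
    }
  }
  // The Set deduplicates a process bound on both IPv4 and IPv6.
  return Array.from(pids);
}
```

An empty result corresponds to the "returns 1 if nothing listens" case of the shell helper.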

Darwin paths are preserved byte-identically and were not exercised on
hardware (author runs PAI on WSL2 only). Please review the darwin
branches carefully — they are intentionally line-for-line equal to the
pre-refactor literals.

Stacked on top of danielmiessler#1061 (cross-platform audio playback). Should be
merged after danielmiessler#1061, or rebased onto main if danielmiessler#1061 lands first.
@MHoroszowski
Author

Heads up: #1072 is stacked on this branch — it extends the getAudioPlayer() pattern introduced here to cover desktop notifications, log paths, and port checks for Linux/WSL2. Because the base of #1072 can't point at a fork branch, it targets main directly, so its diff currently shows this PR's changes + the new changes combined; once this merges, #1072's diff will auto-collapse to only the new work. Suggest reviewing/merging this one first.

MHoroszowski added a commit to MHoroszowski/Personal_AI_Infrastructure that referenced this pull request Apr 15, 2026
VoiceServer's install/start/stop/status/uninstall scripts previously
assumed macOS/launchctl exclusively. Linux and WSL2 users had no
supported path to run the voice server as a supervised service. This
adds a systemd --user branch to each script, selected at runtime via
the pai_is_darwin helper from lib/platform.sh. The Darwin launchctl
flow is preserved byte-identical.

### install.sh

- New systemd_unit_* configuration constants alongside the existing
  PLIST_PATH.
- "Existing installation" check branches on pai_is_darwin. On
  Linux/WSL it probes systemctl --user list-unit-files and the on-
  disk unit file, prompts the user with the same y/n reinstall UX
  as the macOS path, and on decline exits 0 without touching the
  live unit.
- On Linux/WSL, prechecks confirm that systemctl is present and that
  systemctl --user list-units --no-pager is reachable. On WSL2,
  install.sh prints the /etc/wsl.conf [boot] systemd=true hint if the
  user session is unavailable.
- New unit generator writes ~/.config/systemd/user/pai-voice.service
  templated on the reference unit that ships with PAI on WSL2:
    [Unit]  Description / After=default.target
    [Service]  Type=simple  WorkingDirectory=${SCRIPT_DIR}
               ExecStart=${BUN_BIN} run server.ts
               Restart=on-failure RestartSec=3
               StandardOutput/StandardError=append:${LOG_PATH}
               Environment=HOME/PATH
    [Install]  WantedBy=default.target
  LOG_PATH comes from pai_log_path (XDG on Linux, Library/Logs on
  macOS). BUN_BIN is resolved with command -v bun at install time.
  HOME and PATH are set explicitly so the child process can find
  the user's ~/.env and runtime helpers like mpg123. The unit
  passes systemd-analyze verify with no warnings.
- daemon-reload + enable --now starts and persists the service.
  Failure prints the systemctl/journalctl commands to diagnose.
- Post-install summary branches by service manager ("launchd" vs
  "systemd --user") and the stale "macOS Say (fallback)" voice
  string is now Darwin-only, matching the honest message PR danielmiessler#1075
  introduced elsewhere.
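
To make the shape of the generated unit concrete, here is the template from the list above expressed as a small TypeScript function. install.sh does this in shell; the function name, the options interface, and the Description text are assumptions for illustration, while the directives themselves follow the list.

```typescript
// Inputs the installer resolves at install time (hypothetical shape).
interface UnitOptions {
  scriptDir: string; // directory containing server.ts
  bunBin: string;    // result of `command -v bun`
  logPath: string;   // result of pai_log_path
  home: string;      // value for Environment=HOME
  path: string;      // value for Environment=PATH
}

// Render the text of ~/.config/systemd/user/pai-voice.service.
function renderUnit(o: UnitOptions): string {
  return [
    "[Unit]",
    "Description=PAI Voice Server", // assumed wording, not from the PR
    "After=default.target",
    "",
    "[Service]",
    "Type=simple",
    `WorkingDirectory=${o.scriptDir}`,
    `ExecStart=${o.bunBin} run server.ts`,
    "Restart=on-failure",
    "RestartSec=3",
    `StandardOutput=append:${o.logPath}`,
    `StandardError=append:${o.logPath}`,
    `Environment=HOME=${o.home}`,
    `Environment=PATH=${o.path}`,
    "",
    "[Install]",
    "WantedBy=default.target",
    "",
  ].join("\n");
}
```

The sketch does no escaping, so paths containing spaces or other special characters would need additional handling before being written into a real unit file.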

### start.sh

- Darwin path preserved byte-identical (LaunchAgent existence check,
  launchctl list, launchctl load, START_RC capture).
- Linux/WSL branch checks $SYSTEMD_UNIT_PATH for existence,
  systemctl --user is-active --quiet for the already-running fast
  path, and systemctl --user start otherwise. The "already running"
  hint on Linux points at `systemctl --user restart` instead of the
  macOS-only ./restart.sh.

### stop.sh

- Darwin path preserved byte-identical.
- Linux/WSL branch: systemctl --user is-active --quiet → stop → ok.
  The existing pai_port_pids-based port-8888 cleanup at the tail of
  the script stays common to both platforms (unchanged from danielmiessler#1072).

### status.sh

- Service Status block branches on pai_is_darwin. Linux/WSL reads
  systemctl --user is-active + MainPID, falling back to
  list-unit-files for the installed-but-inactive state, and prints
  "not installed" if neither.
- The Voice Configuration block (Darwin "Using macOS 'say'" vs
  Linux "No TTS fallback") from PR danielmiessler#1075 is untouched.

### uninstall.sh

- Confirmation banner branches so Linux/WSL says "Remove the
  systemd --user unit" instead of "Remove the LaunchAgent".
- Stop-and-remove block branches on pai_is_darwin. Linux/WSL path
  stops the unit, disables it, removes the file, and daemon-reloads.
- The optional log-file cleanup and post-uninstall notes are
  platform-agnostic and unchanged.

### Reference unit

Templated on the working pai-voice.service unit that ships with PAI
on WSL2 (Description, After, Type, Restart, StandardOutput/Error
format, WantedBy). Differences from the reference:

- WorkingDirectory uses ${SCRIPT_DIR} instead of %h/.claude/VoiceServer
  so fork checkouts at any path work correctly.
- ExecStart uses $(command -v bun) instead of %h/.bun/bin/bun so
  non-default bun install locations work.
- Log path uses pai_log_path (XDG on Linux) instead of
  %h/.claude/VoiceServer/voice-server.log so logs land in the
  XDG-compliant location introduced by danielmiessler#1072.
- Explicit Environment=HOME and Environment=PATH so the service can
  locate ~/.env and runtime helpers (mpg123, ffplay, etc.)
  regardless of how the systemd --user session was launched.

### Idempotency

On a machine with an already-active pai-voice.service unit, re-
running install.sh prompts for reinstall (y/n). Declining exits
cleanly without touching the live unit file or the running process
(verified on the author's machine: PID and unit mtime unchanged
across a full install.sh run with 'n' answer). Accepting will stop,
disable, rewrite, daemon-reload, enable, and start, matching the
exact UX of the macOS reinstall path.

### Verification

On Ubuntu 24.04 / WSL2 with systemd --user:

- bash -n passes on all five modified scripts.
- systemd-analyze verify on the generated unit: clean exit.
- Generated plist content (Darwin branch) is byte-identical to the
  pre-refactor heredoc — confirmed by running the unaltered heredoc
  body with identical stubs and diffing. The Darwin branch only
  gains an enclosing `if pai_is_darwin; then ... fi` wrapper; the
  heredoc body lines are at their original columns so the plist
  written to disk is byte-identical.
- install.sh with an existing unit file + 'n' answer: hits the
  "Installation cancelled" path, exit 0, live unit mtime and the
  running service PID both unchanged.
- start.sh against a live running unit: hits "already running",
  exit 0, live PID unchanged.
- start.sh with a missing unit (fake HOME): hits "Service not
  installed", exit 1, no systemctl invocation.
- status.sh against a live running unit: reports "OK Service is
  active (PID: ...)" with the real MainPID.
- status.sh with a bogus unit name: hits "Service is not installed".
- stop.sh sliced with a bogus unit name: hits "not running" branch,
  no side effects.
- uninstall.sh with 'n' answer: prints Linux-specific confirmation
  banner, "Uninstall cancelled", live state untouched.

Darwin path not exercised on hardware (author has no Mac). The
launchctl/plist code paths are wrapped in `if pai_is_darwin; then`
with the pre-refactor content preserved verbatim, and the generated
plist is confirmed byte-identical. Darwin reviewers please spot-
check the wrapped launchctl flow for any regressions.

Stacked on danielmiessler#1061, danielmiessler#1072, danielmiessler#1075. Rebase onto main after those land.