fix(VoiceServer): cross-platform audio playback in playAudio() by MHoroszowski · Pull Request #1061 · danielmiessler/Personal_AI_Infrastructure

MHoroszowski · 2026-04-11T13:45:23Z

Summary

VoiceServer/server.ts:playAudio() hardcodes /usr/bin/afplay, which is macOS-only. On Linux every TTS notification fails with ENOENT and the voice server appears to work but produces no audio — the failure is swallowed by the fire-and-forget curl pattern at the call sites, so users see ✅ success in their terminal and silent speakers.

This PR makes audio playback cross-platform. It is complementary to #1030 (which covers the desktop-notification half via notify-send) and addresses the audio-playback half of #855 — neither file region overlaps.

Change

Extract player resolution into a small getAudioPlayer() helper, then call it from playAudio():

| Platform | Player | Notes |
| --- | --- | --- |
| `darwin` | `/usr/bin/afplay` | unchanged behavior |
| linux + ffplay present | `/usr/bin/ffplay` | preferred; `ffmpeg` is widely preinstalled |
| linux + mpg123 present | `/usr/bin/mpg123` | lightweight fallback (~500 KB) |
| neither | throws with install hint | actionable error instead of `ENOENT` |

Volume is preserved across players: afplay -v (0..1 float), ffplay -volume (0..100 int), mpg123 -f (0..32768 PCM scale).
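The resolution order and volume mapping described above could look roughly like the following. This is a minimal sketch, not the code in the PR: the `AudioPlayer` shape, the injectable `platform`/`exists` parameters, and the wording of the install hint are assumptions added for illustration. Only the binary paths, the flags, and the volume ranges come from the description.

```typescript
import { existsSync } from "node:fs";

// Hypothetical return shape: the command to spawn plus its argv.
interface AudioPlayer {
  command: string;
  args: string[];
}

// Sketch of player resolution. `platform` and `exists` are injectable
// (an assumption of this example) so the branches can be exercised
// without the real binaries being installed.
function getAudioPlayer(
  file: string,
  volume: number, // 0..1, the afplay scale
  platform: string = process.platform,
  exists: (path: string) => boolean = existsSync,
): AudioPlayer {
  const v = Math.min(1, Math.max(0, volume)); // clamp to 0..1

  if (platform === "darwin") {
    // afplay takes the 0..1 float directly.
    return { command: "/usr/bin/afplay", args: ["-v", String(v), file] };
  }

  if (platform === "linux") {
    if (exists("/usr/bin/ffplay")) {
      // ffplay expects an integer volume in 0..100.
      const vol = String(Math.round(v * 100));
      return {
        command: "/usr/bin/ffplay",
        args: ["-nodisp", "-autoexit", "-volume", vol, file],
      };
    }
    if (exists("/usr/bin/mpg123")) {
      // mpg123 -f scales PCM samples; 32768 is full volume.
      const scale = String(Math.round(v * 32768));
      return { command: "/usr/bin/mpg123", args: ["-q", "-f", scale, file] };
    }
  }

  // Hint wording is illustrative, not the PR's actual message.
  throw new Error(
    "No audio player found: install ffmpeg (provides ffplay) or mpg123",
  );
}
```

A caller would then hand `command` and `args` to whatever process-spawning call `playAudio()` already uses, so the macOS invocation stays identical while Linux gets a working player or an explicit error.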

Why ffplay first, mpg123 second

ffplay ships with ffmpeg which is already a dependency on most modern dev boxes; mpg123 is the well-known minimal fallback called out in #855. Trying both gives users a graceful path on minimal containers/distros without forcing a heavy install.

Test plan

Verified on Ubuntu 24.04 / WSL2 on Windows 11 — TTS audio plays through WSLg PulseAudio to Windows speakers with zero additional configuration (no PulseAudio TCP forwarding, no Docker shenanigans)
mpg123 fallback path verified directly (mpg123 -q -f 32768 /tmp/voice-*.mp3 → exit 0, audio plays)
macOS code path unchanged — same afplay -v invocation
No new dependencies; no changes outside playAudio() and the new helper
Pending: macOS smoke test (would appreciate a maintainer running it before merge; I don't have a Mac handy)

Scope

✅ playAudio() only
❌ Desktop notifications: already handled by "fix: cross-platform desktop notifications in VoiceServer" (#1030)
❌ start.sh / stop.sh / restart.sh (launchctl → systemd-user) — separate PR, deserves its own discussion

References

Addresses the audio-playback half of "VoiceServer: afplay (macOS-only) breaks audio on Linux — silent failure" (#855)
Complementary to "fix: cross-platform desktop notifications in VoiceServer" (#1030)
Historical context in "v3.0 VoiceServer has hard macOS dependencies — no Linux support" (#685), which notes that PR "feat: Linux compatibility fixes for cross-platform PAI" (#288) once fixed all of this and that the fix was lost in the v3.0 restructuring

playAudio() hardcoded /usr/bin/afplay, which is macOS-only. On Linux, every TTS notification fails with ENOENT and the voice server appears to work but produces no audio (the failure is swallowed by the fire-and-forget curl pattern used at the call sites).

Extract player resolution into getAudioPlayer():

- darwin → afplay (unchanged)
- linux + ffplay → ffplay -nodisp -autoexit -volume 0..100
- linux + mpg123 → mpg123 -f 0..32768 (PCM scale)
- neither → throw with an actionable install hint

ffplay is preferred because ffmpeg is widely preinstalled; mpg123 is the lightweight fallback. Both route through PulseAudio, so this works on native Linux and on Windows via WSL2 + WSLg out of the box.

Verified on Ubuntu 24.04 / WSL2 (Windows 11): TTS audio plays through WSLg PulseAudio to Windows speakers with no additional configuration.

Addresses the audio-playback half of danielmiessler#855. Complementary to danielmiessler#1030, which covers the desktop-notification half (osascript → notify-send) without overlap.

PAI currently owns the user's ~/.env file via a symlink created during install. This is a home-directory namespace grab: ~/.env is a conventional, user-owned file that many shells and tools look for. If the user installs any other tool that expects to read ~/.env, it either collides with PAI's secrets or is silently overwritten on the next PAI install. This is the wrong shape.

The cause is that VoiceServer hardcodes `join(homedir(), '.env')` as the only place it looks for ELEVENLABS_API_KEY. The installer created the ~/.env symlink to make that hardcoded read resolve to the real secrets file at ~/.config/PAI/.env (which is already the XDG-compliant, correct canonical location).

This change:

- VoiceServer/server.ts: load ~/.config/PAI/.env first (XDG canonical location), then optionally overlay from ~/.env if the user has chosen to put PAI-relevant keys there. Values in ~/.env win on key collisions, preserving the "explicit user override" mental model. The error message when ELEVENLABS_API_KEY is missing now points at the canonical path.
- PAI-Install/engine/actions.ts: stop creating the ~/.env symlink. ~/.claude/.env stays symlinked (that path is PAI's own namespace, so the symlink is safe and the hooks' existing reads continue working unchanged). The new comment explains why we no longer touch ~/.env.

Backward compatibility:

- Existing installs that already have the ~/.env symlink continue to work: both paths point at the same real file, so loading both is a no-op.
- Existing installs that already have a real ~/.env with PAI values in it continue to work: loadEnvFile() reads both paths and either one (or both) can contain the keys.
- New installs after this change will have a real file at ~/.config/PAI/.env and will NOT touch ~/.env. Users own ~/.env.

Related to the VoiceServer cross-platform audio work in danielmiessler#1061 and the Pushcut notification channel in danielmiessler#1062.
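The "canonical file first, ~/.env overlays it" load order can be sketched as below. This is an illustration under stated assumptions, not the PR's loadEnvFile(): the names `parseEnv`, `readIfPresent`, and `loadEnvFiles`, the simple KEY=VALUE parsing rules, and the injectable reader are all hypothetical. Only the two paths and the precedence rule come from the commit message.

```typescript
import { existsSync, readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Parse simple KEY=VALUE lines. Blank lines and # comments are skipped;
// one pair of matching surrounding quotes is stripped from the value.
function parseEnv(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before '=' (or no '=' at all)
    const key = line.slice(0, eq).trim();
    const value = line
      .slice(eq + 1)
      .trim()
      .replace(/^(['"])(.*)\1$/, "$2");
    out[key] = value;
  }
  return out;
}

// Return the file contents, or null when the file does not exist.
function readIfPresent(path: string): string | null {
  return existsSync(path) ? readFileSync(path, "utf8") : null;
}

// Later paths win on key collisions, so the caller lists the canonical
// file first and the user's override file last.
function loadEnvFiles(
  paths: string[],
  read: (path: string) => string | null = readIfPresent,
): Record<string, string> {
  let merged: Record<string, string> = {};
  for (const p of paths) {
    const text = read(p);
    if (text !== null) merged = { ...merged, ...parseEnv(text) };
  }
  return merged;
}

// Load order described in the commit message.
const envPaths = [
  join(homedir(), ".config", "PAI", ".env"), // XDG canonical location
  join(homedir(), ".env"), // optional user override
];
const paiEnv = loadEnvFiles(envPaths);
```

With an existing ~/.env symlink both entries resolve to the same file, so the second read merges identical values and changes nothing, which matches the backward-compatibility claim above.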

… port check

Extends the getAudioPlayer() pattern from danielmiessler#1061 to cover three more macOS-only assumptions that fail silently or visibly on Linux and WSL2:

1. sendNotification() hardcoded /usr/bin/osascript with an AppleScript `display notification` call. On Linux this is ENOENT; on WSL2 it is also ENOENT and the user loses every desktop banner.
2. The hardcoded ~/Library/Logs/pai-voice-server.log path appears in six different scripts (install/uninstall/start/stop/status and the menubar BitBar indicator). On Linux it writes into a non-standard location inside $HOME that XDG-aware tools never discover.
3. status.sh and stop.sh/uninstall.sh gate all port-check logic on `lsof`, which is not installed by default on many Linux and container images. When lsof is absent the "is port 8888 in use?" check silently returns no, even when pai-voice is actively listening.

Changes:

- server.ts: add three helpers next to getAudioPlayer().
  * isWSL(): single source of truth for WSL1/WSL2 detection via /proc/version, short-circuits on non-Linux.
  * getLogPath(): darwin unchanged; linux/wsl uses ${XDG_DATA_HOME:-$HOME/.local/share}/pai/logs/…
  * getNotificationCmd(): mirrors getAudioPlayer() shape. darwin → /usr/bin/osascript (literal AppleScript preserved byte-identically so macOS behavior is unchanged). wsl2 → wsl-notify-send if present, else powershell.exe with BurntToast (if the module is importable) or a bare [Windows.UI.Notifications.ToastNotificationManager] one-liner as the final fallback. linux → /usr/bin/notify-send.

  Call site inside sendNotification() routes through the new helper. The darwin branch produces argv identical to the pre-refactor literal, so macOS is an obvious-by-inspection no-op.
- lib/platform.sh (new): shared shell helpers sourced by every script.
  * pai_is_wsl: matches isWSL() in TS (single detector).
  * pai_log_path: matches getLogPath() in TS.
  * pai_port_pids PORT: cascades lsof → ss → netstat, printing one PID per line; returns 1 if nothing listens.

  POSIX-leaning bash, side-effect free on source, no mkdir, no exit.
- install.sh, uninstall.sh, status.sh, start.sh, stop.sh, menubar/pai-voice.5s.sh: source lib/platform.sh, replace literal LOG_PATH assignments with "$(pai_log_path)", and swap lsof-only port checks for pai_port_pids so the scripts work when lsof is unavailable. Darwin launchctl logic is untouched; only the log-path string and the port-check call site change on the macOS flow.

Verified on Ubuntu 24.04 / WSL2:

- bun bundles server.ts cleanly.
- bash -n passes on every modified script.
- pai_is_wsl returns 0 inside WSL2 and stays false on pure Linux (no /proc/version microsoft match).
- pai_log_path resolves to /home/$USER/.local/share/pai/logs/pai-voice-server.log.
- pai_port_pids 8888 returns the live PID via lsof, ss (lsof masked), and netstat (lsof + ss masked); all three branches confirmed against a running pai-voice.service.
- getNotificationCmd() on WSL2 selects the powershell.exe branch when wsl-notify-send is absent; powershell.exe returns exit 0 and fires a toast via BurntToast/WinRT.

Darwin paths are preserved byte-identically and were not exercised on hardware (author runs PAI on WSL2 only). Please review the darwin branches carefully; they are intentionally line-for-line equal to the pre-refactor literals.

Stacked on top of danielmiessler#1061 (cross-platform audio playback). Should be merged after danielmiessler#1061, or rebased onto main if danielmiessler#1061 lands first.
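The two simpler helpers described above, isWSL() and getLogPath(), might be sketched as follows. The function names, the /proc/version "microsoft" match, and the resulting paths come from the commit message; the injectable parameters (`platform`, `readVersion`, `env`, `home`) and the use of POSIX path joining are assumptions made so the example can run on any machine.

```typescript
import { readFileSync } from "node:fs";
import { homedir } from "node:os";
import { posix } from "node:path";

// WSL detection via /proc/version. Short-circuits on non-Linux platforms
// and treats an unreadable /proc/version as "not WSL".
function isWSL(
  platform: string = process.platform,
  readVersion: () => string = () => readFileSync("/proc/version", "utf8"),
): boolean {
  if (platform !== "linux") return false;
  try {
    // Case-insensitive so both "Microsoft" and "microsoft" kernel strings match.
    return /microsoft/i.test(readVersion());
  } catch {
    return false;
  }
}

// Log path: macOS keeps ~/Library/Logs; everything else uses
// ${XDG_DATA_HOME:-$HOME/.local/share}/pai/logs/.
function getLogPath(
  platform: string = process.platform,
  env: Record<string, string | undefined> = process.env,
  home: string = homedir(),
): string {
  const fileName = "pai-voice-server.log";
  if (platform === "darwin") {
    return posix.join(home, "Library", "Logs", fileName);
  }
  const xdg = env["XDG_DATA_HOME"];
  const dataHome =
    xdg !== undefined && xdg !== "" ? xdg : posix.join(home, ".local", "share");
  return posix.join(dataHome, "pai", "logs", fileName);
}
```

Treating an empty XDG_DATA_HOME the same as an unset one mirrors the `:-` form of the shell expansion quoted in the commit message, which keeps the TypeScript and shell helpers in agreement.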

MHoroszowski · 2026-04-15T20:41:25Z

Heads up: #1072 is stacked on this branch — it extends the getAudioPlayer() pattern introduced here to cover desktop notifications, log paths, and port checks for Linux/WSL2. Because the base of #1072 can't point at a fork branch, it targets main directly, so its diff currently shows this PR's changes + the new changes combined; once this merges, #1072's diff will auto-collapse to only the new work. Suggest reviewing/merging this one first.

VoiceServer's install/start/stop/status/uninstall scripts previously assumed macOS/launchctl exclusively. Linux and WSL2 users had no supported path to run the voice server as a supervised service.

This adds a systemd --user branch to each script, selected at runtime via the pai_is_darwin helper from lib/platform.sh. The Darwin launchctl flow is preserved byte-identical.

### install.sh

- New systemd_unit_* configuration constants alongside the existing PLIST_PATH.
- "Existing installation" check branches on pai_is_darwin. On Linux/WSL it probes systemctl --user list-unit-files and the on-disk unit file, prompts the user with the same y/n reinstall UX as the macOS path, and on decline exits 0 without touching the live unit.
- Linux/WSL prechecks that systemctl is present and that systemctl --user list-units --no-pager is reachable. On WSL2 it prints the /etc/wsl.conf [boot] systemd=true hint if the user session is unavailable.
- New unit generator writes ~/.config/systemd/user/pai-voice.service templated on the reference unit that ships with PAI on WSL2:

      [Unit]
        Description / After=default.target
      [Service]
        Type=simple
        WorkingDirectory=${SCRIPT_DIR}
        ExecStart=${BUN_BIN} run server.ts
        Restart=on-failure
        RestartSec=3
        StandardOutput/StandardError=append:${LOG_PATH}
        Environment=HOME/PATH
      [Install]
        WantedBy=default.target

  LOG_PATH comes from pai_log_path (XDG on Linux, Library/Logs on macOS). BUN_BIN is resolved with command -v bun at install time. HOME and PATH are set explicitly so the child process can find the user's ~/.env and runtime helpers like mpg123. The unit passes systemd-analyze verify with no warnings.
- daemon-reload + enable --now starts and persists the service. Failure prints the systemctl/journalctl commands to diagnose.
- Post-install summary branches by service manager ("launchd" vs "systemd --user") and the stale "macOS Say (fallback)" voice string is now Darwin-only, matching the honest message PR danielmiessler#1075 introduced elsewhere.

### start.sh

- Darwin path preserved byte-identical (LaunchAgent existence check, launchctl list, launchctl load, START_RC capture).
- Linux/WSL branch checks $SYSTEMD_UNIT_PATH for existence, systemctl --user is-active --quiet for the already-running fast path, and systemctl --user start otherwise. The "already running" hint on Linux points at `systemctl --user restart` instead of the macOS-only ./restart.sh.

### stop.sh

- Darwin path preserved byte-identical.
- Linux/WSL branch: systemctl --user is-active --quiet → stop → ok. The existing pai_port_pids-based port-8888 cleanup at the tail of the script stays common to both platforms (unchanged from danielmiessler#1072).

### status.sh

- Service Status block branches on pai_is_darwin. Linux/WSL reads systemctl --user is-active + MainPID, falling back to list-unit-files for the installed-but-inactive state, and prints "not installed" if neither.
- The Voice Configuration block (Darwin "Using macOS 'say'" vs Linux "No TTS fallback") from PR danielmiessler#1075 is untouched.

### uninstall.sh

- Confirmation banner branches so Linux/WSL says "Remove the systemd --user unit" instead of "Remove the LaunchAgent".
- Stop-and-remove block branches on pai_is_darwin. Linux/WSL path stops the unit, disables it, removes the file, and daemon-reloads.
- The optional log-file cleanup and post-uninstall notes are platform-agnostic and unchanged.

### Reference unit

Templated on the working pai-voice.service unit that ships with PAI on WSL2 (Description, After, Type, Restart, StandardOutput/Error format, WantedBy). Differences from the reference:

- WorkingDirectory uses ${SCRIPT_DIR} instead of %h/.claude/VoiceServer so fork checkouts at any path work correctly.
- ExecStart uses $(command -v bun) instead of %h/.bun/bin/bun so non-default bun install locations work.
- Log path uses pai_log_path (XDG on Linux) instead of %h/.claude/VoiceServer/voice-server.log so logs land in the XDG-compliant location introduced by danielmiessler#1072.
- Explicit Environment=HOME and Environment=PATH so the service can locate ~/.env and runtime helpers (mpg123, ffplay, etc.) regardless of how the systemd --user session was launched.

### Idempotency

On a machine with an already-active pai-voice.service unit, re-running install.sh prompts for reinstall (y/n). Declining exits cleanly without touching the live unit file or the running process (verified on the author's machine: PID and unit mtime unchanged across a full install.sh run with 'n' answer). Accepting will stop, disable, rewrite, daemon-reload, enable, and start, matching the exact UX of the macOS reinstall path.

### Verification

On Ubuntu 24.04 / WSL2 with systemd --user:

- bash -n passes on all five modified scripts.
- systemd-analyze verify on the generated unit: clean exit.
- Generated plist content (Darwin branch) is byte-identical to the pre-refactor heredoc, confirmed by running the unaltered heredoc body with identical stubs and diffing. The Darwin branch only gains an enclosing `if pai_is_darwin; then ... fi` wrapper; the heredoc body lines are at their original columns so the plist written to disk is byte-identical.
- install.sh with an existing unit file + 'n' answer: hits the "Installation cancelled" path, exit 0, live unit mtime and the running service PID both unchanged.
- start.sh against a live running unit: hits "already running", exit 0, live PID unchanged.
- start.sh with a missing unit (fake HOME): hits "Service not installed", exit 1, no systemctl invocation.
- status.sh against a live running unit: reports "OK Service is active (PID: ...)" with the real MainPID.
- status.sh with a bogus unit name: hits "Service is not installed".
- stop.sh sliced with a bogus unit name: hits "not running" branch, no side effects.
- uninstall.sh with 'n' answer: prints Linux-specific confirmation banner, "Uninstall cancelled", live state untouched.

Darwin path not exercised on hardware (author has no Mac). The launchctl/plist code paths are wrapped in `if pai_is_darwin; then` with the pre-refactor content preserved verbatim, and the generated plist is confirmed byte-identical. Darwin reviewers please spot-check the wrapped launchctl flow for any regressions.

Stacked on danielmiessler#1061, danielmiessler#1072, danielmiessler#1075. Rebase onto main after those land.
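To make the shape of the generated unit concrete, here is a small sketch that renders the unit text from the values install.sh is described as resolving. The actual generator lives in install.sh; this TypeScript version, the `UnitOptions` shape, the `renderUnit` name, and the `Description=` text are illustrative assumptions. The directive names and values follow the template in the commit message.

```typescript
// Hypothetical inputs, corresponding to SCRIPT_DIR, BUN_BIN, LOG_PATH,
// HOME and PATH in the commit message.
interface UnitOptions {
  scriptDir: string;
  bunBin: string;
  logPath: string;
  home: string;
  path: string;
}

// Render the text of a systemd --user unit for the voice server.
function renderUnit(o: UnitOptions): string {
  return [
    "[Unit]",
    "Description=PAI Voice Server", // placeholder text, not from the PR
    "After=default.target",
    "",
    "[Service]",
    "Type=simple",
    `WorkingDirectory=${o.scriptDir}`,
    `ExecStart=${o.bunBin} run server.ts`,
    "Restart=on-failure",
    "RestartSec=3",
    // append: keeps earlier log contents across service restarts.
    `StandardOutput=append:${o.logPath}`,
    `StandardError=append:${o.logPath}`,
    // Set explicitly so the service finds ~/.env and helpers like mpg123.
    `Environment=HOME=${o.home}`,
    `Environment=PATH=${o.path}`,
    "",
    "[Install]",
    "WantedBy=default.target",
    "",
  ].join("\n");
}

// Example values only; a real install resolves these at install time.
const exampleUnit = renderUnit({
  scriptDir: "/home/u/.claude/VoiceServer",
  bunBin: "/home/u/.bun/bin/bun",
  logPath: "/home/u/.local/share/pai/logs/pai-voice-server.log",
  home: "/home/u",
  path: "/home/u/.bun/bin:/usr/local/bin:/usr/bin:/bin",
});
console.log(exampleUnit);
```

Using absolute, install-time-resolved paths rather than `%h` specifiers is what lets fork checkouts and non-default bun locations work, at the cost of needing a reinstall if those paths later move.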

MHoroszowski mentioned this pull request Apr 12, 2026

statusline-command.sh silently shows empty defaults when jq is missing (common on fresh Linux/WSL installs) #1065

Open

MHoroszowski mentioned this pull request Apr 12, 2026

fix(env): stop claiming ~/.env; read from ~/.config/PAI/.env instead #1066

Open

5 tasks

MHoroszowski mentioned this pull request Apr 15, 2026

fix(VoiceServer): cross-platform desktop notifications, log path, and port check #1072

Open

MHoroszowski mentioned this pull request Apr 15, 2026

fix(docs+menubar): platform-aware fallback messages and macOS-only guard #1075

Open

7 tasks

MHoroszowski mentioned this pull request Apr 15, 2026

fix(VoiceServer): service-manager detect-and-branch for launchd+systemd #1076

Open

11 tasks

MHoroszowski wants to merge 1 commit into danielmiessler:main from MHoroszowski:fix/voice-server-cross-platform-audio
