speechmatics
diff --git a/‎.gitignore‎
Lines changed: 1 addition & 0 deletions b/‎.gitignore‎
Lines changed: 1 addition & 0 deletions
diff --git a/‎examples/voice/cli/.gitignore‎
Lines changed: 1 addition & 0 deletions b/‎examples/voice/cli/.gitignore‎
Lines changed: 1 addition & 0 deletions
diff --git a/‎examples/voice/cli/README.md‎
Lines changed: 141 additions & 21 deletions b/‎examples/voice/cli/README.md‎
Lines changed: 141 additions & 21 deletions
@@ -169,4 +169,5 @@ tmp/
 .claude
 
 # Examples
+output/
 **/output.wav
@@ -0,0 +1 @@
+output/
@@ -5,15 +5,25 @@ Real-time transcription tool using the Speechmatics Voice SDK. Supports micropho
 ## Quick Start
 
 **Microphone:**
+
 ```bash
-python cli.py -p -k YOUR_API_KEY
+# Quick example
+python cli.py -k YOUR_API_KEY -p
+
+# Example that saves the output in verbose mode using a preset
+python cli.py -k YOUR_API_KEY -vvvvvpDSr -P conversation_smart_turn
 ```
 
+Output saved to `./output/YYYYMMDD_HHMMSS/log.jsonl`
+
 **Audio file:**
+
 ```bash
-python cli.py -p -k YOUR_API_KEY -i audio.wav
+python cli.py -k YOUR_API_KEY -i audio.wav -p
 ```
 
+Output saved to `./output/YYYYMMDD_HHMMSS/log.jsonl`
+
 Press `CTRL+C` to stop.
 
 ## Requirements
@@ -23,41 +33,71 @@ Press `CTRL+C` to stop.
 
 ## Options
 
+### Quick Reference
+
+Common short codes:
+
+- `-k` API key | `-i` input file | `-o` output dir | `-p` pretty print | `-v` verbose
+- `-r` record | `-S` save slices | `-P` preset | `-W` show config
+- `-l` language | `-m` mode | `-d` max delay | `-t` silence trigger
+- `-f` focus speakers | `-s` known speakers | `-E` enrol
+
 ### Core
 
 - `-k, --api-key` - API key (defaults to `SPEECHMATICS_API_KEY` env var)
 - `-u, --url` - Server URL (defaults to `SPEECHMATICS_RT_URL` env var)
 - `-i, --input-file` - Audio file path (WAV, mono 16-bit). Uses microphone if not specified
-- `-c, --config` - JSON config string or file path (overrides other Voice Agent options)
 
 ### Output
 
+- `-o, --output-dir` - Base output directory (default: ./output)
+  - Creates a session subdirectory with timestamp (YYYYMMDD_HHMMSS)
+  - Inside session directory:
+    - `log.jsonl` - All events with timestamps
+    - `recording.wav` - Microphone recording (if `-r` is used)
+    - `slice_*.wav` and `slice_*.json` - Audio slices (if `-S` is used)
+- `-r, --record` - Record microphone audio to recording.wav (microphone input only)
+- `-S, --save-slices` - Save audio slices on SPEAKER_ENDED events (SMART_TURN mode only)
 - `-p, --pretty` - Formatted console output with colors
-- `-o, --output-file` - Save output to JSONL file
 - `-v, --verbose` - Increase verbosity (can repeat: `-v`, `-vv`, `-vvv`, `-vvvv`, `-vvvvv`)
   - `-v` - Add speaker VAD events
   - `-vv` - Add turn predictions
   - `-vvv` - Add segment annotations
   - `-vvvv` - Add metrics
   - `-vvvvv` - Add STT events
 - `-L, --legacy` - Show only legacy transcript messages
-- `--results` - Include word-level results in segments
+- `-D, --default-device` - Use default audio device (skip selection)
+- `-w, --results` - Include word-level results in segments
 
 ### Audio
 
-- `--sample-rate` - Sample rate in Hz (default: 16000)
-- `--chunk-size` - Chunk size in bytes (default: 320)
+- `-R, --sample-rate` - Sample rate in Hz (default: 16000)
+- `-C, --chunk-size` - Chunk size in bytes (default: 320)
 - `-M, --mute` - Mute audio playback for file input
-- `-D, --default-device` - Use default audio device (skip selection)
 
 ### Voice Agent Config
 
-- `-l, --language` - Language code (default: en)
-- `-d, --max-delay` - Max transcription delay in seconds (default: 0.7)
-- `-t, --end-of-utterance-silence-trigger` - Silence duration for turn end (default: 0.5)
-- `-m, --end-of-utterance-mode` - Turn detection mode: `FIXED`, `ADAPTIVE`, `SMART_TURN`, or `EXTERNAL`
-- `-e, --emit-sentences` - Emit sentence-level segments
-- `--forced-eou` - Enable forced end of utterance
+**Configuration Priority:**
+
+1. Use `--preset` to start with a preset configuration (recommended)
+2. Use `-c/--config` to provide a complete JSON configuration
+3. Use individual parameters (`-l`, `-d`, `-t`, `-m`) to override preset settings or create custom config
+
+**Preset Options:**
+
+- `-P, --preset` - Use preset configuration: `scribe`, `low_latency`, `conversation_adaptive`, `conversation_smart_turn`, or `captions`
+- `--list-presets` - List available presets and exit
+- `-W, --show` - Display the final configuration as JSON and exit (after applying preset/config and overrides)
+
+**Configuration Options:**
+
+- `-c, --config` - JSON config string or file path (complete configuration)
+- `-l, --language` - Language code (overrides preset if used together)
+- `-d, --max-delay` - Max transcription delay in seconds (overrides preset if used together)
+- `-t, --end-of-utterance-silence-trigger` - Silence duration for turn end in seconds (overrides preset if used together)
+- `-m, --end-of-utterance-mode` - Turn detection mode: `FIXED`, `ADAPTIVE`, `SMART_TURN`, or `EXTERNAL` (overrides preset if used together)
+
+**Note:** When using `-c/--config`, you cannot use `-l`, `-d`, `-t`, `-m`, `-f`, `-I`, `-x`, or `-s` as the config JSON should contain all settings.
 
 ### Speaker Management
 
@@ -72,62 +112,142 @@ Press `CTRL+C` to stop.
 
 ## Examples
 
+**List presets:**
+
+```bash
+python cli.py --list-presets
+```
+
+**Show config (from preset):**
+
+```bash
+python cli.py -P scribe -W
+```
+
+**Show config (with overrides):**
+
+```bash
+python cli.py -P scribe -l fr -d 1.0 -W
+```
+
+**Use preset:**
+
+```bash
+python cli.py -k YOUR_KEY -P scribe -p
+```
+
+**Use preset with overrides:**
+
+```bash
+python cli.py -k YOUR_KEY -P scribe -l fr -d 1.0 -p
+```
+
 **Basic microphone:**
+
 ```bash
 python cli.py -k YOUR_KEY -p
 ```
 
+Output saved to `./output/YYYYMMDD_HHMMSS/log.jsonl`
+
+**Record microphone audio:**
+
+```bash
+python cli.py -k YOUR_KEY -r -p
+```
+
+Recording saved to `./output/YYYYMMDD_HHMMSS/recording.wav`
+
+**Custom output directory:**
+
+```bash
+python cli.py -k YOUR_KEY -o ./my_sessions -p
+```
+
+Output saved to `./my_sessions/YYYYMMDD_HHMMSS/log.jsonl`
+
+**EXTERNAL mode with manual turn control:**
+
+```bash
+python cli.py -k YOUR_KEY -m EXTERNAL -p
+```
+
+Press 't' or 'T' to manually signal end of turn.
+
+**Save audio slices (SMART_TURN mode):**
+
+```bash
+python cli.py -k YOUR_KEY -P conversation_smart_turn -S -p
+```
+
+Audio slices (~8 seconds) saved to `./output/YYYYMMDD_HHMMSS/slice_*.wav` with matching `.json` metadata files on each SPEAKER_ENDED event.
+
 **Audio file:**
+
 ```bash
 python cli.py -k YOUR_KEY -i audio.wav -p
 ```
 
 **Audio file (muted):**
-```bash
-python cli.py -k YOUR_KEY -i audio.wav -Mp
-```
 
-**Save output:**
 ```bash
-python cli.py -k YOUR_KEY -o output.jsonl -p
+python cli.py -k YOUR_KEY -i audio.wav -Mp
 ```
 
 **Verbose logging:**
+
 ```bash
 python cli.py -k YOUR_KEY -vv -p
 ```
 
+Shows additional events (speaker VAD, turn predictions, etc.)
+
 **Focus on speakers:**
+
 ```bash
 python cli.py -k YOUR_KEY -f S1 S2 -p
 ```
 
 **Enrol speakers:**
+
 ```bash
 python cli.py -k YOUR_KEY -Ep
 ```
+
 Press `CTRL+C` when done to see speaker identifiers.
 
 **Use known speakers:**
+
 ```bash
 python cli.py -k YOUR_KEY -s speakers.json -p
 ```
 
 Example `speakers.json`:
+
 ```json
 [
-  {"label": "Alice", "speaker_identifiers": ["XX...XX"]},
-  {"label": "Bob", "speaker_identifiers": ["YY...YY"]}
+  { "label": "Alice", "speaker_identifiers": ["XX...XX"] },
+  { "label": "Bob", "speaker_identifiers": ["YY...YY"] }
 ]
 ```
 
 **Custom config:**
+
 ```bash
 python cli.py -k YOUR_KEY -c config.json -p
 ```
 
 ## Notes
 
+- Output directory (`-o`) defaults to `./output`
+- Each session creates a timestamped subdirectory (YYYYMMDD_HHMMSS format)
+- Session directory contains:
+  - `log.jsonl` - All events with timestamps
+  - `recording.wav` - Microphone recording (if `-r` is used)
+  - `slice_*.wav` and `slice_*.json` - Audio slices (if `--save-slices` is used in SMART_TURN mode)
+- Session subdirectories prevent accidental data loss from multiple runs
+- Audio slices are ~8 seconds and saved on each SPEAKER_ENDED event
+- JSON metadata includes event details, speaker ID, timing, and slice duration
 - Speaker identifiers are encrypted and unique to your API key
 - Allow speakers to say at least 20 words before enrolling
 - Avoid labels `S1`, `S2` (reserved by engine)