Commit b3e83d5

Updated Voice SDK CLI example for using presets (#58)
* updated CLI for using presets
* updated CLI to save recordings and slices for smart turn.
* new 'external' preset
* updated CLI readme
* README update
* Default to EXTERNAL.
* adjust for smart turn events + updated defaults
1 parent 2527e0d commit b3e83d5

9 files changed: +615 -133 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -169,4 +169,5 @@ tmp/
 .claude

 # Examples
+output/
 **/output.wav

examples/voice/cli/.gitignore

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+output/

examples/voice/cli/README.md

Lines changed: 141 additions & 21 deletions
@@ -5,15 +5,25 @@ Real-time transcription tool using the Speechmatics Voice SDK. Supports micropho
 ## Quick Start

 **Microphone:**
+
 ```bash
-python cli.py -p -k YOUR_API_KEY
+# Quick example
+python cli.py -k YOUR_API_KEY -p
+
+# Example that saves the output in verbose mode using a preset
+python cli.py -k YOUR_API_KEY -vvvvvpDSr -P conversation_smart_turn
 ```

+Output saved to `./output/YYYYMMDD_HHMMSS/log.jsonl`
+
 **Audio file:**
+
 ```bash
-python cli.py -p -k YOUR_API_KEY -i audio.wav
+python cli.py -k YOUR_API_KEY -i audio.wav -p
 ```

+Output saved to `./output/YYYYMMDD_HHMMSS/log.jsonl`
+
 Press `CTRL+C` to stop.

 ## Requirements
@@ -23,41 +33,71 @@ Press `CTRL+C` to stop.

 ## Options

+### Quick Reference
+
+Common short codes:
+
+- `-k` API key | `-i` input file | `-o` output dir | `-p` pretty print | `-v` verbose
+- `-r` record | `-S` save slices | `-P` preset | `-W` show config
+- `-l` language | `-m` mode | `-d` max delay | `-t` silence trigger
+- `-f` focus speakers | `-s` known speakers | `-E` enrol
+
 ### Core

 - `-k, --api-key` - API key (defaults to `SPEECHMATICS_API_KEY` env var)
 - `-u, --url` - Server URL (defaults to `SPEECHMATICS_RT_URL` env var)
 - `-i, --input-file` - Audio file path (WAV, mono 16-bit). Uses microphone if not specified
-- `-c, --config` - JSON config string or file path (overrides other Voice Agent options)

 ### Output

+- `-o, --output-dir` - Base output directory (default: ./output)
+  - Creates a session subdirectory with timestamp (YYYYMMDD_HHMMSS)
+  - Inside session directory:
+    - `log.jsonl` - All events with timestamps
+    - `recording.wav` - Microphone recording (if `-r` is used)
+    - `slice_*.wav` and `slice_*.json` - Audio slices (if `-S` is used)
+- `-r, --record` - Record microphone audio to recording.wav (microphone input only)
+- `-S, --save-slices` - Save audio slices on SPEAKER_ENDED events (SMART_TURN mode only)
 - `-p, --pretty` - Formatted console output with colors
-- `-o, --output-file` - Save output to JSONL file
 - `-v, --verbose` - Increase verbosity (can repeat: `-v`, `-vv`, `-vvv`, `-vvvv`, `-vvvvv`)
   - `-v` - Add speaker VAD events
   - `-vv` - Add turn predictions
   - `-vvv` - Add segment annotations
   - `-vvvv` - Add metrics
   - `-vvvvv` - Add STT events
 - `-L, --legacy` - Show only legacy transcript messages
-- `--results` - Include word-level results in segments
+- `-D, --default-device` - Use default audio device (skip selection)
+- `-w, --results` - Include word-level results in segments
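
A bundled short code such as `-vvvvvpDSr` reads as five `-v` flags followed by `-p`, `-D`, `-S` and `-r`. The sketch below uses Python's standard `argparse` to illustrate that behaviour. It assumes `cli.py` declares its flags in a similar way, which this README does not confirm; `build_parser` and `enabled_event_groups` are hypothetical names used only for illustration.

```python
import argparse

# Event groups added at each verbosity level, as listed under `-v, --verbose`.
VERBOSITY_GROUPS = [
    "speaker VAD events",   # -v
    "turn predictions",     # -vv
    "segment annotations",  # -vvv
    "metrics",              # -vvvv
    "STT events",           # -vvvvv
]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering a few of the documented flags."""
    parser = argparse.ArgumentParser(prog="cli.py")
    # action="count" makes every repeated -v raise the level by one.
    parser.add_argument("-v", "--verbose", action="count", default=0)
    parser.add_argument("-p", "--pretty", action="store_true")
    parser.add_argument("-D", "--default-device", action="store_true")
    parser.add_argument("-S", "--save-slices", action="store_true")
    parser.add_argument("-r", "--record", action="store_true")
    parser.add_argument("-P", "--preset")
    return parser


def enabled_event_groups(verbosity: int) -> list[str]:
    """Return the extra event groups shown at a given verbosity level."""
    return VERBOSITY_GROUPS[:max(0, verbosity)]


args = build_parser().parse_args(["-vvvvvpDSr", "-P", "conversation_smart_turn"])
print(args.verbose, args.pretty, args.default_device, args.save_slices, args.record)
print(enabled_event_groups(args.verbose))
```

Flags that take a value, such as `-P`, are kept separate here so that the preset name is not swallowed into the bundle.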

 ### Audio

-- `--sample-rate` - Sample rate in Hz (default: 16000)
-- `--chunk-size` - Chunk size in bytes (default: 320)
+- `-R, --sample-rate` - Sample rate in Hz (default: 16000)
+- `-C, --chunk-size` - Chunk size in bytes (default: 320)
 - `-M, --mute` - Mute audio playback for file input
-- `-D, --default-device` - Use default audio device (skip selection)
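
To relate the two audio defaults, the arithmetic below converts a chunk size in bytes to a duration. It assumes mono 16-bit PCM (2 bytes per sample), which the README states for file input and which is only assumed here for microphone input; `chunk_duration_ms` and `slice_size_bytes` are hypothetical helper names.

```python
def chunk_duration_ms(chunk_bytes: int = 320, sample_rate: int = 16000,
                      bytes_per_sample: int = 2, channels: int = 1) -> float:
    """Duration in milliseconds of one audio chunk of raw PCM."""
    frames = chunk_bytes / (bytes_per_sample * channels)
    return 1000.0 * frames / sample_rate


def slice_size_bytes(seconds: float, sample_rate: int = 16000,
                     bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Approximate raw PCM size of an audio slice, excluding the WAV header."""
    return int(seconds * sample_rate * bytes_per_sample * channels)


print(chunk_duration_ms())    # 320 bytes at 16 kHz, 16-bit mono -> 10.0 ms
print(slice_size_bytes(8.0))  # an ~8 second slice -> 256000 bytes
```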

 ### Voice Agent Config

-- `-l, --language` - Language code (default: en)
-- `-d, --max-delay` - Max transcription delay in seconds (default: 0.7)
-- `-t, --end-of-utterance-silence-trigger` - Silence duration for turn end (default: 0.5)
-- `-m, --end-of-utterance-mode` - Turn detection mode: `FIXED`, `ADAPTIVE`, `SMART_TURN`, or `EXTERNAL`
-- `-e, --emit-sentences` - Emit sentence-level segments
-- `--forced-eou` - Enable forced end of utterance
+**Configuration Priority:**
+
+1. Use `--preset` to start with a preset configuration (recommended)
+2. Use `-c/--config` to provide a complete JSON configuration
+3. Use individual parameters (`-l`, `-d`, `-t`, `-m`) to override preset settings or create custom config
+
+**Preset Options:**
+
+- `-P, --preset` - Use preset configuration: `scribe`, `low_latency`, `conversation_adaptive`, `conversation_smart_turn`, or `captions`
+- `--list-presets` - List available presets and exit
+- `-W, --show` - Display the final configuration as JSON and exit (after applying preset/config and overrides)
+
+**Configuration Options:**
+
+- `-c, --config` - JSON config string or file path (complete configuration)
+- `-l, --language` - Language code (overrides preset if used together)
+- `-d, --max-delay` - Max transcription delay in seconds (overrides preset if used together)
+- `-t, --end-of-utterance-silence-trigger` - Silence duration for turn end in seconds (overrides preset if used together)
+- `-m, --end-of-utterance-mode` - Turn detection mode: `FIXED`, `ADAPTIVE`, `SMART_TURN`, or `EXTERNAL` (overrides preset if used together)
+
+**Note:** When using `-c/--config`, you cannot use `-l`, `-d`, `-t`, `-m`, `-f`, `-I`, `-x`, or `-s` as the config JSON should contain all settings.
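
The priority rules above can be sketched as a small merge function. This is an illustration under stated assumptions, not the CLI's actual code: `resolve_config` is a hypothetical name, the preset values in `EXAMPLE_PRESETS` are placeholders (the real presets are defined by the SDK), and the config keys are guessed from the long option names.

```python
import json
from typing import Any, Optional

# Placeholder values for illustration only; the real presets come from the SDK.
EXAMPLE_PRESETS: dict[str, dict[str, Any]] = {
    "scribe": {"language": "en", "max_delay": 2.0},
}


def resolve_config(
    preset: Optional[str] = None,
    config_json: Optional[str] = None,
    **overrides: Any,
) -> dict[str, Any]:
    """Combine a preset, a complete JSON config, and per-flag overrides."""
    given = {key: value for key, value in overrides.items() if value is not None}
    if config_json is not None:
        if given:
            # Mirrors the note above: -c/--config excludes the individual flags.
            raise ValueError(f"cannot combine --config with: {sorted(given)}")
        return json.loads(config_json)
    base = dict(EXAMPLE_PRESETS[preset]) if preset else {}
    base.update(given)  # individual flags win over preset values
    return base


# Roughly what `-P scribe -l fr -d 1.0 -W` might display under these assumptions.
print(json.dumps(resolve_config("scribe", language="fr", max_delay=1.0), indent=2))
```

Copying the preset before updating it keeps the shared preset table unchanged between calls.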

 ### Speaker Management

@@ -72,62 +112,142 @@ Press `CTRL+C` to stop.

 ## Examples

+**List presets:**
+
+```bash
+python cli.py --list-presets
+```
+
+**Show config (from preset):**
+
+```bash
+python cli.py -P scribe -W
+```
+
+**Show config (with overrides):**
+
+```bash
+python cli.py -P scribe -l fr -d 1.0 -W
+```
+
+**Use preset:**
+
+```bash
+python cli.py -k YOUR_KEY -P scribe -p
+```
+
+**Use preset with overrides:**
+
+```bash
+python cli.py -k YOUR_KEY -P scribe -l fr -d 1.0 -p
+```
+
 **Basic microphone:**
+
 ```bash
 python cli.py -k YOUR_KEY -p
 ```

+Output saved to `./output/YYYYMMDD_HHMMSS/log.jsonl`
+
+**Record microphone audio:**
+
+```bash
+python cli.py -k YOUR_KEY -r -p
+```
+
+Recording saved to `./output/YYYYMMDD_HHMMSS/recording.wav`
+
+**Custom output directory:**
+
+```bash
+python cli.py -k YOUR_KEY -o ./my_sessions -p
+```
+
+Output saved to `./my_sessions/YYYYMMDD_HHMMSS/log.jsonl`
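
The `YYYYMMDD_HHMMSS` session layout can be reproduced with `datetime.strftime`. The helper below is a hypothetical sketch of how such paths might be derived; it only computes paths and does not create anything on disk, and it is not taken from `cli.py`.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def session_paths(base_dir: str = "./output",
                  now: Optional[datetime] = None) -> dict[str, Path]:
    """Compute the per-session paths described in this README."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    session = Path(base_dir) / stamp
    return {
        "session": session,
        "log": session / "log.jsonl",
        "recording": session / "recording.wav",
    }


paths = session_paths("./my_sessions", datetime(2025, 1, 2, 3, 4, 5))
print(paths["log"])
```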
+
+**EXTERNAL mode with manual turn control:**
+
+```bash
+python cli.py -k YOUR_KEY -m EXTERNAL -p
+```
+
+Press 't' or 'T' to manually signal end of turn.
+
+**Save audio slices (SMART_TURN mode):**
+
+```bash
+python cli.py -k YOUR_KEY -P conversation_smart_turn -S -p
+```
+
+Audio slices (~8 seconds) saved to `./output/YYYYMMDD_HHMMSS/slice_*.wav` with matching `.json` metadata files on each SPEAKER_ENDED event.
+
 **Audio file:**
+
 ```bash
 python cli.py -k YOUR_KEY -i audio.wav -p
 ```

 **Audio file (muted):**
-```bash
-python cli.py -k YOUR_KEY -i audio.wav -Mp
-```

-**Save output:**
 ```bash
-python cli.py -k YOUR_KEY -o output.jsonl -p
+python cli.py -k YOUR_KEY -i audio.wav -Mp
 ```

 **Verbose logging:**
+
 ```bash
 python cli.py -k YOUR_KEY -vv -p
 ```

+Shows additional events (speaker VAD, turn predictions, etc.)
+
 **Focus on speakers:**
+
 ```bash
 python cli.py -k YOUR_KEY -f S1 S2 -p
 ```

 **Enrol speakers:**
+
 ```bash
 python cli.py -k YOUR_KEY -Ep
 ```
+
 Press `CTRL+C` when done to see speaker identifiers.

 **Use known speakers:**
+
 ```bash
 python cli.py -k YOUR_KEY -s speakers.json -p
 ```

 Example `speakers.json`:
+
 ```json
 [
-{"label": "Alice", "speaker_identifiers": ["XX...XX"]},
-{"label": "Bob", "speaker_identifiers": ["YY...YY"]}
+{ "label": "Alice", "speaker_identifiers": ["XX...XX"] },
+{ "label": "Bob", "speaker_identifiers": ["YY...YY"] }
 ]
 ```
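
Before passing a `speakers.json` file to `-s`, it may help to check its shape. The validator below is a hypothetical sketch, not part of the CLI. It assumes the two fields shown in the example are required, and it treats any label of the form `S<number>` as reserved, which generalises the README's warning about `S1` and `S2`.

```python
import json
import re

# Assumption: engine-assigned labels follow the S<number> pattern (S1, S2, ...).
RESERVED_LABEL = re.compile(r"^S\d+$")


def validate_speakers(raw: str) -> list[dict]:
    """Parse a speakers.json payload and reject malformed or reserved entries."""
    speakers = json.loads(raw)
    if not isinstance(speakers, list):
        raise ValueError("expected a JSON array of speakers")
    for entry in speakers:
        if not isinstance(entry, dict):
            raise ValueError(f"expected an object, got {entry!r}")
        label = entry.get("label")
        identifiers = entry.get("speaker_identifiers")
        if not isinstance(label, str) or not label:
            raise ValueError(f"missing label in {entry!r}")
        if RESERVED_LABEL.match(label):
            raise ValueError(f"label {label!r} is reserved by the engine")
        if not isinstance(identifiers, list) or not identifiers:
            raise ValueError(f"no speaker_identifiers for {label!r}")
    return speakers


example = '[{"label": "Alice", "speaker_identifiers": ["XX...XX"]}]'
print([speaker["label"] for speaker in validate_speakers(example)])
```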

 **Custom config:**
+
 ```bash
 python cli.py -k YOUR_KEY -c config.json -p
 ```

 ## Notes

+- Output directory (`-o`) defaults to `./output`
+- Each session creates a timestamped subdirectory (YYYYMMDD_HHMMSS format)
+- Session directory contains:
+  - `log.jsonl` - All events with timestamps
+  - `recording.wav` - Microphone recording (if `-r` is used)
+  - `slice_*.wav` and `slice_*.json` - Audio slices (if `--save-slices` is used in SMART_TURN mode)
+- Session subdirectories prevent accidental data loss from multiple runs
+- Audio slices are ~8 seconds and saved on each SPEAKER_ENDED event
+- JSON metadata includes event details, speaker ID, timing, and slice duration
 - Speaker identifiers are encrypted and unique to your API key
 - Allow speakers to say at least 20 words before enrolling
 - Avoid labels `S1`, `S2` (reserved by engine)
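
Because session directories are named by timestamp, the most recent one can be found by sorting the names. The sketch below locates it and reads `log.jsonl` line by line. It assumes each line is one JSON object and makes no assumption about the event fields, which this README does not describe; `latest_session` and `read_events` are hypothetical helper names.

```python
import json
from pathlib import Path
from typing import Iterator, Optional


def latest_session(base_dir: str = "./output") -> Optional[Path]:
    """Return the newest session directory, or None if there is none."""
    base = Path(base_dir)
    if not base.is_dir():
        return None
    # YYYYMMDD_HHMMSS names sort chronologically as plain strings.
    sessions = sorted(path for path in base.iterdir() if path.is_dir())
    return sessions[-1] if sessions else None


def read_events(log_path: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with log_path.open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


if __name__ == "__main__":
    session = latest_session()
    if session is not None and (session / "log.jsonl").exists():
        events = list(read_events(session / "log.jsonl"))
        print(f"{session.name}: {len(events)} events")
```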
