@@ -5,15 +5,25 @@ Real-time transcription tool using the Speechmatics Voice SDK. Supports micropho
55## Quick Start
66
77**Microphone:**
8+
89``` bash
9- python cli.py -p -k YOUR_API_KEY
10+ # Quick example
11+ python cli.py -k YOUR_API_KEY -p
12+
13+ # Example that saves the output in verbose mode using a preset
14+ python cli.py -k YOUR_API_KEY -vvvvvpDSr -P conversation_smart_turn
1015```
1116
17+ Output saved to ` ./output/YYYYMMDD_HHMMSS/log.jsonl `
18+
1219**Audio file:**
20+
1321``` bash
14- python cli.py -p -k YOUR_API_KEY -i audio.wav
22+ python cli.py -k YOUR_API_KEY -i audio.wav -p
1523```
1624
25+ Output saved to ` ./output/YYYYMMDD_HHMMSS/log.jsonl `
26+
1727Press ` CTRL+C ` to stop.
1828
1929## Requirements
@@ -23,41 +33,71 @@ Press `CTRL+C` to stop.
2333
2434## Options
2535
36+ ### Quick Reference
37+
38+ Common short codes:
39+
40+ - ` -k ` API key | ` -i ` input file | ` -o ` output dir | ` -p ` pretty print | ` -v ` verbose
41+ - ` -r ` record | ` -S ` save slices | ` -P ` preset | ` -W ` show config
42+ - ` -l ` language | ` -m ` mode | ` -d ` max delay | ` -t ` silence trigger
43+ - ` -f ` focus speakers | ` -s ` known speakers | ` -E ` enrol
44+
2645### Core
2746
2847- ` -k, --api-key ` - API key (defaults to ` SPEECHMATICS_API_KEY ` env var)
2948- ` -u, --url ` - Server URL (defaults to ` SPEECHMATICS_RT_URL ` env var)
3049- ` -i, --input-file ` - Audio file path (WAV, mono 16-bit). Uses microphone if not specified
31- - ` -c, --config ` - JSON config string or file path (overrides other Voice Agent options)
3250
3351### Output
3452
53+ - ` -o, --output-dir ` - Base output directory (default: ./output)
54+   - Creates a session subdirectory with timestamp (YYYYMMDD_HHMMSS)
55+   - Inside session directory:
56+     - ` log.jsonl ` - All events with timestamps
57+     - ` recording.wav ` - Microphone recording (if ` -r ` is used)
58+     - ` slice_*.wav ` and ` slice_*.json ` - Audio slices (if ` -S ` is used)
59+ - ` -r, --record ` - Record microphone audio to recording.wav (microphone input only)
60+ - ` -S, --save-slices ` - Save audio slices on SPEAKER_ENDED events (SMART_TURN mode only)
3561- ` -p, --pretty ` - Formatted console output with colors
36- - ` -o, --output-file ` - Save output to JSONL file
3762- ` -v, --verbose ` - Increase verbosity (can repeat: ` -v `, ` -vv `, ` -vvv `, ` -vvvv `, ` -vvvvv `)
3863 - ` -v ` - Add speaker VAD events
3964 - ` -vv ` - Add turn predictions
4065 - ` -vvv ` - Add segment annotations
4166 - ` -vvvv ` - Add metrics
4267 - ` -vvvvv ` - Add STT events
4368- ` -L, --legacy ` - Show only legacy transcript messages
44- - ` --results ` - Include word-level results in segments
69+ - ` -D, --default-device ` - Use default audio device (skip selection)
70+ - ` -w, --results ` - Include word-level results in segments
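
The `-v` levels listed above read as cumulative: each extra `v` adds one more group of events. A minimal Python sketch of that mapping, assuming cumulative behaviour and using `argparse`'s `count` action; the `VERBOSITY_GROUPS` list and the `enabled_groups` helper are illustrative names, not taken from `cli.py`:

```python
import argparse

# Event groups added at each verbosity level, in the order listed above.
# Assumption: levels are cumulative, so -vv includes everything from -v.
VERBOSITY_GROUPS = [
    "speaker VAD events",   # -v
    "turn predictions",     # -vv
    "segment annotations",  # -vvv
    "metrics",              # -vvvv
    "STT events",           # -vvvvv
]


def enabled_groups(verbosity: int) -> list:
    """Return the event groups shown for a given number of -v flags."""
    return VERBOSITY_GROUPS[: max(0, verbosity)]


# argparse counts repeated short flags, so "-vv" yields verbose == 2.
parser = argparse.ArgumentParser()
parser.add_argument("-v", "--verbose", action="count", default=0)

args = parser.parse_args(["-vv"])
print(enabled_groups(args.verbose))
```

Repeating the flag beyond five times has no further effect in this sketch, since the slice simply returns the full list.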
4571
4672### Audio
4773
48- - ` --sample-rate ` - Sample rate in Hz (default: 16000)
49- - ` --chunk-size ` - Chunk size in bytes (default: 320)
74+ - ` -R, --sample-rate ` - Sample rate in Hz (default: 16000)
75+ - ` -C, --chunk-size ` - Chunk size in bytes (default: 320)
5076- ` -M, --mute ` - Mute audio playback for file input
51- - ` -D, --default-device ` - Use default audio device (skip selection)
5277
5378### Voice Agent Config
5479
55- - ` -l, --language ` - Language code (default: en)
56- - ` -d, --max-delay ` - Max transcription delay in seconds (default: 0.7)
57- - ` -t, --end-of-utterance-silence-trigger ` - Silence duration for turn end (default: 0.5)
58- - ` -m, --end-of-utterance-mode ` - Turn detection mode: ` FIXED `, ` ADAPTIVE `, ` SMART_TURN `, or ` EXTERNAL `
59- - ` -e, --emit-sentences ` - Emit sentence-level segments
60- - ` --forced-eou ` - Enable forced end of utterance
80+ **Configuration Priority:**
81+
82+ 1. Use ` --preset ` to start with a preset configuration (recommended)
83+ 2. Use ` -c/--config ` to provide a complete JSON configuration
84+ 3. Use individual parameters (` -l `, ` -d `, ` -t `, ` -m `) to override preset settings or create a custom config
85+
86+ **Preset Options:**
87+
88+ - ` -P, --preset ` - Use preset configuration: ` scribe `, ` low_latency `, ` conversation_adaptive `, ` conversation_smart_turn `, or ` captions `
89+ - ` --list-presets ` - List available presets and exit
90+ - ` -W, --show ` - Display the final configuration as JSON and exit (after applying preset/config and overrides)
91+
92+ **Configuration Options:**
93+
94+ - ` -c, --config ` - JSON config string or file path (complete configuration)
95+ - ` -l, --language ` - Language code (overrides preset if used together)
96+ - ` -d, --max-delay ` - Max transcription delay in seconds (overrides preset if used together)
97+ - ` -t, --end-of-utterance-silence-trigger ` - Silence duration for turn end in seconds (overrides preset if used together)
98+ - ` -m, --end-of-utterance-mode ` - Turn detection mode: ` FIXED `, ` ADAPTIVE `, ` SMART_TURN `, or ` EXTERNAL ` (overrides preset if used together)
99+
100+ **Note:** When using ` -c/--config `, you cannot use ` -l `, ` -d `, ` -t `, ` -m `, ` -f `, ` -I `, ` -x `, or ` -s `, because the config JSON should contain all settings.
61101
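
The precedence described above (a preset as the base, individual options as overrides, and `-c/--config` standing alone) can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual `cli.py` logic: the `resolve_config` helper, the `EXAMPLE_PRESETS` values, and the `language` / `max_delay` key names are all hypothetical.

```python
from typing import Optional

# Hypothetical preset contents, for illustration only. The real values are
# whatever `--list-presets` and `-P <name> -W` report.
EXAMPLE_PRESETS = {
    "scribe": {"language": "en", "max_delay": 2.0},
}


def resolve_config(preset: Optional[str],
                   config: Optional[dict],
                   overrides: dict) -> dict:
    """Combine a preset, a full config, and per-option overrides.

    `overrides` maps option names to values; None means "not given".
    """
    given = {key: value for key, value in overrides.items() if value is not None}

    if config is not None:
        # A complete config is expected to carry every setting itself,
        # so mixing it with individual options is rejected.
        if given:
            raise ValueError("individual options cannot be combined with a full config")
        return dict(config)

    # Start from the preset (if any), then let explicit options win.
    base = dict(EXAMPLE_PRESETS[preset]) if preset else {}
    base.update(given)
    return base


print(resolve_config("scribe", None, {"language": "fr", "max_delay": 1.0}))
```

Running the real CLI with `-W` remains the reliable way to see the configuration that will actually be used.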
62102### Speaker Management
63103
@@ -72,62 +112,142 @@ Press `CTRL+C` to stop.
72112
73113## Examples
74114
115+ **List presets:**
116+
117+ ``` bash
118+ python cli.py --list-presets
119+ ```
120+
121+ **Show config (from preset):**
122+
123+ ``` bash
124+ python cli.py -P scribe -W
125+ ```
126+
127+ **Show config (with overrides):**
128+
129+ ``` bash
130+ python cli.py -P scribe -l fr -d 1.0 -W
131+ ```
132+
133+ **Use preset:**
134+
135+ ``` bash
136+ python cli.py -k YOUR_KEY -P scribe -p
137+ ```
138+
139+ **Use preset with overrides:**
140+
141+ ``` bash
142+ python cli.py -k YOUR_KEY -P scribe -l fr -d 1.0 -p
143+ ```
144+
75145**Basic microphone:**
146+
76147``` bash
77148python cli.py -k YOUR_KEY -p
78149```
79150
151+ Output saved to ` ./output/YYYYMMDD_HHMMSS/log.jsonl `
152+
153+ **Record microphone audio:**
154+
155+ ``` bash
156+ python cli.py -k YOUR_KEY -r -p
157+ ```
158+
159+ Recording saved to ` ./output/YYYYMMDD_HHMMSS/recording.wav `
160+
161+ **Custom output directory:**
162+
163+ ``` bash
164+ python cli.py -k YOUR_KEY -o ./my_sessions -p
165+ ```
166+
167+ Output saved to ` ./my_sessions/YYYYMMDD_HHMMSS/log.jsonl `
168+
169+ **EXTERNAL mode with manual turn control:**
170+
171+ ``` bash
172+ python cli.py -k YOUR_KEY -m EXTERNAL -p
173+ ```
174+
175+ Press 't' or 'T' to manually signal end of turn.
176+
177+ **Save audio slices (SMART_TURN mode):**
178+
179+ ``` bash
180+ python cli.py -k YOUR_KEY -P conversation_smart_turn -S -p
181+ ```
182+
183+ Audio slices (~8 seconds each) are saved to ` ./output/YYYYMMDD_HHMMSS/slice_*.wav `, with a matching ` .json ` metadata file, on each SPEAKER_ENDED event.
184+
80185**Audio file:**
186+
81187``` bash
82188python cli.py -k YOUR_KEY -i audio.wav -p
83189```
84190
85191**Audio file (muted):**
86- ``` bash
87- python cli.py -k YOUR_KEY -i audio.wav -Mp
88- ```
89192
90- **Save output:**
91193``` bash
92- python cli.py -k YOUR_KEY -o output.jsonl -p
194+ python cli.py -k YOUR_KEY -i audio.wav -Mp
93195```
94196
95197**Verbose logging:**
198+
96199``` bash
97200python cli.py -k YOUR_KEY -vv -p
98201```
99202
203+ At ` -vv `, the output adds speaker VAD events and turn predictions.
204+
100205**Focus on speakers:**
206+
101207``` bash
102208python cli.py -k YOUR_KEY -f S1 S2 -p
103209```
104210
105211**Enrol speakers:**
212+
106213``` bash
107214python cli.py -k YOUR_KEY -Ep
108215```
216+
109217Press ` CTRL+C ` when done to see speaker identifiers.
110218
111219**Use known speakers:**
220+
112221``` bash
113222python cli.py -k YOUR_KEY -s speakers.json -p
114223```
115224
116225Example ` speakers.json ` :
226+
117227``` json
118228[
119- {"label": "Alice", "speaker_identifiers": ["XX...XX"]},
120- {"label": "Bob", "speaker_identifiers": ["YY...YY"]}
229+ { "label": "Alice", "speaker_identifiers": ["XX...XX"] },
230+ { "label": "Bob", "speaker_identifiers": ["YY...YY"] }
121231]
122232```
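
A file like this can be checked before it is passed to `-s`. The sketch below is a hypothetical pre-flight check in Python, not part of `cli.py`: it assumes the structure shown in the example above (a list of objects with `label` and `speaker_identifiers`) and flags the labels `S1` and `S2`, which the Notes section lists as reserved.

```python
import json

# Labels the Notes section says to avoid because the engine reserves them.
RESERVED_LABELS = {"S1", "S2"}


def validate_speakers(speakers: list) -> list:
    """Return a list of human-readable problems; empty means the data looks fine."""
    problems = []
    for index, entry in enumerate(speakers):
        if not isinstance(entry, dict):
            problems.append(f"entry {index}: expected an object")
            continue

        label = entry.get("label")
        identifiers = entry.get("speaker_identifiers")

        if not label:
            problems.append(f"entry {index}: missing label")
        elif label in RESERVED_LABELS:
            problems.append(f"entry {index}: label {label!r} is reserved")

        if not isinstance(identifiers, list) or not identifiers:
            problems.append(f"entry {index}: speaker_identifiers must be a non-empty list")
    return problems


example = json.loads(
    '[{"label": "Alice", "speaker_identifiers": ["XX...XX"]},'
    ' {"label": "S1", "speaker_identifiers": []}]'
)
for problem in validate_speakers(example):
    print(problem)
```

To check a real file, load it with `json.load(open("speakers.json"))` and pass the result to `validate_speakers`.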
123233
124234**Custom config:**
235+
125236``` bash
126237python cli.py -k YOUR_KEY -c config.json -p
127238```
128239
129240## Notes
130241
242+ - Output directory (` -o ` ) defaults to ` ./output `
243+ - Each session creates a timestamped subdirectory (YYYYMMDD_HHMMSS format)
244+ - Session directory contains:
245+   - ` log.jsonl ` - All events with timestamps
246+   - ` recording.wav ` - Microphone recording (if ` -r ` is used)
247+   - ` slice_*.wav ` and ` slice_*.json ` - Audio slices (if ` --save-slices ` is used in SMART_TURN mode)
248+ - Session subdirectories prevent accidental data loss from multiple runs
249+ - Audio slices are ~8 seconds and saved on each SPEAKER_ENDED event
250+ - JSON metadata includes event details, speaker ID, timing, and slice duration
131251- Speaker identifiers are encrypted and unique to your API key
132252- Allow speakers to say at least 20 words before enrolling
133253- Avoid labels ` S1 `, ` S2 ` (reserved by engine)
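
Because session directories are named `YYYYMMDD_HHMMSS`, sorting them by name also sorts them by time. A small Python sketch for locating the most recent session and loading its `log.jsonl`, assuming one JSON object per line (the usual JSONL convention); the helper names are illustrative and no particular event fields are assumed:

```python
import json
from pathlib import Path
from typing import Optional


def latest_session(output_dir: Path) -> Optional[Path]:
    """Return the newest session directory, or None if there are none.

    Names in YYYYMMDD_HHMMSS form sort chronologically as plain strings.
    """
    sessions = sorted(p for p in output_dir.iterdir() if p.is_dir())
    return sessions[-1] if sessions else None


def read_events(log_path: Path) -> list:
    """Parse a JSONL file into a list of objects, skipping blank lines."""
    events = []
    with log_path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                events.append(json.loads(line))
    return events


if __name__ == "__main__":
    base = Path("./output")  # default output directory per the notes above
    session = latest_session(base) if base.is_dir() else None
    if session and (session / "log.jsonl").exists():
        events = read_events(session / "log.jsonl")
        print(f"{session.name}: {len(events)} events")
```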