Follow these steps to test Deepgram Speech-to-Text (STT) on Amazon SageMaker:
- Ensure you've deployed an Amazon SageMaker endpoint
- Install the uv package manager
- Install dependencies:
  uv pip install -r requirements.txt
The stt_microphone_stress.py script streams live microphone audio to Deepgram on SageMaker for real-time transcription. It supports multiple simultaneous connections for load testing.
Basic usage (single connection):
uv run python-stt/stt_microphone_stress.py your-endpoint-name
With specific AWS region:
uv run python-stt/stt_microphone_stress.py your-endpoint-name --region us-west-2
Multiple simultaneous connections (load testing):
uv run python-stt/stt_microphone_stress.py your-endpoint-name --connections 5
With speaker diarization:
uv run python-stt/stt_microphone_stress.py your-endpoint-name --diarize true
With different model and language:
The default Speech-to-Text (STT) transcription model is nova-3.
uv run python-stt/stt_microphone_stress.py your-endpoint-name --model nova-2 --language es
With keyword boosting:
Keywords are only compatible with nova-2. For nova-3, use keyterms instead.
uv run python-stt/stt_microphone_stress.py your-endpoint-name --keywords "Deepgram:5,SageMaker:10,transcription:3"
Run for a fixed duration (useful for automated tests):
uv run python-stt/stt_microphone_stress.py your-endpoint-name --duration 30
Timed load test with multiple connections:
uv run python-stt/stt_microphone_stress.py your-endpoint-name --connections 5 --duration 120
Full example with all options:
uv run python-stt/stt_microphone_stress.py your-endpoint-name \
--connections 3 \
--model nova-2 \
--language en \
--diarize true \
--punctuate true \
--keywords "hello:5,world:10" \
--duration 60 \
--region us-east-1 \
--log-level DEBUG
Available options:
- --connections N: Number of simultaneous streaming connections (default: 1)
- --model MODEL: Deepgram model to use (default: nova-3)
- --language LANG: Language code (default: en)
- --diarize true|false: Enable speaker diarization (default: false)
- --punctuate true|false: Enable punctuation (default: true)
- --keywords KEYWORDS: Comma-delimited keywords with intensity (format: "word:intensity,word:intensity")
- --duration SECONDS: Stop automatically after this many seconds (default: run until Ctrl+C)
- --region REGION: AWS region (default: us-east-1)
- --log-level LEVEL: Logging level: DEBUG, INFO, WARNING, ERROR, CRITICAL (default: INFO)
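The --keywords format described above ("word:intensity,word:intensity") can be illustrated with a small parser. This is only a sketch of the format, not the script's actual parsing code; the function name `parse_keywords` is hypothetical, and treating intensities as integers is an assumption based on the examples in this document.

```python
def parse_keywords(spec: str) -> list[tuple[str, int]]:
    """Split 'word:intensity,word:intensity' into (word, intensity) pairs.

    Hypothetical helper: the real script may parse this differently.
    """
    pairs = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray or trailing commas
        # Split on the last colon so the intensity is always the final field.
        word, sep, intensity = item.rpartition(":")
        if not sep or not word:
            raise ValueError(f"expected word:intensity, got {item!r}")
        pairs.append((word, int(intensity)))
    return pairs


if __name__ == "__main__":
    print(parse_keywords("Deepgram:5,SageMaker:10,transcription:3"))
```

A malformed entry such as "hello" (no intensity) raises ValueError instead of being dropped silently, which makes typos on the command line easier to spot.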
The stt_wav_stress.py script supports two sub-commands: stream and batch.
Requirements: The WAV file must be 16-bit PCM (linear16). To convert any audio file, use ffmpeg:
ffmpeg -i input.mp3 -ar 16000 -ac 1 -sample_fmt s16 output.wav
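Before running a test, it can help to confirm that a file really is 16-bit PCM. The following is a minimal sketch using Python's standard `wave` module; the helper names `describe_wav` and `is_linear16` are hypothetical and are not part of stt_wav_stress.py.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read basic format fields from a WAV header."""
    with wave.open(path, "rb") as wav:
        return {
            "sample_rate": wav.getframerate(),
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
        }


def is_linear16(path: str) -> bool:
    """Return True if the file is PCM WAV with 16 bits per sample."""
    try:
        return describe_wav(path)["bits_per_sample"] == 16
    except (wave.Error, EOFError):
        # The wave module rejects files it cannot parse as PCM WAV.
        return False
```

If `is_linear16` returns False, converting the file with the ffmpeg command above should produce a compatible one.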
Streams the WAV file to Deepgram on SageMaker in real-time, paced to match the file's actual sample rate. Behaves like a live microphone source, enabling repeatable and automated load testing without requiring a physical microphone.
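To illustrate what "paced to match the file's actual sample rate" means, here is a rough sketch of real-time pacing. It is not the script's implementation: the generator name `paced_chunks`, the 100 ms chunk size, and the injectable `sleep` parameter are all assumptions made for the example.

```python
import time
import wave
from typing import Callable, Iterator


def paced_chunks(
    path: str,
    chunk_ms: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[bytes]:
    """Yield raw PCM chunks no faster than the audio would play back."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        bytes_per_frame = wav.getsampwidth() * wav.getnchannels()
        frames_per_chunk = max(1, rate * chunk_ms // 1000)
        start = time.monotonic()
        frames_sent = 0
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:
                break  # end of file
            yield data
            frames_sent += len(data) // bytes_per_frame
            # Wait until the wall clock catches up with the audio position,
            # so timing errors do not accumulate from chunk to chunk.
            delay = start + frames_sent / rate - time.monotonic()
            if delay > 0:
                sleep(delay)
```

Pacing against the elapsed wall-clock time, instead of sleeping a fixed interval per chunk, keeps the stream aligned with real time even when sending a chunk takes a variable amount of time.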
Basic usage (single connection, play file once):
uv run python-stt/stt_wav_stress.py stream your-endpoint-name --file audio.wav
With a specific AWS region:
uv run python-stt/stt_wav_stress.py stream your-endpoint-name --file audio.wav --region us-west-2
Multiple simultaneous connections (load testing):
uv run python-stt/stt_wav_stress.py stream your-endpoint-name --file audio.wav --connections 5
Loop the file continuously until Ctrl+C:
uv run python-stt/stt_wav_stress.py stream your-endpoint-name --file audio.wav --loop
Loop for a fixed duration (useful for automated tests):
uv run python-stt/stt_wav_stress.py stream your-endpoint-name --file audio.wav --loop --duration 120
Timed load test with multiple connections:
uv run python-stt/stt_wav_stress.py stream your-endpoint-name --file audio.wav \
--connections 10 --loop --duration 300
With speaker diarization:
uv run python-stt/stt_wav_stress.py stream your-endpoint-name --file audio.wav --diarize true
With a different model and language:
uv run python-stt/stt_wav_stress.py stream your-endpoint-name --file audio.wav \
--model nova-2 --language es
With keyword boosting (nova-2 only) or keyterms (nova-3):
Keywords are only compatible with nova-2. For nova-3 use --keyterms instead.
# nova-2 keywords
uv run python-stt/stt_wav_stress.py stream your-endpoint-name --file audio.wav \
--keywords "Deepgram:5,SageMaker:10"
# nova-3 keyterms
uv run python-stt/stt_wav_stress.py stream your-endpoint-name --file audio.wav \
--keyterms "Deepgram,SageMaker"
With PII redaction:
uv run python-stt/stt_wav_stress.py stream your-endpoint-name --file audio.wav \
--redact "pii,ssn,email_address"
Full example with all stream options:
uv run python-stt/stt_wav_stress.py stream your-endpoint-name --file audio.wav \
--connections 3 \
--model nova-3 \
--language en \
--diarize true \
--punctuate true \
--keyterms "Deepgram,SageMaker" \
--redact "pii,ssn" \
--interim-results true \
--loop \
--duration 60 \
--region us-east-2 \
--log-level DEBUG
Available stream options:
| Option | Description | Default |
|---|---|---|
| --file WAV_FILE | Path to a 16-bit PCM WAV file (required) | none |
| --connections N | Number of simultaneous streaming connections | 1 |
| --model MODEL | Deepgram model | nova-3 |
| --language LANG | Language code | en |
| --diarize true\|false | Enable speaker diarization | false |
| --punctuate true\|false | Enable punctuation | true |
| --keywords WORD:N,... | Keyword boosting with intensity, e.g. hello:5,world:10 (nova-2 only) | none |
| --keyterms TERM,... | Comma-separated keyterms to boost recognition (nova-3) | none |
| --redact ENTITY,... | Comma-separated entity types to redact, e.g. pii,ssn,email_address | none |
| --interim-results true\|false | Emit interim (partial) transcripts | true |
| --loop | Loop the WAV file until --duration is reached or Ctrl+C | off |
| --duration SECONDS | Stop after this many seconds | play file once |
| --region REGION | AWS region | us-east-2 |
| --log-level LEVEL | DEBUG, INFO, WARNING, ERROR, CRITICAL | INFO |
Posts the entire WAV file in a single HTTP request using the SageMaker InvokeEndpoint API. Supports configurable parallelism via --concurrency for throughput and latency stress testing. Each concurrent request runs on its own Python thread with its own boto3 client. After all requests complete, a summary table shows min/avg/p95/max latency, throughput, and success/failure counts.
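The batch flow described above (a pool of threads, then a min/avg/p95/max summary) might look roughly like the sketch below. It is an illustration under assumptions, not the script's code: `run_batch`, `summarize`, and `percentile` are hypothetical names, the request itself is abstracted as a `send` callable so the example runs without AWS access, and the nearest-rank p95 method is one common choice that the real script may not use.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def percentile(sorted_values: list[float], pct: int) -> float:
    """Nearest-rank percentile of an ascending list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(results: list[tuple[float, bool]], wall_seconds: float) -> dict:
    """Build latency, throughput, and success/failure figures."""
    latencies = sorted(latency for latency, _ in results)
    succeeded = sum(1 for _, ok in results if ok)
    return {
        "min": latencies[0],
        "avg": sum(latencies) / len(latencies),
        "p95": percentile(latencies, 95),
        "max": latencies[-1],
        "succeeded": succeeded,
        "failed": len(results) - succeeded,
        "throughput_rps": len(results) / wall_seconds if wall_seconds > 0 else 0.0,
    }


def run_batch(send: Callable[[], None], total: int, concurrency: int) -> dict:
    """Call send() `total` times with at most `concurrency` calls in flight."""

    def timed_call(_: int) -> tuple[float, bool]:
        start = time.perf_counter()
        try:
            # In a real test, send() would wrap a SageMaker runtime
            # InvokeEndpoint request (assumption: not shown here).
            send()
            ok = True
        except Exception:
            ok = False
        return time.perf_counter() - start, ok

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total)))
    return summarize(results, time.perf_counter() - wall_start)
```

For example, `run_batch(send, total=100, concurrency=10)` mirrors `--requests 100 --concurrency 10`: ten worker threads drain a queue of one hundred calls.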
Note: SageMaker InvokeEndpoint has a 6 MB request body limit. For larger files, use stream mode or split the file:
ffmpeg -i input.wav -f segment -segment_time 60 segment_%03d.wav
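As a rough guide to how much audio fits under that limit, the arithmetic can be sketched as follows. The figures rest on assumptions: "6 MB" is read here as 6 × 1024 × 1024 bytes, and the WAV header is taken to be the common 44-byte minimum. The function names are hypothetical.

```python
import os

# Assumption: the 6 MB limit is interpreted as 6 MiB.
MAX_BODY_BYTES = 6 * 1024 * 1024


def max_pcm_seconds(
    sample_rate: int = 16000,
    channels: int = 1,
    bytes_per_sample: int = 2,
    header_bytes: int = 44,  # assumption: minimal canonical WAV header
) -> float:
    """Longest uncompressed PCM duration that fits in one request body."""
    bytes_per_second = sample_rate * channels * bytes_per_sample
    return (MAX_BODY_BYTES - header_bytes) / bytes_per_second


def fits_in_one_request(path: str) -> bool:
    """Check a file's size on disk against the request body limit."""
    return os.path.getsize(path) <= MAX_BODY_BYTES


if __name__ == "__main__":
    # 16 kHz mono 16-bit audio uses 32,000 bytes per second,
    # so the limit allows a little over three minutes of audio.
    print(f"{max_pcm_seconds():.1f} seconds")
```

Under these assumptions, the 60-second segments produced by the ffmpeg command above (about 1.9 MB each at 16 kHz mono 16-bit) stay well within the limit.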
Basic usage (single request):
uv run python-stt/stt_wav_stress.py batch your-endpoint-name --file audio.wav
Send 10 concurrent requests (load testing):
uv run python-stt/stt_wav_stress.py batch your-endpoint-name --file audio.wav --concurrency 10
Send 100 total requests, 10 at a time:
uv run python-stt/stt_wav_stress.py batch your-endpoint-name --file audio.wav \
--concurrency 10 --requests 100
With a different model and language:
uv run python-stt/stt_wav_stress.py batch your-endpoint-name --file audio.wav \
--model nova-2 --language es
With keyterms (nova-3):
uv run python-stt/stt_wav_stress.py batch your-endpoint-name --file audio.wav \
--keyterms "Deepgram,SageMaker"
With speaker diarization and PII redaction:
uv run python-stt/stt_wav_stress.py batch your-endpoint-name --file audio.wav \
--diarize true --redact "pii,ssn,email_address"
Full example with all batch options:
uv run python-stt/stt_wav_stress.py batch your-endpoint-name --file audio.wav \
--concurrency 5 \
--requests 50 \
--model nova-3 \
--language en \
--diarize true \
--punctuate true \
--keyterms "Deepgram,SageMaker" \
--redact "pii,ssn" \
--region us-east-2 \
--log-level DEBUG
Available batch options:
| Option | Description | Default |
|---|---|---|
| --file WAV_FILE | Path to a 16-bit PCM WAV file, max 6 MB (required) | none |
| --concurrency N | Number of requests to run in parallel | 1 |
| --requests N | Total number of requests to send | same as --concurrency |
| --model MODEL | Deepgram model | nova-3 |
| --language LANG | Language code | en |
| --diarize true\|false | Enable speaker diarization | false |
| --punctuate true\|false | Enable punctuation | true |
| --keyterms TERM,... | Comma-separated keyterms to boost recognition (nova-3) | none |
| --redact ENTITY,... | Comma-separated entity types to redact, e.g. pii,ssn,email_address | none |
| --region REGION | AWS region | us-east-2 |
| --log-level LEVEL | DEBUG, INFO, WARNING, ERROR, CRITICAL | INFO |
To run the transcription load test:
- Set your AWS credentials (e.g. the AWS_SHARED_CREDENTIALS_FILE and AWS_PROFILE variables)
- Ensure Node.js is installed
- Set the AWS region, SageMaker endpoint name, input audio file name, and query string parameters in stt.file.ts
- Run npx tsx stress-stt.ts