Hello Community,
When using the CoquiEngine in RealtimeTTS for realtime text-to-speech, the synthesized audio is choppy. The sound intermittently stops between words or segments, resulting in a disruptive playback experience.
Steps to Reproduce:
- Use a realtime TTS script (see below) that feeds text either in small chunks or via a generator.
- Run the script on a system with GPU support (e.g., RTX 4060 with CUDA enabled).
- Observe that during playback, the audio “comes and goes” with noticeable gaps between words or sentences.
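To rule out the text source itself as the cause of the gaps, the generator can be wrapped so that the time spent waiting for each chunk is recorded. This is only a sketch and does not touch RealtimeTTS: the `timed` helper and the `delays` list are hypothetical names introduced here for illustration.

```python
import time
from typing import Iterable, Iterator, List


def timed(chunks: Iterable[str], delays: List[float]) -> Iterator[str]:
    """Yield chunks unchanged while recording how long each one took to arrive.

    `delays[i]` is the time (in seconds) the wrapped iterable needed to
    produce chunk i, measured from the moment the consumer asked for it.
    """
    last = time.perf_counter()
    for chunk in chunks:
        delays.append(time.perf_counter() - last)
        yield chunk
        # The consumer has just asked for the next chunk; restart the clock.
        last = time.perf_counter()


def slow_source() -> Iterator[str]:
    """Stand-in for a text generator that pauses between chunks."""
    yield "first "
    time.sleep(0.05)
    yield "second "


if __name__ == "__main__":
    delays: List[float] = []
    for text in timed(slow_source(), delays):
        pass
    print([f"{d:.3f}s" for d in delays])
```

If the recorded delays are far shorter than the audible gaps, the text source is probably not the bottleneck.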
Expected Behavior:
The synthesized speech should be continuous and smooth, without abrupt pauses or intermittent glitches.
Actual Behavior:
The playback is discontinuous: audio frequently stops and then resumes, causing a choppy, glitchy experience.
This is the code I am using:
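One possible explanation for this kind of stop-and-resume pattern is a buffer underrun: a chunk takes longer to synthesize than the previously queued audio takes to play. The sketch below is a simplified model, not RealtimeTTS code. It assumes chunks are synthesized one after another and that per-chunk synthesis times and audio durations have been measured separately; the function names and the sample numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def find_underruns(
    synth_seconds: Sequence[float], audio_seconds: Sequence[float]
) -> List[Tuple[int, float]]:
    """Return (chunk_index, gap_seconds) for every chunk that arrives late.

    synth_seconds[i] is how long chunk i took to synthesize and
    audio_seconds[i] is how long its audio lasts when played.
    """
    underruns = []
    ready_at = 0.0      # time at which the current chunk finishes synthesis
    playback_end = 0.0  # time at which all queued audio has been played
    for i, (synth, audio) in enumerate(zip(synth_seconds, audio_seconds)):
        ready_at += synth  # assumption: chunks are synthesized sequentially
        if i > 0 and ready_at > playback_end:
            # Playback ran dry before this chunk was ready: an audible gap.
            underruns.append((i, round(ready_at - playback_end, 3)))
        playback_end = max(ready_at, playback_end) + audio
    return underruns


def real_time_factor(
    synth_seconds: Sequence[float], audio_seconds: Sequence[float]
) -> float:
    """Total synthesis time divided by total audio duration."""
    return sum(synth_seconds) / sum(audio_seconds)


if __name__ == "__main__":
    synth = [0.5, 2.0, 0.5]  # made-up measurements, in seconds
    audio = [1.0, 1.0, 1.0]
    print(find_underruns(synth, audio))    # [(1, 1.0)]
    print(real_time_factor(synth, audio))  # 1.0
```

In this model a real-time factor at or above 1 means synthesis cannot keep up on average. Gaps can still appear below 1 if a single chunk is slow.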
import os
import time

import torch
from RealtimeTTS import TextToAudioStream, CoquiEngine


def realtime_text_generator():
    texts = [
        "Hello, this is real-time TTS speaking. ",
        "Every sentence is synthesized as soon as it is ready. ",
        "The voice is generated using a local, neural cloned model. "
    ]
    for text in texts:
        yield text
        time.sleep(0.1)  # simulate continuous input with a short delay


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using device: {device}")

    # Optionally, specify custom model parameters via environment variables.
    custom_model_path = os.getenv("CUSTOM_COQUI_MODEL_PATH", None)
    custom_model_name = os.getenv("CUSTOM_COQUI_MODEL_NAME", None)

    if custom_model_path:
        print(f"Using custom model from: {custom_model_path}")
        engine = CoquiEngine(
            local_models_path=custom_model_path,
            specific_model=custom_model_name,
            full_sentences=True
        )
    else:
        print("Using default model settings.")
        engine = CoquiEngine()

    stream = TextToAudioStream(engine)
    print("Starting realtime TTS streaming...")
    stream.feed(realtime_text_generator()).play(log_synthesized_text=True)

    while stream.is_playing():
        time.sleep(0.05)

    print("Playback finished.")
    engine.shutdown()
Could someone please help me solve this issue?