Sound Glitches During Realtime Synthesis – Audio Stops and Starts Abruptly #272

@Panther465

Hello Community,

When using the CoquiEngine in RealtimeTTS for realtime text-to-speech, the synthesized audio is choppy. The sound intermittently stops between words or segments, resulting in a disruptive playback experience.

Steps to Reproduce:

  1. Use a realtime TTS script (see below) that feeds text either in small chunks or via a generator.
  2. Run the script on a system with GPU support (e.g., RTX 4060 with CUDA enabled).
  3. Observe that during playback, the audio “comes and goes” with noticeable gaps between words or sentences.

Expected Behavior:
The synthesized speech should be continuous and smooth, without abrupt pauses or intermittent glitches.

Actual Behavior:
The playback is discontinuous: audio frequently stops and then resumes, causing a choppy, glitchy experience.
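
To make "choppy" measurable, here is a minimal sketch of how the gaps could be quantified. It assumes the arrival time and size of each synthesized audio chunk can be logged; how to collect those values from RealtimeTTS is not shown. The names `Chunk` and `find_underruns` are hypothetical helpers, not part of the library, and the 24000 Hz sample rate is an assumption that should be replaced with the engine's real output rate.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    arrival_s: float   # when the chunk became available, in seconds since start
    num_samples: int   # number of audio samples contained in the chunk


def find_underruns(chunks, sample_rate=24000):
    """Return (gap_start_s, gap_length_s) for each point where playback runs dry.

    sample_rate is an assumed value; use the engine's actual output rate.
    """
    gaps = []
    playhead = None  # time at which the audio buffered so far finishes playing
    for chunk in chunks:
        if playhead is None:
            # Playback starts as soon as the first chunk arrives.
            playhead = chunk.arrival_s
        elif chunk.arrival_s > playhead:
            # The buffer emptied before this chunk arrived: an audible gap.
            gaps.append((playhead, chunk.arrival_s - playhead))
            playhead = chunk.arrival_s
        # Playing this chunk advances the playhead by its duration.
        playhead += chunk.num_samples / sample_rate
    return gaps


if __name__ == "__main__":
    # Made-up timings for illustration only, not measurements from my system.
    example = [Chunk(0.0, 24000), Chunk(0.5, 24000), Chunk(2.5, 12000)]
    print(find_underruns(example))  # [(2.0, 0.5)]
```

If the returned list is non-empty, chunks are arriving later than the buffered audio runs out, which would match the stop-and-resume playback described here.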

This is the code I am using:


import os
import time
import torch
from RealtimeTTS import TextToAudioStream, CoquiEngine

def realtime_text_generator():
    texts = [
        "Hello, this is real-time TTS speaking. ",
        "Every sentence is synthesized as soon as it is ready. ",
        "The voice is generated using a local, neural cloned model. "
    ]
    for text in texts:
        yield text
        time.sleep(0.1)  # simulate continuous input with a short delay

if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using device: {device}")

    # Optionally, specify custom model parameters via environment variables.
    custom_model_path = os.getenv("CUSTOM_COQUI_MODEL_PATH", None)
    custom_model_name = os.getenv("CUSTOM_COQUI_MODEL_NAME", None)

    if custom_model_path:
        print(f"Using custom model from: {custom_model_path}")
        engine = CoquiEngine(
            local_models_path=custom_model_path,
            specific_model=custom_model_name,
            full_sentences=True
        )
    else:
        print("Using default model settings.")
        engine = CoquiEngine()

    stream = TextToAudioStream(engine)
    print("Starting realtime TTS streaming...")
    stream.feed(realtime_text_generator()).play(log_synthesized_text=True)

    while stream.is_playing():
        time.sleep(0.05)

    print("Playback finished.")
    engine.shutdown()

Could someone please help me solve this issue?
