How to increase model loading speed for real-time workflow? #929

TrollMasterr · 2025-03-28T14:05:25Z

TrollMasterr
Mar 28, 2025

Hi,
To increase loading speed, what should I do to keep model and as much as possible of parameters preloaded? Better edit the socket_server.py or somehow the infer_cli.py or other? Or maybe there is some settings for that?

I want to automatically generate lines coming from a real-time workflow.
The inference is like 3x realtime for me so it's good, but the initialization and loading before every individual inference takes way too much time.

I tried those:
The infer_gradio.py has nice latency and seems to keep the model loaded, but it won't work well, because I need it automated.

The infer_cli.py with predefined ref_text and ref_audio in .wav format is still quite slow I assume because of the delay associated with checking for things like "Download Vocos from huggingface charactr/vocos-mel-24khz" , loading model for each individual infer_cli.py call, for some reason even "Converting audio..." despite it being exactly what it outputs to temp.

the socket_server.py seems to keep model preloaded, but it seems to spend too much time chunking the audio and maybe even the playback of individual parts causes more delays? Maybe if it could just output it in 1 or 2 parts instead of so many chunks. How would i edit the chunking logic to make it take reasonable time.

The api.py i can't get to work so I didn't try it's latency.

Thanks for any help.

kiuyg22 · 2025-08-05T02:40:23Z

kiuyg22
Aug 5, 2025

Greetings,
I was just thinking the same thing, wondering if you guys come up with any approach. Below is my failed attempt:

I tried the Gradio API Recorder on mrfakename's hugging face space demo.

with test.py (window 10, gradio version: 5.39.0, gradio_client version: 1.11.0):

from gradio_client import Client, handle_file
client = Client("mrfakename/E2-F5-TTS")
results = client.predict(
  ref_audio=handle_file('https://mrfakename-e2-f5-tts.hf.space/gradio_api/file=/tmp/gradio/46f7334e189927d3cba8104b07ceba8047cd7a962e773548e0614630d5eabdc1/japanese_ref.wav'),
  ref_text="え、うそでしょ?",
  gen_text="Upload a reference voice, give reference and generation text, and hear it in the same voice!",
  remove_silence=True,
  api_name="/predict"
)
print(results)

cmd C:\Users\username>C:\Users\username\Desktop\test.py return:

Loaded as API: https://mrfakename-e2-f5-tts.hf.space ✔
C:\Users\username\AppData\Local\Temp\gradio\fbce0d9ee92aa36953ec53291b80034ec2ff91f12e15185b1270e8390f209aea\tmpqlnhmzxo.wav

Which working as intended, but when it come to local host, I ran into an error:

Loaded as API: http://127.0.0.1:7860/ ✔
Traceback (most recent call last):
  File "C:\Users\username\Desktop\test1.py", line 5, in <module>
    client.predict(
  File "C:\Users\username\AppData\Local\Programs\Python\Python312\Lib\site-packages\gradio_client\client.py", line 497, in predict
    ).result()
      ^^^^^^^^
  File "C:\Users\username\AppData\Local\Programs\Python\Python312\Lib\site-packages\gradio_client\client.py", line 1602, in result
    return super().result(timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\username\AppData\Local\Programs\Python\Python312\Lib\concurrent\futures\_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\username\AppData\Local\Programs\Python\Python312\Lib\concurrent\futures\_base.py", line 401, in __get_result
    raise self._exception
  File "C:\Users\username\AppData\Local\Programs\Python\Python312\Lib\concurrent\futures\thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\username\AppData\Local\Programs\Python\Python312\Lib\site-packages\gradio_client\client.py", line 1206, in _inner
    predictions = _predict(*data)
                  ^^^^^^^^^^^^^^^
  File "C:\Users\username\AppData\Local\Programs\Python\Python312\Lib\site-packages\gradio_client\client.py", line 1320, in _predict
    raise AppError(
gradio_client.exceptions.AppError: The upstream Gradio app has raised an exception but has not enabled verbose error reporting. To enable, set show_error=True in launch().

Im not sure whether it's this gradio issue (gradio-app/gradio#10379) I ran into. I didn't manage to work it out. Hoping I can get some help here!

P.S. Just a thought, Im not familiar with gradio and im not sure it is the local audio file handling issue I met, but im wondering if following approaches might make this api method work?

Manually modify infer_gradio.py to load reference audio by default like model and vocab file.
Modify infer_gradio.py to add an reference audio loading option by providing local file path like model and vocab file in costom mode (instead of drag and drop / file select).

Thank to this amazing project and you lovely ppl!

1 reply

kiuyg22 Aug 5, 2025

Well... I think i figure out where in infer_gradio.py I can input the reference audio. By manually assign the audio file file path (string) to variable "ref_audio_orig" before the original code "if not ref_audio_orig" like this:

    ref_audio_orig = "C:\\Users\\username\\F5-TTS\\japanese_ref.wav"

    if not ref_audio_orig:
        gr.Warning("Please provide reference audio.")
        return gr.update(), gr.update(), ref_text

And also assign a empty .wav header to fill the variable "ref_audio_input" so that it won't return error (Since I don't know much about gradio, I prefer not to delete original code.), before the original code "ref_audio_input =", like this:

    with open("temp_audio.wav", "wb") as f:
        f.write(b"RIFF\x00\x00\x00\x00WAVEfmt \x10\x00\x00\x00\x01\x00\x01\x00D\xac\x00\x00\x80\xbb\x00\x00\x02\x00\x10\x00data\x00\x00\x00\x00")

    ref_audio_input = gr.Audio(value="temp_audio.wav",label="Reference Audio", type="filepath")
    with gr.Row():             ^^^^^^^^^^^^^^^^^^^^^^(add)

And by Gradio API Recorder I manage to make it run and return the info I need, test1.py:

from gradio_client import Client, handle_file
client = Client("http://127.0.0.1:7860/")
result = client.predict(
  ref_text_input="え、うそでしょ?. ",
  gen_text_input="おはようございます。え、うそでしょ?",
  remove_silence=True,
  randomize_seed=True,
  seed_input=1722318242,
  cross_fade_duration_slider=0.15,
  nfe_slider=32,
  speed_slider=0.5,
  api_name="/basic_tts"
)
print(result)

return:

Loaded as API: http://127.0.0.1:7860/ ✔
('C:\\Users\\username\\AppData\\Local\\Temp\\gradio\\c4bb41ad42262880cf8b8d7324ee59dd3e514c48f95f9a41d0a6ab465c1f59bf\\audio.wav', 'C:\\Users\\username\\AppData\\Local\\Temp\\gradio\\e9f2d9c76a226b1cda70e7fba0796e82660c423ac82c046bf079156d9c603567\\tmpe186gjrb.png', 'え、うそでしょ?. ', 1694798811)

Sum: Turn out that infer_gradio.py with Gradio API Recorder is a doable method, just need to avoid the audio file handling issue in gradio. And since gradio looks easy to learn, maybe modifying the gradio web ui to assign the variable "ref_audio_orig" worth a try.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

How to increase model loading speed for real-time workflow? #929

Uh oh!

{{title}}

Uh oh!

Replies: 1 comment 1 reply

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

How to increase model loading speed for real-time workflow? #929

Uh oh!

TrollMasterr Mar 28, 2025

Replies: 1 comment · 1 reply

Uh oh!

kiuyg22 Aug 5, 2025

Uh oh!

kiuyg22 Aug 5, 2025

TrollMasterr
Mar 28, 2025

Replies: 1 comment 1 reply

kiuyg22
Aug 5, 2025