How to increase model loading speed for real-time workflow? #929
Replies: 1 comment 1 reply
Greetings, I tried the Gradio API Recorder on mrfakename's Hugging Face Space demo with test.py (Windows 10, gradio version 5.39.0, gradio_client version 1.11.0):

cmd: C:\Users\username>C:\Users\username\Desktop\test.py

The returned result works as intended against the Space, but when I point it at localhost I run into an error. I'm not sure whether it's this Gradio issue (gradio-app/gradio#10379) that I'm hitting; I didn't manage to work it out. Hoping I can get some help here!

P.S. Just a thought: I'm not familiar with Gradio, and I'm not sure whether the problem is local audio file handling, but I'm wondering if the approach sketched below might make this API method work.

Thanks for this amazing project and to you lovely people!
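In case it helps to see what I mean, this is roughly the call I'm trying, pointed at the local app instead of the Space. The api_name and parameter names are only my guesses based on the API Recorder output (please treat them as assumptions), and I wrap the local reference audio with handle_file() since I suspect local file handling is what trips it up:

```python
# Rough sketch: the api_name and parameter names below are assumptions,
# copied from memory of the API Recorder output; the real ones should be
# taken from the "Use via API" page of the running local app.
from gradio_client import Client, handle_file

client = Client("http://127.0.0.1:7860/")  # local infer_gradio.py instance

result = client.predict(
    # Local files have to be wrapped with handle_file() so the client uploads them.
    ref_audio_input=handle_file(r"C:\Users\username\Desktop\ref.wav"),
    ref_text_input="transcript of the reference clip",
    gen_text_input="Text I want synthesized.",
    api_name="/basic_tts",
)
print(result)  # whatever the endpoint returns (file path(s) on this machine)
```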
Hi,
To increase loading speed, what should I do to keep the model and as many of the parameters as possible preloaded? Is it better to edit socket_server.py, infer_cli.py, or something else? Or is there a setting for this?
I want to automatically generate lines coming from a real-time workflow.
Inference itself is about 3x realtime for me, so that part is fine, but the initialization and loading before every individual inference takes way too much time.
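To make the goal concrete, this is the kind of loop I'm after: load everything once, then call inference repeatedly for each incoming line. Just a sketch assuming api.py exposes an F5TTS class with an infer() method roughly like this (the exact class and argument names are assumptions on my part):

```python
# Sketch only: assumes f5_tts.api.F5TTS and its infer() arguments look like this;
# the real names in api.py may differ.
from f5_tts.api import F5TTS

tts = F5TTS()  # model + vocoder loaded once, up front

REF_AUDIO = "ref.wav"                          # predefined reference clip
REF_TEXT = "transcript of the reference clip"  # predefined reference text

def speak(line: str, out_path: str) -> None:
    # Reuses the already-loaded model; no per-call download or initialization.
    tts.infer(
        ref_file=REF_AUDIO,
        ref_text=REF_TEXT,
        gen_text=line,
        file_wave=out_path,
    )

# Lines arriving from the real-time workflow:
for i, line in enumerate(["First generated line.", "Second generated line."]):
    speak(line, f"out_{i}.wav")
```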
I tried the following:
infer_gradio.py has nice latency and seems to keep the model loaded, but it doesn't work well for me because I need everything automated.
infer_cli.py with a predefined ref_text and ref_audio in .wav format is still quite slow, I assume because of the overhead of steps like "Download Vocos from huggingface charactr/vocos-mel-24khz", loading the model on every infer_cli.py call, and for some reason even "Converting audio...", despite the input being exactly what it writes out to temp.
socket_server.py seems to keep the model preloaded, but it appears to spend too much time chunking the audio, and maybe the playback of the individual parts adds further delay. Maybe it could just output the result in one or two parts instead of so many chunks. How would I edit the chunking logic so it takes a reasonable amount of time? (See the server sketch after this list for roughly what I have in mind.)
api.py I can't get to work, so I haven't been able to test its latency.
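For the socket_server.py point above, this is the kind of minimal, from-scratch server I was picturing instead of editing the chunking logic: keep the model loaded in one long-running process, take one line of text per request, and send the whole clip back in a single length-prefixed reply. synthesize_to_wav_bytes() is a hypothetical stand-in for whichever inference call ends up working (for example the F5TTS sketch above):

```python
# Sketch only: a from-scratch persistent server, not the project's socket_server.py.
# synthesize_to_wav_bytes() is a hypothetical stand-in for the real inference call;
# the preloaded model (e.g. the F5TTS object from the sketch above) would live at
# module level so it is loaded once and reused for every request.
import socketserver
import struct

def synthesize_to_wav_bytes(text: str) -> bytes:
    """Hypothetical: run the already-loaded model and return one complete WAV."""
    raise NotImplementedError

class TTSHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # One line of text per request, coming from the real-time workflow.
        text = self.rfile.readline().decode("utf-8").strip()
        wav = synthesize_to_wav_bytes(text)
        # Length-prefixed reply: one complete clip instead of many small chunks.
        self.wfile.write(struct.pack("!I", len(wav)) + wav)

if __name__ == "__main__":
    # Model loading happens once here, before the server starts accepting requests.
    with socketserver.TCPServer(("127.0.0.1", 9000), TTSHandler) as server:
        server.serve_forever()
```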
Thanks for any help.