Bug
When inference is run with infer_speedup=1 (i.e. the full 100-step DDPM path, not the dpm-solver speedup path), the conversion crashes inside the denoising loop:
ERROR: WaveNet.forward() got an unexpected keyword argument 'cond'
The shipped config's default infer_speedup: 10 works fine (10 steps via dpm-solver). infer_speedup: 2 (50 steps via dpm-solver) also works. Only the no-speedup full-step path fails.
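For reference, the step counts above are consistent with dividing the 100 base steps by infer_speedup. Here is a small sketch of that arithmetic, combined with the gate condition quoted from utils/models/diffusion.py in the Repro section. The names sampling_plan and base_steps and the "full-ddpm" label are made up for illustration, and the integer division is my assumption, not something I confirmed in the code:

```python
def sampling_plan(infer_speedup, method="dpm-solver", base_steps=100):
    """Return (branch, steps) for a given config. Illustrative only."""
    if infer_speedup < 1:
        raise ValueError("infer_speedup must be >= 1")
    # Same condition as the gate quoted from utils/models/diffusion.py.
    if method is not None and infer_speedup > 1:
        # Assumption: the solver runs base_steps // infer_speedup steps.
        return method, base_steps // infer_speedup
    # Trailing else: block, i.e. the plain p_sample loop over every step.
    return "full-ddpm", base_steps


for speedup in (10, 2, 1):
    print(speedup, sampling_plan(speedup))
```

Only the last case, infer_speedup: 1, lands in the branch that crashes.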
Repro
cd ~/HQ-SVC
# Use any source + reference singing wav.
# In configs/hq_svc_infer.yaml, set:
# infer_speedup: 1
# Then launch gradio_app.py and run any conversion. The diffusion
# loop starts ("sample time step: 0/100") then crashes immediately.
Tested on macOS 15 / Apple Silicon (MPS), PyTorch 2.5, with both infer_method: 'dpm-solver' and infer_method: 'ddim' set in the config; the error is the same in both cases.
The dpm-solver/ddim branches in utils/models/diffusion.py are gated by if method is not None and infer_speedup > 1:, so when infer_speedup == 1 the infer_method setting is never consulted and control drops into the trailing else: block (around line 377):
else:
    if use_tqdm:
        for i in tqdm(reversed(range(0, t)), desc='sample time step', total=t):
            x = self.p_sample(x, torch.full((b,), i, device=device, dtype=torch.long), cond)
    else:
        for i in reversed(range(0, t)):
            x = self.p_sample(x, torch.full((b,), i, device=device, dtype=torch.long), cond)
The crash is downstream of self.p_sample(..., cond): cond is passed positionally here, but it appears to be forwarded as a keyword argument somewhere further down the call chain, and WaveNet.forward() has no parameter with that name.
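To illustrate the failure mode I suspect, here is a minimal torch-free sketch. FakeWaveNet and FakeDiffusion are hypothetical stand-ins, not the repo's classes, and the cond=cond forwarding inside p_sample is my guess at what happens, not something I traced in the source:

```python
class FakeWaveNet:
    """Stand-in denoiser whose forward() has no `cond` parameter (assumption)."""

    def forward(self, x, t):
        return x

    def __call__(self, *args, **kwargs):
        # Mimics how calling a module dispatches to forward().
        return self.forward(*args, **kwargs)


class FakeDiffusion:
    """Stand-in for the diffusion wrapper; only the call shape matters here."""

    def __init__(self, denoise_fn):
        self.denoise_fn = denoise_fn

    def p_sample(self, x, t, cond):
        # cond arrives positionally but is forwarded by name (hypothetical).
        return self.denoise_fn(x, t, cond=cond)


def run_full_step_loop(model, x, steps, cond):
    # Same shape as the no-speedup else: block.
    for i in reversed(range(steps)):
        x = model.p_sample(x, i, cond)
    return x


message = None
try:
    run_full_step_loop(FakeDiffusion(FakeWaveNet()), x=0.0, steps=100, cond="mel")
except TypeError as exc:
    message = str(exc)

print(message)
```

The TypeError is raised on the very first iteration, which matches the loop printing its first progress line and then dying.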
Why this matters
Some inference scenarios benefit from more diffusion steps than the dpm-solver 10/50-step regime, especially when:
- Conditioning audio is noisier than the LibriTTS-style training distribution
- The user wants to A/B test paper-quality numbers (the paper benchmarks at 100 base steps)
- Quality > speed (offline / batch use cases)
Right now the 100-step path is unreachable.
Suggested fix
Likely a one-liner in or just below the no-speedup else: block: change the self.p_sample(x, ..., cond) call, or the place where p_sample forwards cond, to match what the downstream signature expects (e.g. pass cond=cond if the kwarg flows through, or remove the cond plumbing if p_sample doesn't need it). Happy to send a PR once I understand the intended call site.
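One way to find the mismatched call site without reading the whole chain is to check which callables accept a cond keyword. A rough sketch using the standard inspect module; accepts_keyword and the two stand-in functions are hypothetical, and the real signatures in the repo may differ:

```python
import inspect


def accepts_keyword(func, name):
    """True if `func` can be called with `name` passed as a keyword."""
    params = inspect.signature(func).parameters
    by_name = (inspect.Parameter.POSITIONAL_OR_KEYWORD,
               inspect.Parameter.KEYWORD_ONLY)
    if name in params and params[name].kind in by_name:
        return True
    # A **kwargs catch-all also accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins for a denoiser forward() with and without `cond`.
def forward_without_cond(x, t):
    return x


def forward_with_cond(x, t, cond=None):
    return x


print(accepts_keyword(forward_without_cond, "cond"))  # False
print(accepts_keyword(forward_with_cond, "cond"))     # True
```

In the repo, this would be pointed at WaveNet.forward and at whichever function p_sample calls; the first one that returns False for "cond" is where the keyword needs renaming or dropping.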
Environment
- macOS 15 / Apple Silicon (MPS)
- PyTorch 2.5.1, transformers 4.46
- HQ-SVC commit 853a188 (current main)
- weights: shawnpi/HQ-SVC (HuggingFace)