Commit 77be5d2

add feature for using other vocoder
1 parent bbbe3f0 commit 77be5d2

2 files changed: 35 additions, 3 deletions

README.md

Lines changed: 30 additions & 1 deletion
@@ -16,7 +16,7 @@ Please visit [demo page](https://haoheliu.github.io/demopage-voicefixer/) to vie
 
 ## Usage
 
-- Basic example:
+### Basic example:
 
 ```python
 # Will automatically download model parameters.
@@ -56,6 +56,35 @@ wave = vocoder.forward(mel=mel_spec) # This forward function is used in the foll
 vocoder.oracle(fpath="", # input wav file path
                out_path="") # output wav file path
 ```
+
+### Others
+
+- How to use your own vocoder, like pre-trained HiFi-Gan?
+
+First you need to write a following helper function with your model. Similar to the helper function in this repo: https://github.com/haoheliu/voicefixer/blob/main/voicefixer/vocoder/base.py#L35
+
+```shell script
+def convert_mel_to_wav(mel):
+    """
+    :param non normalized mel spectrogram: [batchsize, 1, t-steps, n_mel]
+    :return: [batchsize, 1, samples]
+    """
+    return wav
+```
+
+Then pass this function to *voicefixer.restore*, for example:
+```
+voicefixer.restore(input="", # input wav file path
+                   output="", # output wav file path
+                   cuda=False, # whether to use gpu acceleration
+                   mode = 0,
+                   your_vocoder_func = convert_mel_to_wav)
+```
+
+Note:
+- For compatibility, your vocoder should working on 44.1kHz wave with mel frequency bins 128.
+- The input mel spectrogram to the helper function should not be normalized by the width of each mel filter.
+
 ## Materials
 - Voicefixer training: https://github.com/haoheliu/voicefixer_main.git
 - Demo page: https://haoheliu.github.io/demopage-voicefixer/
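
The helper described in the README change can be sketched as a thin adapter around an existing generator. This is a minimal sketch, not the repository's implementation: `StandInGenerator`, its `hop=441` upsampling factor, and `make_convert_mel_to_wav` are hypothetical names introduced only so the example runs, and the assumed generator interface (`[batch, n_mel, t_steps]` in, `[batch, 1, samples]` out) may differ from a real pre-trained HiFi-GAN checkpoint, which may also expect log-scaled input.

```python
import torch
from torch import nn


class StandInGenerator(nn.Module):
    """Hypothetical placeholder for a pre-trained generator such as HiFi-GAN.

    Assumed interface: [batch, n_mel, t_steps] -> [batch, 1, t_steps * hop].
    """

    def __init__(self, n_mel=128, hop=441):
        super().__init__()
        # One transposed convolution that expands each mel frame to `hop` samples.
        self.upsample = nn.ConvTranspose1d(n_mel, 1, kernel_size=hop, stride=hop)

    def forward(self, mel):
        return torch.tanh(self.upsample(mel))


def make_convert_mel_to_wav(generator):
    """Wrap `generator` in the helper signature described in the README."""
    generator.eval()

    def convert_mel_to_wav(mel):
        # mel: non-normalized mel spectrogram, [batchsize, 1, t-steps, n_mel]
        assert mel.dim() == 4 and mel.size(1) == 1
        # Drop the channel axis and swap time and frequency:
        # [batchsize, 1, t-steps, n_mel] -> [batchsize, n_mel, t-steps]
        mel = mel.squeeze(1).transpose(1, 2)
        with torch.no_grad():
            # Keep the generator on the same device as the incoming tensor.
            wav = generator.to(mel.device)(mel)
        return wav  # [batchsize, 1, samples]

    return convert_mel_to_wav


convert_mel_to_wav = make_convert_mel_to_wav(StandInGenerator())
wav = convert_mel_to_wav(torch.rand(2, 1, 50, 128))
print(wav.shape)  # 50 frames * 441 samples per frame = 22050 samples
```

The resulting `convert_mel_to_wav` is what would be passed as `your_vocoder_func` to `voicefixer.restore`. Returning a `torch.Tensor` matters because `restore` applies `torch.max(torch.abs(out))` to the result.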

voicefixer/base.py

Lines changed: 5 additions & 2 deletions
@@ -80,7 +80,7 @@ def remove_higher_frequency(self, wav, ratio=0.95):
         stft = spec * cos + 1j * spec * sin
         return librosa.istft(stft)
 
-    def restore(self, input, output, cuda=False, mode=0):
+    def restore(self, input, output, cuda=False, mode=0, your_vocoder_func=None):
         if(cuda and torch.cuda.is_available()):
             self._model = self._model.cuda()
         # metrics = {}
@@ -106,7 +106,10 @@ def restore(self, input, output, cuda=False, mode=0):
         denoised_mel = from_log(out_model['mel'])
         # if(meta["unify_energy"]):
         #     denoised_mel, mel_noisy = self.amp_to_original_f(mel_sp_est=denoised_mel,mel_sp_target=mel_noisy)
-        out = self._model.vocoder(denoised_mel)
+        if(your_vocoder_func is None):
+            out = self._model.vocoder(denoised_mel)
+        else:
+            out = your_vocoder_func(denoised_mel)
         # unify energy
         if(torch.max(torch.abs(out)) > 1.0):
             out = out / torch.max(torch.abs(out))
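
The vocoder selection and the peak normalization that follows it can be isolated into a small standalone sketch. The function `synthesize` and the two toy vocoders below are hypothetical and exist only for illustration; in the commit itself this logic lives inline in `restore` and the default branch calls `self._model.vocoder`.

```python
import torch


def synthesize(denoised_mel, default_vocoder, your_vocoder_func=None):
    """Sketch of the vocoder selection added to `restore` in this commit."""
    if your_vocoder_func is None:
        out = default_vocoder(denoised_mel)
    else:
        out = your_vocoder_func(denoised_mel)
    # Unify energy: rescale only when the waveform would exceed [-1, 1].
    peak = torch.max(torch.abs(out))
    if peak > 1.0:
        out = out / peak
    return out


def quiet_vocoder(mel):
    # Hypothetical stand-in for the built-in vocoder; output is already in range.
    return torch.tensor([[[0.1, -0.2, 0.05]]])


def loud_vocoder(mel):
    # Hypothetical custom vocoder whose output would clip without rescaling.
    return torch.tensor([[[0.5, -2.0, 1.0]]])


mel = torch.rand(1, 1, 4, 128)
default_out = synthesize(mel, quiet_vocoder)
custom_out = synthesize(mel, quiet_vocoder, your_vocoder_func=loud_vocoder)
print(default_out, custom_out)
```

Because the normalization runs after either branch, a custom vocoder does not need to clamp its own output, although it still has to return a tensor in the `[batchsize, 1, samples]` layout.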
