Commit 2b557d4

PetrochukM authored and r9y9 committed
Train Frame Clip (#70)

* Lengths: `input_lengths` holds the length of the audio clip in samples, so it does not make sense to clip the spectrogram to a sample count like 10000; the length must be scaled down by ``audio.get_hop_size()``.
* Fix division
* Comments
* Update train.py
1 parent 01d5569 commit 2b557d4

File tree

1 file changed (+4, -1 lines)


train.py (4 additions, 1 deletion)
@@ -492,7 +492,10 @@ def eval_model(global_step, writer, device, model, y, c, g, input_lengths, eval_
     y_target = y[idx].view(-1).data.cpu().numpy()[:length]

     if c is not None:
-        c = c[idx, :, :length].unsqueeze(0)
+        if hparams.upsample_conditional_features:
+            c = c[idx, :, :length // audio.get_hop_size()].unsqueeze(0)
+        else:
+            c = c[idx, :, :length].unsqueeze(0)
         assert c.dim() == 3
         print("Shape of local conditioning features: {}".format(c.size()))
     if g is not None:
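A minimal sketch of the length bookkeeping this commit fixes. The hop size value below is an assumption for illustration; in the repository the actual value comes from `audio.get_hop_size()`. The point is that when conditional features are upsampled inside the model, the spectrogram `c` has one column per frame rather than per audio sample, so a per-sample `length` must be divided by the hop size before slicing:

```python
# Illustrative sketch, not the repository's code.
# `hop_size` stands in for audio.get_hop_size(); 256 is an assumed value.
hop_size = 256          # audio samples advanced per spectrogram frame
length = 10000          # clip length in raw audio samples

# The waveform target is sliced per sample (y_target = y[idx][:length]),
# but the conditioning spectrogram has one column per frame, so the same
# clip spans only length // hop_size columns:
num_frames = length // hop_size
print(num_frames)  # -> 39; slicing c[..., :10000] would grab far too many frames
```

This is why the patched code uses `c[idx, :, :length // audio.get_hop_size()]` when `upsample_conditional_features` is enabled, and the unscaled `:length` slice otherwise.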

0 commit comments