Thank you for sharing the code and paper; they have been very helpful. I think I may have found a subtle issue with the padding scheme and would appreciate another opinion.
Conceptually, we'd like every sequence input before the first to be zero. But I noticed that the implementation pads every Conv1d input with zeros, not just the first one. In my opinion, this is incorrect behavior for each layer beyond the first.
Here is a diagram of the issue.

The triangles represent padded inputs. The bottom row (sequence input) is padded with 0, which is correct. However, the first layer's outputs are also padded with 0 (red triangles) before being fed to the next layer. I think we should instead pad with a constant vector: the result of applying the convolution to an all-zero receptive field (which works out to conv1's bias term).
Similarly, the next layer up should be padded with a constant vector, whose value is the result of applying the convolution to a receptive field filled with a constant value (the padding of the previous layer).
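To make the proposal concrete, here is a small sketch of how those per-layer padding constants could be computed. The layer names and shapes (`conv1`, `conv2`, kernel size 2, no activation between layers) are illustrative stand-ins, not taken from the repo:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical stand-ins for two stacked causal conv layers.
conv1 = nn.Conv1d(in_channels=1, out_channels=2, kernel_size=2)
conv2 = nn.Conv1d(in_channels=2, out_channels=1, kernel_size=2)

with torch.no_grad():
    # Padding for the raw sequence input: zeros (this part is already correct).
    pad0 = torch.zeros(1, 1, conv1.kernel_size[0])  # (batch, C_in, k)
    # Padding for the next layer's input: conv1 applied to an all-zero
    # receptive field, which works out to conv1's bias.
    pad1 = conv1(pad0)  # shape (1, 2, 1)
    # Padding for the layer above that: conv2 applied to a window filled
    # with the previous layer's padding constant.
    window = pad1.repeat(1, 1, conv2.kernel_size[0])  # (1, 2, k)
    pad2 = conv2(window)  # shape (1, 1, 1)
```

In a real TCN each layer's nonlinearity would also be applied to the constant before it is used as the next layer's padding; the sketch omits that for brevity.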
Impact: A network with receptive field $r$ will produce incorrect results prior to the $r$-th input. "Incorrect" here means at least inconsistent with its behavior in the steady state, far from the beginning of the input. This might matter most with long receptive fields, where sequences are comparable in length to the receptive field, because a substantial portion of the training examples will then be using these wrong padding values.
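For reference, assuming the usual TCN structure (two dilated convolutions per residual level, with dilation doubling each level), the receptive field can be computed in closed form. This helper is my own; the formula is consistent with the value of 7 printed by the test below for 2 levels and kernel size 2:

```python
def receptive_field(num_levels: int, kernel_size: int) -> int:
    # Each level i contributes two convs of dilation 2**i, each extending
    # the receptive field by (kernel_size - 1) * 2**i; summing the
    # geometric series over i = 0 .. num_levels - 1 gives:
    return 1 + 2 * (kernel_size - 1) * (2 ** num_levels - 1)

print(receptive_field(2, 2))  # 7
```

So with long sequences but few levels, only a small prefix is affected; with many levels, the "incorrect" prefix grows exponentially.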
Here's a simple test case that demonstrates that prepending a sequence of zeros to the input changes the output.
```python
import torch
import torch.nn as nn

import tcn  # the TCN implementation under discussion


def test_tcn():
    torch.manual_seed(42)

    def init_weights(m):
        if isinstance(m, nn.Conv1d):
            if hasattr(m, 'weight_g'):
                # weight_norm was applied to this layer
                torch.nn.init.uniform_(m.weight_g)
                torch.nn.init.uniform_(m.weight_v)
                # XXX: not sure if this is the correct way to initialize
            else:
                torch.nn.init.uniform_(m.weight)
            torch.nn.init.uniform_(m.bias)

    with torch.no_grad():
        net = tcn.TemporalConvNet(num_inputs=1, num_channels=[2, 1],
                                  kernel_size=2, dropout=0)
        net.apply(init_weights)
        print("Receptive field", net.receptive_field_size)
        for i in range(8):
            print(f"Padding with {i} zeros:",
                  net(torch.Tensor([[[0] * i + [1]]])))
        print("Zero input response:",
              net(torch.Tensor([[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]])))
```
Output:

```
Receptive field 7
Padding with 0 zeros: tensor([[[2.1018]]])
Padding with 1 zeros: tensor([[[1.3458, 2.2364]]])
Padding with 2 zeros: tensor([[[1.3458, 1.4805, 2.4149]]])
Padding with 3 zeros: tensor([[[1.3458, 1.4805, 1.6590, 2.4309]]])
Padding with 4 zeros: tensor([[[1.3458, 1.4805, 1.6590, 1.6749, 2.4466]]])
Padding with 5 zeros: tensor([[[1.3458, 1.4805, 1.6590, 1.6749, 1.6907, 2.4550]]])
Padding with 6 zeros: tensor([[[1.3458, 1.4805, 1.6590, 1.6749, 1.6907, 1.6991, 2.4550]]])
Padding with 7 zeros: tensor([[[1.3458, 1.4805, 1.6590, 1.6749, 1.6907, 1.6991, 1.6991, 2.4550]]])
Zero input response: tensor([[[1.3458, 1.4805, 1.6590, 1.6749, 1.6907, 1.6991, 1.6991, 1.6991,
         1.6991, 1.6991, 1.6991, 1.6991]]])
```

Note that the steady-state response to a zero input is 1.6991, yet the outputs near the start of the sequence differ from it: prepending zeros shifts the values rather than leaving them unchanged.
Clearly this TCN implementation is still able to achieve great results, so I am not yet sure of the practical impact. I'll experiment with changing it for my application.
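For the experiment, one possible fix I'm considering is a small wrapper that left-pads each causal conv with a fixed per-channel vector instead of zeros. This is a sketch of my own (`ConstantPadConv1d` is a hypothetical name, not from the repo); the first layer would be wrapped with a zero vector, and each deeper layer with the constant computed from the layer below, as described above:

```python
import torch
import torch.nn as nn

class ConstantPadConv1d(nn.Module):
    """Causal Conv1d that left-pads with a fixed per-channel vector
    instead of zeros. Sketch only; not part of the original repo."""
    def __init__(self, conv: nn.Conv1d, pad_value: torch.Tensor):
        super().__init__()
        self.conv = conv
        # Amount of left padding needed for a causal convolution.
        pad_len = (conv.kernel_size[0] - 1) * conv.dilation[0]
        # Stored as (1, C_in, pad_len); expanded across the batch in forward().
        self.register_buffer("pad", pad_value.view(1, -1, 1).repeat(1, 1, pad_len))

    def forward(self, x):
        pad = self.pad.expand(x.size(0), -1, -1)
        return self.conv(torch.cat([pad, x], dim=-1))
```

With `pad_value` set to zeros this should reproduce the current behavior exactly, which makes it easy to A/B test the two padding schemes.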