Thank you for sharing the code and paper; they have been very helpful. I think I may have found a subtle issue with the padding scheme and would appreciate another opinion.
Conceptually, we'd like every sequence input before the first to be zero. But I noticed that the implementation pads every Conv1d input with zeros, not just the first one. In my opinion, this is incorrect behavior for each layer beyond the first.
Here is a diagram of the issue.

The triangles represent padded inputs. The bottom row (sequence input) is padded with 0, which is correct. However, the first layer's outputs are also padded with 0 (red triangles) before being fed to the next layer. I think we should instead pad with a constant vector: the result of applying the convolution to an all-zero receptive field (which works out to conv1's bias term).
Similarly, the next layer up should be padded with a constant vector, whose value is the result of applying the convolution to a receptive field filled with a constant value (the padding of the previous layer).
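To make the proposal concrete, here is a small sketch of how those per-layer padding constants could be computed. The layer names and shapes (`conv1`, `conv2`, kernel size 2, no activation between layers) are illustrative stand-ins, not taken from the repo:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical stand-ins for two stacked causal conv layers.
conv1 = nn.Conv1d(in_channels=1, out_channels=2, kernel_size=2)
conv2 = nn.Conv1d(in_channels=2, out_channels=1, kernel_size=2)

with torch.no_grad():
    # Padding for the raw sequence input: zeros (this part is already correct).
    pad0 = torch.zeros(1, 1, conv1.kernel_size[0])  # (batch, C_in, k)
    # Padding for the next layer's input: conv1 applied to an all-zero
    # receptive field, which works out to conv1's bias.
    pad1 = conv1(pad0)  # shape (1, 2, 1)
    # Padding for the layer above that: conv2 applied to a window filled
    # with the previous layer's padding constant.
    window = pad1.repeat(1, 1, conv2.kernel_size[0])  # (1, 2, k)
    pad2 = conv2(window)  # shape (1, 1, 1)
```

In a real TCN each layer's nonlinearity would also be applied to the constant before it is used as the next layer's padding; the sketch omits that for brevity.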
Impact: A network with receptive field $r$ will produce incorrect results prior to the $r$-th input. "Incorrect" here means at least inconsistent with its behavior in the steady state, far from the beginning of the input. This might matter most with long receptive fields, where sequences are comparable in length to the receptive field, because a substantial portion of the training examples will then be using these wrong padding values.
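For reference, assuming the usual TCN structure (two dilated convolutions per residual level, with dilation doubling each level), the receptive field can be computed in closed form. This helper is my own; the formula is consistent with the value of 7 printed by the test below for 2 levels and kernel size 2:

```python
def receptive_field(num_levels: int, kernel_size: int) -> int:
    # Each level i contributes two convs of dilation 2**i, each extending
    # the receptive field by (kernel_size - 1) * 2**i; summing the
    # geometric series over i = 0 .. num_levels - 1 gives:
    return 1 + 2 * (kernel_size - 1) * (2 ** num_levels - 1)

print(receptive_field(2, 2))  # 7
```

So with long sequences but few levels, only a small prefix is affected; with many levels, the "incorrect" prefix grows exponentially.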
Here's a simple test case that demonstrates that prepending a sequence of zeros to the input changes the output.
```python
import torch
import torch.nn as nn

import tcn  # the TCN implementation under discussion


def test_tcn():
    torch.manual_seed(42)

    def init_weights(m):
        if isinstance(m, nn.Conv1d):
            if hasattr(m, 'weight_g'):
                # weight_norm was applied to this layer
                torch.nn.init.uniform_(m.weight_g)
                torch.nn.init.uniform_(m.weight_v)
                # XXX: not sure if this is the correct way to initialize
            else:
                torch.nn.init.uniform_(m.weight)
            torch.nn.init.uniform_(m.bias)

    with torch.no_grad():
        net = tcn.TemporalConvNet(num_inputs=1, num_channels=[2, 1],
                                  kernel_size=2, dropout=0)
        net.apply(init_weights)
        print("Receptive field", net.receptive_field_size)
        for i in range(8):
            print(f"Padding with {i} zeros:",
                  net(torch.Tensor([[[0] * i + [1]]])))
        print("Zero input response:",
              net(torch.Tensor([[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]])))
```
Output:

```
Receptive field 7
Padding with 0 zeros: tensor([[[2.1018]]])
Padding with 1 zeros: tensor([[[1.3458, 2.2364]]])
Padding with 2 zeros: tensor([[[1.3458, 1.4805, 2.4149]]])
Padding with 3 zeros: tensor([[[1.3458, 1.4805, 1.6590, 2.4309]]])
Padding with 4 zeros: tensor([[[1.3458, 1.4805, 1.6590, 1.6749, 2.4466]]])
Padding with 5 zeros: tensor([[[1.3458, 1.4805, 1.6590, 1.6749, 1.6907, 2.4550]]])
Padding with 6 zeros: tensor([[[1.3458, 1.4805, 1.6590, 1.6749, 1.6907, 1.6991, 2.4550]]])
Padding with 7 zeros: tensor([[[1.3458, 1.4805, 1.6590, 1.6749, 1.6907, 1.6991, 1.6991, 2.4550]]])
Zero input response: tensor([[[1.3458, 1.4805, 1.6590, 1.6749, 1.6907, 1.6991, 1.6991, 1.6991,
         1.6991, 1.6991, 1.6991, 1.6991]]])
```

Note that the steady-state response to a zero input is 1.6991, yet the outputs near the start of the sequence differ from it: prepending zeros shifts the values rather than leaving them unchanged.
Clearly this TCN implementation is still able to achieve great results, so I am not yet sure of the practical impact. I'll experiment with changing it for my application.
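For the experiment, one possible fix I'm considering is a small wrapper that left-pads each causal conv with a fixed per-channel vector instead of zeros. This is a sketch of my own (`ConstantPadConv1d` is a hypothetical name, not from the repo); the first layer would be wrapped with a zero vector, and each deeper layer with the constant computed from the layer below, as described above:

```python
import torch
import torch.nn as nn

class ConstantPadConv1d(nn.Module):
    """Causal Conv1d that left-pads with a fixed per-channel vector
    instead of zeros. Sketch only; not part of the original repo."""
    def __init__(self, conv: nn.Conv1d, pad_value: torch.Tensor):
        super().__init__()
        self.conv = conv
        # Amount of left padding needed for a causal convolution.
        pad_len = (conv.kernel_size[0] - 1) * conv.dilation[0]
        # Stored as (1, C_in, pad_len); expanded across the batch in forward().
        self.register_buffer("pad", pad_value.view(1, -1, 1).repeat(1, 1, pad_len))

    def forward(self, x):
        pad = self.pad.expand(x.size(0), -1, -1)
        return self.conv(torch.cat([pad, x], dim=-1))
```

With `pad_value` set to zeros this should reproduce the current behavior exactly, which makes it easy to A/B test the two padding schemes.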