Julia 1.9.1
[052768ef] CUDA v4.4.0
[872c559c] NNlib v0.9.1
Here is the MWE:

using CUDA
using NNlib

function mwe()
    channels = 256
    x = rand(Float32, 1024, channels, 64)   # input: (width, channels, batch) for a 1D conv
    w = rand(Float32, 2, 1, channels)        # kernel: width 2, one filter per channel
    @info "NNlib.conv"
    NNlib.conv(x, w, groups=channels);       # warm-up call to exclude compilation
    @time NNlib.conv(x, w, groups=channels);
    @info "NNlib.depthwiseconv"
    NNlib.depthwiseconv(x, w);               # warm-up call
    @time NNlib.depthwiseconv(x, w);
    @info "Done"
end
Result of running the above twice:
julia> DepthwiseMWE.mwe()
[ Info: NNlib.conv
0.031946 seconds (12.84 k allocations: 82.142 MiB, 10.82% gc time)
[ Info: NNlib.depthwiseconv
0.032803 seconds (70 allocations: 79.931 MiB, 19.57% gc time)
[ Info: Done
julia> DepthwiseMWE.mwe()
[ Info: NNlib.conv
0.031491 seconds (12.84 k allocations: 82.142 MiB, 30.70% gc time)
[ Info: NNlib.depthwiseconv
0.029980 seconds (69 allocations: 79.931 MiB, 18.81% gc time)
[ Info: Done
The expected result is ~70 CPU allocations (what depthwiseconv gets), not the ~12,840 allocations conv makes; in a deeper network this puts considerable pressure on the GC and kills performance.
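For what it's worth, a quick way to check whether those allocations come from an inference failure is to inspect the call with @code_warntype (a minimal sketch, not part of the original report; the exact printout depends on the Julia and NNlib versions):

using NNlib
x = rand(Float32, 1024, 256, 64);
w = rand(Float32, 2, 1, 256);
@code_warntype NNlib.conv(x, w, groups=256)   # look for Any/Union-typed variables flagged in the output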
I tried depthwiseconv in my code, but it has another problem: it is not GPU friendly. So it is either a matter of making depthwiseconv GPU friendly or fixing the insane allocations of conv.
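For context, the GPU path I mean looks roughly like this (a sketch only, assuming the CUDA/cuDNN extension of NNlib is loaded; array layouts as in the MWE above):

using CUDA
using NNlib

channels = 256
cx = CuArray(rand(Float32, 1024, channels, 64))
cw = CuArray(rand(Float32, 2, 1, channels))

NNlib.conv(cx, cw, groups=channels)   # grouped convolution on the GPU
NNlib.depthwiseconv(cx, cw)           # the depthwise call that is problematic on GPU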
mashu commented on Jul 5, 2023
Closing: no issue when running on GPU.
[Issue retitled from "Depthwise convolutions lead to insane number of CPU allocations or GPU version broken" to "Type instabilities lead to insane number of CPU allocations on grouped convolutions"]

ToucheSir commented on Jul 5, 2023
Since the MWE has a lot of useful information, I'm taking the liberty of reopening this with a different focus.