Julia 1.9.1
[052768ef] CUDA v4.4.0
[872c559c] NNlib v0.9.1
Here is the MWE:

using CUDA
using NNlib

function mwe()
    channels = 256
    x = rand(Float32, 1024, channels, 64)   # input: (width, channels, batch) for a 1D conv
    w = rand(Float32, 2, 1, channels)        # kernel: width 2, one filter per channel
    @info "NNlib.conv"
    NNlib.conv(x, w, groups=channels);       # warm-up call to exclude compilation
    @time NNlib.conv(x, w, groups=channels);
    @info "NNlib.depthwiseconv"
    NNlib.depthwiseconv(x, w);               # warm-up call
    @time NNlib.depthwiseconv(x, w);
    @info "Done"
end
Result of running the above twice:
julia> DepthwiseMWE.mwe()
[ Info: NNlib.conv
0.031946 seconds (12.84 k allocations: 82.142 MiB, 10.82% gc time)
[ Info: NNlib.depthwiseconv
0.032803 seconds (70 allocations: 79.931 MiB, 19.57% gc time)
[ Info: Done
julia> DepthwiseMWE.mwe()
[ Info: NNlib.conv
0.031491 seconds (12.84 k allocations: 82.142 MiB, 30.70% gc time)
[ Info: NNlib.depthwiseconv
0.029980 seconds (69 allocations: 79.931 MiB, 18.81% gc time)
[ Info: Done
The expected result is ~70 CPU allocations (what depthwiseconv gets), not the ~12,840 allocations conv makes; in a deeper network this puts considerable pressure on the GC and kills performance.
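For what it's worth, a quick way to check whether those allocations come from an inference failure is to inspect the call with @code_warntype (a minimal sketch, not part of the original report; the exact printout depends on the Julia and NNlib versions):

using NNlib
x = rand(Float32, 1024, 256, 64);
w = rand(Float32, 2, 1, 256);
@code_warntype NNlib.conv(x, w, groups=256)   # look for Any/Union-typed variables flagged in the output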
I tried depthwiseconv in my code, but it has another problem: it is not GPU friendly. So it is either a matter of making depthwiseconv GPU friendly or fixing the insane allocations of conv.
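For context, the GPU path I mean looks roughly like this (a sketch only, assuming the CUDA/cuDNN extension of NNlib is loaded; array layouts as in the MWE above):

using CUDA
using NNlib

channels = 256
cx = CuArray(rand(Float32, 1024, channels, 64))
cw = CuArray(rand(Float32, 2, 1, channels))

NNlib.conv(cx, cw, groups=channels)   # grouped convolution on the GPU
NNlib.depthwiseconv(cx, cw)           # the depthwise call that is problematic on GPU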
mashu commented on Jul 5, 2023
Closing: no issue when running on GPU.
[Issue retitled from "Depthwise convolutions lead to insane number of CPU allocations or GPU version broken" to "Type instabilities lead to insane number of CPU allocations on grouped convolutions"]

ToucheSir commented on Jul 5, 2023
Since the MWE has a lot of useful information, I'm taking the liberty of reopening this with a different focus.