First of all, thank you for the efficient implementation of KAN. It has been very helpful.
While reading through the code, I noticed that kaiming_uniform_ is called as follows:
torch.nn.init.kaiming_uniform_(self.base_weight, a=math.sqrt(5) * self.scale_base)
From my understanding of PyTorch's implementation, the parameter a in kaiming_uniform_ represents the negative slope of a LeakyReLU activation and is used internally to compute the gain. This means that changing a affects the computed gain, and thus the variance of the initialization, rather than acting as a direct scaling factor. As a result, increasing scale_base (through a=math.sqrt(5) * scale_base) would actually reduce the gain and therefore reduce the initialization magnitude, which seems slightly different from what one might expect if scale_base is intended as an amplitude scaling parameter.
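For reference, here is a small snippet (just an illustration, not code from this repo) showing how a feeds into the gain, and therefore into the uniform bound, inside PyTorch's kaiming_uniform_:

```python
import math
import torch

# kaiming_uniform_ uses nonlinearity='leaky_relu' by default, so the gain it
# applies is calculate_gain('leaky_relu', a) = sqrt(2 / (1 + a^2)).
for scale_base in (0.5, 1.0, 2.0):  # illustrative values
    a = math.sqrt(5) * scale_base
    gain = torch.nn.init.calculate_gain('leaky_relu', a)
    print(f"scale_base={scale_base}: a={a:.3f}, gain={gain:.3f}")
# Larger scale_base -> larger a -> smaller gain -> tighter init bound.
```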
I just want to make sure I’m interpreting the intended behavior correctly. Is the current design intentional for some stability reason, or would explicit post-initialization scaling be preferable?
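To make the question concrete, this is roughly what I mean by explicit post-initialization scaling (the dimensions and variable names below are placeholders, not your actual code):

```python
import math
import torch

# Hypothetical alternative: initialize with the standard a=math.sqrt(5)
# (as nn.Linear does) and then scale the weights explicitly, so that
# scale_base acts as a direct amplitude factor.
in_features, out_features, scale_base = 16, 32, 1.5  # placeholder values
base_weight = torch.nn.Parameter(torch.empty(out_features, in_features))
torch.nn.init.kaiming_uniform_(base_weight, a=math.sqrt(5))
with torch.no_grad():
    base_weight.mul_(scale_base)
```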
Thanks again for the great work!