
Convert activation functions to numpower #381


Open · wants to merge 115 commits into base: 3.0

Conversation


@SkibidiProduction commented Jul 14, 2025

Activation implementations

  • Swapped out custom Tensor code for NumPower APIs across all functions: ReLU, LeakyReLU, ELU, GELU, HardSigmoid, SiLU, Tanh, Sigmoid, Softmax, Softplus, Softsign, ThresholdedReLU, etc. (a rough sketch of the conversion pattern follows after this list).

  • Updated derivative methods to use NumPower's derivative helpers.
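
As a hypothetical sketch of the conversion pattern only (not this PR's actual code), a ReLU written against NumPower might look like the following. `NDArray::maximum()` and `NDArray::greater()` are assumed here based on NumPower's NumPy-style API; the exact helpers and signatures used in the PR may differ.

```php
<?php

/**
 * Hypothetical sketch only: a ReLU activation written against NumPower's
 * NDArray, assuming element-wise NDArray::maximum() and NDArray::greater()
 * are available as in NumPower's NumPy-style API.
 */
class ReLUSketch
{
    /**
     * Apply ReLU element-wise: max(0, x).
     */
    public function activate(NDArray $input) : NDArray
    {
        return NDArray::maximum($input, 0.0);
    }

    /**
     * ReLU derivative: 1.0 where the input is positive, 0.0 elsewhere.
     */
    public function differentiate(NDArray $input) : NDArray
    {
        return NDArray::greater($input, 0.0);
    }
}
```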

Tests

  • Refactored unit tests to assert against NumPower outputs.

  • Adjusted tolerances and assertions to match NumPower's numeric behavior (illustrated in the sketch below).
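
As an illustration of that testing pattern, a refactored test might look roughly like the sketch below. The `ReLUSketch` class is the hypothetical example above, `NDArray::array()` and `toArray()` are assumed from NumPower's documented interface, and the delta is illustrative, chosen to absorb single-precision rounding.

```php
<?php

use PHPUnit\Framework\TestCase;

/**
 * Hypothetical test sketch: run a small input tensor through the activation
 * and compare against hand-computed values within a small tolerance.
 */
class ReLUSketchTest extends TestCase
{
    public function testActivate() : void
    {
        $input = NDArray::array([-2.0, -0.5, 0.0, 1.5]);

        $activation = new ReLUSketch();

        $output = $activation->activate($input);

        // Tolerance absorbs float32 rounding differences in NumPower's output.
        $this->assertEqualsWithDelta([0.0, 0.0, 0.0, 1.5], $output->toArray(), 1e-6);
    }
}
```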

Documentation

  • Added/updated images under docs/images/activation-functions/ to illustrate each activation curve and its derivative using the new implementations.

  • Cleaned up corresponding markdown to reference the updated diagrams.

Code cleanup

  • Aligned naming conventions and method signatures with NumPower's API.

  • Minor style fixes (whitespace, imports, visibility).

@andrewdalpino (Member) left a comment

Very nice work @apphp and @SkibidiProduction ... I think this is exactly what we need for the first round of integration with NumPower. I had a few questions and comments that may change the outcome of the PR so I'm just going to leave it at that for now until we get that sorted.

Overall, fantastic usage of unit tests and good code quality. I love to see it.

Andrew

## Parameters
| # | Name | Default | Type | Description |
|---|---|---|---|---|
| 1 | alpha | 1.0 | float | The value at which leakage will begin to saturate. Ex. alpha = 1.0 means that the output will never be less than -1.0 when inactivated. |

## Size and Performance
ELU is a simple function and is well-suited for deployment on resource-constrained devices or when working with large neural networks.
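
For reference, the saturation described for alpha above follows from the standard ELU definition (not shown in this excerpt):

$$
\mathrm{ELU}(x) = \begin{cases} x & x > 0 \\ \alpha \, (e^{x} - 1) & x \leq 0 \end{cases}
$$

Since $e^{x} - 1 \to -1$ as $x \to -\infty$, the output is bounded below by $-\alpha$, i.e. $-1.0$ with the default $\alpha = 1.0$.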

Member:

How did you come up with these size and performance details? I'm noticing that some differ from my understanding. For example, it is not necessarily true when taken in the context of all activation functions that ELU is a simple function or well-suited for resource constrained devices.

Perhaps it would actually be more confusing to offer this somewhat subjective explanation. In addition, in practice, activation functions have very little impact on the total runtime of the network - so taking the effort here to detail out their performance is somewhat distracting.

How do you feel about dropping this "size and performance" section altogether, not being opinionated about individual activation functions, and instead letting the user discover the nuances of each activation function for themselves? However, if there is something truly outstanding about a particular activation function's performance characteristics, then let's make sure to include that in the description of the class. For example, ReLU is outstanding because it is the simplest activation function in the group. Maybe there's another activation function that has an associated kernel that is particularly optimized, etc.

Reply:

I agree, we can remove these sections entirely; they were too subjective.
So, should we remove them all?

Member:

Yes, remove the section, but if there is something unique about a particular function's performance characteristics, we can put that info in the description. What do you think?

 */
public function activate(NDArray $input) : NDArray
{
    // Calculate |x|

Member:

I don't feel that these comments provide enough value to justify their existence. I can understand what is going on clearly given your great usage of variables and naming.

Reply:

Will be removed
