
fix: regression in non-fast scalar indexing support #760


Open · wants to merge 11 commits into master from ap/gpu_arrays

Conversation

@avik-pal commented Aug 13, 2025

fixes #759

  • Jacobian support for GPU Arrays has been restored
  • ForwardDiff.gradient now supports GPU Arrays

cc @ChrisRackauckas @devmotion


codecov bot commented Aug 13, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.07%. Comparing base (fbf48ae) to head (11540a0).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #760      +/-   ##
==========================================
+ Coverage   89.58%   90.07%   +0.49%     
==========================================
  Files          11       12       +1     
  Lines        1008     1038      +30     
==========================================
+ Hits          903      935      +32     
+ Misses        105      103       -2     


idxs = collect(
    Iterators.drop(ForwardDiff.structural_eachindex(result), offset)
)[1:chunksize]
result[idxs] .= partial_fn.(Ref(dual), 1:chunksize)
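The snippet above materializes the first `chunksize` structural indices past `offset` and writes all of the partials in a single broadcast, so `result` is never indexed element-by-element (which would trigger scalar-indexing errors on a GPU array). A minimal CPU sketch of the same pattern, where `partial_fn`, `dual`, `offset`, and `chunksize` are hypothetical stand-ins for the values in the PR:

```julia
# Stand-ins for the PR's values (hypothetical, for illustration only)
offset, chunksize = 1, 3
result = zeros(6)
partial_fn(d, i) = d + i   # stand-in for extracting the i-th partial
dual = 10.0                # stand-in for the Dual being seeded

# Materialize the target indices once, then write them all in one
# broadcast -- no per-element (scalar) writes into `result`.
idxs = collect(Iterators.drop(eachindex(result), offset))[1:chunksize]
result[idxs] .= partial_fn.(Ref(dual), 1:chunksize)
result  # -> [0.0, 11.0, 12.0, 13.0, 0.0, 0.0]
```

On a `CuArray`, both the `collect` of indices and the broadcast assignment run as vectorized operations, which is what restores support for arrays without fast scalar indexing.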
Member


Does this not have an inference issue due to losing static information about the size? I would think this needs to be an ntuple unless it can prove things about the size.

Member


It would still be type-stable, it would just have dynamism in the function that would slow it down a bit during the broadcast.

Author


Here the chunksize is already an Int, so I don't think we gain any benefit from using an ntuple.

ChrisRackauckas added a commit that referenced this pull request Aug 14, 2025
Noted in #759 (comment): GPU support is completely untested in ForwardDiff.jl, so this sets up the Buildkite pipeline. I set up the backend and everything, and just took a few tests from #760 to seed it. The point of this isn't really to be a comprehensive set of GPU tests, but rather to give this repo the standard tooling the other repos have, so that GPU support doesn't regress again.
@avik-pal force-pushed the ap/gpu_arrays branch 3 times, most recently from 19e8423 to da2efb7 on August 16, 2025 00:34
@@ -26,6 +28,7 @@ CommonSubexpressions = "0.3"
DiffResults = "1.1"
DiffRules = "1.4"
DiffTests = "0.1"
GPUArraysCore = "0.1, 0.2"
Member


@avik-pal one final question: why is it necessary/useful to support both GPUArraysCore 0.1 and 0.2? Only GPUArraysCore 0.2 is tested:

Suggested change
GPUArraysCore = "0.1, 0.2"
GPUArraysCore = "0.2"

@KristofferC (Collaborator) commented Aug 19, 2025

In #472, the seed! (etc.) functions were written in a generic (non-type-specific) way that should have supported GPU arrays. This PR instead adds further specializations of seed! for some specific types via extensions. But why does the previous approach no longer work? That one seemed better, since it also supports non-fast scalar indexing for arrays that are not GPU arrays.

Has it been properly explored if the existing functions can be written in an alternative way that would support both fast and non-fast scalar arrays with the same generic code (which would avoid any new extensions)?

Edit: I had missed that #739 reverted some of #472.
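The generic formulation KristofferC describes can be sketched as follows. This is a hedged illustration, not ForwardDiff's actual seed! code: the idea is that if every element is written through a single broadcast over indices, the same code path works for both fast- and slow-scalar-indexing arrays, with no per-type extensions needed.

```julia
# Hypothetical generic seeding sketch (not ForwardDiff's implementation):
# write all elements in one broadcast over eachindex, so no scalar
# duals[i] = ... writes occur and slow-scalar-indexing arrays still work.
function generic_seed!(duals::AbstractVector, x::AbstractVector, seeds)
    duals .= x .+ getindex.(Ref(seeds), eachindex(x))
    return duals
end

duals = zeros(3)
generic_seed!(duals, [1.0, 2.0, 3.0], [0.1, 0.2, 0.3])
```

Whether the real seed! can be phrased this way for all structural index patterns (the question raised above) is exactly what the extension-based approach in this PR sidesteps.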

Successfully merging this pull request may close these issues:

  • Scalar indexing error when computing Jacobian with a CUDA.CuArray
4 participants