Refactor to make it backend-agnostic, improved GPU execution #1

Open
urlicht wants to merge 40 commits into flavell-lab:master from urlicht:master


@urlicht (Member) commented Mar 13, 2026

Bumps the version to v1.0.0. Makes the core backend-agnostic (supporting both CPU and CUDA), improves GPU execution, and adds benchmarking tools and CI.

  • 11d205c, e451d21: Refactored registration logic to be backend-agnostic and introduced CPU/CUDA extensions with optional CUDA dependency.
  • 29102e6: Removed unnecessary GPU host copies and added true in-place resampling for better performance.
  • 34af435, 3726e9f: Fixed CuArray peak-search behavior and reduced allocations via scratch-buffer optimization.
  • 095101f, 86b0cdd: Added core unit tests plus synthetic noisy-stack registration tests.
  • a4e94dc, c68b125, 0c2162f, 9445820, 6252095, f0c9fa8, e0e794e: Added benchmark suite and commit-compare tooling; improved CUDA benchmark reliability and reporting.
  • a5ac633, 7f74312, 816fbc5: Added CI workflow, adjusted CI matrix, and surfaced CI badge in README.
  • dc0c075 (plus 399f945): Bumped version to 1.0.0 and updated package author metadata.
  • Performance improvements, including the use of FFT plans and in-place FFTs. Benchmarking with compare_reg_stack_translate_commits.jl (256x256x64) shows a 13.9% reduction in time from 6252095 to 497f512.
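
To make the peak-search and in-place resampling items above more concrete, here is a minimal sketch of the general idea. It is not the code from this PR: the function names `peak_to_shift` and `shift_stack!` are hypothetical, the example runs on plain CPU arrays only, and integer circular shifts stand in for whatever resampling the package actually performs. The peak search uses a whole-array reduction (`argmax`) instead of indexing elements one by one, which is the pattern that usually carries over to GPU arrays, though that is an assumption not tested here.

```julia
# Hypothetical helper names; this is a sketch, not the package's actual API.

# Convert the location of a correlation peak into signed per-axis shifts.
# `argmax` is a whole-array reduction, so no element-by-element indexing is needed.
function peak_to_shift(corr::AbstractArray)
    idx = Tuple(argmax(corr))
    return map(idx, size(corr)) do i, n
        s = i - 1                  # zero-based offset of the peak
        s > n ÷ 2 ? s - n : s      # offsets past the midpoint wrap to negative shifts
    end
end

# Apply one (dy, dx) integer shift per z-slice, writing into a preallocated
# `out` buffer so that no new array is allocated for each slice.
function shift_stack!(out::AbstractArray{T,3}, stack::AbstractArray{T,3}, shifts) where {T}
    size(out) == size(stack) || throw(DimensionMismatch("out and stack must have the same size"))
    for z in axes(stack, 3)
        circshift!(view(out, :, :, z), view(stack, :, :, z), shifts[z])
    end
    return out
end

# Usage: a synthetic correlation surface with a single peak.
corr = zeros(Float32, 8, 8)
corr[7, 2] = 1f0                   # peak at zero-based offset (6, 1)
shift = peak_to_shift(corr)        # (-2, 1)

# Usage: shift a small two-slice stack into a reusable output buffer.
stack = reshape(collect(Float32, 1:32), 4, 4, 2)
out = similar(stack)
shift_stack!(out, stack, [(1, 0), (0, -1)])
```

Because `out` is allocated once and reused, repeated calls avoid per-call output allocations, which is the general motivation behind scratch buffers and in-place resampling.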
