Precompute inverse square root of impratio #883

adenzler-nvidia · 2025-12-05T11:59:40Z

I don't strictly need this change. I just came across it, tried it, and saw a slight edge in benchmarks and kernel timings. It is minor compared to the bigger things we still have to optimize, though.

That being said, apart from the interface change, I think this has only benefits: it removes runtime square-root instructions from some kernels that run very often (linesearch), and sqrt throughput is generally lower than multiplication throughput. My intuition and a very quick experiment suggest that the gains are more visible with a non-default impratio whose square root is not trivial.
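To illustrate the idea, here is a minimal sketch in plain Python rather than Warp. The helper names (`safe_invsqrt`, `scale_runtime_sqrt`, `scale_precomputed`), the `eps` guard, and the sample values are all hypothetical and not taken from the actual diff. It only shows the general pattern of moving the square root and division out of the hot path and into model setup.

```python
import math


def safe_invsqrt(impratio: float, eps: float = 1e-12) -> float:
    """Compute 1/sqrt(impratio) once at setup time.

    The eps clamp is an assumed guard against zero or negative input;
    the real safeguard in io.py may look different.
    """
    return 1.0 / math.sqrt(max(impratio, eps))


def scale_runtime_sqrt(values, impratio):
    # Before: a sqrt and a divide are evaluated for every element in the hot loop.
    return [v / math.sqrt(impratio) for v in values]


def scale_precomputed(values, impratio_invsqrt):
    # After: the hot loop only performs a multiplication.
    return [v * impratio_invsqrt for v in values]


impratio = 7.0
impratio_invsqrt = safe_invsqrt(impratio)  # done once, outside the hot loop

values = [0.5, 1.0, 2.0]  # hypothetical per-contact quantities
before = scale_runtime_sqrt(values, impratio)
after = scale_precomputed(values, impratio_invsqrt)

# The two variants agree up to floating-point rounding.
assert all(math.isclose(a, b, rel_tol=1e-12) for a, b in zip(before, after))
```

The two variants are not guaranteed to be bit-identical, since `v / sqrt(x)` and `v * (1 / sqrt(x))` round differently, which is why the comparison uses a tolerance.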

Numbers for humanoid on an RTX Pro 6000 Blackwell.

impratio == 1:

main:

Summary for 8192 parallel rollouts

Total JIT time: 0.31 s
Total simulation time: 2.29 s
Total steps per second: 3,580,815
Total realtime factor: 17,904.08 x
Total time per step: 279.27 ns
Total converged worlds: 8192 / 8192

this pr:

Summary for 8192 parallel rollouts

Total JIT time: 0.31 s
Total simulation time: 2.28 s
Total steps per second: 3,591,947
Total realtime factor: 17,959.73 x
Total time per step: 278.40 ns
Total converged worlds: 8192 / 8192

impratio == 7:

main:

Summary for 8192 parallel rollouts

Total JIT time: 0.32 s
Total simulation time: 2.62 s
Total steps per second: 3,128,554
Total realtime factor: 15,642.77 x
Total time per step: 319.64 ns
Total converged worlds: 8192 / 8192

this pr:

Summary for 8192 parallel rollouts

Total JIT time: 0.30 s
Total simulation time: 2.60 s
Total steps per second: 3,145,472
Total realtime factor: 15,727.36 x
Total time per step: 317.92 ns
Total converged worlds: 8192 / 8192
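
For reference, the relative gains can be computed from the steps-per-second figures above. This is just arithmetic on the reported numbers; the `runs` dictionary and `speedup_percent` helper are names made up for this sketch.

```python
# Steps per second copied from the four summaries above.
runs = {
    1: {"main": 3_580_815, "pr": 3_591_947},
    7: {"main": 3_128_554, "pr": 3_145_472},
}


def speedup_percent(main_sps: int, pr_sps: int) -> float:
    """Relative throughput gain of the PR over main, in percent."""
    return 100.0 * (pr_sps - main_sps) / main_sps


gains = {
    impratio: speedup_percent(sps["main"], sps["pr"])
    for impratio, sps in runs.items()
}

for impratio, gain in gains.items():
    print(f"impratio == {impratio}: {gain:.2f}% more steps per second")
```

This works out to roughly 0.31% for impratio == 1 and 0.54% for impratio == 7. Each figure comes from a single reported run, so the numbers are indicative rather than a measured confidence interval.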

Signed-off-by: Alain Denzler <[email protected]>


adenzler-nvidia added 2 commits December 5, 2025 12:24

invsqrt of impratio

e51dd17

Signed-off-by: Alain Denzler <[email protected]>

fix impratio indexing

18ec0b2

Signed-off-by: Alain Denzler <[email protected]>

thowell reviewed Dec 5, 2025

mujoco_warp/_src/io.py (outdated, resolved)

adenzler-nvidia added 4 commits December 8, 2025 11:29

Merge branch 'main' into dev/adenzler/impratio-invsqrt

39e9c1c

safe div in io.py

6960d96

Signed-off-by: Alain Denzler <[email protected]>

fix kernel analyzer errors

36daf96

Signed-off-by: Alain Denzler <[email protected]>

more fixes

b578d5f

Signed-off-by: Alain Denzler <[email protected]>

thowell approved these changes Dec 8, 2025


adenzler-nvidia merged commit 9cfbf7f into google-deepmind:main Dec 8, 2025
6 checks passed
