@adenzler-nvidia

I don't strictly need this change. I came across it, tried it, and saw a slight edge in benchmarks and kernel timings. It is minor compared to the bigger things we still have to optimize, though.

That being said, apart from the interface change, I think this has only benefits: it removes runtime square root instructions from some kernels that run very often (linesearch), and sqrt throughput is generally lower than multiplication throughput. My intuition and a very quick experiment suggest that the gains become more visible with a non-default impratio whose square root is not trivial (for example, impratio == 7 below).
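For readers who want the idea spelled out, here is a minimal sketch of trading a per-element square root for a multiplication by a value computed once. The function names and the division by `sqrt(impratio)` are illustrative assumptions, not the actual kernel code or interface touched by this PR.

```python
import math


def scale_with_runtime_sqrt(values, impratio):
    # Hypothetical baseline: the square root is evaluated for every element
    # inside the hot loop.
    return [v / math.sqrt(impratio) for v in values]


def precompute_invsqrt(impratio):
    # Done once, outside the hot loop (e.g. when the option is set).
    return 1.0 / math.sqrt(impratio)


def scale_with_precomputed(values, impratio_invsqrt):
    # Hot loop now only multiplies; no square root or division per element.
    return [v * impratio_invsqrt for v in values]


values = [0.5, 1.0, 2.0]
invsqrt = precompute_invsqrt(7.0)
baseline = scale_with_runtime_sqrt(values, 7.0)
optimized = scale_with_precomputed(values, invsqrt)
```

Both variants give the same results up to floating-point rounding; the cost of the second one is that callers have to carry the precomputed value around, which is the kind of interface change mentioned above.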

Numbers for humanoid on an RTX Pro 6000 Blackwell.

impratio == 1:

main:

Summary for 8192 parallel rollouts

Total JIT time: 0.31 s
Total simulation time: 2.29 s
Total steps per second: 3,580,815
Total realtime factor: 17,904.08 x
Total time per step: 279.27 ns
Total converged worlds: 8192 / 8192

this pr:

Summary for 8192 parallel rollouts

Total JIT time: 0.31 s
Total simulation time: 2.28 s
Total steps per second: 3,591,947
Total realtime factor: 17,959.73 x
Total time per step: 278.40 ns
Total converged worlds: 8192 / 8192

impratio == 7:

main:

Summary for 8192 parallel rollouts

Total JIT time: 0.32 s
Total simulation time: 2.62 s
Total steps per second: 3,128,554
Total realtime factor: 15,642.77 x
Total time per step: 319.64 ns
Total converged worlds: 8192 / 8192

this pr:

Summary for 8192 parallel rollouts

Total JIT time: 0.30 s
Total simulation time: 2.60 s
Total steps per second: 3,145,472
Total realtime factor: 15,727.36 x
Total time per step: 317.92 ns
Total converged worlds: 8192 / 8192
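To put the numbers above side by side, a small sketch that computes the relative change in steps per second between main and this PR. The helper name `relative_speedup_percent` is made up for illustration; the inputs are the "Total steps per second" values reported above.

```python
def relative_speedup_percent(baseline_sps, candidate_sps):
    # Relative change in steps per second, expressed in percent.
    return (candidate_sps - baseline_sps) / baseline_sps * 100.0


# impratio -> (main, this PR), taken from the summaries above.
results = {
    1: (3_580_815, 3_591_947),
    7: (3_128_554, 3_145_472),
}

speedups = {
    impratio: relative_speedup_percent(main_sps, pr_sps)
    for impratio, (main_sps, pr_sps) in results.items()
}

for impratio, pct in speedups.items():
    print(f"impratio == {impratio}: {pct:+.2f}%")
```

This comes out to roughly +0.31% for impratio == 1 and +0.54% for impratio == 7, based on a single run of each configuration.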

Signed-off-by: Alain Denzler <[email protected]>
@adenzler-nvidia adenzler-nvidia merged commit 9cfbf7f into google-deepmind:main Dec 8, 2025
6 checks passed