Add bit transpose operations by robert3005 · Pull Request #6928 · vortex-data/vortex

robert3005 · 2026-03-13T00:22:27Z

Logic to convert bit buffers into the transposed layout. This is useful where
intermediary arrays are in the transposed layout.

In particular, this is necessary to handle DeltaArray validity correctly.
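To make the idea concrete, here is a minimal scalar sketch of permuting a 1024-bit validity buffer into a transposed order, so that each validity bit ends up at the same position as its value. The index mapping (`transposed_index`, `FL_ORDER`), its direction, and the LSB-first bit numbering are assumptions chosen for illustration. The helper names are hypothetical and are not the functions added in this PR.

```rust
/// Assumed lane ordering for the transposed layout (illustrative only).
const FL_ORDER: [usize; 8] = [0, 4, 2, 6, 1, 5, 3, 7];

/// Map a logical index in 0..1024 to its assumed position in the transposed layout.
fn transposed_index(idx: usize) -> usize {
    let lane = idx % 16;
    let order = (idx / 16) % 8;
    let row = idx / 128;
    lane * 64 + FL_ORDER[order] * 8 + row
}

/// Read bit `i` of the buffer, LSB-first within each byte (an assumption).
fn get_bit(buf: &[u8; 128], i: usize) -> bool {
    (buf[i / 8] >> (i % 8)) & 1 == 1
}

/// Set bit `i` of the buffer, LSB-first within each byte.
fn set_bit(buf: &mut [u8; 128], i: usize) {
    buf[i / 8] |= 1 << (i % 8);
}

/// Move every logical bit `i` to position `transposed_index(i)`.
fn transpose_bits(input: &[u8; 128]) -> [u8; 128] {
    let mut out = [0u8; 128];
    for i in 0..1024 {
        if get_bit(input, i) {
            set_bit(&mut out, transposed_index(i));
        }
    }
    out
}

/// Inverse of `transpose_bits`: read position `transposed_index(i)` back into bit `i`.
fn untranspose_bits(input: &[u8; 128]) -> [u8; 128] {
    let mut out = [0u8; 128];
    for i in 0..1024 {
        if get_bit(input, transposed_index(i)) {
            set_bit(&mut out, i);
        }
    }
    out
}

fn main() {
    // Mark logical elements 1 and 16 as valid, then permute the bitmap.
    let mut validity = [0u8; 128];
    set_bit(&mut validity, 1);
    set_bit(&mut validity, 16);

    let transposed = transpose_bits(&validity);
    println!("bit 1 moved to {}", transposed_index(1));
    println!("bit 16 moved to {}", transposed_index(16));

    // Round-tripping restores the original bitmap.
    assert_eq!(untranspose_bits(&transposed), validity);
}
```

This bit-at-a-time loop is only a reference for what the permutation does. The BMI2, VBMI and NEON kernels benchmarked in this PR presumably compute the same permutation many bits at a time.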

On Zen 5 the BMI2 kernels are 20-30% faster than scalar and the VBMI kernels are ~10-20x faster.
On Zen 3 the BMI2 kernels are still around 20-30% faster.
On M4 the NEON transpose is ~60-100% faster, while untranspose stays roughly the same.
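Since the fastest kernel depends on the CPU, callers typically pick one at runtime. The following is a rough sketch of such a dispatch using the standard library's feature-detection macros. The `Kernel` enum and `select_kernel` function are hypothetical names for illustration, and the preference order (VBMI, then BMI2, then scalar) is an assumption based on the numbers above, not necessarily how this PR dispatches.

```rust
/// Hypothetical label for the kernel a caller would use.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kernel {
    Scalar,
    Bmi2,
    Vbmi,
    Neon,
}

/// Pick the fastest available kernel for the current CPU.
fn select_kernel() -> Kernel {
    #[cfg(target_arch = "x86_64")]
    {
        // AVX-512 VBMI is only present on some CPUs (e.g. not on Zen 3).
        if std::arch::is_x86_feature_detected!("avx512vbmi") {
            return Kernel::Vbmi;
        }
        if std::arch::is_x86_feature_detected!("bmi2") {
            return Kernel::Bmi2;
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return Kernel::Neon;
        }
    }
    // Portable fallback when no SIMD path applies.
    Kernel::Scalar
}

fn main() {
    println!("selected kernel: {:?}", select_kernel());
}
```

Detection results do not change while the process runs, so a real implementation could cache the choice instead of re-checking on every call.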

Full benchmark results for posterity
on Zen 5 machine (m8azn)

Compiling fastlanes v0.5.0 (/home/ubuntu/fastlanes)
    Finished `bench` profile [optimized] target(s) in 0.56s
     Running benches/bit_transpose.rs (target/release/deps/bit_transpose-dd4b19170a0386ff)
Timer precision: 10 ns
bit_transpose                      fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ transpose_scalar                59.8 ns       │ 1.059 µs      │ 69.8 ns       │ 75.7 ns       │ 100     │ 100
├─ transpose_scalar_throughput     8.349 µs      │ 15.75 µs      │ 8.369 µs      │ 8.443 µs      │ 100     │ 100
├─ untranspose_scalar              39.8 ns       │ 40.73 ns      │ 40.11 ns      │ 39.99 ns      │ 100     │ 3200
├─ untranspose_scalar_throughput   8.889 µs      │ 14.26 µs      │ 8.909 µs      │ 8.985 µs      │ 100     │ 100
╰─ x86                                           │               │               │               │         │
   ├─ transpose_bmi2               29.64 ns      │ 30.26 ns      │ 29.95 ns      │ 29.94 ns      │ 100     │ 6400
   ├─ transpose_bmi2_throughput    6.309 µs      │ 10.92 µs      │ 6.339 µs      │ 6.386 µs      │ 100     │ 100
   ├─ transpose_vbmi               3.608 ns      │ 3.667 ns      │ 3.628 ns      │ 3.631 ns      │ 100     │ 51200
   ├─ transpose_vbmi_throughput    504.8 ns      │ 524.8 ns      │ 509.8 ns      │ 508 ns        │ 100     │ 200
   ├─ untranspose_bmi2             27.61 ns      │ 70.42 ns      │ 27.76 ns      │ 28.24 ns      │ 100     │ 6400
   ├─ untranspose_bmi2_throughput  6.199 µs      │ 10.57 µs      │ 6.219 µs      │ 6.262 µs      │ 100     │ 100
   ├─ untranspose_vbmi             3.589 ns      │ 3.628 ns      │ 3.608 ns      │ 3.604 ns      │ 100     │ 51200
   ╰─ untranspose_vbmi_throughput  489.8 ns      │ 504.8 ns      │ 494.8 ns      │ 496.7 ns      │ 100     │ 200

on a Zen 3 machine (c6a)

Compiling fastlanes v0.5.0 (/home/ubuntu/fastlanes)
    Finished `bench` profile [optimized] target(s) in 1.02s
     Running benches/bit_transpose.rs (target/release/deps/bit_transpose-dd4b19170a0386ff)
Timer precision: 20 ns
bit_transpose                      fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ transpose_scalar                70.65 ns      │ 962.9 ns      │ 70.97 ns      │ 81.99 ns      │ 100     │ 3200
├─ transpose_scalar_throughput     15.68 µs      │ 37.39 µs      │ 15.72 µs      │ 16.04 µs      │ 100     │ 100
├─ untranspose_scalar              72.84 ns      │ 242.2 ns      │ 75.04 ns      │ 77.92 ns      │ 100     │ 3200
├─ untranspose_scalar_throughput   16.13 µs      │ 32.4 µs       │ 16.26 µs      │ 16.69 µs      │ 100     │ 100
╰─ x86                                           │               │               │               │         │
   ├─ transpose_bmi2               58.75 ns      │ 229.4 ns      │ 59.09 ns      │ 61.18 ns      │ 100     │ 3200
   ├─ transpose_bmi2_throughput    12.33 µs      │ 27.84 µs      │ 12.41 µs      │ 12.68 µs      │ 100     │ 100
   ├─ transpose_vbmi               warning: No benchmark function registered for 'transpose_vbmi'

   ├─ transpose_vbmi_throughput    warning: No benchmark function registered for 'transpose_vbmi_throughput'

   ├─ untranspose_bmi2             57.22 ns      │ 452.2 ns      │ 58.78 ns      │ 63.58 ns      │ 100     │ 3200
   ├─ untranspose_bmi2_throughput  13.23 µs      │ 25.47 µs      │ 13.29 µs      │ 13.61 µs      │ 100     │ 100
   ├─ untranspose_vbmi             warning: No benchmark function registered for 'untranspose_vbmi'

   ╰─ untranspose_vbmi_throughput  warning: No benchmark function registered for 'untranspose_vbmi_throughput'

on an M4 Max

    Finished `bench` profile [optimized] target(s) in 0.02s
     Running benches/bit_transpose.rs (target/release/deps/bit_transpose-9780c0103c6d2dc3)
Timer precision: 41 ns
bit_transpose                      fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ transpose_scalar                24.48 ns      │ 25.78 ns      │ 24.97 ns      │ 25.03 ns      │ 100     │ 25600
├─ transpose_scalar_throughput     5.249 µs      │ 9.166 µs      │ 5.332 µs      │ 5.545 µs      │ 100     │ 100
├─ untranspose_scalar              25.79 ns      │ 37.5 ns       │ 26.44 ns      │ 28.07 ns      │ 100     │ 12800
├─ untranspose_scalar_throughput   5.457 µs      │ 7.999 µs      │ 5.583 µs      │ 5.821 µs      │ 100     │ 100
╰─ aarch64                                       │               │               │               │         │
   ├─ transpose_neon               14.23 ns      │ 20.57 ns      │ 14.55 ns      │ 15.22 ns      │ 100     │ 25600
   ├─ transpose_neon_throughput    3.187 µs      │ 4.52 µs       │ 3.228 µs      │ 3.262 µs      │ 100     │ 200
   ├─ untranspose_neon             23.67 ns      │ 35.06 ns      │ 24.32 ns      │ 25.19 ns      │ 100     │ 25600
   ╰─ untranspose_neon_throughput  4.874 µs      │ 7.082 µs      │ 4.958 µs      │ 5.174 µs      │ 100     │ 100
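
As a sanity check on the summary percentages, the medians of the `*_throughput` rows above can be turned into speedup ratios. This is a small sketch: the `speedup` helper is a hypothetical name, and the numbers are copied from the tables with nanosecond values converted to microseconds.

```rust
/// Ratio of baseline time to candidate time; 1.0 means no change.
fn speedup(baseline_us: f64, candidate_us: f64) -> f64 {
    baseline_us / candidate_us
}

fn main() {
    // (label, scalar median in µs, SIMD median in µs) from the throughput rows.
    let rows = [
        ("Zen 5 transpose BMI2", 8.369, 6.339),
        ("Zen 5 transpose VBMI", 8.369, 0.5098),
        ("Zen 5 untranspose BMI2", 8.909, 6.219),
        ("Zen 5 untranspose VBMI", 8.909, 0.4948),
        ("Zen 3 transpose BMI2", 15.72, 12.41),
        ("Zen 3 untranspose BMI2", 16.26, 13.29),
        ("M4 Max transpose NEON", 5.332, 3.228),
        ("M4 Max untranspose NEON", 5.583, 4.958),
    ];
    for (label, scalar, simd) in rows {
        println!("{label}: {:.2}x", speedup(scalar, simd));
    }
}
```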

Signed-off-by: Robert Kruszewski <github@robertk.io>

joseph-isaacs

LG otherwise

Add bit transpose operations

6c86c9e

Logic to convert bit buffers into transpose layout. This is useful where intermediary arrays are in transpose layout

Signed-off-by: Robert Kruszewski <github@robertk.io>

robert3005 commented Mar 13, 2026

encodings/fastlanes/src/bit_transpose/mod.rs (resolved)

robert3005 added 3 commits March 13, 2026 00:51

imports

0de79c5

Signed-off-by: Robert Kruszewski <github@robertk.io>

benchfixes

fddfdc6

Signed-off-by: Robert Kruszewski <github@robertk.io>

less

a96a103

Signed-off-by: Robert Kruszewski <github@robertk.io>

robert3005 added the changelog/feature (A new feature) label Mar 13, 2026

robert3005 added 2 commits March 13, 2026 01:45

comments

31068af

Signed-off-by: Robert Kruszewski <github@robertk.io>

imports

73824fc

Signed-off-by: Robert Kruszewski <github@robertk.io>

joseph-isaacs reviewed Mar 13, 2026

encodings/fastlanes/benches/bit_transpose.rs (outdated, resolved)

joseph-isaacs approved these changes Mar 13, 2026

robert3005 added 3 commits March 13, 2026 10:57

bench

d6119c9

Signed-off-by: Robert Kruszewski <github@robertk.io>

fill

02fdca8

Signed-off-by: Robert Kruszewski <github@robertk.io>

typo

3b380c2

Signed-off-by: Robert Kruszewski <github@robertk.io>

robert3005 merged commit 0b981a8 into develop Mar 13, 2026
54 checks passed

robert3005 deleted the rk/bittranspose branch March 13, 2026 12:40

robert3005 merged 9 commits into develop from rk/bittranspose