perf: optimize decode paths for Nat/Int, primitive vecs, and strings #721

Merged
lwshang merged 9 commits into master from sat-perf-1-decode
Mar 18, 2026

@sasa-tomic commented Mar 18, 2026

Overview

Four decode-side optimizations, all behavior-preserving:

1. Nat/Int deserialization bypass: for values fitting u64/i64, read
   LEB128 directly and call visitor.visit_u64/i64, avoiding the
   BigUint/BigInt → bytes → BigUint round-trip (saves 3 allocations
   per value).

2. BigNum vector fast path: batch cost tracking and skip per-element
   type cloning/checking for Vec<Nat>, Vec<Int>, and Vec<Int> with
   Nat wire type, mirroring the existing primitive vec fast path.

3. PrimitiveVecAccess with IntoDeserializer: on LE platforms, decode
   primitive vectors via a lightweight SeqAccess that reads directly
   from the input byte slice using serde's IntoDeserializer, bypassing
   the full Deserializer and Cursor overhead.

4. Borrowed string deserialization: use visit_borrowed_str instead of
   copying bytes, enabling zero-copy for &str targets.
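
The Nat bypass in item 1 can be illustrated with a minimal sketch. This is not the code from this PR: `read_leb128_u64` is a hypothetical helper, and it assumes the caller falls back to the arbitrary-precision path whenever it returns `None`.

```rust
/// Decode an unsigned LEB128 value from the front of `input`.
/// Returns `Some((value, bytes_consumed))` when the value fits in a `u64`.
/// Returns `None` on truncated input or when the encoding needs more than
/// 64 bits; a caller would then take the big-number path instead.
/// (Hypothetical helper, not the function used in the PR.)
fn read_leb128_u64(input: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    let mut shift: u32 = 0;
    for (i, &byte) in input.iter().enumerate() {
        let chunk = u64::from(byte & 0x7f);
        // Give up if this 7-bit group would lose bits at the current shift.
        if shift >= 64 || (chunk << shift) >> shift != chunk {
            return None;
        }
        value |= chunk << shift;
        if byte & 0x80 == 0 {
            // High bit clear: this was the last group.
            return Some((value, i + 1));
        }
        shift += 7;
    }
    None // ran out of bytes before the terminating group
}
```

No heap allocation happens on the `Some` path, which is the point of skipping the big-number round-trip for small values.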

Benchmark improvements (decode, vs previous optimized baseline):
  vec_nat:      910M → 300M  (-67%)
  vec_nat32:    406M → 247M  (-39%)
  vec_nat64:    411M → 255M  (-38%)
  vec_int16:    411M → 251M  (-39%)
  btreemap:    13.3B → 11.2B (-16%)
  option_list:   23M →  18M  (-20%)
  variant_list:  21M →  17M  (-21%)

Made-with: Cursor
@sasa-tomic requested a review from a team as a code owner March 18, 2026 09:41

github-actions Bot commented Mar 18, 2026

| Name | Max Mem (Kb) | Encode | Decode |
|------|--------------|--------|--------|
| blob | 4_224 | 4_207_487 | 2_122_433 ($\textcolor{red}{0.00\%}$) |
| btreemap | 75_456 ($\textcolor{red}{2.17\%}$) | 531_975_781 ($\textcolor{green}{-0.00\%}$) | 11_105_147_772 ($\textcolor{green}{-14.72\%}$) |
| nns | 192 | 2_021_253 | 5_669_058 ($\textcolor{red}{0.04\%}$) |
| nns_list_proposal | 1_216 | 7_013_836 ($\textcolor{red}{0.11\%}$) | 65_295_437 ($\textcolor{red}{1.59\%}$) |
| option_list | 128 ($\textcolor{red}{100.00\%}$) | 716_415 ($\textcolor{red}{0.05\%}$) | 17_851_091 ($\textcolor{green}{-18.62\%}$) |
| text | 6_336 | 4_204_384 | 7_877_830 ($\textcolor{red}{0.00\%}$) |
| variant_list | 128 ($\textcolor{red}{100.00\%}$) | 711_213 | 16_594_674 ($\textcolor{green}{-20.01\%}$) |
| vec_int16 | 12_480 | 8_404_689 | 249_586_549 ($\textcolor{green}{-54.92\%}$) |
| vec_nat | 11_008 ($\textcolor{red}{13.91\%}$) | 67_095_666 | 304_518_781 ($\textcolor{green}{-63.93\%}$) |
| vec_nat32 | 24_768 | 16_793_297 | 243_295_382 ($\textcolor{green}{-55.72\%}$) |
| vec_nat64 | 49_344 | 33_570_495 | 251_684_254 ($\textcolor{green}{-54.88\%}$) |

  • Parser cost: 16_174_059 ($\textcolor{green}{-0.00\%}$)
  • Extra args: 2_854_026 ($\textcolor{red}{0.55\%}$)
Raw report:
---------------------------------------------------

Benchmark: blob
  total:
    instructions: 6.33 M (0.00%) (change within noise threshold)
    heap_increase: 66 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 4.21 M (no change)
    heap_increase: 66 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 2.12 M (0.00%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap
  total:
    instructions: 11.64 B (improved by 14.15%)
    heap_increase: 1179 pages (regressed by 2.17%)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 531.98 M (-0.00%) (change within noise threshold)
    heap_increase: 159 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 11.11 B (improved by 14.72%)
    heap_increase: 1020 pages (regressed by 2.51%)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: extra_args
  total:
    instructions: 2.85 M (0.55%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: nns
  total:
    instructions: 24.70 M (0.01%) (change within noise threshold)
    heap_increase: 3 pages (no change)
    stable_memory_increase: 0 pages (no change)

  0. Parsing (scope):
    calls: 1 (no change)
    instructions: 16.17 M (-0.00%) (change within noise threshold)
    heap_increase: 3 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 2.02 M (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 5.67 M (0.04%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: nns_list_proposal
  total:
    instructions: 72.31 M (1.44%) (change within noise threshold)
    heap_increase: 19 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 7.01 M (0.11%) (change within noise threshold)
    heap_increase: 5 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 65.30 M (1.59%) (change within noise threshold)
    heap_increase: 14 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: option_list
  total:
    instructions: 18.57 M (improved by 18.03%)
    heap_increase: 2 pages (regressed by 100.00%)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 716.41 K (0.05%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 17.85 M (improved by 18.62%)
    heap_increase: 2 pages (regressed by 100.00%)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: text
  total:
    instructions: 12.08 M (0.00%) (change within noise threshold)
    heap_increase: 99 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 4.20 M (no change)
    heap_increase: 66 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 7.88 M (0.00%) (change within noise threshold)
    heap_increase: 33 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: variant_list
  total:
    instructions: 17.31 M (improved by 19.35%)
    heap_increase: 2 pages (regressed by 100.00%)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 711.21 K (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 16.59 M (improved by 20.01%)
    heap_increase: 2 pages (regressed by 100.00%)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: vec_int16
  total:
    instructions: 257.99 M (improved by 54.10%)
    heap_increase: 195 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 8.40 M (no change)
    heap_increase: 130 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 249.59 M (improved by 54.92%)
    heap_increase: 65 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: vec_nat
  total:
    instructions: 371.62 M (improved by 59.23%)
    heap_increase: 172 pages (regressed by 13.91%)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 67.10 M (no change)
    heap_increase: 33 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 304.52 M (improved by 63.93%)
    heap_increase: 139 pages (regressed by 17.80%)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: vec_nat32
  total:
    instructions: 260.09 M (improved by 54.07%)
    heap_increase: 387 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 16.79 M (no change)
    heap_increase: 258 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 243.30 M (improved by 55.72%)
    heap_increase: 129 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: vec_nat64
  total:
    instructions: 285.26 M (improved by 51.77%)
    heap_increase: 771 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 33.57 M (no change)
    heap_increase: 514 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 251.68 M (improved by 54.88%)
    heap_increase: 257 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Summary:
  instructions:
    status:   Improvements detected 🟢
    counts:   [total 12 | regressed 0 | improved 7 | new 0 | unchanged 5]
    change:   [max +1.03M | p75 +415 | median -4.12M | p25 -306.18M | min -1.92B]
    change %: [max +1.44% | p75 0.00% | median -16.09% | p25 -52.34% | min -59.23%]

  heap_increase:
    status:   Regressions detected 🔴
    counts:   [total 12 | regressed 4 | improved 0 | new 0 | unchanged 8]
    change:   [max +25 | p75 +1 | median 0 | p25 0 | min 0]
    change %: [max +100.00% | p75 +5.10% | median 0.00% | p25 0.00% | min 0.00%]

  stable_memory_increase:
    status:   No significant changes 👍
    counts:   [total 12 | regressed 0 | improved 0 | new 0 | unchanged 12]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

---------------------------------------------------

Only significant changes:
| status | name                      | calls |     ins |  ins Δ% |    HI |    HI Δ% | SMI |  SMI Δ% |
|--------|---------------------------|-------|---------|---------|-------|----------|-----|---------|
|  +/-   | btreemap                  |       |  11.64B | -14.15% | 1.18K |   +2.17% |   0 |   0.00% |
|  +/-   | btreemap::2. Decoding     |     1 |  11.11B | -14.72% | 1.02K |   +2.51% |   0 |   0.00% |
|  +/-   | option_list               |       |  18.57M | -18.03% |     2 | +100.00% |   0 |   0.00% |
|  +/-   | option_list::2. Decoding  |     1 |  17.85M | -18.62% |     2 | +100.00% |   0 |   0.00% |
|  +/-   | variant_list              |       |  17.31M | -19.35% |     2 | +100.00% |   0 |   0.00% |
|  +/-   | variant_list::2. Decoding |     1 |  16.59M | -20.01% |     2 | +100.00% |   0 |   0.00% |
|   -    | vec_nat64                 |       | 285.26M | -51.77% |   771 |    0.00% |   0 |   0.00% |
|   -    | vec_nat32                 |       | 260.09M | -54.07% |   387 |    0.00% |   0 |   0.00% |
|   -    | vec_int16                 |       | 257.99M | -54.10% |   195 |    0.00% |   0 |   0.00% |
|   -    | vec_nat64::2. Decoding    |     1 | 251.68M | -54.88% |   257 |    0.00% |   0 |   0.00% |
|   -    | vec_int16::2. Decoding    |     1 | 249.59M | -54.92% |    65 |    0.00% |   0 |   0.00% |
|   -    | vec_nat32::2. Decoding    |     1 | 243.30M | -55.72% |   129 |    0.00% |   0 |   0.00% |
|  +/-   | vec_nat                   |       | 371.62M | -59.23% |   172 |  +13.91% |   0 |   0.00% |
|  +/-   | vec_nat::2. Decoding      |     1 | 304.52M | -63.93% |   139 |  +17.80% |   0 |   0.00% |

ins = instructions, HI = heap_increase, SMI = stable_memory_increase, Δ% = percent change

---------------------------------------------------
Successfully persisted results to canbench_results.yml

lwshang and others added 6 commits March 18, 2026 10:53
Remove redundant explicit cleanup blocks after visit_seq — Compound::drop
already resets both primitive_vec_fast_path and bignum_vec_fast_path on all
paths (success and error). Restore the explanatory comment on the Drop impl.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
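
The reasoning in the commit note above (a `Drop` impl already resets both flags on success and on error, so explicit cleanup after `visit_seq` is redundant) can be sketched as follows. Only the two flag names and the `Compound` name come from the commit message; `DecoderState`, the use of `Cell`, and this `visit_seq` function are assumptions made for illustration.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for the deserializer state.
#[derive(Default)]
struct DecoderState {
    primitive_vec_fast_path: Cell<bool>,
    bignum_vec_fast_path: Cell<bool>,
}

/// Hypothetical sequence guard; the real `Compound` holds more than this.
struct Compound<'a> {
    state: &'a DecoderState,
}

impl<'a> Compound<'a> {
    fn new(state: &'a DecoderState) -> Self {
        // Enable a fast path for the lifetime of this guard.
        state.bignum_vec_fast_path.set(true);
        Compound { state }
    }
}

impl Drop for Compound<'_> {
    fn drop(&mut self) {
        // Runs on normal return and on early error return alike,
        // so callers need no explicit cleanup block.
        self.state.primitive_vec_fast_path.set(false);
        self.state.bignum_vec_fast_path.set(false);
    }
}

/// Simulates a sequence visit that may fail part-way through.
fn visit_seq(state: &DecoderState, fail: bool) -> Result<u32, String> {
    let _guard = Compound::new(state);
    if fail {
        return Err("element decode failed".to_string());
    }
    Ok(42)
}
```

Because the guard is dropped when `visit_seq` returns through either branch, both flags read `false` afterwards without any code at the call site.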
Previously set_position advanced by total_bytes unconditionally.
Use access.offset (bytes actually consumed) so the cursor is correct
if the visitor short-circuits before consuming all elements.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
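
A minimal sketch of why the cursor should advance by the bytes actually consumed rather than by `total_bytes`. `SliceAccess`, `next_u32`, and `first_only` are hypothetical names; only the `offset` field mirrors the `access.offset` mentioned in the commit note, and the real access type works through serde's `SeqAccess` rather than a plain method.

```rust
/// Hypothetical reader over a run of little-endian `u32` values that reads
/// straight from the input slice and records how far it got.
struct SliceAccess<'a> {
    input: &'a [u8],
    remaining: usize, // elements not yet handed out
    offset: usize,    // bytes actually consumed so far
}

impl<'a> SliceAccess<'a> {
    fn new(input: &'a [u8], len: usize) -> Self {
        SliceAccess { input, remaining: len, offset: 0 }
    }

    /// Returns the next element, or `None` when the sequence is exhausted
    /// or the input is too short.
    fn next_u32(&mut self) -> Option<u32> {
        if self.remaining == 0 {
            return None;
        }
        let end = self.offset.checked_add(4)?;
        let b = self.input.get(self.offset..end)?;
        self.offset = end;
        self.remaining -= 1;
        Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }
}

/// A consumer that short-circuits after the first element.
fn first_only(access: &mut SliceAccess<'_>) -> Option<u32> {
    access.next_u32()
}
```

With three encoded elements (12 bytes), `first_only` leaves `offset` at 4, so a cursor advanced by `offset` lands on the second element, whereas an unconditional advance by 12 would skip data the consumer never read.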
bignum_vec_fast_path is only ever set from deserialize_seq when type
information is available, so is_untyped must be false whenever the fast
path is active. Add debug_assert to make this invariant explicit in both
deserialize_int and deserialize_nat.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…_str

Both methods were identical after the visit_borrowed_str change. Delegate
deserialize_string to deserialize_str to avoid future drift.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
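
The borrowed-string idea, and the delegation of the owned variant to the borrowed one, can be sketched with the standard library alone. `read_str`, `read_string`, and `TextError` are hypothetical names; in the PR itself the borrowed slice is handed to serde's `visit_borrowed_str` instead of being returned.

```rust
#[derive(Debug, PartialEq)]
enum TextError {
    Truncated,
    InvalidUtf8,
}

/// Borrow `len` bytes from the input as a `&str` without copying.
/// The returned slice lives as long as the input buffer (`'de`).
fn read_str<'de>(input: &'de [u8], len: usize) -> Result<&'de str, TextError> {
    let bytes = input.get(..len).ok_or(TextError::Truncated)?;
    std::str::from_utf8(bytes).map_err(|_| TextError::InvalidUtf8)
}

/// The owned variant delegates to the borrowed one, so validation logic
/// exists in one place and the two cannot drift apart.
fn read_string(input: &[u8], len: usize) -> Result<String, TextError> {
    read_str(input, len).map(str::to_owned)
}
```

The `&str` returned by `read_str` points into the original buffer, so a target that only needs `&str` pays for UTF-8 validation but not for an allocation.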
Conflict in Compound::next_element_seed: master refactored to always set
expect_type/wire_type upfront and simplified the cost condition. Resolved
by keeping master's unconditional type assignment while extending the
is_fast check to include bignum_vec_fast_path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
is_untyped can be true with bignum_vec_fast_path active when deserializing
IDLValue (get_value_with_type sets is_untyped=true). The LEB128 fast path
is already correctly guarded by !is_untyped; the bignum fallback path works
regardless because wire_type is pre-set by the vec fast path setup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@lwshang merged commit 99ef1fa into master Mar 18, 2026
11 checks passed
@lwshang deleted the sat-perf-1-decode branch March 18, 2026 15:13