perf(de): inline LEB128 and add map fast paths #722

Merged
lwshang merged 6 commits into master from perf/inlined-deserializer
Mar 18, 2026

@sasa-tomic
Member

Motivation

  • Remove the external `binary_parser` dependency from hot deserialization paths.

Solution

  • Inline LEB128 u64/i64, length, and bool reading directly in `Deserializer`.
  • Add a `text_fast_path` flag to skip type checks when key types are known during map deserialization.
  • Apply fast path to map keys/values and bignum handling.
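
To make the "inline LEB128 u64/i64 and bool reading" item concrete, here is a minimal sketch of such readers over a plain byte slice. The `Reader` struct, the `ReadError` enum, and all method names are hypothetical and chosen for illustration; they are not the identifiers used in this PR's `Deserializer`.

```rust
/// Hypothetical minimal reader over a byte slice (not the PR's `Deserializer`).
struct Reader<'a> {
    input: &'a [u8],
    pos: usize,
}

#[derive(Debug, PartialEq)]
enum ReadError {
    UnexpectedEof,
    Overflow,
    InvalidBool(u8),
}

impl<'a> Reader<'a> {
    fn new(input: &'a [u8]) -> Self {
        Reader { input, pos: 0 }
    }

    fn read_byte(&mut self) -> Result<u8, ReadError> {
        let b = *self.input.get(self.pos).ok_or(ReadError::UnexpectedEof)?;
        self.pos += 1;
        Ok(b)
    }

    /// Unsigned LEB128: 7 payload bits per byte, least significant group
    /// first; the high bit is set on every byte except the last.
    fn read_leb_u64(&mut self) -> Result<u64, ReadError> {
        let mut result: u64 = 0;
        let mut shift: u32 = 0;
        loop {
            let byte = self.read_byte()?;
            let low = u64::from(byte & 0x7f);
            // The tenth byte (shift == 63) may contribute only one bit.
            if shift >= 64 || (shift == 63 && low > 1) {
                return Err(ReadError::Overflow);
            }
            result |= low << shift;
            if byte & 0x80 == 0 {
                return Ok(result);
            }
            shift += 7;
        }
    }

    /// Signed LEB128: same grouping, with sign extension from bit 6
    /// of the final byte.
    fn read_leb_i64(&mut self) -> Result<i64, ReadError> {
        let mut result: i64 = 0;
        let mut shift: u32 = 0;
        loop {
            let byte = self.read_byte()?;
            let low = i64::from(byte & 0x7f);
            // At shift 63 only an all-zero or all-one group fits in an i64.
            if shift >= 64 || (shift == 63 && low != 0 && low != 0x7f) {
                return Err(ReadError::Overflow);
            }
            result |= low << shift;
            shift += 7;
            if byte & 0x80 == 0 {
                if shift < 64 && byte & 0x40 != 0 {
                    result |= -1i64 << shift; // sign-extend
                }
                return Ok(result);
            }
        }
    }

    /// A bool is a single byte that must be 0 or 1.
    fn read_bool(&mut self) -> Result<bool, ReadError> {
        match self.read_byte()? {
            0 => Ok(false),
            1 => Ok(true),
            other => Err(ReadError::InvalidBool(other)),
        }
    }
}
```

Under these assumptions, the bytes `E5 8E 26` decode to 624485 as unsigned LEB128 and `C0 BB 78` decode to -123456 as signed LEB128.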

Meta

  • Removed unused `BoolValue` import.

Four decode-side optimizations, all behavior-preserving:

1. Nat/Int deserialization bypass: for values fitting u64/i64, read
   LEB128 directly and call visitor.visit_u64/i64, avoiding the
   BigUint/BigInt → bytes → BigUint round-trip (saves 3 allocations
   per value).

2. BigNum vector fast path: batch cost tracking and skip per-element
   type cloning/checking for Vec<Nat>, Vec<Int>, and Vec<Int> with
   Nat wire type, mirroring the existing primitive vec fast path.

3. PrimitiveVecAccess with IntoDeserializer: on LE platforms, decode
   primitive vectors via a lightweight SeqAccess that reads directly
   from the input byte slice using serde's IntoDeserializer, bypassing
   the full Deserializer and Cursor overhead.

4. Borrowed string deserialization: use visit_borrowed_str instead of
   copying bytes, enabling zero-copy for &str targets.
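
Items 3 and 4 both come down to reading straight from the input byte slice instead of going through an intermediate copy. The sketch below illustrates that idea with the standard library only. It does not use serde's `IntoDeserializer` or `visit_borrowed_str`, and the function names and the `SliceError` type are hypothetical. Note that `u32::from_le_bytes` is portable, whereas the commit message restricts its own fast path to LE platforms.

```rust
#[derive(Debug, PartialEq)]
enum SliceError {
    UnexpectedEof,
    InvalidUtf8,
}

/// Decode `count` little-endian u32 values directly from `input`,
/// advancing `pos` only on success. (Hypothetical helper.)
fn read_u32_le_vec(
    input: &[u8],
    pos: &mut usize,
    count: usize,
) -> Result<Vec<u32>, SliceError> {
    let byte_len = count.checked_mul(4).ok_or(SliceError::UnexpectedEof)?;
    let end = pos.checked_add(byte_len).ok_or(SliceError::UnexpectedEof)?;
    let bytes = input.get(*pos..end).ok_or(SliceError::UnexpectedEof)?;
    let out = bytes
        .chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    *pos = end;
    Ok(out)
}

/// Return a `&str` that borrows from `input` for the full `'de` lifetime,
/// so no bytes are copied. (Hypothetical helper.)
fn borrow_text<'de>(
    input: &'de [u8],
    pos: &mut usize,
    len: usize,
) -> Result<&'de str, SliceError> {
    let end = pos.checked_add(len).ok_or(SliceError::UnexpectedEof)?;
    let bytes = input.get(*pos..end).ok_or(SliceError::UnexpectedEof)?;
    let text = std::str::from_utf8(bytes).map_err(|_| SliceError::InvalidUtf8)?;
    *pos = end;
    Ok(text)
}
```

Because the returned `&str` carries the input's lifetime, a caller can hand it to a consumer that wants `&str` without allocating. A consumer that needs an owned `String` can still copy at that point.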

Benchmark improvements (decode, vs previous optimized baseline):
  vec_nat:      910M → 300M  (-67%)
  vec_nat32:    406M → 247M  (-39%)
  vec_nat64:    411M → 255M  (-38%)
  vec_int16:    411M → 251M  (-39%)
  btreemap:    13.3B → 11.2B (-16%)
  option_list:   23M →  18M  (-20%)
  variant_list:  21M →  17M  (-21%)

Made-with: Cursor
@sasa-tomic sasa-tomic self-assigned this Mar 18, 2026
@sasa-tomic sasa-tomic changed the base branch from master to sat-perf-1-decode March 18, 2026 11:05
@sasa-tomic sasa-tomic marked this pull request as ready for review March 18, 2026 11:05
@sasa-tomic sasa-tomic requested a review from a team as a code owner March 18, 2026 11:05
@github-actions

github-actions bot commented Mar 18, 2026

| Name | Max Mem (Kb) | Encode | Decode |
|------|--------------|--------|--------|
| blob | 4_224 | 4_207_487 | 2_122_059 (-0.02%) |
| btreemap | 75_456 | 531_975_925 (+0.00%) | 10_185_892_820 (-8.28%) |
| nns | 192 | 2_021_253 | 5_627_936 (-0.69%) |
| nns_list_proposal | 1_216 | 7_016_211 (+0.17%) | 61_850_053 (-5.25%) |
| option_list | 128 | 716_809 (+0.06%) | 16_965_339 (-4.96%) |
| text | 6_336 | 4_204_384 | 7_877_508 (-0.00%) |
| variant_list | 128 | 711_213 | 16_149_392 (-2.68%) |
| vec_int16 | 12_480 | 8_404_689 | 249_586_272 (-0.00%) |
| vec_nat | 11_008 | 67_095_666 | 277_601_791 (-8.84%) |
| vec_nat32 | 24_768 | 16_793_297 | 243_295_105 (-0.00%) |
| vec_nat64 | 49_344 | 33_570_495 | 251_683_977 (-0.00%) |

  • Parser cost: 16_174_059 (-0.00%)
  • Extra args: 2_859_770 (+0.20%)

Raw report:
---------------------------------------------------

Benchmark: blob
  total:
    instructions: 6.33 M (-0.01%) (change within noise threshold)
    heap_increase: 66 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 4.21 M (no change)
    heap_increase: 66 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 2.12 M (-0.02%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap
  total:
    instructions: 10.72 B (improved by 7.90%)
    heap_increase: 1179 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 531.98 M (0.00%) (change within noise threshold)
    heap_increase: 159 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 10.19 B (improved by 8.28%)
    heap_increase: 1020 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: extra_args
  total:
    instructions: 2.86 M (0.20%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: nns
  total:
    instructions: 24.66 M (-0.16%) (change within noise threshold)
    heap_increase: 3 pages (no change)
    stable_memory_increase: 0 pages (no change)

  0. Parsing (scope):
    calls: 1 (no change)
    instructions: 16.17 M (-0.00%) (change within noise threshold)
    heap_increase: 3 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 2.02 M (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 5.63 M (-0.69%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: nns_list_proposal
  total:
    instructions: 68.87 M (improved by 4.73%)
    heap_increase: 19 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 7.02 M (0.17%) (change within noise threshold)
    heap_increase: 5 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 61.85 M (improved by 5.25%)
    heap_increase: 14 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: option_list
  total:
    instructions: 17.68 M (improved by 4.77%)
    heap_increase: 2 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 716.81 K (0.06%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 16.97 M (improved by 4.96%)
    heap_increase: 2 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: text
  total:
    instructions: 12.08 M (-0.00%) (change within noise threshold)
    heap_increase: 99 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 4.20 M (no change)
    heap_increase: 66 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 7.88 M (-0.00%) (change within noise threshold)
    heap_increase: 33 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: variant_list
  total:
    instructions: 16.86 M (improved by 2.57%)
    heap_increase: 2 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 711.21 K (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 16.15 M (improved by 2.68%)
    heap_increase: 2 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: vec_int16
  total:
    instructions: 257.99 M (-0.00%) (change within noise threshold)
    heap_increase: 195 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 8.40 M (no change)
    heap_increase: 130 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 249.59 M (-0.00%) (change within noise threshold)
    heap_increase: 65 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: vec_nat
  total:
    instructions: 344.70 M (improved by 7.24%)
    heap_increase: 172 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 67.10 M (no change)
    heap_increase: 33 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 277.60 M (improved by 8.84%)
    heap_increase: 139 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: vec_nat32
  total:
    instructions: 260.09 M (-0.00%) (change within noise threshold)
    heap_increase: 387 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 16.79 M (no change)
    heap_increase: 258 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 243.30 M (-0.00%) (change within noise threshold)
    heap_increase: 129 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: vec_nat64
  total:
    instructions: 285.26 M (-0.00%) (change within noise threshold)
    heap_increase: 771 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 33.57 M (no change)
    heap_increase: 514 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 251.68 M (-0.00%) (change within noise threshold)
    heap_increase: 257 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Summary:
  instructions:
    status:   Improvements detected 🟢
    counts:   [total 12 | regressed 0 | improved 5 | new 0 | unchanged 7]
    change:   [max +5.74K | p75 -277 | median -20.06K | p25 -1.52M | min -919.25M]
    change %: [max +0.20% | p75 -0.00% | median -0.08% | p25 -4.74% | min -7.90%]

  heap_increase:
    status:   No significant changes 👍
    counts:   [total 12 | regressed 0 | improved 0 | new 0 | unchanged 12]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

  stable_memory_increase:
    status:   No significant changes 👍
    counts:   [total 12 | regressed 0 | improved 0 | new 0 | unchanged 12]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

---------------------------------------------------

Only significant changes:
| status | name                           | calls |     ins |  ins Δ% |    HI |  HI Δ% | SMI |  SMI Δ% |
|--------|--------------------------------|-------|---------|---------|-------|--------|-----|---------|
|   -    | variant_list                   |       |  16.86M |  -2.57% |     2 |  0.00% |   0 |   0.00% |
|   -    | variant_list::2. Decoding      |     1 |  16.15M |  -2.68% |     2 |  0.00% |   0 |   0.00% |
|   -    | nns_list_proposal              |       |  68.87M |  -4.73% |    19 |  0.00% |   0 |   0.00% |
|   -    | option_list                    |       |  17.68M |  -4.77% |     2 |  0.00% |   0 |   0.00% |
|   -    | option_list::2. Decoding       |     1 |  16.97M |  -4.96% |     2 |  0.00% |   0 |   0.00% |
|   -    | nns_list_proposal::2. Decoding |     1 |  61.85M |  -5.25% |    14 |  0.00% |   0 |   0.00% |
|   -    | vec_nat                        |       | 344.70M |  -7.24% |   172 |  0.00% |   0 |   0.00% |
|   -    | btreemap                       |       |  10.72B |  -7.90% | 1.18K |  0.00% |   0 |   0.00% |
|   -    | btreemap::2. Decoding          |     1 |  10.19B |  -8.28% | 1.02K |  0.00% |   0 |   0.00% |
|   -    | vec_nat::2. Decoding           |     1 | 277.60M |  -8.84% |   139 |  0.00% |   0 |   0.00% |

ins = instructions, HI = heap_increase, SMI = stable_memory_increase, Δ% = percent change

---------------------------------------------------
Successfully persisted results to canbench_results.yml

Base automatically changed from sat-perf-1-decode to master March 18, 2026 15:13
lwshang and others added 2 commits March 18, 2026 11:21
…st paths

Introduce `try_read_leb_u64` and `try_read_leb_i64` returning `Ok(None)`
on overflow and `Err` on genuine I/O errors (e.g. unexpected EOF).
The bignum fallback callers in `deserialize_int` and `deserialize_nat`
now use these, so a truncated input correctly surfaces an error rather
than silently falling through to a secondary bignum decode failure.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
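
The three-way result described in the commit above (a value, an overflow that should fall back to bignum decoding, or a genuine error) can be sketched as follows. This is an illustration under assumptions, not the merged code: the free-function signature, the `DecodeError` type, and the choice to rewind `pos` on overflow so that a fallback decoder can re-read the bytes are all hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum DecodeError {
    UnexpectedEof,
}

/// Try to read an unsigned LEB128 value that fits in a u64.
///
/// * `Ok(Some(v))` - the value fit; `pos` is advanced past it.
/// * `Ok(None)`    - the encoding is wider than 64 bits; `pos` is rewound
///                   (an assumption of this sketch) so a bignum fallback
///                   can decode the same bytes.
/// * `Err(_)`      - the input ended mid-value; no fallback should run.
fn try_read_leb_u64(input: &[u8], pos: &mut usize) -> Result<Option<u64>, DecodeError> {
    let start = *pos;
    let mut result: u64 = 0;
    let mut shift: u32 = 0;
    loop {
        let byte = *input.get(*pos).ok_or(DecodeError::UnexpectedEof)?;
        *pos += 1;
        let low = u64::from(byte & 0x7f);
        if shift >= 64 || (shift == 63 && low > 1) {
            *pos = start;
            return Ok(None);
        }
        result |= low << shift;
        if byte & 0x80 == 0 {
            return Ok(Some(result));
        }
        shift += 7;
    }
}
```

Keeping overflow and truncation as separate outcomes lets a caller match on the result: `Ok(Some(v))` takes the fast path, `Ok(None)` takes the bignum path, and `Err` is returned immediately, so a truncated input is reported as such and is not masked by a later bignum decode failure.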
- Keep inlined LEB128/bool/len readers and map fast paths from PR
- Keep try_read_leb_* error-propagation fix
- Take master's deserialize_string -> deserialize_str delegation
- Take master's PrimitiveVecAccess cursor fix (advance by access.offset)
- Take master's SeqAccess expect/wire_type assignment and if !is_fast restructure

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@lwshang lwshang merged commit 917abf8 into master Mar 18, 2026
11 checks passed
@lwshang lwshang deleted the perf/inlined-deserializer branch March 18, 2026 15:51