Commit 7d9c1d0

pref: benchmark results
Signed-off-by: Henry Gressmann <[email protected]>
1 parent 893396a commit 7d9c1d0

4 files changed

+69
-756
lines changed

BENCHMARKS.md

Lines changed: 41 additions & 6 deletions
@@ -1,10 +1,8 @@
11
# Benchmark results
22

3-
All benchmarks are run on a Ryzen 7 5800X, with 32GB of RAM, running Linux 6.6 with `intel_pstate=passive split_lock_detect=off mitigations=off`.
4-
5-
## Results
6-
7-
Coming soon.
3+
All benchmarks are run on a Ryzen 7 5800X, with 32GB of RAM, running Linux 6.6.
4+
WebAssembly files are optimized using [wasm-opt](https://github.com/WebAssembly/binaryen)
5+
and the benchmark code is available in the `benches` folder.
86

97
## WebAssembly Settings
108

@@ -20,6 +18,43 @@ All runtimes are compiled with the following settings:
2018
- `unsafe` features are enabled
2119
- `opt-level` is set to 3, `lto` is set to `thin`, `codegen-units` is set to 1.
2220

21+
## Results
22+
23+
| Benchmark | Native | TinyWasm | Wasmi | Wasmer (Single Pass) |
24+
| ------------ | ------ | -------- | -------- | -------------------- |
25+
| `argon2id` | 0.52ms | 110.08ms | 44.408ms | 4.76ms |
26+
| `fib` | 6ns | 44.76µs | 48.96µs | 52µs |
27+
| `fib-rec` | 284ns | 25.565ms | 5.11ms | 0.50ms |
28+
| `selfhosted` | 45µs | 2.18ms | 4.25ms | 258.87ms |
29+
30+
### Argon2id
31+
32+
This benchmark runs the Argon2id hashing algorithm, with 2 iterations, 1KB of memory, and 1 parallel lane.
33+
I had to decrease the memory usage from the default to 1KB because the interpreters in particular were struggling to finish in a reasonable amount of time.
34+
This is something where `simd` instructions would be really useful, and it also highlights some of the issues with the current implementation of TinyWasm's Value Stack and Memory Instances.
35+
36+
### Fib
37+
38+
The first benchmark is a simple optimized Fibonacci function, which is a good way to show the overhead of calling functions and parsing the bytecode.
39+
TinyWasm is slightly faster than Wasmi here, but that's probably because of the overhead of parsing the bytecode as TinyWasm uses a custom bytecode to pre-process the WebAssembly bytecode.
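
For readers without the `benches` folder at hand, a loop-based Fibonacci of the kind described here could look like the sketch below. The function name and signature are assumptions for illustration and are not taken from the actual benchmark source.

```rust
/// Iterative Fibonacci: a single call containing one tight loop.
/// Hypothetical stand-in for the `fib` benchmark body, not the real source.
/// Wrapping addition keeps the sketch panic-free for large `n`.
pub fn fib(n: u32) -> u64 {
    let (mut a, mut b) = (0u64, 1u64);
    for _ in 0..n {
        let next = a.wrapping_add(b);
        a = b;
        b = next;
    }
    a
}
```

Because the whole computation happens inside one function call, a benchmark of this shape mostly measures fixed per-invocation costs plus a short loop.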
40+
41+
### Fib-Rec
42+
43+
This benchmark is a recursive Fibonacci function, which highlights some of the issues with the current implementation of TinyWasm's Call Stack.
44+
TinyWasm is a lot slower here, but that's because there's currently no way to reuse the same Call Frame for recursive calls, so a new Call Frame is allocated for every call. This is not a problem for most programs, and the upcoming `tail-call` proposal will make this a lot easier to implement.
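
To give a feel for how many calls a naive recursive Fibonacci makes, here is a minimal sketch that counts invocations. The function and the `calls` counter are hypothetical and only illustrate the call pattern; the real benchmark code lives in the `benches` folder.

```rust
/// Naive recursive Fibonacci. `calls` is incremented on every invocation,
/// which corresponds to one Call Frame per call in an interpreter that
/// allocates a fresh frame each time (an assumption for illustration).
pub fn fib_rec(n: u32, calls: &mut u64) -> u64 {
    *calls += 1;
    if n < 2 {
        return n as u64;
    }
    fib_rec(n - 1, calls) + fib_rec(n - 2, calls)
}
```

For `n = 20` this makes 21,891 calls to compute the value 6765, so any fixed per-call cost is multiplied accordingly.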
45+
46+
### Selfhosted
47+
48+
This benchmark runs TinyWasm itself in the VM, and parses and executes the `print.wasm` example from the `examples` folder.
49+
This is a good way to show some of TinyWasm's strengths - the code is pretty large at 702KB and Wasmer struggles massively with it, even with the Single Pass compiler. I think it's a decent real-world performance benchmark, but definitely favors TinyWasm a bit.
50+
51+
Wasmer also offers a pre-parsed module format, so keep in mind that this number could be a bit lower if that was used (but probably still on the same order of magnitude). This number seems so high that I'm not sure if I'm doing something wrong, so I will be looking into this in the future.
52+
53+
### Conclusion
54+
55+
After profiling and fixing some low-hanging fruit, I found the biggest bottlenecks to be Vector operations, especially for the Value Stack, and shared access to Memory Instances through RefCell. These are the two areas I will focus on improving in the future, experimenting with
56+
Arena Allocation and other data structures to improve performance. Still, I'm quite happy with the results, especially considering the use of standard Rust data structures. Additionally, typed FuncHandles have a significant overhead over the untyped ones, so I will be looking into improving that as well.
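
The RefCell cost mentioned above can be pictured with a small sketch. `SharedMemory` is a hypothetical type made up for this illustration and is not TinyWasm's actual Memory Instance; it only shows that every access through `Rc<RefCell<...>>` performs a runtime borrow check before touching the bytes.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical linear memory shared between several owners.
#[derive(Clone)]
pub struct SharedMemory(Rc<RefCell<Vec<u8>>>);

impl SharedMemory {
    pub fn new(size: usize) -> Self {
        SharedMemory(Rc::new(RefCell::new(vec![0; size])))
    }

    /// Each store calls `borrow_mut`, which checks and updates the borrow
    /// flag at runtime in addition to the bounds check on the slice.
    pub fn store_u32(&self, addr: usize, value: u32) -> Option<()> {
        let end = addr.checked_add(4)?;
        let mut mem = self.0.borrow_mut();
        mem.get_mut(addr..end)?.copy_from_slice(&value.to_le_bytes());
        Some(())
    }

    /// Each load calls `borrow`, paying the same runtime check.
    pub fn load_u32(&self, addr: usize) -> Option<u32> {
        let end = addr.checked_add(4)?;
        let mem = self.0.borrow();
        let mut bytes = [0u8; 4];
        bytes.copy_from_slice(mem.get(addr..end)?);
        Some(u32::from_le_bytes(bytes))
    }
}
```

In a hot interpreter loop these checks run on every memory instruction, which is one plausible reason they show up in profiles.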
57+
2358
# Running benchmarks
2459

2560
Benchmarks are run using [Criterion.rs](https://github.com/bheisler/criterion.rs). To run a benchmark, use the following command:
@@ -28,7 +63,7 @@ Benchmarks are run using [Criterion.rs](https://github.com/bheisler/criterion.rs
2863
$ cargo bench --bench <name>
2964
```
3065

31-
## Profiling
66+
# Profiling
3267

3368
To profile a benchmark, use the following command:
3469
