Description
I am failing to understand how the spec interpreter emits branch hint offsets. Specifically, the spec interpreter emits binaries that it can parse itself, but binaryen cannot. Interestingly, the spec interpreter can parse binaryen's binaries (!), so something odd is going on. Here is an example:
;; w.wat
(module
  (type $0 (func (param i32)))
  (func $branch-hints-br_if (type $0) (param $x i32)
    (block $out
      (@metadata.code.branch_hint "\00")
      (br_if $out
        (i32.const 0)
      )
    )
  )
)
$ wasm -ca w.wat -o c.wasm
$ hd c.wasm
00000000 00 61 73 6d 01 00 00 00 01 85 80 80 80 00 01 60 |.asm...........`|
00000010 01 7f 00 03 82 80 80 80 00 01 00 00 a0 80 80 80 |................|
00000020 00 19 6d 65 74 61 64 61 74 61 2e 63 6f 64 65 2e |..metadata.code.|
00000030 62 72 61 6e 63 68 5f 68 69 6e 74 01 00 01 0a 01 |branch_hint.....|
00000040 00 0a 8f 80 80 80 00 01 89 80 80 80 00 00 02 40 |...............@|
00000050 41 00 0d 00 0b 0b |A.....|
00000056
The single branch hint occupies bytes 0x3b to 0x40 (01 00 01 0a 01 00): there is one function with hints, it is function 0, it has a single hint, the offset is 0x0a = 10, the hint size is 1, and the hint value is 0.
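To double-check that reading of the bytes, here is a minimal Python sketch that decodes the custom section payload (the six bytes after the section name). It assumes the layout just described: a vector of (function index, vector of (offset, size, value)) entries, with every count, index, offset and size stored as an unsigned LEB128 u32. read_uleb and decode_branch_hints are hypothetical helpers written for this illustration, not part of any tool.

```python
# Sketch only: decodes a "metadata.code.branch_hint" payload under the layout
# assumed above. read_uleb and decode_branch_hints are hypothetical helpers.


def read_uleb(buf: bytes, pos: int) -> tuple[int, int]:
    """Read an unsigned LEB128 value starting at buf[pos]; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:  # high bit clear: this was the last byte of the LEB
            return result, pos


def decode_branch_hints(payload: bytes) -> list[tuple[int, list[tuple[int, bytes]]]]:
    """Return [(func_index, [(offset, hint_value_bytes), ...]), ...]."""
    pos = 0
    num_funcs, pos = read_uleb(payload, pos)
    funcs = []
    for _ in range(num_funcs):
        func_index, pos = read_uleb(payload, pos)
        num_hints, pos = read_uleb(payload, pos)
        hints = []
        for _ in range(num_hints):
            offset, pos = read_uleb(payload, pos)
            size, pos = read_uleb(payload, pos)  # hint size (1 in this example)
            hints.append((offset, payload[pos:pos + size]))
            pos += size
        funcs.append((func_index, hints))
    return funcs


# Bytes 0x3b..0x40 of c.wasm, immediately after the section name.
print(decode_branch_hints(bytes.fromhex("0100010a0100")))
# -> [(0, [(10, b'\x00')])]
```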
An initial mystery: when I hack the binary to change the hint offset from 0x0a to 0x09, it still works, producing the same correct wat output. Other values lead to the expected errors. Does the spec allow some amount of "slop" in the offsets?
My larger confusion: The overview says the offset works like this:
the |U32| byte offset of the hinted instruction, relative to the beginning of the function locals declaration
The function locals declaration is the zero byte at 0x4d, right before the code. The code begins with a block at 0x4e (2 bytes), then an i32.const (2 bytes). The br_if instruction, which is the only 0x0d, appears at offset 0x52, so there is a 5-byte span from the locals declaration to the br_if (0x52 - 0x4d = 5). Should the offset not be 5, then, rather than 10?
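The two candidate measurements can be made concrete with a small sketch over the function entry at 0x48..0x55 (the body-size LEB followed by the body). br_if_offsets is a hypothetical helper, and finding the br_if by scanning for the first 0x0d after the locals only works here because, as noted, 0x0d appears exactly once in this body.

```python
# Sketch: locate the br_if in one code-section entry and measure its position
# both from the locals declaration (as the overview describes) and from the
# start of the body-size LEB. br_if_offsets is a hypothetical helper.


def read_uleb(buf: bytes, pos: int) -> tuple[int, int]:
    """Read an unsigned LEB128 value starting at buf[pos]; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def br_if_offsets(entry: bytes) -> tuple[int, int]:
    """Return (offset from locals declaration, offset from body-size LEB)."""
    _body_size, locals_pos = read_uleb(entry, 0)  # locals declaration starts here
    num_groups, pos = read_uleb(entry, locals_pos)
    for _ in range(num_groups):
        _count, pos = read_uleb(entry, pos)
        pos += 1  # assumes a one-byte value type such as 0x7f (i32)
    # Naive scan: only valid because 0x0d occurs once in this body (the br_if).
    br_if_pos = entry.index(0x0D, pos)
    return br_if_pos - locals_pos, br_if_pos


# c.wasm bytes 0x48..0x55: padded size LEB, locals (0), block, i32.const, br_if, end, end
entry = bytes.fromhex("8980808000" "00" "0240" "4100" "0d00" "0b0b")
print(br_if_offsets(entry))  # -> (5, 10)
```

The first number is the locals-relative offset the overview describes; the second is what you get counting from the first byte of the function size LEB, which is the 10 that wasm -ca emitted.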
Going back 10 bytes from the br_if, we reach 0x48. Here is that line:
00000040 00 0a 8f 80 80 80 00 01 89 80 80 80 00 00 02 40 |...............@|
The 0x0a at 0x41 is the start of the code section. After it come 5 bytes for the size of the code section (by the way, a one-byte LEB would work, and is what binaryen emits; this is the source of the differences between the two binaries). Then 0x01 for "one function". Then 5 bytes for the size of the function, starting at 0x48, so an offset of 10 is measured from the start of the function size LEB.
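To illustrate the padding, here is a minimal sketch showing that the 5-byte 89 80 80 80 00 and the 1-byte 09 both decode to 9, and that the padded form is just the minimal encoding with redundant 0x80 continuation bytes. read_uleb and encode_uleb (including its padded_width parameter) are hypothetical helpers for this example.

```python
# Sketch: padded vs. minimal unsigned LEB128 encodings of the same value.
# read_uleb and encode_uleb are hypothetical helpers, not a library API.


def read_uleb(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Read an unsigned LEB128 value starting at buf[pos]; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def encode_uleb(value: int, padded_width: int = 0) -> bytes:
    """Encode value as unsigned LEB128. With padded_width > 0, keep emitting
    zero groups with the continuation bit set until the encoding is that long."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        out.append(byte)
        if value == 0 and len(out) >= padded_width:
            return bytes(out)
        out[-1] |= 0x80  # more bytes follow: set the continuation bit


padded = bytes.fromhex("8980808000")  # function size in c.wasm (0x48..0x4c)
minimal = bytes.fromhex("09")         # the same field in binaryen's output
print(read_uleb(padded), read_uleb(minimal))  # -> (9, 5) (9, 1)
print(encode_uleb(9, padded_width=5) == padded, encode_uleb(9) == minimal)  # -> True True
```

Both encodings carry the same value; the padded one simply occupies 4 more bytes, which is why every size field in c.wasm is longer than in binaryen's output.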
For comparison, here is binaryen's binary:
00000000 00 61 73 6d 01 00 00 00 01 05 01 60 01 7f 00 03 |.asm.......`....|
00000010 02 01 00 00 20 19 6d 65 74 61 64 61 74 61 2e 63 |.... .metadata.c|
00000020 6f 64 65 2e 62 72 61 6e 63 68 5f 68 69 6e 74 01 |ode.branch_hint.|
00000030 00 01 05 01 00 0a 0b 01 09 00 02 40 41 00 0d 00 |...........@A...|
00000040 0b 0b |..|
Now the branch hint offset is 5, which makes sense to me: it should be the same as in the first binary, since after the locals declaration there are the same 4 bytes to skip over (the block and the i32.const).
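One way to check that the two code sections differ only in LEB widths is to re-encode the code section of c.wasm (bytes 0x41 to 0x55) with minimal LEBs and compare it with binaryen's code section (bytes 0x35 to 0x41). minimize_code_section is a hypothetical helper, and this sketch covers only the code section; the type, function and custom section size LEBs are padded in c.wasm in the same way.

```python
# Sketch: rewrite a code section with minimal LEBs for the section size, the
# function count and each body size, leaving body bytes untouched.
# minimize_code_section is a hypothetical helper for this comparison.


def read_uleb(buf: bytes, pos: int) -> tuple[int, int]:
    """Read an unsigned LEB128 value starting at buf[pos]; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def encode_uleb(value: int) -> bytes:
    """Minimal unsigned LEB128 encoding."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def minimize_code_section(section: bytes) -> bytes:
    assert section[0] == 0x0A, "expected the code section id"
    _section_size, pos = read_uleb(section, 1)
    count, pos = read_uleb(section, pos)
    payload = bytearray(encode_uleb(count))
    for _ in range(count):
        body_size, pos = read_uleb(section, pos)
        payload += encode_uleb(body_size) + section[pos:pos + body_size]
        pos += body_size
    return bytes([0x0A]) + encode_uleb(len(payload)) + bytes(payload)


c_wasm_code = bytes.fromhex("0a8f80808000" "01" "8980808000" "000240" "41000d00" "0b0b")
binaryen_code = bytes.fromhex("0a0b" "01" "09" "000240" "41000d00" "0b0b")
print(minimize_code_section(c_wasm_code) == binaryen_code)  # -> True
```

Since the result matches, the function bodies from the locals declaration onward are byte-identical in both binaries, so the br_if sits 5 bytes past the locals declaration in each.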
In summary:
- I am probably confused, but the spec interpreter seems to be using an offset from the function size LEB (10 in this case), not from the locals declaration (5, I believe)?
- The spec interpreter also accepts an offset of 9 in a hacked-up binary, which points to the second byte of the function size LEB.
- The spec interpreter accepts binaryen's binary too, where (IIANM) the offset is the correct one, 5. Somehow, the earlier padded 5-byte LEBs cause a difference in the spec interpreter.