Meaning of offsets #32

@kripken

I am failing to understand how the spec interpreter emits branch hint offsets in the binary format. Specifically, the spec interpreter emits binaries that it can parse itself but binaryen cannot. Interestingly, the spec interpreter can parse binaryen's binaries (!), so something odd is going on. Here is an example:

;; w.wat
(module
 (type $0 (func (param i32)))
 (func $branch-hints-br_if (type $0) (param $x i32)
  (block $out
   (@metadata.code.branch_hint "\00")
   (br_if $out
    (i32.const 0)
   )
  )
 )
)

$ wasm -ca w.wat -o c.wasm
$ hd c.wasm
00000000  00 61 73 6d 01 00 00 00  01 85 80 80 80 00 01 60  |.asm...........`|
00000010  01 7f 00 03 82 80 80 80  00 01 00 00 a0 80 80 80  |................|
00000020  00 19 6d 65 74 61 64 61  74 61 2e 63 6f 64 65 2e  |..metadata.code.|
00000030  62 72 61 6e 63 68 5f 68  69 6e 74 01 00 01 0a 01  |branch_hint.....|
00000040  00 0a 8f 80 80 80 00 01  89 80 80 80 00 00 02 40  |...............@|
00000050  41 00 0d 00 0b 0b                                 |A.....|
00000056

The single branch hint appears at 0x3c, right after the function count (0x01 at 0x3b): it is in function 0, there is a single hint, the offset is 0x0a = 10, the hint size is 1, and the hint value is 0.
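
To double-check that reading of the bytes, here is a minimal Python sketch that decodes the section payload (the six bytes at 0x3b..0x40 of c.wasm). The helper names `read_u32_leb` and `decode_branch_hints` are made up for this illustration and belong to neither tool, and the layout assumed is the one read off the dump above: function count, then per function an index and a hint count, then per hint an offset, a size, and the value bytes.

```python
def read_u32_leb(data, pos):
    """Decode an unsigned LEB128 value at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def decode_branch_hints(payload):
    """Return a list of (function index, byte offset, hint value) tuples."""
    hints = []
    pos = 0
    num_funcs, pos = read_u32_leb(payload, pos)
    for _ in range(num_funcs):
        func_index, pos = read_u32_leb(payload, pos)
        num_hints, pos = read_u32_leb(payload, pos)
        for _ in range(num_hints):
            offset, pos = read_u32_leb(payload, pos)
            size, pos = read_u32_leb(payload, pos)
            # Assumption: the value is `size` raw bytes (1 byte in this dump).
            value = int.from_bytes(payload[pos:pos + size], "little")
            pos += size
            hints.append((func_index, offset, value))
    return hints


# Bytes 0x3b..0x40 of c.wasm, copied from the hexdump above.
SPEC_PAYLOAD = bytes.fromhex("01 00 01 0a 01 00")
print(decode_branch_hints(SPEC_PAYLOAD))  # prints [(0, 10, 0)]
```

This only confirms what the section says (function 0, offset 10, value 0); it says nothing yet about what the offset is measured from.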

An initial mystery: when I hack the binary to change the hint offset from 0x0a to 0x09, it still works, producing the same correct wat output. Other values lead to expected errors. Does the spec allow some amount of "slop" in the offsets?

My larger confusion: The overview says the offset works like this:

the |U32| byte offset of the hinted instruction, relative to the beginning of the function locals declaration

The function locals declaration is the zero at 0x4d, right before the code. The code begins with a block at 0x4e (2 bytes), then an i32.const (2 bytes). So going back from the br_if instruction, which is the only 0x0d, appearing at offset 0x52, we have a 5-byte span from the locals to the br_if. Should the offset not be 5, then, rather than 10?

Going back 10 from the br_if, we reach 0x48. Here is that line:

00000040  00 0a 8f 80 80 80 00 01  89 80 80 80 00 00 02 40  |...............@|

The 0x0a is the start of the code section. After that, 5 bytes for the size of the code section (btw, a one-byte LEB could work, and is what binaryen emits - this is the source of the differences between the two binaries). Then 0x01 for "one function". Then 5 bytes for the size of the function, starting at 0x48, so an offset of 10 points there.
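
The arithmetic above can be checked mechanically. The following Python sketch walks the code section header of c.wasm and measures the distance to the br_if from two candidate bases: the locals declaration and the function size LEB. `candidate_offsets` is a hypothetical helper written for this post, and it finds the br_if with a naive byte search for 0x0d, which is only safe because that byte occurs once in this tiny function.

```python
def read_u32_leb(data, pos):
    """Decode an unsigned LEB128 value at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


# Code section of c.wasm: file offsets 0x41..0x55 in the hexdump above.
CODE_SECTION_START = 0x41
CODE_SECTION = bytes.fromhex(
    "0a 8f 80 80 80 00 "        # section id, padded section size (15)
    "01 "                       # one function
    "89 80 80 80 00 "           # padded function size (9)
    "00 "                       # locals: zero declarations
    "02 40 41 00 0d 00 0b 0b"   # block, i32.const 0, br_if 0, end, end
)


def candidate_offsets(section, file_start, opcode=0x0D):
    """Measure the distance to `opcode` from two possible base positions."""
    pos = 1  # skip the section id byte
    _section_size, pos = read_u32_leb(section, pos)
    _num_funcs, pos = read_u32_leb(section, pos)
    size_leb_pos = pos                      # start of the function size LEB
    _func_size, pos = read_u32_leb(section, pos)
    locals_pos = pos                        # start of the locals declaration
    instr_pos = section.index(opcode, locals_pos)  # naive opcode search
    return {
        "instr_file_offset": file_start + instr_pos,
        "from_locals": instr_pos - locals_pos,
        "from_size_leb": instr_pos - size_leb_pos,
    }


result = candidate_offsets(CODE_SECTION, CODE_SECTION_START)
print(hex(result["instr_file_offset"]))  # prints 0x52
print(result["from_locals"])             # prints 5
print(result["from_size_leb"])           # prints 10
```

The padded `89 80 80 80 00` and a one-byte `09` decode to the same value, 9; only the number of bytes consumed differs, which is what moves the size LEB start 10 bytes away from the br_if instead of 6.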

For comparison, here is binaryen's binary:

00000000  00 61 73 6d 01 00 00 00  01 05 01 60 01 7f 00 03  |.asm.......`....|
00000010  02 01 00 00 20 19 6d 65  74 61 64 61 74 61 2e 63  |.... .metadata.c|
00000020  6f 64 65 2e 62 72 61 6e  63 68 5f 68 69 6e 74 01  |ode.branch_hint.|
00000030  00 01 05 01 00 0a 0b 01  09 00 02 40 41 00 0d 00  |...........@A...|
00000040  0b 0b                                             |..|

Now the branch hint offset is 5, which makes sense to me (it should be the same as in the first binary, as it has the same bytes to skip over: the 1-byte locals declaration plus 4 bytes for the block and i32.const).
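
To compare the two binaries side by side, here is a small Python sketch that extracts the function body from each code section and reports how wide the function size LEB is. `describe` is a hypothetical helper for this post, and the br_if is again located with a naive search for the byte 0x0d. It only does the byte arithmetic and makes no claim about what the spec interpreter does internally.

```python
def read_u32_leb(data, pos):
    """Decode an unsigned LEB128 value at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def describe(code_section):
    """Summarize the single function in a code section (id byte included)."""
    pos = 1  # skip the section id byte
    _section_size, pos = read_u32_leb(code_section, pos)
    _num_funcs, pos = read_u32_leb(code_section, pos)
    size_leb_pos = pos
    func_size, pos = read_u32_leb(code_section, pos)
    size_leb_width = pos - size_leb_pos
    body = code_section[pos:pos + func_size]  # starts at the locals
    from_locals = body.index(0x0D)            # naive search for br_if
    return {
        "size_leb_width": size_leb_width,
        "body": body,
        "from_locals": from_locals,
        "from_size_leb": from_locals + size_leb_width,
    }


# Code sections copied from the two hexdumps above.
SPEC_CODE = bytes.fromhex(
    "0a 8f 80 80 80 00 01 89 80 80 80 00 00 02 40 41 00 0d 00 0b 0b"
)
BINARYEN_CODE = bytes.fromhex("0a 0b 01 09 00 02 40 41 00 0d 00 0b 0b")

spec = describe(SPEC_CODE)
binaryen = describe(BINARYEN_CODE)
print(spec["body"] == binaryen["body"])                      # prints True
print(spec["from_locals"], binaryen["from_locals"])          # prints 5 5
print(spec["size_leb_width"], binaryen["size_leb_width"])    # prints 5 1
print(spec["from_size_leb"], binaryen["from_size_leb"])      # prints 10 6
```

So the function bodies are byte-identical and the distance from the locals is 5 in both files. Measured from the start of the function size LEB, the distance is 10 in the spec interpreter's binary and 6 in binaryen's.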

In summary:

  • I am probably confused, but the spec seems to be using an offset from the function size LEB (10 in this case), not the local declarations (5, I believe)?
  • The spec also works with an offset of 9 in a hacked-up binary, which goes to the second byte of the function size LEB.
  • The spec interpreter accepts binaryen's binary too, where (IIANM) the offset is the correct one, 5. Somehow, the padded 5-byte LEBs earlier in its own binary cause a difference in the spec interpreter.
