Meaning of offsets

I am failing to understand how the spec interpreter emits binary offsets. Specifically, the spec interpreter emits binaries that it can parse itself but binaryen cannot. Interestingly, the spec interpreter can parse binaryen's binaries (!), so something odd is going on. Here is an example:
```wat
;; w.wat
(module
 (type $0 (func (param i32)))
 (func $branch-hints-br_if (type $0) (param $x i32)
  (block $out
   (@metadata.code.branch_hint "\00")
   (br_if $out
    (i32.const 0)
   )
  )
 )
)
```
```
$ wasm -ca w.wat -o c.wasm
$ hd c.wasm
00000000  00 61 73 6d 01 00 00 00  01 85 80 80 80 00 01 60  |.asm...........`|
00000010  01 7f 00 03 82 80 80 80  00 01 00 00 a0 80 80 80  |................|
00000020  00 19 6d 65 74 61 64 61  74 61 2e 63 6f 64 65 2e  |..metadata.code.|
00000030  62 72 61 6e 63 68 5f 68  69 6e 74 01 00 01 0a 01  |branch_hint.....|
00000040  00 0a 8f 80 80 80 00 01  89 80 80 80 00 00 02 40  |...............@|
00000050  41 00 0d 00 0b 0b                                 |A.....|
00000056
```
The single branch hint is encoded at `0x3b`-`0x40`: one function has hints, it is function 0, there is a single hint, the offset is `0x0a` = 10 (the byte at `0x3e`), the hint size is 1, and the hint value is 0.
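To double-check that reading, here is a minimal Python sketch that decodes the six payload bytes following the section name. The field order (function count, function index, hint count, then offset/size/value per hint) is my reading of the dump above, and the helper names `read_u32_leb` and `decode_branch_hints` are made up for illustration.

```python
def read_u32_leb(data, pos):
    """Decode an unsigned LEB128 value at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def decode_branch_hints(payload):
    """Return a list of (func_index, offset, size, value) tuples.

    Assumed layout: vec(function) of [func index, vec(hint)],
    where each hint is [offset, size, value bytes].
    """
    hints = []
    pos = 0
    num_funcs, pos = read_u32_leb(payload, pos)
    for _ in range(num_funcs):
        func_index, pos = read_u32_leb(payload, pos)
        num_hints, pos = read_u32_leb(payload, pos)
        for _ in range(num_hints):
            offset, pos = read_u32_leb(payload, pos)
            size, pos = read_u32_leb(payload, pos)
            value = payload[pos]  # first (and here only) value byte
            pos += size
            hints.append((func_index, offset, size, value))
    return hints


# Bytes 0x3b..0x40 of c.wasm, copied from the hexdump above.
PAYLOAD = bytes.fromhex("01 00 01 0a 01 00")

hints = decode_branch_hints(PAYLOAD)
print(hints)
```

This yields one hint for function 0 with offset 10, size 1, and value 0.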

An initial mystery: when I hack the binary to change the hint offset from `0x0a` to `0x09`, it still works, producing the same correct wat output. Other values lead to the expected errors. Does the spec allow some amount of "slop" in the offsets?

My larger confusion: The overview says the offset works like this:

> the |U32| byte offset of the hinted instruction, relative to the beginning of the function locals declaration

The function locals declaration is the zero right before the code, at `0x4d`. The code begins with a `block` at `0x4e` (2 bytes), then an `i32.const` (2 bytes). So going back from the `br_if` instruction, which is the only `0x0d` and appears at offset `0x52`, we have a 5 byte span from the locals to the `br_if`. Shouldn't the offset be 5, then, and not 10?

Going back 10 from the `br_if`, we reach `0x48`. Here is that line:
```
00000040  00 0a 8f 80 80 80 00 01  89 80 80 80 00 00 02 40  |...............@|
```
The `0x0a` is the start of the code section. After that, 5 bytes for the size of the code section (btw, a one-byte LEB could work, and is what binaryen emits - this is the source of the differences between the two binaries). Then `0x01` for "one function". Then 5 bytes for the size of the function, starting at `0x48`, so an offset of 10 points there.
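As a sanity check on the padded sizes, here is a small Python sketch of unsigned LEB128 encoding with optional padding. The `min_bytes` parameter and both function names are hypothetical, introduced only for this illustration; they are not the API of either tool.

```python
def encode_u32_leb(value, min_bytes=1):
    """Encode value as unsigned LEB128, padded to at least min_bytes."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        # Keep going while value bits remain or padding is still required.
        more = value != 0 or len(out) + 1 < min_bytes
        out.append(byte | (0x80 if more else 0))
        if not more:
            return bytes(out)


def decode_u32_leb(data):
    """Decode an unsigned LEB128 value; return (value, bytes_consumed)."""
    result = 0
    for i, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * i)
        if byte & 0x80 == 0:
            return result, i + 1
    raise ValueError("truncated LEB128")


# The two sizes seen in the dump: code section (15) and function body (9).
for size in (15, 9):
    padded = encode_u32_leb(size, min_bytes=5)
    minimal = encode_u32_leb(size)
    # Both encodings decode to the same value; only the width differs.
    assert decode_u32_leb(padded) == (size, 5)
    assert decode_u32_leb(minimal) == (size, 1)
    print(size, padded.hex(" "), "vs", minimal.hex(" "))
```

The padded forms come out as `8f 80 80 80 00` and `89 80 80 80 00`, matching the dump, while the minimal forms are the single bytes `0f` and `09`. Each padded size field is therefore 4 bytes wider than its minimal form.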

For comparison, here is binaryen's binary:
```
00000000  00 61 73 6d 01 00 00 00  01 05 01 60 01 7f 00 03  |.asm.......`....|
00000010  02 01 00 00 20 19 6d 65  74 61 64 61 74 61 2e 63  |.... .metadata.c|
00000020  6f 64 65 2e 62 72 61 6e  63 68 5f 68 69 6e 74 01  |ode.branch_hint.|
00000030  00 01 05 01 00 0a 0b 01  09 00 02 40 41 00 0d 00  |...........@A...|
00000040  0b 0b                                             |..|
```
Now the branch hint offset is 5, which makes sense to me (it should be the same as in the first binary, as there are the same 4 bytes to skip over after the locals byte, the `block` and `i32.const`).
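To compare the two binaries mechanically, here is a rough Python sketch that walks the sections of each dump and reports the stored hint offset next to two measured distances: from the locals declaration to the `br_if`, and from the start of the function size LEB to the `br_if`. It assumes a single function with a single hint, and it finds the `br_if` by searching for the byte `0x0d`, which is only valid because these bodies contain exactly one such byte. The function names are made up for illustration.

```python
def read_u32_leb(data, pos):
    """Decode an unsigned LEB128 value at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def analyze(binary):
    """Return (stored_offset, br_if - locals, br_if - function size LEB)."""
    pos = 8  # skip magic and version
    stored = None
    while pos < len(binary):
        section_id = binary[pos]
        size, body = read_u32_leb(binary, pos + 1)
        if section_id == 0:  # custom section
            name_len, name_pos = read_u32_leb(binary, body)
            name = binary[name_pos:name_pos + name_len]
            if name == b"metadata.code.branch_hint":
                p = name_pos + name_len
                _num_funcs, p = read_u32_leb(binary, p)
                _func_index, p = read_u32_leb(binary, p)
                _num_hints, p = read_u32_leb(binary, p)
                stored, p = read_u32_leb(binary, p)
        elif section_id == 10:  # code section
            _count, size_pos = read_u32_leb(binary, body)
            _func_size, locals_pos = read_u32_leb(binary, size_pos)
            # Shortcut: the only 0x0d byte in the body is the br_if opcode.
            br_if_pos = binary.index(0x0D, locals_pos)
            return stored, br_if_pos - locals_pos, br_if_pos - size_pos
        pos = body + size
    raise ValueError("no code section found")


# Both byte strings are copied from the hexdumps above.
SPEC_WASM = bytes.fromhex(
    "00 61 73 6d 01 00 00 00 01 85 80 80 80 00 01 60"
    "01 7f 00 03 82 80 80 80 00 01 00 00 a0 80 80 80"
    "00 19 6d 65 74 61 64 61 74 61 2e 63 6f 64 65 2e"
    "62 72 61 6e 63 68 5f 68 69 6e 74 01 00 01 0a 01"
    "00 0a 8f 80 80 80 00 01 89 80 80 80 00 00 02 40"
    "41 00 0d 00 0b 0b"
)
BINARYEN_WASM = bytes.fromhex(
    "00 61 73 6d 01 00 00 00 01 05 01 60 01 7f 00 03"
    "02 01 00 00 20 19 6d 65 74 61 64 61 74 61 2e 63"
    "6f 64 65 2e 62 72 61 6e 63 68 5f 68 69 6e 74 01"
    "00 01 05 01 00 0a 0b 01 09 00 02 40 41 00 0d 00"
    "0b 0b"
)

print("spec interpreter:", analyze(SPEC_WASM))
print("binaryen:        ", analyze(BINARYEN_WASM))
```

For the spec interpreter's binary this gives a stored offset of 10, with 5 bytes from the locals and 10 from the function size LEB. For binaryen's binary it gives a stored offset of 5, with 5 bytes from the locals and 6 from the function size LEB.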

**In summary:**

* I am probably confused, but the spec interpreter seems to be using an offset from the function size LEB (10 in this case), not from the locals declaration (5, I believe)?
* The spec interpreter also works with an offset of 9 in a hacked-up binary, which points to the second byte of the function size LEB.
* The spec interpreter accepts binaryen's binary too, where (IIANM) the offset is the correct one, 5. Somehow, the earlier padded (non-minimal) LEBs cause a difference in the spec interpreter.


