|
| 1 | +--- |
| 2 | +title: Whisper Memcpy |
| 3 | +linkTitle: Whisper Memcpy |
| 4 | +weight: 50 |
| 5 | +--- |
| 6 | + |
| 7 | +Is it easy to recognize vector expansions of libc functions like `memcpy`? |
| 8 | + |
| 9 | +Let's locate some explicit invocations of `memcpy` within Whisper and |
| 10 | +see what the Advisor has to say. |
| 11 | + |
| 12 | +```c++ |
| 13 | +struct whisper_context * whisper_init_from_buffer_with_params_no_state(void * buffer, size_t buffer_size, struct whisper_context_params params) { |
| 14 | + struct buf_context { |
| 15 | + uint8_t* buffer; |
| 16 | + size_t size; |
| 17 | + size_t current_offset; |
| 18 | + }; |
| 19 | + loader.read = [](void * ctx, void * output, size_t read_size) { |
| 20 | + buf_context * buf = reinterpret_cast<buf_context *>(ctx); |
| 21 | + |
| 22 | + size_t size_to_copy = buf->current_offset + read_size < buf->size ? read_size : buf->size - buf->current_offset; |
| 23 | + |
| 24 | + memcpy(output, buf->buffer + buf->current_offset, size_to_copy); |
| 25 | + buf->current_offset += size_to_copy; |
| 26 | + |
| 27 | + return size_to_copy; |
| 28 | + }; |
| 29 | +}; |
| 30 | +``` |
| 31 | +
|
| 32 | +This source example shows a few traits: |
| 33 | +
|
| 34 | +* the number of bytes to copy is not in general known at compile time |
| 35 | +* the buffer type is `uint8_t*` |
| 36 | +* there are no alignment guarantees |
| 37 | +
|
| 38 | +GCC 15 compiles the lambda stored in `loader.read` as |
| 39 | +`whisper_init_from_buffer_with_params_no_state::{lambda(void*,void*,unsigned_long)#1}::_FUN`. |
| 40 | +The relevant instruction sequence (with addresses and extra whitespace trimmed) is: |
| 41 | +
|
| 42 | +```as |
| 43 | +LAB_000b0be2 |
| 44 | + vsetvli a3,param_3,e8,m8,ta,ma |
| 45 | + vle8.v v8,(a4) |
| 46 | + c.sub param_3,a3 |
| 47 | + c.add a4,a3 |
| 48 | + vse8.v v8,(param_2) |
| 49 | + c.add param_2,a3 |
| 50 | + c.bnez param_3,LAB_000b0be2 |
| 51 | +``` |
| 52 | + |
| 53 | +Copying the Ghidra listing to the clipboard and running the Advisor gives us: |
| 54 | + |
| 55 | +```text |
| 56 | +Clipboard Contents to Analyze |
| 57 | +
|
| 58 | +LAB_000b0be2 XREF[1]: 000b0bf4(j) |
| 59 | +000b0be2 d7 76 36 0c vsetvli a3,param_3,e8,m8,ta,ma |
| 60 | +000b0be6 07 04 07 02 vle8.v v8,(a4) |
| 61 | +000b0bea 15 8e c.sub param_3,a3 |
| 62 | +000b0bec 36 97 c.add a4,a3 |
| 63 | +000b0bee 27 84 05 02 vse8.v v8,(param_2) |
| 64 | +000b0bf2 b6 95 c.add param_2,a3 |
| 65 | +000b0bf4 7d f6 c.bnez param_3,LAB_000b0be2 |
| 66 | +
|
| 67 | +Signatures: |
| 68 | +
|
| 69 | + Element width is = 8 bits |
| 70 | + Vector registers are grouped with MUL = 8 |
| 71 | + Vector load: vle8.v |
| 72 | + Vector load is to multiple registers |
| 73 | + Vector store: vse8.v |
| 74 | + Vector store is from multiple registers |
| 75 | + At least one loop exists |
| 76 | + Significant operations, in the order they appear: |
| 77 | + vsetvli,vle8.v,vse8.v,_loop |
| 78 | + Significant operations, in alphanumeric order: |
| 79 | + _loop,vle8.v,vse8.v,vsetvli |
| 80 | +
|
| 81 | +Similarity Analysis |
| 82 | +
|
| 83 | +Compare the clipped example to the database of vectorized examples. |
| 84 | +
|
| 85 | +The best match is id=1873 [1.000]= _loop,vle8.v,vse8.v,vsetvli |
| 86 | +
|
| 87 | +The clip is similar to the reference example data/custom_testsuite/builtins/memcpy_rv64gcv:memcpy_255 |
| 88 | +Reference C Source |
| 89 | +
|
| 90 | +void memcpy_255() |
| 91 | +{ |
| 92 | + __builtin_memcpy (to, from, 255); |
| 93 | +}; |
| 94 | +
|
| 95 | +Reference Compiled Assembly Code |
| 96 | +
|
| 97 | +65e auipc a3,0x2 |
| 98 | +662 ld a3,-1678(a3) |
| 99 | +666 auipc a2,0x0 |
| 100 | +66a addi a2,a2,82 |
| 101 | +66e li a4,255 |
| 102 | +672 vsetvli a5,a4,e8,m8,ta,ma |
| 103 | +676 vle8.v v8,(a2) |
| 104 | +67a sub a4,a4,a5 |
| 105 | +67c add a2,a2,a5 |
| 106 | +67e vse8.v v8,(a3) |
| 107 | +682 add a3,a3,a5 |
| 108 | +``` |
| 109 | + |
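| 110 | +The Advisor has matched the vector instruction loop to the GCC `__builtin_memcpy` test case where the |
| 111 | +number of bytes to transfer is large (255). The vector operations match; the scalar instructions differ in register assignment, and the reference adds setup instructions (`auipc`, `ld`, `li`) that the clip does not include. |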
| 112 | + |
| 113 | +This example shows something important that we probably want to add to the Advisor's report: |
| 114 | + |
| 115 | +The `vsetvli` instruction includes the `m8` multiplier option (LMUL = 8), which means each vector operation covers a group of 8 |
| 116 | +registers. The `vle8.v` and `vse8.v` instructions name only vector register `v8`, but the loads and stores affect the 8 |
| 117 | +registers `v8` through `v15`. If the `__builtin_memcpy` appeared in an inline code fragment, where |
| 118 | +there may be more pressure on vector register availability, we might have seen very similar code |
| 119 | +with multipliers of `m4`, `m2`, or `m1`. |
| 120 | + |
| 121 | +What does the Ghidra decompiler show for this instruction sequence? |
| 122 | + |
| 123 | +```c |
| 124 | + do { |
| 125 | + lVar3 = vsetvli_e8m8tama(uVar1); |
| 126 | + auVar4 = vle8_v(lVar2); |
| 127 | + uVar1 = uVar1 - lVar3; |
| 128 | + lVar2 = lVar2 + lVar3; |
| 129 | + vse8_v(auVar4,param_2); |
| 130 | + param_2 = (void *)((long)param_2 + lVar3); |
| 131 | + } while (uVar1 != 0); |
| 132 | +``` |
| 133 | +
|
| 134 | +What would we like Ghidra's decompiler to show instead? Something like: |
| 135 | +
|
| 136 | +```c |
| 137 | +__builtin_memcpy(param_2, lVar2, uVar1); |
| 138 | +``` |
| 139 | + |
| 140 | +That's not quite correct, as `__builtin_memcpy` doesn't mutate the values `param_2` or `lVar2`, while the loop advances both pointers and counts `uVar1` down to zero. |