[qnn][bug] Inference crashes when FP16 matmul is assigned to qnn_npu #46

### Name and Version

Test device: Snapdragon 8 Gen 3
Test model: qwen2.5-1.5b-instruct-fp16.gguf


### Operating systems

Android

### GGML backends

QNN

### Hardware

Snapdragon 8 Gen 3

### Models

_No response_

### Problem description & steps to reproduce

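Same as the title: inference crashes when the FP16 matmul is assigned to qnn_npu. When it is assigned to qnn_gpu instead, inference runs normally.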

### First Bad Commit

_No response_

### Relevant log output

```shell
[qnn-npu][MUL_MATf32_1536x8960f32_1536x42f32#MUL(SILU,MUL_MAT)#MUL_MAT(NONE,MUL)#ADD(MUL_MAT,ADD)f32_1536x42f32][execute]error: QNN_GRAPH_ERROR_INVALID_HANDLE
graph_compute: ggml_backend_sched_graph_compute_async failed with error -1
process_ubatch: failed to compute graph, compute status: -1
decode: removing KV cache entries for seq_id = 0, pos = [0, +inf)
llama_decode: failed to decode, ret = -3
main : failed to eval
idx 1, name:qnn-gpu
idx 2, name:qnn-npu
FORTIFY: pthread_mutex_destroy called on a destroyed mutex (0xb4000070ae23a450)
Aborted (core dumped)
```
