Conversation

@wine99 wine99 commented Nov 18, 2025

CPU and GPU

  • llama-server with -np > 1 (multiple parallel sequences) is supported
  • llama-perplexity is supported
  • llama-bench is supported (-fa 1 is needed to enable flash attention)
  • GPU accuracy issue on quantized models (workaround: set OV_GPU_DISABLE_HORIZONTAL_FC_FUSION)
  • Performance still needs testing: small regression on CPU, large regression on GPU

All apps are currently broken on NPU; a fix is work in progress.
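The checks above can be sketched as the following invocations. This is a hedged sketch, not taken from the PR: the model path and the parallel-sequence count are placeholders, and it assumes the stock llama.cpp CLI flags (`-np` for parallel sequences on llama-server, `-fa 1` for flash attention on llama-bench) plus the OpenVINO environment variable named in the list.

```shell
# Placeholder model path -- substitute a real GGUF model.
MODEL=path/to/model.gguf

# Multi-sequence serving: -np sets the number of parallel sequences (> 1 here).
llama-server -m "$MODEL" -np 4

# Perplexity evaluation over a text file.
llama-perplexity -m "$MODEL" -f wiki.test.raw

# Benchmark with flash attention enabled via -fa 1.
llama-bench -m "$MODEL" -fa 1

# Workaround for the GPU accuracy issue on quantized models:
# disable OpenVINO's horizontal FC fusion for this run.
OV_GPU_DISABLE_HORIZONTAL_FC_FUSION=1 llama-bench -m "$MODEL" -fa 1
```

Setting the environment variable per-invocation, as in the last line, keeps the workaround scoped to the affected runs rather than the whole shell session.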

@github-actions github-actions bot added the ggml label Nov 18, 2025
@wine99 wine99 force-pushed the multi_seq branch 2 times, most recently from df6dac2 to d2524d0 Compare November 20, 2025 07:32
