@kkHuang-amd kkHuang-amd commented Nov 12, 2025

Motivation

Support FP8 KV cache in the AMD aiter backend.

The aiter backend currently supports FP8 computation only for MLA decode.

The other attention functions still compute in BF16.
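As an illustrative sketch (not the actual aiter kernel code; all names here are hypothetical), an FP8 KV cache typically stores keys and values with a per-tensor scale: the scale maps the tensor's max absolute value onto the FP8 e4m3 range, and attention kernels dequantize by multiplying back. A minimal stdlib-only model of that round trip:

```python
# Hedged sketch of per-tensor scaled quantization, as commonly used for
# FP8 (e4m3) KV caches. Real kernels also round values to fp8 precision;
# here we only model the scaling and clamping.
FP8_E4M3_MAX = 448.0  # largest finite value representable in e4m3


def compute_scale(values):
    """Scale so the largest |value| maps onto the fp8 e4m3 range."""
    amax = max(abs(v) for v in values)
    return amax / FP8_E4M3_MAX if amax > 0 else 1.0


def quantize(values, scale):
    """Divide by the scale and clamp into the representable fp8 range."""
    return [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)) for v in values]


def dequantize(qvalues, scale):
    """Recover approximate BF16-domain values for attention compute."""
    return [q * scale for q in qvalues]


kv = [0.5, -100.0, 300.0]
scale = compute_scale(kv)
restored = dequantize(quantize(kv, scale), scale)
```

This is why backends that lack FP8 kernels (the non-MLA attention paths above) can still read the same cache: they dequantize to BF16 before computing.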

Modifications

Modified the aiter backend and the model runner.

Next actions

  1. Support FP8 computation in the other attention functions as well.
  2. Investigate the accuracy issue when running MTP without SGLANG_AITER_MLA_PERSIST.

Accuracy Tests

Benchmarking and Profiling

Checklist

