@jadewang21

Description

This PR fixes a bug in the FlashinferAttentionWrapper class where the append_paged_kv_cache function was called with incorrect parameters, resulting in a TypeError: append_paged_kv_cache() missing 1 required positional argument: 'kv_last_page_len'.

Issue

  • In the forward method of FlashinferAttentionWrapper, the append_paged_kv_cache function was called with incorrect parameter order and missing required arguments (batch_indices and positions).
  • The correct signature of append_paged_kv_cache takes the following parameters, in order: append_key, append_value, batch_indices, positions, paged_kv_cache, kv_indices, kv_indptr, kv_last_page_len, and kv_layout (a call matching this order is sketched below).
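
For reference, a minimal sketch of a call matching that parameter order, assuming the signature listed above. The helper name and tensor names are illustrative placeholders rather than the wrapper's actual buffers, and kv_layout="NHD" is an assumed default:

```python
import flashinfer


def append_new_kv(append_key, append_value, batch_indices, positions,
                  paged_kv_cache, kv_indices, kv_indptr, kv_last_page_len,
                  kv_layout="NHD"):
    # Pass the arguments in the order the signature expects; misordering them
    # (and thereby dropping batch_indices/positions) is what produced the
    # TypeError about the missing kv_last_page_len argument.
    flashinfer.append_paged_kv_cache(
        append_key,        # [num_appended_tokens, num_kv_heads, head_dim]
        append_value,      # [num_appended_tokens, num_kv_heads, head_dim]
        batch_indices,     # [num_appended_tokens] sequence index of each token
        positions,         # [num_appended_tokens] in-sequence position of each token
        paged_kv_cache,    # the paged KV cache tensor
        kv_indices,        # page IDs used by each sequence
        kv_indptr,         # CSR-style offsets into kv_indices
        kv_last_page_len,  # valid slots in each sequence's last page
        kv_layout,         # assumed "NHD"; the wrapper should pass its configured layout
    )
```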

Fix

  • Added computation of batch_indices and positions in the begin_forward method (see the sketch after this list):
    • batch_indices holds, for each appended token, the index of the sequence it belongs to: the sequence ID is repeated once per prompt-chunk token during prefill and once per decode token.
    • positions holds the absolute position of each appended token within its sequence.
  • Stored these values in new member variables self.append_batch_indices_tensor and self.append_positions_tensor.
  • Updated the forward method to call append_paged_kv_cache with the correct parameter order, including the newly computed batch_indices and positions.
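
A minimal sketch of how those two tensors could be built, assuming per-sequence lengths are available in begin_forward; the helper name and its arguments (processed_lens, append_lens) are hypothetical, not the PR's actual code:

```python
import torch


def build_append_metadata(processed_lens, append_lens, device="cuda"):
    """Hypothetical helper sketching the batch_indices / positions logic.

    processed_lens[i]: tokens of sequence i already present in the KV cache.
    append_lens[i]:    tokens appended for sequence i this step (prompt chunk
                       length during prefill, 1 during decode).
    """
    batch_indices, positions = [], []
    for seq_id, (processed, appended) in enumerate(zip(processed_lens, append_lens)):
        # Each appended token is tagged with the index of its sequence ...
        batch_indices.append(torch.full((appended,), seq_id, dtype=torch.int32))
        # ... and with its absolute position inside that sequence.
        positions.append(torch.arange(processed, processed + appended, dtype=torch.int32))
    return torch.cat(batch_indices).to(device), torch.cat(positions).to(device)
```

In the wrapper, the two returned tensors would be stored as self.append_batch_indices_tensor and self.append_positions_tensor and then passed to append_paged_kv_cache in forward.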

Testing

  • I tested the updated FlashinferAttentionWrapper with a small batch of sequences, including both prefill and decode phases.
  • The TypeError no longer occurs, and the attention computation completes successfully.
  • The attention wrapper's outputs remain consistent with the expected behavior.

Please review and provide feedback. Thank you!

…g batch_indices and positions

Signed-off-by: jadewang21 <[email protected]>
@jadewang21 (Author)

@microsoft-github-policy-service agree
