Fix append_paged_kv_cache call bug #56
Open
Description
This PR fixes a bug in the FlashinferAttentionWrapper class where the append_paged_kv_cache function was called with incorrect parameters, resulting in a TypeError: append_paged_kv_cache() missing 1 required positional argument: 'kv_last_page_len'.

Issue

- In the forward method of FlashinferAttentionWrapper, the append_paged_kv_cache function was called with incorrect parameter order and missing required arguments (batch_indices and positions).
- append_paged_kv_cache requires the following parameters: append_key, append_value, batch_indices, positions, paged_kv_cache, kv_indices, kv_indptr, kv_last_page_len, and kv_layout (the corrected call is sketched after this list).
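
For reference, a minimal sketch of the corrected call with every argument passed by keyword. Only append_batch_indices_tensor and append_positions_tensor are names introduced by this PR; the other attribute names (kv_page_indices_tensor, kv_page_indptr_tensor, kv_last_page_len_tensor) are illustrative placeholders, not necessarily the wrapper's actual fields.

```python
# Sketch of the corrected append_paged_kv_cache call inside forward().
# Attribute names other than append_batch_indices_tensor / append_positions_tensor
# are illustrative placeholders for the wrapper's existing page-table tensors.
import flashinfer


class FlashinferAttentionWrapperSketch:
    def forward(self, key, value, kv_cache):
        flashinfer.append_paged_kv_cache(
            append_key=key,                                  # [nnz, num_kv_heads, head_dim]
            append_value=value,                              # [nnz, num_kv_heads, head_dim]
            batch_indices=self.append_batch_indices_tensor,  # sequence index per appended token
            positions=self.append_positions_tensor,          # absolute position per appended token
            paged_kv_cache=kv_cache,                         # paged KV cache tensor(s)
            kv_indices=self.kv_page_indices_tensor,          # placeholder: page ids used by each sequence
            kv_indptr=self.kv_page_indptr_tensor,            # placeholder: per-sequence offsets into kv_indices
            kv_last_page_len=self.kv_last_page_len_tensor,   # placeholder: valid slots in each last page
            kv_layout="NHD",
        )
```

Passing the arguments by keyword makes the required kv_last_page_len explicit and guards against future ordering mistakes.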

Fix

- Compute batch_indices and positions in the begin_forward method (see the sketch after this list):
  - batch_indices is computed as the sequence ID for each token, repeated for the length of the prompt or decode token.
  - positions is computed as the absolute position of each token in its sequence.
- Store them in self.append_batch_indices_tensor and self.append_positions_tensor.
- Update the forward method to call append_paged_kv_cache with the correct parameter order, including the newly computed batch_indices and positions.
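
A minimal sketch of that index computation, assuming begin_forward knows, per sequence, how many tokens are already in the KV cache and how many are being appended this step; the function and parameter names below are illustrative, not the wrapper's actual API.

```python
import torch


def compute_append_indices(processed_lens, append_lens, device="cuda"):
    """Sketch: build batch_indices and positions for append_paged_kv_cache.

    processed_lens -- tokens already present in the KV cache, per sequence
    append_lens    -- tokens being appended this step, per sequence
                      (prompt chunk length for prefill, 1 for decode)
    """
    batch_indices, positions = [], []
    for seq_idx, (done, extra) in enumerate(zip(processed_lens, append_lens)):
        # Repeat the sequence's batch index once per appended token.
        batch_indices.extend([seq_idx] * extra)
        # Absolute position of each appended token within its sequence.
        positions.extend(range(done, done + extra))
    return (
        torch.tensor(batch_indices, dtype=torch.int32, device=device),
        torch.tensor(positions, dtype=torch.int32, device=device),
    )


# Example: one prefill sequence appending 4 prompt tokens and one decode
# sequence appending a single token after 7 cached tokens:
#   compute_append_indices([0, 7], [4, 1])
#   batch_indices -> [0, 0, 0, 0, 1], positions -> [0, 1, 2, 3, 7]
```

If the installed flashinfer version exposes it, the get_batch_indices_positions helper can build the same two tensors on-device from the append indptr and per-sequence KV lengths, avoiding the Python loop.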

Testing

- Tested FlashinferAttentionWrapper with a small batch of sequences, including both prefill and decode phases.
- Verified that the TypeError no longer occurs and the attention computation completes successfully.

Please review and provide feedback. Thank you!