Solve underallocation in VSWA+/VGQA #4667
Merged
netanel-haber merged 47 commits into NVIDIA:main from netanel-haber:user/nhaber/fix-variable-window-size-underallocation on Jun 12, 2025
Commits
012f8ad allocate blocks per window size correctly (netanel-haber)
d5da328 Merge branch 'main' into user/nhaber/fix-variable-window-size-underal… (netanel-haber)
f5265e6 simpler code path for common homogeneous models (netanel-haber)
f94e3c5 shorten: (b|B)locksPerWindowSize -> blocksPerWindow (netanel-haber)
f3c3c63 fix trivial test compile errors (netanel-haber)
5e9c4de fix non-trivial compile errors (netanel-haber)
3f12bd5 fix resource manager (netanel-haber)
aad71d8 fix extracostmemory (netanel-haber)
0caef2d minimize diff (netanel-haber)
2e086a0 fix (netanel-haber)
9d867f8 fix (netanel-haber)
52053f1 small fix (netanel-haber)
4cf087c dynamic batch tuning (netanel-haber)
f0c2427 fix tests (netanel-haber)
7d40928 fix blocks_per_window (netanel-haber)
7119d3a use windowSizeToLayers for improved clarity instead of managedLayers … (netanel-haber)
08545e8 docs and naming (netanel-haber)
ac5e649 provide free memory to calculateMaxNumBlocks as an argument, so cross… (netanel-haber)
e9ca7b1 remove unused imports (netanel-haber)
0bccc1a hopefully implement mpi sync (netanel-haber)
6ee3096 only warn when VSWA + config.maxTokens is set (netanel-haber)
93d9816 clamp maxAttentionWindowVec (netanel-haber)
efd35ee fix test (netanel-haber)
bb6ea84 better logs (netanel-haber)
2cc5566 fix windowSizeToBlocks indexing (netanel-haber)
1945864 Merge branch 'main' into user/nhaber/fix-variable-window-size-underal… (netanel-haber)
ff7388a fix KVCacheManagerLeafBlockWithDependentTest (netanel-haber)
44e6d51 fix WindowSizeMetadata fields ordering (netanel-haber)
c885ace fix KVCacheManagerVariableWindowAttentionWithReuseTest (netanel-haber)
002ba6a Changed type for maxTokens to uint64_t to avoid overflow (netanel-haber)
12486a5 *multiply* by crossKvCacheFraction, not divide (netanel-haber)
31e4048 fix cross manager window size (netanel-haber)
717d9c0 fix test_KvCache_events_binding kvcachemanager init (netanel-haber)
a258b7e fix calculate_max_num_blocks binding (netanel-haber)
d33883b assert freeMemory smaller than totalMemory (netanel-haber)
92ed07b assert freeMemory smaller than totalMemory - after printing them (netanel-haber)
755ebf5 metadata.allottedPrimaryBlocks / blockRequirementsPerSequence instead… (netanel-haber)
34a7033 fix minor bug (netanel-haber)
31f501d actually use reduced value [blocksWorld] and assign it to blocksPrima… (netanel-haber)
8c67618 [Infra] - Update JNLP container config (#5008) (chzblych)
f6f030b Merge branch 'main' into user/nhaber/fix-variable-window-size-underal… (netanel-haber)
11402b3 Merge branch 'main' into user/nhaber/fix-variable-window-size-underal… (netanel-haber)
892ad3d pr comments (netanel-haber)
13edfc1 make logging quieter (netanel-haber)
029d7e6 add ceremony (netanel-haber)
ae25a10 Merge branch 'main' into user/nhaber/fix-variable-window-size-underal… (netanel-haber)
37c8e56 Merge branch 'main' into user/nhaber/fix-variable-window-size-underal… (netanel-haber)