q falls into a few fixed-length buckets (the length differs from batch to batch). Within one batch, sequence_length is the same for every sequence, and the batch may or may not be able to share a single block_mask.
Observations:
1. Different lengths across steps
   - _compile=False: memory stays lower/stable, fits on my GPU.
   - _compile=True: memory jumps a lot and I OOM.
2. Same length, but different block masks across batches
   - _compile=False still uses less memory than _compile=True.
3. Same length and same block mask for all steps
   - _compile=True gives the smallest memory footprint here.
Currently I must use _compile=False to survive the first two cases. Any tips to reduce memory further, or to make _compile=True workable without the growth?
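One thing I could try for the first case, sketched below: build the block mask once per length bucket and reuse it, so mask creation runs once per distinct length instead of every step. This assumes `_compile` is the `create_block_mask` argument from `torch.nn.attention.flex_attention`, and that the mask depends only on the sequence length (it does not help when every batch needs its own mask). `causal`, `build_block_mask` and `BlockMaskCache` are made-up names for illustration, and I have not verified that this removes the memory growth with `_compile=True`.

```python
from typing import Any, Callable, Dict


def causal(b, h, q_idx, kv_idx):
    # Placeholder mask_mod; stands in for whatever mask the model really uses.
    return q_idx >= kv_idx


def build_block_mask(seq_len: int, device: str = "cuda", compile_mask: bool = True):
    # Imported lazily so the cache below can be exercised without PyTorch.
    from torch.nn.attention.flex_attention import create_block_mask

    # B=None and H=None broadcast the mask over batch and heads.
    return create_block_mask(
        causal, B=None, H=None, Q_LEN=seq_len, KV_LEN=seq_len,
        device=device, _compile=compile_mask,
    )


class BlockMaskCache:
    """Keeps one block mask per sequence-length bucket."""

    def __init__(self, builder: Callable[[int], Any] = build_block_mask):
        self._builder = builder
        self._masks: Dict[int, Any] = {}

    def get(self, seq_len: int) -> Any:
        # Build on first use of this length, then reuse the same object.
        if seq_len not in self._masks:
            self._masks[seq_len] = self._builder(seq_len)
        return self._masks[seq_len]

    def __len__(self) -> int:
        return len(self._masks)
```

In the training loop this would be `block_mask = cache.get(q.shape[-2])` before calling `flex_attention(q, k, v, block_mask=block_mask)`. With a handful of buckets, the cache holds only a handful of masks for the whole run.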