[CLI] Report KV cache memory usage in mlc_llm compile (#3221)
This PR makes `mlc_llm compile` print the KV cache memory usage: the size in MB of one token's KV cache, and the total in MB for model weights + intermediate buffers + a 4K-long KV cache.
If the required fields are not present in `config` and `metadata` (e.g. for an old model), we do nothing.
Sample output in CLI:
```
[2025-05-03 21:44:24] INFO model_metadata.py:94: Total memory usage without KV cache: 2254.16 MB (Parameters: 923.16 MB. Temporary buffer: 1331.00 MB)
[2025-05-03 21:44:24] INFO model_metadata.py:128: KV cache size: 0.11 MB per token in the context window
[2025-05-03 21:44:24] INFO model_metadata.py:133: Total memory usage with a 4K KV cache: 2702.16 MB
```
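The arithmetic behind these three log lines can be sketched as follows. This is a minimal illustration, not the code in `model_metadata.py`: the function names are hypothetical, the per-token formula is the usual K-plus-V estimate for a multi-layer attention model, and the layer/head values in the test are made up to reproduce the 0.11 MB figure rather than read from the compiled model. It also assumes 1 MB = 1024 * 1024 bytes and that "4K" means 4096 tokens.

```python
def kv_cache_mb_per_token(num_layers, num_kv_heads, head_dim, bytes_per_elem=2):
    """Estimate the KV cache size of one token, in MB.

    Assumption: each layer stores one key and one value vector per KV head,
    hence the factor of 2. bytes_per_elem=2 corresponds to a 16-bit cache.
    """
    size_bytes = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return size_bytes / (1024 * 1024)


def total_memory_mb(params_mb, temp_buffer_mb, kv_mb_per_token, context_len=4096):
    """Total = parameters + temporary buffers + KV cache for context_len tokens."""
    return params_mb + temp_buffer_mb + kv_mb_per_token * context_len


if __name__ == "__main__":
    # Hypothetical config values, chosen only for illustration.
    per_token = kv_cache_mb_per_token(num_layers=28, num_kv_heads=8, head_dim=128)
    total = total_memory_mb(923.16, 1331.00, per_token)
    print(f"KV cache size: {per_token:.2f} MB per token")
    print(f"Total memory usage with a 4K KV cache: {total:.2f} MB")
```

Under these assumptions the sample log is internally consistent: the gap between the two totals, 2702.16 - 2254.16 = 448.00 MB, spread over 4096 tokens is about 0.11 MB per token.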