From e52d7436aad21f05ee27eb83a5940f543344ef4e Mon Sep 17 00:00:00 2001
From: "Chang Liu (Enterprise Products)" <9713593+chang-l@users.noreply.github.com>
Date: Tue, 16 Sep 2025 10:23:43 -0700
Subject: [PATCH] Add doc for kvcache salting

Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
---
 docs/source/features/kvcache.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/docs/source/features/kvcache.md b/docs/source/features/kvcache.md
index 3f6b394d0e3..a6b040a8aa1 100644
--- a/docs/source/features/kvcache.md
+++ b/docs/source/features/kvcache.md
@@ -58,6 +58,12 @@ Property ```free_gpu_memory_fraction``` is a ratio > 0 and < 1 that specifies ho
 
 Block reuse across requests is enabled by default, but can be disabled by setting ```enable_block_reuse``` to False.
 
+### KV Cache Salting for Secure Reuse
+
+KV cache salting provides a security mechanism to control which requests can reuse cached KV states. When a `cache_salt` parameter is provided with a request, the KV cache system will only allow reuse of cached blocks given the same cache salt value. This prevents potential security issues such as prompt theft attacks, where malicious users might try to infer information from cached states of other users' requests.
+
+To use cache salting, specify the `cache_salt` parameter as a string when creating requests. Only requests with matching cache salt values can share cached KV blocks. The salt value can be any non-empty string, such as a user ID, tenant ID, or hash string.
+
 ### Enable Offloading to Host Memory
 
 Before a block is evicted from GPU memory, it can optionally be offloaded to host (CPU) memory. The block remains reusable until it is evicted from host memory. When an offloaded block is reused, it is first copied back into GPU memory. Offloading is controlled with property ```host_cache_size``` which specifies how much host memory (in bytes) should be allocated for offloading. The default is 0.
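
The reuse rule described in the KV cache salting section added by this patch (blocks are shared only between requests that carry the same `cache_salt`) can be illustrated with a toy model. This is a conceptual sketch only, not the library's implementation: `ToyBlockCache`, `add_request`, and the block size are hypothetical names and values, and seeding a chained block hash with the salt is an assumption about one way such behavior could be obtained.

```python
import hashlib
from typing import List, Optional, Set


class ToyBlockCache:
    """Hypothetical toy cache: tracks which full token blocks have been seen."""

    def __init__(self, block_size: int = 4) -> None:
        self.block_size = block_size
        self.blocks: Set[str] = set()

    def _keys(self, tokens: List[int], cache_salt: Optional[str]) -> List[str]:
        # Assumption: the salt seeds the hash chain, so every block key
        # depends on it. Unsalted requests use an empty seed.
        parent = "" if cache_salt is None else "salt:" + cache_salt
        keys = []
        # Only full blocks are considered; a trailing partial block is skipped.
        full = len(tokens) - len(tokens) % self.block_size
        for start in range(0, full, self.block_size):
            block = tuple(tokens[start:start + self.block_size])
            data = (parent + "|" + repr(block)).encode()
            parent = hashlib.sha256(data).hexdigest()
            keys.append(parent)
        return keys

    def add_request(self, tokens: List[int], cache_salt: Optional[str] = None) -> int:
        """Return how many leading blocks were reused, then store all blocks."""
        keys = self._keys(tokens, cache_salt)
        reused = 0
        for key in keys:
            if key not in self.blocks:
                break
            reused += 1
        self.blocks.update(keys)
        return reused


cache = ToyBlockCache(block_size=4)
prompt = list(range(8))  # two full blocks

first = cache.add_request(prompt, cache_salt="tenant-a")   # 0: nothing cached yet
second = cache.add_request(prompt, cache_salt="tenant-a")  # 2: same salt, reuse
other = cache.add_request(prompt, cache_salt="tenant-b")   # 0: different salt
```

In this sketch an identical prompt under a different salt produces entirely different block keys, so a second tenant cannot observe a cache hit on the first tenant's prompt.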