diff --git a/docs/source/features/kvcache.md b/docs/source/features/kvcache.md
index 3f6b394d0e3..a6b040a8aa1 100644
--- a/docs/source/features/kvcache.md
+++ b/docs/source/features/kvcache.md
@@ -58,6 +58,12 @@ Property ```free_gpu_memory_fraction``` is a ratio > 0 and < 1 that specifies ho
 
 Block reuse across requests is enabled by default, but can be disabled by setting ```enable_block_reuse``` to False.
 
+### KV Cache Salting for Secure Reuse
+
+KV cache salting provides a security mechanism to control which requests can reuse cached KV states. When a `cache_salt` parameter is provided with a request, the KV cache system will only allow reuse of cached blocks given the same cache salt value. This prevents potential security issues such as prompt theft attacks, where malicious users might try to infer information from cached states of other users' requests.
+
+To use cache salting, specify the `cache_salt` parameter as a string when creating requests. Only requests with matching cache salt values can share cached KV blocks. The salt value can be any non-empty string, such as a user ID, tenant ID, or hash string.
+
 ### Enable Offloading to Host Memory
 
 Before a block is evicted from GPU memory, it can optionally be offloaded to host (CPU) memory. The block remains reusable until it is evicted from host memory. When an offloaded block is reused, it is first copied back into GPU memory. Offloading is controlled with property ```host_cache_size``` which specifies how much host memory (in bytes) should be allocated for offloading. The default is 0.
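
The salting behavior described in the added section can be illustrated with a small conceptual sketch. This is not the library's implementation: the block size, the hashing scheme, and the names `BLOCK_SIZE`, `block_keys`, and `reusable_blocks` are all hypothetical, chosen only to show why two requests with identical prompts but different `cache_salt` values never match each other's cached blocks.

```python
import hashlib
from typing import List, Optional, Sequence, Set

BLOCK_SIZE = 4  # hypothetical number of tokens per KV cache block


def block_keys(tokens: Sequence[int], cache_salt: Optional[str] = None) -> List[str]:
    """Return one chained hash key per full block of tokens (illustrative only)."""
    if cache_salt is not None and cache_salt == "":
        raise ValueError("cache_salt must be a non-empty string")
    # Seed the hash chain with the salt so every block key depends on it.
    parent = hashlib.sha256((cache_salt or "").encode("utf-8")).hexdigest()
    keys = []
    for start in range(0, len(tokens) - BLOCK_SIZE + 1, BLOCK_SIZE):
        block = tokens[start:start + BLOCK_SIZE]
        payload = parent + ":" + ",".join(str(t) for t in block)
        parent = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        keys.append(parent)
    return keys


def reusable_blocks(cached: Set[str], keys: List[str]) -> int:
    """Count the leading blocks of a request whose keys are already cached."""
    count = 0
    for key in keys:
        if key not in cached:
            break
        count += 1
    return count


prompt = list(range(10))  # 10 tokens -> 2 full blocks of 4
cache = set(block_keys(prompt, cache_salt="tenant-a"))

# Same prompt, same salt: both blocks match.
same_salt = reusable_blocks(cache, block_keys(prompt, cache_salt="tenant-a"))
# Same prompt, different salt: nothing matches.
other_salt = reusable_blocks(cache, block_keys(prompt, cache_salt="tenant-b"))
print(same_salt, other_salt)
```

Seeding the chain with the salt, rather than checking the salt separately at lookup time, means a mismatched salt changes every block key, so isolation in this sketch does not depend on an extra comparison step.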