[AutoDeploy] Tune KVCacheConfig of drafter for memory optimization #9279

@govind-ramnarayan

Description

Proposal to improve performance

Currently, in DraftTarget speculative decoding, the KVCacheConfig that the target model is configured with is passed along unchanged to the separate draft-model KV cache. This can reserve far more memory for the draft model than it needs: the draft model's KV cache can be made much smaller than the target's, scaled by the ratio of attention-layer counts between the two models and by the number of draft tokens generated. A minimal sketch of this scaling follows below.
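For illustration only, here is a minimal sketch of the layer-ratio heuristic, assuming the LLM-API `KvCacheConfig` and its `free_gpu_memory_fraction` and `max_tokens` fields; the `draft_kv_cache_config` helper and the scaling rule itself are hypothetical, not existing TensorRT-LLM/AutoDeploy code.

```python
from tensorrt_llm.llmapi import KvCacheConfig

def draft_kv_cache_config(target_config: KvCacheConfig,
                          target_attn_layers: int,
                          draft_attn_layers: int) -> KvCacheConfig:
    # Hypothetical heuristic: the draft model's cache footprint scales roughly
    # with its share of attention layers, so shrink the reserved memory
    # fraction proportionally instead of inheriting the target's full budget.
    ratio = draft_attn_layers / target_attn_layers
    return KvCacheConfig(
        free_gpu_memory_fraction=target_config.free_gpu_memory_fraction * ratio,
    )

# Example: a 32-layer target with a 4-layer drafter reserves ~1/8 the fraction.
target_cfg = KvCacheConfig(free_gpu_memory_fraction=0.9)
draft_cfg = draft_kv_cache_config(target_cfg,
                                  target_attn_layers=32,
                                  draft_attn_layers=4)
```

The number of draft tokens generated could factor in similarly, e.g. by bounding `max_tokens` on the draft config; which knobs are the right ones depends on how AutoDeploy plumbs the config through to the draft model.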

Report of performance regression

No response

Misc discussion on performance

No response

Your current environment (if you think it is necessary)

System Information:

  • OS:
  • Python version:
  • CUDA version:
  • GPU model(s):
  • Driver version:
  • TensorRT version:
  • PyTorch version:
  • TensorRT-LLM version:

Detailed output:

Paste the output of the above commands here

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Metadata

Labels

  • Memory: Memory utilization in TRTLLM: leak/OOM handling, footprint optimization, memory profiling.
  • Performance: TRTLLM model inference speed, throughput, efficiency. Latency, benchmarks, regressions, opts.
  • Speculative Decoding<NV>: MTP/Eagle/Medusa/Lookahead/Prompt-Lookup-Decoding/Draft-Target-Model/ReDrafter

Projects

Status: Backlog
