Speculative Decoding #1066
Description
How would you like to use ModelOpt
I looked at the documentation for speculative decoding, and there are a few things that I couldn't find in the documentation.
- The support matrix does not list any model from the Kimi family, yet I see a drafter for K2 trained by NVIDIA. Are Kimi models like K2.5 supported or not? If they are supported, can the documentation and usage be updated accordingly?
- The advanced usage section shows that a model served with a vllm server can be used to generate the dataset. But for hidden-state extraction, only TRT-LLM is supported, right?
- The training section for the draft model focuses on HuggingFace, and I don't think that works for Kimi. In most cases we would either generate the dataset or extract the hidden states offline and then train, unless we have more resources. Can you please clarify the usage with an example?
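To make the second question concrete: for the vLLM path, my understanding (possibly wrong) is that the teacher only needs to expose the OpenAI-compatible endpoint. This is the request/record plumbing I have in mind; the base URL, model name, and ShareGPT-style output format are my own assumptions, not something I found in the docs:

```python
import json
import urllib.request


def build_chat_request(prompt, model="teacher", max_tokens=512):
    """Payload for the OpenAI-compatible /v1/chat/completions endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def to_conversation_record(prompt, completion):
    """Wrap one prompt/completion pair as a ShareGPT-style record
    (field names are a guess at what the training step expects)."""
    return {
        "conversations": [
            {"from": "human", "value": prompt},
            {"from": "gpt", "value": completion},
        ]
    }


def generate_record(prompt, base_url="http://localhost:8000/v1"):
    """Query a running vLLM server and return one dataset record.
    Requires a live server at base_url (assumed, not started here)."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        out = json.load(resp)
    return to_conversation_record(prompt, out["choices"][0]["message"]["content"])
```

Is this roughly the intended flow, or does the dataset-generation script expect a different record schema?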
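And for the third question, this is roughly the offline flow I imagine: dump conversations and hidden-state shards to disk, then pair them by sample id when building training batches. The file layout, the one-shard-per-sample naming, and the `id` field are all guesses on my part, not ModelOpt's actual format:

```python
import json
from pathlib import Path


def index_shards(shard_dir):
    """Map sample id -> hidden-state shard path, assuming one file per
    sample named <sample_id>.npz (a guessed convention)."""
    return {p.stem: p for p in Path(shard_dir).glob("*.npz")}


def paired_batches(conv_jsonl, shard_dir, batch_size=4):
    """Yield batches of (record, shard_path) pairs where every conversation
    has a matching hidden-state shard; unmatched records are skipped."""
    shards = index_shards(shard_dir)
    batch = []
    with open(conv_jsonl) as f:
        for line in f:
            rec = json.loads(line)
            if rec["id"] in shards:
                batch.append((rec, shards[rec["id"]]))
                if len(batch) == batch_size:
                    yield batch
                    batch = []
    if batch:
        yield batch
```

If the actual draft-model trainer consumes something like this, an end-to-end example in the docs (generate → extract → train, without HuggingFace in the loop) would clear things up.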
Who can help?
- ?
System information
- Container used (if applicable): nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc8
- OS (e.g., Ubuntu 22.04, CentOS 7, Windows 10): Ubuntu 24.04
- CPU architecture (x86_64, aarch64): x86_64
- GPU name (e.g. H100, A100, L40S): H200
- GPU memory size: 140G
- Number of GPUs: 8
- Library versions (if applicable):
- Python: 3.12
- ModelOpt version or commit hash: 0.37
- CUDA: 13.1
- PyTorch: 2.9.1
- Transformers: 4.57.3
- TensorRT-LLM: 1.3.0rc8
- TensorRT: 10.14.1