System Info
While reading the source code, I noticed that the class members multimodalInput and multimodalEmbedding of Request (in cpp/include/tensorrt_llm/executor/executor.h) appear to be unused in TensorRT-LLM. Could you clarify where these two parameters come into play?
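For reference, this is roughly how I expected to attach them to a request. The Request constructor with token ids and a max-token budget is from the header; the setMultimodalEmbedding / setMultimodalInput calls are only my guess at the intended usage (left commented out, since I could not find call sites that consume these members):

```cpp
#include "tensorrt_llm/executor/executor.h"

namespace tle = tensorrt_llm::executor;

int main()
{
    // Plain text prompt tokens plus a generation budget.
    tle::VecTokens inputTokenIds{1, 2, 3, 4};
    tle::Request request{inputTokenIds, /*maxTokens=*/64};

    // Assumed usage: attach a precomputed multimodal embedding tensor
    // (e.g. a vision-encoder output of shape [numMmTokens, hiddenSize])
    // so it can be fused with the text embeddings at input time.
    // Setter names below are my assumption, not verified against the API:
    //
    // tle::Tensor mmEmbedding = ...;
    // request.setMultimodalEmbedding(mmEmbedding);
    // request.setMultimodalInput(tle::MultimodalInput{...});

    return 0;
}
```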
How would you like to use TensorRT-LLM
I want to run inference of a [specific model](put Hugging Face link here). I don't know how to integrate it with TensorRT-LLM or optimize it for my use case.
Specific questions:
- Model:
- Use case (e.g., chatbot, batch inference, real-time serving):
- Expected throughput/latency requirements:
- Multi-GPU setup needed: