Labels
LLM API — High-level LLM Python API & tools (e.g., trtllm-llmapi-launch) for TRT-LLM inference/workflows.
question — Further information is requested
Description
System Info
Hi,
I get this error with Gemma 3 4B IT:
E1118 19:56:24.568955 1 pb_stub.cc:737] "Failed to process the request(s) for model 'gemma_chat_0_0', message: ValueError: The sum of prompt length (9351.0), query length (0) should not exceed max_num_tokens (8192)\n\nAt:\n /usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/llm.py(583): _check_arguments\n /usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/llm.py(427): generate_async\n /usr/lib/python3.12/contextlib.py(81): inner\n /usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/llm.py(292): generate\n /models/gemma_chat/1/model.py(60): execute\n"
I already set max_tokens to 200 and pass it to TensorRT-LLM as follows:
sampling_params = SamplingParams(max_tokens=max_tokens)
outs = self.llm.generate([prompt], sampling_params)
generated_text = outs[0].outputs[0].text
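Note that `SamplingParams(max_tokens=...)` only caps the number of *generated* output tokens; the error above comes from a separate check on the *input* prompt length against `max_num_tokens` (8192 by default here), so lowering `max_tokens` cannot fix it. A minimal sketch of that check, reconstructed from the traceback message (the helper name `check_prompt_fits` is illustrative, not part of the TensorRT-LLM API):

```python
def check_prompt_fits(prompt_token_ids, max_num_tokens, query_len=0):
    """Mimic the validation in llmapi/llm.py:_check_arguments: the tokenized
    prompt (plus any query tokens) must fit within max_num_tokens."""
    prompt_len = len(prompt_token_ids)
    if prompt_len + query_len > max_num_tokens:
        raise ValueError(
            f"The sum of prompt length ({prompt_len}), query length "
            f"({query_len}) should not exceed max_num_tokens ({max_num_tokens})"
        )

# A 9351-token prompt against the default 8192 reproduces the failure mode
# reported in the log above:
try:
    check_prompt_fits(range(9351), max_num_tokens=8192)
except ValueError as e:
    print(e)
```

So the options are to shorten/truncate the prompt below 8192 tokens, or raise `max_num_tokens` when constructing the model (depending on your TensorRT-LLM version, this is a build/engine or `LLM(...)` constructor setting, not a per-request `SamplingParams` field).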
How would you like to use TensorRT-LLM
I want to run inference of a [specific model](put Hugging Face link here). I don't know how to integrate it with TensorRT-LLM or optimize it for my use case.
Specific questions:
- Model:
- Use case (e.g., chatbot, batch inference, real-time serving):
- Expected throughput/latency requirements:
- Multi-GPU setup needed:
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.
cc @QiJune