Labels
Investigating · Triton backend&lt;NV&gt; (Related to NVIDIA Triton Inference Server backend) · bug (Something isn't working) · triaged (Issue has been triaged by maintainers)
Description
System Info
NVIDIA RTX 3090 Ti
nvcr.io/nvidia/tritonserver:25.05-trtllm-python-py3
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Steps to reproduce the behavior:
- Take any compiled TensorRT-LLM engine (plan).
- Delete any of the parameters tokenizer_dir, xgrammar_tokenizer_info_path, or guided_decoding_backend from config.pbtxt (see the sketch below).
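For reference, the corresponding entries in the tensorrt_llm model's config.pbtxt look roughly like the sketch below (a sketch only; the parameter names come from this issue, the surrounding template may differ between versions). Removing any one of these blocks entirely is enough to trigger the crash:

parameters: {
  key: "guided_decoding_backend"
  value: {
    string_value: "${guided_decoding_backend}"
  }
}
parameters: {
  key: "xgrammar_tokenizer_info_path"
  value: {
    string_value: "${xgrammar_tokenizer_info_path}"
  }
}
parameters: {
  key: "tokenizer_dir"
  value: {
    string_value: "${tokenizer_dir}"
  }
}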
Expected behavior
tensorrtllm_backend should start normally.
Actual behavior
tensorrtllm_backend crashes with a message saying the parameter is missing, even though the parameter is not actually used (guided decoding is simply left disabled).
Additional notes
TensorRT-LLM/triton_backend/inflight_batcher_llm/src/model_instance_state.cc
Lines 410 to 453 in 7b210ae
std::optional<executor::GuidedDecodingConfig> ModelInstanceState::getGuidedDecodingConfigFromParams()
{
    std::optional<executor::GuidedDecodingConfig> guidedDecodingConfig = std::nullopt;
    std::string tokenizerDir = model_state_->GetParameter<std::string>("tokenizer_dir");
    std::string tokenizerInfoPath = model_state_->GetParameter<std::string>("xgrammar_tokenizer_info_path");
    std::string guidedDecodingBackendStr = model_state_->GetParameter<std::string>("guided_decoding_backend");
    if (!tokenizerDir.empty() && tokenizerDir != "${tokenizer_dir}")
    {
        TLLM_LOG_INFO(
            "Guided decoding C++ workflow does not use tokenizer_dir, this parameter will "
            "be ignored.");
    }
    if (guidedDecodingBackendStr.empty() || guidedDecodingBackendStr == "${guided_decoding_backend}"
        || tokenizerInfoPath.empty() || tokenizerInfoPath == "${xgrammar_tokenizer_info_path}")
    {
        return guidedDecodingConfig;
    }
    TLLM_CHECK_WITH_INFO(std::filesystem::exists(tokenizerInfoPath),
        "Xgrammar's tokenizer info path at %s does not exist.", tokenizerInfoPath.c_str());
    auto const tokenizerInfo = nlohmann::json::parse(std::ifstream{std::filesystem::path(tokenizerInfoPath)});
    auto const encodedVocab = tokenizerInfo["encoded_vocab"].template get<std::vector<std::string>>();
    auto const tokenizerStr = tokenizerInfo["tokenizer_str"].template get<std::string>();
    auto const stopTokenIds
        = tokenizerInfo["stop_token_ids"].template get<std::vector<tensorrt_llm::runtime::TokenIdType>>();
    executor::GuidedDecodingConfig::GuidedDecodingBackend guidedDecodingBackend;
    if (guidedDecodingBackendStr == "xgrammar")
    {
        guidedDecodingBackend = executor::GuidedDecodingConfig::GuidedDecodingBackend::kXGRAMMAR;
    }
    else
    {
        TLLM_THROW(
            "Guided decoding is currently supported with 'xgrammar' backend. Invalid guided_decoding_backend parameter "
            "provided.");
    }
    guidedDecodingConfig
        = executor::GuidedDecodingConfig(guidedDecodingBackend, encodedVocab, tokenizerStr, stopTokenIds);
    return guidedDecodingConfig;
}
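The function already treats an unset ${...} placeholder as "guided decoding disabled", but the three GetParameter calls happen before that check, so a parameter that is removed from config.pbtxt altogether appears to abort model loading. A minimal sketch of a possible mitigation, assuming ModelState::GetParameter throws a std::exception-derived error when the key is absent (not verified here), and using a hypothetical helper name getOptionalParam: fall back to an empty string so a missing parameter behaves exactly like an unset placeholder.

// Sketch only: tolerate a parameter that was deleted from config.pbtxt.
// Assumes GetParameter throws (std::exception or derived) for a missing key.
std::string getOptionalParam(ModelState* modelState, std::string const& name)
{
    try
    {
        return modelState->GetParameter<std::string>(name);
    }
    catch (std::exception const& e)
    {
        // Missing key: treat it like an unset ${...} placeholder so guided
        // decoding is simply disabled instead of the backend crashing.
        TLLM_LOG_WARNING(
            "Parameter '%s' not found in config.pbtxt; guided decoding will be disabled.", name.c_str());
        return std::string{};
    }
}

// In getGuidedDecodingConfigFromParams(), the three lookups would then become:
// std::string tokenizerDir = getOptionalParam(model_state_, "tokenizer_dir");
// std::string tokenizerInfoPath = getOptionalParam(model_state_, "xgrammar_tokenizer_info_path");
// std::string guidedDecodingBackendStr = getOptionalParam(model_state_, "guided_decoding_backend");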