v0.19.0 #4192
kaiyux announced in Announcements
TensorRT-LLM Release 0.19.0
Key Features and Enhancements
- … examples/deepseek_v3/README.md, also to the blog docs/source/blogs/Best_perf_practice_on_DeepSeek-R1_in_TensorRT-LLM.md.
- … PyExecutor.
- … PeftCacheManager support.
- … AutoTuner to both Fused MoE and NVFP4 Linear operators.
- … UserBuffers allocator.
- … examples/deepseek_v3/README.md.
- … tensorrt_llm._torch.auto_deploy. Refer to examples/auto_deploy/README.md for more details.
- … get_stats support.
- … examples/llm-api/llm_mgmn_*.sh.
- … examples/multimodal/README.md.
- … examples/mixtral/README.md.
- … examples/qwen2audio/README.md.
- … examples/language_adapter/README.md.
- … examples/stdit/README.md.
- … examples/vit/README.md.
- … examples/exaone/README.md.
- … examples/gemma/README.md.
- … examples/mmlu_llmapi.py.
- Added the --quantize_lm_head option in examples/quantization/quantize.py to support lm_head quantization.
- Added a /metrics endpoint for trtllm-serve to log iteration statistics.
- … trtllm-serve.
- … disaggServerBenchmark.
- … trtllm-bench.
- fp8_blockscale_gemm is now open-sourced.
- Added ENABLE_MULTI_DEVICE and ENABLE_UCX as CMake options.
- … PyExecutor inference flow to estimate max_num_tokens for kv_cache_manager.
- Added the TLLM_OVERRIDE_LAYER_NUM and TLLM_TRACE_MODEL_FORWARD environment variables for debugging.
- … __init__.py.
API Changes
- … kv_cache_retention_config from the C++ executor API to the LLM API.
- … BuildConfig arguments to LlmArgs.
- … DecoderState via bindings and integrated it in the decoder.
- … LlmArgs with Pydantic and migrated the remaining pybinding configurations to Python.
- Added numNodes to ParallelConfig.
Fixed Issues
- Fixed incorrect batch slot usage in the addCumLogProbs kernel. Thanks to the contribution from @aotman in Fix Incorrect Batch Slot Usage in addCumLogProbs Kernel #2787.
- … --extra-index-url https://pypi.nvidia.com when running pip install tensorrt-llm.
Infrastructure Changes
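The ENABLE_MULTI_DEVICE and ENABLE_UCX CMake options named above can be toggled at configure time. The fragment below is only a sketch: the source and build directory paths (and the chosen ON values) are assumptions for illustration, not taken from the release notes.

```shell
# Hypothetical configure step; -S/-B paths are placeholders.
cmake -S cpp -B build \
  -DENABLE_MULTI_DEVICE=ON \
  -DENABLE_UCX=ON
```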
Known Issues
This discussion was created from the release v0.19.0.
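As an illustration of the TLLM_OVERRIDE_LAYER_NUM and TLLM_TRACE_MODEL_FORWARD debugging variables listed in the notes above, the sketch below shows one way such variables could be consumed. The integer-override and "0"/"1" conventions are assumptions for the example, not TensorRT-LLM's actual implementation.

```python
import os

# Sketch only: variable names come from the release notes; the parsing
# conventions below are illustrative assumptions.
def effective_num_layers(model_num_layers: int) -> int:
    """Return the layer count, honoring an override env var if set."""
    override = os.environ.get("TLLM_OVERRIDE_LAYER_NUM")
    if override is not None:
        # Clamp to the model's real depth so an oversized override is safe.
        return min(int(override), model_num_layers)
    return model_num_layers

def trace_forward_enabled() -> bool:
    """True when per-forward tracing is requested (assumed "1" convention)."""
    return os.environ.get("TLLM_TRACE_MODEL_FORWARD", "0") == "1"

os.environ["TLLM_OVERRIDE_LAYER_NUM"] = "2"
print(effective_num_layers(32))  # prints 2
```

Truncating the layer count this way is a common trick for quickly reproducing runtime issues on a smaller model without rebuilding checkpoints.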