Commit 453df8f
committed
rebase
Signed-off-by: junq <[email protected]>

File tree
3,580 files changed: +246392 -38491 lines changed
- .devcontainer
- .github
- workflows
- 3rdparty
- cpp
- include/tensorrt_llm
- batch_manager
- executor
- layers
- runtime
- kernels
- fmha_v2
- xqa
- test
- micro_benchmarks
- tensorrt_llm
- batch_manager
- common
- executor
- cache_transmission
- agent_utils
- nixl_utils
- flash_mla
- kernels
- communicationKernels
- contextFusedMultiHeadAttention/cubin
- cutlass_kernels
- fp8_blockscale_gemm
- fp8_rowwise_gemm
- include
- moe_gemm
- internal_cutlass_kernels/include
- moeLoadBalance
- trtllmGenKernels
- batchedGemm
- trtllmGen_bmm_export
- cubins
- blockScaleMoe
- fmha/cubin
- weightOnlyBatchedGemv
- layers
- nanobind
- batch_manager
- executor
- thop
- plugins/mixtureOfExperts
- pybind
- batch_manager
- executor
- thop
- runtime
- moeLoadBalancer
- thop
- tests
- resources/data
- unit_tests
- batch_manager
- executor
- kernels
- routing
- sampling
- layers
- multi_gpu
- runtime
- docker
- common
- docs/source
- blogs/tech_blog
- commands/trtllm-serve
- deployment-guide
- developer-guide
- features
- auto_deploy
- advanced
- installation
- media
- models
- torch
- auto_deploy/advanced
- features
- examples
- auto_deploy
- .vscode
- disaggregated
- slurm
- benchmark
- service_discovery_example
- simple_example
- llm-api
- longbench
- medusa
- models
- contrib
- bloom
- cogvlm
- dbrx
- falcon
- gptj
- gptneox
- mmdit
- mpt
- opt
- core
- enc_dec
- internlm2
- llama
- mamba
- mixtral
- multimodal
- qwen2audio
- qwenvl
- qwen
- recurrentgemma
- whisper
- opentelemetry
- ray_orchestrator
- redrafter
- scaffolding/contrib/DeepConf
- serve
- wide_ep/slurm_scripts
- jenkins
- scripts
- scripts
- security_scanning/cpp/kernels/fmha_v2
- .github/scripts/build.sh +31
- .github/scripts/check_for_ngc_images.sh +65
- .github/scripts/test.sh +6
- .github/workflows/_build.yml +217
- .github/workflows/_build_in_container.yml +139
- .github/workflows/build.yml +53
- .github/workflows/build_in_container.yml +34
- .github/workflows/publish.yml +129
- README.md +42 -15
- csrc/apis/attention.hpp +240
- csrc/apis/einsum.hpp +115
- csrc/apis/gemm.hpp +114 -2
- csrc/apis/layout.hpp +2 -2
- csrc/indexing/main.cu +15 -1
- csrc/jit/device_runtime.hpp +29 -3
- csrc/jit_kernels/heuristics/common.hpp +14 -12
- csrc/jit_kernels/heuristics/sm100.hpp +20 -2
- csrc/jit_kernels/heuristics/sm90.hpp +39 -13
- csrc/jit_kernels/impls/epilogue.hpp +12
- csrc/jit_kernels/impls/runtime_utils.hpp +67 -17
- csrc/jit_kernels/impls/sm100_bf16_gemm.hpp +4 -5
- csrc/jit_kernels/impls/sm100_bmk_bnk_mn.hpp +137
- csrc/jit_kernels/impls/sm100_fp8_gemm_1d1d.hpp +13 -5
- csrc/jit_kernels/impls/sm100_fp8_gemm_1d2d.hpp +11 -4
- csrc/jit_kernels/impls/sm90_bf16_gemm.hpp +6 -5
- csrc/jit_kernels/impls/sm90_bmk_bnk_mn.hpp +131
- csrc/jit_kernels/impls/sm90_fp8_gemm_1d1d.hpp +214
- csrc/jit_kernels/impls/sm90_fp8_gemm_1d2d.hpp +12 -4
- csrc/jit_kernels/impls/smxx_clean_logits.hpp +79
- csrc/jit_kernels/impls/smxx_cublaslt.hpp +151
- csrc/jit_kernels/impls/smxx_fp8_mqa_logits.hpp +152
- csrc/jit_kernels/impls/smxx_fp8_paged_mqa_logits.hpp +236
- csrc/python_api.cpp +4
- csrc/utils/exception.hpp +13
- csrc/utils/system.hpp +8 -1
- deep_gemm/__init__.py +13
- deep_gemm/include/deep_gemm/common/epilogue_utils.cuh +27
- deep_gemm/include/deep_gemm/common/reduction.cuh +44
- deep_gemm/include/deep_gemm/common/scheduler.cuh +28 -18
- deep_gemm/include/deep_gemm/common/sm100_utils.cuh +97 -6
- deep_gemm/include/deep_gemm/common/sm90_utils.cuh +53 -7
- deep_gemm/include/deep_gemm/common/utils.cuh +14
- deep_gemm/include/deep_gemm/impls/sm100_bf16_gemm.cuh +237 -286
- deep_gemm/include/deep_gemm/impls/sm100_bmk_bnk_mn.cuh +265
- deep_gemm/include/deep_gemm/impls/sm100_fp8_gemm_1d1d.cuh +332 -406
- deep_gemm/include/deep_gemm/impls/sm100_fp8_gemm_1d2d.cuh +11 -15
- deep_gemm/include/deep_gemm/impls/sm100_fp8_mqa_logits.cuh +385
- deep_gemm/include/deep_gemm/impls/sm100_fp8_paged_mqa_logits.cuh +404
- deep_gemm/include/deep_gemm/impls/sm90_bf16_gemm.cuh +76 -62
- deep_gemm/include/deep_gemm/impls/sm90_bmk_bnk_mn.cuh +173
- deep_gemm/include/deep_gemm/impls/sm90_fp8_gemm_1d1d.cuh +346 -1
- deep_gemm/include/deep_gemm/impls/sm90_fp8_gemm_1d2d.cuh +156 -189
- deep_gemm/include/deep_gemm/impls/sm90_fp8_mqa_logits.cuh +317
- deep_gemm/include/deep_gemm/impls/sm90_fp8_paged_mqa_logits.cuh +403
- deep_gemm/include/deep_gemm/impls/smxx_clean_logits.cuh +67
- deep_gemm/testing/bench.py +3 -4
- deep_gemm/utils/math.py +7 -4
- setup.py +118 -21
- tests/generators.py +85 -33
- tests/test_attention.py +247
- tests/test_bf16.py +41 -17
- tests/test_einsum.py +85
- tests/test_fp8.py +72 -27
- third-party/cutlass +1 -1
- CHANGELOG.md +14
- README.md +5 -2
- examples/77_blackwell_fmha/CMakeLists.txt +5 -3
- examples/77_blackwell_fmha/collective/fmha_fusion.hpp +2 -2
- examples/python/CuTeDSL/ampere/all_reduce.py -314
- examples/python/CuTeDSL/ampere/distributed_vector_add.py -189
- include/cutlass/epilogue/collective/builders/sm100_builder.inl +8 -4
- include/cutlass/epilogue/dispatch_policy.hpp +4
- include/cutlass/version.h +1 -1
- media/docs/cpp/cute/02_layout_algebra.md +2 -2
- media/docs/cpp/cute/03_tensor.md +1 -1
- media/docs/cpp/pipeline.md +1 -1
- pyproject.toml +1 -1
- python/CuTeDSL/base_dsl/ast_helpers.py +35
- python/CuTeDSL/base_dsl/ast_preprocessor.py +56 -49
- python/CuTeDSL/base_dsl/runtime/cuda.py +7 -1
- python/CuTeDSL/base_dsl/utils/logger.py +2 -1
- python/CuTeDSL/cutlass/__init__.py +2
- python/CuTeDSL/cutlass_dsl/__init__.py +2
- python/CuTeDSL/requirements.txt +1 -1
- python/cutlass_cppgen/__init__.py +1 -1
- python/cutlass_cppgen/backend/__init__.py
- python/cutlass_cppgen/backend/arguments.py
- python/cutlass_cppgen/backend/c_types.py
- python/cutlass_cppgen/backend/compiler.py
- python/cutlass_cppgen/backend/conv2d_operation.py
- python/cutlass_cppgen/backend/epilogue.py
- python/cutlass_cppgen/backend/evt/__init__.py
- python/cutlass_cppgen/backend/evt/backend/__init__.py
- python/cutlass_cppgen/backend/evt/backend/emitter_base.py
- python/cutlass_cppgen/backend/evt/backend/sm100_emitter.py
- python/cutlass_cppgen/backend/evt/backend/sm100_nodes.py
- python/cutlass_cppgen/backend/evt/backend/sm80_emitter.py
- python/cutlass_cppgen/backend/evt/backend/sm80_nodes.py
- python/cutlass_cppgen/backend/evt/backend/sm90_emitter.py
- python/cutlass_cppgen/backend/evt/backend/sm90_nodes.py
- python/cutlass_cppgen/backend/evt/epilogue.py
- python/cutlass_cppgen/backend/evt/frontend/__init__.py
- python/cutlass_cppgen/backend/evt/frontend/frontend_base.py
- python/cutlass_cppgen/backend/evt/frontend/python_ast.py
- python/cutlass_cppgen/backend/evt/ir/__init__.py
- python/cutlass_cppgen/backend/evt/ir/compute_nodes.py
- python/cutlass_cppgen/backend/evt/ir/dag_ir.py
- python/cutlass_cppgen/backend/evt/ir/layout_algorithm.py
- python/cutlass_cppgen/backend/evt/ir/layout_nodes.py
- python/cutlass_cppgen/backend/evt/ir/load_nodes.py
- python/cutlass_cppgen/backend/evt/ir/node.py
- python/cutlass_cppgen/backend/evt/ir/store_nodes.py
- python/cutlass_cppgen/backend/evt/ir/tensor.py
- python/cutlass_cppgen/backend/evt/passes/__init__.py
- python/cutlass_cppgen/backend/evt/passes/graph_drawer.py
- python/cutlass_cppgen/backend/evt/passes/pass_argument_type.py
- python/cutlass_cppgen/backend/evt/passes/pass_dag_2_tree.py
- python/cutlass_cppgen/backend/evt/passes/pass_fix_element_d.py
- python/cutlass_cppgen/backend/evt/passes/pass_get_impl.py
- python/cutlass_cppgen/backend/evt/passes/pass_layout_elimination.py
- python/cutlass_cppgen/backend/evt/passes/pass_manager.py
- python/cutlass_cppgen/backend/evt/passes/pass_no_op_elimination.py
- python/cutlass_cppgen/backend/evt/passes/pass_preprocess_red.py
- python/cutlass_cppgen/backend/evt/passes/pass_shape_type_propagation.py
- python/cutlass_cppgen/backend/evt/passes/smem_size_calculator.py
- python/cutlass_cppgen/backend/evt/passes/util.py
- python/cutlass_cppgen/backend/frontend.py
- python/cutlass_cppgen/backend/gemm_operation.py
- python/cutlass_cppgen/backend/library.py
- python/cutlass_cppgen/backend/memory_manager.py
- python/cutlass_cppgen/backend/operation.py
- python/cutlass_cppgen/backend/reduction_operation.py
- python/cutlass_cppgen/backend/type_hint.py
- python/cutlass_cppgen/backend/utils/__init__.py
- python/cutlass_cppgen/backend/utils/device.py
- python/cutlass_cppgen/emit/__init__.py
- python/cutlass_cppgen/emit/common.py
- python/cutlass_cppgen/emit/pytorch.py
- python/cutlass_cppgen/epilogue/__init__.py
- python/cutlass_cppgen/epilogue/epilogue.py
- python/cutlass_cppgen/epilogue/evt_ops.py
- python/cutlass_cppgen/library_defaults.py
- python/cutlass_cppgen/op/__init__.py
- python/cutlass_cppgen/op/conv.py
- python/cutlass_cppgen/op/gemm.py
- python/cutlass_cppgen/op/gemm_grouped.py
- python/cutlass_cppgen/op/op.py
- python/cutlass_cppgen/shape.py
- python/cutlass_cppgen/swizzle.py
- python/cutlass_cppgen/utils/__init__.py
- python/cutlass_cppgen/utils/check.py
- python/cutlass_cppgen/utils/datatypes.py
- python/cutlass_cppgen/utils/lazy_import.py
- python/cutlass_cppgen/utils/profiler.py
- python/cutlass_library/generator.py +2 -1
- python/cutlass_library/library.py +14
- python/setup_library.py +1 -1
- python/setup_pycute.py +1 -1
- setup.cfg +1 -1
- test/unit/gemm/device/sm100_gemm_f8_f8_f8_tensor_op_f32_blockwise.cu +30 -8