shelkesagar29 commented Jun 30, 2025

Integrate internal changes

Author: Sagar Shelke [email protected]

[executor] Add complex type support to ScalarValue

Previously, `ScalarValue`, which represents a scalar runtime value, did
not support complex types. This MR adds that support by making the
storage a union of real and complex data instead of just real.

MLIR tests are added via constant subgraph execution.

Author: Christopher Bate [email protected]

[compiler] Enable more stablehlo.dot_general to TensorRT using
tensorrt.einsum

Previously, we relied on canonicalization of stablehlo.dot_general
to put all such contraction operations into a form that could be
converted to tensorrt.matrix_multiply. Based on recent experiments,
this can actually produce very inefficient TensorRT programs due to
the number of reshapes and transpositions that must be inserted to
coerce general stablehlo.dot_general into batched matrix
multiplications. This change enables conversion of
stablehlo.dot_general to tensorrt.einsum, and the pass and
patterns now contain configurable parameters to control whether
tensorrt.einsum is used as the primary method or only as a fallback
when conversion to tensorrt.matrix_multiply is not possible.

A follow-on change will revamp the Stablehlo preprocessing that we
perform on 'stablehlo.dot_general' to avoid creating inefficient
patterns and enable wider use of this pattern.
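As a sketch (the operand shapes and the exact `tensorrt.einsum` syntax here are illustrative, not taken from the repository), a batched contraction that previously had to be coerced into `tensorrt.matrix_multiply` via reshapes and transposes can map onto a single einsum:

```mlir
// A batch-matmul-like contraction: batch dim 0, contracting dims 2 (lhs) / 1 (rhs).
%0 = stablehlo.dot_general %lhs, %rhs,
    batching_dims = [0] x [0],
    contracting_dims = [2] x [1]
  : (tensor<2x3x4xf32>, tensor<2x4x5xf32>) -> tensor<2x3x5xf32>

// With einsum as the primary (or fallback) conversion method, this becomes
// one op with no extra reshapes or transpositions:
%1 = tensorrt.einsum {equation = "bij,bjk->bik"}
    ins(%lhs, %rhs : tensor<2x3x4xf32>, tensor<2x4x5xf32>)
  -> tensor<2x3x5xf32>
```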

Author: Christopher Bate [email protected]

[compiler] Fix stablehlo-to-scf scalarization heuristics

Fixes an issue where float tensors in the 'before' region of converted
while loops were scalarized. The transform should only scalarize
operands that are likely to be for-style induction variables.

Author: Christopher Bate [email protected]

[compiler] NFC: Drop dead code from StablehloToExecutableTask

Author: Chris Bate [email protected]

[compiler] Add plan-promote-host-tensors-to-host-pinned pass

Adds a simple pass to promote "host" tensors to "host-pinned" tensors
in common cases where we know a tensor will be transferred between
host and device spaces. This pass runs after
plan-optimize-memory-spaces, since that pass is sensitive to
mismatched host spaces in patterns related to moving transfers out of
loops.
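A minimal sketch of the promotion, assuming the `#plan.memory_space` encoding spellings shown here (`host`, `host_pinned`) match the dialect:

```mlir
// Before: a staging tensor in plain "host" space that will be copied to device.
%staging = bufferization.alloc_tensor()
    : tensor<1024xf32, #plan.memory_space<host>>

// After plan-promote-host-tensors-to-host-pinned: the same tensor in
// "host_pinned" space, so the host/device transfer can use pinned memory.
%staging = bufferization.alloc_tensor()
    : tensor<1024xf32, #plan.memory_space<host_pinned>>
```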

Author: Sagar Shelke [email protected]

[executor] Handle elided dense resource elements attr during
translation

Translation to an executable (which is a flatbuffer) uses MLIR
attribute serialization to serialize `ElementsAttr`. However, this does
not work when the attribute is an elided dense resource, and it results
in a segfault. This MR handles the situation by replacing the elided
resource with a `DenseElementsAttr` of all ones (`true` in the case of
booleans).

IR with elided resources is usually seen only when testing passes and
is not useful for e2e functional execution. Testing of the
ExecuteConstantFoldableSubgraphs pass is one such case, so MLIR test
cases for this pass are added.
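For illustration (the shape is hypothetical), `dense_resource<__elided__>` is how MLIR prints an elided resource, and the substitution behaves as if the constant had been written with all ones:

```mlir
// Input IR with an elided dense resource; serializing this attribute
// previously caused a segfault during flatbuffer translation.
%cst = arith.constant dense_resource<__elided__> : tensor<128xf32>

// The translator now substitutes a DenseElementsAttr of all ones,
// i.e. as if the constant had been:
%cst = arith.constant dense<1.0> : tensor<128xf32>
```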

Author: Chris Bate [email protected]

[tensorrt] Fix TRT layer name generation function

The TRT layer naming had some faulty logic that could cause the layer
name to grow very large while generating a unique name. Fix the issue
and use a static counter to reduce time spent in the loop.

Author: Christopher Bate [email protected]

Further fixes to LIT configs

Previously, we were setting lit_config.parallelism_group instead of
config.parallelism_group. The former has no effect; only
config.parallelism_group does anything.

Author: Chris Bate [email protected]

Update LIT test parallelism configs

In more recent versions of TensorRT (10.11+ at least), the builder
consumes much more host memory. This can cause OOM when
running the LIT test suites under their existing configurations.

This change updates all LIT configs:

  • Make sure to use %pick-one-gpu in the LIT command line to ensure
    we stall if there are not enough GPU or host resources available.
    Add a hard limit that there must be at least 5GB of host memory
    available.

  • Update configurations to reduce the estimated parallelism by
    increasing host memory requirements and counting only 50% of host
    memory for the purposes of the parallelism calculation.

  • Force all tests to use a common parallelism group unless otherwise
    specified in the test config.

Author: Christopher Bate [email protected]

[compiler] Fix failure case in stablehlo-to-scf

Fixes a failure case due to one of the recently introduced rewrites in
stablehlo-to-scf.

Author: Christopher Bate [email protected]

[compiler] Further improvements to plan bufferization pipeline

  • Split plan-assign-memory-spaces into three passes:
    • plan-assign-memory-spaces
    • plan-optimize-memory-spaces
    • plan-materialize-explicit-transfers
  • The last one is the only new code:
    plan-materialize-explicit-transfers converts tensor.cast ops
    that change the memory space encoding into explicit
    bufferization.alloc_tensor +
    bufferization.materialize_in_destination operations.
  • Improve handling of bufferization.alloc_tensor and optimization of
    scf.for iteration args in plan-assign-memory-spaces.
  • Improve handling of tensor.reshape in plan-assign-memory-spaces.
  • Fix handling of tensor.reshape when rewriting functions to be in
    DPS style in plan-alloc-tensors.

This change also updates the LLVM dependencies in order to cherry-pick
a fix to the tensor.reshape bufferization interface that I merged
upstream (llvm/llvm-project#128590).

In addition, fix APInt assertions in
plan-execute-constant-foldable-subgraphs.
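A sketch of what plan-materialize-explicit-transfers does, assuming the memory-space encodings shown; the exact printed syntax may differ:

```mlir
// Before: a tensor.cast that only changes the memory-space encoding.
%dev = tensor.cast %host
    : tensor<4xf32, #plan.memory_space<host>>
   to tensor<4xf32, #plan.memory_space<device>>

// After: an explicit destination tensor in the target space plus a
// materialization, which bufferizes to an allocation and a copy.
%dest = bufferization.alloc_tensor()
    : tensor<4xf32, #plan.memory_space<device>>
%dev = bufferization.materialize_in_destination %host in %dest
    : (tensor<4xf32, #plan.memory_space<host>>,
       tensor<4xf32, #plan.memory_space<device>>)
   -> tensor<4xf32, #plan.memory_space<device>>
```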

Author: Chris Bate [email protected]

[compiler] Enable While-to-For conversion in Stablehlo-to-Scf pass

This change adds some patterns to the Stablehlo-to-Scf pass to enable
While-to-For conversion after the Stablehlo-to-Scf conversion. This
transformation is combined with the Stablehlo-to-Scf conversion
because the While-to-For patterns require first scalarizing block
arguments of the While operation. The heuristics for which block
arguments should be scalarized are implemented as control callbacks
for the scalarization patterns. These callbacks need
Stablehlo-specific logic, so it makes sense to test the combined
conversion as a single pass. From the pass users' perspective, it
gives the appearance of going directly from stablehlo.while to
scf.for.

The test cases are updated to cover the new patterns.
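Conceptually (a sketch, not output from the pass), a counted stablehlo.while whose induction variable has been scalarized can uplift to:

```mlir
// i starts at 0, increments by 1, and the loop runs while i < 16; the
// tensor-typed state travels through iter_args while the induction
// variable is a scalar index.
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%c16 = arith.constant 16 : index
%result = scf.for %i = %c0 to %c16 step %c1
    iter_args(%acc = %init) -> (tensor<4xf32>) {
  %next = "some.body"(%acc) : (tensor<4xf32>) -> tensor<4xf32>
  scf.yield %next : tensor<4xf32>
}
```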

Author: Chris Bate [email protected]

[compiler] Fix assign-memory-spaces pass to respect function-level
constraints

Fixes an issue where the plan.memory_space attribute on a function
was not being respected when converting function signatures.

MR: initialdl/mlir-tensorrt!2146

Author: Chris Bate [email protected]

[compiler] Update scf.while detensorization to increase flexibility

In order to incorporate the upstream "uplift scf.while to scf.for"
transformation as part of the stablehlo-to-scf conversion, we need
to detensorize the operands of scf.while that are likely to
correspond to the loop induction variable. This change refactors our
existing 'scf.while' detensorization transformation to give more
flexibility and control. The TensorKindAnalysis is no longer required
in order to use the pattern(s). Detensorization of the 'after' and
'before' arguments of scf.while is now controlled separately.

Author: Chris Bate [email protected]

[compiler] Improve handling of memory space constraints in the Plan
dialect

This change improves the handling of memory space constraints in the
Plan dialect. Constraints are now specified using a common attribute,
'plan.memory_space', that can be applied to functions or individual
arguments/results. In addition, patterns in plan-alloc-tensors and
plan-assign-memory-spaces are updated to avoid introducing unnecessary
transfers between memory spaces.
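A sketch of the constraint attribute; the attribute placement shown here is illustrative:

```mlir
// Function-level default space, overridden on an individual result.
func.func @main(%arg0: tensor<4xf32>)
    -> (tensor<4xf32> {plan.memory_space = #plan.memory_space<host>})
    attributes {plan.memory_space = #plan.memory_space<device>} {
  return %arg0 : tensor<4xf32>
}
```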

Author: Chris Bate [email protected]

[compiler] Add plan-buffer-results-to-out-params pass

This change adds a new Plan dialect pass
plan-buffer-results-to-out-params. This pass is based on the
upstream Bufferization pass buffer-results-to-out-params, but it can
handle a wider range of cases (such as promoting dynamic allocations)
and uses alias analysis utilities to guard against failure cases that
the upstream pass currently cannot handle. These improvements should
eventually be upstreamed back to the Bufferization dialect.
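The transformation can be pictured as follows (a sketch; note the dynamic allocation, which the upstream pass cannot promote):

```mlir
// Before: the function returns a buffer it allocates, with a dynamic size.
func.func @producer(%size: index) -> memref<?xf32> {
  %buf = memref.alloc(%size) : memref<?xf32>
  // ... fill %buf ...
  return %buf : memref<?xf32>
}

// After: the result becomes an out-parameter written in place.
func.func @producer(%size: index, %out: memref<?xf32>) {
  // ... fill %out directly ...
  return
}
```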

Author: Chris Bate [email protected]

[compiler] Update func conversion in host-to-emitc

In the EmitC conversion/translation process, you can use func.func
or emitc.func to define functions. Previously, we converted all
func.func to emitc.func. However, emitc.func does not have a
path for supporting multiple return values. Therefore, prefer use of
type conversions on func.func instead of converting the entire op to
emitc.func. Add tests to verify that we can support multiple return
values.
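The distinction can be sketched as follows; emitc.func would reject the two-result signature, while a type-converted func.func keeps it (op choices here are illustrative):

```mlir
// Kept as func.func with EmitC-converted types, preserving both results.
func.func @divmod(%a: !emitc.size_t, %b: !emitc.size_t)
    -> (!emitc.size_t, !emitc.size_t) {
  %q = emitc.div %a, %b : (!emitc.size_t, !emitc.size_t) -> !emitc.size_t
  %r = emitc.rem %a, %b : (!emitc.size_t, !emitc.size_t) -> !emitc.size_t
  return %q, %r : !emitc.size_t, !emitc.size_t
}
```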

Author: Chris Bate [email protected]

[compiler] Fix two host-to-emitc bugs

This change fixes two bugs exposed by new 'host-to-emitc' conversion
testing:

  • The !emitc.size_t type does not have DataLayout information
    specified upstream. Therefore, to ensure that the type can be
    queried using DataLayout, we add a DataLayoutTypeInterface
    external model to the type. All queries are simply mapped to
    queries to the index type.

  • The upstream func.call conversion has a bug where it does not
    correctly convert the result types of the call operation, which
    can lead to a type mismatch for any type that does not have an
    identity conversion.

Additional tests are added to host-to-emitc. Eventually the fixes
for both these issues should be moved upstream.

Author: Chris Bate [email protected]

[common] Add Linalg-to-loops (on tensors) implementation and
conversion pass

Adds a ToLoopsOpInterface implementation for Linalg operations. In
addition, a conversion pass is added that converts ToLoopsOpInterface
operations to loops.
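A sketch of the idea, lowering an elementwise Linalg computation to loops while staying on tensors (the specific op and shapes are illustrative):

```mlir
// Doubling each element: the loop threads the result tensor through
// iter_args and uses tensor.extract / tensor.insert instead of memrefs.
func.func @double(%in: tensor<4xf32>, %init: tensor<4xf32>) -> tensor<4xf32> {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %c4 = arith.constant 4 : index
  %out = scf.for %i = %c0 to %c4 step %c1
      iter_args(%acc = %init) -> (tensor<4xf32>) {
    %x = tensor.extract %in[%i] : tensor<4xf32>
    %y = arith.addf %x, %x : f32
    %r = tensor.insert %y into %acc[%i] : tensor<4xf32>
    scf.yield %r : tensor<4xf32>
  }
  return %out : tensor<4xf32>
}
```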

Author: Chris Bate [email protected]

NFC: Move ToLoopsOpInterface to 'mlir-tensorrt-common'

Moves the ToLoopsOpInterface to the 'mlir-tensorrt-common' project.
This is in preparation for enabling the ToLoopsOpInterface on LinalgOp
(lowering while still using Tensor types) to replace the
convert-stablehlo-arith-to-scalar pipeline.

MR: initialdl/mlir-tensorrt!2137

Author: Christopher Bate [email protected]

NFC: Fix formatting across several files

Author: Chris Bate [email protected]

[executor] Introduce RuntimeSession "features" to control loading of
runtime modules

Previously, the RuntimeSession would always load all available runtime
modules. This causes some inefficiencies. For example, in certain
integration tests for the Executor runtime, we don't use CUDA at all.
However, because CUDA is still initialized by default, we would still
require a GPU to be present just to run the integration test.
Furthermore, some experimental modules (e.g. Lua cublas module) are
not ready for "production" use and are only really invoked inside
special integration tests.

This change introduces a notion of "features" to the RuntimeSession and
RuntimeSessionOptions. A feature is just a string that identifies a
particular runtime component. The particular semantics of a "feature"
depend on the actual runtime implementation. For example, for the
LuaRuntimeSession, the feature names correspond to the available Lua
"modules" (a module is just a group of C++ Lua extension functions),
e.g. "core", "cuda", "tensorrt", etc.

RuntimeSessionOptions gains methods for enabling/disabling
features. Certain features cause others to be added to the set
automatically, e.g. "tensorrt" and "nccl" both require "cuda" to be
added.

The API is piped through all the way to the Python bindings to allow
control of loaded modules at all levels. To preserve existing
behavior, RuntimeSessions created from Python will load all available
modules by default, but the executor-runner|mlir-tensorrt-runner
tools now require features to be explicitly specified.

Author: Christopher Bate [email protected]

NFC: Fix include guard for 'mlir-executor/Support/Status.h'

Author: Sagar Shelke [email protected]

[compiler/lib] Add stablehlo composite to call pass to pre-processing
pipeline

This MR adds StablehloLegalizeCompositeToCallPass to the
pre-processing pipeline.

MLIR test is added.

Author: Chris Bate [email protected]

[compiler] Add "default memory space" to ClusterKindAttrInterface

Adds a new method to the ClusterKindAttrInterface so that backends can
control the default tensor encoding (#plan.memory_space<..>) assigned
by the plan.assign-memory-spaces pass at a function-scope level. In
addition, we also allow an attribute to override the default space for
function arguments/results. This override mechanism was previously
lacking and will help resolve a long-standing issue where users cannot
reliably control the memory space of arguments/results.

Author: Christopher Bate [email protected]

[compiler] Fix some issues related to pipeline extension mechanism

The StablehloToExecutableTensorRTExtension had both a 'disable' and an
inherited 'disabled' member variable. Delete the inherited one, since
it should not have been introduced and was not bound to any option.
Further, remove the unused 'extensions' vector from
CompilationTaskOptionsBase.

Author: Christopher Bate [email protected]

[executor] Fix ptrtoint and inttoptr op translation to Lua

Previously, we could generate conflicting function types (due to
pointer address space) when converting executor.ptrtoint and
executor.inttoptr ops to opaque calls. Instead, defer the conversion
to function call until the actual Lua translation point. At that point
we can generate a function name without having to consider the pointer
address space.

Author: Chris Bate [email protected]

Introduce 'MLIRTensorRTCommon' sub-project

Certain targets need to be used across multiple sub-projects. For
example, the 'TensorRTDynamicLoader' target is used in all
sub-projects. In addition, the sub-projects need to be independently
buildable. This change introduces another sub-project under the
'common' directory where shared code can be placed. This allows us to
use find_package to declare the dependency, and downstream consumers
to meet the requirement using any number of techniques to fulfill the
'find_package' call.

Author: Chris Bate [email protected]

[compiler] Harden stablehlo.constant to arith.constant conversion

There is a utility pass that runs in the stablehlo-to-executable
pipeline that converts stablehlo.constant to arith.constant. This
pass can temporarily create invalid IR because arith.constant does
not support signed/unsigned (non-signless) integer types. If the
"verify-each" option is off, the issue is not caught, since it happens
to be self-correcting. However, it can still cause verification
failures while debugging. This change fixes the issue by adding a
builtin.unrealized_conversion_cast operation to bridge the type
change between signless and signed/unsigned integer types.
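A sketch of the fix: the constant is materialized with a signless type that arith.constant accepts, and the cast bridges back to the signed/unsigned type (types illustrative):

```mlir
// arith.constant only supports signless integers, so the ui32 constant
// is first created as i32 and then bridged with an unrealized cast.
%0 = arith.constant dense<1> : tensor<4xi32>
%1 = builtin.unrealized_conversion_cast %0 : tensor<4xi32> to tensor<4xui32>
```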

Author: Chris Bate [email protected]

Integrate LLVM at f137c3d592e96330e450a8fd63ef7e8877fc1908

Author: Christopher Bate [email protected]

Fix build with BUILD_SHARED_LIBS=ON

The new InferTensorValueRangeInterface was used without correctly
declaring the library dependency in the PlanIR and StablehloExtIR
libraries.

Author: Sagar Shelke [email protected]

[compiler] Maintain output order in TensorRT engine.

For TensorRT engine conversion, the first step in lowering a cluster
containing TensorRT ops is creating an inline group op. Operands to the
yield op (i.e. the terminator) of the inline group op are values from
the cluster that are used outside the cluster. These values are
collected by getting the uses of each op (with op->getUses()) and
checking whether they are outside the cluster. However, this use order
is not deterministic, and sometimes it is desirable to get yield
results in a certain order.

This MR makes the following changes:

  1. Add a function callback option named ReorderRegionOpYieldValues
     to the mlir::createRegionOpFromCluster method. This callback has
     the signature std::function<void(SetVector<Value> &yieldValues,
     SmallVectorImpl<Type> &yieldTypes)>, which takes the cluster
     values used outside the cluster (in a SetVector) and their types.
     By default it is set to nullptr.
  2. The TensorRTToExecutable task is used in cases where a single
     func.func represents a single TensorRT engine. In this case, the
     ReorderRegionOpYieldValues callback is implemented to make sure
     the inline group op yield value order is the same as the func.func
     return value order.

A valid MLIR test is added.

GitOrigin-RevId: 630a69d8e14506db43cfefe4be2c790f9352da4f

shelkesagar29 force-pushed the integrate_internal_changes branch 9 times, most recently from 915e6fd to 709391e on July 1, 2025 at 00:33
shelkesagar29 force-pushed the integrate_internal_changes branch 6 times, most recently from 836e79c to c41bdcf on July 2, 2025 at 19:16
christopherbate force-pushed the integrate_internal_changes branch 2 times, most recently from 0d56717 to 7891ce7 on July 10, 2025 at 03:08
Author: Sagar Shelke <[email protected]>

[executor] Add complex type support to `ScalarValue`

Previously, ScalarValue which represents scalar runtime value did not
support complex type. This MR adds support for complex type by making
storage union of real and complex data instaed of just real.

MLIR tests are added via constant subgraph execution.

Author: Christopher Bate <[email protected]>

[compiler] Enable more `stablehlo.dot_general` to TensorRT using
`tensorrt.einsum`

Previously, we relied on canonicalization of `stablehlo.dot_general`
to put all such contraction operations into a form that could be
converted to `tensorrt.matrix_multiply`. Based on recent experiments,
this can actually produce very inefficient TensorRT programs due to
the number of reshapes and transpositions that must be inserted to
coerce general `stablehlo.dot_general` into batched matrix
multiplications. This change enables conversion of
`stablehlo.dot_general` to `tensorrt.einsum`, and the pass and
patterns now contain configurable parameters to control whether
`tensorrt.einsum` is used as the primary method or only for fallback
when conversion to `tensorrt.matrix_multiply` is not possible.

A follow on change will revamp the Stablehlo preprocessing that we
perform on 'stablehlo.dot_general' to avoid creating inefficient
patterns and enable wider use of this pattern.

Author: Christopher Bate <[email protected]>

[compiler] Fix stablehlo-to-scf scalarization heuristics

Fixes an issue where float tensors in the 'before' region of converted
while loops where scalarized. The transform should only scalarize
operands which are likely to be for-style induction variables.

Author: Christopher Bate <[email protected]>

[compiler] NFC: Drop dead code from StablehloToExecutableTask

Author: Chris Bate <[email protected]>

[compiler] Add `plan-promote-host-tensors-to-host-pinned` pass

Adds a simple pass to promote "host" tensors to "host-pinned" tensors
in common cases where we know a tensor will be transferred between
host and device spaces. This pass runs after
`plan-optimize-memory-spaces` since the former is sensitive to
mismatching host spaces for patterns related to moving tranfers out of
loops.

Author: Sagar Shelke <[email protected]>

[executor] Handle elided dense resource elements attr during
translation

Translation to executable (which is flatbuffer) uses MLIR attr
serialization to serialize `ElementsAttr`. However, this doesn't work
when attr is elided dense resource and results in segfault. This MR
handles this situation by replacing elided resource with
`DenseElementsAttr` of all `one`s (`true` in case of boolean).

IR with elided resource is usally seen only during testing of passes
and not useful for e2e functional execution. Testing of
`ExecuteConstantFoldableSubgraphs` pass is such case. Thus,  MLIR test
cases for this pass are added.

Author: Chris Bate <[email protected]>

[tensorrt] Fix TRT layer name generation function

The TRT layer naming had some faulty logic that could cause the layer
name to grow very large in the process to create a unique name. Fix
the issue and use a static counter to reduce time spent in the loop.

Author: Christopher Bate <[email protected]>

Further fixes to LIT configs

Previously, we were setting `lit_config.parallelism_group` instead of
`config.parallelism_group`. Apparently, the previous method does
nothing, only `config.parallelism_group` has any effect.

Author: Chris Bate <[email protected]>

Update LIT test parallelism configs

In more recent versions of TensorRT (10.11+ at least), the builder is
taking a much larger amount of host memory. This can cause OOM when
running the LIT test suites under their existing configurations.

This change updates all LIT configs:

- Make sure to use `%pick-one-gpu` in the LIT command line to ensure
    we stall if there are not enough GPU or host resources available.
    Add a hard limit that there must be at least 5GB of host memory
    available.

- Update configurations to reduce the amount of estimated parallelism
    by increasing host memory requirements and reducing the amount of
    host memory to 50% for the purposes of the parallelism
    calculation.

- Force all tests to use a common parallelism group unless otherwise
    specified in the test config.

Author: Christopher Bate <[email protected]>

[compiler] Fix failure case in stablehlo-to-scf

Fixes a failure case due to one of the recently introduced rewrites in
`stablehlo-to-scf`.

Author: Christopher Bate <[email protected]>

[compiler] Further improvements to plan bufferization pipeline

- Split `plan-assign-memory-spaces` into three passes:
    - `plan-assign-memory-spaces`
    - `plan-optimize-memory-spaces`
    - `plan-materialize-explicit-transfers`
- The last one is the only new code:
    `plan-materialize-explicit-transfers` converts `tensor.cast` ops
    that change the memory space encoding into explicit
    `bufferization.alloc_tensor` +
    `bufferization.materialize_in_destination` operations.
- Improve handling of `bufferization.alloc_tensor` and optimization of
    `scf.for` iteration args in `plan-assign-memory-spaces`.
- Improve handling of `tensor.reshape` in `plan-assign-memory-spaces`.
- Fix handling of `tensor.reshape` when rewriting functions to be in
    DPS style in `plan-alloc-tensors`.

This change also updates the LLVM dependencies in order to cherry-pick
fix to the `tensor.reshape` bufferization interface that I merged
upstream (llvm/llvm-project#128590).

In addition, fix APInt assertions in
`plan-execute-constant-foldable-subgraphs`.

Author: Chris Bate <[email protected]>

[compiler] Enable While-to-For conversion in Stablehlo-to-Scf pass

This change adds some patterns to the Stablehlo-to-Scf pass to enable
While-to-For conversion after the Stablehlo-to-Scf conversion. This
transformation is combined with the Stablehlo-to-Scf conversion
because the While-to-For patterns require first scalarizing block
arguments of the While operation. The heuristics for which block
arguments should be scalarized are implemented as control callbacks
for the scalarization patterns. These callbacks need
Stablehlo-specific logic, so it makes sense to test the combined
conversion as a single pass. From the pass users' perspective, it
gives the appearence of going directly from `stablehlo.while` to
`scf.for`.

The test cases are updated to cover the new patterns.

Author: Chris Bate <[email protected]>

[compiler] Fix assign-memory-spaces pass to respect function-level
constraints

Fixes an issue where the `plan.memory_space` attribute on a function
was not being respected when converting function signatures.

MR: initialdl/mlir-tensorrt!2146

Author: Chris Bate <[email protected]>

[compiler] Update scf.while detensorization to increase flexibility

In order to incorporate the upstream "uplift scf.while to scf.for"
transformation as part of the `stablehlo-to-scf` conversion, we need
to detensorize the operands of `scf.while` that are likely to
correspond to the loop induction variable. This change refactors our
existing 'scf.while' detensorization transformation to give more
flexibility and control. The TensorKindAnalysis is no longer required
in order to use the pattern(s). Detensorization of `after` and
`before` arguments of `scf.while` are now controlled separately.

Author: Chris Bate <[email protected]>

[compiler] Improve handling of memory space constraints in the Plan
dialect

This Plan dialect. Constraints are now specified using a common
attribute 'plan.memory_space' that can be applied to functions or
individual arguments/results. In addition, patterns in
`plan-alloc-tensors` and `plan-assign-memory-spaces` are updated to
avoid introducing unnecessary transfers between memory spaces.

Author: Chris Bate <[email protected]>

[compiler] Add plan-buffer-results-to-out-params pass

This change adds a new Plan dialect pass
`plan-buffer-results-to-out-params`. This pass is based on the
upstream Bufferization pass `buffer-results-to-out-params`, but it can
handle a wider number of cases (such as promoting dynamic allocations)
and uses alias analysis utilities to guard against failure cases that
the upstream pass currently cannot handle. These improvements should
eventually be upstreamed back to the Bufferization dialect.

Author: Chris Bate <[email protected]>

[compiler] Update func conversion in host-to-emitc

In the EmitC conversion/translation process, you can use `func.func`
or `emitc.func` to define functions. Previously, we converted all
`func.func` to `emitc.func`. However, `emitc.func` does not have a
path for supporting multiple return values. Therefore, prefer use of
type conversions on `func.func` instead of converting the entire op to
`emitc.func`. Add tests to verify that we can support multiple return
values.

Author: Chris Bate <[email protected]>

[compiler] Fix two host-to-emitc bugs

This change fixes two bugs exposed by new 'host-to-emitc' conversion
testing:

- The `!emitc.size_t` type does not have DataLayout information
    specified upstream. Therefore, to ensure that the type can be
    queried using DataLayout, we add a DataLayoutTypeInterface
    external model to the type. All queries are simply mapped to
    queries to the `index` type.

- The upstream `func.call` conversion has a bug where it does not
    correctly convert the result types of the call operation, which
    can lead to a type mismatch for any type that does not have an
    identity conversion.

Additional tests are added to `host-to-emitc`. Eventually the fixes
for both these issues should be moved upstream.

Author: Chris Bate <[email protected]>

[common] Add Linalg-to-loops (on tensors) implementation and
conversion pass

Adds a ToLoopsOpInterface implementation and for Linalg operations. In
addition, a conversion pass is added that converts ToLoopOpInterface
operations to loops.

Author: Chris Bate <[email protected]>

NFC: Move ToLoopsOpInterface to 'mlir-tensorrt-common'

Moves the ToLoopsOpInterface to the 'mlir-tensorrt-common' project.
This is in preperation for enabling the ToLoopsOpInterface on LinalgOp
(lowering while still using Tensor types) to replace the
`convert-stablehlo-arith-to-scalar` pipeline.

MR: initialdl/mlir-tensorrt!2137

Author: Christopher Bate <[email protected]>

NFC: Fix formatting across several files

Author: Chris Bate <[email protected]>

[executor] Introduce RuntimeSession "features" to control loading of
runtime modules

Previously, the RuntimeSession would always load all available runtime
modules. This causes some inefficiences. For example, in certain
integration tests for the Executor runtime, we don't use CUDA at all.
However, because CUDA is still initialized by default, we would still
require a GPU to be present just to run the integration test.
Furthermore, some experimental modules (e.g. Lua cublas module) are
not ready for "production" use and are only really invoked inside
special integration tests.

This change inroduces a notion of "features" to the RuntimeSession and
RuntimeSessionOptions. A feature is just a string that identifies a
particular runtime component. The particular semantic of a "feature"
depends on the the actual runtime implementation. For example, for the
LuaRuntimeSession, the feature names correspond to the available Lua
"modules" (a module is just a group of C++ Lua extension functions),
e.g. "core", "cuda", "tensorrt", etc.

The RuntimeSessionOptions gains methods for enabling/disabling
features. Certain features cause others to be added to the set
automatically, e.g. "tensorrt" and "nccl" both require "cuda" to be
added.

The API is piped through all the way to the Python bindings to allow
control of loaded modules at all levels. To preserve existing
behavior, RuntimeSessions created from Python will load all available
modules by default, but the `executor-runner|mlir-tensorrt-runner`
tools now require features to be explicitly specified.

Author: Christopher Bate <[email protected]>

NFC: Fix include guard for 'mlir-executor/Support/Status.h'

Author: Sagar Shelke <[email protected]>

[compiler/lib] Add stablehlo composite to call pass to pre-processing
pipeline

This MR adds `StablehloLegalizeCompositeToCallPass` to the
pre-processing pipeline.

MLIR test is added.

Author: Chris Bate <[email protected]>

[compiler] Add "default memory space" to ClusterKindAttrInterface

Adds a new method to the ClusterKindAttrInterface so that backends can
control the default tensor encoding (#plan.memory_space<..>) assigned
by the `plan.assign-memory-spaces` pass at a function-scope level. In
addition, we also allow an attribute to override the default space at
function argument/results. This override mechnanism was previously
lacking and will help resolve a long-standing issue where users cannot
control the memory space of arguments/results reliably.

Author: Christopher Bate <[email protected]>

[compiler] Fix some issues related to pipeline extension mechanism

The StablehloToExecutableTensorRTExtension had both 'disable' and an
inherited 'disabled' member variable. Delete the inherited one such it
should not have been introduced and was not bound to any option.
Further, remove unused 'extensions' vector from
CompilationTaskOptionsBase.

Author: Christopher Bate <[email protected]>

[executor] Fix ptrtoint and inttoptr op translation to Lua

Previously, we could generate conflicting function types (due to
pointer address space) when converting `executor.ptrtoint` and
`executor.inttoptr` ops to opaque calls. Instead, defer the conversion
to function call until the actual Lua translation point. At that point
we can generate a function name without having to consider the pointer
address space.

Author: Chris Bate <[email protected]>

Introduce 'MLIRTensorRTCommmon' sub-project

Certain targets need to be used across multiple sub-projects. For
example, the 'TensorRTDynamicLoader' target is used in all
sub-projects. In addition, the sub-projects need to be independently
buildable. This change introduces another sub-project under the
'common' directory where shared code can be placed. This allows us to
use `find_package` to declare the dependency, and downstream consumers
to meet the requirement using any number of techniques to fullfill the
'find_package' call.

Author: Chris Bate <[email protected]>

[compiler] Harden `stablehlo.constant` to `arith.constant` conversion

There is a utility pass that runs in the stablehlo-to-executable
pipeline that converts `stablehlo.constant` to `arith.constant`. This
pass can temporarily create invalid IR due to `arith.constant` not
supporting signful integer types. If the "verify-each" option is off,
then the issue will not be caught since it happens to be
self-correcting. However, the issue can still cause verification
failures while debugging. This change fixes the issue by adding a
`builtin.unrealized_conversion_cast` operation to bridge the type
change between signless and signful integer types.
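
As a sketch (hypothetical IR, not copied from the patch), a constant with an unsigned element type would be bridged like this:

```mlir
// Input (signful element type):
//   %c = stablehlo.constant dense<1> : tensor<4xui32>
// After conversion, the data lives in a signless arith.constant, and an
// unrealized_conversion_cast bridges back to the signful type, keeping the
// IR verifiable at every intermediate step.
%0 = arith.constant dense<1> : tensor<4xi32>
%1 = builtin.unrealized_conversion_cast %0 : tensor<4xi32> to tensor<4xui32>
```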

Author: Chris Bate <[email protected]>

Integrate LLVM at f137c3d592e96330e450a8fd63ef7e8877fc1908

Author: Christopher Bate <[email protected]>

Fix build with BUILD_SHARED_LIBS=ON

The new InferTensorValueRangeInterface was used without correctly
specifying the library dependency for the PlanIR and StablehloExtIR
libraries.

Author: Sagar Shelke <[email protected]>

[compiler] Maintain output order in TensorRT engine.

When lowering a cluster containing TensorRT ops to a TensorRT engine,
the first step is to create an inline group op. The operands of the
inline group op's yield op (i.e. the terminator) are the values from
the cluster that are used outside the cluster. These values are
collected by walking each op's uses (with `op->getUses()`) and checking
whether they are outside the cluster. However, this use order is not
deterministic, and sometimes it is desirable to yield the results in a
certain order.

This MR makes the following changes:
1. Add a callback option named `ReorderRegionOpYieldValues` to the
`mlir::createRegionOpFromCluster` method. The callback has the
signature `std::function<void(SetVector<Value> &yieldValues,
SmallVectorImpl<Type> &yieldTypes)>` and receives the cluster values
used outside the cluster (in a SetVector) together with their types.
By default it is set to nullptr.
2. The TensorRTToExecutable task is used in cases where a single
`func.func` represents a single TensorRT engine. In this case, the
`ReorderRegionOpYieldValues` callback is implemented to ensure that
the inline group op's yield value order matches the `func.func`
return value order.

A valid MLIR test is added.
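
In abbreviated, hypothetical IR (op and attribute spellings are sketched from the description above), the invariant the callback enforces looks like:

```mlir
func.func @trt_engine(%arg0: tensor<4xf32>) -> (tensor<4xf32>, tensor<4xf32>) {
  // With ReorderRegionOpYieldValues installed, %x and %y are yielded in the
  // same order in which they are returned by the enclosing func.func.
  %r:2 = plan.inline_group ... -> tensor<4xf32>, tensor<4xf32> {
    ...
    yield %x, %y : tensor<4xf32>, tensor<4xf32>
  }
  return %r#0, %r#1 : tensor<4xf32>, tensor<4xf32>
}
```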

GitOrigin-RevId: 630a69d8e14506db43cfefe4be2c790f9352da4f

Files touched (status markers in the original listing were scrambled by
extraction and are omitted; renames are shown as `old -> new`):

DependencyProvider.cmake
build_tools/cmake/Dependencies.cmake
build_tools/patches/mlir/0005-mlir-memref-Fix-memref.global-overly-constrained-ver.patch
build_tools/patches/mlir/0006-mlir-emitc-Fix-two-EmitC-bugs.patch
build_tools/patches/mlir/0008-MLIR-Remove-unnecessary-include-from-MathToEmitC.h-t.patch
build_tools/patches/mlir/0009-mlir-Support-FileLineColRange-in-LLVM-debug-translat.patch
build_tools/patches/mlir/0010-MLIR-Fix-LLVMIRTransforms-build-failure-125485.patch
build_tools/patches/mlir/0011-MLIR-Fix-bufferization-interface-for-tensor-reshape.patch
build_tools/patches/stablehlo/0001-Fix-a-couple-missing-checks-for-static-shapes-in-sta.patch
build_tools/patches/stablehlo/0002-cmake-Update-usage-of-HandleLLVMOptions-and-LLVM_DEF.patch
build_tools/patches/stablehlo/0003-Don-t-insert-unnecessary-arith.index_cast-ops.patch
build_tools/patches/stablehlo/0004-Fix-ZeroExtent-condition-in-simplification-pattern.patch
build_tools/patches/stablehlo/0005-Fix-crash-on-ComplexType-in-PointwiseToLinalgMapConv.patch
build_tools/patches/stablehlo/0006-Remove-explicit-use-of-LLVMSupport.patch
build_tools/patches/stablehlo/0007-Fix-circular-dependence-between-StablehloPasses-and-.patch
build_tools/patches/torch_mlir/0001-cmake-Allow-finding-Stablehlo-via-find_package.patch
build_tools/patches/torch_mlir/0002-Make-compatible-with-more-recent-Stablehlo-version.patch
build_tools/patches/torch_mlir/0003-Fix-some-configuration-paths-in-LIT-cfg.patch
common/include/mlir-tensorrt-common/CMakeLists.txt
executor/include/mlir-executor/Runtime/Backend/Lua/LuaRegistration.h -> common/include/mlir-tensorrt-common/Conversion/Passes.h
common/include/mlir-tensorrt-common/Conversion/Passes.td
common/include/mlir-tensorrt-common/Dialect/EmitCExt/IR/DataLayoutImpl.h
common/include/mlir-tensorrt-common/Dialect/LinalgExt/Transforms/ToLoopsOpInterfaceImpl.h
common/include/mlir-tensorrt-common/Interfaces/ToLoopsOpInterface.h
common/include/mlir-tensorrt-common/Interfaces/ToLoopsOpInterface.td
common/lib/CMakeLists.txt
common/lib/Conversion/CMakeLists.txt
common/lib/Conversion/ToLoops/CMakeLists.txt
common/lib/Conversion/ToLoops/ConvertToLoops.cpp
common/lib/Dialect/CMakeLists.txt
common/lib/Dialect/EmitCExt/CMakeLists.txt
common/lib/Dialect/EmitCExt/DataLayoutImpl.cpp
common/lib/Dialect/LinalgExt/CMakeLists.txt
common/lib/Dialect/LinalgExt/Transforms/CMakeLists.txt
common/lib/Dialect/LinalgExt/Transforms/ToLoopsOpInterfaceImpl.cpp
common/lib/Interfaces/CMakeLists.txt
common/lib/Interfaces/ToLoopsOpInterface.cpp
common/lib/Utils/CMakeLists.txt
executor/lib/Utils/TensorRTDynamicLoader/CMakeLists.txt -> common/lib/Utils/TensorRTDynamicLoader/CMakeLists.txt
executor/lib/Utils/TensorRTDynamicLoader/TensorRTDynamicLoader.cpp -> common/lib/Utils/TensorRTDynamicLoader/TensorRTDynamicLoader.cpp
compiler/CMakeLists.txt
compiler/include/mlir-tensorrt/Backends/Host/HostBackend.td
compiler/include/mlir-tensorrt/Compiler/Extension.h
compiler/include/mlir-tensorrt/Compiler/OptionsProviders.h
compiler/include/mlir-tensorrt/Compiler/StablehloToExecutable/StablehloToExecutable.h
compiler/include/mlir-tensorrt/Compiler/StablehloToExecutable/TensorRTExtension.h
compiler/include/mlir-tensorrt/Conversion/StablehloToTensorRT/StablehloToTensorRT.h
compiler/include/mlir-tensorrt/Conversion/TensorRTCommon/ConvertToTensorRTCommon.h
compiler/include/mlir-tensorrt/Dialect/Plan/IR/PlanDialect.td
compiler/include/mlir-tensorrt/Dialect/Plan/IR/PlanEnums.h
compiler/include/mlir-tensorrt/Dialect/Plan/IR/PlanInterfaces.h
compiler/include/mlir-tensorrt/Dialect/Plan/IR/PlanInterfaces.td
compiler/include/mlir-tensorrt/Dialect/Plan/Transforms/Passes.td
compiler/include/mlir-tensorrt/InitAllDialects.h
compiler/include/mlir-tensorrt/InitAllPasses.h
compiler/include/mlir-tensorrt/Transforms/Transforms.h
compiler/lib/Backends/Host/HostBackend.cpp
compiler/lib/CAPI/Compiler/Registration/RegisterAllDialects.cpp
compiler/lib/Compiler/OptionsProviders.cpp
compiler/lib/Compiler/StablehloToExecutable/Passes.cpp
compiler/lib/Compiler/StablehloToExecutable/StableHloInputPipelines.cpp
compiler/lib/Compiler/StablehloToExecutable/StablehloToExecutable.cpp
compiler/lib/Conversion/HostToEmitC/HostToEmitC.cpp
compiler/lib/Conversion/StablehloToScf/CMakeLists.txt
compiler/lib/Conversion/StablehloToScf/StablehloToScf.cpp
compiler/lib/Conversion/StablehloToTensorRT/CMakeLists.txt
compiler/lib/Conversion/StablehloToTensorRT/Matchers.h
compiler/lib/Conversion/StablehloToTensorRT/ReductionConversions.cpp
compiler/lib/Conversion/StablehloToTensorRT/StablehloToTensorRT.cpp
compiler/lib/Dialect/Plan/Transforms/AllocTensors.cpp
compiler/lib/Dialect/Plan/Transforms/AssignMemorySpaces.cpp
compiler/lib/Dialect/Plan/Transforms/CMakeLists.txt
compiler/lib/Dialect/Plan/Transforms/CreateShapeFuncs.cpp
compiler/lib/Dialect/Plan/Transforms/MaterializeExplicitTransfers.cpp
compiler/lib/Dialect/Plan/Transforms/ModuleBufferization/BufferResultsToOutParams.cpp
compiler/lib/Dialect/Plan/Transforms/ModuleBufferization/ModuleBufferizationAnalysis.cpp
compiler/lib/Dialect/Plan/Transforms/OptimizeMemorySpaces.cpp
compiler/lib/Dialect/Plan/Transforms/Passes.cpp
compiler/lib/Dialect/Plan/Transforms/PromoteHostTensorsToHostPinned.cpp
compiler/lib/Transforms/SCFDetensorizeLoops/SCFDetensorizeLoops.cpp
compiler/test/Conversion/HostToEmitC/func-to-emitc.mlir
compiler/test/Conversion/HostToEmitC/memref-to-emitc.mlir
compiler/test/Conversion/StablehloToArith/stablehlo-constant-to-arith.mlir
compiler/test/Conversion/StablehloToScf/stablehlo-to-scf.mlir
compiler/test/Conversion/StablehloToTensorRT/dot-to-einsum.mlir
compiler/test/Conversion/StablehloToTensorRT/stablehlo-to-tensorrt-invalid.mlir
compiler/test/Conversion/StablehloToTensorRT/stablehlo-to-tensorrt-trt10.mlir
compiler/test/Conversion/StablehloToTensorRT/stablehlo-to-tensorrt.mlir
compiler/test/Dialect/Plan/assign-and-optimize-memory-spaces.mlir
compiler/test/Dialect/Plan/assign-memory-spaces.mlir
compiler/test/Dialect/Plan/buffer-results-to-out-params.mlir
compiler/test/Dialect/Plan/materialize-explicit-transfers.mlir
compiler/test/Dialect/Plan/materialize-shape-calculations-composite.mlir
compiler/test/Dialect/Plan/materialize-shape-calculations.mlir
compiler/test/Dialect/Plan/plan-bufferize-pipeline.mlir
compiler/test/Dialect/Plan/promote-host-tensors-to-host-pinned.mlir
compiler/test/Pipelines/StableHloInputPipeline/preprocessing-pipeline.mlir
compiler/test/Target/Lua/IntegrationTests/ClusteringDynamicShape/end-to-end-binary.mlir
compiler/test/Target/Lua/IntegrationTests/ClusteringDynamicShape/end-to-end-unary.mlir
compiler/test/Target/Lua/IntegrationTests/buffer-ops-bf16.mlir
compiler/test/Target/Lua/IntegrationTests/buffer-ops-dynamic.mlir
compiler/test/Target/Lua/IntegrationTests/buffer-ops-f16.mlir
compiler/test/Target/Lua/IntegrationTests/buffer-ops-f32.mlir
compiler/test/Target/Lua/IntegrationTests/buffer-ops-f8E4M3FN.mlir
compiler/test/Target/Lua/IntegrationTests/buffer-ops-i1.mlir
compiler/test/Target/Lua/IntegrationTests/buffer-ops-i4.mlir
compiler/test/Target/Lua/IntegrationTests/lit.local.cfg
compiler/test/Target/Lua/IntegrationTests/memcpy-strided.mlir
compiler/test/Target/Lua/IntegrationTests/memcpy.mlir
compiler/test/Transforms/SCFDetensorizeLoops/scf-detensorize-loops.mlir
compiler/test/python/IntegrationTests/Torch/test_torch_add.py
compiler/test/python/IntegrationTests/lit.local.cfg
compiler/test/python/IntegrationTests/test_call_validation.py
compiler/test/python/IntegrationTests/test_non_dps_cconv.py
compiler/test/python/IntegrationTests/test_return_allocation_loop.py
compiler/test/python/IntegrationTests/test_stablehlo_add.py
compiler/test/python/IntegrationTests/test_stablehlo_dynamic.py
compiler/test/python/IntegrationTests/test_stablehlo_dynamic_iota.py
compiler/test/python/IntegrationTests/test_tensorrt10_data_type_support.py
compiler/test/python/IntegrationTests/test_tensorrt_add.py
compiler/test/python/mlir_tensorrt_compiler/compiler_api/test_compiler_api.py
compiler/test/python/mlir_tensorrt_compiler/compiler_api/test_compiler_debug_dump.py
compiler/test/python/mlir_tensorrt_compiler/compiler_api/test_plugin_schema_api.py
compiler/test/python/mlir_tensorrt_runtime/test_runtime_api.py
compiler/test/python/mlir_tensorrt_runtime/test_runtime_debug_dump.py
executor/cmake/ExecutorDependencies.cmake
executor/include/mlir-executor-c/Runtime/Runtime.h
executor/include/mlir-executor/Conversion/ConvertToExecutorCommon.h
executor/include/mlir-executor/Executor/IR/ExecutorOps.td
executor/include/mlir-executor/Runtime/API/API.h
executor/include/mlir-executor/Runtime/Backend/Lua/LuaExtensionRegistry.h
executor/include/mlir-executor/Runtime/Backend/Lua/LuaRuntime.h
executor/include/mlir-executor/Runtime/Backend/Utils/NvtxUtils.h
executor/include/mlir-executor/Support/Status.h
executor/lib/CAPI/Runtime/Runtime.cpp
executor/lib/Executor/IR/Executor.cpp
executor/lib/Executor/Transforms/Passes.cpp
executor/lib/Runtime/API/API.cpp
executor/lib/Runtime/Backend/Lua/LuaExtensionRegistry.cpp
executor/lib/Runtime/Backend/Lua/LuaRuntime.cpp
executor/lib/Target/Lua/TranslateToLua.cpp
executor/lib/Target/Lua/TranslateToRuntimeExecutable.cpp
executor/lib/Tools/ExecutorRunnerMain.cpp
executor/lib/Utils/CMakeLists.txt
executor/test/Executor/lower-builtins.mlir
executor/test/IntegrationTests/arithmetic.mlir
executor/test/IntegrationTests/assertion.mlir
executor/test/IntegrationTests/complex.mlir
executor/test/IntegrationTests/control-flow-nested.mlir
executor/test/IntegrationTests/control-flow.mlir
executor/test/IntegrationTests/coroutine.mlir
executor/test/IntegrationTests/fill-device-f32.mlir
executor/test/IntegrationTests/fill-f32.mlir
executor/test/IntegrationTests/fill-i1.mlir
executor/test/IntegrationTests/host-buffer-c32.mlir
executor/test/IntegrationTests/host-buffer-i4.mlir
executor/test/IntegrationTests/load-globals.mlir
executor/test/IntegrationTests/pointer-cast-ops.mlir
executor/test/IntegrationTests/ptr-to-int.mlir
executor/test/IntegrationTests/stream.mlir
executor/test/Unit/Runtime/LuaRuntime/ExecuteFunctionWithLuaBackendTests.cpp
integrations/python/bindings/Runtime/RuntimePyBind.cpp
integrations/python/mlir_tensorrt_runtime/mlir_tensorrt/runtime/_mlir_libs/_api.pyi
integrations/python/mlir_tensorrt_tools/mlir_tensorrt/tools/gpu_tools.py
tensorrt/include/mlir-tensorrt-dialect/Target/TensorRTEncodingOpInterface/NetworkEncoder.h
tensorrt/include/mlir-tensorrt-dialect/TensorRT/IR/TensorRTOps.td
tensorrt/lib/Target/TensorRTEncodingOpInterface/NetworkEncoder.cpp
tensorrt/test/lit.cfg.py
third_party/torch-mlir-cmake/CMakeLists.txt
third_party/torch-mlir-cmake/TorchMLIRModule.cpp
@christopherbate christopherbate force-pushed the integrate_internal_changes branch from 7891ce7 to 470e744 Compare July 10, 2025 03:27
@christopherbate christopherbate merged commit 7d4fa5c into main Jul 10, 2025
1 check passed
@christopherbate christopherbate deleted the integrate_internal_changes branch July 10, 2025 03:29