Migrate internal changes #687
Merged
Collaborator: What are these?

Author: Redacted messages.
Previously, the Lua translation didn't correctly handle the case where an operation's results were emitted as both local and global (or outer-scope) variables. This change fixes the bug by declaring the locals separately before the assignment. GitOrigin-RevId: f49e2e6a136f2e65bae9ed8668ea29a7b0b1c5a2
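A minimal sketch of the failure mode (the op and value names are hypothetical; the emitted Lua is shown only in comments, illustrating the before/after emission):

```mlir
// Suppose %q is used only in this block (a Lua local) while %r escapes to an
// enclosing scope (a Lua outer-scope variable).
//   broken emission: local q, r = f(x)   -- 'r' wrongly shadows the outer 'r'
//   fixed emission:  local q             -- declare only the local first,
//                    q, r = f(x)         -- then assign both results
%q, %r = "hypothetical.two_results"(%x) : (i64) -> (i64, i64)
```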
…ule is destroyed. Adds a utility class that will release the blobs backing the builtin dialect's top-level resource attributes when a module is destroyed. This is then used in a downstream compiler (where we know this should be safe since we hold a mutex during the compilation function and we know there is no sharing of builtin ResourceElementsAttrs keys between compilation of different modules). GitOrigin-RevId: 7f308c7afdb139e69f1aba994888a04061b98b9d
Fixes end-to-end handling of `stablehlo.reverse`. The `stablehlo-legalize-to-linalg` pass performs reversal at the indexing map level, generating indexing maps like `(d0, d1) -> (d0, -d1 + [size-1])`. This is poorly supported in the Linalg dialect, so we rewrite it to use explicit `linalg.index/tensor.extract` in the body of the `linalg.generic` op. GitOrigin-RevId: 2a5404936c2980ce75b3ad9dff560090212f2a73
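For illustration, a reversal of the trailing dimension written with explicit indexing in the body, roughly in the shape of the rewrite (shapes and value names are made up; `%src` and `%init` are assumed defined):

```mlir
#id = affine_map<(d0, d1) -> (d0, d1)>
%rev = linalg.generic
    {indexing_maps = [#id], iterator_types = ["parallel", "parallel"]}
    outs(%init : tensor<4x8xf32>) {
^bb0(%out: f32):
  %i = linalg.index 0 : index
  %j = linalg.index 1 : index
  // Reverse dimension 1: read from index (size - 1) - j = 7 - j.
  %jr = affine.apply affine_map<(d0) -> (-d0 + 7)>(%j)
  %v = tensor.extract %src[%i, %jr] : tensor<4x8xf32>
  linalg.yield %v : f32
} -> tensor<4x8xf32>
```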
Adds missing support for executor.uitofp. GitOrigin-RevId: 4734d0bdf65575c9fb6c70f5a1d33e6df7ce6716
Nightly testing of debug mode revealed a couple of issues: - The Lua parser parses the i64 minimum value integer literal as a float, due to conservative overflow checking (or a bug). As a workaround, print such very large values as hex in the mlir-to-lua translation; Lua will always parse these as integers (see the sketch below). - Fix the uitofp function for i4, which was not being handled properly. GitOrigin-RevId: 2f8658a4d3789bcedfbdbf52932782ed4163e833
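A minimal sketch of the workaround; the constant below is standard MLIR, and the emitted Lua appears only in comments:

```mlir
// The i64 minimum cannot round-trip through a decimal Lua literal:
//   before: x = -9223372036854775808  -- Lua parses this as a float
//   after:  x = 0x8000000000000000    -- Lua parses hex literals as integers
%min = arith.constant -9223372036854775808 : i64
```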
…ing cast - Introduce a `plan.transfer` op for tensor memory space encoding casts, since using `plan.cast` for that purpose is problematic (upstream canonicalizers can be opinionated about casts, and we don't want memory space casts to change shape information). - Fix two bugs related to memory space changes and update tests. - Unify more options related to the bufferization pipeline and expose them to the top-level pipelines. GitOrigin-RevId: 1c0d02d5a9aa94e6d57a6a556b7cabcc3ceede07
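Roughly, the new op separates memory-space movement from shape-changing casts. This is illustrative only: the exact `plan` dialect assembly and attribute names below are assumptions, not copied from the project.

```mlir
// Hypothetical syntax: move a tensor between memory spaces without touching
// its shape, so upstream cast canonicalizers never see (or fold) it.
%host = plan.transfer %dev
    : tensor<4xf32, #plan.memory_space<device>>
   -> tensor<4xf32, #plan.memory_space<host>>
```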
Adds umin/umax support (e.g. `arith.umin`/`arith.umax`), as well as additional integration tests. GitOrigin-RevId: 6285a0b78c160c4c72e9d77401dd9c4b10a0e265
…10.12 Fixes most deprecation warnings when building with TensorRT 10.12, which were related to use of `ILayer::setOutputType` in `NvInferAdaptor.h`. We can update this piece of code to use `ICastLayer` instead. `NvInferAdaptor.h` is not used in any load-bearing feature. GitOrigin-RevId: 082abe0e2ba61c0cd6ee91230a164a21ecf885f4
…ply conversion Fixes an issue where the `stablehlo.dot_general` -> `tensorrt.matrix_multiply` converter was not checking that the batching dimensions were a contiguous sequence of leading dimensions. Adds additional tests to verify that the conversion works correctly under both `prefer-einsum=true` and `prefer-einsum=false` modes. GitOrigin-RevId: 9b3e2c4762a49893286134b1aa6ae2d8405287c1
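As an illustration of the constraint, a `stablehlo.dot_general` whose batching dimensions form a contiguous leading sequence on both operands (operands `%a`/`%b` assumed defined; a batch dimension in the middle of the shape would now be rejected by the converter):

```mlir
// Batch dim 0 leads both operands; contracting dims are lhs[2] x rhs[1].
// Result shape: batch (2) x lhs-free (3) x rhs-free (5).
%ok = stablehlo.dot_general %a, %b,
    batching_dims = [0] x [0],
    contracting_dims = [2] x [1]
  : (tensor<2x3x4xf32>, tensor<2x4x5xf32>) -> tensor<2x3x5xf32>
```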
Fixes warnings (mostly about unused variables) when compiling with NDEBUG. GitOrigin-RevId: f83ce7aefd1ac6e29b9de3036348c3be90e0aaa0
…ehlo pipeline Adds a pass to unroll for loops with static trip count. Loops are fully unrolled if the "cost" is below the threshold specified by a pass option. For now the cost is simply given by the number of operations in the loop body multiplied by the trip count. GitOrigin-RevId: c125adfe10db13b8f41d912ae791ad8fa1028126
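A sketch of the cost model on a toy loop (the threshold option name is invented for illustration):

```mlir
// Trip count 4, two ops in the body -> cost = 2 * 4 = 8.
// With a pass option like unroll-threshold=16 (name assumed), this loop
// would be fully unrolled into four load/store pairs; with threshold 4,
// it would be left alone.
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%c4 = arith.constant 4 : index
scf.for %i = %c0 to %c4 step %c1 {
  %v = memref.load %in[%i] : memref<4xf32>
  memref.store %v, %out[%i] : memref<4xf32>
}
```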
Dropping `convert-stablehlo-scalar-to-arith` pass because it is no longer used. We transitioned to using `stablehlo-legalize-to-linalg` pass several weeks ago. GitOrigin-RevId: 7a4c2435a96d2bf3cfeca5d8aedd46729f84a4cf
Previously, we relied on custom CMake for generating Python wheels. We assemble the Python packages under `<build>/python_packages`. After the build runs, all Python files and required binary objects are present under that directory. Running a `ninja -C build mlir-tensorrt-compiler-wheel` command then just invokes `pip wheel <build>/python_packages/mlir_tensorrt_compiler` to generate the wheel file. While this process worked, it has one major deficiency: binaries meant for distribution are typically generated by invoking a CMake install command (`cmake --install`), which copies binaries to their final installed location and may perform other post-processing steps such as stripping or changing various ELF metadata. In addition (and this is subject to opinion), Python users often expect that a package can be wholly built from source just by running the `pip install .` or `pip wheel .` command in the appropriate directory. This is a "Python first" approach to building release packages, and it actually somewhat simplifies the process (from CI's point of view) of building the package under a large number of different Python versions. To address these issues, I have upgraded the `mlir-tensorrt-compiler` package's `setup.py` so that one can just run the `pip wheel` command (or `uv build`), and the Python build script will take care of invoking CMake. It runs the CMake config, build, and install steps, then produces the wheel file from the install tree (as opposed to the build tree). Details are in the updated documentation. GitOrigin-RevId: eb3cfd1979d520899d4381c0138584dacfb11050
Migrate changes related to constant folding. GitOrigin-RevId: e67f84c4018b3f8e9c1efcfecd94169f655337bf
…tering - Fix the bitwidth calculation to take into account complex, vector, and index types. - A `std::function` can't be added as a `PassOption` on a pass class; `PassOption` is only for simple POD types. Instead, expose the filtering function as a class member which can be set in the constructor. GitOrigin-RevId: e1da375bfaa719e47fd69e9b36e2aba69413343c
Adds an option to prefer einsum over `tensorrt.matrix_multiply` for TensorRT conversion, and sets the default to true. GitOrigin-RevId: 03d2adb26d3f1d000465020c832968626c46e646
GitOrigin-RevId: c1b778bc2ccefc10d2d560b06565634129ed0d52
This commit addresses some issues related to the CUDA dialect and how it treated streams and devices: - The CUDA runtime has implicit state: the active device is tied to the active CUDA context, which is changed by invoking `cudaSetDevice` and checked by invoking `cudaGetDevice`. CUDA streams are associated with a particular device -- the device that was active when the stream was created. Previously we encoded assumptions into the CUDA dialect about requiring a specific device, but there were additional poorly-thought-out aspects, such as a redundant `device` operand on the `cuda.alloc` operation (instead of just a `stream` parameter). Additionally, we specified that the operation which retrieves the current device was side-effect-free. This didn't matter too much because we didn't provide a `cuda.set_device` operation, but it was still a bad idea. - This change refines the semantics of the CUDA dialect to better reflect the CUDA runtime semantics. We now have `cuda.get_active_device` and `cuda.set_active_device` operations, and `cuda.get_global_stream` now takes a device operand. In the lowering of CUDA to Executor or LLVM, we now explicitly check that the device is never changed in the program code if "global streams" are used. - To fully support multiple devices per execution context (non-SPMD mode), we will need additional resource/stream assignment scheduling transformation passes. GitOrigin-RevId: 90ed6a4fbabbbd4274d6ea04a0a09432f676de1d
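A hedged sketch of the refined ops: the op names follow the commit text, but the operand and type syntax below are assumptions.

```mlir
// Reading the active device is stateful, so it is no longer modeled as
// side-effect-free; setting it is an explicit side-effecting op.
%dev = cuda.get_active_device : !cuda.device
cuda.set_active_device %dev : !cuda.device
// The global stream is now tied to an explicit device operand.
%stream = cuda.get_global_stream %dev : !cuda.stream
```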
- Update Stablehlo dependency to include fixes for linalg conversions - Add support for `chlo.erf`, `chlo.erfc`, and `stablehlo.tan` to linalg pointwise conversion GitOrigin-RevId: 3272728d97166dfc1e0cdb265a155f10c5fbcbc4
- Our tests use CuPy as an external framework to verify the DLPack interoperability with MLIR TensorRT runtime. CuPy doesn't always support the latest CUDA releases, so separate the CuPy tests and make them optional.
…ce selection During test execution, we associate the test execution with a specific GPU device using CUDA_VISIBLE_DEVICES. We use NVML to query the available devices and choose the device with the lowest level of used memory. However, some systems (like Jetson) do not fully support NVML, so add a backup path that assumes 8GB of memory per device. GitOrigin-RevId: b965a4e1068305fa9b48e917ae7b4fdc4e00d652
The activation layer adaptor used in the `emitc` pass didn't have default values for `alpha` and `beta`; however, the tensorrt-to-emitc pass performs the conversion assuming default values are present. This MR gives both `alpha` and `beta` a default value of `0.0f`. GitOrigin-RevId: 90350d5e7c88606b6bc75af936aa9bb7bb275495
…linkage. MLIR’s `declare_mlir_python_extension` provides `EMBED_CAPI_LINK_LIBS` to ensure symbols from CAPI libraries are exported from Python extension modules. Previously, some dialect and pass libraries were incorrectly linked via `PRIVATE_LINK_LIBS`, which does not export symbols. This caused multiple copies of the same dialect to be registered, resulting in errors like:

```
LLVM ERROR: Trying to register different dialects for the same namespace
```

The fix is to move those dependencies to `EMBED_CAPI_LINK_LIBS`, ensuring consistent `TypeID`s and preventing duplicate registrations across translation units. GitOrigin-RevId: b145a7b9b198f562e7d0873a3ea873624d78bced
This change reorders the `executor-allocs-to-globals` pass to run before `convert-memref-to-cuda` in the `stablehlo-to-executable` pipeline. This allows some memory optimizations to be applied that were previously not taking effect. Additionally, a flag is added to the pipeline to control whether this transformation is applied. GitOrigin-RevId: b1b7048a8fa15debb128a249202e3f4420932760
…sorRT 10.12+ This change makes the 'strongly-typed' translation mode the default for TensorRT 10.12 and above. "Weakly-typed" mode is deprecated starting in TensorRT 10.12. GitOrigin-RevId: e01c49f7f62d781f44cafd8666f32a3c4d211140
We have encountered performance issues with offloading slices of block arguments to TensorRT. This change disables offloading for these cases. In addition, this change makes it more likely to encounter a warning related to input alignment requirements in the runtime; drop this warning, since it is incorrect and no longer required by TensorRT. GitOrigin-RevId: a03cc8404f373fb9816a5fc223fc61904ce69907
If an elementwise operation or slice-update operation is yielded from a loop and one of its arguments is a block argument, then it's very likely that the optimal solution is to bufferize that operation in place while reusing one of the input buffers for the output buffer. If such an operation is offloaded to TensorRT, that bufferization would be impossible due to I/O aliasing constraints. Therefore, we should detect this situation and not offload such operations to TensorRT. GitOrigin-RevId: e6169d5ffa91576f578d40834b2154dd138e88c7
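For example, a minimal sketch (shapes invented; constants and operands assumed defined) of an elementwise op yielded from a loop where one input is the loop-carried argument:

```mlir
// %acc can share a buffer with %sum when bufferized in place, but a
// TensorRT engine cannot alias an input with an output, so offloading
// this addf would force a copy on every iteration.
%r = scf.for %i = %c0 to %cN step %c1 iter_args(%acc = %init) -> (tensor<16xf32>) {
  %sum = arith.addf %acc, %x : tensor<16xf32>
  scf.yield %sum : tensor<16xf32>
}
```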
Previously, we were matching specific values which depended on UB in Executor tests. On different platforms (e.g. Jetson Thor), these test results can differ, so adjust the tests accordingly. GitOrigin-RevId: 3bb00e2d85f3d03890e3d290bc11e8f620dd48fe
…rithmetic tests GitOrigin-RevId: 8af0305d43b343b6c54480910e340f42eb0f33c6
…oolean shape tensor inputs TensorRT does not allow boolean shape tensors as inputs. A long-standing issue is that we currently don't have a good way of dealing with this constraint in our clustering algorithm. The crux of the issue is that we cluster for different backends sequentially instead of jointly. This means that it's difficult to reason about the effect of simply perturbing the boundary of a cluster or inserting cast operations to change the boundary type -- we need a way to ensure that the inserted casts are offloaded to some backend, and we want to avoid having to create such extra ops in the first place. This commit fixes the problem temporarily by disallowing operations which likely have boolean shape tensor operands from being clustered to the TensorRT backend at all. Generally this is OK, since such operations can almost always be offloaded to the host. GitOrigin-RevId: ecb492dd0674d9f73dac7c2999bc0a72019c4857
Incorporate a workaround for a bug in certain TensorRT versions on particular platforms. GitOrigin-RevId: 8f4ea5b89cbe38987d720a67d690299fe155428e
This adds support for the `cf.switch` operation in the MLIR-to-Lua translation, which is translated to a Lua if-elseif chain. GitOrigin-RevId: 1225871521490e79e196079545b0a9dfc51cdd12
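For reference, the upstream `cf.switch` form, with the Lua rendering it lowers to sketched in the comment (the exact emitted Lua is an assumption; `%flag`, `%a`, `%b`, `%c` assumed defined):

```mlir
// Roughly lowers to:
//   if flag == 0 then <bb1> elseif flag == 1 then <bb2> else <bbDefault> end
cf.switch %flag : i32, [
  default: ^bbDefault(%a : i32),
  0: ^bb1(%b : i32),
  1: ^bb2(%c : i32)
]
```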
GitOrigin-RevId: cce60f66cb978ba012c074aca0d652710262457a