Conversation

@christopherbate
Collaborator

No description provided.

@shelkesagar29
Collaborator

What are these "Migrate Internal changes to OSS" commits?

@christopherbate
Collaborator Author

> What are these "Migrate Internal changes to OSS" commits?

Redacted messages.

@christopherbate christopherbate force-pushed the migrate-internal-changes-2 branch from b0893c0 to b4de263 on August 7, 2025 03:16
yaoyuannnn and others added 22 commits August 8, 2025 18:26
Previously, the Lua translation didn't correctly handle the case where
an operation's results were emitted as both local and global (or
outer-scope) variables.  This change fixes the bug by declaring the
locals separately before the assignment.

GitOrigin-RevId: f49e2e6a136f2e65bae9ed8668ea29a7b0b1c5a2
…ule is destroyed.

Adds a utility class that will release the blobs backing the builtin
dialect's top-level resource attributes when a module is destroyed. This
is then used in a downstream compiler (where we know this should be safe
since we hold a mutex during the compilation function and we know there
is no sharing of builtin ResourceElementsAttrs keys between compilation
of different modules).

GitOrigin-RevId: 7f308c7afdb139e69f1aba994888a04061b98b9d
Fixes end-to-end handling of `stablehlo.reverse`. The `stablehlo-legalize-to-linalg`
pass performs reversal at the indexing map level, generating indexing
maps like `(d0, d1) -> (d0, -d1 + [size-1])`. This is poorly supported
in the Linalg dialect, so we rewrite it to use explicit
`linalg.index/tensor.extract` in the body of the `linalg.generic` op.

GitOrigin-RevId: 2a5404936c2980ce75b3ad9dff560090212f2a73
Adds missing support for executor.uitofp.

GitOrigin-RevId: 4734d0bdf65575c9fb6c70f5a1d33e6df7ce6716
Nightly testing of debug mode revealed a couple of issues:

- The Lua parser parses the i64 minimum-value integer literal as a float,
  due to conservative overflow checking (or a bug). As a workaround, the
  mlir-to-lua translation now prints very large values as hex literals,
  which Lua always parses as integers (see the illustration after this list).

- Fix the uitofp function for i4, which was not being properly handled.
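
For illustration, a minimal reproduction of the literal issue (assuming a
Lua 5.4 interpreter is available as `lua`; this snippet is not part of the
change):

```bash
# The decimal literal for INT64_MIN overflows Lua's integer range and is read
# back as a float, while the equivalent hex literal wraps and stays an integer.
lua -e 'print(math.type(-9223372036854775808))'   # prints: float
lua -e 'print(math.type(0x8000000000000000))'     # prints: integer
```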

GitOrigin-RevId: 2f8658a4d3789bcedfbdbf52932782ed4163e833
…ing cast

- Introduce a `plan.transfer` op for tensor memory space encoding casts,
  since using `plan.cast` for that purpose is problematic (upstream may have
  opinionated canonicalizers with regard to casts, and we don't want memory
  space casts to change shape information).
- Fix two bugs related to memory space changes and update tests.
- Unify more options related to bufferization pipeline and
  expose them to the top-level pipelines.

GitOrigin-RevId: 1c0d02d5a9aa94e6d57a6a556b7cabcc3ceede07
Adds umin/umax support (e.g. arith.umin/arith.umax), as well as additional
integration tests.

GitOrigin-RevId: 6285a0b78c160c4c72e9d77401dd9c4b10a0e265
…10.12

Fixes most deprecation warnings when building with TensorRT 10.12,
which were related to use of `ILayer::setOutputType` in `NvInferAdaptor.h`.
We can update this piece of code to use `ICastLayer` instead. `NvInferAdaptor.h`
is not used in any load-bearing feature.

GitOrigin-RevId: 082abe0e2ba61c0cd6ee91230a164a21ecf885f4
…ply conversion

Fixes an issue where the `stablehlo.dot_general` -> `tensorrt.matrix_multiply`
converter was not checking that the batching dimensions form a contiguous
sequence of leading dimensions.

Adds additional tests to verify that the conversion works correctly under
both `prefer-einsum=true` and `prefer-einsum=false` modes.

GitOrigin-RevId: 9b3e2c4762a49893286134b1aa6ae2d8405287c1
Fixes warnings (mostly about unused variables) when compiling with NDEBUG.

GitOrigin-RevId: f83ce7aefd1ac6e29b9de3036348c3be90e0aaa0
…ehlo pipeline

Adds a pass to unroll for loops with static trip count. Loops are fully
unrolled if the "cost" is below the threshold specified by a pass option.
For now the cost is simply given by the number of operations in the loop
body multiplied by the trip count.

GitOrigin-RevId: c125adfe10db13b8f41d912ae791ad8fa1028126
Dropping `convert-stablehlo-scalar-to-arith` pass because it is no longer
used. We transitioned to using `stablehlo-legalize-to-linalg` pass
several weeks ago.

GitOrigin-RevId: 7a4c2435a96d2bf3cfeca5d8aedd46729f84a4cf
Previously, we relied on custom CMake logic to generate Python wheels. The
build assembled the Python packages under `<build>/python_packages`; after
the build ran, all Python files and required binary objects were present
under that directory. Running `ninja -C build mlir-tensorrt-compiler-wheel`
then simply invoked `pip wheel <build>/python_packages/mlir_tensorrt_compiler`
to generate the wheel file.
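
For reference, the previous flow described above amounted to (with `build`
as the CMake build directory):

```bash
# Old flow: the build assembles the package tree, then the wheel target
# invokes pip on it.
ninja -C build mlir-tensorrt-compiler-wheel
# ...which effectively runs:
pip wheel build/python_packages/mlir_tensorrt_compiler
```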

While this process worked, it had one major deficiency: binaries intended for
distribution are typically produced by invoking a CMake install command
(`cmake --install`). The CMake install step copies binaries to their final
installed location and may perform other post-processing, such as stripping
or rewriting various ELF metadata.

In addition (and this is a matter of opinion), Python users often expect
that a package can be built entirely from source just by running
`pip install .` or `pip wheel .` in the appropriate directory. This is a
"Python-first" approach to building release packages, and it actually
simplifies the process somewhat (from CI's point of view) when building the
package for a large number of different Python versions.

To address these issues, I have upgraded the `mlir-tensorrt-compiler`
package's `setup.py` so that running `pip wheel` (or `uv build`) is
sufficient: the Python build script takes care of invoking CMake. It runs
the CMake configure, build, and install steps, then produces the wheel file
from the install tree (as opposed to the build tree).

Details are in the updated documentation.
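
A sketch of the new flow (the package directory path below is a hypothetical
placeholder; see the updated documentation for the actual location):

```bash
# New flow: pip (or uv) drives the CMake configure/build/install steps and
# then builds the wheel from the install tree.
cd <path-to-mlir_tensorrt_compiler-package>   # hypothetical placeholder
pip wheel . -w dist/
# or, equivalently:
uv build
```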

GitOrigin-RevId: eb3cfd1979d520899d4381c0138584dacfb11050
Migrate changes related to constant folding.

GitOrigin-RevId: e67f84c4018b3f8e9c1efcfecd94169f655337bf
…tering

- Fix bitwidth calculation to take into account complex, vector, and index types.
- A `std::function` cannot be added as a `PassOption` on a pass class;
  `PassOption` only supports simple POD types. Instead, expose the filtering
  function as a class member that can be set in the constructor.

GitOrigin-RevId: e1da375bfaa719e47fd69e9b36e2aba69413343c
Adds an option to prefer einsum over `tensorrt.matrix_multiply` for TensorRT
conversion and sets the default to true.

GitOrigin-RevId: 03d2adb26d3f1d000465020c832968626c46e646
GitOrigin-RevId: c1b778bc2ccefc10d2d560b06565634129ed0d52
This commit addresses some issues related to the CUDA dialect and how
it treated streams and devices:

- The CUDA runtime has implicit state: the active device is tied to the
  active CUDA context, which is changed by invoking `cudaSetDevice` and
  queried by invoking `cudaGetDevice`. CUDA streams are associated with a
  particular device -- the device that was active when the stream was
  created. Previously, we encoded assumptions into the CUDA dialect about
  requiring a specific device, but there were additional poorly-thought-out
  aspects, such as a redundant `device` operand on the `cuda.alloc` operation
  (instead of just a `stream` operand). Additionally, we specified that the
  operation which retrieves the current device was side-effect free. This
  didn't matter much because we didn't provide a `cuda.set_device` operation,
  but it was still a bad idea.

- This change refines the semantics of the CUDA dialect to better reflect
  the CUDA runtime's semantics. We now have `cuda.get_active_device` and
  `cuda.set_active_device` operations, and `cuda.get_global_stream` also
  takes a device operand. In the lowering of CUDA to Executor or LLVM, we
  now explicitly check that the device is never changed in the program code
  if "global streams" are used.

- To fully support multiple devices per execution context (non-SPMD mode),
  we will need additional resource/stream assignment and scheduling
  transformation passes.

GitOrigin-RevId: 90ed6a4fbabbbd4274d6ea04a0a09432f676de1d
- Update Stablehlo dependency to include fixes for linalg conversions
- Add support for `chlo.erf`, `chlo.erfc`, and `stablehlo.tan`
  to linalg pointwise conversion

GitOrigin-RevId: 3272728d97166dfc1e0cdb265a155f10c5fbcbc4
- Our tests use CuPy as an external framework to verify DLPack
  interoperability with the MLIR-TensorRT runtime. CuPy doesn't always
  support the latest CUDA releases, so separate out the CuPy tests and make
  them optional.
…ce selection

During test execution, we pin each test to a specific GPU device using
CUDA_VISIBLE_DEVICES. We use NVML to query the available devices and choose
the device with the least used memory. However, some systems (like Jetson)
do not fully support NVML, so add a fallback path that assumes 8GB of memory
per device.
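
A rough sketch of equivalent device selection using `nvidia-smi` (the tests
query NVML directly; the commands below, including the test invocation, are
illustrative assumptions):

```bash
# Pick the GPU index with the least used memory and pin the test run to it.
DEV=$(nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits \
      | sort -t, -k2 -n | head -n1 | cut -d, -f1)
CUDA_VISIBLE_DEVICES="$DEV" <test-command>   # hypothetical test invocation
```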

GitOrigin-RevId: b965a4e1068305fa9b48e917ae7b4fdc4e00d652
The activation layer adaptor used in the `emitc` path didn't have default
values for `alpha` and `beta`, but the tensorrt-to-emitc pass performs the
conversion assuming defaults are present. This MR gives both `alpha` and
`beta` a default value of `0.0f`.

GitOrigin-RevId: 90350d5e7c88606b6bc75af936aa9bb7bb275495
yaoyuannnn and others added 12 commits August 8, 2025 18:26
…linkage.

MLIR’s `declare_mlir_python_extension` provides `EMBED_CAPI_LINK_LIBS`
to ensure symbols from CAPI libraries are exported from Python extension
modules. Previously, some dialect and pass libraries were incorrectly
linked via `PRIVATE_LINK_LIBS`, which does not export symbols. This
caused multiple copies of the same dialect to be registered, resulting
in errors like:

```bash
LLVM ERROR: Trying to register different dialects for the same namespace
```

The fix is to move those dependencies to `EMBED_CAPI_LINK_LIBS`,
ensuring consistent `TypeID` and preventing duplicate registrations
across translation units.

GitOrigin-RevId: b145a7b9b198f562e7d0873a3ea873624d78bced
This change reorders the `executor-allocs-to-globals` pass to run before
`convert-memref-to-cuda` in the `stablehlo-to-executable` pipeline.

This allows some memory optimizations to be applied that were previously
not taking effect.

Additionally, a flag is added to the pipeline to control whether this
transformation is applied.

GitOrigin-RevId: b1b7048a8fa15debb128a249202e3f4420932760
…sorRT 10.12+

This change makes the "strongly-typed" translation mode the default for
TensorRT 10.12 and above. "Weakly-typed" mode is deprecated starting in
TensorRT 10.12.

GitOrigin-RevId: e01c49f7f62d781f44cafd8666f32a3c4d211140
We have encountered performance issues with offloading slices of block
arguments to TensorRT. This change disables offloading for these cases.

In addition, this change makes it more likely to encounter a warning related
to input alignment requirements in the runtime. Drop this warning, since it
is incorrect and no longer required by TensorRT.

GitOrigin-RevId: a03cc8404f373fb9816a5fc223fc61904ce69907
If an elementwise or slice-update operation is yielded from a loop and one
of its arguments is a block argument, then it is very likely that the
optimal solution is to bufferize that operation in place, reusing one of the
input buffers for the output buffer. If such an operation is offloaded to
TensorRT, that bufferization would be impossible due to I/O aliasing
constraints. Therefore, we should detect this situation and not offload such
operations to TensorRT.

GitOrigin-RevId: e6169d5ffa91576f578d40834b2154dd138e88c7
Previously, we were matching specific values which depended on UB
in Executor tests. On different platforms (e.g. Jetson Thor), these
test results can differ, so adjust the tests accordingly.

GitOrigin-RevId: 3bb00e2d85f3d03890e3d290bc11e8f620dd48fe
…rithmetic tests

GitOrigin-RevId: 8af0305d43b343b6c54480910e340f42eb0f33c6
…oolean shape tensor inputs

TensorRT does not allow boolean shape tensors as inputs. A long-standing
issue is that we currently don't have a good way of dealing with this
constraint in our clustering algorithm.

The crux of the issue is that we cluster for different backends sequentially
rather than jointly. This makes it difficult to reason about the effect of
simply perturbing the boundary of a cluster or inserting cast operations to
change the boundary type -- we need a way to ensure that the inserted casts
are offloaded to some backend, and we would prefer to avoid creating such
extra ops in the first place.

This commit fixes the problem temporarily by disallowing operations that
likely have boolean shape tensor operands from being clustered to the
TensorRT backend at all. Generally this is OK, since such operations can
almost always be offloaded to the host.

GitOrigin-RevId: ecb492dd0674d9f73dac7c2999bc0a72019c4857
Incorporate a workaround for a bug in certain TensorRT versions on
particular platforms.

GitOrigin-RevId: 8f4ea5b89cbe38987d720a67d690299fe155428e
This adds support for the `cf.switch` operation in the MLIR-to-Lua
translation, which is translated to a Lua if-elseif chain.
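
A hypothetical shape of the emitted Lua for a two-case switch (illustrative
only; the actual emitted variable names and structure come from the
translation):

```bash
lua -e '
local flag = 1                 -- switch operand
if flag == 0 then              -- case 0
  print("case 0")
elseif flag == 1 then          -- case 1
  print("case 1")
else                           -- default destination
  print("default")
end'
```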

GitOrigin-RevId: 1225871521490e79e196079545b0a9dfc51cdd12
GitOrigin-RevId: cce60f66cb978ba012c074aca0d652710262457a
@christopherbate christopherbate force-pushed the migrate-internal-changes-2 branch from 9060baf to ddaa342 on August 8, 2025 18:26
@christopherbate christopherbate merged commit fede517 into main Aug 8, 2025
1 check passed
@christopherbate christopherbate deleted the migrate-internal-changes-2 branch August 8, 2025 18:27