Migrate internal changes #687
Merged
Collaborator: What are these?

Author: Redacted messages.
Previously, the Lua translation didn't correctly handle the case where an operation's results were emitted as both local and global (or outer-scope) variables. This change fixes the bug by declaring the locals separately before the assignment. GitOrigin-RevId: f49e2e6a136f2e65bae9ed8668ea29a7b0b1c5a2
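A minimal sketch of the failure mode (the op and value names are hypothetical; the emitted Lua is shown only in comments, illustrating the before/after emission):

```mlir
// Suppose %q is used only in this block (a Lua local) while %r escapes to an
// enclosing scope (a Lua outer-scope variable).
//   broken emission: local q, r = f(x)   -- 'r' wrongly shadows the outer 'r'
//   fixed emission:  local q             -- declare only the local first,
//                    q, r = f(x)         -- then assign both results
%q, %r = "hypothetical.two_results"(%x) : (i64) -> (i64, i64)
```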
…ule is destroyed. Adds a utility class that will release the blobs backing the builtin dialect's top-level resource attributes when a module is destroyed. This is then used in a downstream compiler (where we know this should be safe since we hold a mutex during the compilation function and we know there is no sharing of builtin ResourceElementsAttrs keys between compilation of different modules). GitOrigin-RevId: 7f308c7afdb139e69f1aba994888a04061b98b9d
Fixes end-to-end handling of `stablehlo.reverse`. The `stablehlo-legalize-to-linalg` pass performs reversal at the indexing map level, generating indexing maps like `(d0, d1) -> (d0, -d1 + [size-1])`. This is poorly supported in the Linalg dialect, so we rewrite it to use explicit `linalg.index/tensor.extract` in the body of the `linalg.generic` op. GitOrigin-RevId: 2a5404936c2980ce75b3ad9dff560090212f2a73
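For illustration, a reversal of the trailing dimension written with explicit indexing in the body, roughly in the shape of the rewrite (shapes and value names are made up; `%src` and `%init` are assumed defined):

```mlir
#id = affine_map<(d0, d1) -> (d0, d1)>
%rev = linalg.generic
    {indexing_maps = [#id], iterator_types = ["parallel", "parallel"]}
    outs(%init : tensor<4x8xf32>) {
^bb0(%out: f32):
  %i = linalg.index 0 : index
  %j = linalg.index 1 : index
  // Reverse dimension 1: read from index (size - 1) - j = 7 - j.
  %jr = affine.apply affine_map<(d0) -> (-d0 + 7)>(%j)
  %v = tensor.extract %src[%i, %jr] : tensor<4x8xf32>
  linalg.yield %v : f32
} -> tensor<4x8xf32>
```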
Adds missing support for executor.uitofp. GitOrigin-RevId: 4734d0bdf65575c9fb6c70f5a1d33e6df7ce6716
Nightly testing of debug mode revealed a couple of issues: - The Lua parser parses the i64 minimum value integer literal as a float, due to conservative overflow checking (or a bug). As a workaround, print such very large values as hex in the mlir-to-lua translation; Lua will always parse these as integers (see the sketch below). - Fix the uitofp function for i4, which was not being handled properly. GitOrigin-RevId: 2f8658a4d3789bcedfbdbf52932782ed4163e833
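A minimal sketch of the workaround; the constant below is standard MLIR, and the emitted Lua appears only in comments:

```mlir
// The i64 minimum cannot round-trip through a decimal Lua literal:
//   before: x = -9223372036854775808  -- Lua parses this as a float
//   after:  x = 0x8000000000000000    -- Lua parses hex literals as integers
%min = arith.constant -9223372036854775808 : i64
```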
…ing cast - Introduce a `plan.transfer` op for tensor memory space encoding casts, since using `plan.cast` for that purpose is problematic (upstream canonicalizers can be opinionated about casts, and we don't want memory space casts to change shape information). - Fix two bugs related to memory space changes and update tests. - Unify more options related to the bufferization pipeline and expose them to the top-level pipelines. GitOrigin-RevId: 1c0d02d5a9aa94e6d57a6a556b7cabcc3ceede07
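Roughly, the new op separates memory-space movement from shape-changing casts. This is illustrative only: the exact `plan` dialect assembly and attribute names below are assumptions, not copied from the project.

```mlir
// Hypothetical syntax: move a tensor between memory spaces without touching
// its shape, so upstream cast canonicalizers never see (or fold) it.
%host = plan.transfer %dev
    : tensor<4xf32, #plan.memory_space<device>>
   -> tensor<4xf32, #plan.memory_space<host>>
```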
Adds umin/umax support (e.g. `arith.umin`/`arith.umax`), as well as additional integration tests. GitOrigin-RevId: 6285a0b78c160c4c72e9d77401dd9c4b10a0e265
…10.12 Fixes most deprecation warnings when building with TensorRT 10.12, which were related to use of `ILayer::setOutputType` in `NvInferAdaptor.h`. We can update this piece of code to use `ICastLayer` instead. `NvInferAdaptor.h` is not used in any load-bearing feature. GitOrigin-RevId: 082abe0e2ba61c0cd6ee91230a164a21ecf885f4
…ply conversion Fixes an issue where the `stablehlo.dot_general` -> `tensorrt.matrix_multiply` converter was not checking that the batching dimensions were a contiguous sequence of leading dimensions. Adds additional tests to verify that the conversion works correctly under both `prefer-einsum=true` and `prefer-einsum=false` modes. GitOrigin-RevId: 9b3e2c4762a49893286134b1aa6ae2d8405287c1
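As an illustration of the constraint, a `stablehlo.dot_general` whose batching dimensions form a contiguous leading sequence on both operands (operands `%a`/`%b` assumed defined; a batch dimension in the middle of the shape would now be rejected by the converter):

```mlir
// Batch dim 0 leads both operands; contracting dims are lhs[2] x rhs[1].
// Result shape: batch (2) x lhs-free (3) x rhs-free (5).
%ok = stablehlo.dot_general %a, %b,
    batching_dims = [0] x [0],
    contracting_dims = [2] x [1]
  : (tensor<2x3x4xf32>, tensor<2x4x5xf32>) -> tensor<2x3x5xf32>
```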
Fixes warnings (mostly about unused variables) when compiling with NDEBUG. GitOrigin-RevId: f83ce7aefd1ac6e29b9de3036348c3be90e0aaa0
…ehlo pipeline Adds a pass to unroll for loops with static trip count. Loops are fully unrolled if the "cost" is below the threshold specified by a pass option. For now the cost is simply given by the number of operations in the loop body multiplied by the trip count. GitOrigin-RevId: c125adfe10db13b8f41d912ae791ad8fa1028126
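A sketch of the cost model on a toy loop (the threshold option name is invented for illustration):

```mlir
// Trip count 4, two ops in the body -> cost = 2 * 4 = 8.
// With a pass option like unroll-threshold=16 (name assumed), this loop
// would be fully unrolled into four load/store pairs; with threshold 4,
// it would be left alone.
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%c4 = arith.constant 4 : index
scf.for %i = %c0 to %c4 step %c1 {
  %v = memref.load %in[%i] : memref<4xf32>
  memref.store %v, %out[%i] : memref<4xf32>
}
```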
Dropping `convert-stablehlo-scalar-to-arith` pass because it is no longer used. We transitioned to using `stablehlo-legalize-to-linalg` pass several weeks ago. GitOrigin-RevId: 7a4c2435a96d2bf3cfeca5d8aedd46729f84a4cf
Previously, we relied on custom CMake for generating Python wheels. We assemble the Python packages under `<build>/python_packages`. After the build runs, all Python files and required binary objects are present under that directory. Running a `ninja -C build mlir-tensorrt-compiler-wheel` command then just invokes `pip wheel <build>/python_packages/mlir_tensorrt_compiler` to generate the wheel file. While this process worked, it has one major deficiency: binaries meant for distribution are typically generated by invoking a CMake install command (`cmake --install`), which copies binaries to their final installed location and may perform other post-processing steps such as stripping or changing various ELF metadata. In addition (and this is subject to opinion), Python users often expect that a package can be wholly built from source just by running the `pip install .` or `pip wheel .` command in the appropriate directory. This is a "Python first" approach to building release packages, and it actually somewhat simplifies the process (from CI's point of view) of building the package under a large number of different Python versions. To address these issues, I have upgraded the `mlir-tensorrt-compiler` package's `setup.py` so that one can just run the `pip wheel` command (or `uv build`), and the Python build script will take care of invoking CMake. It runs the CMake config, build, and install steps, then produces the wheel file from the install tree (as opposed to the build tree). Details are in the updated documentation. GitOrigin-RevId: eb3cfd1979d520899d4381c0138584dacfb11050
Migrate changes related to constant folding. GitOrigin-RevId: e67f84c4018b3f8e9c1efcfecd94169f655337bf
…tering - Fix the bitwidth calculation to take into account complex, vector, and index types. - A `std::function` can't be added as a `PassOption` on a pass class; `PassOption` is only for simple POD types. Instead, expose the filtering function as a class member which can be set in the constructor. GitOrigin-RevId: e1da375bfaa719e47fd69e9b36e2aba69413343c
Adds an option to prefer einsum over `tensorrt.matrix_multiply` for TensorRT conversion, and sets the default to true. GitOrigin-RevId: 03d2adb26d3f1d000465020c832968626c46e646
GitOrigin-RevId: c1b778bc2ccefc10d2d560b06565634129ed0d52
This commit addresses some issues related to the CUDA dialect and how it treated streams and devices: - The CUDA runtime has implicit state: the active device is tied to the active CUDA context, which is changed by invoking `cudaSetDevice` and checked by invoking `cudaGetDevice`. CUDA streams are associated with a particular device -- the device that was active when the stream was created. Previously we encoded assumptions into the CUDA dialect about requiring a specific device, but there were additional poorly-thought-out aspects, such as a redundant `device` operand on the `cuda.alloc` operation (instead of just a `stream` parameter). Additionally, we specified that the operation which retrieves the current device was side-effect-free. This didn't matter too much because we didn't provide a `cuda.set_device` operation, but it was still a bad idea. - This change refines the semantics of the CUDA dialect to better reflect the CUDA runtime semantics. We now have `cuda.get_active_device` and `cuda.set_active_device` operations, and `cuda.get_global_stream` now takes a device operand. In the lowering of CUDA to Executor or LLVM, we now explicitly check that the device is never changed in the program code if "global streams" are used. - To fully support multiple devices per execution context (non-SPMD mode), we will need additional resource/stream assignment scheduling transformation passes. GitOrigin-RevId: 90ed6a4fbabbbd4274d6ea04a0a09432f676de1d
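A hedged sketch of the refined ops: the op names follow the commit text, but the operand and type syntax below are assumptions.

```mlir
// Reading the active device is stateful, so it is no longer modeled as
// side-effect-free; setting it is an explicit side-effecting op.
%dev = cuda.get_active_device : !cuda.device
cuda.set_active_device %dev : !cuda.device
// The global stream is now tied to an explicit device operand.
%stream = cuda.get_global_stream %dev : !cuda.stream
```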
- Update Stablehlo dependency to include fixes for linalg conversions - Add support for `chlo.erf`, `chlo.erfc`, and `stablehlo.tan` to linalg pointwise conversion GitOrigin-RevId: 3272728d97166dfc1e0cdb265a155f10c5fbcbc4
- Our tests use CuPy as an external framework to verify the DLPack interoperability with MLIR TensorRT runtime. CuPy doesn't always support the latest CUDA releases, so separate the CuPy tests and make them optional.
…ce selection During test execution, we associate the test execution with a specific GPU device using CUDA_VISIBLE_DEVICES. We use NVML to query the available devices and choose the device with the lowest level of used memory. However, some systems (like Jetson) do not fully support NVML, so add a backup path that assumes 8GB of memory per device. GitOrigin-RevId: b965a4e1068305fa9b48e917ae7b4fdc4e00d652
The activation layer adaptor used in the `emitc` pass didn't have default values for `alpha` and `beta`; however, the tensorrt-to-emitc pass performs the conversion assuming default values are present. This MR gives both `alpha` and `beta` a default value of `0.0f`. GitOrigin-RevId: 90350d5e7c88606b6bc75af936aa9bb7bb275495
…linkage. MLIR’s `declare_mlir_python_extension` provides `EMBED_CAPI_LINK_LIBS` to ensure symbols from CAPI libraries are exported from Python extension modules. Previously, some dialect and pass libraries were incorrectly linked via `PRIVATE_LINK_LIBS`, which does not export symbols. This caused multiple copies of the same dialect to be registered, resulting in errors like:

```
LLVM ERROR: Trying to register different dialects for the same namespace
```

The fix is to move those dependencies to `EMBED_CAPI_LINK_LIBS`, ensuring consistent `TypeID`s and preventing duplicate registrations across translation units. GitOrigin-RevId: b145a7b9b198f562e7d0873a3ea873624d78bced
This change reorders the `executor-allocs-to-globals` pass to run before `convert-memref-to-cuda` in the `stablehlo-to-executable` pipeline. This allows some memory optimizations to be applied that were previously not taking effect. Additionally, a flag is added to the pipeline to control whether this transformation is applied. GitOrigin-RevId: b1b7048a8fa15debb128a249202e3f4420932760
…sorRT 10.12+ This change makes the 'strongly-typed' translation mode the default for TensorRT 10.12 and above. "Weakly-typed" mode is deprecated starting in TensorRT 10.12. GitOrigin-RevId: e01c49f7f62d781f44cafd8666f32a3c4d211140
We have encountered performance issues with offloading slices of block arguments to TensorRT. This change disables offloading for these cases. In addition, this change makes it more likely to encounter a warning related to input alignment requirements in the runtime; drop this warning, since it is incorrect and no longer required by TensorRT. GitOrigin-RevId: a03cc8404f373fb9816a5fc223fc61904ce69907
If an elementwise operation or slice-update operation is yielded from a loop and one of its arguments is a block argument, then it's very likely that the optimal solution is to bufferize that operation in place while reusing one of the input buffers for the output buffer. If such an operation is offloaded to TensorRT, that bufferization would be impossible due to I/O aliasing constraints. Therefore, we should detect this situation and not offload such operations to TensorRT. GitOrigin-RevId: e6169d5ffa91576f578d40834b2154dd138e88c7
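For example, a minimal sketch (shapes invented; constants and operands assumed defined) of an elementwise op yielded from a loop where one input is the loop-carried argument:

```mlir
// %acc can share a buffer with %sum when bufferized in place, but a
// TensorRT engine cannot alias an input with an output, so offloading
// this addf would force a copy on every iteration.
%r = scf.for %i = %c0 to %cN step %c1 iter_args(%acc = %init) -> (tensor<16xf32>) {
  %sum = arith.addf %acc, %x : tensor<16xf32>
  scf.yield %sum : tensor<16xf32>
}
```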
Previously, we were matching specific values which depended on UB in Executor tests. On different platforms (e.g. Jetson Thor), these test results can differ, so adjust the tests accordingly. GitOrigin-RevId: 3bb00e2d85f3d03890e3d290bc11e8f620dd48fe
…rithmetic tests GitOrigin-RevId: 8af0305d43b343b6c54480910e340f42eb0f33c6
…oolean shape tensor inputs TensorRT does not allow boolean shape tensors as inputs. A long-standing issue is that we currently don't have a good way of dealing with this constraint in our clustering algorithm. The crux of the issue is that we cluster for different backends sequentially instead of jointly. This means that it's difficult to reason about the effect of simply perturbing the boundary of a cluster or inserting cast operations to change the boundary type -- we need a way to ensure that the inserted casts are offloaded to some backend, and we want to avoid having to create such extra ops in the first place. This commit fixes the problem temporarily by disallowing operations which likely have boolean shape tensor operands from being clustered to the TensorRT backend at all. Generally this is OK, since such operations can almost always be offloaded to the host. GitOrigin-RevId: ecb492dd0674d9f73dac7c2999bc0a72019c4857
Incorporate a workaround for a bug in certain TensorRT versions on particular platforms. GitOrigin-RevId: 8f4ea5b89cbe38987d720a67d690299fe155428e
This adds support for the `cf.switch` operation in the MLIR-to-Lua translation, which is translated to a Lua if-elseif chain. GitOrigin-RevId: 1225871521490e79e196079545b0a9dfc51cdd12
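For reference, the upstream `cf.switch` form, with the Lua rendering it lowers to sketched in the comment (the exact emitted Lua is an assumption; `%flag`, `%a`, `%b`, `%c` assumed defined):

```mlir
// Roughly lowers to:
//   if flag == 0 then <bb1> elseif flag == 1 then <bb2> else <bbDefault> end
cf.switch %flag : i32, [
  default: ^bbDefault(%a : i32),
  0: ^bb1(%b : i32),
  1: ^bb2(%c : i32)
]
```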
GitOrigin-RevId: cce60f66cb978ba012c074aca0d652710262457a