Remove dps #1

jhalakpatel · 2024-08-30T01:19:33Z

No description provided.

This function greatly simplifies Tripy's Array implementation. We want to handle memref creation from all types that implement the `__dlpack__()` interface rather than the limited set we currently support, and this function should allow us to achieve that. The corresponding Tripy changes are in NVIDIA#72.
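As a rough illustration of accepting "all types that implement the `__dlpack__()` interface", the sketch below checks for the DLPack Python protocol structurally instead of against an allow-list of array types. It is a minimal sketch, not the function this commit adds: `DLPackSource`, `check_dlpack_source`, and `FakeTensor` are hypothetical names. Only the protocol methods `__dlpack__` and `__dlpack_device__` come from the DLPack specification.

```python
from typing import Any, Protocol, Tuple, runtime_checkable


@runtime_checkable
class DLPackSource(Protocol):
    """Structural type for any object implementing the DLPack Python protocol."""

    def __dlpack__(self, stream: Any = None) -> Any: ...

    def __dlpack_device__(self) -> Tuple[int, int]: ...


def check_dlpack_source(obj: Any) -> Tuple[int, int]:
    """Hypothetical validation step run before creating a memref.

    Accepts any object implementing the DLPack protocol instead of a fixed
    list of array types, and returns its (device_type, device_id) pair.
    """
    if not isinstance(obj, DLPackSource):
        raise TypeError(f"{type(obj).__name__} does not implement __dlpack__()")
    return obj.__dlpack_device__()


class FakeTensor:
    """Hypothetical stand-in for a NumPy/CuPy/PyTorch array."""

    def __dlpack__(self, stream=None):
        return object()  # a real array would return a DLPack PyCapsule

    def __dlpack_device__(self):
        return (1, 0)  # kDLCPU is 1 in the DLPack device-type enum
```

Because the check is structural, NumPy arrays (which implement `__dlpack__` since NumPy 1.22) and similar objects pass it without being listed explicitly.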

Signed-off-by: Parth Chadha <[email protected]> Co-authored-by: Jhalak Patel <[email protected]>

Signed-off-by: yizhuoz004 <[email protected]>

Proper fix for mounting the tripy directory in the container.

Signed-off-by: yizhuoz004 <[email protected]>

This PR formats `test_create_memref.py` using black formatter.

… possible (NVIDIA#92)

In some cases, it is useful to know the length of a `tp.Shape` without executing the model. This PR adds an `infer_len` method that allows operators to specify how to statically infer the length of `Shape` outputs when possible. Implementing it is always optional. Test cases are added.
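A minimal sketch of the optional `infer_len` hook, assuming each operator returns either a statically known length or `None`. The class names `ShapeOp`, `ShapeOf`, `Concat`, and `Opaque` are hypothetical stand-ins, not Tripy's trace operators.

```python
from typing import List, Optional


class ShapeOp:
    """Hypothetical base class for an operation that produces a `Shape`."""

    def __init__(self, inputs: List["ShapeOp"]):
        self.inputs = inputs

    def infer_len(self) -> Optional[int]:
        # Default: the length is unknown until the model is executed.
        return None


class ShapeOf(ShapeOp):
    """Shape of a tensor with known rank: its length equals the rank."""

    def __init__(self, rank: int):
        super().__init__([])
        self.rank = rank

    def infer_len(self) -> Optional[int]:
        return self.rank


class Concat(ShapeOp):
    """Concatenated shapes: length is the sum of input lengths, if all are known."""

    def infer_len(self) -> Optional[int]:
        lengths = [inp.infer_len() for inp in self.inputs]
        if any(length is None for length in lengths):
            return None
        return sum(lengths)


class Opaque(ShapeOp):
    """An op that does not override `infer_len` (the hook is optional)."""
```

Returning `None` by default keeps the hook optional: any input whose length is unknown makes the combined length unknown instead of guessing.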

- Adds a `Shape.__ne__()` method so that shapes can be conveniently checked for inequality.
- Fixes a logic error where `Shape.__eq__()` did not check the rank, which led to an error when comparing shapes with different ranks.
- Adds shape inequality test cases for shapes with different and same ranks.
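The sketch below shows a rank-first comparison on a minimal, hypothetical `Shape` class that stores concrete integer dimensions. The real `tp.Shape` is tensor-backed, so this only illustrates the comparison logic.

```python
from typing import Iterable


class Shape:
    """Hypothetical stand-in for `tp.Shape` holding concrete integer dims."""

    def __init__(self, dims: Iterable[int]):
        self.dims = tuple(dims)

    def __len__(self) -> int:
        return len(self.dims)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Shape):
            return NotImplemented
        # Check the rank first so shapes of different ranks never compare equal.
        if len(self) != len(other):
            return False
        return all(a == b for a, b in zip(self.dims, other.dims))

    def __ne__(self, other: object) -> bool:
        result = self.__eq__(other)
        if result is NotImplemented:
            return result
        return not result
```

The explicit length check matters because `zip` silently stops at the shorter input: without it, `Shape([2, 3])` and `Shape([2, 3, 4])` would compare equal.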

Signed-off-by: yizhuoz004 <[email protected]>

(NVIDIA#131) This PR adds a GitHub workflow that creates a new Docker image whenever changes are made to `build_tools/docker/Dockerfile`, `python/requirements-dev.txt`, or `python/requirements.txt`.

Throws an exception when a tensor input's dtype is `int64` and the operation has undefined behavior for i64 tensors.
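One way to enforce this is a decorator that inspects tensor arguments before the operation is built. This is a minimal sketch under assumptions: `Tensor`, `reject_int64`, and `some_op` are hypothetical, dtypes are plain strings, and `TypeError` stands in for whatever exception type Tripy actually raises.

```python
import functools
from typing import Callable


class Tensor:
    """Hypothetical tensor stand-in that only tracks its dtype name."""

    def __init__(self, dtype: str):
        self.dtype = dtype


def reject_int64(op: Callable) -> Callable:
    """Hypothetical decorator: raise before running an op whose behavior
    is undefined for int64 inputs."""

    @functools.wraps(op)
    def wrapper(*args, **kwargs):
        for arg in list(args) + list(kwargs.values()):
            if isinstance(arg, Tensor) and arg.dtype == "int64":
                raise TypeError(f"{op.__name__} is not supported for int64 tensors")
        return op(*args, **kwargs)

    return wrapper


@reject_int64
def some_op(x: Tensor) -> Tensor:
    # Stand-in computation: returns a tensor with the same dtype.
    return Tensor(x.dtype)
```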

- Adds test cases for multiple negative reduce dims, e.g.:

  ```python
  import tripy as tp

  a = tp.ones((5, 5, 5))
  tp.sum(a, dim=[-2, -1])
  ```

- Fixes `_reduce_impl` to ensure negative dims are sorted in decreasing order when performing `unsqueeze`.
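To see why the order matters, the sketch below works on shape tuples instead of tensors and assumes PyTorch-style `unsqueeze` semantics, where a negative `dim` counts from the end of the output (`dim + rank + 1`). All dims are assumed negative. `unsqueeze_shape`, `reduce_shape`, and `restore_negative_dims` are hypothetical helpers, not `_reduce_impl` itself.

```python
from typing import Sequence, Tuple


def unsqueeze_shape(shape: Tuple[int, ...], dim: int) -> Tuple[int, ...]:
    """Insert a size-1 dim; a negative `dim` is normalized as dim + rank + 1."""
    rank = len(shape)
    if dim < 0:
        dim += rank + 1
    if not 0 <= dim <= rank:
        raise IndexError(f"dim out of range for rank {rank}")
    return shape[:dim] + (1,) + shape[dim:]


def reduce_shape(shape: Sequence[int], dims: Sequence[int]) -> Tuple[int, ...]:
    """Drop the reduced dims (negative dims wrap around the input rank)."""
    rank = len(shape)
    dropped = {d % rank for d in dims}
    return tuple(s for i, s in enumerate(shape) if i not in dropped)


def restore_negative_dims(reduced: Sequence[int], neg_dims: Sequence[int],
                          sort_desc: bool = True) -> Tuple[int, ...]:
    """Re-insert reduced dims with unsqueeze, in decreasing order by default."""
    assert all(d < 0 for d in neg_dims), "sketch only handles negative dims"
    out = tuple(reduced)
    for d in sorted(neg_dims, reverse=sort_desc):  # -1 before -2 when descending
        out = unsqueeze_shape(out, d)
    return out
```

Re-inserting `-1` before `-2` yields `(2, 1, 1)` for a `(2,)` result; the ascending order `-2, -1` yields `(1, 2, 1)`, which puts a size-1 dimension in front of the kept one.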

This change is a compilation of the following commits:

- [executor] Enable wider support for loads/stores of aggregate types

  Previously, we only supported load/store of aggregate types in a very limited scope. This change expands that support to load/store of arbitrary aggregates. We introduce a pass, `executor-decompose-aggregate-loads-and-stores`, that decomposes these loads/stores into loads/stores of the individual elements. Since this decomposition creates `executor.offset` operations, the new pass must be sequenced correctly with the `executor-expand-ops` pass (which lowers `executor.alloca` and `executor.offset`) as well as with the lowering of executor operations into opaque `executor.call` ops. To make all this work correctly, this change also factors out the latter transformation into a dedicated pass, `executor-lower-to-runtime-builtins`.

- NFC: fix some Python typing annotations

- [executor] Add Executor and runtime support for `complex<f32>` and `complex<f64>` types

  This change:
  - adds support to the Executor dialect for `complex<f32>` and `complex<f64>` types
  - adds support to the runtime API interface for the corresponding `c32` and `c64` types

- [executor] Properly serialize absent function signatures

  When a `FunctionMetadataAttr` attribute is not provided on a `func.func` during translation/serialization to the Executable format, we should pass a 0-offset to the signature field when creating a `rt::impl::Function` flatbuffer table. This was caught when working on complex32/complex64 support. An additional validation check is added immediately after finalizing the executable buffer in the `mlir::translateToRuntimeExecutable` function.

- [executor] Fix i4 multiplication runtime error and i4 tests

  Fixes an issue where Lua user type metatable information wasn't correctly set for the `nv_int4` type. The `arithmetic.mlir` i4 tests were also not effectively testing the runtime i4 functions, because the compiler was constant-folding most of the operations being tested. To fix the constant-folding issue, we pass some arguments to the test functions instead of inlining them into the test function bodies.

- Add bufferization integration test pipeline

  This change adds a simple test bufferization pipeline for the Executor project and uses that pipeline to construct new integration tests. The new test verifies host i4 operations, and the change also includes a small bug fix.

- NFC: move some unit tests from the top-level `test` directory under the `executor` sub-project

Signed-off-by: Christopher Bate <[email protected]>
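The idea behind `executor-decompose-aggregate-loads-and-stores` can be illustrated outside MLIR: compute a byte offset for each element (the role `executor.offset` plays) and emit one scalar load or store per element. This Python sketch assumes a natural-alignment layout (alignment equal to element size) and uses `struct` format codes as element types. It is an analogy, not the pass itself, and the helper names are hypothetical.

```python
import struct
from typing import List, Sequence, Tuple

# Hypothetical aggregate layout: one `struct` format code per element,
# e.g. "b" (i8), "i" (i32), "f" (f32), "d" (f64).


def element_offsets(fmts: Sequence[str]) -> List[int]:
    """Byte offset of each element, assuming natural alignment (align == size)."""
    offsets, offset = [], 0
    for fmt in fmts:
        size = struct.calcsize("<" + fmt)
        offset = (offset + size - 1) // size * size  # round up to alignment
        offsets.append(offset)
        offset += size
    return offsets


def store_aggregate(buf: bytearray, base: int, fmts: Sequence[str],
                    values: Sequence) -> None:
    """Decompose one aggregate store into one scalar store per element."""
    for off, fmt, value in zip(element_offsets(fmts), fmts, values):
        struct.pack_into("<" + fmt, buf, base + off, value)


def load_aggregate(buf: bytearray, base: int, fmts: Sequence[str]) -> Tuple:
    """Decompose one aggregate load into one scalar load per element."""
    return tuple(
        struct.unpack_from("<" + fmt, buf, base + off)[0]
        for off, fmt in zip(element_offsets(fmts), fmts)
    )
```

For an `(i8, f64, f32)` aggregate, the elements land at offsets 0, 8, and 16, so a round trip through a 24-byte buffer recovers the original values.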

This MR adds a CI workflow for the MLIR-TensorRT project that runs on PRs created against the main branch (except draft PRs). The workflow performs format checking and runs LIT tests.

- Makes the `normalized_shape` argument 1:1 with Torch. Previously, our API only supported a single integer for the normalization dimension; now, a list of integers can also be provided.
- Adds integration tests for `tp.LayerNorm` and `tp.GroupNorm`.
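A sketch of Torch-style `normalized_shape` handling, assuming the `torch.nn.LayerNorm` convention: an integer means a single trailing dimension, and a list names the trailing dimensions to normalize over. The helper names are hypothetical and not taken from `tp.LayerNorm`.

```python
from typing import Sequence, Tuple, Union


def canonicalize_normalized_shape(
    normalized_shape: Union[int, Sequence[int]]
) -> Tuple[int, ...]:
    """Accept an int or a sequence of ints, as torch.nn.LayerNorm does."""
    if isinstance(normalized_shape, int):
        return (normalized_shape,)
    return tuple(normalized_shape)


def normalization_dims(input_shape: Sequence[int],
                       normalized_shape: Union[int, Sequence[int]]) -> Tuple[int, ...]:
    """Return the trailing dims to normalize over, checking that they match."""
    ns = canonicalize_normalized_shape(normalized_shape)
    rank = len(input_shape)
    if len(ns) > rank or tuple(input_shape[rank - len(ns):]) != ns:
        raise ValueError(f"input shape {tuple(input_shape)} does not end with {ns}")
    return tuple(range(rank - len(ns), rank))
```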



We have not yet figured out how to commit to the `gh-pages` branch, whose contents are very different, so the doc deployment job is disabled for now.

Signed-off-by: yizhuoz004 <[email protected]>


Since we treat `tp.Shape` as a collection, it makes sense to be able to iterate over it. This PR adds a simple iterator implementation to `tp.Shape`.
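A minimal sketch of such an iterator, on a hypothetical `Shape` stand-in with concrete integer dims rather than the tensor-backed `tp.Shape`.

```python
from typing import Iterator, Sequence


class Shape:
    """Hypothetical stand-in for `tp.Shape` over concrete integer dims."""

    def __init__(self, dims: Sequence[int]):
        self._dims = list(dims)

    def __len__(self) -> int:
        return len(self._dims)

    def __getitem__(self, index: int) -> int:
        return self._dims[index]

    def __iter__(self) -> Iterator[int]:
        # Yield one dimension at a time so `for d in shape`, `list(shape)`,
        # and tuple unpacking all work.
        for i in range(len(self)):
            yield self[i]
```

Without `__iter__`, Python falls back to calling `__getitem__` with 0, 1, 2, and so on until `IndexError` is raised; an explicit `__iter__` bounded by `__len__` does not depend on that.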

…e in the contributing doc (NVIDIA#151)

By default, `docker run` only pulls an image if it is missing entirely, so the command given in `CONTRIBUTING.md` will not update an image that was pulled before. This change adds the `--pull always` setting to the example command so that Docker checks for an updated image before running.

- Reverts "Add support for stream in Tripy and make execution async by default (NVIDIA#138)". This reverts commit f9fd477.
- Removes an incorrect test skip. When an MLIR-TRT link was found in a markdown file, we were skipping the entire link-checking test instead of just that one link.
- Reworks doc styling and removes redundant doc testing.
- Updates the documentation to use a new Sphinx theme that is more compact and stylistically consistent with other popular Python documentation.
- Adds a new `manual` test cadence, which prevents tests from being run in automation.
- Applies the `manual` test cadence to some documentation tests, which are not required since we build the documentation in L0.
- Reenables multi-threading for documentation generation.
- Miscellaneous changes in some guides.

Signed-off-by: pranavm-nvidia <[email protected]>

1) Create a standard for docstring dtypes.
2) Automatically verify docstring dtypes, including negative tests for any dtypes that are not supported.
3) Integrate verification into the test pipeline (L1 for now).
4) Add a README file explaining how to use the verifier/decorator.

Side task: add support for several dtypes in cast.

Signed-off-by: Mgluhovskoi <[email protected]>
Co-authored-by: pranavm-nvidia <[email protected]>
Co-authored-by: Parth Chadha <[email protected]>
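A minimal sketch of the decorator-plus-verifier idea, assuming dtypes are plain strings and that an unsupported dtype makes the operation raise `TypeError`. `dtypes`, `verify`, `ALL_DTYPES`, `DECLARED`, and `exp` are hypothetical; the real decorator, docstring format, and README are not shown on this page.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical dtype universe and registry of declared support.
ALL_DTYPES = ["float32", "float16", "int32", "int64", "bool"]
DECLARED: Dict[str, List[str]] = {}


def dtypes(supported: Iterable[str]) -> Callable:
    """Record the supported dtypes and document them in the docstring."""
    supported = list(supported)

    def register(func: Callable) -> Callable:
        DECLARED[func.__name__] = supported
        func.__doc__ = (func.__doc__ or "") + "\n\nSupported dtypes: " + ", ".join(supported)
        return func

    return register


def verify(func: Callable) -> List[str]:
    """Run `func` on every dtype; return dtypes whose behavior contradicts the docs."""
    mismatches = []
    for dtype in ALL_DTYPES:
        try:
            func(dtype)
            accepted = True
        except TypeError:
            accepted = False
        # Supported dtypes must succeed; unsupported ones must fail (negative test).
        if accepted != (dtype in DECLARED[func.__name__]):
            mismatches.append(dtype)
    return mismatches


@dtypes(["float32", "float16"])
def exp(dtype: str) -> str:
    """Stand-in op that rejects dtypes it cannot handle."""
    if dtype not in ("float32", "float16"):
        raise TypeError(f"exp does not support {dtype}")
    return dtype
```

Keeping the declared dtypes in one place lets the same data drive both the generated docstring and the negative tests, so the two cannot drift apart silently.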

- Updates the container to include tooling for profiling our test suite.
- Updates the README with instructions on how to use the profiling tooling.

- Updates `get_stack_info()` to work with frames directly instead of using the `inspect` APIs, which are extremely slow.
- Updates `StackInfo` with a `fetch_source_code()` method, which allows us to defer fetching source code (extremely slow due to file I/O) until we actually require it, typically when we throw an exception.

This greatly speeds up Tripy execution in general, including our tests:

Before:
```
=================== 1691 passed, 54 skipped, 2549 deselected in 311.22s (0:05:11) ===================
```

After:
```
===================== 1691 passed, 54 skipped, 2549 deselected in 64.83s (0:01:04) =====================
```
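The sketch below illustrates both ideas with the standard library: walking frames directly through `sys._getframe()` and `f_back` (CPython-specific) instead of `inspect.stack()`, which reads source context for every frame by default, and deferring the `linecache` lookup until `fetch_source_code()` is called. Only the names `get_stack_info`, `StackInfo`, and `fetch_source_code` come from the description above; their bodies and the `SourceInfo` class are assumptions.

```python
import linecache
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceInfo:
    """One frame: cheap fields are captured eagerly, source text lazily."""
    filename: str
    lineno: int
    function: str
    code: Optional[str] = None


class StackInfo(list):
    """List of SourceInfo entries, innermost frame first."""

    def fetch_source_code(self) -> None:
        # File I/O happens only here, e.g. when formatting an error message.
        for info in self:
            if info.code is None:
                info.code = linecache.getline(info.filename, info.lineno).rstrip()


def get_stack_info(skip: int = 1) -> StackInfo:
    """Walk frames via f_back; skip=1 starts at the caller of this function."""
    frame = sys._getframe(skip)
    stack = StackInfo()
    while frame is not None:
        code = frame.f_code
        stack.append(SourceInfo(code.co_filename, frame.f_lineno, code.co_name))
        frame = frame.f_back
    return stack
```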

google-cla · 2024-08-30T01:19:42Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

![Screenshot 2024-08-30 at 1 43 26 PM](https://github.com/user-attachments/assets/a71e4097-3f6c-449a-96c1-923ed0173aa6)

…VIDIA#165)

This PR corrects a small bug in the `__eq__` implementation for `tp.Shape`: the comparison was checking the `len` of the shape's `shape` field, but it should actually check the length of the `tp.Shape` itself.

Note: The test case included in the unit tests worked "by accident" because the shape in it had length 1, which is broadcast to the other shape's length. Without this change, the test would fail when comparing two shapes of different lengths where neither has length 1.
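A small model of why the test passed "by accident": if the length check always succeeds, only a broadcasting element-wise comparison remains, and a length-1 operand is stretched to match the other shape. The functions below are hypothetical and operate on Python lists rather than `tp.Shape`.

```python
from typing import Sequence


def broadcast_elementwise_eq(a: Sequence[int], b: Sequence[int]) -> bool:
    """Element-wise equality with length-1 broadcasting, reduced with all()."""
    if len(a) == 1:
        a = list(a) * len(b)  # stretch a length-1 operand
    elif len(b) == 1:
        b = list(b) * len(a)
    if len(a) != len(b):
        raise ValueError("operands cannot be broadcast together")
    return all(x == y for x, y in zip(a, b))


def shape_eq_without_len_check(a: Sequence[int], b: Sequence[int]) -> bool:
    # Models the buggy behavior: no effective check on the number of dims.
    return broadcast_elementwise_eq(a, b)


def shape_eq_with_len_check(a: Sequence[int], b: Sequence[int]) -> bool:
    # Models the fix: compare the number of dims before comparing values.
    return len(a) == len(b) and broadcast_elementwise_eq(a, b)
```

With the length-1 shape `[3]`, the unchecked version reports equality with `[3, 3, 3]`; with two lengths where neither is 1, it raises instead of returning `False`.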

yizhuoz004 and others added 30 commits August 16, 2024 11:00

Add tripy L0, L1 CI yml (NVIDIA#110)

93cc518

Add user guide on how to use the compiler (NVIDIA#114)

9e3b302

Reduce time to test introduction doc in L0 (NVIDIA#122)

7a489be

Fix broken dynamic iota implementation, add unit tests (NVIDIA#121)

2729a36


Fix L1 yml path issue (NVIDIA#123)

d6d6c26


Update L1 yml: Proper fix for mounting tripy directory (NVIDIA#124)

c0b6b49


Add tensor.tolist() method (NVIDIA#125)

7c5bcbf

[NFC] Fix test_create_memref.py formatting. (NVIDIA#130)

b4491f7


Bump version to 0.1.32 (NVIDIA#137)

2c88af8


Add GitHub workflow to create docker image for mlir-tensorrt project (NVIDIA#131)

2d983df

Add int64 exceptions for operations (NVIDIA#128)

bc19b9b


Fix reduce dims to work with multiple negative dims (NVIDIA#136)

977f653


Add MLIR-TensorRT CI workflow (NVIDIA#135)

f15fb83


Replace single assert with 3 asserts with appropriate error message (NVIDIA#142)

6730b82

Update MLIR-TRT 0.1.32 (NVIDIA#143)

6e2abc4

Add support for stream in Tripy and make execution async by default (NVIDIA#138)

f9fd477

Disable doc deployment in L1 (NVIDIA#145)

58a95f1


Skip fetching stack info for outputs; this was taking more time than actual computation in benchmarks (NVIDIA#146)

7f3d04b

Fix docs building (NVIDIA#148)

39d15b4

Add tp.flatten (NVIDIA#150)

9cb5381

[Tripy] Add iterator implementation for tp.Shape (NVIDIA#149)

7c2e70d


Replace Array with runtime.MemRefValue (NVIDIA#95)

b12ee49

Add packages.html, enable doc publishing in L1

755633e

Mgluhovskoi and others added 3 commits August 27, 2024 20:37

Enables pytest profiling

720a5f6


github-actions bot added tripy mlir-tensorrt labels Aug 30, 2024

jhalakpatel force-pushed the remove-dps branch from a868afd to 969510c Compare August 30, 2024 07:10

Mgluhovskoi and others added 2 commits August 30, 2024 13:48

Add verification to flatten op (NVIDIA#163)

7ab1144


Add constant deduplication pass in flat_ir (NVIDIA#157)

ed92919

jhalakpatel force-pushed the remove-dps branch 2 times, most recently from f62962c to 44df0ce Compare August 30, 2024 20:14

slyubomirsky and others added 2 commits August 30, 2024 16:49

Change to remove DPS style calling convention in plan dialect

a9843e6

jhalakpatel force-pushed the remove-dps branch from 44df0ce to a9843e6 Compare September 1, 2024 01:02


10 participants
