forked from NVIDIA/TensorRT-Incubator
Remove dps #1
Open
jhalakpatel wants to merge 37 commits into main from remove-dps
This function greatly simplifies Tripy's Array implementation. We want to be able to handle memref creation from all types that implement the `__dlpack__()` interface rather than the limited set we currently support. This function should allow us to achieve this. The corresponding Tripy changes are NVIDIA#72.
Signed-off-by: Parth Chadha <[email protected]> Co-authored-by: Jhalak Patel <[email protected]>
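To illustrate accepting any `__dlpack__()` producer instead of a fixed list of types, here is a minimal sketch. `supports_dlpack` and `FakeTensor` are hypothetical names used only for illustration; they are not part of MLIR-TensorRT or Tripy, and the actual memref creation function is not shown here.

```python
def supports_dlpack(obj) -> bool:
    """Return True if obj exposes both methods of the DLPack exchange protocol."""
    return callable(getattr(obj, "__dlpack__", None)) and callable(
        getattr(obj, "__dlpack_device__", None)
    )


class FakeTensor:
    """Hypothetical producer. A real one would return a DLPack capsule."""

    def __dlpack__(self, *, stream=None):
        raise NotImplementedError("illustration only")

    def __dlpack_device__(self):
        # (device_type, device_id); device_type 1 is kDLCPU in DLPack.
        return (1, 0)
```

A single protocol check like this stands in for one `isinstance` branch per supported framework.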
Signed-off-by: yizhuoz004 <[email protected]>
Proper fix for mounting the tripy directory in the container.
---------
Signed-off-by: yizhuoz004 <[email protected]>
This PR formats `test_create_memref.py` using black formatter.
… possible (NVIDIA#92) For some cases, it is useful to know the length of a `tp.Shape` without executing the model. This PR adds a method `infer_len` that allows operators to specify how to statically infer the length of `Shape` outputs when possible (it is always optional). Test cases are added.
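The optional static length inference described above could look roughly like the following sketch. The class names, the constructor, and the return convention (one entry per output, `None` when the length is unknown) are assumptions made for illustration and may not match Tripy's actual `infer_len` signature.

```python
from typing import List, Optional


class BaseOp:
    """Hypothetical operator base class; not Tripy's actual class."""

    def __init__(self, input_lens):
        # Statically known lengths of the Shape inputs (None if unknown).
        self.input_lens = list(input_lens)

    def infer_len(self) -> List[Optional[int]]:
        # Default: the output length cannot be inferred without executing.
        return [None]


class ConcatShapes(BaseOp):
    """Hypothetical op that concatenates Shape inputs."""

    def infer_len(self) -> List[Optional[int]]:
        if any(n is None for n in self.input_lens):
            return [None]
        # The concatenated length is the sum of the input lengths.
        return [sum(self.input_lens)]
```

Because the base class falls back to "unknown", operators only override the method when a static answer exists.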
- Adds `Shape.__ne__()` method so that shapes can be conveniently checked for inequality.
- Fixes a logic error where `Shape.__eq__()` would skip the rank check. This led to an error when comparing shapes with different ranks.
- Adds shape inequality test cases for shapes with different and same ranks.
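As a rough sketch of the comparison logic described in these bullets, the toy class below checks the length first and only then compares elements. `ToyShape` is a hypothetical stand-in that returns plain booleans; it is not `tp.Shape`.

```python
class ToyShape:
    """Hypothetical stand-in for a shape collection; not tp.Shape."""

    def __init__(self, dims):
        self.dims = list(dims)

    def __len__(self):
        return len(self.dims)

    def __eq__(self, other):
        if not isinstance(other, ToyShape):
            return NotImplemented
        # Check the rank first so mismatched lengths never reach the
        # elementwise comparison.
        if len(self) != len(other):
            return False
        return all(a == b for a, b in zip(self.dims, other.dims))

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result
```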
Signed-off-by: yizhuoz004 <[email protected]>
NVIDIA#131) This PR adds a GitHub workflow that creates a new Docker image if changes are made in `build_tools/docker/Dockerfile`, `python/requirements-dev.txt` or `python/requirements.txt`.
Throw an exception when a tensor input's dtype is `int64` and the operation has undefined behavior for i64 tensors.
- Adds test cases for multiple negative reduce dims, e.g.
```python
a = tp.ones((5,5,5))
tp.sum(a, dim=[-2,-1])
```
- Fixes the `_reduce_impl` to ensure negatives are sorted in decreasing order when performing `unsqueeze`
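Why the order matters can be seen on plain shape tuples. The sketch below is an assumption about the mechanism rather than the actual `_reduce_impl`: it models `unsqueeze` on tuples and re-inserts the reduced dimensions one at a time. `unsqueeze` and `restore_reduced_dims` are hypothetical helpers.

```python
def unsqueeze(shape, dim):
    """Insert a size-1 dimension at `dim`; negatives index into the output rank."""
    out_rank = len(shape) + 1
    if dim < 0:
        dim += out_rank
    if not 0 <= dim < out_rank:
        raise IndexError(f"dim out of range for output rank {out_rank}")
    return shape[:dim] + (1,) + shape[dim:]


def restore_reduced_dims(reduced_shape, negative_dims):
    """Re-insert reduced dims, applying negative dims in decreasing order (-1 first)."""
    for dim in sorted(negative_dims, reverse=True):
        reduced_shape = unsqueeze(reduced_shape, dim)
    return reduced_shape


# Reducing (5, 5, 5) over dims [-2, -1] leaves (5,).
print(restore_reduced_dims((5,), [-2, -1]))  # (5, 1, 1)
# Applying -2 before -1 puts the size-1 dims in the wrong places.
print(unsqueeze(unsqueeze((5,), -2), -1))  # (1, 5, 1)
```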
This change is a compilation of the following commits:
-- [executor] Enable wider support for loads/stores of aggregate types
Previously, we only supported load/store of aggregate types in a very limited scope. This change expands that support so that we support load/store of arbitrary aggregates. We introduce a pass `executor-decompose-aggregate-loads-and-stores` that specifically decomposes these loads/stores into load/stores of the individual elements. Since this decomposition results in the creation of `executor.offset` operations, we need to sequence the new pass correctly with the `executor-expand-ops` pass (which lowers `executor.alloca` and `executor.offset`) as well as with the lowering of executor operations into opaque `executor.call` ops. To make all this work correctly, this change also factors out the latter transformation into a dedicated pass `executor-lower-to-runtime-builtins`.
-- NFC: fix some Python typing annotations
-- [executor] add Executor and runtime support for `complex<f32>` and `complex<f64>` types
This change:
- adds support to the Executor dialect for `complex<f32>` and `complex<f64>` types
- adds support to the runtime API interface for the corresponding `c32` and `c64` types
-- [executor] Properly serialize absent function signatures
When a `FunctionMetadataAttr` attribute is not provided on a `func.func` during translation/serialization to the Executable format, we should pass a 0-offset to the signature field when creating a `rt::impl::Function` flatbuffer table. This was caught when working on complex32/complex64 support. An additional validation check is added immediately after finalizing the executable buffer in the `mlir::translateToRuntimeExecutable` function.
-- [executor] Fix i4 multiplication runtime error and i4 tests
Fixes an issue where Lua user type metatable information wasn't correctly set for the `nv_int4` type. The 'arithmetic.mlir' i4 tests also were not effectively testing the runtime i4 functions because the compiler was constant-folding most of the operations being tested. To fix the constant folding issue, we just need to pass some arguments to the test functions instead of inlining them into the test function bodies.
-- Add bufferization integration test pipeline
This change adds a simple test bufferization pipeline for the Executor project and uses that pipeline to construct new integration tests. The test verifies host i4 operations and makes a small bug fix.
-- NFC: move some unit tests from top-level 'test' under the 'executor' sub-project
Signed-off-by: Christopher Bate <[email protected]>
This MR adds a CI workflow for the MLIR-TensorRT project that runs on PRs created against the main branch (except draft PRs). The workflow performs format checking and runs LIT tests.
- Make the `normalized_shape` argument 1:1 with Torch. Previously, our API only supported taking a single integer for the normalization dimension. Now, a list of integers can be provided.
- Add integration tests for `tp.LayerNorm` and `tp.GroupNorm`
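Accepting either a single integer or a list for `normalized_shape` could be handled along these lines. Both helpers are hypothetical and only model the argument handling, following the Torch convention that the normalized dimensions are the trailing dimensions of the input; this is not Tripy's implementation.

```python
from typing import Sequence, Tuple, Union


def canonicalize_normalized_shape(
    normalized_shape: Union[int, Sequence[int]],
) -> Tuple[int, ...]:
    """Turn an int or a sequence of ints into a non-empty tuple."""
    if isinstance(normalized_shape, int):
        return (normalized_shape,)
    result = tuple(normalized_shape)
    if not result:
        raise ValueError("normalized_shape must not be empty")
    return result


def matches_trailing_dims(input_shape, normalized_shape) -> bool:
    """Check that the trailing input dims equal the normalized shape."""
    dims = canonicalize_normalized_shape(normalized_shape)
    return tuple(input_shape[-len(dims):]) == dims
```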
Could not figure out how to commit to the `gh-pages` branch; its contents are very different. Disabling the doc deployment job for now.
Signed-off-by: yizhuoz004 <[email protected]>
NVIDIA#146) …actual computation in benchmarks
Since we are treating `tp.Shape` as a collection, it makes sense to iterate over them. This PR adds a very simple iterator implementation to `tp.Shape`.
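A very simple iterator of the kind described can be built on `__len__` and `__getitem__`. The class below is a hypothetical toy, not `tp.Shape`, and its elements are plain integers.

```python
class ToyShape:
    """Hypothetical shape-like collection; not tp.Shape."""

    def __init__(self, dims):
        self.dims = list(dims)

    def __len__(self):
        return len(self.dims)

    def __getitem__(self, index):
        return self.dims[index]

    def __iter__(self):
        # Yield each element in order, relying only on len() and indexing.
        for index in range(len(self)):
            yield self[index]


height, width = ToyShape([480, 640])  # unpacking works via __iter__
```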
…e in the contributing doc (NVIDIA#151) `docker run` by default will only pull an image if it is entirely missing, so the command given in `CONTRIBUTING.md` will not update the image if it had been pulled before. This change adds the `--pull always` setting in the example command so that it would check for an update before running.
- Revert "Add support for stream in Tripy and make execution async by default (NVIDIA#138)". This reverts commit f9fd477.
- Removes an incorrect test skip. When an MLIR-TRT link was seen in a markdown file, we were skipping the entire link-checking test instead of just that one link.
- Reworks doc styling, removes redundant doc testing
- Updates documentation to use a new Sphinx theme which is more compact and stylistically consistent with other popular Python documentation.
- Adds a new `manual` test cadence which will prevent tests from being run in automation.
- Applies `manual` test cadence to some documentation testing, which is not required since we build documentation in L0.
- Reenables multi-threading for documentation generation.
- Miscellaneous changes in some guides.
---------
Signed-off-by: pranavm-nvidia <[email protected]>
1) Create a standard for doc-strings dtypes
2) Automatically verify doc-strings' dtype
- negative test any dtypes that are not supported
3) Integrate verification into test pipeline (L1 for now)
4) Add readme file to explain how to use verifier/decorator
Side task:
Add support for several dtypes within cast.
---------
Signed-off-by: Mgluhovskoi <[email protected]>
Co-authored-by: pranavm-nvidia <[email protected]>
Co-authored-by: Parth Chadha <[email protected]>
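One way a dtype verifier/decorator pair like the one listed above might work is sketched below. Everything here is hypothetical (`ALL_DTYPES`, `REGISTRY`, `supported_dtypes`, `verify`, and the toy function); the actual decorator, its arguments, and how it reads doc-strings are not described in this commit message.

```python
ALL_DTYPES = ("float32", "float16", "int32", "int64", "bool")
REGISTRY = {}  # function name -> (function, declared dtypes)


def supported_dtypes(*dtypes):
    """Hypothetical decorator that records which dtypes a function claims to support."""

    def decorator(func):
        REGISTRY[func.__name__] = (func, frozenset(dtypes))
        return func

    return decorator


@supported_dtypes("float32", "float16")
def toy_relu(values, dtype):
    if dtype not in ("float32", "float16"):
        raise TypeError(f"unsupported dtype: {dtype}")
    return [max(v, 0) for v in values]


def verify(name):
    """Return the dtypes whose behavior disagrees with the declaration."""
    func, declared = REGISTRY[name]
    mismatches = []
    for dtype in ALL_DTYPES:
        try:
            func([1, -1], dtype)
            worked = True
        except TypeError:
            worked = False
        # Positive test for declared dtypes, negative test for the rest.
        if worked != (dtype in declared):
            mismatches.append(dtype)
    return mismatches
```

An empty result from `verify` means every declared dtype ran and every undeclared dtype was rejected.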
- Updates container to include tooling to enable profiling our test suite.
- Updates README with instructions on how to use profiling tooling.
- Updates `get_stack_info()` to no longer use `inspect` APIs, which are extremely
slow, but instead work with the frames directly.
- Updates `StackInfo` with a `fetch_source_code()` method which allows us to defer
the fetching of source code (extremely slow due to file I/O) until the point
where we actually require it, which is typically when we throw an exception.
This greatly speeds up Tripy execution in general, including our tests:
Before:
```
=================== 1691 passed, 54 skipped, 2549 deselected in 311.22s (0:05:11) ===================
```
After:
```
===================== 1691 passed, 54 skipped, 2549 deselected in 64.83s (0:01:04) =====================
```
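The two ideas in this commit, walking frames directly and deferring source lookup, can be sketched with the standard library alone. `FrameRecord` and `capture_stack` are hypothetical names and this is not the actual `get_stack_info()` or `StackInfo` code; note that `sys._getframe` is a CPython implementation detail.

```python
import linecache
import sys
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FrameRecord:
    file: str
    line: int
    function: str
    code: Optional[str] = None  # filled in lazily

    def fetch_source_code(self) -> str:
        # File I/O happens only here, e.g. when building an error message.
        if self.code is None:
            self.code = linecache.getline(self.file, self.line).strip()
        return self.code


def capture_stack(max_depth: int = 50) -> List[FrameRecord]:
    """Record file, line, and function for each caller frame without reading source."""
    records = []
    frame = sys._getframe(1)  # start at the caller of capture_stack
    while frame is not None and len(records) < max_depth:
        code = frame.f_code
        records.append(FrameRecord(code.co_filename, frame.f_lineno, code.co_name))
        frame = frame.f_back
    return records
```

Capturing only cheap frame attributes keeps the common path fast, and the source text is read once, on demand.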
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
a868afd to 969510c
f62962c to 44df0ce
…VIDIA#165) This PR corrects a small bug with the `__eq__` implementation for `tp.Shape`: The comparison was checking the `len` of the `shape` field of the shape, but it should actually be checking the length of the `tp.Shape` itself. Note: The test case that was included in the unit tests worked "by accident" because the shape in it was length 1, which is broadcast up to other shapes' lengths. Without this change, the test would fail if comparing two shapes of different lengths where neither is length 1.
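The "by accident" behavior can be reproduced with a toy model of broadcasting on plain lists. Both functions are hypothetical and simplified; they are not Tripy code and only mimic the rule that a length-1 operand is stretched to match the other operand.

```python
def broadcast_equal(a, b):
    """Elementwise equality where a length-1 operand is broadcast to the other's length."""
    if len(a) == 1:
        a = a * len(b)
    if len(b) == 1:
        b = b * len(a)
    if len(a) != len(b):
        raise ValueError("operands cannot be broadcast together")
    return all(x == y for x, y in zip(a, b))


def shapes_equal(a, b):
    """Compare lengths first, so broadcasting can never hide a length mismatch."""
    return len(a) == len(b) and broadcast_equal(a, b)


# A length-1 shape silently compares equal to a longer one without the length check.
print(broadcast_equal([2], [2, 2, 2]))  # True
print(shapes_equal([2], [2, 2, 2]))     # False
```

With two operands of different lengths where neither is length 1, `broadcast_equal` raises instead of returning, which mirrors the failure the note describes.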
44df0ce to a9843e6
No description provided.