@jhalakpatel
Owner

No description provided.

yizhuoz004 and others added 30 commits August 16, 2024 11:00
This function greatly simplifies Tripy's Array implementation. We want
to be able to handle memref creation from all types that implement the
`__dlpack__()` interface rather than the limited set we currently
support. This function should allow us to achieve this. The
corresponding Tripy changes are NVIDIA#72.
Proper fix for mounting the tripy directory in the container.

---------

Signed-off-by: yizhuoz004 <[email protected]>
This PR formats `test_create_memref.py` using black formatter.
… possible (NVIDIA#92)

For some cases, it is useful to know the length of a `tp.Shape` without
executing the model. This PR adds a method `infer_len` that allows
operators to specify how to statically infer the length of `Shape`
outputs when possible (it is always optional). Test cases are added.
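
A minimal sketch of how such an optional static-length hook might look. Only the method name `infer_len` comes from the description above; `BaseOp`, `ShapeOfOp`, and `static_len` are hypothetical names used for illustration and are not Tripy's actual classes.

```python
from typing import List, Optional


class BaseOp:
    """Hypothetical operator base class (not Tripy's real one)."""

    def infer_len(self) -> List[Optional[int]]:
        # Default: the length of the Shape output is not known statically.
        return [None]


class ShapeOfOp(BaseOp):
    """Hypothetical op that returns the shape of an input of known rank."""

    def __init__(self, input_rank: int):
        self.input_rank = input_rank

    def infer_len(self) -> List[Optional[int]]:
        # The shape of a rank-N tensor always has N elements.
        return [self.input_rank]


def static_len(op: BaseOp) -> Optional[int]:
    # Statically inferred length of the first output, or None if unknown.
    return op.infer_len()[0]
```

Because the base class returns `None`, operators that cannot infer a length need no extra code, which matches the "always optional" wording above.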
- Adds `Shape.__ne__()` method so that shapes can be conveniently
checked for inequality.
- Fixes a logic error where `Shape.__eq__()` would skip checking rank.
This would lead to an error when comparing shapes with different ranks.
- Adds shape inequality test cases for shapes with different and same
ranks.
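
The rank check described in this list can be sketched on a toy class. `ToyShape` is a hypothetical stand-in that stores plain integers; it is not Tripy's `Shape` implementation.

```python
from typing import Sequence


class ToyShape:
    """Hypothetical stand-in for a shape type; not Tripy's implementation."""

    def __init__(self, dims: Sequence[int]):
        self.dims = list(dims)

    def __len__(self) -> int:
        return len(self.dims)

    def __eq__(self, other):
        if not isinstance(other, ToyShape):
            return NotImplemented
        # Compare lengths first so that mismatched ranks yield False
        # instead of failing during the elementwise comparison.
        if len(self) != len(other):
            return False
        return all(a == b for a, b in zip(self.dims, other.dims))

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result
```

With this ordering, `ToyShape([1, 2]) != ToyShape([1, 2, 3])` evaluates to `True` without ever comparing individual elements.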
NVIDIA#131)

This PR adds a GitHub workflow to create new docker image if changes are
made in `build_tools/docker/Dockerfile`, `python/requirements-dev.txt`
or `python/requirements.txt`.
Throw an exception when tensor input's dtype is `int64` and operation
has undefined behavior for i64 tensors.
- Adds test cases for multiple negative reduce dims, e.g.
```python
a = tp.ones((5,5,5))
tp.sum(a, dim=[-2, -1])
```
- Fixes the `_reduce_impl` to ensure negatives are sorted in decreasing
order when performing `unsqueeze`
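
To illustrate why the order matters, here is a sketch that works on plain shape tuples rather than tensors. It is not the `_reduce_impl` code; `unsqueeze_shape` and `restore_reduced_dims` are hypothetical helpers, and the sketch assumes the common convention that a negative `unsqueeze` index counts from the end of the output.

```python
from typing import Sequence, Tuple


def unsqueeze_shape(shape: Tuple[int, ...], dim: int) -> Tuple[int, ...]:
    # Insert a size-1 axis; a negative dim counts from the end of the output.
    out_rank = len(shape) + 1
    if dim < 0:
        dim += out_rank
    return shape[:dim] + (1,) + shape[dim:]


def restore_reduced_dims(reduced: Tuple[int, ...], dims: Sequence[int]) -> Tuple[int, ...]:
    # Assumes every entry in `dims` is negative, as in the example above.
    # Processing -1 before -2 keeps each later index valid for the final rank.
    for dim in sorted(dims, reverse=True):
        reduced = unsqueeze_shape(reduced, dim)
    return reduced
```

For the `(5, 5, 5)` input above, reducing over `[-2, -1]` leaves `(5,)`. Unsqueezing in decreasing order gives `(5, 1, 1)`, whereas applying `-2` first and then `-1` would give `(1, 5, 1)`.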
This change is a compilation of the following commits:

-- [executor] Enable wider support for loads/stores of aggregate types

  Previously, we only supported load/store of aggregate types in a very
  limited scope. This change expands that support so that we support load/store
  of arbitrary aggregates. We introduce a pass `executor-decompose-aggregate-loads-and-stores`
  that specifically decomposes these loads/stores into load/stores of the
  individual elements. Since this decomposition results in the creation of
  `executor.offset` operations, we need to sequence the new pass correctly with
  the `executor-expand-ops` pass (which lowers `executor.alloca` and `executor.offset`)
  as well as with the lowering of executor operations into opaque
  `executor.call` ops. To make all this work correctly, this change also
  factors out the latter transformation into a dedicated pass
  `executor-lower-to-runtime-builtins`.

-- NFC: fix some Python typing annotations

-- [executor] add Executor and runtime support for `complex<f32>` and `complex<f64>` types

  This change:

  - adds support to the Executor dialect for `complex<f32>` and `complex<f64>` types
  - adds support to the runtime API interface for the corresponding `c32` and `c64` types

-- [executor] Properly serialize absent function signatures

  When a `FunctionMetadataAttr` attribute is not provided on a `func.func`
  during translation/serialization to the Executable format, we should pass
  a 0-offset to the signature field when creating a `rt::impl::Function`
  flatbuffer table.

  This was caught when working on complex32/complex64 support. An additional
  validation check is added immediately after finalizing the executable
  buffer in the `mlir::translateToRuntimeExecutable` function.

-- [executor] Fix i4 multiplication runtime error and i4 tests

  Fixes an issue where Lua user type metatable information wasn't correctly
  set for the `nv_int4` type. The 'arithmetic.mlir' i4 tests also were not
  effectively testing the runtime i4 functions because the compiler was
  constant-folding most of the operations being tested. To fix the constant
  folding issue, we just need to pass some arguments to the test functions
  instead of inlining them into the test function bodies.

-- Add bufferization integration test pipeline

  This change adds a simple test bufferization pipeline for the Executor project
  and uses that pipeline to construct new integration tests. The new tests verify
  host i4 operations, and the change also includes a small bug fix.

-- NFC: move some unit tests from top-level 'test' under the 'executor' sub-project

Signed-off-by: Christopher Bate <[email protected]>
This MR adds a CI workflow for the MLIR-TensorRT project that runs on PRs
created against the main branch (draft PRs excluded). The workflow performs
format checking and runs LIT tests.
- Make the `normalized_shape` argument 1:1 with Torch. Previously, our
API only supported taking a single integer for the normalization
dimension. Now, a list of integers can be provided.
- Add integration tests for `tp.LayerNorm` and `tp.GroupNorm`
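
One way to accept either form is to normalize the argument up front. This is a sketch only; `normalize_shape_arg` is a hypothetical helper and not part of the Tripy API.

```python
from typing import Sequence, Tuple, Union


def normalize_shape_arg(normalized_shape: Union[int, Sequence[int]]) -> Tuple[int, ...]:
    # Accept the old single-integer form as well as a sequence of integers.
    if isinstance(normalized_shape, int):
        return (normalized_shape,)
    return tuple(normalized_shape)
```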
We could not figure out how to commit to the `gh-pages` branch, whose
contents are very different. Disabling the doc deployment job for now.

Signed-off-by: yizhuoz004 <[email protected]>
Since we are treating `tp.Shape` as a collection, it makes sense
to be able to iterate over it. This PR adds a very simple iterator
implementation to `tp.Shape`.
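
A simple iterator of this kind can be sketched with `__len__` and `__getitem__`. `ToyShape` is a hypothetical stand-in holding plain integers and does not reflect how `tp.Shape` stores its data.

```python
from typing import Iterator, Sequence


class ToyShape:
    """Hypothetical stand-in for a shape type; not Tripy's implementation."""

    def __init__(self, dims: Sequence[int]):
        self._dims = list(dims)

    def __len__(self) -> int:
        return len(self._dims)

    def __getitem__(self, index: int) -> int:
        return self._dims[index]

    def __iter__(self) -> Iterator[int]:
        # Yield one element at a time, stopping at the known length.
        for index in range(len(self)):
            yield self[index]
```

Defining `__iter__` also enables unpacking, for example `height, width = ToyShape([2, 3])`.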
…e in the contributing doc (NVIDIA#151)

`docker run` by default will only pull an image if it is entirely
missing, so the command given in `CONTRIBUTING.md` will not update the
image if it had been pulled before. This change adds the `--pull always`
setting in the example command so that it would check for an update
before running.
- Revert "Add support for stream in Tripy and make execution async by
  default (NVIDIA#138)". This reverts commit f9fd477.

- Removes an incorrect test skip. When an MLIR-TRT link was seen in a
  markdown file, we were skipping the entire link-checking test instead
  of just that one link.

- Reworks doc styling, removes redundant doc testing:

    - Updates documentation to use a new Sphinx theme which is more compact
      and stylistically consistent with other popular Python documentation.

    - Adds a new `manual` test cadence which will prevent tests from being
      run in automation.

    - Applies `manual` test cadence to some documentation testing, which is
      not required since we build documentation in L0.

    - Reenables multi-threading for documentation generation.

    - Miscellaneous changes in some guides.

---------

Signed-off-by: pranavm-nvidia <[email protected]>
Mgluhovskoi and others added 3 commits August 27, 2024 20:37
1) Create a standard for doc-strings dtypes
2) Automatically verify doc-strings' dtype
    - negative test any dtypes that are not supported
3) Integrate verification into test pipeline (L1 for now)
4) Add readme file to explain how to use verifier/decorator

Side task:

Add support for several dtypes within cast.

---------

Signed-off-by: Mgluhovskoi <[email protected]>
Co-authored-by: pranavm-nvidia <[email protected]>
Co-authored-by: Parth Chadha <[email protected]>
- Updates container to include tooling to enable profiling our test suite.

- Updates README with instructions on how to use profiling tooling.
- Updates `get_stack_info()` to no longer use `inspect` APIs, which are extremely
     slow, but instead work with the frames directly.

- Updates `StackInfo` with a `fetch_source_code()` method which allows us to defer
     the fetching of source code (extremely slow due to file I/O) until the point
     where we actually require it, which is typically when we throw an exception.

This greatly speeds up Tripy execution in general, including our tests:
Before:
```
=================== 1691 passed, 54 skipped, 2549 deselected in 311.22s (0:05:11) ===================
```

After:
```
===================== 1691 passed, 54 skipped, 2549 deselected in 64.83s (0:01:04) =====================
```
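
The two changes above (walking frames directly and deferring source lookup) can be sketched as follows. This is not Tripy's implementation: `FrameInfo` and `get_stack_info_sketch` are hypothetical names, `fetch_source_code` is written here as a free function rather than a `StackInfo` method, and `sys._getframe` is a CPython-specific call.

```python
import linecache
import sys
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FrameInfo:
    file: str
    line: int
    function: str
    code: Optional[str] = None  # filled in lazily by fetch_source_code()


def get_stack_info_sketch(skip: int = 1) -> List[FrameInfo]:
    # Walk the frame objects directly and record only cheap metadata.
    # No source files are read here.
    infos = []
    frame = sys._getframe(skip)
    while frame is not None:
        code = frame.f_code
        infos.append(FrameInfo(code.co_filename, frame.f_lineno, code.co_name))
        frame = frame.f_back
    return infos


def fetch_source_code(infos: List[FrameInfo]) -> None:
    # Deferred file I/O: call this only when the source text is needed,
    # for example while formatting an exception message.
    for info in infos:
        if info.code is None:
            info.code = linecache.getline(info.file, info.line).rstrip()
```

Capturing a stack then costs only a linked-list walk, and the file reads happen once, at the point where an error message is actually built.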
@google-cla

google-cla bot commented Aug 30, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@jhalakpatel force-pushed the remove-dps branch 2 times, most recently from f62962c to 44df0ce, on August 30, 2024 20:14
slyubomirsky and others added 2 commits August 30, 2024 16:49
…VIDIA#165)

This PR corrects a small bug with the `__eq__` implementation for
`tp.Shape`: The comparison was checking the `len` of the `shape` field
of the shape, but it should actually be checking the length of the
`tp.Shape` itself.

Note: The test case that was included in the unit tests worked "by
accident" because the shape in it was length 1, which is broadcast up
to other shapes' lengths. Without this change, the test would fail when
comparing two shapes of different lengths where neither is length 1.
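
The difference between the two length checks can be illustrated with a toy class. This sketch assumes that a shape is stored as 1-D data, so that its own `shape` field always has exactly one entry; `ToyShape` and both helper functions are hypothetical and are not Tripy code.

```python
class ToyShape:
    """Hypothetical stand-in for a shape type; not Tripy's implementation."""

    def __init__(self, dims):
        self.dims = list(dims)
        # Assumption: the data is 1-D, so its `shape` field has one entry.
        self.shape = [len(self.dims)]

    def __len__(self):
        return len(self.dims)


def lengths_match_buggy(a, b):
    # Compares the rank of the underlying 1-D data, which is always 1.
    return len(a.shape) == len(b.shape)


def lengths_match_fixed(a, b):
    # Compares the number of elements in each shape.
    return len(a) == len(b)
```

Under that assumption, the buggy check never detects a length mismatch, so the outcome depends entirely on the elementwise comparison that follows it.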
10 participants