[mlir-tensorrt] Integrate internal changes #651
Merged
--
b7c0ac3c95aac78b563cfcf1926fcfb0de20ef67 by Chris Bate [email protected]:
[compiler] Fix bufferization for `trtrt.enqueue`

This fixes a bug where `trtrt.enqueue` could incorrectly allow the same buffer to be used as both an input and an output. This is technically possible since, for many TensorRT programs, the inputs will be read before any outputs are written. However, in general we cannot make this assumption, which is codified in the bufferization infrastructure via the `bufferizesToElementwiseAccess` method of the `BufferizableOpInterface`. Previously, we were returning true from this method for `trtrt.enqueue` despite not analyzing the corresponding function.

Since the ability to reuse an input pointer avoids an allocation, it can be critical to good performance in certain cases, especially in loops. Therefore, in the future, we need to add back the ability to specify `true` when some basic analysis reduces the risk of that being incorrect.

--
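The aliasing hazard described in the commit above can be sketched with a toy example. The `reverseInto` function below is purely illustrative (it is not how `trtrt.enqueue` behaves); it just shows why an op that may write an output element before reading the corresponding input cannot safely share a buffer between input and output, which is what returning `true` from `bufferizesToElementwiseAccess` would permit:

```cpp
#include <cassert>
#include <vector>

// Toy "kernel": reverses the input into the output. If the output buffer
// aliases the input buffer, early writes clobber elements that have not
// been read yet, so the result is wrong.
void reverseInto(const float *in, float *out, int n) {
  for (int i = 0; i < n; ++i)
    out[i] = in[n - 1 - i];
}

bool demoAliasingHazard() {
  std::vector<float> a = {1, 2, 3, 4};
  std::vector<float> expected = {4, 3, 2, 1};

  // Distinct buffers: correct result.
  std::vector<float> out(4);
  reverseInto(a.data(), out.data(), 4);
  bool distinctOk = (out == expected);

  // Aliased in/out buffer: element 0 is overwritten with 4 before the last
  // iteration reads it, so we get {4, 3, 3, 4} instead of the reversal.
  reverseInto(a.data(), a.data(), 4);
  bool aliasedOk = (a == expected);

  return distinctOk && !aliasedOk;
}
```

Only when an analysis proves the op reads each input element before writing the matching output element is in-place reuse safe, which is the analysis the commit says may be added back later.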
6fab2aa80a7776fe9afd15a59fcce916b61570cd by Chris Bate [email protected]:
Refactor registration of dialects and passes

Adds a `Features.h` header containing some convenience macros that look a lot nicer than `#ifdef ... #endif` blocks.

--
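The kind of convenience macro such a header might provide can be sketched as follows. The macro and feature-flag names here (`MTRT_ENABLE_HLO`, `IF_HLO`, etc.) are assumptions for illustration, not the actual contents of `Features.h`:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical feature flags; in a real build these come from the build system.
#define MTRT_ENABLE_HLO 1
// MTRT_ENABLE_NCCL is deliberately left undefined in this sketch.

// Convenience macros: expand their argument only when the feature is enabled,
// replacing scattered `#ifdef MTRT_ENABLE_X ... #endif` blocks at call sites.
#ifdef MTRT_ENABLE_HLO
#define IF_HLO(x) x
#else
#define IF_HLO(x)
#endif

#ifdef MTRT_ENABLE_NCCL
#define IF_NCCL(x) x
#else
#define IF_NCCL(x)
#endif

std::vector<std::string> registeredDialects() {
  std::vector<std::string> dialects;
  dialects.push_back("plan");
  IF_HLO(dialects.push_back("stablehlo");)
  IF_NCCL(dialects.push_back("nccl");)
  return dialects;
}
```

Each registration site then stays a single readable line regardless of how many optional features surround it.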
96f45d6c6f3f2f1354984944f194fdc8a857ff34 by Sagar Shelke [email protected]:
[tensorrt] Fix a bug in the `trtSetWeights` function in the adaptor

This MR fixes a bug in the `trtSetWeights` method in `NvInferAdaptor`. This method stores weights in a weight map in order to keep them alive until the engine is built. Previously, a new weight entry was created in the map, but the original weight data was never copied into it. This caused issues ranging from getting zeros in the output all the way to getting NaNs. This change copies the original weights into the map.

--
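The bug class above can be sketched with a minimal model. The `Weights` struct and `WeightsKeeper` class below are illustrative stand-ins (not the actual `NvInferAdaptor` code); the key point, marked in the comment, is that the map entry must own a copy of the caller's data, since TensorRT-style weight handles are non-owning pointer-plus-count views:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy model of a TensorRT-style weights handle: a raw pointer + count.
struct Weights {
  const float *values;
  std::int64_t count;
};

class WeightsKeeper {
  // The map must own a *copy* of the data to keep it alive until the engine
  // is built; creating the entry without copying the values is the bug.
  std::map<std::string, std::vector<float>> storage;

public:
  Weights setWeights(const std::string &name, const Weights &w) {
    std::vector<float> &slot = storage[name];
    // The fix: copy the original values into the map entry, and hand back a
    // handle that points at the owned copy.
    slot.assign(w.values, w.values + w.count);
    return Weights{slot.data(), w.count};
  }
};
```

With the copy in place, the returned handle stays valid even after the caller's buffer is modified or freed.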
08fc170f769d9cedce2541da14b101e0947175e2 by Chris Bate [email protected]:
[compiler] Add explicit memory space assignment

Previously, the 'plan-module-bufferize' pass used the encodings of tensor types to infer the memory space of memref types. However, not every tensor had an encoding. There was a notion of a default memory space (`#plan.memory_space<device>`) which was used to deduce the memory space when no tensor encoding was present. However, this could result in situations where the program could not be bufferized correctly, for example if we had `tensor.cast` operations that added or removed the encoding. In such cases, we were relying on brittle logic in the bufferization infrastructure to somehow deduce the correct memory space, and this may not always work.

Therefore, this change adds a new pass, `plan-assign-memory-spaces`, which explicitly assigns a `#plan.memory_space` encoding to all tensor types. It proceeds in two steps. The first step simply assigns the 'device' space to all tensors. The second step then performs some minor optimizations to avoid unnecessary host-device transfers after bufferization. An end-to-end test case for the bufferization pipeline is added which was previously failing. Interestingly, the addition of the new pass results in better bufferization for cases where we can't detensorize `while` loops.

--
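The two-step scheme described above can be modeled with a toy sketch. The types and the host-usage criterion below are assumptions made for illustration; the real pass operates on MLIR tensor types and op traits, not on this struct:

```cpp
#include <cassert>
#include <optional>
#include <vector>

enum class MemorySpace { host, device };

// Toy stand-in for a tensor type whose encoding may be absent.
struct TensorInfo {
  std::optional<MemorySpace> encoding; // absent == no explicit space yet
  bool usedOnlyOnHost = false;         // assumed analysis result
};

void assignMemorySpaces(std::vector<TensorInfo> &tensors) {
  // Step 1: make every encoding explicit, defaulting to the 'device' space,
  // so later bufferization never has to guess.
  for (TensorInfo &t : tensors)
    if (!t.encoding)
      t.encoding = MemorySpace::device;
  // Step 2: minor optimization pass — tensors touched only by host-side ops
  // are flipped to 'host' to avoid needless host-device transfers.
  for (TensorInfo &t : tensors)
    if (t.usedOnlyOnHost)
      t.encoding = MemorySpace::host;
}
```

Because every tensor leaves the pass with an explicit encoding, downstream bufferization no longer depends on default-space inference across `tensor.cast` boundaries.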
1be85552b63d7a06d9e27b7d9bd57e2e62589356 by Christopher Bate [email protected]:
[compiler] Add additional IR printing flags to the compiler API

Exposes `-mlir-elide-elementsattrs-if-larger` and `-mlir-elide-resource-strings-if-larger` to the compiler API via the DebugOptions provider.

--
cfc7a93555f2778711610468b2f56e9f2fb28148 by Chris Bate [email protected]:
[tensorrt] Make the TensorRT builder logger global

This change makes the TensorRT builder logger a global singleton. This change has been made due to an issue with the TRT API: the logger does not actually get associated with the TRT object created via `createInferBuilder` or `createInferRuntime`. Instead, the TRT API maintains a global logger that is populated with a push/pop mechanism, except the stack size can only be 1. So if the Builder object's lifetime overlaps the Runtime object's lifetime, the Runtime will ignore the logger passed to `createInferRuntime` and instead use the global logger populated by `createInferBuilder`. This is clearly problematic if the API user (e.g. the MLIR-TRT compiler) thinks they can destroy the logger used to create the Builder object once the Builder object is destroyed. In such a case, the Runtime will still try to use the builder's logger, which results in a use-after-free.

To reduce the likelihood of this occurring, we make the builder logger a global singleton. Note, however, that we cannot guarantee that the logger is not destroyed when the compiler library is closed via `dlclose`, and we cannot make any guarantees about global shutdown order.

--
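A common way to realize such a global singleton is a function-local static (the "Meyers singleton"); the sketch below is a minimal illustration with an assumed `Logger` class, not the actual TensorRT `ILogger` implementation. The instance is constructed on first use and lives until program exit, so every TRT object created through the wrapper observes the same long-lived logger regardless of construction or destruction order:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative logger; the real code would implement nvinfer1::ILogger.
class Logger {
public:
  std::vector<std::string> messages;
  void log(const std::string &msg) { messages.push_back(msg); }
};

// Meyers singleton: thread-safe initialization on first call (C++11),
// and the instance is never freed while the process is running.
Logger &getGlobalLogger() {
  static Logger instance;
  return instance;
}
```

As the commit notes, even this pattern cannot protect against the library being unloaded via `dlclose` or against adverse global shutdown ordering; it only removes the per-Builder logger lifetime hazard.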
4f215aa9513d7751572cd83d8575f251c5a61edb by Christopher Bate [email protected]:
NFC: Fix two compiler warnings
--
6f4cc4ad0eb6a4c554e5e36fc23a4b3b56ba607e by Christopher Bate [email protected]:
NFC: update test runner configs to take into account system memory
This change updates the LIT parallelism settings so that the number of parallel test workers is chosen based on available system memory.
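One plausible heuristic for such a setting is sketched below; the 4 GiB-per-worker budget and the function name are assumptions for illustration, not the values used by the actual lit configuration:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Cap the lit worker count so that each worker has at least a fixed memory
// budget available, while never exceeding the hardware thread count and
// always allowing at least one worker.
int computeLitWorkers(int hardwareThreads, std::int64_t totalMemBytes,
                      std::int64_t bytesPerWorker = 4LL << 30 /* 4 GiB */) {
  std::int64_t byMem = totalMemBytes / bytesPerWorker;
  std::int64_t workers = std::min<std::int64_t>(hardwareThreads, byMem);
  return static_cast<int>(std::max<std::int64_t>(1, workers));
}
```

On a machine with many cores but little RAM this yields fewer workers than the core count, which keeps memory-hungry compilation tests from thrashing.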
GitOrigin-RevId: 6f4cc4ad0eb6a4c554e5e36fc23a4b3b56ba607e