## Overview
This PR overhauls the support for building accelerators from `conifer` models. There are two main aspects:
### XRT drivers
Since its `v3.1` release, `pynq` is deprecating support for Alveo platforms. I explored XRT's provided Python bindings via `pyxrt`, and made some working implementations. However, `pyxrt` doesn't expose XRT's API for reading and writing registers. This functionality is quite useful for the quality-of-life features of the drivers, since the IPs produced with `conifer` contain some high-level information which needs to be retrieved via registers. For that reason this PR provides a native XRT C++ driver implementation, with Python bindings via `pybind11`. Happily, this also turns out to be marginally faster than `pyxrt`.
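For a flavour of the approach, here is a minimal sketch of binding XRT's native register access into Python with `pybind11`. It is not the actual `conifer-xrt` driver: the class name, module name, kernel name `my_ip`, and constructor arguments are placeholders, and the `experimental/xrt_ip.h` header path can vary between XRT versions.

```cpp
// Sketch only: a tiny XRT driver exposing register reads/writes to Python.
#include <pybind11/pybind11.h>
#include <xrt/xrt_device.h>
#include <experimental/xrt_ip.h>  // header location varies by XRT version

namespace py = pybind11;

class Driver {
public:
  Driver(const std::string& xclbin_path, unsigned device_index)
      : device_(device_index),
        uuid_(device_.load_xclbin(xclbin_path)),
        ip_(device_, uuid_, "my_ip") {}  // "my_ip" is a placeholder kernel name

  // Register access via XRT's native C++ API; this is what pyxrt
  // doesn't expose, motivating the pybind11 binding.
  uint32_t read_register(uint32_t offset) { return ip_.read_register(offset); }
  void write_register(uint32_t offset, uint32_t value) {
    ip_.write_register(offset, value);
  }

private:
  xrt::device device_;
  xrt::uuid uuid_;
  xrt::ip ip_;
};

PYBIND11_MODULE(xrt_driver_sketch, m) {
  py::class_<Driver>(m, "Driver")
      .def(py::init<const std::string&, unsigned>())
      .def("read_register", &Driver::read_register)
      .def("write_register", &Driver::write_register);
}
```

From Python, the bound class would then be used along the lines of `Driver("accelerator.xclbin", 0).read_register(0x10)` to pull the IP's metadata out of its register space.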
These drivers are split into a standalone package within the repo named `conifer-xrt`. This enables installing the package via `pip`, without introducing a dependency on an XRT installation for all of `conifer`. XRT needs to be manually installed by users via AMD/Xilinx, and is only relevant for these accelerator designs.

I also add the nlohmann JSON library as a submodule, since the
`conifer-xrt` package needs to build against it at installation time (see the sketch below for a flavour of what that enables). In a follow-up PR I plan to make use of it for the C++ backend, relating to #87.

This PR solves #99 (in spirit).
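As a rough, hypothetical illustration of the nlohmann dependency — the file name and `n_trees` field below are placeholders, not `conifer`'s actual schema:

```cpp
// Sketch only: parsing a JSON model/config file with nlohmann JSON.
#include <fstream>
#include <iostream>
#include <nlohmann/json.hpp>

int main() {
  std::ifstream f("model.json");                    // placeholder file name
  nlohmann::json model = nlohmann::json::parse(f);  // parse the whole stream
  // Read a (hypothetical) field, with a default if it's absent
  int n_trees = model.value("n_trees", 0);
  std::cout << "trees: " << n_trees << "\n";
  return 0;
}
```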
### load/compute/store
This part relates to the Xilinx HLS backend top-level wrapper, which has been refactored to properly partition the load, compute, and store tasks following AMD/Xilinx guidelines. I've written up the implementation and observations, but the main plot is here:

*(plot)*
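For reference, here is a minimal sketch of the canonical load/compute/store structure from the Vitis HLS methodology — a dummy compute stage and placeholder types, not the actual `conifer` wrapper:

```cpp
// Sketch only: load/compute/store stages connected by streams so that
// the DATAFLOW pragma can run them concurrently.
#include <hls_stream.h>

typedef float T;
const int N = 1024;

static void load(const T* in, hls::stream<T>& s) {
  for (int i = 0; i < N; i++) {
    #pragma HLS pipeline II=1
    s.write(in[i]);
  }
}

static void compute(hls::stream<T>& in, hls::stream<T>& out) {
  for (int i = 0; i < N; i++) {
    #pragma HLS pipeline II=1
    out.write(in.read() * 2);  // stand-in for the tree-ensemble inference
  }
}

static void store(hls::stream<T>& s, T* out) {
  for (int i = 0; i < N; i++) {
    #pragma HLS pipeline II=1
    out[i] = s.read();
  }
}

extern "C" void top(const T* in, T* out) {
  #pragma HLS dataflow
  hls::stream<T> s_in("load_to_compute");
  hls::stream<T> s_out("compute_to_store");
  load(in, s_in);
  compute(s_in, s_out);
  store(s_out, out);
}
```

Splitting the stages this way lets reads from memory, inference, and writes back to memory overlap instead of serialising.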
Some attempts at this approach for the FPU backend only slowed things down, so I plan to come back to that.