Commit 53769d6

top/raster: add hardware-accelerated sprite/font/line drawing (merge #133)
There is a lot going on in this PR which I will summarize below, but to start out, a quick demo is now included in the `selftest` bitstream, video: https://www.youtube.com/watch?v=HYBjDtPL2xw

## Hardware-accelerated SoC drawing

Before this PR, only the scope traces from DSP RTL were hardware-accelerated. Any fonts, lines or other things drawn by the CPU `VexiiRiscv` were directly plotted to the framebuffer memory, without any acceleration. For most simple menus this is fine, however for some bitstreams (like `sid`) this was causing some lag, and I have some ideas for bitstreams that do a lot more CPU drawing.

This PR can increase the speed of such drawing ops from the CPU by over 10x. The net effect is that we can reduce the size of the CPU for the same drawing performance (turn off some features, reduce cache sizes), which gives us more space for DSP operations in bitstreams.

## Approach

- 3 new cores for accelerating draws are added: `plot.Peripheral`, `line.Peripheral` and `blit.Peripheral`, for drawing points, lines and spritesheets respectively. The SoC can directly (and asynchronously) perform these operations by writing to CSRs.
- A [fork of `embedded-graphics`](embedded-graphics/embedded-graphics@master...vk2seb:embedded-graphics:seb/fast-draw) (probably the most popular Rust embedded graphics lib) is slightly modified such that font ops and line draws call out to new (mandatory) HAL functions, which we implement in this PR in the display HAL `dma_framebuffer.rs`.
  - This means the user API surface of draws (usage of `embedded-graphics`) remains untouched and portable, we only modify the lower layers of `embedded-graphics` to support hardware acceleration.
- The entire `video`/`raster` pipeline has been reworked and re-documented, making it more obvious which components accomplish which functions. An excerpt from `plot.py`'s docstring:

```
Utilities for plotting pixels to a framebuffer.

Generally, all plotting operations to the framebuffer should go through
the ``FramebufferPlotter`` core in this file. This can be via direct pixel
writes from the SoC (using e.g. ``Peripheral`` here), through
hardware-accelerated SoC interfaces like those in ``line.py`` or ``blit.py``,
or indirectly through gateware-driven plotting requests like ``stroke.py``.

In general, a ``raster`` system may look like this:

.. code-block:: text

    SoC requests ────► [plot.Peripheral()] ─╮
              ╰──────► [line.Peripheral()] ─┼───► FramebufferPlotter ─► PSRAM
             ╰───────► [blit.Peripheral()] ─┤
                                            │
    RTL requests ────► [stroke.Stroke()]   ─╯

All pixel ``PlotRequest``s are streams, arbitered in a round-robin fashion by
the ``FramebufferPlotter`` into a single (internal) ``PlotRequest``, which is
fed into the ``_FramebufferBackend`` to perform blending (and handle screen
rotation). The resulting memory accesses go through a cache
``raster.cache.Cache``, before eventually issuing requests on the PSRAM bus.

If higher pixel throughput is needed, one can instantiate multiple hardware
accelerators and ``FramebufferPlotters`` on the same (shared) PSRAM bus, at
least as many as you want until you run out of memory bandwidth or FPGA
resources :). For the best performance, it makes sense to share
``FramebufferPlotter``s between components that want to draw to the same
part of the screen, to avoid cache thrashing.
```

- More such docstrings are also added in `line.py` and `blit.py`, as well as anywhere else that major changes were made.
- The cache flushing mechanism has also been reworked to use less area. In general we don't have a concept of cache coherency in the main framebuffer, as the phosphor simulation combined with cache flushing is visually enough to hide most artifacts that would be hidden by cache coherency management.

## Arbitrary rotation

A nice artifact of having a unified plotting backend is that we have a single place where screen rotation transformations take place. So, in `xbeam` I added an extra menu option to demo this, which you can use to rotate the entire screen in steps of 90 degrees. This even resizes correctly across different aspect ratios (e.g. portrait/landscape).

## Testing

Some interesting unit tests which simulate CSR writes and draws to some fake 'ASCII art' are added in `tests/test_raster.py`. These are automatically run in CI but can also be run with printouts locally from the `gateware` directory by running:

`pdm run python3 -m pytest tests/test_raster.py -srv`

Additionally, ALL bitstreams are updated to now include all these acceleration cores, so all of them are a bit faster. The `selftest` bitstream specifically adds a new feature on the last menu page, which lets you measure the framerate of the graphics hardware while saturating it with lines or text (this is the demo video I posted in the first sentence above^).

Overall, in `selftest`, I saw an FPS increase from ~10FPS to ~55FPS when not changing the CPU size. However, in this PR I also halved the RISCV cache size and disabled some features to save area, so it's currently about 35FPS in the selftest bitstream. Note that the framerate is not so comparable with that of a PC framerate, as the phosphor simulation and audio traces are always updated at 60FPS, it's only the CPU drawing ops that have their independent framerate that I am referring to here, as the two drawing (symphonies?) run asynchronously.
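
To illustrate the "Arbitrary rotation" point about having a single place for rotation transforms, here is a minimal Python sketch of a 90-degree-step coordinate mapping. It is not the gateware from `plot.py`: the names `rotated_size` and `rotate_point`, the clockwise step convention and the top-left origin are all assumptions made for illustration only.

```python
def rotated_size(width, height, steps):
    """Logical (width, height) seen by drawing code after `steps` quarter turns.

    `width` and `height` are the physical framebuffer dimensions. Odd step
    counts swap the two, which is how portrait/landscape resizing falls out.
    """
    return (width, height) if steps % 2 == 0 else (height, width)


def rotate_point(x, y, width, height, steps):
    """Map a logical pixel (x, y) to a physical framebuffer pixel.

    Hypothetical convention: `steps` counts clockwise quarter turns and the
    origin is the top-left corner of the physical framebuffer.
    """
    steps %= 4
    if steps == 0:
        return x, y
    if steps == 1:
        return width - 1 - y, x
    if steps == 2:
        return width - 1 - x, height - 1 - y
    return y, height - 1 - x


if __name__ == "__main__":
    # A 4x3 physical framebuffer viewed after one clockwise quarter turn.
    print(rotated_size(4, 3, 1))
    print(rotate_point(0, 0, 4, 3, 1))
```

Because every draw source would pass through one mapping like this, callers only ever see the logical size and never need to know the physical orientation.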
2 parents d9b1b10 + 5dd259d commit 53769d6

55 files changed: +3188 -13905 lines changed

gateware/src/rs/bootinfo_gen/Cargo.lock

Lines changed: 21 additions & 7 deletions

gateware/src/rs/hal/Cargo.toml

Lines changed: 3 additions & 1 deletion
@@ -22,12 +22,14 @@ embedded-hal-nb = "=1.0.0"
 log = { version = "0.4.*", optional = true }
 nb = "=1.1.0"
 riscv = { version = "=0.11.1", features = ["critical-section-single-hart"] }
-embedded-graphics = "0.8.1"
+embedded-graphics = { git = "https://github.com/vk2seb/embedded-graphics.git", branch = "seb/fast-draw" }
 bitflags = "2.6.0"
 micromath = "2.1.0"
 embedded-storage = "0.3.1"
 serde = { version="1.0.219", default-features=false }
 serde_derive = "1.0.219"
+strum_macros = "0.26.4"
+strum = {version = "0.25.0", features = ["derive"], default-features=false}

 [dev-dependencies]
 critical-section = { version = "1.1.2", features = ["std"] }
