@XanClic commented May 30, 2025

Summary of the PR

This PR adds support for an IOMMU, and thus for I/O virtual memory handling.

New Memory Trait: IoMemory

Handling I/O virtual memory requires a new interface to access guest memory:
GuestMemory does not allow specifying the required access permissions, which
is necessary when working with MMU-guarded memory.

We could add memory access methods with such a permissions parameter to
GuestMemory, but I prefer to provide a completely new trait instead. This
ensures that users only use the interface that actually works with
(potentially) I/O virtual memory, i.e.:

  • They must always specify the required permissions,
  • They cannot (easily) access the memory regions directly, because doing so
    generally assumes that regions are long and contiguous, and that any
    address in a given range will fall into the same memory region. This is
    no longer the case with virtual memory, which is heavily fragmented into
    pages.

That is, adding a new trait (IoMemory) allows us to catch a lot of potential
mistakes at compile time, which I feel is much better than finding out at
runtime that some place forgot to specify the access permissions.
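As an illustration only (all names here are hypothetical, not the PR's actual API), a toy version of a permissions-aware access interface might look like this:

```rust
// Hypothetical sketch; the PR's actual IoMemory trait differs in detail.

/// Access permissions a caller must declare (sketch).
#[derive(Clone, Copy, PartialEq, Debug)]
enum Permissions {
    Read,
    Write,
    ReadWrite,
}

impl Permissions {
    /// Whether a mapping granting `self` satisfies the `required` access.
    fn allows(self, required: Permissions) -> bool {
        matches!(
            (self, required),
            (Permissions::ReadWrite, _)
                | (Permissions::Read, Permissions::Read)
                | (Permissions::Write, Permissions::Write)
        )
    }
}

/// Sketch of a permissions-aware memory interface: every access states the
/// permissions it needs, so an MMU layer can reject mismatches.
trait IoMemorySketch {
    fn read_at(&self, addr: u64, buf: &mut [u8], perm: Permissions) -> Result<(), String>;
}

/// Toy backing store that only grants read access.
struct ReadOnlyMem(Vec<u8>);

impl IoMemorySketch for ReadOnlyMem {
    fn read_at(&self, addr: u64, buf: &mut [u8], perm: Permissions) -> Result<(), String> {
        if !Permissions::Read.allows(perm) {
            return Err("permission denied".to_string());
        }
        let start = addr as usize;
        let src = self
            .0
            .get(start..start + buf.len())
            .ok_or_else(|| "out of range".to_string())?;
        buf.copy_from_slice(src);
        Ok(())
    }
}

fn main() {
    let mem = ReadOnlyMem(vec![1, 2, 3, 4]);
    let mut buf = [0u8; 2];
    // Read with declared Read permission succeeds...
    assert!(mem.read_at(1, &mut buf, Permissions::Read).is_ok());
    assert_eq!(buf, [2, 3]);
    // ...while requesting Write permission on read-only memory fails.
    assert!(mem.read_at(0, &mut buf, Permissions::Write).is_err());
    println!("ok");
}
```

The point of the sketch is purely that the permissions parameter cannot be forgotten: any caller that omits it fails to compile.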

Unfortunately, this is an incompatible change, because we need to decide on a
single guest memory trait that we expect users to primarily use: we can only
have one blanket implementation of e.g. Bytes, and this PR changes that
blanket implementation to be on IoMemory instead of GuestMemory, because we
want to prefer IoMemory with its permissions-aware interface.

While this PR does provide a blanket implementation of IoMemory for all
GuestMemory types, Rust's trait resolution is not fully transitive here: a
blanket impl IoMemory for GuestMemory plus a blanket impl Bytes for IoMemory
does not implicitly give us an impl Bytes for GuestMemory.

What this means can be seen in virtio-queue (in vm-virtio): it uses trait bounds
like M: GuestMemory only, but then expects to be able to use the Bytes
trait. This is no longer possible; the trait bound must be extended to
M: GuestMemory + Bytes or replaced by M: IoMemory (the latter is what we
want).

Guest Address Type

Another consideration is that I originally planned to introduce new address
types. GuestAddress currently generally refers to a guest physical address
(GPA); but we now also need to deal with I/O virtual addresses (IOVAs), and an
IOMMU generally doesn’t translate those into GPAs, but into VMM user space
addresses (VUAs) instead, so there are now three kinds of addresses. Ideally,
each of these should get its own type; but I felt that:

  • This would require too many changes from our users, and
  • You don’t even know whether the address you use on an IoMemory object is an
    IOVA or a GPA. It depends on whether the IOMMU is enabled or not, which is
    generally a runtime question.

Therefore, I kept GuestAddress as the only type, and it may refer to any of
the three kinds of addresses (GPAs, IOVAs, VUAs).

Async Accesses

I was considering whether to also make memory accesses optionally async. The
vhost-user IOMMU implementation basically needs two vhost-user socket roundtrips
per IOTLB miss, which can make guest memory accesses quite slow. An async
implementation could allow mitigating that.

However, I decided against it (for now), because this would also require
extensive changes in all of our consuming crates to really be useful: Anything
that does a guest memory access should then be async.

I think if we want to add this functionality later, it should be possible in a
compatible manner.

Changes Necessary in Other Crates

vm-virtio

Implementation: https://gitlab.com/hreitz/vm-virtio/-/commits/iommu

As stated above, places that bind M: GuestMemory but expect the Bytes trait
to also be implemented need to be changed to M: GuestMemory + Bytes or
M: IoMemory. I opted for the latter approach, and basically replaced all
GuestMemory instances by IoMemory.

(That is what we want because dropping GuestMemory in favor of IoMemory
ensures that all vm-virtio crates can work with virtual memory.)

vhost

Implementation: https://gitlab.com/hreitz/vhost/-/commits/iommu

Here, the changes necessitated by updating vm-memory are quite marginal, and
have a similar cause; but instead of the Bytes trait, it is the
GuestAddressSpace trait that is required. The resolution is the same: switch
from requiring GuestMemory to IoMemory.

The rest of the commits concerns itself with implementing VhostUserIommu and
allowing users to choose to use IommuMemory<GuestMemoryMmap, VhostUserIommu>
instead of only GuestMemoryMmap.

virtiofsd (as one user)

Implementation: https://gitlab.com/hreitz/virtiofsd-rs/-/commits/iommu

This is an example of an actual user. Updating all crates to IOMMU-supporting
versions actually does not require any changes to the code, but enabling the
'iommu' feature does: This feature makes the vhost-user-backend crate require
the VhostUserBackend::Memory associated type (because associated type defaults
are not stable yet), so this single line of code must be added (which sets the
type to GuestMemoryMmap<BitmapMmapRegion>).
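Sketched with toy stand-in types (hypothetical names mirroring VhostUserBackend in vhost-user-backend and GuestMemoryMmap in vm-memory), the required one-liner amounts to naming the associated type:

```rust
// Toy stand-ins; hypothetical names mirroring the real traits.
trait GuestMemorySketch {}

#[derive(Default)]
struct GuestMemoryMmapSketch;
impl GuestMemorySketch for GuestMemoryMmapSketch {}

// With the 'iommu' feature, the backend trait gains an associated type.
// Since associated type defaults are unstable, implementors must spell it out.
trait VhostUserBackendSketch {
    type Memory: GuestMemorySketch;
}

struct MyBackend;

impl VhostUserBackendSketch for MyBackend {
    // The single line existing backends must add (sketch):
    type Memory = GuestMemoryMmapSketch;
}

/// Helper to show that the associated type is now statically known.
fn memory_type_name<B: VhostUserBackendSketch>() -> &'static str {
    std::any::type_name::<B::Memory>()
}

fn main() {
    assert!(memory_type_name::<MyBackend>().contains("GuestMemoryMmapSketch"));
    println!("ok");
}
```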

Actually enabling IOMMU support is then a bit more involved, as it requires
switching away from GuestMemoryMmap to IommuMemory again.

However, to me, this shows that end users working with concrete types do not
seem to be affected by the incompatible IoMemory change until they want to opt
in to it. That’s because GuestMemoryMmap implements both GuestMemory and
IoMemory (thanks to the blanket impl), so it can transparently be used wherever
the updated crates expect to see an IoMemory type.
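This compatibility property can be sketched with toy traits (hypothetical names; the real blanket impl is the PR's impl of IoMemory for all GuestMemory types):

```rust
// Toy stand-ins for GuestMemory and IoMemory (hypothetical names).
trait GuestMemorySketch {
    fn total_len(&self) -> usize;
}

trait IoMemorySketch {
    // The "new" interface; a permissions flag would go here in the real trait.
    fn accessible_len(&self, _need_write: bool) -> usize;
}

// Blanket impl mirroring the PR: plain guest memory is treated as an
// identity-mapped, fully accessible I/O memory.
impl<M: GuestMemorySketch> IoMemorySketch for M {
    fn accessible_len(&self, _need_write: bool) -> usize {
        self.total_len()
    }
}

// A concrete type akin to GuestMemoryMmap only implements the old trait...
struct MmapSketch(usize);
impl GuestMemorySketch for MmapSketch {
    fn total_len(&self) -> usize {
        self.0
    }
}

// ...but can still be passed to updated crates that bound on the new trait.
fn updated_crate_api<M: IoMemorySketch>(m: &M) -> usize {
    m.accessible_len(false)
}

fn main() {
    assert_eq!(updated_crate_api(&MmapSketch(4096)), 4096);
    println!("ok");
}
```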

Requirements

Before submitting your PR, please make sure you addressed the following
requirements:

  • All commits in this PR have Signed-Off-By trailers (with
    git commit -s), and the commit message has max 60 characters for the
    summary and max 75 characters for each description line.
  • All added/changed functionality has a corresponding unit/integration
    test.
  • All added/changed public-facing functionality has entries in the "Upcoming
    Release" section of CHANGELOG.md (if no such section exists, please create one).
  • Any newly added unsafe code is properly documented.

@XanClic (Author) commented May 30, 2025

I know why the code coverage CI check fails: I (purposefully) haven’t written unit tests for the new code yet.

Why the other tests failed, I don’t know; but I suspect it’s because I force-pushed an update (fixing the CHANGELOG.md link) while the tests were running, maybe SIGTERM-ing them. Looking at the timelines, they all failed (finished) between 16:27:02.985 and 16:27:02.995. (Except the coverage one, which is an actual failure.)

@XanClic (Author) commented May 30, 2025

Pushed an update without actual changes (just re-committing the top commit) to trigger a CI re-run. This time, only the coverage check failed (as expected).

XanClic added 6 commits July 10, 2025 13:27
With virtual memory, seemingly consecutive I/O virtual memory regions
may actually be fragmented across multiple pages in our userspace
mapping.  Existing `descriptor_utils::Reader::new()` (and `Writer`)
implementations (e.g. in virtiofsd or vm-virtio/virtio-queue) use
`GuestMemory::get_slice()` to turn guest memory address ranges into
valid slices in our address space; but with this fragmentation, it is
easily possible that a range no longer corresponds to a single slice.

To fix this, we can instead use `try_access()` to collect all slices,
but to do so, its region argument needs to have the correct lifetime so
we can collect the slices into a `Vec<_>` outside of the closure.

Signed-off-by: Hanna Czenczek <[email protected]>
read() and write() must not ignore the `count` parameter: The mappings
passed into the `try_access()` closure are only valid for up to `count`
bytes, not more.

Signed-off-by: Hanna Czenczek <[email protected]>
When we switch to a (potentially) virtual memory model, we want to
compact the interface, especially removing references to memory regions
because virtual memory is not just split into regions, but pages first.

The one memory-region-referencing part we are going to keep is
`try_access()` because that method is nicely structured around the
fragmentation we will have to accept when it comes to paged memory.

`to_region_addr()` in contrast does not even take a length argument, so
for virtual memory, using the returned region and address is unsafe if
doing so crosses page boundaries.

Therefore, switch `Bytes::load()` and `store()` from using
`to_region_addr()` to `try_access()`.

Signed-off-by: Hanna Czenczek <[email protected]>
The existing `GuestMemory` trait is insufficient for representing
virtual memory, as it does not allow specifying the required access
permissions.

Its focus on all guest memory implementations consisting of a relatively
small number of regions is also unsuited to paged virtual memory with a
potentially very large set of non-contiguous mappings.

The new `IoMemory` trait in contrast provides only a small number of
methods that keep the implementing type’s internal structure more
opaque, and every access needs to be accompanied by the required
permissions.

Signed-off-by: Hanna Czenczek <[email protected]>
Rust only allows us to give one trait the blanket implementations for
`Bytes` and `GuestAddressSpace`.

We want `IoMemory` to be our primary external interface because it has
users specify the access permissions they need, and because we can (and
do) provide a blanket `IoMemory` implementation for all `GuestMemory`
types.

Therefore, replace requirements of `GuestMemory` by `IoMemory` instead.

Signed-off-by: Hanna Czenczek <[email protected]>
The Iommu trait defines an interface for translating virtual addresses
into addresses in an underlying address space.

It is supposed to do so by internally keeping an instance of the Iotlb
type, updating it with mappings whenever necessary (e.g. when
actively invalidated or when there’s an access failure) from some
internal data source (e.g. for a vhost-user IOMMU, the data comes from
the vhost-user front-end by requesting an update).

In a later commit, we are going to provide an implementation of
`IoMemory` that can use an `Iommu` to provide an I/O virtual address
space.

Note that while I/O virtual memory in practice will be organized in
pages, the vhost-user specification makes no mention of a specific page
size or how to obtain it.  Therefore, we cannot really assume any page
size and have to use plain ranges with byte granularity as mappings
instead.

Signed-off-by: Hanna Czenczek <[email protected]>
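The byte-granularity mappings mentioned in this commit message could look roughly like the following toy IOTLB (hypothetical sketch; the PR's actual Iotlb type differs):

```rust
use std::collections::BTreeMap;

/// Toy byte-granularity IOTLB: maps IOVA ranges to target (e.g. VUA) base
/// addresses without assuming any page size. Hypothetical sketch only.
struct IotlbSketch {
    /// key: IOVA start; value: (length in bytes, target base address)
    mappings: BTreeMap<u64, (u64, u64)>,
}

impl IotlbSketch {
    fn new() -> Self {
        IotlbSketch {
            mappings: BTreeMap::new(),
        }
    }

    fn insert(&mut self, iova: u64, len: u64, target: u64) {
        self.mappings.insert(iova, (len, target));
    }

    /// Translate one IOVA; `None` models an IOTLB miss, on which the caller
    /// would fetch the mapping from its data source (e.g. the vhost-user
    /// front-end) and retry.
    fn translate(&self, iova: u64) -> Option<u64> {
        // Find the mapping with the greatest start address <= iova.
        let (&start, &(len, target)) = self.mappings.range(..=iova).next_back()?;
        if iova - start < len {
            Some(target + (iova - start))
        } else {
            None
        }
    }
}

fn main() {
    let mut tlb = IotlbSketch::new();
    // Map 0x7ff bytes at IOVA 0x1000 to target base 0x9000; note that the
    // odd, non-page-aligned length is allowed.
    tlb.insert(0x1000, 0x7ff, 0x9000);
    assert_eq!(tlb.translate(0x1000), Some(0x9000));
    assert_eq!(tlb.translate(0x17fe), Some(0x97fe));
    assert_eq!(tlb.translate(0x17ff), None); // one past the mapping: miss
    assert_eq!(tlb.translate(0x0fff), None); // before the mapping: miss
    println!("ok");
}
```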
@XanClic XanClic force-pushed the iommu branch 2 times, most recently from 5ee020b to 4104d5f on July 10, 2025 12:23
@XanClic XanClic changed the title from "[DRAFT] I/O virtual memory (IOMMU) support" to "I/O virtual memory (IOMMU) support" Jul 10, 2025
@XanClic XanClic marked this pull request as ready for review July 10, 2025 12:39
@XanClic (Author) commented Jul 10, 2025

Added tests for the new functionality, and rebased on the main branch.

@XanClic (Author) commented Jul 11, 2025

cc @germag

XanClic added 3 commits July 30, 2025 10:46
This new `IommuMemory` type provides an I/O virtual address space by adding
an IOMMU translation layer to an underlying `GuestMemory` object.

Signed-off-by: Hanna Czenczek <[email protected]>
The vhost-user-backend crate will need to be able to modify all existing
memory regions to use the VMM user address instead of the guest physical
address once the IOMMU feature is switched on, and vice versa.  To do
so, it needs to be able to modify regions’ base address.

Because `GuestMemoryMmap` stores regions wrapped in an `Arc<_>`, we
cannot mutate them after they have been put into the `GuestMemoryMmap`
object; and `MmapRegion` itself is by its nature not clonable.  So to
modify the regions’ base addresses, we need some way to create a new
`GuestRegionMmap` referencing the same `MmapRegion` as another one, but
with a different base address.

We can do that by having `GuestRegionMmap` wrap its `MmapRegion` in an
`Arc`, adding a method to return a reference to that `Arc`, and adding a
method to construct a `GuestRegionMmap` object from such a cloned `Arc`.

Signed-off-by: Hanna Czenczek <[email protected]>
Without an IOMMU, we have direct access to guest physical addresses
(GPAs).  In order to track our writes to guest memory (during
migration), we log them into dirty bitmaps, and a page's bit index is
its GPA divided by the page size.

With an IOMMU, however, we no longer know the GPA, instead we operate on
I/O virtual addresses (IOVAs) and VMM user-space addresses (VUAs).
Here, the dirty bitmap bit index is the IOVA divided by the page size.

`IoMemory` types contain an internal "physical" memory type that
operates on these VUAs (`IoMemory::PhysicalMemory`).  Any bitmap
functionality that this internal type may already have (as e.g.
`GuestMemoryMmap` does) cannot be used for dirty bitmap tracking with an
IOMMU, because it would use the VUA, but we need to use the IOVA, and
this information is not available on that lower layer.

Therefore, `IoMemory` itself needs to support bitmaps separately from
its inner `PhysicalMemory`, which will be used when the IOMMU is in use.
Add an associated `IoMemory::Bitmap` type and add a bitmap object to
`IommuMemory`.  Ensure that writes to memory dirty that bitmap
appropriately:
- In `try_access()`, if write access was requested, dirty the handled
  region of the bitmap after the access is done.
- In `get_slice()`, replace the `VolatileSlice`'s bitmap (which comes
  from the inner `PhysicalMemory`) by the correct slice of our IOVA
  bitmap before returning it.

Signed-off-by: Hanna Czenczek <[email protected]>
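The IOVA-space dirtying described in this commit can be illustrated with a toy bitmap (hypothetical sketch; the real code uses vm-memory's Bitmap trait and its page-size handling):

```rust
/// Toy dirty bitmap over the IOVA space (hypothetical; the PR uses
/// vm-memory's Bitmap trait). One bit per page, purely for illustration.
struct DirtyBitmapSketch {
    page_size: u64,
    bits: Vec<bool>,
}

impl DirtyBitmapSketch {
    fn new(pages: usize, page_size: u64) -> Self {
        DirtyBitmapSketch {
            page_size,
            bits: vec![false; pages],
        }
    }

    /// Mark every page touched by a write of `len` bytes at `iova`
    /// (`len` must be non-zero). This mirrors what a try_access()-style
    /// helper would do after a successful write access.
    fn mark_dirty(&mut self, iova: u64, len: u64) {
        let first = (iova / self.page_size) as usize;
        let last = ((iova + len - 1) / self.page_size) as usize;
        for bit in &mut self.bits[first..=last] {
            *bit = true;
        }
    }

    fn is_dirty(&self, page: usize) -> bool {
        self.bits[page]
    }
}

fn main() {
    let mut bitmap = DirtyBitmapSketch::new(4, 4096);
    // A 200-byte write at IOVA 4000 crosses the page 0 / page 1 boundary,
    // so both pages must be marked dirty.
    bitmap.mark_dirty(4000, 200);
    assert!(bitmap.is_dirty(0));
    assert!(bitmap.is_dirty(1));
    assert!(!bitmap.is_dirty(2));
    println!("ok");
}
```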
@XanClic (Author) commented Jul 30, 2025

Added support for bitmaps in I/O virtual address space (required for migration).

XanClic added 3 commits July 30, 2025 18:58
This commit also adds the iommu feature to the coverage_config feature
list.  (I left the aarch64 coverage value unchanged; I cannot find out
how to get the current value on my system, and it isn’t included in CI.)

Signed-off-by: Hanna Czenczek <[email protected]>
Document in DESIGN.md how I/O virtual memory is handled.

Signed-off-by: Hanna Czenczek <[email protected]>
/// the I/O virtual address space that `IommuMemory` provides into that underlying physical address
/// space.
#[derive(Debug, Default)]
pub struct IommuMemory<M: GuestMemory, I: Iommu> {
Member:

I am not sure I understand methods such as inner_replaced. Maybe what you want is a GuestAddressSpace here, and physical_memory() returns a GuestAddressSpace::T?

Maybe with that change you don't need the Arc<I> anymore, but you can just use &I?

Author:

As discussed, yep, I’ll try making it a GuestAddressSpace instead to have the user handle the physical memory object management.

(And I’ll see whether I can make it I instead of Arc<I>.)

Member:

Yep, something like IommuMemory<M: GuestAddressSpace, I: Deref<Target = Iommu>> would do. Then the Clone implementation can be conditional on having Clone on both M and I.

The main disadvantage would be an extra clone() in try_access(). But it could be optimized with a method like

    fn snapshot(&self) -> IommuMemory<&M::M, &I> {
        IommuMemory {
            inner: &*self.inner.memory(),
            iommu: &*self.iommu,
            enabled_iommu: self.enabled_iommu,
        }
    }

Author:

After trying to implement it, I’m not sure how to make it work with the slice lifetimes. Both try_access() and get_slice() return[1] objects with the same lifetime as self, which only works if self owns the memory. With GuestAddressSpace, the lifetime would be limited to the memory() snapshot we can take.

The simplest way to do this would be for the caller to take that snapshot and pass it into get_slice()/try_access(). At that point, I suppose we could completely remove any reference to the GuestMemory from IommuMemory altogether. I wouldn’t like that very much, though.

Besides my personal preferences, I also don’t know how to make it work at least with users like descriptor_utils::Reader (and Writer): That new() function collects slices in memory[2], and returns those in an object whose lifetime is bound to that of its memory parameter. If we used GuestAddressSpace for the physical memory, someone in descriptor_utils would need to create a snapshot, store it somewhere, and the slices’ lifetimes would be bound to that snapshot’s lifetime. I don’t know how to express that, other than to have the Reader::new() caller in turn create that snapshot and pass it into that function. That may indeed be doable, but it doesn’t really feel worth it to me…

So at this point, I’d prefer for IommuMemory to own the inner physical memory, and consider IoMemory a snapshot like GuestMemory is a snapshot.


[1] In case of try_access(), “return” means passing it to the closure.

[2] Note that currently it uses GuestMemory::find_region() + GuestMemoryRegion::get_slice() – that will need to be changed into an IoMemory::try_access(|_, len, region_addr, region| region.get_slice(region_addr, len)), basically.

@@ -87,6 +93,8 @@ impl std::ops::BitAnd for Permissions {
pub trait IoMemory {
/// Underlying `GuestMemory` type.
type PhysicalMemory: GuestMemory;
/// Dirty bitmap type for tracking writes to the IOVA address space.
type Bitmap: Bitmap;
Member:

I am not sure I like the design here. The IOVA address space, being virtual, can have aliases. Could you still store the bitmap in the low-level memory region, but with accesses going through IOMMU translation?

What does vhost-user require? Are dirty bitmap accesses done in IOVA or GPA space?

Author:

Are dirty bitmap accesses done in IOVA or GPA space?

They’re done in IOVA space, which is why I think we need the bitmap on this level.

(If we continued to store it in e.g. GuestRegionMmap/MmapRegion, it would need a reverse translation, and then keep a mapping of address ranges to bitmap slices instead of just a single bitmap slice linearly covering the entire region.)

Member @bonzini commented Aug 12, 2025:

They’re done in IOVA space, which is why I think we need the bitmap on this level.

I see. Maybe you could add a bitmap::BS<'a, Self::Bitmap> as another argument that try_access() passes back to the callback? And then IommuMemory::get_slice() can pass that argument to replace_bitmap().

This way, the Iommu controls whether dirty tracking is done in IOVA or GPA space.

Author:

You mean putting the responsibility on the try_access() callback to dirty a bitmap slice given to it if it has written anything?

I’m not sure, that does sound more than reasonable, but would require more changes in the callers than just to add the Permissions flag. 🙂

Member:

You mean putting the responsibility on the try_access() callback to dirty a bitmap slice given to it if it has written anything?

No, I confused try_access and get_slice, sorry. For IoMemory, dirtying is already done in try_access itself; for Iommu, the slice could be returned by translate, that is, it would be part of the Iotlb entry?

Author:

I’m sorry, I don’t quite follow; currently, dirtying is done only by IommuMemory (in its IoMemory implementation), neither IoMemory nor Iommu.

<IommuMemory as IoMemory>::try_access() dirties the bitmap itself, right.

<IommuMemory as IoMemory>::get_slice() has the VolatileSlice do the work. For this, it replaces the VolatileSlice’s internal bitmap slice (which is initially set by the GuestMemoryRegion) by the right slice in the IOVA space.

I don’t think the IOMMU object should do any of this, and I don’t think it should return slices (if I understand correctly). I think it should only do the translation and not care about the actual memory.

Member:

::get_slice() has the VolatileSlice do the work. For this, it replaces the VolatileSlice’s internal bitmap slice (which is initially set by the GuestMemoryRegion) by the right slice in the IOVA space.

To clarify, I was saying it should not be the IommuMemory that decides the address space used by the bitmap - whether IOVA space as vhost wants or GPA space as everyone else (?) wants. I thought Iommu would return the slice in the Iotlb object, but I agree that probably Iommu isn't the right place either.

That said I don't care much, because as long as IommuMemory is hidden behind a feature, it's just a convenience or a sample implementation (and QEMU probably will not use IommuMemory, only IoMemory). Maybe just add a comment that says "NOTE: IommuMemory can only be used if the dirty bitmap tracks accesses in IOVA space".

Author:

Maybe just add a comment that says "NOTE: IommuMemory can only be used if the dirty bitmap tracks accesses in IOVA space".

To be precise, it depends on the IOMMU enabled setting: With it enabled, it’s IOVA space, otherwise it’s whatever the address space of the underlying GuestMemory is (for vhost, that would be VUA).

I could speculate on how it works for non-vhost (specifically in-hypervisor) cases; maybe the underlying memory will use GPA in that case, and therefore we can (and have to) track dirty accesses in GPA then. To make that work, we would only need a separate flag to control whether the IOVA dirty bitmap is supposed to be used or not.

But I think the best would be for me not to speculate and just leave this as it is – if such a separate flag would be enough, we can just add it later. For the time being, as you suggest, a comment that specifies the dirty bitmap behavior should be enough.

Member @bonzini left a comment:

Overall this is good stuff. It also has a lot of overlap with the experimental changes that were necessary in order to use vm-memory in QEMU, which is a very good thing.

I have two main questions:

  • do we really need an IoMemory, or can we change GuestMemory?
  • if we need an IoMemory, should the PhysicalMemory: GuestMemory type
    instead be an AS: GuestAddressSpace (and likewise for physical_memory())?

I'm marking this as "request changes" because there would be changes either way.

Member @bonzini left a comment:

Please extract the first three commits, up to "Bytes: Do not use to_region_addr()" in a separate PR.

@roypat (Member) commented Aug 11, 2025

+1 to changing GuestMemory. Could we maybe parameterize it by the address space? E.g. GuestMemory<GPA> and GuestMemory<IOVA> (and then indeed introduce separate guest address types for guest physical and iommu addresses)?

@bonzini (Member) commented Aug 11, 2025

and then indeed introduce separate guest address types for guest physical and iommu addresses

No, I don't think that is a good idea. IOVA and GPA are the same concept, though in different address spaces.

My hope when leaving the review was that no changes are needed outside vm-memory, and the interpretation of GuestAddress can be left to whoever creates the GuestMemory.

@XanClic (Author) commented Aug 12, 2025

Could we maybe parameterize it by the address space? E.g. GuestMemory<GPA> and GuestMemory<IOVA> (and then indeed introduce separate guest address types for guest physical and iommu addresses)?

The main practical problem with using different types (as stated in the opening comment) is that users generally don’t even know whether a given address is a GPA or an IOVA. For example, the addresses you get from vrings and vring descriptors can be either; it just depends on whether the IOMMU is enabled or not. Users should be able to use the same code whether it is on or off, which (I think) wouldn’t really be possible with different types.

Whether to add IoMemory or change GuestMemory

I hope Paolo will add a comment to that effect (we just had a talk); we agreed that adding IoMemory is OK as long as it is guarded under the new iommu feature.

I don’t like keeping GuestMemory because I think actually none of its current methods will work with virtual memory, and so I would prefer adding a new interface that makes it clear that these aren’t available at compile time.

(Paolo (maybe half-jokingly) suggested making the implementation of that a linking-time error by accessing undefined references in a hypothetical impl GuestMemory for IommuMemory implementation, which is fun, but I think just guarding the new trait under the iommu feature makes more sense.)

@bonzini (Member) commented Aug 12, 2025

I don’t like keeping GuestMemory because I think actually none of its current methods will work with virtual memory, and so I would prefer adding a new interface that makes it clear that these aren’t available at compile time.

Yes, GuestMemory is a nice interface for implementation but clients should switch to IoMemory.

@roypat (Member) commented Aug 12, 2025

I don’t like keeping GuestMemory because I think actually none of its current methods will work with virtual memory, and so I would prefer adding a new interface that makes it clear that these aren’t available at compile time.

Yes, GuestMemory is a nice interface for implementation but clients should switch to IoMemory.

Different question about this: Do we really need the IoMemory trait, or could consumers just switch to IommuMemory<M: GuestMemory, I: Iommu> (e.g. be parametric by M and I instead of T: IoMemory, and for "no iommu" systems we can have a no-op implementation of Iommu)? That would also side-step all the problems of conflicting default implementations (although I'm not sure if the compiler will be clever enough to optimize the no-op iommu setup as well as with the special impl of IoMemory for GuestMemory :/)

@XanClic (Author) commented Aug 12, 2025

could consumers just switch to IommuMemory<M: GuestMemory, I: Iommu> (e.g. be parametric by M and I instead of T: IoMemory, and for "no iommu" systems we can have a no-op implementation of Iommu)?

Doesn’t sound bad, but would be a more “radical change”: Intermediate users like virtio-queue currently use memory types via M: GuestMemory. Converting them to M: IoMemory is straightforward and allows their users to continue to use non-virtual memory types like GuestMemoryMmap with no change needed if they don’t need IOMMU functionality.

If we change patterns like foo<M: Deref<Target: GuestMemory>>(mem: M) to foo<M: Deref<Target = IommuMemory>>(mem: M), all users will need to be changed to use IommuMemory.

@roypat (Member) commented Aug 12, 2025

could consumers just switch to IommuMemory<M: GuestMemory, I: Iommu> (e.g. be parametric by M and I instead of T: IoMemory, and for "no iommu" systems we can have a no-op implementation of Iommu)?

Doesn’t sound bad, but would be a more “radical change”: Intermediate users like virtio-queue currently use memory types via M: GuestMemory. Converting them to M: IoMemory is straightforward and allows their users to continue to use non-virtual memory types like GuestMemoryMmap with no change needed if they don’t need IOMMU functionality.

If we change patterns like foo<M: Deref<Target: GuestMemory>>(mem: M) to foo<M: Deref<Target = IommuMemory>>(mem: M), all users will need to be changed to use IommuMemory.

Mh, potentially this wouldn't be that much churn, because we could have impl<M: GuestMemory> From<M> for IommuMemory<M, NoIommu> { ... }, and then users of, say, virtio-queue would only have to do a &(...).into() when passing memory. And having less traits in vm-memory would be a win for maintainability imo.

Somewhat related question, since I'm seeing the Deref bound there, if M becomes GuestAddressSpace, and the iommu is stored as I: Deref<Target=...> as outlined in #327 (comment), will there still be a need to put the IommuMemory behind a Deref itself?

@bonzini (Member) commented Aug 13, 2025

And having less traits in vm-memory would be a win for maintainability imo.

In theory yes; in practice, it appears that some design decisions made by IommuMemory are good for vhost but not for VMMs.

In general, vm-memory uses traits because VMMs may have their own memory backends. Considering that QEMU is able to use vm-memory designed 5 years ago almost-unmodified, even without IOMMU support (and hopefully, unmodified once there is IoMemory), this worked out well and I'd rather keep it.

@roypat (Member) commented Aug 13, 2025

And having less traits in vm-memory would be a win for maintainability imo.

In theory yes; in practice, it appears that some design decisions made by IommuMemory are good for vhost but not for VMMs.

In general, vm-memory uses traits because VMMs may have their own memory backends. Considering that QEMU is able to use vm-memory designed 5 years ago almost-unmodified, even without IOMMU support (and hopefully, unmodified once there is IoMemory), this worked out well and I'd rather keep it.

My concern mainly stems from feedback like this, where the existing traits were already a big struggle for people trying to use vm-memory. That said, I'm not fundamentally opposed to adding a IoMemory trait, I'm mostly not a fan of the blanket impls. Let me bother you with some more clarifying questions:

  • Currently, we have impl<M: GuestMemory> IoMemory for M, so from that I'll assume that implementing both IoMemory and GuestMemory for the same type is not a use case (because with the blanket impl, it's not possible). Does that sound right?
  • If so, then the blanket impl is for compat reasons (e.g. so we can pass a M: GuestMemory where a M: IoMemory is requested)?

In this case, IMO it'd be less complex to have the compat layer be a new type instead of a blanket impl: e.g. add struct Compat<M: GuestMemory>(M) with impl<M: GuestMemory> IoMemory for Compat<M>. Crates upgrading M: GuestMemory bounds to M: IoMemory bounds could even provide this shim themselves, by keeping the old functions that take M: GuestMemory around and internally dispatching to the new versions taking IoMemory. It'd also allow us to solve the Bytes dilemma of "implementing for IoMemory or GuestMemory": we could instead make Bytes a supertrait of both, and then, instead of blanket impls, provide macros that generate a Bytes impl for specific types in terms of GuestMemory/IoMemory (yeah, conceptually weird to implement a trait in terms of a subtrait, but it works). This would mean that people implementing their own backend have to add one additional line of code, while code that is generic over GuestMemory/IoMemory can continue using the Bytes impl as today, thanks to the supertrait bound. (Just moving the blanket impl over to IoMemory, in contrast, would actually force some code changes in Firecracker: we're using GuestMemoryMmap, which does not/cannot implement IoMemory, so it would lose its Bytes impl.)
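The suggested newtype can be sketched with toy traits (hypothetical names and methods; the real GuestMemory/IoMemory are much richer):

```rust
// Toy stand-ins for the real traits (hypothetical names and methods).
trait GuestMemorySketch {
    fn read_byte(&self, addr: u64) -> u8;
}

trait IoMemorySketch {
    fn read_byte_checked(&self, addr: u64, want_write: bool) -> Result<u8, ()>;
}

/// Explicit compat shim instead of a blanket impl: wrapping any
/// `GuestMemorySketch` yields an `IoMemorySketch` view of it.
struct Compat<M: GuestMemorySketch>(M);

impl<M: GuestMemorySketch> IoMemorySketch for Compat<M> {
    fn read_byte_checked(&self, addr: u64, _want_write: bool) -> Result<u8, ()> {
        // Plain guest memory: identity mapping, all permissions granted.
        Ok(self.0.read_byte(addr))
    }
}

struct Mem(Vec<u8>);
impl GuestMemorySketch for Mem {
    fn read_byte(&self, addr: u64) -> u8 {
        self.0[addr as usize]
    }
}

fn main() {
    let wrapped = Compat(Mem(vec![7, 8]));
    assert_eq!(wrapped.read_byte_checked(1, false), Ok(8));
    println!("ok");
}
```

Because the shim is a distinct type, a concrete memory type remains free to implement either trait (or, via the wrapper, effectively both) without coherence conflicts.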

Paolo, do you have any pointers towards code for how QEMU would like to use this, so it's a bit easier for me to wrap my head around everything? git grep-ing for "vm_memory" in QEMU doesn't show anything (but probably am missing something) :(

@XanClic (Author) commented Aug 13, 2025

Trait confusion

My concern mainly stems from feedback like rust-vmm/vm-virtio#278 (comment), where the existing traits were already a big struggle for people trying to use vm-memory.

In my very humble opinion, especially for the bitmaps, that’s not least because of the trait names. Intuitively, it is absolutely unclear what differentiates Bitmap, BitmapSlice, WithBitmapSlice, and NewBitmap. Or what GuestAddressSpace is supposed to be.

In contrast, GuestMemory as a trait always made intuitive sense to me because it is, well, a representation of the guest memory space. There can be different implementations, so it’s a trait, I get that. IoMemory by extension represents the same thing, but only the device-accessible (i.e. I/O-accessible) portion. (We could call it DeviceMemory if that makes it clearer.)

Blanket impl of IoMemory for GuestMemory

IoMemory is a stricter interface for users that don’t care how the (device) memory is organized, and it requires specifying the required permissions for memory accesses. It is a generalization of GuestMemory, and always represents a subset of the latter (which is reflected by there always being an underlying GuestMemory) with an opaque internal mapping, possibly strongly fragmented.

It follows from this that any GuestMemory is an IoMemory (using a complete identity mapping), and should be usable via the IoMemory interface, hence the blanket implementation.

Having it for compat reasons is of course very nice, too, because I want this MR to incur as little change in users as possible, but it is in fact just the right thing to do.
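The blanket impl in question, sketched here with simplified stand-in traits (the real trait methods in the PR differ):

```rust
trait GuestMemory {
    fn read_byte(&self, addr: u64) -> Option<u8>;
}

trait IoMemory {
    fn read_byte(&self, addr: u64, write: bool) -> Option<u8>;
}

// Every GuestMemory is an IoMemory via the complete identity mapping;
// `?Sized` keeps the impl applicable to `dyn GuestMemory` and to
// generic bounds with `?Sized` as well.
impl<M: GuestMemory + ?Sized> IoMemory for M {
    fn read_byte(&self, addr: u64, _write: bool) -> Option<u8> {
        GuestMemory::read_byte(self, addr)
    }
}

// Toy backend implementing only GuestMemory.
struct VecMem(Vec<u8>);

impl GuestMemory for VecMem {
    fn read_byte(&self, addr: u64) -> Option<u8> {
        self.0.get(addr as usize).copied()
    }
}

fn main() {
    let m = VecMem(vec![7]);
    // VecMem never implemented IoMemory itself, yet is usable as one.
    assert_eq!(IoMemory::read_byte(&m, 0, false), Some(7));
}
```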

Implementing both GuestMemory and IoMemory

I don’t see a reason to explicitly implement both GuestMemory and IoMemory for any type. This would imply that you have a GuestMemory but you don’t want to use the complete identity mapping for IoMemory accesses, i.e. accessing the memory via IoMemory should yield different results than accessing it via GuestMemory.

I would find such behavior quite confusing, and, I'd argue, not in line with what these traits are supposed to be (i.e. IoMemory as a generalization of GuestMemory). Such a mapping is instead supposed to be represented by a wrapper type (like IommuMemory). So I think it's very good to prohibit explicit implementations of both GuestMemory and IoMemory on the same type.
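Such a wrapper might be sketched as follows. This is hypothetical, with a simple page-granular translation map; the actual IommuMemory in this PR is more involved (permissions, TLB-style lookup, etc.):

```rust
use std::collections::BTreeMap;

/// Hypothetical IOMMU wrapper: translates an I/O virtual address
/// through a page-granular map before touching the backend.
struct IommuMemory<M> {
    backend: M,
    /// IOVA page -> guest-physical page
    map: BTreeMap<u64, u64>,
    page_size: u64,
}

impl<M> IommuMemory<M> {
    /// Translate one IOVA; None means the page is unmapped.
    fn translate(&self, iova: u64) -> Option<u64> {
        let offset = iova % self.page_size;
        self.map.get(&(iova - offset)).map(|pa| pa + offset)
    }
}

fn main() {
    let mem = IommuMemory {
        backend: (), // stand-in; would be some GuestMemory
        map: BTreeMap::from([(0x1000, 0x8000)]),
        page_size: 0x1000,
    };
    assert_eq!(mem.translate(0x1004), Some(0x8004));
    assert_eq!(mem.translate(0x2000), None); // unmapped page
    let _ = mem.backend;
}
```

Because the mapping lives in the wrapper, the underlying GuestMemory keeps its plain identity-mapped semantics, which is exactly the separation argued for above.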

Bytes dilemma

As far as I’m aware, there won’t be a Bytes problem anymore once I change the blanket impl of IoMemory for GuestMemory to GuestMemory + ?Sized.

just moving it over to IoMemory would actually force some code changes in Firecracker, because we're using GuestMemoryMmap, which does not/cannot implement IoMemory, so would lose its Bytes impl.

So does virtiofsd, but that actually does not require any changes (in the current set of patches), unless you want to use an IOMMU. Thanks to the blanket impl, GuestMemoryMmap implements both GuestMemory and IoMemory.
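On a concrete type, the chain of blanket impls does compose, which is why GuestMemoryMmap keeps its Bytes impl. A toy sketch (trait bodies reduced to a single method each; not the real signatures):

```rust
trait GuestMemory {
    fn size(&self) -> usize;
}
trait IoMemory {
    fn size(&self) -> usize;
}
trait Bytes {
    fn byte_len(&self) -> usize;
}

// GuestMemory -> IoMemory (identity mapping), `?Sized` included.
impl<M: GuestMemory + ?Sized> IoMemory for M {
    fn size(&self) -> usize {
        GuestMemory::size(self)
    }
}

// Bytes is implemented once, for IoMemory.
impl<M: IoMemory> Bytes for M {
    fn byte_len(&self) -> usize {
        IoMemory::size(self)
    }
}

// Stand-in for GuestMemoryMmap: implements only GuestMemory...
struct Mmap;
impl GuestMemory for Mmap {
    fn size(&self) -> usize {
        42
    }
}

fn main() {
    // ...yet gets Bytes through GuestMemory -> IoMemory -> Bytes.
    assert_eq!(Mmap.byte_len(), 42);
}
```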

@roypat (Member) commented Aug 13, 2025

I see, thanks for taking the time to explain all of this! I was not aware that M: GuestMemory would transitively inherit the Bytes impl from the blanket impl for IoMemory. I think I follow now; at least in my mind it all makes sense now :). Could I bother you to copy some of your excellent explanations into code comments? E.g. about how the Bytes impl is transitive, and why we have the blanket impl of IoMemory for GuestMemory? :o

One thing I just realized is that we can use this entire infrastructure to clean up the Xen stuff, by re-framing it as IoMemory that disallows access to the underlying GuestMemory implementation, which would be most joyous (in this light, maybe we should think of a name other than IoMemory. In my mind right now, I'm thinking of today's GuestMemory as GuestMemoryBackend and IoMemory as GuestMemory, but I suppose a rename of GuestMemory won't fly. But maybe something along those lines for IoMemory though?)

Comment on lines +155 to +156
<M as GuestMemory>::address_in_range(self, addr)
&& <M as GuestMemory>::address_in_range(self, GuestAddress(end))
Review comment (Member):
Mh, what if there's a hole in the physical address range (MMIO gap for example)? I think this might need to be some try_access magic (just .get_slice() won't cut it in case of the consecutive memory being split into different memslots I think?)

Reply (Author):

Sure! I think most callers didn’t care so far (which is why GuestMemory only has address_in_range(), callers just used to test the end address), but it would definitely be better to do it right.
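A hole-safe check has to walk every region covering the range rather than only testing the two endpoints, roughly like this (illustrative only; the real code would go through vm-memory's region/try_access machinery rather than bare (start, len) pairs):

```rust
/// Regions as (start, len) pairs. Returns true only if every byte of
/// [addr, addr + len) lies in some region, so a hole in the middle
/// (e.g. an MMIO gap) is caught even when both endpoints are valid.
fn range_fully_backed(regions: &[(u64, u64)], mut addr: u64, mut len: u64) -> bool {
    while len > 0 {
        match regions
            .iter()
            .find(|&&(start, rlen)| addr >= start && addr < start + rlen)
        {
            Some(&(start, rlen)) => {
                // Consume as much of the range as this region covers,
                // then continue at the next byte after it.
                let step = ((start + rlen) - addr).min(len);
                addr += step;
                len -= step;
            }
            None => return false, // hit a hole
        }
    }
    true
}

fn main() {
    // Two regions with a hole at [0x100, 0x200).
    let regions = [(0x000, 0x100), (0x200, 0x100)];
    assert!(range_fully_backed(&regions, 0x0, 0x100));
    // Both endpoints are valid, but the range crosses the hole:
    assert!(!range_fully_backed(&regions, 0x080, 0x200));
}
```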

@XanClic (Author) commented Aug 13, 2025

Could I bother you to copy some of your excellent explanations into code comments? E.g. about how the Bytes impl is transitive, and why we have the blanket impl of IoMemory for GuestMemory?

Absolutely!

@bonzini (Member) commented Aug 14, 2025

in this light, maybe we should think of a name other than IoMemory. In my mind right now, I'm thinking of today's GuestMemory as GuestMemoryBackend and IoMemory as GuestMemory, but I suppose a rename of GuestMemory won't fly.

Yeah, that would be nice... and maybe we can do it?!? Almost all users use GuestMemoryMmap instead of GuestMemory, and because most access goes through Bytes anyway, the only thing to change would be an occasional use statement to add GuestMemoryBackend (to access methods like to_region_addr etc.).

For now I'd say the two main contenders are IoMemory and GuestMemoryView. But a big inversion is actually quite appealing as the final change towards 1.0 - in that case I'd keep IoMemory and focus later on the feasibility of using GuestMemory for @XanClic's new trait.

@bonzini (Member) commented Aug 14, 2025

crates upgrading M: GuestMemory bounds to M: IoMemory bounds could even provide this shim themselves by keeping the old functions that take M: GuestMemory around, and internally dispatching to the new versions taking IoMemory

The "problem" with this is the lack of function overloading in Rust. It would mean changing the whole API depending on whether you want the IOMMU indirection or not.

Paolo, do you have any pointers towards code for how QEMU would like to use this, so it's a bit easier for me to wrap my head around everything? git grep-ing for "vm_memory" in QEMU doesn't show anything (but probably am missing something) :(

For now we have only an RFC implementation here. But you can see that it needs to apply some changes to vm-memory in order to add a direction (read/write) argument to try_access.

Because QEMU has its backend in the C code, the natural thing for QEMU is to forgo GuestMemory completely and just go directly from IoMemory to GuestMemoryRegions.
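The direction change mentioned can be sketched like this. Names and signatures are illustrative, not the actual vm-memory try_access API or the QEMU RFC:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Direction {
    Read,
    Write,
}

struct Region {
    start: u64,
    len: u64,
    writable: bool,
}

/// try_access-style helper: the access direction is part of the
/// request, so a write into a read-only region fails up front
/// instead of being discovered mid-access.
fn check_access(regions: &[Region], addr: u64, len: u64, dir: Direction) -> bool {
    regions.iter().any(|r| {
        addr >= r.start
            && addr + len <= r.start + r.len
            && (dir == Direction::Read || r.writable)
    })
}

fn main() {
    let regions = [Region { start: 0, len: 0x1000, writable: false }];
    assert!(check_access(&regions, 0x10, 4, Direction::Read));
    assert!(!check_access(&regions, 0x10, 4, Direction::Write));
}
```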

@roypat (Member) commented Aug 14, 2025

Yeah, that would be nice... and maybe we can do it?!? Almost all users use GuestMemoryMmap instead of GuestMemory, and because most access goes through Bytes anyway, the only thing to change would be an occasional use statement to add GuestMemoryBackend (to access methods like to_region_addr etc.).

mh, you're right, grepping Firecracker and cloud-hypervisor for GuestMemory doesn't show much more than imports and a handful of functions generic over GuestMemory. And the latter shouldn't even need to be changed, at worst they'll need a new import for GuestMemoryBackend (or whatever we want to call it). nvm, I once again messed up in my mind which way the default impl goes

For now I'd say the two main contendents are IoMemory and GuestMemoryView. But a big inversion is actually quite appealing as the final change towards 1.0 - in that case I'd keep IoMemory and focus later on the feasibility of using GuestMemory for @XanClic's new trait.

Potentially, we could have less churn for downstream if we do the rename straight away. But only if downstream would upgrade to the new interfaces straight away, instead of dealing with the rename by first updating trait bounds to be GuestMemoryBackend, and only later upgrading. Either way works for me though.

Maybe we should start having a GitHub milestone for 1.0 tasks.

The "problem" with this is the lack of function overloading in Rust. It would mean changing the whole API depending on whether you want the IOMMU indirection or not.

grr, true. Forgot about that as well.

For now we have only an RFC implementation here. But you can see that it needs to apply some changes to vm-memory in order to add a direction (read/write) argument to try_access.

Because QEMU has its backend in the C code, the natural thing for QEMU is to forgo GuestMemory completely and just go directly from IoMemory to GuestMemoryRegions.

Thanks, this is really helpful as a reference!
