proposal(vm): heap subspaces #798

DonIsaac · 2025-07-23T18:14:51Z

What This PR Does

Do not merge this PR yet. It still has rough edges that need to be smoothed out.
Please do leave any and all feedback.

Proposes the concept of heap regions via trait Subspace rather than via Vec<Option<T>>. It also implements IsoSubspace, a region storing well-sized, bindable, homogeneous data of a single type. To show what this would look like, I refactored heap.array to use IsoSubspace.

This design is heavily inspired by JavaScriptCore's Subspaces

Key Differences

The idea of heap regions already exists in Nova. This proposal seeks to solidify them by moving to an opaque type whose backing store cannot be directly accessed. It tries to do so within the existing heap architecture. I tried to limit the effect that switching to a subspace will have on the rest of the engine.

Goals/Motivation

Custom heap-allocated data types

Runtimes want to store custom native data types on the heap. The heap currently requires regions to be direct properties of Heap. Since heap polymorphism is currently implemented by lots of copied code, this cannot be addressed yet. Trait-based implementations will unblock this.

The proposed solution uses WithSubspace<T>, which tells the heap which subspace stores values of type T. This can then be used to replace CreateHeapData.

impl Heap {
    /// Allocate a value within the heap.
    pub(crate) fn alloc<'a, T: SubspaceResident>(&mut self, value: T::Bound<'a>) -> T::Key<'a>
    where
        T::Key<'a>: WithSubspace<T>,
    {
        T::Key::subspace_for_mut(self).alloc(value)
    }
}

NOTE: this is currently a rough edge. Rust cannot correctly infer T when calling heap.alloc(data). For now I'm implementing CreateHeapData as sugar.
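
As a rough illustration of the routing idea, here is a minimal, self-contained sketch. It is not the PR's code: it drops lifetimes and GC binding, and it hangs the hypothetical WithSubspace trait on the data type rather than on the key, which is one way to let Rust infer T from the value passed to alloc. The names Index, ArrayData, StringData, Space, and the Heap fields are all assumptions made for the example.

```rust
/// Hypothetical, simplified stand-in for the proposed `Subspace` trait.
trait Subspace<T> {
    type Key: Copy;
    fn alloc(&mut self, value: T) -> Self::Key;
    fn get(&self, key: Self::Key) -> Option<&T>;
}

/// Key into a subspace (assumed to be a plain 32-bit index here).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Index(u32);

/// A region holding values of exactly one type. The backing store is private.
struct IsoSubspace<T> {
    slots: Vec<Option<T>>,
}

impl<T> IsoSubspace<T> {
    fn new() -> Self {
        Self { slots: Vec::new() }
    }
}

impl<T> Subspace<T> for IsoSubspace<T> {
    type Key = Index;

    fn alloc(&mut self, value: T) -> Index {
        self.slots.push(Some(value));
        Index((self.slots.len() - 1) as u32)
    }

    fn get(&self, key: Index) -> Option<&T> {
        self.slots.get(key.0 as usize)?.as_ref()
    }
}

/// Tells the heap which subspace stores `Self`.
trait WithSubspace: Sized {
    type Space: Subspace<Self>;
    fn subspace_for(heap: &Heap) -> &Self::Space;
    fn subspace_for_mut(heap: &mut Heap) -> &mut Self::Space;
}

struct ArrayData {
    len: u32,
}
struct StringData(String);

struct Heap {
    arrays: IsoSubspace<ArrayData>,
    strings: IsoSubspace<StringData>,
}

impl WithSubspace for ArrayData {
    type Space = IsoSubspace<ArrayData>;
    fn subspace_for(heap: &Heap) -> &Self::Space {
        &heap.arrays
    }
    fn subspace_for_mut(heap: &mut Heap) -> &mut Self::Space {
        &mut heap.arrays
    }
}

impl WithSubspace for StringData {
    type Space = IsoSubspace<StringData>;
    fn subspace_for(heap: &Heap) -> &Self::Space {
        &heap.strings
    }
    fn subspace_for_mut(heap: &mut Heap) -> &mut Self::Space {
        &mut heap.strings
    }
}

impl Heap {
    fn new() -> Self {
        Self { arrays: IsoSubspace::new(), strings: IsoSubspace::new() }
    }

    /// `T` is inferred from `value`, so no turbofish is needed at the call site.
    fn alloc<T: WithSubspace>(&mut self, value: T) -> <T::Space as Subspace<T>>::Key {
        T::subspace_for_mut(self).alloc(value)
    }

    /// The key alone does not determine `T` here, so callers name it explicitly.
    fn get<T: WithSubspace>(&self, key: <T::Space as Subspace<T>>::Key) -> Option<&T> {
        T::subspace_for(self).get(key)
    }
}
```

With this shape, heap.alloc(ArrayData { len: 3 }) picks the arrays subspace from the argument type alone, while lookups are written as heap.get::<ArrayData>(key). Typed keys, as in the PR, would remove the need for the explicit type on lookups.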

More Efficient Backing Stores

We would like to experiment with different forms of backing stores that are not Vecs. A major downside of Vecs is that worst-case insertions require allocating an entirely new buffer and copying its data over.

No progress can be made here until the API surface of heap regions is controlled. I've implemented the minimal set of APIs to make Subspace and IsoSubspace work for heap.array, but no more. These should be expanded as necessary, but we must maintain control over what's in them (e.g. do not implement Deref<Target = Vec<T>> for IsoSubspace<T>).

DonIsaac · 2025-07-23T18:31:52Z

nova_vm/src/ecmascript/execution/realm/intrinsics.rs

@@ -259,7 +258,7 @@ impl Intrinsics {
            .heap
            .builtin_functions
            .extend((0..intrinsic_function_count()).map(|_| None));
-        agent.heap.arrays.push(None);
+        let array_prototype = agent.heap.arrays.reserve_intrinsic();


I don't like this; it feels ad hoc.

DonIsaac · 2025-07-23T18:32:50Z

nova_vm/src/heap/subspace.rs

+{
+    /// # Do not use this
+    /// This is only for Value discriminant creation.
+    const _DEF: Self;


I don't think this should live here, since it's only used on newtypes that have Value variants. It is, however, extremely convenient.

DonIsaac · 2025-07-23T18:33:32Z

nova_vm/src/heap/subspace/name.rs

+///
+/// Do not expose this outside of the `subspace` module.
+#[derive(Clone, Copy)]
+pub(super) struct Name(


Possibly over-complicated; maybe not worth 8 bytes?

aapoalas · 2025-07-29T09:19:25Z

Thank you for the proposal / PR / trait.

Something akin to this is absolutely the correct direction to go in; holding direct Vecs won't be the be-all and end-all. That being said, I don't exactly see how this matches up with the "Custom heap-allocated data types" aim: the internal storage of "built-in" types should be entirely orthogonal to embedder-defined custom data types, and one can expect that (assuming embedder types are not prepared at compile time) storing custom data types will be less efficient than storing built-in data types, as custom types will need metadata and virtual methods while built-in data types can be known statically and need no such things. From that point of view, I'm not really comfortable adding two pointers to every single heap vector / subspace. The name is entirely unnecessary on those, and the alloc count is presumably there only because the alloc function cannot write to the heap's alloc_counter field directly. (Side note: the alloc counter should eventually become one or more atomic counters, possibly one per heap vector to enable "heartbeat" counting. But! They should likely be allocated far from the vector memory itself, so as to not cause cache contention.)

I'm also not terribly worried about the cost of reallocating heap vectors on growth: yes, growing a vector will be slow (and growing a Struct-of-Arrays vector is even slower) but it shouldn't be a very common operation. If needs must, we can change heap vectors to be chunked vectors, but that will obviously cost us heavily on the lookup performance front.
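
To make the chunked-vector trade-off concrete, here is a minimal sketch under assumed names (ChunkedVec, CHUNK_LEN); it is not part of the PR or of Nova. Growth only allocates a new fixed-size chunk, so existing elements are never copied, but every lookup pays for a division, a remainder, and a second pointer hop.

```rust
/// Elements per chunk. Tiny here for illustration; a real value would be much larger.
const CHUNK_LEN: usize = 4;

/// Hypothetical append-only chunked vector.
struct ChunkedVec<T> {
    chunks: Vec<Vec<T>>,
    len: usize,
}

impl<T> ChunkedVec<T> {
    fn new() -> Self {
        Self { chunks: Vec::new(), len: 0 }
    }

    /// Appends a value and returns its index. When the last chunk is full,
    /// a fresh chunk is allocated; elements already stored stay where they are.
    fn push(&mut self, value: T) -> usize {
        if self.len % CHUNK_LEN == 0 {
            self.chunks.push(Vec::with_capacity(CHUNK_LEN));
        }
        self.chunks
            .last_mut()
            .expect("a chunk exists after the check above")
            .push(value);
        self.len += 1;
        self.len - 1
    }

    /// Lookup costs two indexing steps instead of one: first the chunk, then the slot.
    fn get(&self, index: usize) -> Option<&T> {
        self.chunks.get(index / CHUNK_LEN)?.get(index % CHUNK_LEN)
    }
}
```

Only the outer vector of chunk headers is reallocated as the structure grows, which is cheap relative to moving the elements themselves. The extra indirection on get is the lookup cost mentioned above.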

Rather than that, my first instinct is actually to use virtual memory if that becomes necessary (and is possible). So e.g. if the ObjectHeapData ends up being a SoA vector of two fields (shapes and property storage indexes) then the SoA would point to the start of a virtual memory reservation containing up to 1 GiB of objects; the first half of that would be virtual memory for the shapes, the latter half would be for the indexes, but only the first pages of both halves would actually be backed by real memory. As more and more objects are allocated, instead of reallocating the memory somewhere else, the reserved but still unbacked virtual memory pages would get backed by real memory. Thus, the 1 GiB slowly gets turned from virtual allocations into real allocations. If the program keeps allocating more objects past the 1 GiB line, then we'll need to actually reallocate, but that seems like a rather rare case.
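
The address arithmetic behind that layout can be sketched as follows. This only computes offsets and page counts; it does not reserve or commit any memory. The 4 KiB page size, the 4-byte field size, and the names TwoFieldLayout and committed_pages_per_half are assumptions for the example, not anything taken from Nova.

```rust
const GIB: u64 = 1 << 30;
/// Assumed page size; real systems may differ.
const PAGE_SIZE: u64 = 4096;

/// Hypothetical layout of one reservation split into two equal halves,
/// one per SoA field (shapes first, property storage indexes second).
struct TwoFieldLayout {
    reserved_bytes: u64,
    /// Size of one element of either field, assumed equal for both fields.
    field_size: u64,
}

impl TwoFieldLayout {
    fn half(&self) -> u64 {
        self.reserved_bytes / 2
    }

    /// Byte offset of object `index`'s shape from the start of the reservation.
    fn shape_offset(&self, index: u64) -> u64 {
        index * self.field_size
    }

    /// Byte offset of object `index`'s property storage index.
    fn storage_index_offset(&self, index: u64) -> u64 {
        self.half() + index * self.field_size
    }

    /// Number of objects that fit before a real reallocation would be needed.
    fn capacity(&self) -> u64 {
        self.half() / self.field_size
    }

    /// Pages that must be backed by real memory in each half for `object_count` objects.
    fn committed_pages_per_half(&self, object_count: u64) -> u64 {
        (object_count * self.field_size + PAGE_SIZE - 1) / PAGE_SIZE
    }
}
```

Under these assumptions a 1 GiB reservation holds 134,217,728 objects, and the first 1,024 objects need only one backed page in each half. Offsets never change as the structure grows, which is what lets growth avoid copying.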

In general, I am not opposed to putting the heap vectors behind some kind of wrappers, but it's maybe not going to be quite this simple of an API. One of my goals for the next two months is to design and implement a SoA vector type that could be used for the heap vectors: this would mean that different heap data would be laid out differently, depending on how many fields they get split up into. Many heap vectors will also have sparse arrays (hash map or btree) on the side; these cannot be put into the same Vec or SoAVec but conceptually should be in the same "subspace". As such, built-in data types will likely have a bit of a varied API forever and a day: they're not all the same after all.

Final note: I've been toying with an idea of splitting the Agent's or Heap's memory itself into a SoAVec like structure: first have all the heap vector pointers in a static array, then have all the vector lengths in a static array, then have all the vector capacities in a static array. The benefit here would be that eg. capacity wouldn't be needed unless you're growing the vector, and it'd stay out of the way. In a far-flung future where we're really sure we're not (usually) doing anything wrong with our indexes, the length of the vector would also become unnecessary as we're sure that our given index into the vector is within allocated memory; no need for a bounds check. This'd mean that the entire set of heap vector pointers could be held in just a few cache lines.
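
A back-of-the-envelope sketch of why the split helps, comparing how many cache lines the metadata spans in each layout. The vector count of 48, the 64-byte cache line, the 32-bit lengths and capacities, and all type names are assumptions chosen for illustration.

```rust
use std::mem::size_of;

/// Hypothetical number of heap vectors.
const VECTOR_COUNT: usize = 48;
/// Assumed cache line size in bytes.
const CACHE_LINE: usize = 64;

/// Array-of-structs: pointer, length, and capacity sit together per vector.
#[repr(C)]
struct AosEntry {
    ptr: *mut u8,
    len: u32,
    cap: u32,
}

/// Struct-of-arrays: all pointers first, then all lengths, then all capacities.
#[repr(C)]
struct SoaHeap {
    ptrs: [*mut u8; VECTOR_COUNT],
    lens: [u32; VECTOR_COUNT],
    caps: [u32; VECTOR_COUNT],
}

fn cache_lines(bytes: usize) -> usize {
    (bytes + CACHE_LINE - 1) / CACHE_LINE
}

/// Lines spanned when every lookup drags length and capacity along with the pointer.
fn aos_lines() -> usize {
    cache_lines(size_of::<[AosEntry; VECTOR_COUNT]>())
}

/// Lines spanned by the pointers alone, which is all an unchecked lookup needs.
fn soa_pointer_lines() -> usize {
    cache_lines(size_of::<[*mut u8; VECTOR_COUNT]>())
}

/// Lines spanned by the lengths, needed only while bounds checks remain.
fn soa_length_lines() -> usize {
    cache_lines(size_of::<[u32; VECTOR_COUNT]>())
}
```

On a 64-bit target with these numbers, the array-of-structs metadata spans 12 cache lines, while the pointers alone span 6 and the lengths another 3. Capacities stay out of the way until a vector grows.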

If something like that is eventually done, then the idea of a Subspace struct becomes entirely unworkable for the built-in data types. (It might also be possible and even beneficial to do something similar with custom embedder data types and their Subspaces, but there we'd need to be way more careful and the API would need to be uniform across all embedder data types. There, a trait of some sort would likely make sense.)

DonIsaac added 13 commits July 22, 2025 14:01

wip

34d5330

mark + sweep

6dee5c0

move to new file

f4d0fe7

by god it compiles

72710df

clippy fix

0b810e1

fn alloc

9d58ee9

docs + cleanup

3748684

smol names

b423f17

go back to using CreateHeapData

c7bb685

move name to separate file + add more docs

b444489

more docs

7d54a85

docs + cleanup

b38fc51

undo debug=false

a92f89b

cleanup

de3b3b7
