proposal(vm): heap subspaces #798
base: main
@@ -259,7 +258,7 @@ impl Intrinsics {
.heap
.builtin_functions
.extend((0..intrinsic_function_count()).map(|_| None));
agent.heap.arrays.push(None);
let array_prototype = agent.heap.arrays.reserve_intrinsic();
i don't like this, it feels ad-hoc.
{
/// # Do not use this
/// This is only for Value discriminant creation.
const _DEF: Self;
I don't think this should live here, since it's only used on newtypes that have `Value` variants. It is, however, extremely convenient.
///
/// Do not expose this outside of the `subspace` module.
#[derive(Clone, Copy)]
pub(super) struct Name(
possibly over-complicated, maybe not worth 8 bytes?
Thank you for the proposal / PR / trait. Something akin to this is absolutely the correct direction to go in; holding direct Vecs won't be the be-all and end-all. That being said, I don't exactly see how this matches up with the "Custom heap-allocated data types" aim: the internal storage of "built-in" types should be entirely orthogonal to embedder-defined custom data types, and one can expect that (assuming embedder types are not prepared at compile time) storing custom data types will be less efficient than storing built-in data types, as custom types will need metadata and virtual methods while built-in data types can be known statically and need no such things.

From that point of view, I'm not really comfortable adding two pointers to every single heap vector / subspace. The name is entirely unnecessary on those, and the alloc count is presumably there only because the alloc function cannot write to the heap's

I'm also not terribly worried about the cost of reallocating heap vectors on growth: yes, growing a vector will be slow (and a Struct-of-Arrays vector growth is even slower) but it shouldn't be a very common operation. If needs must, we can change heap vectors to be chunked vectors, but that will obviously cost us heavily on the lookup performance front. Rather than that, my first instinct is actually to use virtual memory if that becomes necessary (and is possible). So e.g. if the ObjectHeapData ends up being a SoA vector of two fields (shapes and property storage indexes), then the SoA would point to the start of the virtual memory containing up to 1 GiB of objects; the first half of that would be virtual memory for the shapes, the latter half would be for the indexes, but only the first pages of both halves would actually be backed by real memory. As more and more objects are allocated, instead of reallocating the memory somewhere else, the still-unbacked virtual memory pages would get backed by real memory. Thus, slowly the 1 GiB gets turned from virtual allocations into real allocations. If the program keeps allocating more objects past the 1 GiB line, then we'll need to actually reallocate, but that seems like a rather rare case.

In general, I am not opposed to putting the heap vectors behind some kind of wrappers, but it's maybe not going to be quite this simple of an API. One of my goals for the next two months is to design and implement a SoA vector type that could be used for the heap vectors: this would mean that different heap data would be laid out differently, depending on how many fields they get split up into. Many heap vectors will also have sparse arrays (hash map or btree) on the side; these cannot be put into the same Vec or SoAVec but conceptually should be in the same "subspace". As such, built-in data types will likely have a bit of a varied API forever and a day: they're not all the same, after all.

Final note: I've been toying with an idea of splitting the Agent's or Heap's memory itself into a SoAVec-like structure: first have all the heap vector pointers in a static array, then have all the vector lengths in a static array, then have all the vector capacities in a static array. The benefit here would be that e.g. capacity wouldn't be needed unless you're growing the vector, and it'd stay out of the way. In a far-flung future where we're really sure we're not (usually) doing anything wrong with our indexes, the length of the vector would also become unnecessary, as we're sure that our given index into the vector is within allocated memory; no need for a bounds check. This'd mean that the entire set of heap vector pointers could be held in just a few cache lines. If something like that is eventually done, then the idea of a Subspace struct becomes entirely unworkable for the built-in data types.

(It might also be possible and even beneficial to do something similar with custom embedder data types and their Subspaces, but there we'd need to be way more careful and the API would need to be uniform across all embedder data types. There, a trait of some sort would likely make sense.)
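For readers unfamiliar with the Struct-of-Arrays layout mentioned in the comment above, here is a minimal sketch in Rust, assuming an object record with two fields. The names `ObjectSoA`, `ObjectIndex`, `shapes`, and `storage_indexes` are hypothetical and not taken from Nova, and plain `Vec`s stand in for whatever allocation strategy (including reserved virtual memory) would actually back each column.

```rust
/// Hypothetical handle: an index into the object columns.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjectIndex(u32);

/// Struct-of-Arrays storage: one column per field instead of one
/// `Vec` of `{ shape, storage_index }` records.
#[derive(Default)]
pub struct ObjectSoA {
    shapes: Vec<u32>,
    storage_indexes: Vec<u32>,
}

impl ObjectSoA {
    /// Appends one object by pushing a value onto every column.
    pub fn push(&mut self, shape: u32, storage_index: u32) -> ObjectIndex {
        let index = ObjectIndex(self.shapes.len() as u32);
        self.shapes.push(shape);
        self.storage_indexes.push(storage_index);
        index
    }

    /// Reads only the shape column; the other column is never touched.
    pub fn shape(&self, index: ObjectIndex) -> Option<u32> {
        self.shapes.get(index.0 as usize).copied()
    }

    /// Reads only the property storage index column.
    pub fn storage_index(&self, index: ObjectIndex) -> Option<u32> {
        self.storage_indexes.get(index.0 as usize).copied()
    }
}
```

Because each field lives in its own column, growth has to extend every column, which is why the comment expects SoA growth to be slower than growing a single `Vec`.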
What This PR Does
Proposes the concept of heap regions via `trait Subspace` rather than via `Vec<Option<T>>`. It also implements `IsoSubspace`, a region storing well-sized, bindable homogeneous data of a single type. To show what this would look like, I refactored `heap.array` to use `IsoSubspace`.

This design is heavily inspired by JavaScriptCore's Subspaces.
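As a rough illustration of the shape such an API could take, here is a minimal sketch. It is not the code in this PR: the `Handle<T>` type and the method names `alloc`, `get`, and `len` are assumptions made for the example, and only the names `Subspace` and `IsoSubspace` come from the description above.

```rust
use std::marker::PhantomData;

/// Hypothetical typed handle, so callers never index the store directly.
pub struct Handle<T> {
    index: u32,
    _marker: PhantomData<T>,
}

// Manual impls: deriving would needlessly require `T: Clone` / `T: Copy`.
impl<T> Clone for Handle<T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<T> Copy for Handle<T> {}

/// Assumed shape of the trait: a heap region that hands out handles.
pub trait Subspace {
    type Item;
    fn alloc(&mut self, value: Self::Item) -> Handle<Self::Item>;
    fn get(&self, handle: Handle<Self::Item>) -> Option<&Self::Item>;
    fn len(&self) -> usize;
}

/// A region holding values of exactly one type.
pub struct IsoSubspace<T> {
    // Private on purpose: the backing store is not reachable from outside.
    slots: Vec<Option<T>>,
}

impl<T> IsoSubspace<T> {
    pub fn new() -> Self {
        Self { slots: Vec::new() }
    }
}

impl<T> Subspace for IsoSubspace<T> {
    type Item = T;

    fn alloc(&mut self, value: T) -> Handle<T> {
        let index = self.slots.len() as u32;
        self.slots.push(Some(value));
        Handle { index, _marker: PhantomData }
    }

    fn get(&self, handle: Handle<T>) -> Option<&T> {
        // An out-of-range or empty slot both yield `None`.
        self.slots.get(handle.index as usize)?.as_ref()
    }

    fn len(&self) -> usize {
        self.slots.len()
    }
}
```

The point of the sketch is that the `Vec<Option<T>>` is an implementation detail: callers only ever hold a `Handle<T>` and go through the trait methods.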
Key Differences
The idea of heap regions already exists in Nova. This proposal seeks to solidify them by moving to an opaque type whose backing store cannot be directly accessed. It tries to do so within the existing heap architecture. I tried to limit the effect that switching to a subspace will have on the rest of the engine.
Goals/Motivation
Custom heap-allocated data types
Runtimes want to store custom native data types on the heap. The heap currently requires regions to be direct properties of `Heap`. Since heap polymorphism is currently implemented by lots of copied code, this cannot be addressed yet. Trait-based implementations will unblock this.

The proposed solution uses `WithSubspace<T>`, which tells the heap which subspace to store `T`s in. This can then be used to replace `CreateHeapData`.
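One way to picture how `WithSubspace<T>` could route values to the right region is sketched below. This assumes the trait is implemented on the heap once per stored type. The `subspace`/`subspace_mut` methods, the generic `create`/`get` helpers, and the `ArrayData`/`StringData` stand-ins are all hypothetical and may differ from the PR's actual API.

```rust
/// Simplified stand-in for a single-type region (assumption for this sketch).
pub struct IsoSubspace<T> {
    slots: Vec<T>,
}

impl<T> IsoSubspace<T> {
    fn new() -> Self {
        Self { slots: Vec::new() }
    }
    fn alloc(&mut self, value: T) -> usize {
        self.slots.push(value);
        self.slots.len() - 1
    }
    fn get(&self, index: usize) -> Option<&T> {
        self.slots.get(index)
    }
}

// Hypothetical heap data types.
pub struct ArrayData {
    pub len: u32,
}
pub struct StringData {
    pub text: String,
}

pub struct Heap {
    arrays: IsoSubspace<ArrayData>,
    strings: IsoSubspace<StringData>,
}

/// Assumed shape: names the subspace in which values of type `T` live.
pub trait WithSubspace<T> {
    fn subspace(&self) -> &IsoSubspace<T>;
    fn subspace_mut(&mut self) -> &mut IsoSubspace<T>;
}

impl WithSubspace<ArrayData> for Heap {
    fn subspace(&self) -> &IsoSubspace<ArrayData> {
        &self.arrays
    }
    fn subspace_mut(&mut self) -> &mut IsoSubspace<ArrayData> {
        &mut self.arrays
    }
}

impl WithSubspace<StringData> for Heap {
    fn subspace(&self) -> &IsoSubspace<StringData> {
        &self.strings
    }
    fn subspace_mut(&mut self) -> &mut IsoSubspace<StringData> {
        &mut self.strings
    }
}

impl Heap {
    pub fn new() -> Self {
        Self { arrays: IsoSubspace::new(), strings: IsoSubspace::new() }
    }

    /// One generic entry point instead of a hand-written method per type.
    pub fn create<T>(&mut self, value: T) -> usize
    where
        Self: WithSubspace<T>,
    {
        <Self as WithSubspace<T>>::subspace_mut(self).alloc(value)
    }

    /// Looks a value up in whichever subspace stores `T`.
    pub fn get<T>(&self, index: usize) -> Option<&T>
    where
        Self: WithSubspace<T>,
    {
        <Self as WithSubspace<T>>::subspace(self).get(index)
    }
}
```

With this arrangement, adding a new stored type means adding one field and one `WithSubspace` impl; the allocation and lookup logic is written once.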
More Efficient Backing Stores
We would like to experiment with different forms of backing stores that are not `Vec`s. A major downside of `Vec`s is that worst-case insertions require allocating an entirely new buffer and copying its data over.

No progress can be made here until the API surface of heap regions is controlled. I've implemented the minimal set of APIs to make `Subspace` and `IsoSubspace` work for `heap.array`, but no more. These should be expanded as necessary, but we must maintain control over what's in them (e.g. do not implement `Deref<Target = Vec<T>> for IsoSubspace<T>`).
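To illustrate why a controlled API surface matters, the sketch below shows one alternative backing store that could sit behind the same kind of interface: a chunked store in which growth adds a new fixed-size chunk instead of copying existing elements. `ChunkedSubspace`, `CHUNK_LEN`, and the method names are hypothetical; this is an example of the kind of experiment meant here, not a proposed implementation.

```rust
/// Deliberately tiny so the example crosses chunk boundaries quickly.
const CHUNK_LEN: usize = 4;

/// Hypothetical region whose elements never move once allocated.
pub struct ChunkedSubspace<T> {
    // Only the small chunk headers move when the outer `Vec` grows;
    // the element buffers they point to stay where they are.
    chunks: Vec<Vec<T>>,
    len: usize,
}

impl<T> ChunkedSubspace<T> {
    pub fn new() -> Self {
        Self { chunks: Vec::new(), len: 0 }
    }

    /// Stores `value` and returns its index.
    pub fn alloc(&mut self, value: T) -> usize {
        if self.len % CHUNK_LEN == 0 {
            // The last chunk is full (or none exists yet): start a new one.
            self.chunks.push(Vec::with_capacity(CHUNK_LEN));
        }
        self.chunks
            .last_mut()
            .expect("a chunk with free space exists at this point")
            .push(value);
        self.len += 1;
        self.len - 1
    }

    /// Lookup costs one extra indirection compared to a flat `Vec`.
    pub fn get(&self, index: usize) -> Option<&T> {
        self.chunks.get(index / CHUNK_LEN)?.get(index % CHUNK_LEN)
    }

    pub fn len(&self) -> usize {
        self.len
    }
}
```

A swap like this is only possible if callers never see the underlying `Vec<T>`, which is the reason for ruling out a `Deref` impl. The extra indirection in `get` is the lookup cost that the review comment in this thread raises against chunked vectors.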