Update ruby and mmtk-core repo rev #130

Open: 3 commits to be merged into base: master

wks (Collaborator) commented Jun 30, 2025

This is a regular merge commit that synchronizes MMTk's CRuby fork
with the upstream.

The upstream introduced the imemo:fields type, and is now using it for
the generic_fields_tbl_, i.e. it now holds instance variables for
objects other than T_OBJECT, T_CLASS and T_MODULE. When using
MMTk, we treat the key-value pair in the generic_fields_tbl_ as a
strong edge, i.e. treating the imemo:fields of an object as if it were a
child. We now update the generic_fields_tbl_ like other weak tables.
This simplifies the handling of the generic fields table in the MMTk
binding.

With imemo:fields added, we now have 17 imemo types including MMTk's
imemo:mmtk_strbuf and imemo:mmtk_objbuf. We increased the header bits
of the imemo type to 5 bits and changed the value of some imemo-specific
header offsets, such as ISEQ_TRANSLATED.
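
As a rough illustration of why the type field had to grow: a 4-bit field can distinguish at most 16 tags, so 17 imemo types need 5 bits, and any flag stored directly above the field moves up by one bit. The sketch below is not taken from the binding; `bits_needed` and `first_flag_above` are hypothetical helper names, and the shift value in the usage note is an arbitrary number chosen for illustration.

```rust
/// Smallest number of bits that can encode `count` distinct type tags.
/// Hypothetical helper, not part of CRuby or the MMTk binding.
fn bits_needed(count: u32) -> u32 {
    assert!(count > 0, "need at least one tag");
    if count == 1 {
        // A single tag still occupies one bit in this simplified model.
        1
    } else {
        // ceil(log2(count)) computed from the leading zeros of count - 1.
        32 - (count - 1).leading_zeros()
    }
}

/// Bit position of the first flag stored directly above a type field that
/// starts at `field_shift` and is `field_bits` wide (illustrative layout only).
fn first_flag_above(field_shift: u32, field_bits: u32) -> u32 {
    field_shift + field_bits
}
```

For example, `bits_needed(16)` is 4 and `bits_needed(17)` is 5, and with an assumed field shift of 12, `first_flag_above(12, 5)` is one greater than `first_flag_above(12, 4)`, which is the kind of offset change the paragraph above describes.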

The upstream changed the API for acquiring/releasing the GVL. We adapted
our code accordingly.

YJIT assumes there is only one thread doing GC. It panics when two GC
worker threads try to mark two iseq objects simultaneously. We made
several changes to support parallel GC for YJIT-compiled iseq objects:

  1. We replaced Block::gc_obj_offsets with Block::gc_obj_addresses
    which are now absolute pointers instead of offsets. By doing so, we
    no longer need to borrow the Rc<RefCell<VirtualMem>> during GC.
  2. rb_yjit_iseq_update_references no longer uses CodeBlock to write
    code, and no longer calls mark_all_executable after updating each
    object. Instead we make the whole code memory writable before
    updating any objects, and make it executable after the updating
    phase finishes. This should both make the update phase friendly to
    parallel GC and improve performance.
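
The two changes above can be sketched in a simplified model. This is not YJIT's actual code: `CodeRegion`, `resolve_offsets`, `update_per_object`, and `update_batched` are hypothetical names, and permission changes are only counted rather than performed. The sketch shows that offsets need the code region to be resolved while absolute addresses do not, and that switching permissions once per phase costs fewer switches than switching once per object.

```rust
/// Hypothetical stand-in for JIT code memory; it only counts permission switches.
struct CodeRegion {
    base: usize,
    writable: bool,
    permission_switches: usize,
    patched: usize,
}

impl CodeRegion {
    fn new(base: usize) -> Self {
        CodeRegion { base, writable: false, permission_switches: 0, patched: 0 }
    }

    fn make_writable(&mut self) {
        if !self.writable {
            self.writable = true;
            self.permission_switches += 1;
        }
    }

    fn make_executable(&mut self) {
        if self.writable {
            self.writable = false;
            self.permission_switches += 1;
        }
    }

    /// Pretend to rewrite one embedded object reference.
    fn patch_object(&mut self) {
        assert!(self.writable, "code memory must be writable while patching");
        self.patched += 1;
    }
}

/// Offsets can only be turned into addresses with access to the region,
/// whereas stored absolute addresses could be used directly.
fn resolve_offsets(region: &CodeRegion, offsets: &[u32]) -> Vec<usize> {
    offsets.iter().map(|&off| region.base + off as usize).collect()
}

/// Switch permissions around every single object update.
fn update_per_object(region: &mut CodeRegion, objects: usize) -> usize {
    for _ in 0..objects {
        region.make_writable();
        region.patch_object();
        region.make_executable();
    }
    region.permission_switches
}

/// Make everything writable once, patch all objects, then make it executable once.
fn update_batched(region: &mut CodeRegion, objects: usize) -> usize {
    region.make_writable();
    for _ in 0..objects {
        region.patch_object();
    }
    region.make_executable();
    region.permission_switches
}
```

In this model, updating three objects costs six permission switches per object-at-a-time and two when batched. The real cost of a switch depends on the platform's memory-protection calls, which the sketch does not model.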

wks added 3 commits June 25, 2025 14:51
The upstream has changed how it handles the generic fields table.
The generic_fields_tbl_ now maps each object to an imemo:fields, and the
imemo:fields object will hold all the fields.  We now treat the
imemo:fields object as a child of its corresponding key in
generic_fields_tbl_, which is just like CRuby's default GC.  Like the
default GC, we handle generic_fields_tbl_ like other weak tables during
weak processing time.

On the Rust side, we replaced the moved_gen_fields_tables hash map with
a simple "backwarding table" that maps each moved object that has an
entry in generic_fields_tbl_ to its old address.
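
One way such a backwarding table could be used is sketched below, assuming a table keyed by object address. This is an illustration only, not the binding's implementation: `Addr`, `rekey_moved_entries`, and the plain `HashMap` representation are assumptions made for this example. Each moved object's new address maps to its old address, so the entry still stored under the old key can be found and re-inserted under the new one.

```rust
use std::collections::HashMap;

/// Addresses are modelled as plain integers in this sketch.
type Addr = usize;

/// Re-key a table after objects have moved.
/// `table` maps an object address to the address of its fields object.
/// `backwarding` maps each moved object's NEW address to its OLD address.
/// Hypothetical helper, not part of the MMTk Ruby binding.
fn rekey_moved_entries(table: &mut HashMap<Addr, Addr>, backwarding: &HashMap<Addr, Addr>) {
    // Phase 1: remove every entry still stored under an old address.
    // All removals happen before any insertion, so an object that moves into
    // another moved object's old address cannot clobber that entry.
    let moved: Vec<(Addr, Addr)> = backwarding
        .iter()
        .filter_map(|(&new_addr, &old_addr)| {
            table.remove(&old_addr).map(|fields| (new_addr, fields))
        })
        .collect();

    // Phase 2: insert the removed entries under their new addresses.
    table.extend(moved);
}
```

With a table holding entries for addresses `0x100` and `0x200` and a backwarding table recording that the object now at `0x300` used to live at `0x100`, the call leaves the `0x200` entry untouched and moves the other entry to key `0x300`.
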

This is a regular merge commit that synchronizes MMTk's CRuby fork
with the upstream.

The upstream introduced the imemo:fields type, and is now using it for
the `generic_fields_tbl_`, i.e. it now holds instance variables for
objects other than `T_OBJECT`, `T_CLASS` and `T_MODULE`.  When using
MMTk, we treat the key-value pair in the `generic_fields_tbl_` as a
strong edge, i.e. treating the imemo:fields of an object as if it were a
child.  We now update the `generic_fields_tbl_` like other weak tables.
This simplifies the handling of the generic fields table in the MMTk
binding.

With imemo:fields added, we now have 17 imemo types including MMTk's
imemo:mmtk_strbuf and imemo:mmtk_objbuf.  We increased the header bits
of the imemo type to 5 bits and changed the value of some imemo-specific
header offsets, such as `ISEQ_TRANSLATED`.

The upstream changed the API for acquiring/releasing the GVL.  We adapted
our code accordingly.

YJIT assumes there is only one thread doing GC.  It panics when two GC
worker threads try to mark two iseq objects simultaneously.  We made
several changes to support parallel GC for YJIT-compiled iseq objects:

1.  We replaced `Block::gc_obj_offsets` with `Block::gc_obj_addresses`
    which are now absolute pointers instead of offsets.  By doing so, we
    no longer need to borrow the `Rc<RefCell<VirtualMem>>` during GC.
2.  `rb_yjit_iseq_update_references` no longer uses `CodeBlock` to write
    code, and no longer calls `mark_all_executable` after updating each
    object.  Instead we make the whole code memory writable before
    updating any objects, and make it executable after the updating
    phase finishes.  This should both make the update phase friendly to
    parallel GC and improve performance.

The upstream now acquires the GVL when adding a freed fiber to the fiber
pool.  Since MMTk worker threads cannot acquire the GVL, and MMTk only
calls `obj_free` in one GC worker thread, we skip the GVL when using
MMTk.

wks force-pushed the update/merge-2025-06-25 branch from 4dd4675 to 32e8dd0 on June 30, 2025 14:47