I am working on the mmtk-ruby project, and I think this patch is definitely on the right track. This is similar to JEP 304: Garbage Collector Interface. OpenJDK introduced a GC interface which made some other GC algorithms (including Epsilong, ZGC and Shenandoah) much easier to implement in OpenJDK.
In my opinion, whether the GC is extracted into a dedicated .so
is secondary. The most important thing is the interface between the VM and the GC. A good VM-GC interface shall clearly define the responsibilities of the VM and the GC, helping the VM developers avoid making too much assumption about one particular GC implementation and hindering the adoption of other more advanced GC algorithms.
Depending on the programming language, the interface may include functions (including call-back functions), abstract classes, interfaces (as in Java), traits, and so on. The GC and the VM each implements some functions, classes, etc. for the other party to use. This patch defines the interface informally as the set of exported functions, and it doesn't even have a dedicated header file for gc_impl.c
, or a "struct of function pointers" (which is roughly the counterpart of a C++ abstract class or a Java interface) for gc_impl.c
to implement. I believe this is just the first step of attempting to isolating the GC from the VM, and is not the final form of this patch.
The most important contribution of this patch is clarifying the responsibilities of the VM and the GC. For example,
newobj_of
calls rb_gc_impl_new_obj
to allocate an object, and the GC takes care of details such as size classes (size pools) and mutator-local (ractor-local in Ruby) caches, if the particular GC implementation has the notion of size classes at all. (FYI, bump-pointer allocators allocate objects of different sizes into the same contiguous region.)rb_gc_mark_roots
to identify roots. When the GC visits an object, it calls rb_gc_mark_children
to identify the children of an object, and calls rb_gc_update_object_references
to forward the reference fields if it is a copying GC. These functions were previously private functions in gc.c
. Now they are exported, and ready to be called by different GC implementations.gc_impl.c
really is¶
The gc_impl.c
in this patch is said to "contain the implementation of Ruby's GC". As a developer of MMTk, I find that the current gc_impl.c
not only contains the core of the GC (responsible for object allocation, marking, sweeping, compaction, ...), but also contains parts that are related to GC, but are also closely related to the VM. That includes
rb_gc_impl_new_obj
objspace->heap_pages.deferred_final
), the finalizer_table
, and the mechanism to call a function (rb_gc_obj_free
) when an object is dead.
This is not a bad thing. Putting those VM-related things in gc_impl.c
will allow those things to be implemented in a way specific to a particular GC implementation. For example, if a GC implementation does not use lazy sweeping, then we can eagerly clear the entries of the finalizer_table
that involve dead objects during a GC, instead of clearing them when the object is lazily swept (in obj_free
). As another example, if a GC implementation never moves any object, or if it is able to pin an object if its ID is observed, it can simply use the object's address as the ID instead of maintaining any hash tables.
In MMTk, we call such a component a "VM binding". It is a bi-directional component that bridges the GC implementation (such as MMTk) and the VM (such as CRuby, OpenJDK, JikesRVM, etc.). We strongly recommend CRuby to use the term "binding", too. While MMTk calls it "VM binding", CRuby can call it "GC binding".
Further splittinggc_impl.c
into a binding and an actual implementation.¶
One possible improvement over this patch is further splitting the current gc_impl.c
into two parts:
The first part includes the code that allocates raw bytes from heap pages, manages bitmaps (such as the marking bits), does the marking, transitive closure, sweeping, compacting, etc. The second part implements functions that are called by the VM, and wraps functions provided by the Ruby VM for the first part to call.
For example, gc_mark_stacked_objects
implements the tracing loop that computes the transitive closure from roots. It is an implementation detail of the GC, and the VM does not need to know it. It belongs to the first part.
As another example, rb_gc_impl_new_obj
is called by the VM (newobj_of
), and it calls the low-level allocator (newobj_alloc
) to allocate raw objects. It also initializes Ruby fields. Therefore, rb_gc_impl_new_obj
is part of the binding.
As another example, the function gc_mark_children
is called by gc_mark_stacked_objects
from the GC. It wraps the rb_gc_mark_children
provided by the VM, but also calls the GC-specific gc_mark_set_parent
. It sits between the GC and the VM, therefore belongs to the binding.
Ideally, if the actual GC implementation of CRuby is fully isolated, it can be used for other VMs like Lua, and we just need to make another binding for the Lua VM.
Conversely, when supporting other GC implementations in CRuby, such as MMTk and Lua's minimalist GC, we only need to develop the second part (the binding). For MMTk, the first part (the actual GC implementation) is in the mmtk-core written in Rust, and the second part (the binding) will be implemented by mmtk-ruby. If we are curious enough to try to port Lua's GC into CRuby, we may just copy the source code into CRuby's source tree because it's so small (<2000 LOC) and it's written in C, too.
For performance reasons, a binding usually reimplements part of the GC in a VM-specific way (such as letting the JIT compiler emit the bump-pointer allocation fast path which theoretically belongs to the GC), and reimplements part of the VM in a GC-specific way (such as rewriting the object-scanning functions in Rust so that they can be inlined into MMTk's tracing loop, the single hottest loop during a GC). I am expecting CRuby's GC binding to do the same for performance optimization, especially when using YJIT, in which case the overhead of GC will become more significant and the allocation fast paths will become more important once the mutator gets faster. Evidently, the GC interface should also involve the YJIT compiler so that allocation fast paths and write barrier fast paths can be emitted in GC-specific machine instruction sequences.
Problems¶ Assumption of mark-sweep-compact¶This patch also reveals problems in the GC interface. The most obvious problem is the strong assumption of mark-sweep-compact GC algorithm. There are rb_gc_mark_children
and rb_gc_update_object_references
. The former is called during marking, and the second is called during compaction. This approach is not suitable for evacuating GC algorithms, such as SemiSpace, Immix, etc. When an evacuating GC visits a field, the object it points to is immediately moved (if this is the first time the object is visited), and the field is updated immediately to point to the new location. As a workaround, we may call both rb_gc_mark_children
and rb_gc_update_object_references
when visiting each object. And we may rewrite the object-scanning function in the implementation langauge of the GC (Rust for MMTk) for performance reasons. But it is impractical for the GC binding to rewrite it for every type, especially complex types implemented in C (such as classes and iseqs), and it is impossible to rewrite the "marking" and "compacting" functions written by third-party C extensions.
To solve the problem, we should stop abusing rb_gc_mark_children
for implementing rb_objspace_reachable_objects_from
, but instead make rb_objspace_reachable_objects_from
a generic field-visiting function with a field-visitor callback that may potentially update fields, and use that to implement rb_gc_mark_children
. Such a generic field-visiting (and updating) function is what the GC really needs to scan an object. Matthew Valentine-House has an experimental branch for this: https://github.com/eightbitraptor/ruby/tree/mvh-test-push-fpointer-to-ractor
Another problem worth mentioning is the limit of object size the GC can allocate. The current GC implementation of CRuby cannot allocate objects larger than 640 bytes (the largest size pool), and that's why the VM still calls the function rb_gc_size_allocatable_p
implemented in gc_impl.c
. Most other GC implementations, including the tiny GC implementation in Lua, don't have such limitation. Such a limitation is also the source of the embedded-vs-heap design for many built-in types, including T_OBJECT
, T_ARRAY
, T_STRING
, etc. When switching to another GC implementation, CRuby cannot make the best use of the capability to allocate objects larger than 640 bytes unless the VM make changes accordingly. For example, do not ask the GC if an object of a given size can be allocated (because the GC can allocate any size). Instead, the VM should determine the optimal size of a String, Array or Object, either using predefined thresholds, or using profile data at run time.
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.3