This repository was archived by the owner on Dec 14, 2023. It is now read-only.

Garbage Collector #106

@antongulenko

Hello llgo contributors,

I am joining the team and I would like to look into adding garbage collection to the llgo runtime.
This issue is intended for discussions related to that.
I am no expert in this field, so please correct me whenever necessary!!

I will try to define some requirements for an llgo garbage collector.

Go needs very close integration with native programs. This renders several garbage collection techniques unusable.

  • pointers cannot be tagged to be distinguished from integers.
  • memory objects should be "pure data", so no headers containing GC metadata like reference counts or tag bits.
  • allocated data should probably not be moved in case native libraries are still holding references.

Are there any plans to make llgo compatible with other Go compilers?
I.e. linking a library compiled with llgo into another runtime, or vice versa.
I think this could be quite interesting.
In any case, the garbage collector should be able to coexist with other garbage collectors in the same address space.
That's not such a hard requirement anyways, as long as they all use malloc.

The garbage collector should be portable, at least in the first version(s).
I do not know how to implement root scanning in a machine-independent manner. Fortunately, plenty of research and existing implementations are available.
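
One machine-independent way to find roots is a shadow stack, the idea behind the LLVM shadow-stack functionality mentioned further down: each function registers the addresses of its pointer-holding locals in a linked list of frames, and the collector walks that list instead of parsing the machine stack. The following is only a minimal simulation of that idea in plain Go. The names `node`, `frame`, `shadowStack`, `push`, `pop` and `reachable` are made up for illustration and are not part of llgo or LLVM.

```go
package main

import "fmt"

// node is a stand-in for a heap object that may point to another object.
type node struct {
	name string
	next *node
}

// frame is one shadow-stack entry (hypothetical layout): a link to the
// caller's entry plus the addresses of this function's root slots.
type frame struct {
	prev  *frame
	roots []**node
}

// shadowStack holds the head of the frame chain.
type shadowStack struct{ top *frame }

// push registers a new frame; compiled code would do this in the prologue.
func (s *shadowStack) push(roots ...**node) {
	s.top = &frame{prev: s.top, roots: roots}
}

// pop unregisters the newest frame; compiled code would do this on return.
func (s *shadowStack) pop() { s.top = s.top.prev }

// reachable walks every registered root slot, newest frame first, and
// follows next pointers. It never looks at the machine stack or registers.
func (s *shadowStack) reachable() []string {
	seen := map[*node]bool{}
	var names []string
	for f := s.top; f != nil; f = f.prev {
		for _, slot := range f.roots {
			for n := *slot; n != nil && !seen[n]; n = n.next {
				seen[n] = true
				names = append(names, n.name)
			}
		}
	}
	return names
}

func inner(s *shadowStack) []string {
	local := &node{name: "inner-local"}
	s.push(&local)
	defer s.pop()
	return s.reachable() // pretend a collection is triggered here
}

func outer(s *shadowStack) []string {
	list := &node{name: "outer-head", next: &node{name: "outer-tail"}}
	s.push(&list)
	defer s.pop()
	return inner(s)
}

func main() {
	s := &shadowStack{}
	fmt.Println(outer(s))           // roots of both active frames
	fmt.Println(len(s.reachable())) // no frames left, so no roots
}
```

The extra bookkeeping on every call is the price of portability, which fits the remark below that this approach is supposed to be slow.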

The garbage collector has to deal with concurrent and parallel programs.
I think this is a pretty advanced requirement, and it should at least be implemented very simply at first.

This means the garbage collector would be:

  • not reference-counting, not copying, not generational
  • either conservative (not precise) or relying on runtime information about the data layout
    • From the language perspective, Go allows a precise collector.
      Especially since information about the data types has to be available for the runtime package.
    • there are also hybrid models that treat certain data (like []byte) specially
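
To make the conservative option concrete, here is a minimal sketch of a conservative mark phase over a simulated heap. It is not how any real collector is implemented. The types `object` and `heap`, the addresses, and the 8-byte word size are all assumptions chosen for the example. The point is that every word is treated as a possible pointer, so an integer that happens to look like a heap address keeps an object alive.

```go
package main

import "fmt"

const wordSize = 8 // assumed word size for the simulation

// object is a simulated heap block: a start address and its payload words.
// The payload carries no type information, so any word may be a pointer.
type object struct {
	addr   uintptr
	words  []uintptr
	marked bool
}

// heap is the set of blocks the (hypothetical) allocator has handed out.
type heap struct{ objects []*object }

// find returns the block containing address p, or nil if p is not in the heap.
// Interior pointers are accepted, as a conservative collector has to do.
func (h *heap) find(p uintptr) *object {
	for _, o := range h.objects {
		end := o.addr + uintptr(len(o.words))*wordSize
		if p >= o.addr && p < end {
			return o
		}
	}
	return nil
}

// markConservative marks everything reachable from roots, treating every
// word that falls inside a heap block as a pointer to that block.
func (h *heap) markConservative(roots []uintptr) {
	work := append([]uintptr(nil), roots...)
	for len(work) > 0 {
		w := work[len(work)-1]
		work = work[:len(work)-1]
		o := h.find(w)
		if o == nil || o.marked {
			continue
		}
		o.marked = true
		work = append(work, o.words...)
	}
}

// demoHeap builds four blocks: a -> b, c -> a, and d with no references.
func demoHeap() *heap {
	return &heap{objects: []*object{
		{addr: 0x1000, words: []uintptr{0x2000, 42}},
		{addr: 0x2000, words: []uintptr{7}},
		{addr: 0x3000, words: []uintptr{0x1000}},
		{addr: 0x4000, words: []uintptr{0}},
	}}
}

func main() {
	h := demoHeap()
	// Roots: a real pointer to 0x1000, and an integer that merely equals 0x4000.
	h.markConservative([]uintptr{0x1000, 0x4000})
	for _, o := range h.objects {
		fmt.Printf("%#x marked=%v\n", o.addr, o.marked)
	}
}
```

In this run the block at 0x3000 is correctly left unmarked, while the block at 0x4000 is retained only because of the integer root. A precise collector with layout information would know that root is not a pointer and could free it.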

... That's what I can come up with right now.

I would proceed with the following in order to get some more insights. Let me know what you think.

  • Take the Boehm conservative collector (http://www.hpl.hp.com/personal/Hans_Boehm/gc/) and just attach it to llgo.
    This collector is stable, portable and has been used by many compiler projects.
    The main selling point is that it can just be attached to any runtime by linking malloc-calls to GC_malloc and turning free-calls into no-ops.
    It seems highly customizable. Even though it's conservative, it is possible to optionally insert some type-information.
    There are considerations about parallel programs, including locks and how/where to use them.
    Also, there are some pretty sophisticated techniques to avoid confusing arbitrary data for pointers.
    I think a collector like this could even turn out to be completely sufficient for llgo, at least for quite a while.
    I read about a benchmark in a paper today that stated the average heap overhead of a conservative collector is in the 10-12% range.
  • Write some kind of memory benchmark program. Maybe somebody can point me to one?
    It should just keep allocating random data objects and doing random stuff to them.
    I would then try to output some stats about the memory state, like total working set of the process,
    memory allocated by the collector, memory allocated by the program, leaked memory, etc.
    Then plot that over time, like here:
  • See how the Boehm collector performs.
  • Investigate LLVM shadow-stack functionality.
    That should allow us to implement our own collector in a very portable way.
    It's supposed to be pretty slow, though.
    It could still be used to experiment with different approaches.
  • I think once these things are in place we can experiment with different approaches, plug other gc runtimes into llgo, and so on.

I have not investigated the runtimes of gc (the standard Go compiler) and gccgo yet.
Here are some facts about the gc collector:
http://stackoverflow.com/questions/7823725/what-kind-of-garbage-collection-does-go-use
The C code for their runtime package looks pretty huge and sophisticated, so I think it would be hard or impossible to extract and reuse that part... Might be worth checking out, though.

Cheers,
Anton
