Ruby 2.0.0: GC improvements
Congratulations on the release of Ruby 2.0.0!!
In this post, I will introduce the improvements to GC in Ruby 2.0.0. Please bear in mind that this article only covers CRuby, the C implementation of Ruby.
Ruby 2.0 has a new GC feature called Bitmap Marking. In short, Bitmap marking works very well with Copy on Write (CoW) which is used in fork(2). The following articles cover this in more detail.
- Feature #5839: Proposal: Bitmap Marking GC (in Japanese)
- Improving memory usage using Ruby Bitmap Marking (in Japanese)
- Why You Should Be Excited About Garbage Collection in Ruby 2.0
If you use fork(2) without Bitmap Marking, memory usage may dramatically increase because GC may run several times. You may have cursed GC for this. Sorry about that.
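To make the interaction with Copy on Write more concrete, here is a toy model in Ruby. It is only a sketch, not how CRuby is implemented: the page sizes and method names are made up for illustration. It counts how many memory pages the mark phase writes to, since every written page has to be copied in a forked child.

```ruby
require "set"

# Hypothetical sizes, chosen only to keep the numbers small.
OBJECTS_PER_HEAP_PAGE = 4
MARK_BITS_PER_BITMAP_PAGE = 32

# Mark flag stored inside each object: marking writes to the page that
# holds the object, so every page with a live object gets dirtied.
def pages_dirtied_by_in_object_flags(live_object_ids)
  live_object_ids.map { |id| id / OBJECTS_PER_HEAP_PAGE }.to_set
end

# Mark bits stored in a separate bitmap: marking writes only to the
# bitmap pages, and the pages holding the objects stay untouched.
def pages_dirtied_by_bitmap(live_object_ids)
  live_object_ids.map { |id| id / MARK_BITS_PER_BITMAP_PAGE }.to_set
end

live = (0...32).to_a
puts "in-object flags: #{pages_dirtied_by_in_object_flags(live).size} pages written"
puts "separate bitmap: #{pages_dirtied_by_bitmap(live).size} pages written"
```

With these made-up sizes, marking 32 live objects writes to 8 heap pages when the flag lives in the object, but to only 1 bitmap page when the mark bits are kept separately.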
Bitmap Marking is also enabled on Windows. There is no fork(2) in this environment and therefore no benefit to using Bitmap Marking, but enabling it everywhere avoids unnecessary complexity.
Bitmap Marking was first introduced in Ruby Enterprise Edition (REE).
The Bitmap Marking in Ruby 2.0.0 is equivalent to that in REE, or even better.
If you chose REE only in order to use Bitmap Marking, I would urge you to install Ruby 2.0.0 and start migrating. By the way, REE hasn't been updated since February 2012, and its official blog has already announced its end of life.
In previous versions of Ruby, the object graph was traversed and marked with recursive function calls on the machine stack. This could lead to a stack overflow when a very deep object graph was traversed. To avoid this, the GC in previous versions stopped using the machine stack when a stack overflow was about to happen.
However, this leads to two additional problems:
- Marking becomes extremely slow when there are deeply referenced objects.
- Stack overflow detection is not accurate.
For the former, the worst case is very slow, because not using the machine stack means searching through everything in the heap. In addition, GC stays slow for as long as these deeply referenced objects exist.
For the latter, it is very difficult to detect a stack overflow accurately and in time, and a missed check sometimes causes a SEGV. In the worst case, a Fiber fails at an unexpected time.
To solve these issues, Ruby 2.0.0 has its own array-based stack and marks without using recursive calls. This way it doesn't consume the machine stack, doesn't cause an overflow, and therefore doesn't need the fallback used in the former case. It also doesn't need the overflow check used in the latter.
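The difference between the two marking strategies can be sketched in plain Ruby. This is only an illustration of the idea, not the C code inside CRuby: the Node class and the method names are hypothetical. The recursive version uses one call per level of the graph, while the iterative version keeps pending objects in an ordinary Array.

```ruby
# Hypothetical stand-in for a heap object: references plus a mark flag.
class Node
  attr_reader :refs
  attr_accessor :marked

  def initialize
    @refs = []
    @marked = false
  end
end

# Recursive marking: call depth grows with the depth of the object graph,
# so a very deep graph can exhaust the stack.
def mark_recursive(node)
  return if node.marked
  node.marked = true
  node.refs.each { |child| mark_recursive(child) }
end

# Iterative marking: an explicit Array serves as the mark stack, so call
# depth stays constant. Returns the number of objects marked.
def mark_iterative(root)
  stack = [root]
  count = 0
  until stack.empty?
    node = stack.pop
    next if node.marked
    node.marked = true
    count += 1
    node.refs.each { |child| stack.push(child) unless child.marked }
  end
  count
end

# Builds a chain of `length` nodes, each referencing the next one.
def build_chain(length)
  head = Node.new
  tail = head
  (length - 1).times do
    node = Node.new
    tail.refs << node
    tail = node
  end
  head
end

puts mark_iterative(build_chain(100_000))
```

The iterative version handles a chain 100,000 objects deep without nesting any calls, whereas mark_recursive would need one level of recursion per object in the chain.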
The benchmark result shows that performance is slightly improved. This is probably due to the decrease in the number of function calls.
Please refer to these if you are interested in more detail.
I have introduced the key improvements to GC in Ruby 2.0.0. I hope it was useful information.
Finally, I will introduce some other interesting topics with regard to GC.
Koichi (Sasada) has an idea about Generational Garbage Collection, so we need to keep an eye on it. Koichi also started working on Symbol GC because of recurring Symbol-related vulnerability issues (for example, the Rack vulnerability).
I am also thinking about extending TracePoint.trace (introduced in Ruby 2.0.0) by adding extra arguments such as :obj_alloc and :obj_free. This will be very useful for debugging, so I will add them when I have time.