The Cost of Ruby 1.9.3's GC::Profiler


Nov 19, 2012

This is a long one, and y'all are busy I'm sure so here's the tl;dr:

  • If you run ruby in production, you need to keep track of GC stats.
  • Ruby 1.9.3's GC::Profiler does a bunch of really weird shit.
    • It keeps a 104 byte sample of every GC run since it was enabled forever.
    • Calling GC::Profiler.total_time loops over every sample in memory to calculate the total.
    • The space used to keep those samples in memory is never freed. However, it does get reused when you call GC::Profiler.clear.
    • Therefore: if you are using GC::Profiler in production, and you're not calling GC::Profiler.clear regularly, you're leaking a substantial amount of memory (>1GB / machine for us), slowing down garbage collection somewhat, and the cost of retreiving the stats (GC::Profiler.total_time) will continue to increase unbounded until the process is restarted.
  • I am working on an alternative, low-overhead GC Profiler that is designed to be run in production. It's called GC::BasicProfiler. You can find the patch here and follow development here.
  • Also, you may want to check out this fork for some backports from ruby 2.0 — including the COW-friendly garbage collector. Good stuff.

The Long Version

Ruby's GC is a steaming pile of shit — but that's not news to anybody. If you're running ruby in production, tracking GC behaviour is essential so that you can minimize its effects on perceived performance (hence oob_gc, etc). Fortunately, Ruby 1.9.3 ships with GC::Profiler, which provides detailed instrumentation on GC runs.

Over the last month or so, I've been working on some rails performance tooling. Last night, I noticed that requests with my instrumentation enabled were taking around an order of magnitude longer than those without it. Weird. So, I installed perftools.rb and rack-perftools_profiler and got a really surprising result (irrelevant lines ommitted):

Apparently calling GC::Profiler.total_time is so slow that more than 50% of request time was spent in there? Is that actually possible? My instrumentation calls GC::Profiler.total_time frequently under the assumption that it's inexpensive, but obviously that was a faulty assumption unless perftools.rb is wrong. Let's take a look at the implementation. (the code in context is here)

There's a loop in total_time? What the fuck is going on here?

Turns out that if you have GC::Profiler enabled, the VM records a gc_profile_record every time the garbage collector runs.

Then, when you call total_time, it loops over all of the gc_profile_records that have been created in order to sum the total. According to my profile, total_time was responsible for more than 50% of request time. How many gc_profile_records could there actually be?

Oh. Well I guess that explains that. So — stupid question, but when do these things get freed? Apparently only in rb_objspace_free which only ever gets called when the VM terminates, so the answer is never. Cool.

Upon further investigation, it's pretty clear that this whole system was designed with the expectation that you'd call GC::Profiler.clear regularly. The profiler keeps its samples in an array at objspace->profile.record that it increases in size by 1000 every time it runs out of space.

If you don't call GC::Profiler.clear, that array keeps increasing in size forever. This is not documented. Obviously.

On our production systems, unicorn workers that have been running for a few hours had an objspace->profile.size of around 350000. On x86_64, sizeof(struct gc_profile_record) == 104, so around 35MB of overhead per process multiplied by 25 processes per machine for a total of nearly 1GB per machine — after only 3 hours. That will grow forever until the processes are restarted.

That's the bad news.

The good news: GC::BasicProfiler

Ultimately, GC::Profiler was designed to provide detailed information about every GC run — probably for the VM implementers to use when tuning the GC (haha yeah right). But seriously, somebody probably wants that, but it isn't me. For those of us who simply want to keep track of GC stats on our production applications, we need a less expensive implementation.

GC::BasicProfiler is a first step towards something like that. It has a very simple, low-overhead implementation, and GC::BasicProfiler.total_time works the way you might expect.

Enabling and disabling BasicProfiler works exactly the same as Profiler but you don't need to call clear to avoid leaking memory. In fact, there's no clear method at all.

If you're interested in following the development of this patch, it'll be here.

One more lol for the road

It's the little things. Don't worry, though — fixed in BasicProfiler.