This is a long one, and y'all are busy I'm sure so here's the tl;dr:
- If you run ruby in production, you need to keep track of GC stats.
- Ruby 1.9.3's
GC::Profilerdoes a bunch of really weird shit.
- It keeps a 104 byte sample of every GC run since it was enabled forever.
GC::Profiler.total_timeloops over every sample in memory to calculate the total.
- The space used to keep those samples in memory is never freed. However, it does get reused when you call
- Therefore: if you are using
GC::Profilerin production, and you're not calling
GC::Profiler.clearregularly, you're leaking a substantial amount of memory (>1GB / machine for us), slowing down garbage collection somewhat, and the cost of retreiving the stats (
GC::Profiler.total_time) will continue to increase unbounded until the process is restarted.
- I am working on an alternative, low-overhead GC Profiler that is designed to be run in production. It's called
GC::BasicProfiler. You can find the patch here and follow development here.
- Also, you may want to check out this fork for some backports from ruby 2.0 — including the COW-friendly garbage collector. Good stuff.
The Long Version
Ruby's GC is a steaming pile of shit — but that's not news to anybody. If you're running ruby in production, tracking GC behaviour is essential so that you can minimize its effects on perceived performance (hence oob_gc, etc). Fortunately, Ruby 1.9.3 ships with GC::Profiler, which provides detailed instrumentation on GC runs.
Over the last month or so, I've been working on some rails performance tooling. Last night, I noticed that requests with my instrumentation enabled were taking around an order of magnitude longer than those without it. Weird. So, I installed perftools.rb and rack-perftools_profiler and got a really surprising result (irrelevant lines ommitted):
GC::Profiler.total_time is so slow that more than 50% of request time was spent in there? Is that actually possible? My instrumentation calls
GC::Profiler.total_time frequently under the assumption that it's inexpensive, but obviously that was a faulty assumption unless perftools.rb is wrong. Let's take a look at the implementation. (the code in context is here)
There's a loop in
total_time? What the fuck is going on here?
Turns out that if you have
GC::Profiler enabled, the VM records a
gc_profile_record every time the garbage collector runs.
Then, when you call
total_time, it loops over all of the
gc_profile_records that have been created in order to sum the total. According to my profile,
total_time was responsible for more than 50% of request time. How many
gc_profile_records could there actually be?
Oh. Well I guess that explains that. So — stupid question, but when do these things get freed? Apparently only in
rb_objspace_free which only ever gets called when the VM terminates, so the answer is never. Cool.
Upon further investigation, it's pretty clear that this whole system was designed with the expectation that you'd call
GC::Profiler.clear regularly. The profiler keeps its samples in an array at
objspace->profile.record that it increases in size by 1000 every time it runs out of space.
If you don't call
GC::Profiler.clear, that array keeps increasing in size forever. This is not documented. Obviously.
On our production systems, unicorn workers that have been running for a few hours had an
objspace->profile.size of around 350000. On x86_64,
sizeof(struct gc_profile_record) == 104, so around 35MB of overhead per process multiplied by 25 processes per machine for a total of nearly 1GB per machine — after only 3 hours. That will grow forever until the processes are restarted.
That's the bad news.
The good news: GC::BasicProfiler
GC::Profiler was designed to provide detailed information about every GC run — probably for the VM implementers to use when tuning the GC (haha yeah right). But seriously, somebody probably wants that, but it isn't me. For those of us who simply want to keep track of GC stats on our production applications, we need a less expensive implementation.
Enabling and disabling
BasicProfiler works exactly the same as
Profiler but you don't need to call
clear to avoid leaking memory. In fact, there's no
clear method at all.
If you're interested in following the development of this patch, it'll be here.
One more lol for the road
It's the little things. Don't worry, though — fixed in