MRI's Method Caches
tl;dr
- Method resolution is expensive, so method caches are crucial to invocation performance.
- Your Ruby code probably calls methods kind of often, so invocation performance matters.
- MRI's method cache invalidation strategy is quite naive, leading to very low hit rates in most Ruby code.
- I wrote some patches that substantially improve the situation.
- This blog post is surprisingly uninflammatory.
The Long Version
One of MRI's big performance problems is that method cache expiry is global. That is, any time you make a change to any class anywhere, the entire VM's method caches get busted at the same time. This is why you'll frequently hear people saying that "calling Object#extend
is bad".
Actually, it's not just Object#extend
. Charlie Somerville put together what I believe to be an exhaustive list of things that clear MRI's method caches. Method cache busting is so pervasive that it's almost impossible to avoid using somebody's code that does it. Disaster.
Let's back up for a second, though. What is a method cache and why are they important?
Method Cache Basics
Take the following class hierarchy:
Internally, MRI stores methods in a hash table on the RClass struct
. When you call an inherited method on a descendent, MRI has to walk up the class hierarchy to find it, checking the method table at each step to see if there's anything there to call.
So, in our above example, if we wanted to call hello
on an instance of E
, MRI would have to execute method lookups on E
, D
, C
, B
, only to finally find the method in A
's method table.
It turns out that method resolution is actually quite expensive, which is why method caches exist. Rather than resolving a method each time we want to call it, we cache a reference to the method somewhere we can get to it cheaply, substantially reducing the cost of subsequent invocations.
But Ruby is dynamic, so those caches can't necessarily live forever. If we call String#gsub
, for example, and then undef
it without expiring the method cache, it'll still be reachable. Cache invalidation is hard, as we know, so MRI takes a somewhat brute force approach.
How MRI's Method Caches Work
Currently, MRI has two types of method caches. Ruby code is compiled down to MRI's instructions. Each instruction has some data associated with it. The send instruction has an iseq_inline_cache_entry
, which acts as an inline method cache.
You can read the logic here. Basically, it works like this: look at the inline cache. If it's valid, use it. If not, go actually look up the method and cache it. Pretty much exactly what you'd expect.
In the case of an inline instruction cache miss, there's actually a secondary, global method cache. Oddly, though, the global method cache is limited to 2048 entries, and its semantics for deciding what to keep and what to dump are essentially random.
It's not as unlikely as you might hope for two methods in a tight loop to be clobbering each others' entries in the global method cache table.
Both caches' entries have a field that stores the ruby_vm_global_state_version
from when they were filled. A cache entry is considered valid if it matches the klass
pointer and the current ruby_vm_global_state_version
. So, incrementing the global state version by 1
invalidates all of the inline instruction caches as well as the global method cache.
This has the effect of making invalidation very cheap, but far reaching. Whenever you make a change to any class, call extend, or do any of the other things detailed in Charlie's article, all of the method caches that have built up since your program started become invalid and you have to repay the cost of method resolution all over again.
The Numbers
After years of complaining about Ruby's method caching behaviour, I finally decided to instrument it a couple of weeks ago. I found that for our application, the method cache was being invalidated at least 20 times per request, and that around 10% of our request profile was spent performing method resolution. For our application, the cost of Ruby's global method cache invalidation was extremely high.
The average cost of each method resolution for our production application is around one microsecond, which doesn't sound like a lot but it adds up. We were seeing at least 8000 cache misses per request, totalling 8ms or more.
As part of my instrumentation patchset, I also created a mechanism that logs a stacktrace each time the method cache is invalidated. I found that the majority of invalidations in our app were from inside of ActiveRecord - some easier to fix than others. Many were also caused by random gems doing things like instantiating OpenStruct
s.
At this point, it started seeming somewhat impractical to go and patch rails and all these other gems that I use, so I decided to investigate the amount of effort that would be required to actually solve the problem in MRI.
Hierarchical Method Cache Invalidation
I'll be using class
to mean class
or module
here, since they have the same backing structure in the VM.
Ruby's inheritance tree is a directed acyclic graph, and the semantics of method resolution mean that a change to a given class only affects it and its descendents. So, in an ideal scenario, we would only need to invalidate the method caches for those branches of the inheritance tree.
I've written a patch for MRI that implements such an algorithm, and it's currently serving 100% of our production traffic. We've seen around a 9% reduction in average latency with this patch. Others who've tried it haven't seen such big jumps. Your mileage may vary.
The algorithm is actually quite simple (credit to Charlie Nutter / JRuby for the idea). We have a 64 bit global (one per VM instance) sequence that is monotonically increasing. Every time we alloc a new RClass
, we increment the sequence, and assign the class a unique value.
Method and inline cache entries are tagged with the class's sequence value when they're filled. When a class is modified, we traverse the class hierarchy downwards from the modification point, assigning each class a new sequence number.
A method cache entry is considered valid if its sequence number matches the current sequence of the class. So, if the class or one of its parents has been modified since the cache entry was created, its class will have a new sequence number, and it will have therefore been invalidated.
Performance
For our application, this cache invalidation strategy substantially reduces the number of method cache misses we see in production and has reduced request latency by ~8-9%, but there are tradeoffs involved. Since invalidation requires a graph traversal, it's a lot more expensive than the current strategy of merely incrementing an integer.
If your application makes frequent modifications to classes and modules which have a large number of descendents, the cost of invalidation may outweigh the increase in method cache hit rate. That said, I would imagine that such modifications are relatively uncommon and should be considered a bad practice either way.
It's also worth noting here that while this patch will likely improve the performance of apps that employ the strategy of extending arbitrary objects to implement DCI, that pattern is still a performance problem, because it creates tons of one-off metaclasses whose methods wind up being mostly uncacheable.
The Code
My patchset includes several things:
- Subclass tracking:
Class#subclasses
, andModule#included_in
. Rails implements this with an O(n) traversal ofObjectSpace
. With my patches, that's no longer necessary. - Hierarchical method cache invalidation: the subject of this whole article.
- Method cache instrumentation:
RubyVM::MethodCache
has several useful singleton methods you may want to track, includinghits
,misses
,miss_time
,invalidation_time
, etc.
You can find the code in my branch or install it with rvm install jamesgolick
.
I am planning to submit these patches back upstream, but I have to port them to Ruby 2.0 first, so I guess that's my next project. Huge thanks and credit to Aman Gupta, Charlie Somerville, Charles Nutter, and funny-falcon for all their code, help, and testing!