About

James Golick

James Golick is an engineer, entrepreneur, speaker, and above all else, a grinder.

As CTO (or something?) of BitLove, he scaled FetLife.com's traffic by more than an order of magnitude (and counting).

James spends most of his time writing ruby and scala, building infrastructure, and extinguishing fires.

He speaks regularly at conferences and blogs periodically, but James values shipping code over just about anything else.


Crazy, Heretical, and Awesome: The Way I Write Rails Apps

Mar 21 2010

Note: This is going to sound crazy at first, but bear with me.

The current best practice for writing rails code dictates that your business logic belongs in your model objects. Before that, it wasn't uncommon to see business logic scattered all over controller actions and even view code. Pushing business logic into models makes apps easier to understand and test.

I used this technique rather successfully for quite some time. With plugins like resource_controller, building an app became simply a matter of implementing the views and the persistence layer. With so little in the controller, the focus was mainly on unit tests, which are easier to write and maintain than their functional counterparts. But it wasn't all roses.

As applications grew, test suites would get slow — like minutes slow. When you're depending on your persistence objects to do all of the work, your unit tests absolutely must hit the database, and hitting the database is slow. It's a given in the rails world: big app == slow tests.

But slow tests are bad. Developers are less likely to run them. And when they do, it takes forever, which often turns into checking twitter, reading reddit, or a coffee break, harming productivity.

Also, coupling all of your business logic to your persistence objects can have weird side-effects. In our application, when something is created, an after_create callback generates an entry in the logs, which are used to produce the activity feed. What if I want to create an object without logging — say, in the console? I can't. Saving and logging are married forever and for all eternity.

When we deploy new features to production, we roll them out selectively. To achieve this, both versions of the code have to co-exist in the application. At some level, there's a conditional that sends the user down one code path or the other. Since both versions of the code typically use the same tables in the database, the persistence objects have to be flexible enough to work in either situation.

If calling #save triggers version 1 of the business logic, then you're basically out of luck. The idea of creating a database record is inseparable from all the actions that come before and after it.
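To make that concrete, here's a hedged sketch of the situation (the feature_enabled? helper and the params are hypothetical):

# Both branches end in User#save, so version 1's after_create callbacks
# fire no matter which code path the user was sent down.
if feature_enabled?(:new_signup, current_user)
  user = User.new(new_signup_params)   # wants the new post-save behaviour
else
  user = User.new(old_signup_params)   # wants the old post-save behaviour
end
user.save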

Here Comes the Crazy Part

The solution is actually pretty simple. A simplified explanation of the problem is that we violated the Single Responsibility Principle. So, we're going to use standard object-oriented techniques to separate the concerns of our model logic.

Let's look at the first example I mentioned: logging the creation of a user. Here's the tightly coupled version:

class User < ActiveRecord::Base
  after_create :log_creation

  protected
    def log_creation
      Log.new_user(self)
    end
end

To decouple the logging from the creation of the database record, we're going to use something called a service object. A service object is typically used to coordinate two or more objects; usually, the service object doesn't have any logic of its own (simplified definition). We're also going to use Dependency Injection so that we can mock everything out and make our tests awesomely fast (seconds not minutes). The implementation is simple:

class UserCreationService
  def initialize(user_klass = User, log_klass = Log)
    @user_klass = user_klass
    @log_klass  = log_klass
  end

  def create(params)
    @user_klass.create(params).tap do |u|
      @log_klass.new_user(u)
    end
  end
end

The specs:

describe UserCreationService do
  before do
    @user       = stub("User")
    @user_klass = stub("Class:User", :create   => @user)
    @log_klass  = stub("Class:Log",  :new_user => nil)
    @service    = UserCreationService.new(@user_klass, @log_klass)
    @params     = {:name => "Matz", :hobby => "Being Nice"}
    @service.create(@params)
  end

  it "creates the user with the supplied parameters" do
    @user_klass.should have_received(:create).with(@params)
  end

  it "logs the creation of the user" do
    @log_klass.should have_received(:new_user).with(@user)
  end
end

Aside from being able to create a user record in the console without triggering a log item, there are a few other advantages to this approach. The specs will run at lightning speed because no work is actually being done. We know that fast specs make for happier and more productive programmers.
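To illustrate the first point, the two entry points are now independent; a quick hypothetical console session:

User.create(:name => "Matz")                    # just the database record, no log entry
UserCreationService.new.create(:name => "Matz") # the record plus the activity log entry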

Also, debugging the actions that occur after save becomes much simpler with this approach. Have you ever been in a situation where a model wouldn't save because a callback was mistakenly returning nil? Debugging (necessarily) opaque callback mechanisms is hard.

But then I'll have all these extra classes in my app!

Yeah, it's true. You might write a few more "class X; end"s with this approach. You might even write a few percent more lines of actual code. But you'll wind up with a more maintainable app for it (not to mention faster tests and code that's easier to understand).

The truth is that in a simple application, obese persistence objects might never hurt. It's when things get a little more complicated than CRUD operations that these things start to pile up and become pain points. That's why so many rails plugins seem to get you 80% of the way there, like immediately, but then wind up taking forever to get that extra 20%.

Ever wondered why it seems impossible to write a really good state machine plugin — or why file uploads always seem to hurt eventually, even with something like paperclip? It's because these things don't belong coupled to persistence. The kinds of functionality that are typically jammed into active record callbacks simply do not belong there.

Something like a file upload handler belongs in its own object (at least one!). An object that is properly encapsulated and thus isolated from the other things happening around it. A file upload handler shouldn't have to worry about how the name of the file gets stored to the database, let alone where it is in the persistence lifecycle and what that means. Are we in a transaction? Is it before or after save? Can we safely raise an error?
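A hedged sketch of that kind of object (every name here is hypothetical):

class PhotoUploadService
  def initialize(storage = FileSystemStorage.new, photo_klass = Photo)
    @storage     = storage
    @photo_klass = photo_klass
  end

  # Store the file, then record the resulting name. The handler never has to
  # know whether it's inside a transaction or before or after save.
  def create(temp_file, attributes)
    filename = @storage.store(temp_file)
    @photo_klass.create(attributes.merge(:filename => filename))
  end
end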

In the tightly coupled version of the example above, the interactions between the User object and the Log object are implicit. They're unstated side-effects of their respective implementations. In the UserCreationService version, they are completely explicit, stated nicely for any reader of our code to see. If we wanted to log conditionally (say, if the User object is valid), a plain old if statement would communicate our intent far better than simply returning false in a callback.
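For instance, logging only valid users is just an explicit conditional inside the service (a sketch, as a variation on UserCreationService#create from above):

def create(params)
  @user_klass.create(params).tap do |u|
    @log_klass.new_user(u) if u.valid?  # explicit, readable intent
  end
end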

These kinds of interactions are hard enough to get right as it is. Properly separating concerns and responsibilities is a tried, tested, and true method for simplifying software development and maintenance. I'm not just pulling this stuff out of my ass.


On Mocks and Mockist Testing

Mar 10 2010

Every so often, somebody blogs about getting bitten by what they usually call "over-mocking". That is, they mocked some object, its interface changed, and the tests kept passing because they were exercising the mock rather than the real object. The conclusion is: "mocks are bad".

Martin Fowler outlines two kinds of unit testers: stateist and mockist. To simplify things for a minute, a stateist tester asserts that a method returns a particular value. A mockist tester asserts that a method triggers a specific set of interactions with the object's dependencies. The "mocks are bad" crowd is arguing for a wholly stateist approach to unit testing.

On the surface, stateist testing certainly seems more convenient. A mockist is burdened with maintaining both the implementation of an object and its various test doubles. So why mocks? It seems like a lot of extra work for nothing.

Why Mocks?

A better place to start might be: what are the goals of unit testing?

For a stateist tester, unit tests serve primarily as a safety net. They catch regressions, and thus facilitate confident refactoring. If the tests are written in advance of the implementation (whether Test Driven or simply test-first), a stateist tester will derive some design benefit from their tests by virtue of designing an object's interface from the perspective of its user.

A mockist draws a thick line between unit tests and functional or integration tests. For a mockist, a unit test must only test a single unit. Test doubles replace any and all dependencies, ensuring that only an error in the object under test will cause a failure. A few design patterns facilitate this style of testing.

Dependency Injection is at the top of the list. In order to properly isolate the object under test, its dependencies must be replaced with doubles. In order to replace an object's dependencies with doubles, they must be supplied to its constructor (injected) rather than referred to explicitly in the class definition.

class VideoUploader
  def initialize(persister = Persister.new)
    @persister = persister
  end

  def create(parameters)
    @persister.save(parameters[:temp_file_name])
  end
end

When we're unit testing the above VideoUploader (ruby code, by the way), it's easy to see how we'd replace the concrete Persister implementation with a fake persister for test purposes. Rather than test that the file was actually saved to the file system (the stateist test), the mockist tester would simply assert that the persister mock was invoked correctly.
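A mockist spec for it might look something like this, in the same stub/have_received style as the UserCreationService specs above:

describe VideoUploader do
  before do
    @persister = stub("Persister", :save => nil)
    @uploader  = VideoUploader.new(@persister)
    @uploader.create(:temp_file_name => "/tmp/upload.mpg")
  end

  it "saves the uploaded temp file through the persister" do
    @persister.should have_received(:save).with("/tmp/upload.mpg")
  end
end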

This design has the benefit of easily supporting alternate persister implementations. Instead of persisting to the filesystem, we may wish to persist videos to Amazon's S3. With this design, it's as simple as implementing an S3Persister that conforms to the persister's interface, and injecting an instance of it.
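A sketch of the idea; S3Persister and the bucket object here are hypothetical stand-ins for whatever S3 client you use:

class S3Persister
  def initialize(bucket)
    @bucket = bucket
  end

  # Same interface as the filesystem persister (#save), different backend.
  def save(temp_file_name)
    @bucket.put(File.basename(temp_file_name), File.read(temp_file_name))
  end
end

VideoUploader.new(S3Persister.new(bucket)) # injected just like the default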

This is possible because the VideoUploader is decoupled from the Persister. If the Persister class were referred to explicitly in the VideoUploader, it would be far more difficult to replace it with a different implementation. For more on decoupled code, you must read Nick Kallen's excellent article that goes into far more detail on these patterns and their benefits.

To be sure, we're really talking more about Dependency Injection here than anything else, and stateist testers can and do make use of DI. But the mockist test paradigm prods us towards this sort of design.

We're forced to look at the system we're building in terms of objects' interactions and boundaries. This is because it tends to be quite painful (impossible in many languages) and verbose to unit test tightly coupled code in a mockist style.

So the primary goal of a mockist's unit tests is to guide the design of the object model. Making it difficult to couple objects tightly is one such guiding force.

Mockist tests also tend to highlight objects that violate the Single Responsibility Principle since their tests become a jungle of test double setup code. We can think of mockist testing like a kind of shock therapy that pushes you towards a certain kind of design. You can ignore it, but it'll hurt.

Failure isolation is probably the other big advantage of mockist tests. If your unit tests are correctly isolated, you can be sure exactly which object is responsible for a test failure. With stateist tests, a given unit test could fail if the unit or any of its dependencies are broken.

But is it worth it?

Mockist or Stateist?

The burden of maintaining mocks is by far the most common argument against mockist tests. You have to write both the implementation and at least one test double. When one changes, the other has to change too.

Perhaps most troubling, if an object's interface changes, the unit tests of the objects that depend on it will continue to pass because their mocks will function as always — arguably a hindrance to refactoring. Since you need to test for that scenario, mockists also write integration tests. Integration tests are probably a good idea anyway, but as a mockist, you don't really have a choice.

Also, the refactoring problem only applies to dynamic languages. In a statically typed language, the program will simply fail to compile.

I find this burden troubling. More code to write makes the “we don't have time” argument come out in pressure situations. For a design exercise, the cost of mockist tests seems quite high.

On my last open source project (friendly), I decided to give mockist testing a try. Most of the code turned out beautifully. And the mistakes I did make could have been avoided had I listened to the pain I felt while testing them.

Since that project worked out well, I've been applying mockist techniques to other work. I've written mockist tests in everything from my scala projects to my rails apps. So far, so good.

In theory, I hate the idea of mockist tests. They just seem like too much work. I don't want to like them and remain reluctant to admit that I do. But in practice, I'm writing better code, and it's hard to hate that.


Monkey-Patching, Single Responsibility Principle, and Scala Implicits

Feb 08 2010

When it's impossible to extend core classes, there's no choice but to write a whole bunch of classes with names like StringUtil to house your utility methods. Every namespace winds up having at least one StringUtil class, and it gets really ugly.

In ruby, it's possible to add methods to absolutely any class including String, Integer and other core classes. Rather than calling StringUtil.pluralize("monkey"), you call "monkey".pluralize. The technique is known as monkey-patching. Compared to utility classes, it's a hell of a lot more convenient, and it reads better.
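For example, a typical monkey-patch just reopens the core class (a deliberately naive sketch; ActiveSupport ships a real pluralize):

class String
  # Every String in the process now responds to #pluralize.
  def pluralize
    self + "s" # naive illustration only
  end
end

"monkey".pluralize # => "monkeys"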

But, monkey-patching isn't without flaw. When you add a method to String, you add it to everybody's String. You're polluting a global namespace.

A lot of rubyists will tell you that in practice, monkey-patching related namespace pollution isn't a problem, and they're mostly right. But, sometimes, they're wrong. The json gem doesn't work properly with activesupport, for example. That's a real problem that gives me real headaches on a daily basis.

In his CUSEC talk, and follow-up blog post, Reg Braithwaite raises the issue that monkey-patching Integer to add duration methods (like 1.hour or 1.day) is a violation of the single responsibility principle. He goes on to justify monkey-patching somewhat:

A Ruby program with extensive metaprogramming is a meta-program that writes a target program. The target program may have classes that groan under the weight of monkey patches and cross-cutting concerns. But the meta-program might be divided up into small, clean entities that each have a single responsibility... In effect, the target program's classes and methods have many responsibilities, but they are assembled by the meta-program from smaller modules that each have a single responsibility.

I think this is a mostly adequate justification for violating SRP. But needing such an in-depth justification at all feels wrong. There must be a better way.


It turns out there is, in scala at least, in the form of something called implicits. Implicits are a way to tell the scala compiler how to convert from one type to another in the event that a method is needed from the second type. Implicits can be defined at (or imported into) arbitrary scope, and their effects are entirely localized. In fact, they don't do anything at all unless they're needed.

So, let's say we wanted to add a k-combinator type method (like ruby's Object#tap) to all scala objects. First, we need a class that implements the tap method in a way that can be applied to any type.

class Tapper[A](tapMe: A) {
  def tap(f: (A) => Unit): A = {
    f(tapMe)
    tapMe
  }
}

This class has one method that taps the object supplied to the class's constructor. The details of its implementation are interesting, but unimportant here. This REPL session should explain everything:

scala> val tapper = new Tapper("hello!")
tapper: Tapper[java.lang.String] = Tapper@5e53bbfa

scala> tapper.tap { s => println(s) }
hello!
res0: java.lang.String = hello!

Now we need to get objects responding to #tap. We do that by defining an implicit.

object Tap {
  implicit def any2Tapper[A](toTap: A): Tapper[A] = new Tapper(toTap)
}

We wrap the implicit's definition in an object, which is scala for singleton. Those methods can then be imported into arbitrary scope using the import statement. This REPL session should make everything clear:

scala> "hello!".tap { s => println(s) }
<console>:5: error: value tap is not a member of java.lang.String
       "hello!".tap { s => println(s) }
                ^

scala> import Tap._
import Tap._

scala> "hello!".tap { s => println(s) }
hello!
res2: java.lang.String = hello!

This Tapper certainly doesn't violate SRP. The addition of the #tap method to all Objects is localized and doesn't affect other running code. Libraries your code depends on can implement their own tap method without collision.

In the end, Reg's argument about meta-programming might still apply here. But, if it does, the kind of meta-programming introduced by implicits is limited in scope and prevents the kind of global namespace pollution that can bite you in the ass in ruby. And the resulting code doesn't have to violate SRP.

In practice, I've found scala's implicits a lot more pleasant to work with than ruby's monkey patching. They're more explicit, which provides additional clarity without sacrificing much in the way of terseness. Like anything else, scala isn't a perfect language, but implicit type conversions are a really elegant solution to an ugly problem.


Friendly 0.5.0: Offline indexing and more

Jan 30 2010

There've been a few quiet releases of Friendly since I blogged about 0.4. Mostly, they were bug fixes, except for the addition of change tracking, which is mostly an internal feature that will support arbitrary caches. You can see all the notable changes in the changelog.

This week, I released 0.5 which includes built-in support for building a new index in the background without interrupting your app. Here's how it works:

First, declare the new index in your model:

class User
  # ...snip...

  indexes :name, :created_at # this wasn't there before
end

Then, make sure to run Friendly.create_tables! to create the index table in the database. Don't worry, this won't overwrite any of your existing tables.

Friendly.create_tables!

Now that the new table has been created, you need to copy the .rake file included with Friendly (lib/tasks/friendly.rake) to somewhere that will get picked up by your main Rakefile (lib/tasks if it's a rails project). Then, run:

KLASS=User FIELDS=name,created_at rake friendly:build_index

If it is a rails project, you'll need to prefix friendly:build_index with a rake task that loads your rails environment. For our app, the full command looks like this:

KLASS=User FIELDS=name,created_at rake environment friendly:build_index

If you're running this in production, you'll probably want to fire up GNU screen so that it'll keep running even if you lose your SSH connection. When the task completes, the index is populated and ready to go.

We've already built a couple of indexes with this code in production and it worked great!

Get It!

As always, install Friendly as a gem:

sudo gem install friendly

If you're not already following the github project, it's a great way to keep up with Friendly's development. Finally, if you feel so inclined, I'd appreciate a recommendation on Working with Rails.


Trend All the Fucking Time (TRAFT?)

Jan 10 2010

My New Year's resolution was to measure more. For a while now, I've wanted to get a better picture of our systems and our business, and hopefully, how they relate.

So, my first day back at work after the holidays, I started looking for the right tool to gather data with. After investigating some of the options, I wound up settling on munin.

I say settling because I was quite dissatisfied with the available options. I tried everything from collectd to reconnoiter and found all of the solutions horribly lacking in some way. This is an enormous market just waiting for a startup to revolutionize it.

In any event, we were already using munin to trend our system metrics. So, now it was just a matter of figuring out how to get our business metrics in there. Here's how we did it.

Custom Graphs

It's actually relatively easy to write a munin plugin. All you need is an executable that responds to a config command and emits a specially formatted value when it's called with no parameters.

Most of the examples I could find were implemented using multi-line strings, which seemed ugly to me. So, I wrote a little ruby DSL to make my plugins easier on the eyes.

Here's an example plugin written with munin_plugin. I won't go into what all the parameters mean. The official documentation does a good enough job of that.

#!/usr/bin/env ruby

require 'rubygems' # or rip or whatever
require 'munin_plugin'

munin_plugin do
  graph_title  "Load average"
  graph_vlabel "load"
  load.label   "load"

  collect do
    load.value `cat /proc/loadavg`.split(" ")[1]
  end
end

Everything outside the collect block gets emitted as configuration. When the above script is called with config, it produces the following output:

graph_title Load average
graph_vlabel load
load.label load

When it's called without any parameters, it produces something like the following:

load.value 0.03

As you can see, the DSL just emits whatever you give it, essentially verbatim. Nothing fancy, just a little syntactic sugar.

Let's trend some business metrics.

Trending Business Metrics

One of our most popular features is picture uploads. I wanted to get a sense of how quickly pictures were being uploaded at different times of day. Since munin polls nodes every 5 minutes, I wasn't sure exactly what kind of value it was going to need to get this going. Do I need to calculate the rate myself?

It turns out munin has an option called DERIVE, which turns your monotonically increasing value into a per-unit-of-time graph. So, I created a little REST API that returns the total number of pictures on the site. Then, all I had to do was scoop it up with a fairly simple munin plugin.

#!/usr/bin/env ruby

require 'rubygems'
require 'munin_plugin'
require 'open-uri'

munin_plugin do
  graph_title    "Picture Upload Rate"
  graph_vlabel   "Pictures / ${graph_period}"
  graph_category "FetLife"
  graph_period   "minute"
  pictures.type  "DERIVE"
  pictures.min   "0"
  pictures.label "pictures"

  collect do
    pictures.value open("http://an.internal.ip/stats?id=pictures").read
  end
end
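The internal stats endpoint itself isn't shown here; a minimal hypothetical version (Sinatra and the Picture model are stand-ins for illustration) could be as small as this:

require 'sinatra'

COUNTERS = {
  "pictures" => lambda { Picture.count }
}

# Returns a monotonically increasing total; munin's DERIVE turns it into a rate.
get "/stats" do
  counter = COUNTERS[params[:id]]
  halt 404 unless counter
  counter.call.to_s
end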

Here's the result (actually for a different metric, but it uses roughly the same script):

We use a nearly identical plugin to chart all the critical objects in our system. The graphs are starting to give us a nice look at exactly what happens during peak load, and as time goes on, hopefully they'll assist us in identifying problems, too.

The moral of the story is that setting up custom graphs is easy. You should do it.

