Building clusters in the cloud with conductor


Sep 28, 2009

I've been using chef in production for a while now. It saves me untold time and energy. Without it, I couldn't possibly manage our 15+ server cluster, in addition to writing and maintaining our code base. As is often the case, however, removing one bottleneck exposed several more.

Currently, we use a set of custom rake tasks and capistrano recipes to launch ec2 instances and deploy our chef recipes. For various reasons, this includes committing a sqlite database to a git repository. The system works, but it isn't pretty. Launching and configuring an instance, for example, requires several commands. And, without a running application, things like auto-scaling are basically out of the question.

I began to see a real need for an always-running application to manage my infrastructure. It would make everything easier, from launching instances to backing up EBS volumes. Today, I'm releasing an early alpha of that application.

conductor

Conductor is the beginning of an infrastructure management application. The version I'm releasing today has a rather limited feature set, but the groundwork is there, so most of the fancy stuff should be relatively easy to implement.

Right now, conductor is able to provision instances from ec2 and configure them with chef. You provide a URL to a git repository, which contains your cookbooks and some metadata. Conductor will launch your instances and configure them with the cookbooks from that repository. I'm also releasing a basic rails stack that I'll be maintaining. I'm hoping others will contribute stacks for other platforms.

Currently, things are tightly coupled to the structure of standard web infrastructure. Only two roles are supported: app, and mysql_master. The next major item on my TODO list is to make the launching system far more flexible, so that conductor can be used to manage any kind of infrastructure. I should get to this soon, since our infrastructure still isn't supported.

Roadmap

As much as I hate vaporware, the possibilities for conductor are exciting. Your indulgence is appreciated :-).

  • On-demand staging. Clone the entire production environment, including databases, in one click. Terminate it when you're done. Pay for what you used.
  • Monitoring and auto-scaling. Pipe metrics, like CPU usage, into conductor. If average CPU usage on the app servers gets above a certain threshold, launch another one. If average CPU drops below the threshold, kill one. Only run the servers you actually need.
  • Coordinated snapshotting of EBS volumes. One of the problems with running EBS volumes in a RAID configuration is backups. If you are snapshotting more than one drive at a time, you need some way to keep track of which sets of snapshots go together for recovery. Even better, a central app can be smart enough to perform the recovery itself.
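
To make the auto-scaling idea concrete, here is a minimal sketch of the decision it describes. None of this exists in conductor yet: the function name, the parameters, and the default numbers are all hypothetical. It also assumes two separate thresholds rather than one, so that a cluster hovering near a single cutoff doesn't keep launching and killing servers.

```python
from statistics import mean


def scaling_decision(cpu_samples, server_count,
                     scale_up_at=75.0, scale_down_at=25.0,
                     min_servers=1, max_servers=10):
    """Decide whether to change the number of app servers.

    Hypothetical helper, not part of conductor. Returns +1 to launch a
    server, -1 to terminate one, and 0 to leave the cluster alone.
    cpu_samples holds recent CPU usage percentages, one per app server.
    """
    if not cpu_samples:
        # No metrics reported yet, so don't touch anything.
        return 0

    average = mean(cpu_samples)

    # Busy cluster with room to grow: launch another server.
    if average > scale_up_at and server_count < max_servers:
        return 1

    # Idle cluster above the minimum size: terminate one server.
    if average < scale_down_at and server_count > min_servers:
        return -1

    return 0
```

An always-running application could call something like this on a timer, passing in the latest metrics and acting on the result. The gap between the two thresholds is a design choice, and the right values would depend on the workload.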

Follow Along

As I said, today's release is early alpha. But I'm going to be putting a lot of work into this, because it'll make my job a hell of a lot easier and save us money. So, keep up with development by following the github project or subscribing to my blog.

If you want to try conductor out, there's a ton of information in the README. I'll be hanging out in #conductorapp on freenode if you're having trouble getting things running.