If Linus's talk about git made you feel like a moron, rest assured you're not alone. Distributed version control is one of the most poorly explained topics in software today. Plenty of people say that you should use it, but nobody has done a great job of explaining why. Here's my take.
WTF is Linus talking about?
First, let's talk about exactly what distributed version control is. The most common approach to describing a decentralized system is to present the reader with an image like this one, taken from the mercurial page called UnderstandingMercurial:

If that image makes any sense to you (and you're new to distributed version control), congratulations. The rest of us need a clearer description.
The best place to start is with the differences between centralized and decentralized version control. With a centralized system like Subversion or CVS, there is a single copy of the repository; it typically resides on a server somewhere. When a developer works with the code, they receive something called a "working copy" from the central repository. The working copy contains enough information to interact with the central server, but does not contain the revision history for the project, nor the branches or tags (though the branches and tags can often be explicitly requested).
With a decentralized system, the opposite is true. Instead of checking out a working copy, the developer works with the entire repository, including the entire revision history of the project and all the branches and tags. The copy that the developer receives is identical to the repository they fetched it from. Commits, branches, and tags occur locally, on the copy of the repository that the developer made. Changes can then be pushed to or pulled from a public repository somewhere, another developer's repository, or anywhere else that a repository exists.
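To make that concrete, here is a small sketch you can run in a scratch directory. It assumes only that git is installed; the directory names (public.git, alice, bob) and the identity values are made up for the demo, and a local bare repository stands in for "a public repository somewhere".

```shell
set -e

# Scratch area; nothing here touches a real project.
work=$(mktemp -d)
cd "$work"

# Placeholder identity for the demo commits.
export GIT_AUTHOR_NAME="Alice" GIT_AUTHOR_EMAIL="alice@example.com"
export GIT_COMMITTER_NAME="Alice" GIT_COMMITTER_EMAIL="alice@example.com"

# A bare repository standing in for a public repository.
git init --quiet --bare public.git

# Alice clones it. Her copy is a full repository, not just a working copy.
git clone --quiet public.git alice
cd alice
echo "hello" > README
git add README
git commit --quiet -m "Add README"        # recorded locally, no server involved
git tag v0.1                              # tags are created locally too
git push --quiet --tags origin HEAD       # publish only when she is ready
cd ..

# Bob clones the public repository and receives the history and the tag.
git clone --quiet public.git bob
git -C bob log --oneline
git -C bob tag
```

The commit and the tag both happen before anything is published; the push is a separate, optional step.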
So?
Usually, when you want to get a build or source distribution of open source code, you head to the project's main website. The project's repository is closely guarded, with commit rights only awarded to a select few. Anybody wishing to make changes to the source must first check out a working copy from version control, submit a patch, and hope that it is accepted.
When every developer has a full copy of the repository on their machine, the hierarchy of open source projects is all but eliminated. Any developer who wishes to work on the source code can clone the repository, commit as much code as they want to it, receive the changes from any other developer's cloned repository, and publish their work for other developers to use, or pull back into their own repositories. In a decentralized system, it doesn't matter who has the "...keys to the source repository..." (it actually says that on the rails core team page — take a look for yourself). If the original author continues to maintain the best version of the code, great; if not, users of that code can begin to pull from whoever does have the best version.
Really!
Entirely theoretical software articles are lousy, so I always try to provide examples from real software; this article is no exception.
Many (maybe most) rails plugins are inactive. They were created to scratch an itch, published, kept up to date for a few months, and then left with no maintainer. Since rails plugins largely reside in subversion repositories, nobody can continue development without losing the entire revision history of the project, and going to the trouble of setting up a public svn server.
Markaby is no exception to that rule. When I tried to use it in a recent project, I found that it was incompatible with rails 2.0.2. According to markaby's subversion logs, the last change was November 24th, 2007, a few weeks before that version of rails was released. Luckily, I was able to find a ticket in the rails trac with instructions on how to hack a fix into the plugin. That solution worked great for me, but it wouldn't work for a user who didn't know their way around plugins. Without commit access, normally I wouldn't be able to offer my fix publicly.
Not so with a distributed system! I was able to use git-svn to pull Markaby into a git repository. I've published my changes, and now anybody can grab a working version of Markaby by typing the following at a command prompt:
$ git clone git@github.com:giraffesoft/markaby.git
Even cooler than that, somebody with commit access can grab the changes and push them back into subversion, commit messages and all! Anybody who clones the git repository can still pull changes from _why's subversion repository if it ever becomes active again. If not, development can continue anywhere, and be done by anybody. That's the beauty of decentralization.
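For readers who want to try the same round trip, here is a rough sketch of the git-svn commands involved, wrapped in shell functions so that nothing runs until you call one. The function names and the example URL are placeholders, not Markaby's real repository details; the `git svn clone`, `git svn rebase`, and `git svn dcommit` subcommands underneath are the real ones, and they require git-svn to be installed.

```shell
# Hypothetical helper names; only the git svn subcommands are real.

# Import a Subversion project, history included, into a new git repository.
# For a trunk/branches/tags layout, add --stdlayout after "clone".
svn_to_git() {
  git svn clone "$1" "$2"
}

# Fetch new Subversion revisions and replay local commits on top of them.
svn_update() {
  (cd "$1" && git svn rebase)
}

# Send local git commits back to Subversion, one revision per commit.
# This only works for someone with Subversion commit access.
svn_publish() {
  (cd "$1" && git svn dcommit)
}

# Example call (placeholder URL and directory):
#   svn_to_git http://svn.example.com/some-plugin/trunk some-plugin
```

Once the import exists as a git repository, it can be published and cloned like any other, and only the person running svn_publish needs subversion commit rights.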


I was wondering if you could help me figure out how to get a GitHub invite. I can't find a beta waiting list form.
Thanks Jim, you made me discover GitHub, and I'm now looking for an invite too :p
I released Gitorious just before the github guys did (ain't it typical that someone else worked on the same thing as you). It's a free and open-source way of hosting git repositories.
The really nice thing about it is that it allows anyone to instantly clone a repository and start improving on it while showing everyone else what you're doing (so they can help out or pull from it), since it was specifically built for the last scenario you mention (dead/stale projects).
Thanks for this, James - I'm a recent git convert, so I think the more posts showing its benefits, the better.
Nice article James! As a long-time svn user, I sometimes find it very hard to get my head around the distributed nature of git/mercurial.
And may I add, git is fast as hell and doesn't use a lot of disk space. I've read that a 15-year-old project moved from CVS to Git and the git repo was only 100 MB. Waaho! Plus cloning a repo only takes a sec (not w/ WebDAV though!); it is a lot faster than a simple svn up or a one-line svn ci. I don't know how it's done, but that's impressive!
Thanks for the high-level explanation of the concepts, James. It's true that explanations of the "what" rather than the "how" are still pretty rare on the subject of DVCS. On the other hand, I noticed a while ago an article on BetterExplained.com that also tries to explain it, with lots of graphics that illustrate the concepts. It might be of interest to you and other readers as well. http://betterexplained.com/articles/intro-to-distributed-version-control-illustrated/
Actually, I thought Linus was a moron. I don't care how smart you are or whether you got lucky distributing some C code for free. You don't have the right to call anyone else a moron.
Bitter Bob.
I've written a post on DVCS a while back and it serves as a collection of resources for people trying to learn DVCS: Learning Distributed Revision Control Systems
Thanks for the article. I've been wondering what git was all about and why people were using it. Now I know.
Good article. I still find DVCS to be just another tool in your toolbelt, as opposed to the next great source control system. I don't really see it offering much advantage over centralized source control for certain products. For example, if you work at a corporate place producing a piece of software (e.g. Word, Photoshop, Quicken, etc.), you really want centralized source control. You definitely want branching and things like that, but you want a mainline of code, and don't want multiple primary sources of the source.
That said, having used Subversion, Perforce, and various others, I do understand the pain of branching in Subversion; it's terrible compared to, say, Perforce. I also understand the notion of being able to work disconnected/offline. Git is a win in that situation, although it's a fairly rare one for me, and probably a smaller percentage of use overall.
Maybe I need to read further perspectives and use cases and such, but I don't see the distributed aspect being something that is a reason to switch, or that makes DVCS the next evolution in source control. It is useful, as said, in some cases, but it seems counter to the needs, or the source of potential problems, in other cases.
Everything before your "So?" heading was spot on, but after that it misses the point of DVCS imho, sorry ;-)
Your ability to fork code, make changes, commit them wherever you like, and publish them at a different location is not in any way inhibited by the choice of VCS the original code happens to be in. That ability is only limited by the type of license of the code.
You can also just svn co Markaby and publish your changes in say CVS. Anyone with CVS can then grab a working version of Markaby. End result for end-users is the same. Don't need a distributed system for that.
The beauty of git (or any DVCS I suppose) in my mind is that I have one project that can point to several remote repositories, of which I have complete copies locally (including complete history), so within one tree I can switch back and forth to various branches with very little effort in an extremely fast way, I can commit my stuff locally (which I like very very much! especially when being on the road), and then push it to whatever remote repository I have write access to.
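A rough sketch of that workflow, runnable in a scratch directory: two local repositories stand in for the remote ones, and every name here (upstream, fork, mine, rails-fix, try-fix, plugin.rb) is invented for the demo.

```shell
set -e

work=$(mktemp -d)
cd "$work"

# Placeholder identity for the demo commits.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# "upstream" stands in for the original project.
git init --quiet upstream
cd upstream
echo "v1" > plugin.rb
git add plugin.rb
git commit --quiet -m "Initial import"
cd ..

# "fork" stands in for someone else's published repository with extra work.
git clone --quiet upstream fork
cd fork
git checkout --quiet -b rails-fix
echo "v2" > plugin.rb
git commit --quiet -am "Fix compatibility"
cd ..

# One local project pointing at both repositories.
git clone --quiet upstream mine
cd mine
git remote add fork "$work/fork"
git fetch --quiet fork
git branch -r                                     # origin/... and fork/... side by side

git checkout --quiet -b try-fix fork/rails-fix    # switch to the fork's work
echo "# reviewed" >> plugin.rb
git commit --quiet -am "Note review"              # local commit, works offline
git push --quiet fork try-fix                     # push wherever I have write access
git checkout --quiet -                            # and back to the original branch
```

After the fetch, every branch switch and commit is local; the network only matters again at the push.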