"If you are using Subversion, stop it. Just stop. Subversion = Leeches. Mercurial and Git = Antibiotics."
This is great. A few years ago, DVCS was weird and strange and SVN was the stuff. When people discover how awesome Git and Mercurial are, they stab SVN/CVS in the back.
RCS, CVS and SVN are in a real sense the same thing. You can view CVS as fixing a bug in RCS (no concurrent editing), and SVN as fixing bugs in CVS (no atomic commits). That is even how the SVN developers themselves marketed it.
Hg and git are different. It's not like SVN has a bug "it isn't distributed". It is fundamentally not distributed.
So the progression is:
1. Stop manually backing up versions, it sucks. Use RCS/CVS/SVN.
2. Stop using RCS/CVS/SVN, they suck. Use git/Hg.
I can easily imagine a new DVCS system that would fix bugs in git or Hg. The git UI sucks for example. But I can't easily imagine a fundamentally different successor to DVCS.
(Not that such a successor won't happen - of course it will. I just can't imagine what it will look like).
I agree, and I don't think it's limited to programming. Consider business: it's difficult for a large established organization to make drastic changes.
It's much easier for a new entrant (startups) to innovate because they don't have the same baggage.
I think this affects version control systems specifically because the nature of the beast is to create a format to track changes over a long period of time. Changing that format is a huge pain in the ass and rarely worth it.
In VCSes more than in regular software, you really have to get the design right very early in the process.
When an improvement comes from a new mental model of the data/problem, it's at least as much work to fix X as to write Y from scratch. Generally much more, as you must figure out how/whether to interface old features to the new model.
Similarly, in the case of changing mental models, most users and even developers of X won't be ready for Y when you write it. Trying to force the change onto those people will harm X and handicap acceptance of Y by confusing the Case For Y with the Case Against Killing X.
Or he is just drawing a comparison: before antibiotics one may have used leeches to try to rid oneself of sickness, but leeches became (mostly) obsolete after the rise of antibiotics...
It really still depends on your use case. At my day job I advocated switching from SVN to HG and the change has been great - now we "branch" and merge constantly and it really helped our processes. On my iPhone game at home I still use SVN with beanstalk.com and it works great.
Old technology that has been surpassed should be stabbed in the back. That doesn't mean the old technology can't be appreciated, just that its place in the world has changed.
Nobody is driving '55 Chevys around anymore. Some people dedicated aspects of their lives to appreciating the time and place when that technology was dominant, but their glory days have passed.
It is great, but I actually think it's a little unfair. CVS was better than directories with date suffixes, and SVN was better than CVS. Git sounds better than SVN.
But to me, SVN isn't one of those technologies that I'll look back on and despise. Sometimes you look back on a technology and wish it had never existed (the original EJB seems to inspire this in some people), sometimes you look back on it and see deep flaws but still have respect for it as a first crack at a problem (some people look on the original Struts framework this way), and some you look back on fondly even if you no longer personally use it (a lot of ruby/python folks seem to look back on perl this way).
To torture the analogy, I'd say SVN is a moderately effective antibiotic that has been replaced with a more effective one that has fewer side effects. But a leech? Come on, it was a good technology.
He explains it more in the linked "HgInit" tutorial. An excerpt:
Mercurial actually has a whole lot more information: it knows what each of us changed and can reapply those changes, rather than just looking at the final product and trying to guess how to put it together.
For example, if I change a function a little bit, and then move it somewhere else, Subversion doesn’t really remember those steps, so when it comes time to merge, it might think that a new function just showed up out of the blue. Whereas Mercurial will remember those things separately: function changed, function moved, which means that if you also changed that function a little bit, it is much more likely that Mercurial will successfully merge our changes.
It doesn't. Joel is a little bit off base here, in that it makes no difference whether a VCS tracks snapshots or deltas. What does matter is whether it keeps track of merges. If you merge one branch into another, your VCS must record that information, so that future merges can be done correctly. SVN doesn't do this, Git and Mercurial do. (And note that Git stores snapshots, not changes.)
Joel's absolutely right about one thing, though. Being able to merge correctly and reliably makes a huge difference. It's what makes the distributed part of DVCS possible. Without a central repository, branches happen a lot more often, and merging has to work right. The contortions that teams go through to prevent branches are a thing of the past.
That's correct. However, there are still some significant issues with Subversion's merge tracking. There's an excellent blog post that summarizes these issues here:
I don't really get that part, at least not with the way I use svn. I branch, I tag and I use patch sets, which have some similarity to changes. I read this bit http://hginit.com/02.html from Joel's Mercurial intro, and it sounds a lot like how I use svn, except that the branches are on the server, so they are probably a little slower, with the positive tradeoff of being available to my team. The other bonus with dvcs is having local history without doing pushes, which I think is a bigger selling point (although I get something similar from my IDE).
I generally hew to the extreme programming view of branching and continuous integration: push early and push often. Browsing github, I seem to find a preponderance of projects where the branches are never pushed back to a master copy. No matter what tool you are using, branches can introduce semantic changes that are hard to merge.
To sum up, DVCS is not that big of a deal for the way many people already use a CVCS, since you end up working with a centralized copy anyway.
I don't think it matters as much how we think about it. What matters is that the implementation of the VCS "thinks" of them as changes, which results in the easier merging.
Once upon a time I designed a versioning system for an object graph. It was for an application framework that was meant to be designed in an IDE, and didn't have source code, per se (it did have little snippets of code). But designing it around graph deltas, rather than versions, was always the right thing to do: that much was self-evident.
Even as I use svn today, I turn almost all my work into patches. For every bug I fix and every feature iteration, it gets turned into a patch. That way it's usually pretty easy if I have to apply it to an older branch for a hotfix or whatnot.
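For concreteness, a minimal sketch of that patch-per-change workflow in plain svn (the file and branch names here are hypothetical):

    # capture the working-copy change as a patch before committing
    svn diff > fix-parser-crash.patch
    svn commit -m "Fix parser crash"

    # later, replay the same fix onto an older release branch
    cd ../branches/release-1.2
    patch -p0 < ../../trunk/fix-parser-crash.patch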
I think in this case a version is the changes along with the base they're applied to, whereas a changeset is the difference between one version and another, like a diff or delta.
I think the biggest conceptual difference between looking at "versions" versus "changes" is that versions imply a linear, static, and/or sequential history of revisions. By instead thinking of changes, you get a more open concept of a malleable history, in which you have more freedom to change histories or branches.
E.g., if you want to split the development branch (trunk?) into two competing experiments, and then discard the losing branch at the end, a linear history of version/revision numbers split between two branches seems limiting IMO.
A second example: if you are working on a new feature or set of code that you haven't yet pushed to the server, having an open stack of changes gives you the freedom to modify and improve those changes without cluttering the commit history. When you're done, you simply push the finalized set of changes on top of the target branch.
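In git, that "open stack of changes" workflow might look like this sketch (assuming the unpublished work sits on a local master tracking origin):

    # reorder, squash, or reword the commits you haven't pushed yet
    git rebase -i origin/master

    # then publish the finalized series on top of the target branch
    git push origin master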
The first example is a red herring, and the second really doesn't have anything to do with the change vs. revision dichotomy.
If you want to split trunk into competing experiments (and eventually discard one) using a "traditional" RCS like Subversion, you can do that. Easily. Worrying about the linear history of rev numbers would be like worrying about the ordering of hashes in git. Just ignore them -- you can always get the change history of the branch on which you're working.
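A sketch of that in svn (the URLs are hypothetical):

    # branches in Subversion are cheap server-side copies
    svn copy http://svn.example.com/repo/trunk \
             http://svn.example.com/repo/branches/experiment-a \
             -m "Start experiment A"

    # work each experiment independently, then discard the loser
    svn rm http://svn.example.com/repo/branches/experiment-b -m "Discard B"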
The second example is a feature provided by distributed version-control, not the choice of "changes" over "revisions". It's maybe easier to implement distributed version control with the former paradigm, but it's not impossible in either case.
Actually what's happening is the way you work mirrors your tools.
I spent years using CVS, then years using Subversion, and now I have almost 2 years of git under my belt. There is something appealing about numbered versions, but there is no significant benefit over unique hashes of versions; or rather, any small benefit of sequential numbers is vastly outweighed by the tremendous benefits of unique hashes.
I don't think any developer would disagree that the ideal workflow is to do one thing after another sequentially. It's just that in the real world, reasons come up where you might need to create topic branches. Even if you rarely need that functionality, there's no reason to stick with a crippled system like subversion unless you really need the one feature that its entire design philosophy optimizes (partial checkouts) more than solid fundamental changeset tracking and manipulation.
I'm no zealot; you won't catch me advocating linux vs mac, emacs vs vi, or ruby vs java. In most technology choices there is a wide range of tradeoffs and considerations. However, Subversion is one of those rare cases where the tech is fundamentally flawed and serious developers need to move on, whether it be to git, mercurial, darcs, or whatever. Subversion has a few use cases, and if you are not a professional developer then maybe its deficiencies aren't very relevant. However, if you're slinging code all day, version control is your bread and butter; it will stay with you across languages and platforms, so it's insane to stick with a crippled platform that will always be that way due to fundamental design flaws.
> Actually what's happening is the way you work mirrors your tools.
I don't think so. I've been using git for about a year, and I still practically never branch or merge. Switching from task to task has a cognitive overhead that I prefer not to pay.
You branch every time you pull, and merge every time you push, though in the simple linear case it will always be a fast-forward. Every repository is an independent branch.
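Concretely, a pull is just a fetch plus a merge of the remote branch into yours (a sketch, assuming a master branch tracking origin):

    git fetch origin
    git merge origin/master   # reported as "Fast-forward" when your
                              # history hasn't diverged from the remote's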
Other than Bazaar. But Mercurial might be the only one to support both styles fully -- each changeset gets a hash and also a revision number for the branch.
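A sketch of the two identifiers side by side in Mercurial (the output shape is illustrative):

    hg log --limit 3 --template "{rev}:{node|short}  {desc|firstline}\n"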
What about topic branches? You have a cool idea and you don't want to break the main line of work. You at the same time want history of the development of this new cool idea. You also want to be able to continue to bring in critical work from the main line of work.
I don't know Mercurial but I imagine it can handle it as well as Git.
It's this feature that you'd have to pry from my cold dead hands. SVN now feels like a straitjacket to me even when I'm working solo.
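For reference, the topic-branch workflow described above, in git (branch and message names are hypothetical):

    git checkout -b cool-idea     # start the experiment off master
    git commit -am "First cut"    # the idea's history accrues here
    git merge master              # keep bringing in critical mainline work

    # if the idea pans out, fold it back in:
    git checkout master && git merge cool-idea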
>> "What about topic branches? You have a cool idea and you don't want to break the main line of work. You at the same time want history of the development of this new cool idea. You also want to be able to continue to bring in critical work from the main line of work."
Why not just develop that new idea, without breaking the main line of work :/
Can you give an example of an idea you can't implement without breaking the main line of work?
> A version is just a changeset, so for me, it makes little sense to claim "changes" are a completely different way of thinking than "versions".
This supports his point. If you think that a version is a changeset, as opposed to thinking of a version as a monolithic collection of files, you have already achieved enlightenment.
I don't like branches either (we've been using git for maybe a year now at justin.tv, so I've definitely had the opportunity to grow to like them).
Actually I think one of the biggest reasons I don't like them is that they're a kind of "hidden state". If someone made a VCS (or just an interface to an existing VCS) where branches were just represented by different subdirectories in the filesystem, then I might use them more.
For me it's mainly as you said on another comment... The overhead of managing and switching between branches is just too big a price, for no gain that I can see.
I have a hard enough time remembering the state of a single trunk, let alone if I had several branches on the go at a time. I'd fail spectacularly at remembering which branch has what on it...
It's easier to just do:
* Never break the build / functionality of code (Good idea anyway)
* Architect your stuff well, with minimal dependencies, so you can swap in/out modules/components easily.
Personally, I feel branching+merging is as much use/fun as filling out TPS reports.
I know how svn branching works - I don't think I explained myself clearly.
I'd love an RCS that would let me see every branch that currently exists without having to check them out. The branches would exist as directories on my local filesystem. Any file operations (doing a directory listing, or accessing a file) on these local directories would actually be doing RCS commands and pulling data over the network, but I wouldn't have to think about that anymore.
If you checkout the svn root, you get all the branches as directories in your local filesystem. After that, it seems it would be the same as what you're saying.
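A sketch of what that looks like (URL and branch names hypothetical):

    # checking out the repository root materializes every branch
    # as an ordinary directory
    svn checkout http://svn.example.com/repo/ repo
    ls repo
    # trunk/  branches/feature-x/  branches/feature-y/  tags/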
I think he means that the way you work is the same as in SVN but the way the system works is like Git etc.: the branches are editable and accessible like directories but with the advantages and changeset-construction of Git behind it.
"The interesting part is that these systems think in terms of changes, not in terms of versions."
Not git. Git's repository model is very strongly snapshot-oriented. Of course the whole machinery for supporting changesets exists in git, but it is built atop a system that actively avoids "thinking" in terms of changes.
The storage model does not - it is not file-delta based, like SVN and Mercurial. However, you as a developer can (and often do) think of the commits in terms of a changeset or "work introduced" - this is evident in tools like 'rebase', which treats each commit in a branch as a patch, and the collection of unique commits on that branch as a patch series, and helps you transfer that series elsewhere. The 'cherry-pick' command is similar - it reinforces thinking of each commit as a changeset or patch to what came before it.
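A minimal sketch of both commands (the branch names and hash are hypothetical):

    # replay the commits unique to 'topic' onto master, one patch at a time
    git checkout topic
    git rebase master

    # or lift a single commit from another branch as a standalone patch
    git cherry-pick 3f2a1bc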
This is not true of git's object model (which I believe maps to Joel's "user model"). Quoting The Git Book:
"Subversion, CVS, Perforce, Mercurial and the like all use Delta Storage systems - they store the differences between one commit and the next. Git does not do this - it stores a snapshot of what all the files in your project look like in this tree structure each time you commit. This is a very important concept to understand when using Git."
Conceptually git thinks about snapshots of trees. The pack format uses deltas but that's a mostly hidden detail. The pack format didn't exist for the first months of git's existence.
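You can see the snapshot orientation directly with the plumbing (the output shape is illustrative):

    git cat-file -p HEAD           # the commit object: tree, parent(s), author
    git cat-file -p 'HEAD^{tree}'  # the tree: a full listing of that snapshot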
I haven't bothered to read the blog post (Joel strikes me as someone with a huge ego and without any exceptional insights), but doesn't the above fact mostly refute his thesis? I like git and hg but they are just tools. You could do the same development model with patches.
Feature flags are usually just a poor man's way to separate development (changing) and live (stable) code branches. They have explosive complexity (2^n flag combinations need to be tested). They require you to contort your code unnaturally to minimize the number of conditionals.
Compile-time choices are a terrible code smell. If object creation is sorted out (through dependency injection/service locator/etc), the compile-time choice can usually map to exactly one runtime option and exactly one conditional.
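A minimal Python sketch of that idea (all names are hypothetical): the flag is read once, object creation is centralized, and everything downstream depends only on the logger interface.

    import os

    class FileLogger:
        def log(self, msg):
            print("file: " + msg)

    class SyslogLogger:
        def log(self, msg):
            print("syslog: " + msg)

    def make_logger():
        # the single conditional: one flag maps to one runtime choice
        if os.environ.get("LOG_BACKEND") == "syslog":
            return SyslogLogger()
        return FileLogger()

    logger = make_logger()
    logger.log("client code never sees the flag")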
Exactly. You're already creating all your objects with factories, so all you have to do is build a FactoryFactory that builds the right kind of factories and you're good to go...
While that's a classic and amusing essay, having client code depend on concrete types instead of interfaces is lame unless you can be confident that they won't change their mind about which one they want to work with. I write mostly C, don't have anything called a factory, and yet every major object has a plugin architecture, many of these have more than a dozen implementations, and client code automatically works with all of them (and any that they or a third-party have written). Say what you will about architecture astronomy, but this is not complex and is far more maintainable and usable than the alternatives.
I'll take it one further and suggest that the expression "concrete type" is an oxymoron and that there are many, many ways to decouple code. You've mentioned two orthogonal ways to achieve this goal: Programming to interfaces (most people use this term to mean collections of method signatures) and Plugin Architectures (which sounds a lot like using composition or strategies instead of implementations).
But I stand by the tongue-in-cheek suggestion that FactoryFactories are high altitude if not low earth orbit. And obviously, you can use factories and still switch between production and development with a single flag. So please don't interpret my remarks as critical of code I've never actually seen.
My use of "concrete types" was merely a proxy for any code upon which client code should not have an explicit dependency.
I don't see interfaces and plugins as being orthogonal at all, you can't very well have plugins without interfaces (it wouldn't be much of a plugin if the client depended on the "plugin" itself). Besides, orthogonal vectors don't get you to the same place. ;-)
In my opinion, the defining feature of a plugin is that it can be loaded and used with no modification of the code that uses it. On architectures with dynamic loading, this means you can drop a DSO somewhere and use it without code modification or relinking. "Factories", as usually described, would require some modification of the factory to support this new implementation (perhaps just a single line). A "factory" with a runtime-extensible list of implementations that it knows about, is a plugin architecture, but a plugin architecture need not look anything like a factory.
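One way to render that distinction, as a hedged Python sketch (all names hypothetical): a factory whose list of implementations is extended at runtime by the implementations themselves, so adding a plugin touches no factory code.

    _plugins = {}

    def register(name):
        # decorator: each plugin adds itself to the factory's list
        def deco(cls):
            _plugins[name] = cls
            return cls
        return deco

    @register("png")
    class PngImage:
        pass

    @register("jpeg")
    class JpegImage:
        pass

    def create(kind, *args, **kwargs):
        # no per-implementation code here; dropping in a new module
        # that calls register() is enough to extend the list
        return _plugins[kind](*args, **kwargs)

    img = create("png")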
I think that some of our discussion around terms is an artefact of discussing a manifestly typed language like C. But if we handwave and say that "interface" is any dependency with at least one level of indirection and "concrete type" is any dependency with no levels of indirection, I agree that you can't implement plugins without some indirection somewhere.
That being said, you clearly can implement indirection without plugins. I think of a plugin architecture as being composition at a coarse level. In the case of a factory with a runtime list of implementations, I think of that as a factory and a plugin architecture, possibly that the plugin architecture is implemented with a factory.
It often makes sense to have 2 alternate implementations of a component in the same codebase, so you can switch between them at startup/during runtime, and compare them side by side. Something that isn't as easy to do if you create 2 separate branches.
Indeed. It's a very common pattern to "has a" something that does some role, and then provide varying implementations. Image::PNG, Image::JPEG; Logger::File, Logger::Syslog, Logger::Database, Logger::Email, etc.
This may give you millions of combinations, but since the various parts that don't need to interact can't interact, this isn't really a problem. OOP is nice when used by people that know OOP.
Well, if you have a well defined protocol / interface for your component, and you have a runtime versioning mechanism (something as simple as differently named DLLs will do if you're at the C level), then you ought to be able to do both: put the new code in a separate branch, but still switch in the old code at runtime.
An advantage of keeping old code in its own branch is that it's less likely to become incompatible through random changes. In particular, I'm thinking of bug compatibility: there may be bugs you wish to fix in the new code, but you don't want to fix in the older code because of third-party dependencies.
I think a better solution is to have both features in branches, and have a 3rd branch which allows realtime switching between the 2.
a nice upside is that you can merge changes to each feature up into the "combined" branch as they get stable, to test side by side. when you finally finish, you can leave feature a behind by merging b into your release, or vice versa. or maybe even merge both into mainline.
sorry, the point I was trying to make is that cheap branching and merging is still an easier way to manage this situation than just ignoring branches altogether.
A: Create separate branches, manage merging/updating unrelated changesets.
B: Create 2 implementations of something in code, and manage nothing.
I guess it depends on how easy it is to isolate the part you want to create 2 implementations of, so it may depend on what language you're using as well as how you architected things.
Branching+Merging just seems like something extra I have to do, manage, and remember. Like filing. And I still can't see what benefit it gives for many cases.
But that's fine I think... I just don't get it. Maybe I'm too old ;)
It helped me to think about it in terms of connected graphs. Nodes are versions and arcs are changes. A path between two versions is the aggregation of changes.
Merging then becomes walking down the same path, but starting from a different node. I start at my current version node, and I walk down your path of changes. It's very different than trying to merge two nodes.
(This is just my mental model, no need to read into it more than that)
He implies in the last paragraph that it's his final post. Still, I think there's a point here. Writing, when you're that good at it, isn't so easy to give up. Also, the blog has been a huge part of Spolsky's success. So I bet he'll revive or continue it in some form before too long.
Edit: I hope he does, too. I probably disagree with him more than half the time, but there are few other writers on software who consistently hold my attention, and almost no one who consistently makes me laugh.
I would be happy to see him move to a more personal blog. Something that is a little more about his personal experiences with technology. It sounded like he didn't like the regularly-published business-oriented posts.
I've only lived through one version control migration, that was from TFS to Subversion. The problem we have is the length of time Subversion takes doing updates, commits, and merges when you are dealing with a large repository and many changes spread out among files. (Good old TFS was fast.)
Can anyone say if Mercurial is significantly faster?
After going through Joel's tutorial it looks like Mercurial keeps history from branches in the root much better than either TFS or Subversion. I think I would opt for Mercurial the next time I have to set up source control.
I can say without hesitation that mercurial or git will be insanely faster for these operations. Remember, you work locally, and then push with hg/git. So when you do "git commit" it only relies on the speed of your hard drive/cpu. Commits happen instantly. Updates are needed very rarely (since you don't need to update to make new commits). Merges are so much faster it's not even funny. After merging in hg/git you wonder why SVN even had merge functionality.
In addition to those conceptual differences, the actual act of transferring files between remote and local on git/hg is also much faster than SVN. Entire checkouts of git repositories are usually smaller (disk size) than the equivalent SVN checkout... of one version. This means less time on the network all around.
I think you'll find that whatever git does, it's faster than the corresponding operation in SVN.
When you send commits to a remote server, git compresses them and sends the packfile over the network. The server will refuse non-fast-forward pushes (the branch you're pushing is supposed to be merged by you first, locally), so it pretty much has nothing to do but uncompress the changes and apply them. Unless you have a really slow network or are sending an insane number of commits, it's not going to take longer than a few seconds.
Cloning big repositories (like emacs, with ~25 years of history) can take a while if you have a slow network, but sometimes it feels like git can actually clone a repository faster than SVN can check out a single revision.
But repos like emacs' are rare. Often, SVN repositories end up being huge because they contain the code of many unrelated projects. In Git, having such a huge repository is impractical; each project should have its own repository. There is support for submodules though, so you can make a "master" repo that contains pointers to the child repositories, if you really have to.
Try out git. Force yourself to use it for a few weeks. I didn't like it at first. I used to be a Mercurial guy until I really learned git. I still don't know what it is, but there's something about git that just makes me want to love it.
There's something bothering me about the hosannas to the power of merging changes. Isn't there an unspoken assumption that all your work is text files?
If we talk drawings, photos, whatever binary data is needed for a project - what happens? Are the binary deltas as good as SVN's?
Even if the "binaries" are XML text - e.g. drawings in SVG - wouldn't I be out of luck trying to merge changes if 'Beth' added a squiggle and 'Cath' a square to different parts of the drawing? (many tools, upon writing, reorder data 'ad lib'). Are there "merge tool" plug-ins?
Binaries are not merged - if histories diverge and one side changes but the other does not, it will choose the side that changed. If both sides change, it will record a conflict and you have to choose one side (or create a new binary that you tell the system is resolved). You can run an external merge tool with 'git mergetool' and there are over a dozen it knows how to run, but I don't know which of them will handle images. You can also use .gitattributes to help you diff binary files efficiently, but you still have to either choose one side or manually create a new binary.

I mean, even a merge tool won't help much there - if you have images that changed on two different branches and you want to combine them, you have to fire up Photoshop or something anyways - that's almost always going to be a manual process. Though, in the four or so years I've been using Git (and the years before that with SVN and CVS and RCS) I don't remember having to do that very often - thus is the life of a coder, I suppose. The people who create and modify images for programs tend not to tread on each other.
Also, this is really the same problem even for systems without solid branching capability, like SVN. If two people modify a binary image at the same time and one commits, then the other will get a merge conflict when they update and will have to solve the conflict in order to commit.
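Two of the mechanisms mentioned above, sketched (the pattern and path are hypothetical):

    # mark image files as binary so git never attempts a text merge
    echo '*.png binary' >> .gitattributes

    # on a conflict, keep one side wholesale and mark it resolved
    git checkout --ours -- assets/logo.png
    git add assets/logo.png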
Thank you for what you've been doing over the last ten years. I'm not sure what my career would look like without Joel on Software. It really helped shape my outlook and encouraged me to invest in reading and learning more about programming than I would have otherwise.
I wish you nothing but the best with your new venture and with Fog Creek.
Don't you think you'd be better off posting that "I've run the clock out" paragraph as its own thing? It seems important enough to be its own post.
Thank you for all your articles. I've learned a lot from them over the years and consider them really first class. Hell, my printout of the Joel Test is one of the three things taped to my desk (the other two being Merlin Mann's 5 Inbox-zero words and "Your Company's App" by Eric Burke).
I've learned more about programming (as in actually doing it) from them than at my university. And they helped me recognize, on my first job, the patterns you dissected -- which made it much easier to quit.
I have to say that this is kind of sad. I've just discovered your blog and the podcast and I feel like I'm going to miss your essays.
I'm not an engineer but a designer who likes to program, and your thoughts on software are useful even for me. Now that you are venturing into the realm of VCs and being a media company, I would've loved reading (or hearing on the SO podcast) your insights on that.
Anyway, good luck. And I guess I have ten years of essays to dive into.
For me, it always has been and always will be a right tool for the job issue.
We work on relatively small code bases with a very small team of people, not large, unwieldy open-source projects with hundreds of contributors. Primary interest is in a log of what has happened to the code chronologically, rather than applying/unapplying specific revisions/hunks frequently. No need whatsoever for people to run with their own branches. With those requirements, git, Hg or Darcs would all present far, far more headache than they're worth versus Subversion.
In some other scenarios, the formula may yield a different outcome...
How this article got so many points I'll never know. It's some novice advice on version control. Why don't you read the git book? It'll tell you all about it in a well-organized fashion: http://book.git-scm.com
Agree: upvote; disagree: downvote. That shouldn't be the way to go. I maintain that the article is novice advice. At least take the time to reply with why you find it original rather than simply downvoting.