Git is cheap (coderwall.com)
113 points by beghbali on Oct 15, 2012 | 53 comments



I absolutely agree with this. I recently started working on a small web app as a side project. I was dealing with a lot of elements that were new to me and was constantly pushing the limits of my knowledge and breaking things. I realized I wasn't using the power git provided me with branches.

Once I started branching, things got quicker and more effective because I could bounce between tasks without affecting my deployment (master branch). In five days of frantic coding using git branching I now have a clean, easy deployment of about 750 lines, which I iterated on through four different branches and probably over 3000 lines of code.

It's not perfect, but it's eliminated any apprehension about aggressively changing my application, and made me far more ambitious. I feel great about it.


I've had a similar feeling of empowerment, although less structured. With hobby projects where sometimes I realize I was thinking all wrong about a certain solution, I'll just check in what I have and commence the deleting of large blocks. Occasionally I'll go back and find something that maybe was a good idea, but for the most part, having an excuse to part ways with bad ideas has alone been really great.


That's great, reducing the barrier to entry for new projects allows for more experimentation. More experimentation increases the likelihood you'll stumble upon something you really like!

What I meant to say more clearly is that I probably iterated through the creation of that web application so quickly because in five days I committed to git 75 times. In my time at Microsoft I don't know that I would have made 75 commits in a year, and creating new branches was very costly.

I got numbers on my git repository by using GitStats: http://gitstats.sourceforge.net/


There is another good reason to break commits down into small "bricks" (as the author calls them), and that is bisect[1]. Up until very recently I only used a tiny fraction of the git toolbox, but when I was complaining to a friend about how an app I was working on deployed fine on one machine but not on another with the exact same configuration, he told me to run bisect.

It took me a matter of minutes to zero in on the exact commit that broke the app on that one machine. This would have been something I would have spent hours, possibly days trying to figure out by going over every tiny config detail (it wasn't a config issue).

Bisect is a powerful little tool to have in your back pocket when you know something you introduced at some point broke something and you need to find out what. The smaller and more "relevant" you keep your commits, the easier it will be to use. The opposite is also true: if you have massive commits which change many different files and/or multiple features, bisect becomes significantly less useful, almost worthless.

[1] http://git-scm.com/docs/git-bisect
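
For anyone who hasn't tried it, here is a minimal sketch of a bisect session, assuming a throwaway repository. The file name app.txt, the commit messages and the "BROKEN" marker are all made up for illustration; in a real project the test command would be your own build or test script.

```shell
#!/bin/sh
set -eu

# Hypothetical throwaway repository; every name here is illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "dev@example.com"
git config user.name "Example Dev"

# Ten small commits; commit 7 introduces the "bug" (the word BROKEN).
for i in 1 2 3 4 5 6 7 8 9 10; do
  if [ "$i" -eq 7 ]; then
    echo "BROKEN" >> app.txt
  else
    echo "line $i" >> app.txt
  fi
  git add app.txt
  git commit -q -m "commit $i"
done

# Newest commit is known bad, the root commit is known good.
first=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$first" >/dev/null

# The test command exits 0 for a good commit and 1 for a bad one.
git bisect run sh -c '! grep -q BROKEN app.txt' >/dev/null

# refs/bisect/bad now points at the first bad commit.
culprit_subject=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1

echo "$culprit_subject"   # prints: commit 7
```

With ten commits this takes a handful of test runs; the number of steps grows roughly with the logarithm of the history length, which is why small commits pay off.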


As this is a positive article about git, if the OP reads this, I'd suggest using the word "inexpensive" next time. Cheap often has a negative connotation that something is low-cost at the expense of quality.

Just a pet peeve. Git is most certainly not cheaply made.


Yes, but the phrase "x is cheap" often means -- just use it. "x is inexpensive" doesn't have the same connotation. E.g. "memory is cheap"


I had sort of the same reaction, but it's one of those language pragmatics that seems heavily dependent on context. "Git is cheap" has the first reading to me with the cheap-crappy-gadget connotation, and I only get the intended meaning on the 2nd wave. But if you said "Git branching is cheap", I read that one as intended immediately.


I chose the exact same word "cheap" in a tutorial presentation that I made for new hires and co-op students at my current company. No confusion registered in the audience so far.


Is there any best practice for a large team of 50 developers to work in Git? We have about 20 separate (almost completely separate) projects.

We currently have everything under one big SVN directory. I use git for personal stuff so it's always a pain to work in SVN.


Git is sufficiently good at all workflows that "best practices" depend more on your team and code than any tool-enforced limits.

I'm working on moving a dev team from SVN to git myself; the plan that people seem to be happy with is to use git as a drop-in replacement for an iteration or two while people get used to the new tools, and then start introducing people to the more advanced features as and when they seem appropriate, hopefully ending up with a git-flow style workflow.


> Is there any best practice for a large team of 50 developers to work in Git?

Think about how the Linux kernel works. A single person at the top who has final say in the final product, a lieutenant responsible for each smaller group working on one subsystem.

Insist on keeping your history clean. Good commit messages and comprehensible changesets are necessary to make sure the code is understandable by code review, future maintainers, and when different teams' code needs to interoperate.


Unlike Subversion, Git has cheap branching and excellent merging, which make branches an easy way to avoid 50-odd developers stamping on each other's changes when they commit to master. I recommend checking out Git Flow, which is a workflow for managing git branches.

See http://nvie.com/posts/a-successful-git-branching-model/

Using a CI server to test merging branches back to master and to run the tests makes it a lot easier to know when it's safe to merge back to master (I left a comment earlier about how this can be done).


Shameless plug: http://gitpilot.com (I'm a cofounder). If you're interested, I'd be happy to learn more about the problems you're facing and we could see if Gitpilot would be a good fit or not. You can email me if you're interested: jp@gitpilot.com


That way lies madness.

In SVN it is easy to create repositories. Not as easy as in Git, but very easy.

Also, you can attach different commit-hooks to every repository, related to each project.


In git, I would create a separate repository for each project. No need to mix up the history of several projects in one repository.


That's possible to do in SVN as well.


You can always use git-svn for local sanity :)


In theory yes, in reality git-svn is driving me insane. I can't wait to move to an all-git solution. Not because I love git but because I can then stop bridging two worlds with git-svn.

You can make git-svn work for non-trivial setups but it took me a lot of trial and error, failed merges and git stashes to make it work for me.


I can feel your pain.

Right now I have this workflow:

  git add filename(s)
  git stash
  git svn rebase
  git svn dcommit
  git stash pop

The worst is failed merges in git svn rebase...


I strongly half-agree here.

Commit early, hell yeah!

Branch often, umm, no. I feel that the excellent branching features of modern DVCSes have made people lazy and afraid to confront the inherent concurrent nature of working in parallel with people on the same codebase. Sure, if you're a 200 person team, there might be no way around some decent branching setup. But don't forget: every feature branch means postponing continuous integration.

Continuous integration is the core of any productive and effective agile-ish team. Daily standups are nice, backlogs are lovely, but without continuous integration you can't get that design-develop-test cycle short enough to deploy/demo early and often.

Therefore, when working on a product or component with a limited team size (say, up to 15 people that are in close contact with one another), I believe that it is best to get your code on the mainline as fast as possible. Often, in git terms, the simplest way to do this is committing and pushing directly to master. Doing this helps signal conflicting concurrent work early, and it helps avoid double work. It encourages necessary but unforeseen design sessions before the work is done, rather than refactoring sessions afterwards.

The only strong downside to super-fast continuous integration that I can see is the chance that you "break the build" (or the tests, or whatever your situation has that needs to be OK for developers to be able to add features). If the team is in close contact, this is typically fixed within minutes ("hey Mike, you broke xyz.cpp" "oh damn, sorry, i'm right on it"), and if it isn't, well, git has cherry-picking features for a reason! You can use the tools to avoid the broken code for a few hours until the guy who broke it is back from the hairdresser.

Sure, you can do good CI with feature branches, but people have to be disciplined, and push their feature branch with master very often. Like, multiple times per day. I've never seen that work in practice. This doesn't mean that it can't work in practice, but it does mean that "branch away, buddy!" may be bad general advice.

When committing straight to master, the danger is of course that people hold back pushing their commits over the line entirely, which is bad too, but I find that replacing the horrible, horrible "whoever breaks the build gets pie" rule with the "whoever pushes more than x changed files/lines at a time gets pie" rule turns that culture around just fine.

I'll be the first to admit that this might work less well with e.g. highly distributed teams in different timezones. But that's hardly the most common scenario.

Continuous integration is called that for a reason. If you do A-few-times-a-week-integration, then call it that. And in my opinion, you're missing out.


DVCS does not discourage Continuous Integration as long as the developer pushes remotely on a regular basis to their own branch and the CI server is set up so that it builds the known active branches.

I've been working on solving this problem over the last year. We added the ability to Bamboo (CI server by Atlassian) to detect new branches as they are created in the remote repository and automatically test the merge with master and optionally push the branch to master if everything is OK. If the merge or the tests fail, Bamboo lets you know immediately via email/XMPP/HipChat/etc.

Yes, it relies on developers being disciplined (read: responsible and professional) enough to push back to master in order to have the merge tested and the tests run.

Developers actually do it, though, because the benefits of having their feature branch tested regularly against master are huge: code is regularly merged and tested with master to ensure integration state, and other team members are isolated from changes that can potentially damage development velocity.

Anyhow, if you're interested in learning more, see my blog post called "Making Feature Branches effective with CI". I'd love more comments and thoughts if you have them.

http://blogs.atlassian.com/2012/04/bamboofeature-branch-cont...

Disclaimer: I am product manager for Bamboo @ Atlassian


that looks really interesting, thanks!


No problems! If you have any questions, feel free to email me james@atlassian.com


This is why, at a minimum, you rebase (yes, rebase) your development branch before merging your changes onto the common branch on the shared repo. You catch conflicts in your work area without having to grind everyone's work to a halt while a fix takes place. Ideally you should rebase after every successful merge of changes to the common branch but that might not be practical depending on the change you're working on.

Branching often means not having to worry about the impact of making an experimental change: it may not go anywhere but you want to be able to try something out. You don't have any heavy lifting to start or to clean up.

And establishing good branching strategies is essential for any project that has any kind of parallel development. It doesn't matter if it's one developer or 200.


> The only strong downside to super-fast continuous integration that I can see is the chance that you "break the build". If the team is in close contact, this is typically fixed within minutes, and if it isn't, well, git has cherry-picking features for a reason! You can use the tools to avoid the broken code for a few hours until the guy who broke it is back from the hairdresser.

Unless you're doing big commits (which we can all agree are bad), or the features you build in are trivial, the features are going to need more than one commit. At which point you either make the master non shippable (which is a bigger problem in continuous integration than integrating every 5 minutes), or you don't push often, at which point you're actually doing branching but your local feature branch is called 'master'.

The workflow you describe sounds more like "hey let's just edit the files on the server - make sure you don't hit "Save" before you make the complete change" than CI (at least my interpretation of it).


This workflow only needs one extra step: treat master (git) / default (hg) as the unstable branch, and have a stable branch for stable release. There, 2 branches with simple CI setup and low cognitive overhead, suitable for a small non-1337 team doing B2B development. (unstable branch: User Acceptance Test, stable branch: training and scheduled deployment)


Good point about the local feature branch. Apparently I'm more stuck in svn-o-world than I realized.


The point is that you should have personal and short-lived branches. Lots of them. (Any project already has one local branch per developer, implicitly - it's right on each developer's personal machine :)

And I think large scale OSS projects like Linux and Chrome are evidence that this approach works quite well.


> implicitly

Explicitly. You can't share branches between repos, but git does a good job of making it look like you can. master and origin/master in your repo are physically different things from master in someone else's.

I don't mean to single you out but I've seen developers with really, really nutty workflows because they didn't understand this.


Trust me, when I say 'implicitly', I mean it.

The implicit branch here is any local code. It's clearly diverging from the authoritative repo (whatever that is for your workflow and VCS)

My point was that any workflow that doesn't operate in a shared source directory already has those local "branches". You might as well acknowledge it and roll with it.

(Doesn't diminish your point about master & origin/master. I think this link is apropos to that: http://wheningit.tumblr.com/post/32959730634/when-the-office... )


Branches in git are just pointers. If two branches point to the same commit, they're the same branch.


If two pointers point to the same memory location, are they the same pointer? No.

Two branches are two branches, even if they point to the same commit. Update one branch, the other remains unchanged.
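
A quick way to see this for yourself, as a minimal sketch in a throwaway repository (the branch names alpha and beta and the file name are made up for illustration):

```shell
#!/bin/sh
set -eu

# Hypothetical throwaway repository; all names are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "dev@example.com"
git config user.name "Example Dev"

echo one > file.txt
git add file.txt
git commit -q -m "first"

# Two branch names that start out pointing at the same commit.
git branch alpha
git branch beta
alpha_before=$(git rev-parse alpha)
beta_before=$(git rev-parse beta)

# Advance only alpha with a new commit.
git checkout -q alpha
echo two >> file.txt
git commit -q -am "second"

alpha_after=$(git rev-parse alpha)
beta_after=$(git rev-parse beta)

echo "alpha: $alpha_before -> $alpha_after"
echo "beta:  $beta_before -> $beta_after"
```

Both names resolve to the same hash at first; after the second commit only alpha has moved, and beta still resolves to the original commit.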


What the other response said + they don't point to the same commit. No branch in your git repo can point to a commit in another repo, even if those commits have the same hash.


Everything I do is on the master branch, because this branch represents the focus of my attention. If there's stuff I'm unsure about I feature flip whenever possible, and if I must do large scale refactoring it happens on the main branch so that I have to get something working to get it out of the way. It only ends up as a feature branch — namespaced under discarded/ — if the refactoring doesn't pan out.


I don't know why continuous integration is being put on such a pedestal. I would agree that as soon as features are done they should be committed, but there are two common cases that occur in my experience doing web development that don't fit with this approach.

First, there are plenty of features and especially refactors that reasonably require a few days of work in order to be complete. How is committing half-finished or partially executed features to the build useful?

Second, there is the case where you are juggling development of multiple features at the same time. It is clearly better to have these in their own branches so they can be worked on independently of each other rather than working with a tangle of unrelated and unfinished changes.


> every feature branch means postponing continuous integration.

It's possible that you and the OP are talking about different things here. For me personally I branch a hell of a lot to organize my work locally. The vast majority of these branches will never be seen by anyone but me (they get merged into master when I'm done, and I just push master).


Rebase is the reconciliation between feature branches and continuous integration. As soon as someone merges to the integration branch, you rebase off of it, and you're integrated. Fix any merge conflicts at rebase time and now your merge can just be a fast-forward. That way, every feature branch contains the state of the integration branch, if that feature branch were immediately merged in. In other words, every feature branch is an integration branch.
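
A minimal sketch of that rebase-then-fast-forward flow, assuming a throwaway repository with no conflicting changes. The branch name feature and the file names are made up, and the integration branch is whatever name git init picked by default.

```shell
#!/bin/sh
set -eu

# Hypothetical throwaway repository; all names are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "dev@example.com"
git config user.name "Example Dev"

echo base > base.txt
git add base.txt
git commit -q -m "base"
main=$(git symbolic-ref --short HEAD)   # master or main, depending on config

# Work on a feature branch.
git checkout -q -b feature
echo feature > feature.txt
git add feature.txt
git commit -q -m "feature work"

# Meanwhile, someone else's change lands on the integration branch.
git checkout -q "$main"
echo other > other.txt
git add other.txt
git commit -q -m "someone else's change"

# Rebase the feature branch onto the updated integration branch.
git checkout -q feature
git rebase -q "$main"

# The merge can now only be a fast-forward; --ff-only refuses anything else.
git checkout -q "$main"
git merge -q --ff-only feature

main_tip=$(git rev-parse "$main")
feature_tip=$(git rev-parse feature)
merge_commits=$(git rev-list --merges --count HEAD)
total_commits=$(git rev-list --count HEAD)
echo "merge commits: $merge_commits, total commits: $total_commits"
```

In a real project the rebase step is where conflicts would surface, in your own work area rather than on the shared branch.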

The difference is that instead of having dozens of half-features in the integration branch at once, you have an integration branch with no dead code that you can actually ship (continuous deployment, anyone?) because your fully-integrated feature branches only make it in when they're ready, no matter how big or small the feature might be. It also means that experiments or major new features can be worked on and then discarded, or simply shelved in favor of higher priority work.


I'm just not clear on how branching gets in the way of continuous integration here. What I've started doing is any time I change anything I do it in a branch. Once I've got my changes working, tested, and I'm ready to deploy, I merge that branch back to master. The idea here being that I could deploy from master at any time and be completely functional. That way I can just constantly deploy any time I merge something back to master.

Perhaps I misunderstood something here though. Does anybody have any thoughts on this idea?


In general, it doesn't. It just looks like they've established a process where CI takes place on the shared master.

Builds/testing can happen on any branch, any repo. I'd like to have some local testing take place (for syntax at a minimum) before anyone submits changes to a shared branch.


FWIW, the comments on this post convinced me that I was wrong. Local, well-named, short-lived feature branches are something that I should learn to use more. These do not limit effective CI at all - this only happens once a feature branch lives on for days and days. I see now that the advice to use small, short-lived branches is in fact much of the article's gist.


I have a setup with an OS project: every bit of work is done on a branch, when that branch gets turned into a PR it runs against CI, and when CI passes it is merged.

https://github.com/daleharvey/pouchdb/pull/158

You can see here a PR failed CI, a new commit was made that passed CI, and it was then merged.

It's a workflow made in heaven for me.


TeamCity 7.x added the great ability to run continuous integration configurations on any git branch automatically by trigger (also configurable to specific branches). Throw in some process for when pushes back to the main release branch happen and your CI problems are solved.


I can integrate continuously by doing this:

  git commit  (in my_branch)
  git checkout master
  git pull
  git checkout my_branch
  git merge master
  (possibly resolve conflicts)

Ta! Da!!

Continuous integration AND branching.


    git commit
    git fetch
    git merge origin/master

No need to switch your current branch to pull.


He compares it to SVN, but makes a classic error. In SVN branches and commits are just as cheap as in git (albeit marginally slower to create usually). It's the merges that are expensive.

It's there that git wins, with its better understanding of how content moves around, and the ability of the merge tool to see the history and identify the common ancestor.


Beyond branches being expensive, even commits in svn are expensive. They publish immediately, which requires proof-reading, more testing, before each commit. This means there's an incentive to wait with commits.

Even if you do commit to a branch, where publishing is less of a problem, you still might regret a typo in the commit, which cannot (easily) be retroactively fixed whereas in git you can fix your private/unpublished commits as much as you'd like with virtually no drawbacks.
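
As a minimal sketch of fixing a typo in an unpublished commit message, assuming a throwaway repository (the file name and messages are made up for illustration):

```shell
#!/bin/sh
set -eu

# Hypothetical throwaway repository; all names are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "dev@example.com"
git config user.name "Example Dev"

echo hello > notes.txt
git add notes.txt
git commit -q -m "Add notse file"       # typo in the message
before=$(git rev-parse HEAD)

# Rewrite the most recent commit with a corrected message.
git commit -q --amend -m "Add notes file"
after=$(git rev-parse HEAD)

subject=$(git log -1 --format=%s)
count=$(git rev-list --count HEAD)
echo "$subject ($count commit in history)"
```

The amended commit gets a new hash, which is why this is only painless for commits nobody else has pulled yet.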


Good point - I suppose I hadn't considered it like that. I tend to happily commit to my feature branch on SVN and then make sure that the log message is good when I merge to trunk.

Incidentally you can edit previous log messages in SVN, it's just that most servers have it disabled, for traceability.


In git you can retroactively edit the commit content as well.


Branching in SVN is not cheap. It basically has to copy the entire directory structure into a new subdirectory (branches directory).

In git, it just creates one file with a 40 byte hex.
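
A minimal sketch of checking that claim, assuming a throwaway repository with git's default settings (SHA-1 object names, refs stored as loose files under .git/refs/heads). The branch name experiment is made up, and repositories using packed refs or a different ref backend will not have the file.

```shell
#!/bin/sh
set -eu

# Hypothetical throwaway repository; all names are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "dev@example.com"
git config user.name "Example Dev"

echo hello > file.txt
git add file.txt
git commit -q -m "first"

# Creating a branch only records which commit the new name points at.
git branch experiment
head_tip=$(git rev-parse HEAD)
branch_tip=$(git rev-parse experiment)
hex_len=$(printf '%s' "$branch_tip" | wc -c | tr -d ' ')
echo "experiment -> $branch_tip ($hex_len hex characters)"

# Assumption: the ref is stored as a loose file; skip the check otherwise.
ref_file=".git/refs/heads/experiment"
if [ -f "$ref_file" ]; then
  echo "ref file size in bytes: $(wc -c < "$ref_file" | tr -d ' ')"
fi
```

Under those assumptions the file holds the hex hash plus a trailing newline, and no working tree files are copied.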


Errr, no - while that's what you see in the directory tree, that's not how it's done internally. SVN has a data model where multiple paths in the repo tree can point to the same underlying file object, so a branch requires no file copies, just updating what the directory tree looks like. Even that is done in a very efficient way. It might not quite be 40 bytes, but it is negligibly small.

This technique is in fact extremely similar to how Git works.


Server-side, that's cheap -- it doesn't actually copy any data. Client-side, it's also cheap: you `svn switch` to the new branch, which is identical to your old branch, so only minimal housekeeping on disk (just like git).


Sorry for going off-topic, but I tried to open this on iPhone and it never finished loading. It's 1.25 MB for one page of 3500 chars of text. Maybe the ratio of polish vs content is a bit too skewed here?


It's strange that coderwall doesn't seem to gzip their CSS or JS files. It'd cut down their biggest file (~450k) to less than 50k.



