The kernel repo isn't actually that large. It should stay sub-GB for the foreseeable future, IIRC.

Single git repos get stretched to their limits when companies try to put 10-20 years' worth of source code from every project they've ever had (often with large binary files for testing or whatever) into a single repo, because they are trying to use it the way they used Perforce. We're talking anything from several gigabytes up to the unimaginably large.



I was thinking more in terms of total lines of code.

Is the "best practice" to archive code after a while? Would make some sense, but I'm not sure how that would work. Everyone has to make the switch at once, right?


LOC doesn't matter all that much. The main problems are the sheer number of git objects involved (basic operations become very slow when working on a massive DAG) and the sheer size of the repo (making the initial git clone a procedure you start before you leave work for the night).
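For example, you can get a quick read on whether a repo is heading into that territory with git count-objects, and blunt the initial-clone pain with a shallow clone. A minimal Python sketch, assuming git is installed (the repo path and URL are made up):

    import subprocess

    def object_stats(repo_path):
        # "git count-objects -v" reports loose and packed object counts;
        # very large numbers here are an early warning that basic
        # operations (status, log, gc) will start to crawl.
        out = subprocess.check_output(
            ["git", "count-objects", "-v"], cwd=repo_path, text=True)
        return dict(line.split(": ") for line in out.strip().splitlines())

    # A shallow clone fetches only recent history, which can turn an
    # overnight initial clone into something tolerable.
    subprocess.check_call(
        ["git", "clone", "--depth", "50", "https://example.com/huge.git"])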

Git should be able to handle just about any "single project" with ease for the foreseeable future, though. If it can't, that is a strong indication that you need to start restructuring your projects into multiple separate "packages" or projects. You can do that slowly over time while you are still on your traditional VCS (just start breaking components out, both in source code and in organizational responsibility/hierarchy).

After this process is underway, if you are careful and a little clever, you can allow individual packages/projects to migrate themselves to git/Hg. You will probably be in this stage (many people on the old monolithic VCS, many people using git/hg for their projects) for a while. Managing a concept of "packages" at a level higher than version-control repos is somewhat important here, to abstract away exactly which VCS a particular project uses (Android's 'repo' is sort of an example of a higher-level concept above version control that facilitates Android development).
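As a toy illustration of that abstraction layer (all names and URLs invented, and nothing like repo's actual manifest format), the idea is a package index that tooling consults so callers never hard-code which VCS a project lives in:

    import subprocess

    # Hypothetical package index: one level above version control, so
    # tooling never hard-codes which VCS a given project lives in.
    PACKAGES = {
        "libfoo": {"vcs": "git", "url": "git://internal/libfoo.git"},
        "libbar": {"vcs": "hg",  "url": "https://internal/hg/libbar"},
    }

    def checkout(name, dest):
        pkg = PACKAGES[name]
        tool = {"git": "git", "hg": "hg"}[pkg["vcs"]]
        # git and hg happen to share a "clone <url> <dest>" shape; a
        # real tool would dispatch per-VCS rather than assume this.
        subprocess.check_call([tool, "clone", pkg["url"], dest])

During the migration window, packages can then flip their "vcs" field one at a time without breaking anyone's tooling.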

Ideally you would eventually give everybody using the old system a deadline to migrate to git.

Note that this sort of situation is really only something that large organizations (Facebook-sized or larger) should ever find themselves in. If you're on a small team and you are running into these sorts of problems, then you probably have a slightly different problem: perhaps lots of checked-in auto-generated code (suggested solution: stop doing that; work on caching in your build system if auto-generation takes too long to run every time), or too many checked-in large binaries (suggested solution: if those files absolutely must be checked in, look into git-annex).
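On the caching point: if generated code is checked in only because regeneration is slow, keying a cache on a content hash of the inputs lets the build skip the generator when nothing changed. A rough sketch ("codegen" stands in for whatever slow generator you actually run):

    import hashlib, os, shutil, subprocess

    def generate_cached(src, out, cache_dir=".gencache"):
        # Key the cache on the input's content hash: the slow generator
        # runs only when the input actually changes.
        with open(src, "rb") as f:
            digest = hashlib.sha1(f.read()).hexdigest()
        cached = os.path.join(cache_dir, digest)
        if not os.path.exists(cached):
            os.makedirs(cache_dir, exist_ok=True)
            # "codegen" is a placeholder for your real generator.
            subprocess.check_call(["codegen", src, "-o", cached])
        shutil.copyfile(cached, out)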

Edit: here is a slide deck that covers Perforce scaling at Google: http://www.perforce.com/sites/default/files/still-all-one-se... Note the page "Perforce at Google: Main Server". That is the sort of situation you don't want to get backed into; after restructuring your codebase, you can build solutions on git that scale much further at lower operational cost.


The kernel argument still jumps out at me, though. There are many "subsystems" of the kernel that could easily have been split into separate repositories had that been the project's intent. Indeed, this seems to be the entire monolithic-kernel debate: they do not split things out, even though they could.

Now, in general I think I agree with your points, and I actively promote your two specific suggestions at work and on school projects.

And I fully see how the large organizations you are referring to would hit this, especially when they essentially have independent projects in development. What I do not understand is where the line is drawn, to the point that I often find myself arguing the other side at work when teams want to start a project in three repositories from day one because we might want some utilities reused elsewhere.

Take GNOME. At face value, it seems like most of the core of GNOME could live in one repository. Instead, it is very heavily split out, with a specialized build system to support it. Was this strictly necessary? Or is it more to support other infrastructure concerns at play? (Does this make sense?)



