Google's mono-repo is interesting though, in that you can check out individual directories without having to check out the entire repo. It's very different from checking out a bajillion-line git repo.



It's kind of interesting that nowadays people assume that version control system == git.

For a huge, non-open codebase there are some pretty large downsides to a fully distributed VCS in exchange for relatively few benefits.


Good point.

It's important to stress that Google uses Perforce and not git (at least for that monorepo, they use git/gerrit for Android).

A monorepo this size would simply not scale on git, at least not without huge amounts of hacks (and to be fair, Google built an entire infrastructure on top of Perforce to make their monorepo work).


Google doesn't use perforce anymore. It's been replaced with Piper; you can read about it in articles from about 2015 or so. Perforce didn't scale enough. It's not clear to me to what extent Piper is a layer of infrastructure on top of perforce or an actual complete rewrite; I was never super sure. The articles appear to imply way more than a layer on top...

You are exactly right that git doesn't scale, though. Go see the posts Facebook's engineers made about git while trying, only to be met with replies to the effect of "you're holding it wrong, go away, no massive monorepo here", at which point they made it work with mercurial instead. Good read though, lots of good technical details. Can't find the link at the moment though :(, but it was from somewhere around 2012-13 ish.

Edit: here, looks like the original thread is deleted but here's the hn pointer: https://news.ycombinator.com/item?id=3548824


There's nothing wrong with saying "you're holding it wrong" if they're holding it in a way clearly contrary to the solution design. I don't fit in a toddler's car seat and if I tried, it's clearly my fault and not the seat engineer's. I doubt they'd want to accept my changes that would make it work worse for toddlers either.


Sure, if you don't care about people actually using your stuff you can ignore their requests. But Facebook and Google are now working on Mercurial rather than git, and Mercurial actually cares about ease of use (whereas git seems to revel in its obtuseness). The Mercurial folks are also looking at rewriting it, or parts of it, in Rust to improve performance, which has always been its major issue.

If all those things continue I think the only reason to use git over hg would be github. How long until they decide to support Mercurial too and people abandon git?


> Sure, if you don't care about people actually using your stuff you can ignore their requests.

Yes. End of story. People will abandon things that don't support them for things that do, and those that want to continue using something that fits their application will do so. Nothing to see here; we get it, you don't like git -- don't use it if it doesn't fit your needs. However, don't expect those who do like it to go out of their way, in directions they don't want to go, just to please you. Just because there is a community developed around something and that something is open source does not mean they are required to accept whatever patches come their way -- often the best projects know what to keep out as much as what to let in. In this case, the git community has decided it doesn't want to do those things; more power to them.


>> Sure, if you don't care about people actually using your stuff you can ignore their requests.

I think you nailed the problem with Git here: it was created by one guy to support his pet project and as long as it works well for him all the other feature requests are low priority.


Agree completely; git is just not the tool for the job. The original thread (which I still can't find, gah) makes that pretty clear.




Mercurial (with lots of extensions) sits on top of Piper at Google. It doesn't replace it.


I thought it was Facebook that did the mercurial thing: https://code.facebook.com/posts/218678814984400/scaling-merc...


Actually that says they are working on improving Mercurial to the point where they can use it.


That article doesn't claim that. It only claims that mercurial is used within Google.


> monorepo this size would simply not scale on git

https://blogs.msdn.microsoft.com/bharry/2017/05/24/the-large...


That's still not even close to Google's repository:

"The Google codebase includes approximately one billion files and has a history of approximately 35 million commits spanning Google's entire 18-year existence. The repository contains 86TBa of data, including approximately two billion lines of code in nine million unique source files."

Source: https://cacm.acm.org/magazines/2016/7/204032-why-google-stor...


Thanks for this quote.

It prompted me to do a quick afternoon experiment with how git would handle a billion lines of code:

https://news.ycombinator.com/item?id=15892518


As another user mentioned, many git actions scale linearly in the number of changes, not in the size of the repository. Try recreating the scaled repo, but, say, in commits of 1000 lines each (i.e. 200K commits), and see how long things take.
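Something like this, as a rough shell sketch (the commit/line counts are just placeholders, not your actual numbers, and the loop is slow at full scale):

  # Synthesize a repo with many small commits, then time common operations.
  COMMITS=200000   # reduce for a quick test
  LINES=1000
  git init scale-test && cd scale-test
  for i in $(seq 1 "$COMMITS"); do
    seq 1 "$LINES" > "file_$i.txt"
    git add "file_$i.txt"
    git commit -q -m "commit $i"
  done
  time git status
  time git log --oneline | wc -l
  time git blame file_1.txt > /dev/null

History-walking operations like log and blame are where the commit count should bite first, well before the working tree itself gets unwieldy.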


Did your experiment also do 40,000 changes per day (35 million commits, of varying sizes throughout the repo), and then see how that affects git performance? My (admittedly crappy) understanding of git is that its performance also scales with the number of commits, not just the raw file count/size.


This is just the Windows codebase and is relatively small compared to almost the entirety of Google.


Google no longer uses perforce either. I believe it also stopped scaling. They now use Piper, which has a Perforce-like interface but is not the same thing.

And there are other, non-Perforce-like interfaces to Piper.


Could you please elaborate? I've only used svn and git, and the largest codebases I've worked on have only been about 150k lines of code.

What are the other ones, and what are the main differences? Really curious.


Perforce is really common in a few domains because it handles 1TB+ repo sizes cleanly, has simple replication, supports locking of binary files, and has a good UI client for non-programmers.

It was pretty much used exclusively back when I was in gamedev; not sure if that's still the case.
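For anyone who hasn't seen it, the exclusive-checkout workflow goes roughly like this (the depot path is made up):

  # Open a binary asset for edit and take an exclusive lock so nobody else
  # can submit changes to it while you work.
  p4 edit //depot/art/textures/hero.psd
  p4 lock //depot/art/textures/hero.psd
  # ...edit the file locally...
  p4 submit -d "Update hero texture"   # submitting releases the lock

(Filetypes can also be marked +l in the typemap so binary assets get exclusive locks automatically.)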


For example, in svn, checking out only a subdirectory instead of the entire repo is pretty much the default way to use it.
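Something like this (the repository URL is hypothetical):

  # Check out only one subtree of the repository, not the whole thing.
  svn checkout https://svn.example.com/repo/trunk/some/subdir

  # Or check out the root shallowly and deepen just the parts you need
  # (sparse directories, svn 1.5+):
  svn checkout --depth empty https://svn.example.com/repo/trunk proj
  cd proj
  svn update --set-depth infinity some-subdir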


git is designed from the ground up to be 100% distributed. This is useful for small and/or open source projects. It's 100% portable. You can fork and merge between different repos maintained by complete strangers.

Now, imagine you're a huge corporation. Your code consists of millions of files that have been edited millions of times. It's never going to be released to the public. It's never going to be forked, much less by a stranger. You're going to have only one main branch and main build ever, except for maintenance branches. The complete history of everything that has ever happened on that repo would take up many gigabytes, and developers are probably only ever going to need to look at and/or build locally 0.01% of that code themselves.

If you were going to design a version control system from scratch for the latter scenario and you had never heard of git or any other existing VCS, how would you design it? Would you come up with something like git? Probably not. People would just have local copies of the minimum of what they needed to get their work done; anything else would call some server on the VPN they were always on. And you would probably end up with a specialized server architecture, with databases and such, that didn't look much like the client it would also need.


Such as? Then why has everyone switched to git? (Hint, because it is fundamentally built on more powerful ideas than what came before it)


Open source has moved to git, mostly because being standardized on one vcs made it easier for people to contribute.

A lot of companies don't use git.


No, before git the standard was svn, and before that, cvs. The switches happened even though there was an existing standard.


Having worked with all three, nothing but Stockholm syndrome would keep anyone on cvs. Likewise, the switch to git for open source happened (in my opinion) in large part because GitHub offered a far better experience than SourceForge, which was dominant at the time.


There was actually a brief period where Google Code was ascendant, but then GitHub demonstrably invested more in collaboration.

I think one aspect of Git that is really important is forking, and having your own local commits. Merging commits and patches in svn was awful. You wouldn't ever allow someone random to join your svn repo, but if they could reasonably provide a patch, you could take it. Git makes that massively easier.
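Concretely, the no-access contribution flow git enables looks roughly like this (URL and branch names are just examples):

  # Contributor: work in a local clone, then export the commits as patches.
  git clone https://example.com/project.git && cd project
  git checkout -b my-fix
  # ...make commits locally...
  git format-patch origin/master --stdout > my-fix.patch

  # Maintainer: review and apply the patch with authorship preserved,
  # without ever granting the contributor write access.
  git am my-fix.patch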


>There was actually a brief period where Google Code was ascendant,

brrrr


People switched to Git because Svn merging continues to suck and branching and tagging are implemented in the most inane way possible.


I didn't care about merging and tagging.

For me the main feature was distributed nature. SVN is OK on a gigabit corporate LAN with dedicated people to manage & maintain the servers + network. Anything less than that, and it becomes slow and unreliable.


Well clearly not "everyone" has since we're apparently talking about a company that hasn't.


While technically true due to some features of tooling, that is really only masking off part of the repo under a READ-ONLY directory.

Builds can (and usually do) depend on things that aren't part of your local checkout.

I'd say CitC is a much more accurate representation of the way Piper and blaze "expect" things to work.


The dependencies are still downloaded only on demand, though.


The dependencies aren't really "downloaded" at all. When you build something, the artifacts are cached locally, but the files you are editing are, generally speaking, not actually stored on your machine. They're accessed on demand via FUSE.


This used to be done manually via "gcheckout" but that's long since been replaced. Users now don't do anything but create quick throwaway clients that have the entire repo in view.

Until very recently there was a versioning system for core libraries so those wouldn't typically be at HEAD (minimizing global breakage). Even that has been eliminated now and it's truly just the presubmit checks and code review process that keeps things sane.


> truly just the presubmit checks and code review process that keeps things sane.

also rollbacks :)


Microsoft is solving this for git with their GVFS.

Another issue with git monorepos is access control. Does anyone know of good solutions for this? Does GVFS solve this as well?
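For comparison, stock git has had sparse checkout for a while, which covers the "only check out some paths" part but not access control, since the full history is still fetched (that fetching is the part GVFS virtualizes away). Rough sketch, URL hypothetical:

  # Clone without populating the working tree, then restrict the checkout
  # to the paths you care about.
  git clone --no-checkout https://example.com/big-repo.git && cd big-repo
  git config core.sparseCheckout true
  echo "services/frontend/" >> .git/info/sparse-checkout
  git checkout master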


Yeah that was what was nice about SVN. You could check out paths.


> Google's mono-repo is interesting though, in that you can check out individual directories

The same is true of svn, which many people like to bash nowadays, even in this discussion.


That's a pretty common feature in most non-DVCSes. It was nice having Perforce on the last game I worked on. The art directory was ~500GB and not fun to pull down even with a P4 proxy.



