I've worked with both a distributed repo model and a monorepo model and vastly prefer the distributed approach (given the right tooling). The trade-offs are complementary and no doubt with proper discipline you can try to maximize the benefits, while minimizing the downside. But here's what I don't like about working in a large monorepo:
1) Difficult to track changes to the code I'm interested in. Every day there are hundreds of changes in the repo and almost all of them have nothing to do with what I'm working on.
2) all sorts of operations take longer (pulling, grepping source, etc.) to support code I couldn't care less about.
3) Frequently have to update the world at once. Unless the repo can store multiple versions of the same module, then all the consumers have to be updated at once, even if it's inconvenient. Sometimes migrations are better done gradually.
4) Encourages sloppy dependency management. There are frequently unclear boundaries between software layers.
I'm sure people will say "if you're having those problems, you're doing it wrong" but the same thing could be said to people who find the distributed model problematic.
The trick is that Google have their own VCS, build tooling, automated refactoring tools, etc etc, specifically designed to deal with their monorepo. Nobody else has that - we're stuck with git and a complex landscape of tools for managing code in ad-hoc ways. As a result, with the tools we have, many repos is better than a monorepo - but perhaps if we had those tools, for some cases, a monorepo might be better than many repos.
Note that even where Google are forced to use git (e.g. Android, Chrome) they use a many-repo approach.
Maybe we could look at the problem from the other side: create tools to manage multiple repos as if they were a single monorepo. A docker-compose for git.
Perforce Helix might be even better - it even has a DVCS model based on creating a "local server" that can fetch/push from a shared server asynchronously from use of that local server, and a hybrid model that allows for only parts of the repository to be hosted on your personal server, and other parts to follow the more traditional Subversion-like model. Things like exclusive locks on files that can't really be "merged" are also supported (for example, all your assets).
The only downside is that it's not open-source, and as a result has a much smaller community. It's free for up to 5 users, then "email us" for any more. But if a very flexible VCS model is something you need, it's the same as anything else you need to pay for.
Google used to use Perforce until they hit a certain scale, so it's likely it'll work for you until you hit that scale and can build your own tools too.
Well, it seems to fit the requirements better than git. Obviously, subversion is not used much. I would like to hear some experience reports on what the problem with it is.
- it requires a certain discipline: we need branching in our workflow and this is handled mostly by convention in a subversion repository. We have "branches" that were created by less careful colleagues by copying subdirectories of trunk to the branches folder.
- all the tooling developers fled to work on making git bearable. It seems that there is good money in sugarcoating git and none in making good tools for Subversion (awareness of branches in Jenkins, decent code review...). We have a budget, but that does not compensate for the lead that git has in that regard.
Other than that, subversion fits our needs. It just works.
Subversion is not used much anymore - just in case you entered the industry after this.
Subversion was used in basically every open source project as a replacement for the previously dominant CVS.
Subversion was better than CVS, but still bad in many respects; slow synchronization and poor branching and merging support come to mind.
Because of these shortcomings, and because the idea of decentralized versioning was taking hold, systems like git, mercurial, and others emerged, and git seems to be the most successful of them by now.
Must have: Tooling that can interact on a file or subdirectory level. Git cannot do that.
Should have: Access control to view and change files on a subdirectory basis. Everyone can see the repo, so you can no longer set permissions per repo. It's optional, but these companies have that.
Recommended: Global search tools, global refactoring tools, global linting that can identify file types automatically and apply sane rules, unit test checks and on commit checks available out of the box for everything and that run remotely quickly, etc...
It's regular tooling that every development company should have, but only big companies with mono repos have it.
It's not that the tooling is needed to deal with the mono repo, it's that the tools are great and you want them. But they can't be implemented in a multi repo setup.
Think about it. How could you have a global search tool in a multi-repo setup? Most likely, you can't even identify what repos exist inside the company.
Makes me realize. If I ever go back to another tech company, the shit tooling is gonna make me cry.
IIRC, Bitbucket Enterprise has pretty decent global search. GitHub Enterprise doesn't seem to have much of any cross-repo tooling, which is one of my least favorite things about it.
Global refactoring seems a lot less necessary if you have clean separation among your processes. Maybe this is me coming from a more microservices perspective, but I'm inclined to say that needing to do a refactor that cuts across several different functional areas is a sign that things are becoming hopelessly snarled together.
Google has dedicated language, platform, library, etc. teams (I'm no longer there) that can push really huge refactoring changelists - for example, if they noticed that code had plenty of "if (someString == null || someString.empty())", they would replace it with something simpler.
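A toy sketch of that kind of cleanup, in Python for brevity: a script that rewrites the null-or-empty idiom into a single helper call (think a helper like Guava's Strings.isNullOrEmpty) across a whole checkout. The real tools operate on syntax trees rather than regexes, and the isNullOrEmpty target here is assumed, so treat this purely as an illustration of the "one changelist touching many files" idea.

    import re
    from pathlib import Path

    # Toy codemod: rewrite the null-or-empty idiom into one helper call.
    # Real large-scale refactoring tools work on syntax trees, not regexes.
    PATTERN = re.compile(r"(\w+) == null \|\| \1\.empty\(\)")

    def rewrite(source):
        # e.g. "someString == null || someString.empty()" -> "isNullOrEmpty(someString)"
        return PATTERN.sub(r"isNullOrEmpty(\1)", source)

    def codemod(repo_root):
        for path in Path(repo_root).rglob("*.java"):
            original = path.read_text()
            changed = rewrite(original)
            if changed != original:
                path.write_text(changed)
                print("rewrote", path)

    if __name__ == "__main__":
        codemod(".")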
Or if they found some bad pattern, they would pull that too. I remember when a certain Java hash map was replaced, and they replaced it across the whole codebase. It broke some tests (which were relying on a specific iteration order, and that was wrong) - and people quickly jumped in and fixed them.
This level of coordination is great. And it's not just "let's do it today" - things are prepared in advance: days, weeks, months, even years if they have to be. With careful rollout plans, getting everyone aware, helping everyone get to their goal, etc.
It's also easy to establish code style guides and remove the bikeshedding over tabs/spaces, camelCase or not, switch/case statement styles, etc. Once a tool has been written to reformat (via the IDE or other means), and another to check style and some semantics, then people - like it or not - soon settle into that style and keep going. There are more important things to discuss.
The idea of global refactoring is mostly that you can decide to modify a private API, and in the process actually update all the consumers of that API, because they all live in the same repo as the component they're consuming. (This is also the argument of the BSD "base system" philosophy, vs. the Linux "distro" philosophy: with a base-system, you can do a kernel update that requires changes to system utilities, and update the relevant system utilities in the very same commit.)
Code search in bitbucket server is dismal. All punctuation characters are removed. This includes colons, full stops, braces and underscores. This makes it close to useless for searching source code.
Regarding global refactorings, think new language features or library versions.
Support for punctuation in search is something we knew wasn't ideal when we first added code search. As with all software, there were some technical constraints that made it hard to do.
We plan to have support for full stops and underscores in a future version and are exploring how best to handle more characters longer term. Our focus, based on feedback, is on "joining" punctuation characters to better allow searching for tokens. Support for a full range of characters threatens to blow out index sizes, but if we get more feedback on specific use cases we're always happy to consider them.
Being a self-hosted product we have to make tradeoffs for the thousands of people operating (scaling, upgrading, configuring, troubleshooting...) instances. In short, we try to keep the system architecture fairly simple using available technology and keeping the broad skillsets of admins in mind.
It was a somewhat difficult call to add ElasticSearch for its broad search capability, but its use for other purposes helped justify it. Adding Hound or similar services that were considered would have added more administrative complexity and wouldn't have provided for a broader range of search needs.
We continue to iterate on search, making it better over time.
A fair point, but I will just say that Hound is _astonishingly_ low maintenance. I set it up at my current employer like two years ago and have logged into that VM maybe twice in the entire time. It just hums along and answers thousands of requests a week with zero fuss.
> Must have: Tooling that can interact on a file or subdirectory level. Git cannot do that.
I mean, when you get big, sure. But until you're big, git is fine. Working at fb, I don't use some crazy invocation to replace `hg log -- ./subdir`, I just do `hg log -- ./subdir`. Sparse checkouts are useful, but their necessity is based on your scale - the bigger you are, the more you need them. Most companies aren't big enough to need them.
> Should have: Access control to view and change files on a subdirectory basis. Everyone can see the repo, so you can no longer set permissions per repo. It's optional, but these companies have that.
Depends on your culture (and regulatory requirements). I prefer companies where anyone can modify anyone's code.
> Recommended: Global search tools, global refactoring tools, global linting that can identify file types automatically and apply sane rules, unit test checks and on commit checks available out of the box for everything and that run remotely quickly, etc...
I'd bump this up to `should have`. The power of a monorepo is being able to modify a lib that is used by everyone in the company, and have all of the dependencies recursively tested. Global search is required, but until you're big, ripgrep will probably be fine (and after that you just dump it into elasticsearch).
> Depends on your culture (and regulatory requirements). I prefer companies where anyone can modify anyone's code.
This is still true at Google, except for some very sensitive things. However, every directory is covered by an OWNERS file (specific or parent) that governs who needs to sign off on changes. If I’m an owner, I just need any one other engineer to review the code. If I’m not, I specifically need someone that owns the code. IMHO, this is extremely permissive and the bare minimum any engineering organization should have. No hot-rodding code in alone without giving someone the chance to veto.
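For illustration, here's a rough Python sketch of that approval rule, with a made-up directory-to-owners mapping (the real OWNERS files are per-directory text files with a much richer syntax, so this is only the shape of the check, not Google's implementation):

    from pathlib import PurePosixPath

    # Hypothetical mapping from directories to owners; assumed for illustration.
    OWNERS = {
        "search/index": {"alice", "bob"},
        "search": {"carol"},
        "": {"root-team"},
    }

    def owners_for(path):
        # Owners of the file's directory plus every parent directory.
        owners = set()
        directory = PurePosixPath(path).parent
        while True:
            key = "" if str(directory) == "." else str(directory)
            owners |= OWNERS.get(key, set())
            if key == "":
                return owners
            directory = directory.parent

    def change_is_approved(changed_file, author, reviewers):
        owners = owners_for(changed_file)
        if author in owners:
            # An owner still needs any one other engineer to review.
            return len(reviewers - {author}) >= 1
        # A non-owner specifically needs sign-off from someone who owns the code.
        return bool(owners & reviewers)

    print(change_is_approved("search/index/shard.cc", "alice", {"dave"}))  # True
    print(change_is_approved("search/index/shard.cc", "dave", {"eve"}))    # False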
>ripgrep, ElasticSearch
Having something that understands syntax when indexing makes these tools feel blunt by comparison. SourceGraph is making a good run at this problem.
Elasticsearch is too dumb. You need to use a parser and build a syntax tree to get a good representation of the code base. That's what Facebook and Google do for their Java code.
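As a tiny illustration of what "parse it and build a syntax tree" buys you over token matching, here's a sketch using Python's own ast module (real code-intelligence indexers resolve types and references across files, which this obviously doesn't):

    import ast
    import textwrap

    # Index definitions and call sites by walking the syntax tree instead of
    # grepping text. Real code-intelligence indexes do far more (types,
    # cross-file references); this only shows why a parser beats raw tokens.
    SOURCE = textwrap.dedent("""
        def five():
            return 5

        def six():
            return five() + 1
    """)

    tree = ast.parse(SOURCE)
    definitions, calls = {}, []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            definitions[node.name] = node.lineno
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            calls.append((node.func.id, node.lineno))

    print(definitions)  # {'five': 2, 'six': 5}
    print(calls)        # [('five', 6)]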
Agree that any small to medium company could have a mono repo without special tooling. Yet they don't.
There are companies that care about development and there is the rest of the world.
Might I suggest using a tool designed for searching source code rather than dumping into elastic. Bitbucket, sourcegraph, github search or my own searchcodeserver.com
Unless designed to search source code most search tools will be lacking.
I had a bad time at Google and was glad to leave, but wow did I ever miss that culture of commitment to dev process improvement and investment in tooling. The next startup I joined was kind of a shocking letdown. It became clear pretty early on that nobody else there had ever seen anything like the systems at Google, couldn't imagine why they might be worth investing in, and therefore the level of engineering chaos we wasted so much time struggling with was going to be permanent.
The startup I'm working for now is roughly half ex-googlers, so it is a different story. Of course we can't afford Google level infrastructure, but there is at least a strong cultural value around internal tooling, and a belief that issues with repetitive or error-prone tasks are problems with systems, not the people trying to use them.
Worked at Google for 2-3 years, mainly Java, under google3. My thoughts: having things under a single repo, with a system like blaze (bazel), I can quickly link against other systems, or be prevented/warned that it's not a good idea (the system may be getting deprecated, or is brand new, and you need visibility permission (which can be ignored locally)).
Build systems, release systems, integration tests, etc. - everything works more easily, as you refer to things by global, path-like names.
Blaze helps a lot - one language for linking protobufs, Java, C++, Python, etc.
Lately docs are going in there too, with renderers.
Best features I've seen: code search lets you jump by clicking on any reference, lets you "debug" directly against things running on servers, lets you link specific versions, check history, changes, diffs.
GitHub is very far from this, if for no other reason than that it isn't even possible for it to know how things are linked. Even if github.com/someone/somelibrary is used by github.com/someone-else/sometool, GitHub would not know how they are connected - is it CMake, Makefiles, .sln, .vcxproj? It may be able to guess, but that would be lies in the end... Not the case at Google - you can browse things better than in your IDE - and you couldn't even produce this information for your IDE yourself (a process that runs every so often updates it, using a huge MapReduce to do so).
Then local client spaces - I can just create a dir, open a workspace there, and virtually everything is visible from it (the whole monolithic depot) plus my changes. There are also a couple of other ways to do it (git-like ones included), but I haven't explored those.
What's missing? I dunno... I guess the whole overwhelming feeling that such a beast exists, and that it's already tamed by thousands of SREs, SWEs, managers, and just the most awesome folks.
I certainly miss the feeling of it all, back to good ole p4, but the awesome company that I'm at also realized that a single depot is the way to go (with perforce, that is). We also have git, but our main business is game development, so huge .tiff files, model files, etc. require it.
Also, ReviewBoard and now Swarm (the p4 web interface and review system) are nice so far. Not as advanced as what Google had internally for review (no, it's not Gerrit - I still can't get my head around that thing), but getting there.
One last point - a monotonically incrementing changelist number will always be easier to work with than random, unordered SHAs - you can build whole systems of feature toggles, experiments, and build verifications around it, like:
This feature is present if built with CL > 12345, or with cherrypicks of CL 12340 and CL 12300. You could come up with ways to do this with SHAs too - but imagine what your configuration would look like. It's also easier to explain to non-engineering people - it's just a version number.
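For instance, a minimal sketch of that rule; BUILD_CL and CHERRYPICKED_CLS are hypothetical values stamped in at build time, not any real system's API:

    # Hypothetical build metadata, stamped into the binary at build time.
    BUILD_CL = 12346
    CHERRYPICKED_CLS = {12300, 12340}

    def feature_enabled(min_cl, required_cherrypicks=frozenset()):
        # The rule above: present if built with CL > min_cl, or if an older
        # build carries all of the listed cherrypicked CLs.
        if BUILD_CL > min_cl:
            return True
        return bool(required_cherrypicks) and required_cherrypicks <= CHERRYPICKED_CLS

    # "This feature is present if built with CL > 12345,
    #  or with cherrypicks of CL 12340 and CL 12300."
    if feature_enabled(12345, {12340, 12300}):
        print("new code path")
    else:
        print("old code path")

The same check against unordered SHAs would need an ancestry query against the repo itself, which is exactly the extra machinery being pointed at here.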
What special tooling is required to deal with a monorepo that is not required for multi repo?
From my time at Google the first thing that came to mind was citc. But I couldn't remember if citc was publicly known, so I did an Internet search for "google citc". The first search result was this article.
"CitC supports code browsing and normal Unix tools with no need to clone or sync state locally."
Unless something drastic changed in the last year, I really doubt it. There is the fb frontend, the backend, the offline batch processing repo, and the instagram frontend repo. I think the phone apps have their own repos too? It was a giant mess, especially when you had to make changes that spanned repos, like introducing a new backend API and then depending on it, or changing logging formats.
> Note that even where Google are forced to use git (e.g. Android, Chrome) they use a many-repo approach.
Google uses many-repo approach for Android and Chrome because you cannot fit everything in a single git repo (well you can, but it will be a pain in the ass to work on that repo). Git is just not designed for huge repos. Google is also working on tools to make the many-repo of Android or Chrome work like a monorepo.
Software A version 1 consumes format F1 and produces format G1 data, and software B version 1 consumes format G1 and produces H1.
To upgrade format G2 we must change both software A and B.
First, software B version 2 must accept both G1 and G2. To do this we may need to build software A version 2 and try them in a sandbox environment to gain confidence that ∀F1 we produce the correct G2. If F1 is complete, we may be able to do this exhaustively, but if F1 is sufficiently diverse, monte carlo simulation might be used.
Then, if there's a 1:1 relationship between A/B we can upgrade pairs.
If there's an N:M relationship, we need to upgrade all of the instances of software version B1 to B2 (at least within a shard). If you're running in a non-stop environment, this might have its own challenges. Only then can we begin the upgrade from A1 to A2.
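A minimal sketch of the "B version 2 must accept both G1 and G2" step above, with made-up record shapes (dicts with a version field) standing in for the real formats:

    # B version 2: accept both the old G1 and the new G2 record shapes while
    # the A fleet is upgraded. Field names are invented for illustration.
    def parse_g(record):
        if record.get("version") == 2:  # already G2
            return record
        # G1: upgrade in place so the rest of B only ever sees G2.
        first, _, last = record["name"].partition(" ")
        return {"version": 2, "id": record["id"],
                "first_name": first, "last_name": last}

    def handle(record):
        g2 = parse_g(record)
        return "H1 output for %s %s" % (g2["first_name"], g2["last_name"])

    # Works against both producer versions during the transition:
    print(handle({"id": 1, "name": "Ada Lovelace"}))              # from A v1 (G1)
    print(handle({"version": 2, "id": 2,
                  "first_name": "Alan", "last_name": "Turing"}))  # from A v2 (G2)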
Now:
Something, somewhere needs to record what and where we are in this journey. It is relatively straightforward how to do this with a monorepo, but it is very unclear how to do it with a distributed repository:
Almost everyone I know punts and uses some other golden record (like a continuous integration server, or a ticketing system, or an admin/staging system), and like it or not: that's your monorepo.
You can also design software A to produce both G1 and G2 side by side, deploy it, and then develop new software B against G2, submitting bug reports to project A when there’s a problem detected in G2.
If you’re doing the multirepo strategy it’s best imho to make the projects truly independent, as if they were developed by different companies. That way every project only needs to think about its own dependencies and consumers, and how to do migrations, without needing to have the big picture mapped out.
> You can also design software A to produce both G1 and G2 side by side
This can be impractical if G is a database table that is very large.
> it’s best imho to make the projects truly independent, as if they were developed by different companies.
One of our systems might cost £300k, so completely desynchronising them so that code paths can build both G1 and G2 simultaneously (allowing B to develop separately) means "simply" doubling the costs. That might put our team at a disadvantage against someone who figures out another way.
> This can be impractical if G is a database table that is very large.
If this is true, you have no choice, and must run things side by side while you convert to G2. Or shut everything down to make the migration atomic, which is increasingly not an option.
Maybe using a ticketing system [or just call it a project management system] is the right abstraction level.
If A and B have nothing to do with each other - other than that, for some circumstantial reason, they consume data from each other - then why would we care if A or B starts to support a new output format?
If we want to do a format change for some reason, maybe it'll allow better security/traceability, then sure, make a project and track the tasks (like make A able to produce/consume new format, make B able to produce/consume new format, deploy A2 and B2 to test environment, promote to prod), but I don't see why would you track that on the source code versioning level.
A and B have separate tests to ascertain that they can deal with the new format, and then you do the integration testing, which might catch problems that should then be covered by unit tests in A or B. (Or in a fuzzer for said format.)
> Maybe using a ticketing system [or just call it a project management system] is the right abstraction level.
> If A and B have nothing to do with each other - other than that, for some circumstantial reason, they consume data from each other - then why would we care if A or B starts to support a new output format?
First, even if the coordination between A and B is recorded in the ticketing system, the coordination between F and G is probably not.
> I don't see why would you track that on the source code versioning level.
Pretend F and G are tables in a database (or other data storage system) if that makes it easier.
Where is the schema stored? Who records the migration path?
Many people like to record migrations in a version control system, but it is tricky to link those migrations to the (otherwise) independent A and B.
If these are file formats, where does the code to consume and produce them live? Or network formats? The problem remains the same -- do we break this up into additional libraries?
And there's a very real ordering between the releases of A and B that isn't properly encoded; we're relying on process diligence (as opposed to tooling) to get it right.
If tables, then if they are in the same DB, they should be in the same project.
If they are independent tables, then I don't care, show me the API between the projects.
If these are file/network/serialization/wire/in-memory/binary/codec formats, then there are conformance checkers (passive and active, like fuzzers). Those are separate projects, but they can be used like tools during testing and development.
Rely on tooling to make sure that the stated goal of the project is reached. (It now supports F or G or X,Y,Z formats. It supports output-format G by processing input-format F. If that's a project requirement, test it in that project.)
You can use a top-level repo for the integration tests. But there's no need to make it one flat repo.
> If tables, then if they are in the same DB, they should be in the same project.
Lock-stepping two otherwise unrelated applications because they both share support for a data structure is silly at best, and often impractical, especially if development for only one of the projects is "in-house". Consider the possibility that "A" is a commercial product produced by another company.
Anyway, it's my experience most software upgrades don't involve a schema change, so it's worth optimising for the common case, and supporting the difficult case.
> but it is very unclear how to do it with a distributed repository
Versioning through branching and tagging, while having some drawbacks - at least the fact that you have to DO an operation and that this is not automatic - seem to solve this problem, and are not, in my eyes, a form of monorepo. You globally get more flexibility at the cost of a bit more repo management work.
If the problem is retrieving the right version automatically, externals or submodules should be able to solve this problem. If A and B have no clear dependency direction, a top level repo might help.
This is the way I generally do it: A repository that represents my system/environment that has submodules for A1, A2, B1, and B2, and scripts for updating the environment.
I will get a flurry of downvotes for saying it, but the cause of your first three problems is git, not the monorepo approach.
Git only lets you check out/commit/view the entire repo at once. Also, some git operations are superlinear in the number of files or revisions; they are slow on large repos to the point of being unusable.
It's mandatory to have operations at a per-file or per-subdirectory level in a monorepo approach. Companies that have monorepos all built tooling to support that. CVS/SVN used to do it out of the box, but everyone hates them now.
> CSV/SVN used to do that out of the box but everyone hate them now.
It's "CVS" and it lacks a concept for a repository-wide version (except, maybe, a timestamp). A repository-wide version is –I guess– the single best reason to have monorepo in the first place.
Also hard to deal with. Lots of operations leave submodules in unclean / out-of-date states.
In general / light use, yeah, they're great. Unfortunately, they have a very large number of edge cases where they essentially require either a) everyone to be experts in the edge cases, or b) tons of new tooling (because existing tools won't take these steps for you).
That's fair. I think the important thing is to be able to have multiple versions of portions of your repo and then be able to version that. A branch of branches, so to speak.
> Every day there are hundreds of changes in the repo and almost all of them have nothing to do with what I'm working on.
Wouldn't the right tooling be able to show you changes to the slice of code you're interested in? I remember SVN would allow you to checkout just a single subdirectory, for example.
> all sorts of operations take longer (pulling, grepping source, etc.) to support code I couldn't care less about.
Makes sense about the pulling, but again, wouldn't grepping be configurable to only search where you need to?
> Frequently have to update the world at once. Unless the repo can store multiple versions of the same module, then all the consumers have to be updated at once, even if it's inconvenient. Sometimes migrations are better done gradually.
I'm not sure I understood you here, can you expand on that? Do you mean all the devs have to update their module? Why not use tagged/branched version of libs instead of working off trunk?
What kind of tooling do you use for the distributed approach?
Yeah, it's all about the tooling. The distributed system I used was when I worked at Amazon. Each "package" there has its own git repo, which can have multiple branches and the dependencies of each branch are versioned along with the branch. I wonder if at the end of the day it really matters whether something is a "monorepo" or not if the tooling provides the necessary abstractions to version things the way you need.
Using different branches for different modules becomes hard in a monorepo when branching is a global operation. You can only have a single branch checked out in your working copy then.
The only solution I can think of is to create a copy within the monorepo to create de facto branches without regular VCS support. This would be kind of terrible.
I am a firm believer in using processes and tools that make it hard to do stuff you should not be doing in the first place. Dependencies should be solved the way FAKE and Paket do it, not in a monorepo. It's the same story for many projects in a solution vs. few: with few projects you avoid wrestling with cyclic dependencies between projects; on the other hand, that is just a tell-tale sign that the overall structure is starting to deteriorate.
I still do not see the appeal of monorepos, since you're heavily dependent on discipline not to introduce spaghetti dependencies, where you fix one bug but introduce 4 new ones in unrelated parts of the code. Then you solve that with an if statement, and thus introduce a great deal of technical debt.
I’ve never worked in a monorepo, so may be wrong, but this point presumes a dependency on “latest” at all times. I’d assume the components in the mono repo still release versions to the various package management systems (maven, pypi, npm, etc), allowing dependencies to be more stable.
Has that not been the common experience of those who have worked in them? I see a lot of merit in having to update everything at once (less code rot, hopefully) but it does seem to have drawbacks (many have commented on these as well).
If the library/component is widespread, it can either be developed in a branch, with specific versions tagged, or always developed in "trunk" mode with feature toggles (not always possible, but one can adjust), e.g. certain features are disabled, and need to be enabled after other changes land, or after some specific time, etc.
While at Google, we used that kind of development for the project I was on. Someone would push source code changes for new features, but preferably behind a flag (normally a command-line flag, driven by a configuration, like the one ksonnet has). The configuration file would say: enable this flag only if the binary was compiled with this CL version, and/or these cherrypicks, or some other rule.
This also allows a feature to be quickly disabled by SRE, SWE, or other personnel if it's found to be not working well.
Both approaches can be made to work. For me, the overriding concern is simplicity and ease of configuration management, so I prefer something that on the surface looks like a monorepo. Somewhat paradoxically, in my attempt to solve the issues that you mention, I ended up scripting my commits and checkouts so I could place the repository in a set of git repositories -- so I have a distributed set of repositories under the hood that look like a monorepo to the people using it! Neat, huh?
> 1) Difficult to track changes to the code I'm interested in.
What's wrong with 'git log $PATH'?
> 2) all sorts of operations take longer (pulling, grepping source, etc.) to support code I couldn't care less about.
A different format could help here, as can different tools (e.g. ripgrep or ag instead of grep). The time spent on those operations has to be balanced with the time spent updating your code to deal with someone else's incompatible library changes, again, when the other person is on vacation and you have no idea what the new philosophy of his library is. And you don't have any choice about updating, because another one of your dependencies that you really must update has already been updated to rely on his changes.
> 3) Frequently have to update the world at once.
IMHO that's a feature, not a bug. The person or team responsible for breaking the world is responsible for fixing it, rather than getting to break the world, then pop off down to Barton-on-Sea for an extended holiday while everyone else in the company gets to update his code to use an entirely different idiom.
> 4) Encourages sloppy dependency management.
My experience has been that multiple repos tend to encourage sloppy dependency management, while a monorepo tends to encourage deliberative, collaborative, professional dependency management. That's just my own experience, and of course different organisations will differ.
> I'm sure people will say "if you're having those problems, you're doing it wrong" but the same thing could be said to people who find the distributed model problematic.
My own experience has been that multirepos tend to be like dynamic typing and monorepos tend to be like static typing: multirepos can in theory be done right, but in practice they never are, while monorepos work, but at the cost of people having to colour within the lines. Which makes sense for any particular organisation may actually be a function of its maturity: if a place is trying to move fast and break things, maybe multiple repos make sense; if it's trying to deliver quality software, maybe a single repo makes sense.
3 and 4 are pretty fundamental though, especially 3 - if you don't want to force everybody to keep up with head, you probably don't want to use a monorepo.
My team owns a framework and set of libraries that are widely used within the Google monorepo. We confidently forward-update user code and prune deprecated APIs with relative ease — with benefits of doing it staged or all-at-once atomically.
I attended a talk by one of the Google Guava (Java collections library) authors and he told us how they didn't have to worry about maintaining backward compatibility at all. When they made a breaking change they could check out all of the impacted Java code across Google, refactor it, verify that the tests still passed, and then commit everything in one shot. It's easy to understand the productivity advantages.
One challenge is latency in generating the codebase's identifier and callgraph search index (cf. Code Search and Kythe). We can perform global tests across the entire monorepo, but that takes time. What happens if someone introduced new usage of the old API immediately before our atomic refactoring, and what about pathological tests or flakes? This still necessitates doing some cleanups in multiple stages: (1) mark the old API as deprecated (optionally announce it), (2) replace and delete legacy usages, and (3) delete the final trailing usages sometime soon thereafter, once the codebase has been reindexed.
Some languages and ecosystems are more tolerant of this problem than others. That said, incremental cleanup still has an advantage when bisecting regressions.
As I said, it is not perfect, but making broad-based changes quickly is relatively easy.
In my time maintaining open source, I never had these luxuries, which is why I said the monorepo is infinitely easier. Another consequence: if global cleanups are easy, perhaps that reduces the barrier to experimentation. Perfect is no longer the enemy of the good and the good enough. In open source, where I had zero control over dependent code and its callgraph, I felt the reverse was true: hesitance to publish something for fear of the cost.
Not sure what you mean. I work at Facebook, and can confirm we keep all code in a monorepo (or, rather, one of two big monorepos) rather than just Java code.
This lets us easily do React API changes: we can deprecate an API internally, and update all JS code that references the old APIs in a single commit.
You can do that with a many-repo setup, provided you have the right tooling. In fact, I'd argue Google's advantage is in the tooling they built around the repo, not the monorepo itself. E.g. how fast you can find all your dependents in the whole repo.
Not really, as there isn't the ability to atomically commit your changes. With 70000+ full-time employees code is getting checked in all the time. Atomicity is extremely valuable.
It's like the people commenting about this forget what distributed means. You can have multiple repos, but you can still have a gateway/"source of truth" repo. You can run tests and whatever else on it just like you do in a "monorepo." The power behind Google's/Facebook's choice isn't and will never be the monorepo. It's specifically the tooling they built around their choice.
The 70,000+ number you cited includes engineers and non-engineers but surely, the actual number of engineers that need commit access will be much lower.
I agree, though I wouldn't say this is a fundamental aspect to distributed systems, mostly just a consequence of git being built with terrible merge and merge conflict tooling.
What changes can you do that you couldn't have done before, in a multirepo world? And I do mean could not have done - clearly the monorepo enjoys a few hundreds of thousands (millions?) of hours of effort that the multirepo did not.
i.e. what stops your current tools from `for each repo, run...`, or how is monorepo fundamentally more capable than building automated library management / releases / etc with the same level of tooling?
With a single commit you can change an API, and all its users, and run all the tests for all the dependent projects, etc., and you're done. All in a day's work, and no emails/communication necessary.
In a multi-repo world, people are probably linking against old revisions of your library, and against certain tags/branches etc. There is probably no overarching code search to find all users of the API. You're gonna have to grep the code and hope to find all uses. You might miss some repos/branches. Everyone has their own continuous integration/testing procedures, so you can't easily migrate their code for them. You're gonna have to support both APIs for probably months, until you have persuaded every other user to upgrade to the latest 'release' of your code which supports the new API, before finally turning off the old API. The work involved in the migration is spread amongst all the project owners, which is probably much less efficient.
As others have said, it's the fully integrated version consistent codesearch with clickable xrefs across gigabytes of source code, cross repo code review, cross-repo testing, etc. which really makes a monorepo work well.
With the exception of cross-repo code review (I hadn't thought of that one - would be useful for multi-repo too, but I've honestly never seen a multi-repo tool for this, thanks!), this is all just the benefits of standardization, plus a massive injection of tooling enabled by the standards.
Standardization of projects brings huge benefits when it's done right, absolutely agreed. But that's entirely orthogonal to mono vs multi.
Imagine I have Repos A,B,C. A is a base repo. B and C depend on A, and C also depends on B. If I modify some API in A, and also update all the callsites in B and C, I also have to bump the version of A depended on by B and C, and also bump the version of B that C depends on, otherwise I'll get version mismatch/api compatibility breakages.
To make this work that means that nothing can depend on latest, everything has to have frozen dependencies, and you either need to manually, or via some system, globally track all of the dependencies across repos, and atomically update all of them on every breaking change.
In other words, you reinvent blaze/bazel at the repo level instead of the target level, and you have to add an additional tool that makes sure your dependencies can never get mismatched.
The monorepo sidesteps this issue by saying "everything must always build against latest".
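A sketch of that "additional tool", under the assumption that each repo publishes a small pin manifest (the layout here is invented): CI rejects any combination of pins that has drifted apart, which is the coordination a monorepo gives you implicitly.

    # Invented pin manifests: what each repo says it builds against.
    PINS = {
        "B": {"A": "1.4.0"},
        "C": {"A": "1.4.0", "B": "2.1.0"},
    }
    RELEASED = {"A": "1.4.0", "B": "2.1.0"}  # latest published versions

    def check_pins(pins, released):
        problems = []
        for repo, deps in pins.items():
            for dep, wanted in deps.items():
                if released[dep] != wanted:
                    problems.append("%s pins %s==%s but %s is at %s; every pin "
                                    "must be bumped atomically on a breaking change"
                                    % (repo, dep, wanted, dep, released[dep]))
        return problems

    print(check_pins(PINS, RELEASED) or "all pins consistent")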
>"everything must always build against latest" is perfectly enforceable on multirepo too, it's just that nobody does it
No, you cannot. That's my entire point. Here's a minimal example:
Repo one contains one file, provider.py:

    def five():
        return 5

Repo two contains one file, consumer.py:

    import provider  # assume path magic makes this work

    def test_five_is_produced():
        assert provider.five() == 5

    if __name__ == '__main__':
        test_five_is_produced()
I also have an external build script that copies provider and consumer, from origin/master/HEAD into the same directory, and runs `python consumer.py`.
Now I want to change `five` to actually be `number`, such that `number(n) == n`, ie. I really want a more generic impl. What sequence of changes can I commit such that tests will always pass, at any point in time?
There is no way to atomically update both provider and consumer. There will be some period of time, perhaps only milliseconds, but some period of time, at which point I can run my build script and it will pick up incompatible versions of the two files.
This is a reductive example, but the function `five` in this case takes the role of a more complex API of some kind.
or you give your CI the ability to read transaction markers in your git repo. e.g. add a tag that says "must have [repo] at [sha]+". dependency management basically. you can even do this after the commits are created, so you can allow cycles and not just diamonds.
but yes, cross-project commits are dramatically easier in a monorepo, I entirely agree with that - they essentially come "for free".
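A rough sketch of that marker check (the pin file name and layout are made up; the git commands are standard): the consumer repo records "I need provider at this sha or newer", and CI verifies that the provider checkout it is about to build against satisfies the marker before running anything.

    import json
    import subprocess

    def satisfies_pin(provider_checkout, pin_file="provider.pin.json"):
        # e.g. {"repo": "provider", "min_sha": "abc123..."} -- invented layout
        with open(pin_file) as f:
            pin = json.load(f)
        head = subprocess.run(
            ["git", "-C", provider_checkout, "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True).stdout.strip()
        # "min_sha or newer" means min_sha is an ancestor of (or equal to) HEAD.
        result = subprocess.run(
            ["git", "-C", provider_checkout, "merge-base", "--is-ancestor",
             pin["min_sha"], head])
        return result.returncode == 0

    if __name__ == "__main__":
        if not satisfies_pin("../provider"):
            raise SystemExit("provider checkout is older than the pinned marker")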
Didn't you just reinvent versioning and frozen dependencies? What you described is not always building at latest, it's building at latest except when there are issues at which point you don't build at latest and instead build at a known good version.
Consequences of this are, for example, that you cannot run all affected tests at every commit.
sure. I honestly don't see why that's a problem though, especially since "at every commit" can have clear markers for if it's expected to be buildable or not.
My point here is that you're describing a known problem with known solutions, and saying it's impossible. I'm saying it requires work, as does all this in a monorepo.
edit: to be technical: yes, you're correct, it can't always build at latest at every instant. Agreed. I don't see why that's necessary though. Simplifying, sure; necessary? No.
>sure. I honestly don't see why that's a problem though, especially since "at every commit" can have clear markers for if it's expected to be buildable or not.
The value from this is the ability to always know exactly which thing caused which problem. If you know things are broken now, you can bisect from the last known good state, and find the change that introduced a breakage. With multi-repo, you can't do that, since it's not always a single change that introduces a breakage, but a combination.
Ensuring that everything always builds at latest allows you to do a bunch of really cool magical bisection tricks. If you don't have that, you can't bisect to find breakages or regressions, because your "bisection" is
1. now 2 dimensional instead of 1
2. may/will have many false positives
That puts you in a really rough spot when there's a breakage and you don't have the institutional knowledge to know what broke it.
No, you're back to "we can't build HEAD in a multirepo", which is fixable with CI rules. If you can, you can bisect exactly the same (well, with a fairly simple addition to bisect by time. `git bisect` is pretty simple, shouldn't be hard to recreate).
In any case, unless you have atomic deploys across all services, this is generally untrue. Bisecting commit history won't give you that any more in a monorepo than in a multirepo.
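To make the "bisect by time" idea above concrete, here's a rough sketch that merges the commit timelines of two repos and binary-searches for the first point in time where the combined build fails (test_cmd is whatever builds the two checkouts together; error handling and empty-history cases are omitted, so this is a sketch, not a drop-in tool):

    import datetime
    import subprocess

    def commits(repo):
        # (timestamp, repo, sha) for every commit, oldest first.
        out = subprocess.run(
            ["git", "-C", repo, "log", "--reverse", "--format=%ct %H"],
            capture_output=True, text=True, check=True).stdout.split()
        return [(int(ts), repo, sha) for ts, sha in zip(out[0::2], out[1::2])]

    def checkout_as_of(repo, tip, timestamp):
        when = datetime.datetime.fromtimestamp(timestamp, datetime.timezone.utc).isoformat()
        sha = subprocess.run(
            ["git", "-C", repo, "rev-list", "-n", "1", "--before", when, tip],
            capture_output=True, text=True, check=True).stdout.strip()
        subprocess.run(["git", "-C", repo, "checkout", "--quiet", sha], check=True)

    def bisect(repos, test_cmd):
        tips = {repo: subprocess.run(
                    ["git", "-C", repo, "rev-parse", "HEAD"],
                    capture_output=True, text=True, check=True).stdout.strip()
                for repo in repos}
        timeline = sorted(t for repo in repos for t in commits(repo))
        lo, hi = 0, len(timeline) - 1  # assumes lo passes and hi fails
        while lo + 1 < hi:
            mid = (lo + hi) // 2
            for repo in repos:
                checkout_as_of(repo, tips[repo], timeline[mid][0])
            if subprocess.run(test_cmd).returncode == 0:
                lo = mid
            else:
                hi = mid
        return timeline[hi]  # first (timestamp, repo, sha) that broke the pair

    # e.g. bisect(["./provider", "./consumer"], ["python", "consumer/consumer.py"])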
> I'm asking because i wouldn't know how to setup a mono repository at my 50 people Startup even if we deemed this to be necessary.
Sorry if this is a really dumb question. If you only have 50 people I'm assuming your codebase isn't that big, so why can't you just make a repo, make a folder for each of your existing repos, and put the code for those existing repos into the new repo?
I imagine there's a way to do it so that your history remains intact as well.
Yes, there is. Move the entire content of each repo to a directory and then force-merge them all into a single repo. I did this a few years ago with 4 small mercurial repositories that belonged together.
For a 50-person startup, a Git repository will usually be enough. At my previous company we managed the monorepo approach easily with a similar number of people and GitHub.
Google's mono-repo is interesting though, in that you can check out individual directories without having to check out the entire repo. It's very different from checking out a bajillion-line git repo.
It's important to stress that Google uses Perforce and not git (at least for that monorepo, they use git/gerrit for Android).
A monorepo this size would simply not scale on git, at least not without huge amounts of hacks (and to be fair, Google built an entire infrastructure on top of Perforce to make their monorepo work).
Google doesn't use perforce anymore. It's been replaced with Piper, you can read about it in articles from about 2015 or so. Perforce didn't scale enough. I guess it's not clear to what extent Piper is a layer of infrastructure on top of perforce or actually a complete rewrite? I was never super sure. The articles appear to imply way more than a layer on top...
You are exactly right that git doesn't scale though, go see the posts on git that Facebook's engineers made while trying, only to be met with replies to the extent of "you're holding it wrong, go away, no massive monorepo here", at which point they made it work with mercurial instead. Good read though, lot of good technical details. Can't find the link at the moment though :(, but it was from somewhere around 2012-13 ish.
There's nothing wrong with saying "you're holding it wrong" if they're holding it in a way clearly contrary to the solution design. I don't fit in a toddler's car seat and if I tried, it's clearly my fault and not the seat engineer's. I doubt they'd want to accept my changes that would make it work worse for toddlers either.
Sure, if you don't care about people actually using your stuff you can ignore their requests. But Facebook and Google are now working on Mercurial rather than git, and Mercurial actually cares about ease of use (whereas git seems to revel in its obtuseness) and the Mercurial folks are looking at rewriting it, or parts of it in Rust to improve performance, which has always been the major issue.
If all those things continue I think the only reason to use git over hg would be github. How long until they decide to support Mercurial too and people abandon git?
> Sure, if you don't care about people actually using your stuff you can ignore their requests.
Yes. End of story. People will abandon things that don't support them for things that do and those that want to continue using something that fits their application will do so. Nothing to see here; we get it, you don't like git -- don't use it if it doesn't fit your needs. However, don't expect those who do like it to go out of their way in a way they don't want to please you. Just because there is a community developed around something and that something is open source does not mean they are required to accept whatever patches come their way -- often the best projects know what to keep out as much as what to let in. In this case, the git community has decided it doesn't want to do those things; more power to them.
>> Sure, if you don't care about people actually using your stuff you can ignore their requests.
I think you nailed the problem with Git here: it was created by one guy to support his pet project and as long as it works well for him all the other feature requests are low priority.
That's still not even close to Google's repository:
"The Google codebase includes approximately one billion files and has a history of approximately 35 million commits spanning Google's entire 18-year existence. The repository contains 86TBa of data, including approximately two billion lines of code in nine million unique source files."
As another user mentioned, many git actions scale linearly in the number of changes, not in the size of the repository. Try recreating the scaled repo, but say, in commits of 1000 lines each (ie. 200K commits), and see how long things take.
Did your experiment also do 40,000 changes per day (35 million commits, of varying sizes throughout the repo), and then see how that affects git performance? My (admittedly crappy) understanding of git is that it also scales on the commits, not just the raw file number/size count.
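A hedged sketch of that experiment: synthesize a repo out of many ~1000-line commits and time a few everyday git operations against it. The commit count below is tiny so it finishes quickly; crank it toward 200K (or replay 40,000 changes per "day") to approximate the thread's numbers.

    import subprocess
    import time
    from pathlib import Path

    GIT_ID = ["-c", "user.name=bench", "-c", "user.email=bench@example.com"]

    def build_repo(root, n_commits=1000, lines_per_commit=1000):
        repo = Path(root)
        repo.mkdir(exist_ok=True)
        subprocess.run(["git", "init", "--quiet"], cwd=repo, check=True)
        for i in range(n_commits):
            f = repo / ("file_%d.txt" % (i % 100))  # spread changes over 100 files
            with f.open("a") as fh:
                fh.writelines("commit %d line %d\n" % (i, j) for j in range(lines_per_commit))
            subprocess.run(["git", "add", "."], cwd=repo, check=True)
            subprocess.run(["git", *GIT_ID, "commit", "--quiet", "-m", "change %d" % i],
                           cwd=repo, check=True)

    def timed(root, *cmd):
        start = time.perf_counter()
        subprocess.run(["git", *cmd], cwd=root, check=True, capture_output=True)
        return time.perf_counter() - start

    if __name__ == "__main__":
        build_repo("scaling-test")
        for cmd in (["status"], ["log", "--oneline"], ["blame", "file_0.txt"]):
            print(cmd, "%.2fs" % timed("scaling-test", *cmd))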
Google no longer uses perforce either. I believe it also stopped scaling. They now use Piper, which has a perforce like interface, but is not the same thing.
And there are other, non-Perforce-like interfaces to Piper.
Perforce is really common in a few domains because it handles 1TB+ repo sizes cleanly, has simple replication, locking of binary files and a good UI client for non-programmers.
Was pretty much used exclusively back when I was in gamedev, not sure if that's still the case.
git is designed from the ground up to be 100% distributed. This is useful for small and/or open source projects. It's 100% portable. You can fork and merge between different repos maintained by complete strangers.
Now, imagine you're a huge corporation. Your code consists of millions of files that have been edited millions of times. It's never going to be released to the public. It's never going to be forked, much less by a stranger. You're going to have only one main branch and main build ever, except for maintenance branches. The complete history of everything that has ever happened in that repo would take up many gigabytes, and developers are probably only ever going to need to look at and/or build locally 0.01% of that code themselves.
If you were going to design a version control system from scratch for the latter scenario and you had never heard of git or any other existing VCS, how would you design it? Would you come up with something like git? Probably not. People would just have local copies of the minimum of what they needed to get their work done; anything else would call some server on the VPN they were always on. And you would probably come up with some specialized server architecture, with databases and such, that wasn't all that similar to the client architecture it would also need.
Having worked with all three, nothing but Stockholm syndrome would keep anyone from switching away from CVS. Likewise, the switch to git for open source happened (in my opinion) in large part because GitHub offered a far better experience than SourceForge, which was dominant at the time.
There was actually a brief period where Google Code was ascendant, but then GitHub was demonstrably investing more in collaboration.
I think one aspect of Git that is really important is forking, and having your own local commits. Merging commits and patches in svn were awful. You wouldn't ever allow someone random to join your svn repo, but if they can reasonably provide a patch, you could take it. Git makes that massively easier.
For me the main feature was distributed nature. SVN is OK on a gigabit corporate LAN with dedicated people to manage & maintain the servers + network. Anything less than that, and it becomes slow and unreliable.
The dependencies aren't really "downloaded" at all. When you build something, the artifacts are cached locally, but the files you are editing generally speaking not actually stored on your machine. They're accessed on demand via FUSE.
This used to be done manually via "gcheckout" but that's long since been replaced. Users now don't do anything but create quick throwaway clients that have the entire repo in view.
Until very recently there was a versioning system for core libraries so those wouldn't typically be at HEAD (minimizing global breakage). Even that has been eliminated now and it's truly just the presubmit checks and code review process that keeps things sane.
That's a pretty common feature in most non-DVCSes. It was nice having Perforce on the last game I worked on. The art directory was ~500GB and not fun to pull down even with a P4 proxy.
I work in a large company and I have used a central repository for six years and a distributed for six years. I think a central repository is better. The benefits are:
1) Transparency. I can see what everybody else is doing and if somebody has an interesting project I can find it quickly. You can also learn a lot from looking at other peoples changes.
2) Faster. To check out the source code for the project I now work on takes an hour in the distributed system, while it only took 5 minutes in the centralized system.
3) Always backed up. All code that is checked into the central repository is backed up. It has happened twice that employees have left and code was lost because they only checked it in locally.
Many have only used CVS or SVN, which are horrible. I'd rather use Git or Mercurial than those, but Perforce is really good.
> 1) Transparency. I can see what everybody else is doing and if somebody has an interesting project I can find it quickly. You can also learn a lot from looking at other peoples changes.
This doesn't require a single central repository, just that all repositories live in a common location.
> 2) Faster. To check out the source code for the project I now work on takes an hour in the distributed system, while it only took 5 minutes in the centralized system.
What distributed repository management system do you use, and what centralized system did you use?
> 3) Always backed up. All code that is checked into the central repository is backed up. It has happened twice that employees have left and code was lost because they only checked it in locally.
As with point 1, this doesn't require a single central repository, just that all repositories live in a common location.
This doesn't require a single central repository, just that all repositories live in a common location.
Even better, if every project includes a DOAP file (or something similar) and/or you publish commit messages using ActivityStrea.ms or something, you could easily have an interface that shows project activity around the organization, regardless of how many repositories and/or servers you use. Of course it's probably easier if all the repositories live in a common location...
I use git-svn to use a central repository. Let me list the advantages
1) Faster
There is no comparison. But let me count the ways
a) checking out stuff
It is faster than just downloading a directory using SVN.
b) just trying something out (ie. branch)
Creating a branch, making a few changes takes me seconds, and does not require me to change paths like it does for the svn victims I work with. Throwing it back out again takes seconds, and all operations are reversible for when I fuck up (which is often).
c) merging
Git's merging. Oh my God. In half the cases I just have to check stuff over, if that.
d) submitting
We use code review. Unlike most of the subversion folks, I can easily have 5 co-dependent changes in flight (5 changes, each depending on the previous one) without going insane, and I have gone up to 13, not counting experimental branches. I observe around me that it takes a good developer to manage 2 with subversion. 5 is considered insane; I bet if I showed them that 13 were in flight at the same time they'd have me taken away as a danger to humanity.
2) always backed up
Subversion doesn't back up until you commit, and people don't commit anywhere near often enough ... The way people lose code around here 99.9% of the time is by accidentally overwriting their in-flight code contributions (the remaining 0.1% involves laptop upgrades and overenthusiastic developers; even then, cp -rp will just copy my environment and it will just work, and yet the same is absolutely not true for the subversion guys).
Now with Git, I commit every spelling fix I make, every semicolon I have forgotten, sometimes separately, other times with "--amend". Only then do I make my share of stupid mistakes, after committing - something that's technically not impossible on subversion but not practical, mostly because of code review ("just commit it" on subversion takes ~5 minutes in the very fast case (which requires a colleague dropping everything that very second, AND can't involve any actual code changes, as that trips a CI run that takes 3 minutes assuming zero contention), and 20-30 minutes is a more typical time (measured from "hey, I'd like to commit this" to actually in the repository)). Committing on git takes me the time to type "<esc>! git commit % -m 'spellingfix'". The subversion commit time means that developers often go for weeks without committing. Weeks, as in plural.
I get that a git commit isn't the same thing as a subversion commit. But it does allow me to use the functionality of source control, and that's exactly what I'm looking for in a source control system. Subversion commit doesn't allow me to use source control without paying a large cost for it, that's what I'm getting at.
So I have backups guarding against the 99.9% problem (and an auto-backup script that does hourly incremental backups for the 0.1% case). The subversion guys are probably better covered for the 0.1% problem. Good for them !
3) actual version control
Git's branches, rebase, merge, etc mean I can actually work on different things within short time periods in the same codebase.
The fact that other developers are using subversion means I can have my own git hooks that I use for various automated stuff. Some fixing code layout, some warning me about style mistakes, bugs, ... (you'd be surprised how much your reputation benefits from these). Some updating parts of the codebase when I modify other parts, ... you have to be careful as these are part of the reason subversion is so slow (esp. the insistence on CI, I hear a CI run at big G, which is required before even code review can happen, takes upwards of an hour on many projects with some taking 8-9 hours)
Not really. I work at google. I work on a leaf, so my CI takes < a minute. I also can send out multiple chained changes, in a tree, to multiple reviewers, and have them reviewed independently.
Certainly, CI takes a long time for certain changes, but those are changes that affect everything. You'd have the same problem in a multi-repo approach if you updated a repo that everything else depended on. At some point, you have to run all of the tests on that change.
Cool. I've wondered about Google's CI a lot, but there are a lot of horror stories online. Most people are complaining about it taking an hour for simple changes (something called "tap", I wonder what that stands for).
Chained code review changes, I refuse to believe that in Google version control (which is perforce according to Linus' git talk at Google) chained changes are easy. Branching in perforce is literally worse than SVN, it's a bit more like the old CVS model, and they've sort-of tried to get the SVN copy-directory model forced into the design afterwards. Also the tool support (merges ...) is bad compared to subversion and stone-age compared to Git's tools.
The one reason I keep hearing for using perforce is that perforce allows the administrator to "lock off" parts of the repository to certain users.
I've done branches and merges in Git, Subversion and CVS (and I've had someone talk me through one in Perforce, but I don't really know it). Google's branch/merge experience is very likely to be somewhere between SVN and CVS, and those can accurately be referred to as "disaster" and "crime against human dignity". It's certainly not impossible, but it's very hard, and you can't expect me to believe normal developers can reasonably do that in Perforce.
Also: what would happen if you send out 20 chained commits, 10 of which are spelling corrections, 5 of which are trivial, compile-fixing bugs (forgot semicolon, "]" that should have been ")", etc ...), 2 of which are small changes to single expressions and 3 of which introduce a new function and some tests. Perforce, like subversion and cvs doesn't have any way of tracking stuff unless you commit it and you can almost never commit without CI and code review, so would you track changes like that, or would you just leave them in your client untracked until you're ready for a code review ?
>Cool. I've wondered about Google's CI a lot, but there are a lot of horror stories online. Most people are complaining about it taking an hour for simple changes (something called "tap", I wonder what that stands for).
Well, like I said, it's possible to modify things that have a lot of dependencies, at which point you run a lot of tests, but that would be roughly true anyway. Consider the hypothetical situation where you're modifying the `malloc` implementation in your `/company/core/malloc.c`. Everything depends on this, because everything uses malloc. If you have a monorepo, you make this change and run (basically) every unit and integration test, and it takes a while.
Alternatively, if `core` is its own repo, you run the core unit tests, and then later, when you bump the version of `core` that everything else depends on, you run those tests too. Now suppose there's a rarely encountered issue that only certain tests exercise. In the monorepo you notice it immediately when you run all the tests, and you can be sure the malloc change is the breakage. In the multi-repo setup you only notice breakages when you update `core`, or maybe you don't notice at all, because it's only one test failing per package and it could just be flakiness. So noticing it is harder, identifying the issue once you've decided there is one is harder, and now you need to roll back instead of just not releasing.
>Chained code review changes, I refuse to believe that in Google version control (which is perforce according to Linus' git talk at Google) chained changes are easy. Branching in perforce is literally worse than SVN, it's a bit more like the old CVS model, and they've sort-of tried to get the SVN copy-directory model forced into the design afterwards. Also the tool support (merges ...) is bad compared to subversion and stone-age compared to Git's tools.
Google no longer uses perforce, we use Piper (note that this is a Google-developed tool called Piper, not the Perforce frontend called Piper; yes this is confusing; afaik, Google's Piper came first). Piper is inspired by perforce, but is not at all the same thing. (See CitC in the article.) The exact workflow I use isn't public (yet), but suffice to say that while Piper is perforce-inspired, Perforce is not the only interface to Piper. This article even mentions a git-style frontend for Piper.
>Google's branch/merge experience is very likely to be somewhere between SVN and CVS, and those can accurately be referred to as "disaster" and "crime against human dignity". It's certainly not impossible, but it's very hard, and you can't expect me to believe normal developers can reasonably do that in Perforce.
Suffice to say you're totally mistaken here.
>Also: what would happen if you send out 20 chained commits, 10 of which are spelling corrections, 5 of which are trivial, compile-fixing bugs (forgot semicolon, "]" that should have been ")", etc ...), 2 of which are small changes to single expressions and 3 of which introduce a new function and some tests. Perforce, like subversion and cvs doesn't have any way of tracking stuff unless you commit it and you can almost never commit without CI and code review, so would you track changes like that, or would you just leave them in your client untracked until you're ready for a code review ?
So, Piper doesn't have a concept of "untracked". Well it does, in the sense that you have to stage files to a given change, but CitC snapshots every change in a workspace. Essentially, since CitC provides a FUSE filesystem, every write is tracked independently as a delta, and it's possible to return to any previous snapshot at any time. One way to think of this concept is that every "CL" is vaguely analogous to a squashed pull request, and every save is vaguely analogous to an anonymous commit.
This means that in extreme cases, you can do something like "oh man, I was working on a feature 2 months ago, but stopped working on it and didn't really need it, but now I do", and instead of starting from scratch, you can, with a few incantations, jump to your now-deleted client and recover files at a specific timestamp (for example: you could jump to the time that you ran a successful build or test).
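There is no public CitC command line to show, but a loose git analogy (loose, because git only snapshots what you actually commit, not every save) is digging an old line of work out of the reflog:

$ git reflog                              # list where HEAD has been
$ git checkout -b recovered HEAD@{120}    # hypothetical entry from the old work session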
>Also: what would happen if you send out 20 chained commits, 10 of which are spelling corrections, 5 of which are trivial, compile-fixing bugs (forgot semicolon, "]" that should have been ")", etc ...), 2 of which are small changes to single expressions and 3 of which introduce a new function and some tests.
I'd logically group them so that each resulting commit-set was a successfully building, isolated feature. Then, each of those would become its own CL and be sent for independent review.
I think you are confusing central/distributed with monorepo/multiple repos. Also distributed VCS doesn't imply that you don't have a central master somewhere.
Perforce has such a janky UI though. Whenever I try to do anything significant with my company's codebase, the whole application locks up for hours. I guess I need to learn how to use the CLI.
This might not just be GUI vs CLI; it can come down to the granularity of your client mapping: if the p4 server thinks it needs to lock across large regions of depots, it can go into the weeds.
I always try to have the absolute minimum in my client specs, but sometimes you do need to operate over the world.
The perforce docs are generally well written, worth looking at them.
The third problem can essentially be solved by doing all your production builds by checking out code from some central repository. If you follow that rule, then you guarantee you'll have the source code for every binary in production.
That way, you can still have a distributed repository (Git, Mercurial, etc.) if you want. Even if some code exists only in some developer's local repository, it's presumably not that big of a deal since that code can never have made it to production.
So as someone who previously worked for Google and now works for Facebook, it's interesting to see the differences.
When people talk about Google's monolithic repo they're talking about Google3. This excludes ChromeOS, Chrome and Android, which are all Git repos that have their own toolchains. Google3 here consists of several parts:
- The source code itself, which is essentially Perforce. This includes code in C++, Java, Python, Javascript, Objective-C, Go and a handful of other minor languages.
- SrcFS. This allows you to check out only part of the repo and depend on the rest via read-only links to what you need from the rest.
- Blaze. Much like Bazel, its open-source counterpart. This is the system that defines how to build various artifacts. All dependencies are explicit, meaning you can create a true dependency graph for any piece of code (see the query sketch after this list). This is super-important because of...
- Forge. Caching of built artifacts. The hit-rate on this is very good and it consumes a huge amount of resources given the number of artifacts produced. Forge turns build times for some binaries from hours (even days) into minutes or even seconds.
- ObjFS. SrcFS is for source files. ObjFS is for built artifacts.
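As a rough flavor of that explicit-dependency point, using the open-source Bazel and hypothetical target names, the whole graph for a target can be printed directly:

$ bazel query 'deps(//server/frontend:main)'                 # every target it transitively depends on
$ bazel query 'deps(//server/frontend:main)' --output graph  # the same graph in Graphviz form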
This all leads to what is usually a pretty good workflow: you check out directories if you want to modify them, and just use the read-only version if you don't. You can still step through the read-only code with a debugger, however.
Now Facebook I have less experience with (<6 months) but broadly there are four repos: www, fbobjc, fbandroid and fbcode (C++, Java, Thrift services, etc). At one point these were Git but for various reasons ended up being migrated to Mercurial some years ago.
The FB case (IMHO) highlights just how useful it can be to have one repo. Google uses protobufs for platform independence. FB uses GraphQL at a client level and Thrift at the service level.
So one pain point is that, for example, you can modify a GraphQL endpoint in one repo but it's used by clients in others (i.e. mobile clients). There are lots of warnings about making backward-incompatible changes, some of them excessively pessimistic, because deterministically showing that something will break some mobile build in another repo is hard.
Google3 has fewer of these problems because the code is in the same repo. On top of that, Google has spent a vast amount of effort making it so the same build and caching systems can handle C++ server code as well as Objective-C iOS app code. Basically, if you're working on Google3 you compile very little to nothing locally.
Engineers on Android, Chrome and ChromeOS however compile a lot of things locally and thus get far beefier workstations.
At FB the mobile build system doesn't seem to be as advanced in that there is a far higher proportion of local building.
IIRC the Git people seemed to reject the idea of large code bases. Or, rather, their solution was to use Git submodules. There were (and maybe still are?) parts of the Git codebase that didn't scale because they were O(n). Apologies if I'm misspeaking here, but I peripherally followed these discussions on HN and elsewhere years ago as someone from the outside looking in, so I'm no authority on this.
The problem of course is that Git submodules don't give you the benefits of a single repo and I've honestly not heard anyone say anything good about Git submodules.
Just to stress, the above is just my personal experience and I hope it's taken as intended: general observations rather than complaints and definitely not arguing that one is objectively better than the other. There are simply tradeoffs.
Also, there are definite issues with Google3, like the dependency graph getting so large that even reading it in and figuring out what to build is a significant performance cost and optimization issue.
I have two main concerns when I see monorepos being used.
First, like in other areas, I see companies that want to "google scale" and blindly copy the idea of monorepos but without the requisite tooling teams or cloud computing background / infrastructure that makes this possible.
Second, I worry about the coupling between unrelated products. I admit part of this probably comes from my more libertarian world view, but I have seen something as basic as a server upgrade schedule tailored for one product severely hurt the development of another product, to the point of almost halting development for months. I can't imagine needing a new feature or a bug fix from a dependency but being stuck because the whole company isn't ready to upgrade.
I've read of at least one less serious case of this from Google, with JUnit:
> In 2007, Google tried to upgrade their JUnit from 3.8.x to 4.x and struggled as there was a subtle backward incompatibility in a small percentage of their usages of it. The change-set became very large, and struggled to keep up with the rate developers were adding tests.
> I worry about the coupling between unrelated products.
I even worry about coupling among related products.
I could see monorepos working out well for a company that just does SaaS, and is able to get away with nice things like maintaining a single running version of the app, and continuous delivery.
Having mostly worked in companies that do shrinkwrap software or that allow different teams or clients to manage their own upgrade schedule, though, monorepo seems to me like a recipe for a codebase that is horribly resistant to change. Not just in the "big bang upgrades like JUnit4 are awful" ways described above, but also in a, "We never clean up old stuff, because most of the time when we try it breaks a bunch of other teams' code and we just nope out of that whole hassle, so barely-supported code sort of collects continuously, like dead underbrush in a forest that's never allowed to burn, until eventually it all explodes in a horrible conflagration," sort of way.
Seeing the list of things that Google keeps in a monorepo, vs things that Google keeps in Git repos, it seems like they might be thinking similarly. They've really only got a precious few products that typically run on non-Google-owned hardware, and apparently the major ones live outside the monorepo.
The dynamics actually played out very differently at Google. Because it was a monorepo with automated testing, if you didn't want other teams to break you when they change the dependencies, then you had better have a robust test suite.
Breaking changes would then lead to a discussion with your team, rather than your fruitlessly trying to binary search to find the commit that broke you.
Over time, the culture at Google became that all teams need to write tests at the unit, functional, and (usually) integration level.
>They've really only got a precious few products that typically run on non-Google-owned hardware, and apparently the major ones live outside the monorepo.
This depends on what you mean. Most/all consumer android applications don't run on google-owned hardware, but are in the monorepo.
That said, you're right that the whole "keep things up to date" thing is important. That's where tools like rosie and even bots come in.
> Seeing the list of things that Google keeps in a monorepo, vs things that Google keeps in Git repos, it seems like they might be thinking similarly. They've really only got a precious few products that typically run on non-Google-owned hardware, and apparently the major ones live outside the monorepo.
I always thought it was more that the things which take open source contributions are hosted in Git while the internal things would be hosted in Google3.
First, I agree with you about companies worrying prematurely or unnecessarily about “google scale”. You saw this a lot in the NoSQL hype days. You can go pretty darn far with even a single MySQL instance.
Second, source-level dependencies vs binary-level dependencies is a choice and a commitment.
Even at large companies release schedules can really hinder you.
I didn’t hear about the JUnit issue but I can believe it. With code bases this large you have to get really good at static analysis (dynamic languages are your enemy here), tooling for refactoring, and just general hygiene of the code base.
For the first concern you have things totally flipped. A monorepo actually seems best suited for a small company, with a small codebase and a small set of services.
If anything, the stage of company where it makes most sense to have many small repos is when you have a large company with multiple unrelated products, services, teams, etc.
You're right that monorepos are just fine for a small company. The problem is that at some point they don't scale without code review discipline and sophisticated tooling (such as Google's); thus there is a bottleneck where it becomes harder and harder to scale until you get your tools right.
Yes, but that point is probably when you have hundreds of engineers, at which point you can afford to have (and probably will already have) a few engineers working on internal tooling.
>>> First, like in other areas, I see companies that want to "google scale" and blindly copy the idea of monorepos but without the requisite tooling teams or cloud computing background / infrastructure that makes this possible.
A monorepo will work fine for most small and medium companies without issue, even on top of git.
The need for special tooling, and the performance issues, only pop up when you have millions upon millions of lines of code.
> broadly there are four repos: www, fbobjc, fbandroid and fbcode
This used to be true, but today these are all in fact the same hg repo (www as a possible exception, I'm unsure). The "sparse checkout" machinery disguises it, but for engineers working cross platform (e.g. React Native) it's routine to make commits that span platforms.
I'd add that most technical companies today, unless at the scale of Google/Facebook, do not necessarily (and most likely don't) have very sophisticated tooling in place. You can imagine Jenkins is always in place and code is split into "self-contained" repos; I don't know if Google/Facebook uses Jenkins, but I know Netflix certainly does.
I haven't heard much about Microsoft or Amazon, though I do know from a friend working at Apple that their tooling is not always consistent from team to team. I would appreciate it if someone from these other big tech companies could discuss their development workflow.
As an SRE/DevOps, I love working on internal tooling because it feels like creating my own programming language - I can be creative but focus on solving problems in my domains.
Big, non-tech companies use completely crap tools in a lot of cases, compared to even 50 person startups. Google's a clear outlier in terms of tool quality, even for a tech giant, but I've seen some great in-house or newer tools used by a lot of startups, too. There are often huge differences in tool quality vs. task for companies of the same size/stage, too.
I respect that opinion, but it's different from my experience. My experience is that non-tech companies just don't care (i.e., we can do what we want), and that startups spend way too much time worrying about the tools they will need when they get as big as google. Instead of, you know, getting as big as google.
These are only secrets in the most tendentious sense. Half the people at Facebook worked at Google before anyway, and there are maybe a half dozen companies that could plausibly get the benefits of going Google-scale for source control. And they already have all their internal systems: if they lack all the same capabilities as Google, it's not because Google's systems are secret but because of other challenges.
Yeah, the mono repo / distributed division is almost a red herring. A key component of the division, though, is fault and responsibility in making something as relatively unimportant as the source control and build systems work well.
Google has put a lot of money and effort to make their system nice. Working in a mono repo without that much effort is very frustrating, doubly so because there's nothing individual teams can do about it. It's even worse if you can't even make team-specific branches on the mono repo to try and isolate yourself from the steady stream of breaking changes elsewhere.
However if you're a team lucky enough to get out and do most things on your own git repo, then you're now the only ones responsible for making that better or worse. Fortunately there's a ton of open source to learn from and use, so taking control of your own team's destiny to get to a point better than before doesn't have to mean much work.
"
Google has put a lot of money and effort to make their system nice. Working in a mono repo without that much effort is very frustrating, doubly so because there's nothing individual teams can do about it."
Sure there is. They can architect code in a way that it doesn't break heavily when other people do things.
I.e. abstract things reasonably. They can test things well.
And they can complain when other teams aren't doing either and it's making them less effective.
A refreshing comment. I feel like people often overlook the importance of execution. Though, basing one's execution off shaky ideas is not really the best either.
Because it's not really secret? Numerous articles have been written about this stuff for years and years and also it's not hard to get [G|X]ooglers to talk about this stuff candidly in casual convo.
I think that what is missing when using multiple git repos is the ability to make a code change that spans multiple projects. We're open to adding that to GitLab.
This will help if you don't want to run a monorepo, but many in the industry consider a monorepo suitable for their organisation. The biggest blocker with Gitlab in my opinion is not being able to only run a job if some specific folder was modified (https://gitlab.com/gitlab-org/gitlab-ce/issues/19232).
In places I've worked in before that needed to join concurrent changes across repositories, we would have the build system always build from a super-repo that had the other repositories tracked as submodules. Co-dependent changes were pulled in via a single commit in the super-repo.
This was kind of cumbersome to maintain TBH, and the fact that changes to different repos can be dependent on one another seems to strongly suggest that the code should be together in the same repo. Personally, I opt for mono-repos until I'm forced to change for whatever reason.
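A sketch of that super-repo pattern, with hypothetical repository names; the co-dependent change lands by bumping both submodule pointers in a single super-repo commit:

$ git submodule add https://example.com/serviceA.git serviceA
$ git submodule add https://example.com/serviceB.git serviceB
# ...after the co-dependent changes are pushed in each sub-repo:
$ git submodule update --remote serviceA serviceB
$ git add serviceA serviceB
$ git commit -m "Bump serviceA and serviceB together for the cross-repo change"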
I think the monorepo is better but it's really hard to pull off without Google-like tooling and engineering practices. Git + meticulous dependency tracking and strict versioning conventions is probably the better move for most companies.
+1, I really dislike the "Let's do it because Google does it" mentality in software engineering. Google operates at a scale and encounters problems that the average company would probably never encounter.
How would they do CI/CD if it's all in one big repo?
I suppose you could do it if you had a very strict rule where absolutely everything that could affect a "unit" was inside its own directory (and nothing higher up than that "project root").
So you could check which sub-directory is affected within a commit and so on.
This is one part of a _very_ big answer, but Bazel (internally Blaze) lets you do reverse dependency querying [0] with the query language, i.e. "Bazel, give me the list of targets that depend on this target that I've just modified"
$ bazel query 'rdeps(//foo/my:target, //...)'
Of course, this query in the monorepo will take a long time or not work, because the target universe of "//..." is far too large. This is where other systems come in.
I'm unsure about the confidentiality of those systems, so I'm erring on the side of not expanding.
However by deriving from first principles, yes, there is no reason to re-query the transitive closure of unchanged targets' reverse deps, so caching can happen here.
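One way to keep such a query tractable without any of those internal systems is to shrink the universe argument from "//..." to the subtree you actually care about (target names hypothetical):

$ bazel query 'rdeps(//myproject/..., //myproject/lib:parser)'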
Blaze/bazel solves this. All build targets (buildable units) explicitly define their direct dependencies. If you modify something, you can then run all tests for all units that transitively depend on your change.
For surface level changes this is often quite small. For changes to core libraries, well, you run a lot of tests.
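As a sketch (hypothetical target name), the two steps can be glued together directly, filtering the reverse dependencies down to test rules:

$ bazel test $(bazel query 'kind(".*_test", rdeps(//..., //company/core:malloc))')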
Oh, it works really well. One benefit is that all tests affected by your target are run when you presubmit. Another benefit is that everything is at head, so things like library bugs and security bugs are handled naturally as part of the new releases. This usually happens twice a week for most server binaries.
Depends. Mobile tests that run on emulators are the worst. Unit tests finish relatively fast; integration tests that bring up servers tend to be slow (30-40 minutes best case for the projects I'm working on). The cost of this gets amortized: you can run the immediate unit tests manually on the command line as the fastest signal; then when you send the change for code review, presubmit runs; during code review you may choose to run them as you go; eventually when you submit, they run again.
If there have not been any changes to your commit/CL and there is an already-passing run, it will just skip the tests and submit.
Oh wow. We have a test tenancy that's carried throughout production, so you make requests against real backends (read/write data in the test namespace, sometimes read-only production data). There's a proxy in front doing rate limiting, endpoint whitelisting, audit logging, emergency lockdown, etc. I never thought of deploying a whole separate environment just for integration testing.
Still, seems you could keep a handful of integration test environments always running? Time spent waiting your turn for one of them could well be less than time spent spinning up a whole bunch of servers.
There is an effort to make everything hermetic. Namespacing is hard and not always possible, and touching production servers (and potentially crashing them) could cause significant revenue damage.
I don't think all tests should be hermetic - the benefit usually doesn't outweigh the effort it takes to make them so - but hey, that's what we are doing.
In a single integration test? That'd be pretty absurd.
At least in our project, each integration test has a certain amount of overhead. Some backends are fakes (when I request X, you provide Y), some are actually booted up with the test, e.g. persistence.
Multiply this across N integration tests, have lots of demand for the same CPUs, and you're up to 30-40 minutes of integration test time.
Though, that said, some integration tests can be crazy long if they have a lot of "waitFor" style conditions. "Do this, then wait for something to happen in backend Z. Once that's done, do this, and this, and this..."
(Not at Google, but Twitter has its own monorepo) Generally submit queue takes 10-20 minutes, longer if you're changing a core library.
I find the larger factor is what kind of test has to run. Feature tests can take a while on CI if you're spinning up lots of embedded services (dependent services, MySQL, storage layers, etc).
To me, Piper is a monolithic version control system which is geared towards good engineering practices.
As far as I know there are only two such systems in use today and the other one is very dated and older than a lot of things out there.
When people say they have worked in a monolithic repo, they typically mean one repo under one of the open source version control systems, but none of these actually do or support what is needed when working with a monolithic repo AND modern/good engineering practices.
For that a specialised VCS is required, and there are very few examples of that, none of which are open source.
Git could probably be made to do this kind of stuff, but it would require some extensions to the DAG as well as extending its already verbose command line set. But I think it is doable.
The question is who can do it? Most are probably under some strict NDAs.
I think that's an effect, not a cause. It's annoying to open source google3 code because it means untangling dependencies. Meanwhile Android has lots of SOC-specific and other development happening behind the wall. So the open source question seems orthogonal.
Google today has separate repos (android, chrome, chromeos, google3), each with its own build system: Gradle, gyp/ninja, Portage, Blaze. There's hysterical raisins, but I wonder if Google considers it to be a good thing these projects are so different, or a wart they would prefer to fix?
What nested brings to the table is the semantics of a mono repo with the advantages of a multi repo. The whole thing walks in lockstep, if you have a 3 week old version of the kernel and you add in the testing component (subrepo) then you get the 3 week old version of the testing component, all lined up with the same heads in the same tip commit.
I get it that git won but at least steal the ideas.
Edit: BTW, bk has a bk fast-import that usually works (it doesn't like octopus merges but other than that....)
The lazy me thinks maybe 1 kitchen sink repo is better.
Here's my problem:
I can be working on multiple projects at the same time. Each project has multiple modules (core, api, www, admin, android, etc -- I use microservices on Google App Engine). Sometimes, some modules have "feature" branches. Oh, did I tell you I work on both Desktop and Laptop?
The problem is syncing. Before traveling, I need to make sure the projects/modules I'll be working on the go all have latest commit from Desktop.
My question:
Is there some "dashboard/overview" for all Git projects? So I can quickly tell, "Ok, all projects are at latest commit, and oh I'm working on feature branch for project X and Y."
There is a dashboard called GitHub? Or what are you looking for?
I think if you want to make sure that all your changes are in, you need to do what most programmers do and learn to finish a programming session with a commit and push, just like you finish a sentence with a dot. Once you are used to it, the chance of forgetting is really low.
This is something I’ve been idly thinking about too.
I agree with regards to committing regularly, but sometimes life happens and one can forget.
I’ve been considering writing a python script that checks my local repos for uncommitted/unpushed changes and - now I think about it - perhaps also runs when I start a new terminal session just for good measure.
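A minimal sketch of that idea; the commenter suggests Python, but a few lines of shell do the same job, scanning a hypothetical ~/src directory for dirty or unpushed repositories:

#!/bin/sh
# Report repos under ~/src that have uncommitted or unpushed work.
for gitdir in "$HOME"/src/*/.git; do
  repo=$(dirname "$gitdir")
  [ -n "$(git -C "$repo" status --porcelain)" ] && echo "$repo: uncommitted changes"
  [ -n "$(git -C "$repo" log --branches --not --remotes --oneline)" ] && echo "$repo: unpushed commits"
done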
Once, I noticed that a minor piece of how Google's Python system tests could be very slightly cleaner and more consistent. It would be a tiny change, it seemed very safe, and it was easily accomplishable with a short sed script, but it'd also be a backward-incompatible change across the projects of hundreds of teams and thousands of build targets. I was able to make that change with only a few commits and without needing to bother most of those teams.
These sorts of small, general, large scale cleanup commits are quite common at Google, and they're encouraged. They help keep the codebase healthy. There are special groups that review them so that all of the individual teams affected don't have to bother, and there are tools to manage the additional testing and approval requirements for such a change.
At my previous company, making such a change would have been a major undertaking. I never would have considered a refactor of that scale without a critical need. They had thousands of packages, each of which had its own repository and an incredibly complex web of build and runtime dependencies. It was a nightmare, and fiddling to find a working set of versions of internal dependencies took up way, way too much of my time each day.
For me, refactors were the largest "aha" moment. On large-scale projects you can move a lot faster if you don't have to maintain backwards-compatible APIs. We use Facebook's version of a mono-repo (BUCK [1]) for iOS dev. It's really easy to change an API, see all of the upstream breakages, write tests, fix upstream and submit a diff (pull request).
With a fragmented large code base you're in a world of hurt because you're dealing with versioning. There is no guarantee of when every other dependency will migrate to the latest code path.
But again, if you're in a 1-10 person team working on some trivial codebase, a monorepo might not be helpful. If you have 500 engineers working on a single codebase, tradeoffs change.
1. Huge changes that affect the API's of multiple sub-packages.
2. Having no friction to change anything makes you far more productive and ambitious.
3. Scripting at an org-level means you can automate things more easily and in more depth.
We run an entirely Node stack, so Lerna enables this in the first place. Given that, I'd never move to more than one repo if possible. It's almost all downside: more overhead/fragmentation, less control, more wasted time/mental overhead moving between things, API friction that reduces ambitious change.
The only downside of a monorepo is GitHub not supporting them well. If you want to release some sub-packages as OSS, or want to use GH to track issues, you're stuck using one big repo to handle everything. I'd bet GitHub fixes this within the next year or so though.
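For what it's worth, a minimal sketch of the org-level scripting Lerna enables (package name hypothetical):

$ npx lerna run build                     # run the build script in every package
$ npx lerna run test --scope @myorg/api   # or limit it to a single package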
The clear advantage of a monorepo over a mono-purpose repo is handling technical debt. The goal is that you never build up technical debt and instead immediately patch all references in an atomic change.
Example:
Let's say you introduce a breaking change in lib A that is used in libs B and C. The first problem is visibility: A's authors do not necessarily see that it is used in B and C. Second, the build should break immediately, not only when someone next builds B/C.
Tech debt isn't just breaking changes though, and mono repo does nothing to curb all the other types of tech debt:
* Accumulation of FIXMEs
* Partial refactors cut short after change in business reqs
* Quick hacks near release time
* "this could be done better if I had time"
* Overdue re-architecture after accumulated changes and additions
* Orphaned code
* Commented out tests
* etc etc
Can someone further explain how the pre-commit phase works? I don't get how/why "pre-commits" work without feature branches.
How are my changes shared with the reviewer if there is no feature branch? Is my local code uploaded to that review tool mentioned in the article? And then what happens if the reviewer requests changes?
I probably did get this completely wrong, so thanks in advance for pushing me in the right direction.
You got it right. There is a separate set of tooling layered on top of the VCS, which maintains a sub-history of each commit. The tool (Gerrit, Phabricator, etc.) tracks this relationship between commits, and whatever is eventually merged into the repo.
This architecture assigns each line of code a nested history: the public commit log, and also the sub-history of each commit, which evolved during code review.
IMO it would be better if the code review changes were manifest in the public commit log (e.g. via feature branches), instead of being tracked separately. The code review layers add duplicative complexity.
Perforce (the origin of Piper) has a concept of changelists. Some changelists are submitted (committed) while others are not. So the review works by uploading your changelist to Perforce, then pointing people to that changelist. It's like an unnamed feature branch that can't easily be rebased off of. Changelists do have a base, and you do a "g4 sync" to essentially rebase off of master. Does that make sense?
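For the vanilla Perforce flavor of this (outside Google's g4/Piper wrappers), a pending changelist can be shelved so the reviewer can pull down the diffs; the changelist number and file name are hypothetical:

$ p4 change                    # create a pending changelist, say 12345
$ p4 reopen -c 12345 foo.c     # move your edited files into it
$ p4 shelve -c 12345           # upload the diffs so a reviewer can unshelve them
$ p4 sync                      # later, bring your client up to date with head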
When people talk about making a single global change to update all clients when they change an interface, are they updating unsubmitted changelists too?
Unsubmitted changes at Google usually come in one of two flavors, short-lived (abandoned or submitted within a few days) or perpetual. The latter flavor is often for "I think we might want this". It's not uncommon for those to be completely rewritten if they're actually needed. There's usually a preference for submitting useful things (with tests!) and flag gating them to cut down on bitrot.
I have seen exceptions -- I reported a bug in a fiddly bit of epoll-related code and an engineer on my team had a multi-year-old fix -- he hadn't submitted it because he wasn't confident he'd found an actual bug. The final changelist number was more than double the original CL number (unsubmitted changes get re-numbered to fit in sequence when they're submitted -- the original number redirects to the final submitted version in our tooling).
Well, the act of submitting a changelist essentially runs a test suite which requires that the changelist has no merge conflicts with the head and that the relevant tests pass. From that it follows that if someone changes an interface, you'll get either merge conflicts or test failures on your own changelist. Meaning - it's the changelists authors task to sync it up to the current head state so the refactorers won't touch unsubmitted changelists.
It's pretty much the same as GitHub pull requests - the changelists are supposed to be decently short lived and if the master code changes it's up to you to resolve conflicts and get it into a mergeable state again.
Every CL (changelist) gets 2 CL numbers, an original (OCL) and a committed (CL) number. So CL #s are monotonically increasing, but less than half are ever actually committed.
When the CL is first uploaded or sent for review the OCL is assigned; the CL number is assigned on commit.
- Monorepo for all the services code in the enterprise (Java).
- UI code is kept in their own repositories.
- Enterprise services are exposed using well-defined, stable REST-like APIs (JSON, HTTP, Swagger, etc.). We only expose what is needed.
- Within the services monorepo, services can call other services directly using regular java calls.
- Services in the monorepo are refactored all the time. This is the advantage of using a strongly typed language like Java.
- Several instances of the services run at a time; they scale horizontally. Releases are done one instance at a time.
- We had good unit and integration test suites on the services.
Now I am working for a different company. We have hundreds of services deployed. No one knows what is running where, or what the dependencies are. Once something is released, everybody is afraid to make a change, as no one knows how it affects other systems.
How do they handle Android? It's one thing when you can just go "use the latest" for a web service and another when you have N branches. I'm not sure a monorepo is as big of a benefit then.
Android Open Source Project is still in gerrit, where the code is stored as a set of many git repositories. There's a "repo" tool that adds a helpful layer of abstraction to make those git repositories (mostly) look like a single repository.
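For reference, the repo workflow looks roughly like this (the manifest URL is the public AOSP one):

$ repo init -u https://android.googlesource.com/platform/manifest
$ repo sync                 # clones or updates every underlying git repository
$ repo start my-feature .   # start a topic branch in the current project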
Doesn’t this create a single point of failure for the development of the entire software ecosystem at google? What happens if something goes disastrously wrong with their repo? Does everyone twiddle their thumbs until it gets fixed?
Also, isn’t this the opposite of “separation of concerns”? Code should be divided into small units of functionality that don’t overlap. Minimise interdependencies. Eliminate unnecessary work. Is this the most efficient and future-proof approach?
Yes, when the system is down, it's really bad. Though it's an extremely reliable system as so many folks depend on it (note there's also lots of layers of tooling, so maybe you just need commits to work versus code review versus your local edits, etc.). In my experience, the worst things are actually when the build/test system is running slowly (thus blocking commits until tests are complete) or the bog standard "ugh, someone decided to force submit even though it said the tests were failing. Roll that back, please".
As for separation of concerns, there's actually a lot of scoping / visibility stuff precisely to avoid letting people depend on things they shouldn't. I think you now have to explicitly open up your visibility to let random other projects depend on you. A monorepo doesn't require that you share, it just permits it.
Interesting. Every system is going to be down times, so I don't worry about that too much. I'd be more worried about a design flaw or bug that doesn't get exposed until you're painted into a corner and it's hard to escape.
Good point about the scoping. I suppose it really depends on how developers use it - do they default to sharing or not?
Personally, I’d have to agree that at a minimum a mono repo per distinct product role (server side in one mono repo, desktop software in one mono repo) is immensely valuable.
I heard updating pager-duty [on-call] commits used to be a mess; perhaps it was apocryphal, but if not, why not use a different system for the pager-duty updates?
"The two are not mutually exclusive. I utilize what I call ensemble repositories that, for us, submodule the individual repositories" - http://disq.us/p/1hj9nmu
I'm a bit disappointed that they don't back their publication with any numbers comparing the two approaches. Without them, I don't see the scientific value in what would otherwise make a good click-bait blog post.
I'm working on a project where we have two projects: a desktop app and a web backend. They each live in their own repo. However, this approach is proving tedious for us now, as every new feature often means two branches (one in each repo) which also means two separate pull requests when the feature is ready to be merged. Has anyone encountered this kind of issue, and is the solution to simply merge the two repos into one monorepo?
I develop an application that has a SPA frontend and an API backend. They could live inside separate repos, but I prefer to keep them together, because if I change the API signature I'll also make the same changes in the frontend, and they will deploy together at once.
> Google has shown the monolithic model of source code management can scale to a repository of one billion files, 35 million commits, and tens of thousands of developers.
> Benefits include unified versioning, extensive code sharing, simplified dependency management, atomic changes, large-scale refactoring, collaboration across teams, flexible code ownership, and code visibility.
> Drawbacks include having to create and scale tools for development and execution and maintain code health, as well as potential for codebase complexity (such as unnecessary dependencies)
At this point I normally assume that if Google does something a certain way, or uses a particular proprietary technology, then it likely should NOT be used.
I work for a place that tried to model itself culturally after Google and Facebook and has had a lot of engineers moving back and forth. If Google is anything like us, then it creates the wrong incentives to invent in-house stuff. See, in their expense scheme the salaries of engineers are not a huge deal; it's cheap to have people do things. There is also an incentive for an engineer to try and "leave a mark". There is also a bias toward hiring new and inexperienced people, who fail to learn the existing tech and replace it with something different, simpler and less functional (by the time it matures, it becomes just as complex, however).
I live in the Java world, so I am seeing Google reinvent (poorly) every bit of Java tech: DI (Dagger), build tools (Gradle), commons libraries (Guava), and the list goes on and on. Well, apparently, they also internally reinvented Git, Jenkins and the rest of the tools. The rest of us should probably NOT do it this way.
Gradle was not a Google invention. Dagger was invented by Square. Google invented Guice (Bob Lee now at Square) which was way better than Spring and J2EE. Guava is far better designed than Apache Commons.
Dagger2 was invented by Google for good reason because it works purely as an annotation processor without runtime bytecode classloader magic. That means it is easier to debug, faster to startup on mobile, and can be crosscompiled with j2objc and GWT/j2cl.
Much of the time, in-house stuff is created to deal with scalability or maintainability problems.
Amazon has a repo per package model, but has a meta-versioning system (called version sets) that tracks revisions of packages built together and a way for declaring package dependencies. So in some sense you're building your own monorepo out of a bunch of packages (usually by branching off a parent version set).
It fulfills a lot of the same goals of having a monorepo in terms of scaling a large organization, while having somewhat different pros and cons.
We don't, and it's hell. I may make a change to X, but some other developer doesn't know it, and then one day when he makes a change to X and tries to pull those changes into Y, now he has to change the bits of Y which are affected by my changes as well as his.
I would love to have a monorepo, although I'd advocate for branch-based rather than trunk-based development. Yes, merges can be their own kind of hell, but they impose that cost on the one making breaking changes, rather than everyone else.
Because of course they do. Just like we all do. Everyone has a file they backup just because they might need to reference it. Index.txt is my cross.
In terms of sheer code. I hope they have some Jedi ai to cross reference it. If they don't they will. In fact we help everyday.
No point in burning the printing presses. We have to learn to get along.
Enjoy living, learning will come naturally as you pursue your interests. Those interests will change, growing, cultivating other fields of knowledge, grappling for your attention.
Culture you enjoy as only a human can, and cultivate it. Distribute and share it. As humans we are denying our own renaissance with a ministry of silly walks. Sometimes I think high school UN clubs are better at running the world. Why not. It doesn't seem to matter anymore.