Monorepo is great if you're really good (yosefk.com)
225 points by mr_crankypants on July 30, 2019 | 151 comments



> the entire code base got forked, and the entire org is now held hostage by the dumbass.

> Of course in a Good team, needless dependencies would be weeded out in code reviews, and a Culture would evolve over time avoiding needless dependencies.

Really, the one consistent thing is that if you have a good team, you'll make it work no matter what tech or decisions you make (assuming you're also good enough to know when you've lost and change course), and if you're a bad team, you're doomed to failure, because, well, you're bad (by definition).

I think this article also vastly underestimates the cost and annoyance of the CI tooling for a large number of repos, especially if you have to match branches or do some kind of cross product on the feature branches (such as: repo A branch B can only be built with repo C branch F, but all the other repos should be on master).


"Really, the one consistent thing is that if you have a good team, you'll make it work no matter what tech or decisions you make (assuming you're also good enough to know when you've lost and change course), and if you're a bad team, you're doomed to failure, because, well, you're bad (by definition)."

The middle ground is vast, and nuanced in many dimensions. Which is a good thing, because there sure aren't very many large, good teams (by this definition of "good").


Totally agree. Also many teams aren't all "good" or "bad" people, but more a mix of people with different skillsets and viewpoints, and it's not as if all good people think you should do it one way, and bad people the other.

I think the normal case though is that there are too many cooks, and they spoil the broth. I've had teams where one person wants to go off and make their own repo just because, and can't be convinced to follow the rest of the team. Sometimes these are good people, although I find a lot of "bad" people don't want to be team players and be consistent, even if being consistent means doing something they don't like.


> repo A branch B can only be built with repo C branch F, but all the other repos should be master

This alone is pretty much what makes me prefer monorepos. If you don't have a stable interface for all of your in-house dependencies (and nobody does early on in a project), you're doomed to spend a ton of time matching branches like this. Not to mention, a naive build process of "grab the latest everything and build it" will break in that period of time when you've merged the feature branch in one repository but not the other.


All modern dependency management tools solve this problem by allowing dependencies to be specified as URIs.
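For what it's worth, most ecosystems can pin a dependency straight to a git URL at a specific commit or tag; roughly like this (the repo names below are made up):

    # pip: pin an internal library to a specific commit
    pip install "git+https://github.com/acme/logging-lib.git@1a2b3c4"
    # npm: same idea, using the #committish suffix
    npm install "git+https://github.com/acme/logging-lib.git#1a2b3c4"
    # Go modules: pin to a tag
    go get github.com/acme/logging-lib@v1.4.0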


That comes with its own problems. You could end up spending 3 hours tracking down a bug you thought you had fixed two weeks ago, only to find out that some other service is pinned to a commit from three weeks ago.


>> Of course in a Good team, needless dependencies would be weeded out in code reviews, and a Culture would evolve over time avoiding needless dependencies.

Heh. This reminded me of a different story, which I remember vaguely enough that I'll paraphrase from memory:

> "The Excel team will never go for it. Their motto is 'Find the dependencies... and eliminate them.'"

> This probably explained why the Excel team had its own C compiler.


Like a lot of things in a Good Team, dependency curation requires a tempo. This will be done, and putting it off doesn't get you out of doing it.


Sounds like Joel on Software.



Tooling CI into a monorepo can be nasty, too. Do I update my staging deployment for every job on every commit? You can slow down deployments pretty fast, too, and make integration a real pain.


Exactly. You have the same problem in a monorepo since the problem is validating a change actually works without breaking an unknown number of other things indirectly.

There is a little bit of a novel problem in correlating N feature branches and cloning them, but it's not that much more complicated than correlating N subprojects in a monorepo.


Observation: the quality of the work done by a team is too often a MIN or PRODUCT over the members of the team, not a SUM.


One of the concepts being pushed by some in the Lean community is the idea that some activities need to be bounded at all times.

You don't want people favoring starting new work while someone else is flailing on old work. The fact that they started this two weeks ago indicates it was probably higher priority than whatever you might start today. If today's work is an emergency (eg, if we could go back in time, we'd have started this immediately) then sure. But barring extenuating circumstances, go help Paul. He's been staring at that code for weeks and making no progress.

I think having a rule for how many branches/forks (whatever you want to call them) can exist at once might be a good idea. Every time another opportunity to use a branch comes up, the existing ones have to defend their continued existence. Having to explain yourself over and over is a form of positive peer pressure, if potentially a little passive-aggressive (solution: use an assertive person as the messenger).


Umm, appoint someone or a trusted few as git admins and only allow them to merge commits?

(responding to the article contents quoted in above comment)


I imagine this would add a lot of overhead when the git admins are not well-versed in all subprojects of the monorepo.


Is the monorepo/multirepo choice really the most important thing to consider?

Branching: monorepo or not, if a feature-incomplete development branch for one of the supported targets can "hold the entire organization as a hostage" then the SCM people, and/or persons responsible of the SCM policy, should do some introspection...

Why are deliveries done from a branch which is obviously still in development? Why does code-to-be-released need to depend on incomplete work? Why aren't something like "topic branches" used?

Modularity: monorepo or not, problems will certainly appear when the complexity of implementation outpaces the capacity created by the design. To get modularity, one needs actual modules with properly designed (=not brittle, DRY, KISS, YAGNI, SOLID, etc. etc.) interfaces between the modules. Now, does monorepo/multirepo really play a role here at all? If everyday changes are constantly modifying the module interfaces in incompatible ways that break existing code, that says something about the design, or rather the insufficiency of it.

Of course, every project and team is different. However, even if a locally optimal choice for the monorepo vs. multirepo question is found, problems existing regardless of monorepo/multirepo will still be there.


I was wondering the same thing. Like, I use a microservice architecture at work, but the choice of using one vs. many git repos to represent diffs in those services over time seems largely meaningless. I don't like large diffs or people breaking production, but this is solved by testing and insisting upon small diffs, not by how many git repos we use.

Formally speaking, multi-repo management allows a strict subset of the diffs allowed in a mono-repo (because diffs can't extend beyond each repo root). Are the excluded possibilities all bad? No. Are they generally bad? Not really. Are they sometimes bad? Sure. Are they sometimes better than many diffs across many repos? Sure. Can a reasonably competent dev team tell the difference? Sure, usually. Unsurprisingly, this usually requires the exact same tooling as ensuring the quality of microrepo changes.

If you're continuously deploying master, have a healthy ci/cd pipeline, and enforce good merging discipline, you're fine either way.

I'm a little tired of doing things like revving our trace and logging libraries across our 50+ micro repos that represent microservices. That's genuinely obnoxious. Is it bad? No. Is it obviously more or less error prone than the equivalent monorepo update? No. All the bad bits of either strategy just require some tooling and a clear head.
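For anyone who hasn't had the pleasure, the chore looks roughly like this (the repo list, remote, and file names are made up), repeated per repo and per round of review feedback:

    # rough sketch of revving a shared library across many service repos
    for repo in $(cat service-repos.txt); do
        git clone "git@example.com:acme/$repo.git" && cd "$repo"
        git checkout -b bump-tracing-lib
        sed -i 's/tracing-lib==1.4.2/tracing-lib==1.5.0/' requirements.txt
        git commit -am "Bump tracing-lib to 1.5.0"
        git push origin bump-tracing-lib    # ...then open 50+ pull requests
        cd ..
    done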


So far I've only managed to find one thing that monorepo fundamentally offers that micro does not: atomic commits across projects.

But I'm not sure that's a useful feature anyway:

1) If you are doing a whole-repo refactor (one of the main atomic-commit benefits I see claimed), you still have to run against revision X, then try to commit X+1. If someone committed in between, you may have to redo the whole thing. Or lock the whole monorepo while doing so. Both scenarios seem worse to me for mono, since microrepos stand far less of a chance of conflicting (less frequent commits, less code to consider (faster refactoring tool runs), etc.) and a lock would be a far smaller interruption (one repo vs the whole company).

2) Atomic commits don't represent how things are deployed. You still have to deal with version N and N-1 simultaneously. So e.g. breaking refactors of RPC APIs have exactly the same problems in mono vs micro.

On the other hand, downsides are pretty clear and take immense work to sidestep: most tools will either be much slower or not work at all, because they now need to work on 100s or 1000s of times more data than they were developed against. That's probably thousands of man-years of tooling you may have to understand and improve, or wholly replace.

---

The vast majority of monorepo benefits that I usually see claimed are actually tool-standardization benefits. Or "we could build tool X to do that". Or top-level control, like "we can commit for team X". Of course that's useful! But it has nothing to do with monorepo vs microrepo.

Monorepo just happens to be the carrot/stick used to finally achieve standardization. Others could work, this is just the current fad (which, in some ways, is why it sometimes works - it's easier to convince others).


As the project scope and customer base continues to grow, the likelihood that you picked all the correct boundaries within the system drops to zero.

When the right boundaries reveal themselves, you can divide the code up. But who is to say those will still be the right ones in ten years?

If you divide the source code into separate repositories before getting the boundaries right, there's a tremendous amount of friction built into the system preventing the problem from being addressed. Each repository has its own actors, cycles, and version control history, and you break two of those when you start trying to move code across project boundaries. So people just hit things with a hammer or steal functionality (three modules with a function that ostensibly does the same thing but with different bugs).

One of the things I see over and over again is people conflating one repository with one lifecycle. One binary. It's possible to have a monorepo with multiple build artifacts. The first monorepo I ever worked on had 60 build artifacts, and it worked pretty well (the separate artifacts weeded out a lot of circular dependencies).

I can still get inter-version dependency sanity checks with a monorepo. When I am writing new code I can have everything talk to localhost (master@head) or I can have it talk to a shared dev cluster (last labelled version) or some of both, allowing me to test that I haven't created a situation where I can't deploy until I've already deployed.


I completely agree with your comment.

Monorepos will not save a company from its lack of discipline. But while you can have problems if you do stupid things in a monorepo, you will always have to deal with dependency hell and what comes with it on multirepos.


My team just switched to a monorepo. It's been only a few weeks, so I can't claim any results yet, but we've lived w/ the pain of poly-repo for long enough that we were ready to invest in a single repo.

We've spent a lot of time building and iterating on a unified ci/cd environment to support the new repo. Previously each project had its own test/deploy/build/publish story and usually its own jenkins project. Now, each project is registered and triggers its own steps. Cross-project edits can happen in a single pull request. We have an incredible amount of integration tests (more so than unit tests), and getting them to work cross-project while migrating has been challenging.

We've gone from ~10-15 actively maintained repos to about 3 as we're slowly migrating. We have a mix of services, libraries, and batch processing all mixed in.

The author's points about forking and long-lived branching being incredibly difficult for most teams are really crucial. We're going to have to invest in education for new members about WHY we have a monorepo, what it means for your development, and how to change your perspective for developing at HEAD. I don't think 'bad' developers make it easier or harder. Instead, clearly articulating behaviors that exist in a poly-repo vs mono-repo world to developers is the Differentiator.

These articles were absolutely crucial to developing our monorepo.

https://trunkbaseddevelopment.com/

http://blog.shippable.com/ci/cd-of-microservices-using-mono-...

https://www.godaddy.com/engineering/2018/06/05/cicd-best-pra...


> My team just switched to a monorepo.

I feel like this discussion is missing an appreciation for size/scope of repositories vs. size/scope of the organisation developing that software, with a pinch of appreciation for Conway's law.

If your team is a typical team of at most, say, 30 people, then maintaining 15 different repositories is clearly insane, but merging them into a single one likely doesn't truly deserve the moniker "monorepo", because it's just not that large (and varied in scope and purpose) of a project at the end of the day.

Think of it this way: the Linux kernel is certainly a larger project, but nobody thinks of it as a monorepo. Same thing goes for major software projects like Qt.


How do you handle building changes to just one of those projects? Can Jenkins do that (easily)?

I think that's the big thing that always puts me off monorepo... We'd basically be going from ten 5 minute builds to one 50 minute build if it wasn't possible to do incremental builds. IIRC Google and MS have purpose built tools that do impact detection to work out what to build for their monorepos to keep build times down.


If you're doing a monorepo I think it's strongly implied that you'll also use a build system (Blaze/Brazil/BuildXL etc) that has granular compilation units and output caching so build time doesn't scale linearly with the company's total codebase.

It's definitely important to consider before jumping in. Going from 5m to 50m compile times would be a major issue for me.
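As a rough sketch of what "granular" buys you: with a Bazel-style target graph, CI can ask the build system what is downstream of a change and rebuild only that, with everything else coming out of the cache (the label below is hypothetical):

    # everything that (transitively) depends on the library that changed
    AFFECTED=$(bazel query "rdeps(//..., //libs/logging:logging)")
    # rebuild/retest only those targets; untouched ones are served from the cache
    bazel build $AFFECTED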


Err what is even the alternative? Even a makefile would provide incremental builds.


A good makefile provides incremental builds in every possible scenario of source file changes. It's very easy to write bad makefiles that don't catch modified or removed header files, or even worse when code generation or other complicated build logic is involved. Because a lot of developers seem to think "random build fail? oh just do make clean" is an acceptable workaround.
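The usual fix for the header problem is to have the compiler emit dependency files and pull them back into the makefile, something like this minimal GNU make sketch (assuming gcc/clang and the built-in %.o: %.c rule):

    # -MMD writes a foo.d file listing foo.o's headers; -MP adds phony targets
    # so a deleted header doesn't break the next build
    CFLAGS += -MMD -MP
    OBJS   := $(SRCS:.c=.o)
    -include $(OBJS:.o=.d)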


Good to know, thanks. Something I'll have to do a bit more reading up on.


It was a bit hacky, but we've basically implemented some of the stuff in [1] to achieve incremental builds. If a PR changes projects A, B, C and not X, Y, Z, then it will only build A, B, and C. But it's not truly incremental right now, as it won't test things that depend on A/B/C.

We have plans to use Bazel in the future, but you have to boil the ocean when moving to Bazel and get everything inside Bazel before you get any benefit out of it.

Jenkins can't do it "easily" but it definitely can. I'd be happy to share our Jenkinsfile if you'd like.

Our change detection is something like:

    #!/bin/bash
    set -euxo pipefail

    COMPARE_BRANCH=$1

    MERGE_BASE=`git merge-base $COMPARE_BRANCH HEAD`
    FILES_CHANGED=$(git diff --name-only $MERGE_BASE | grep '/')
    echo ${FILES_CHANGED} | xargs dirname | cut -d "/" -f 1 | sort | uniq

[1] blog.shippable.com/ci/cd-of-microservices-using-mono-repos


If you have to do make clean in a monorepo you are pretty much toast. Tooling for impact detection and reliable makefiles that always produce correct incremental builds are absolutely crucial.

In a way this is one of the hallmarks of a monorepo - Interfaces and dependencies changing so quickly it becomes too troublesome for humans to categorize (and re-categorize) them into repositories, so you let a machine (makefiles) do the work instead. And even without a monorepo you still have the same problem, eventually you will have to integrate all your mini repos into one final product, which you want to have tested. This is something you want to do as frequently as possible, ideally on every commit, not by doing major version-steps of sub-projects.


I suspect for that number of projects monorepos make a lot of sense.

The major technology organizations we hear about usually have at least several monorepos, due to the legacies of acquisitions and mergers if nothing else.

At the scale of thousands of subprojects, I am not entirely sure the benefits are as advertised. There will be support of subprojects forked to public github.com or gitlab.com if nothing else. And there will be external dependencies to manage; system level libraries like openssl and libc if nothing else. Even if they are vendored in to the monorepo, any upstream regression is a significant problem in a monorepo... and the problem sometimes has to be solved in a big bang instead of incrementally.


Going from 15 to 3 is definitely a different discussion from going from 7000 to < 5.

At 15, it feels like it's kind of just a toss up. We have several thousand repos, and sometimes we see 5-10 of them that really should be grouped, and we do so. Sometimes we see 1 repo that has 5-10 projects in it, and we break them down. Whatever works.

But when the entire org is on a dozen projects you're potentially in the worst of both worlds. Your repos aren't small enough or aligned with team ownership enough to really benefit from it. So it's straight overhead.


FWIW at least 95% (anecdotally) of Facebook’s main code is in two gigantic monorepos: fbsource and www. (The other major repos are for configuration-related stuff).

Last I heard there were plans to move www into fbsource.

There are certainly not random dependencies on public GitHub pages. Everything is versioned.

There is a mind boggling amount of custom tooling to make this work.


The versioned external dependencies work for systems that support semantic versioning. Some dynamic languages. Some C. Definitely nothing with a non-C ABI.

But "not working" looks like fixing an unknown number of bugs across the various subrepos. Because permanently forking upstream it never applying security patches isn't a good business model.


Linus Torvalds said something about this, in relation to microkernels. The gist is that the interactions between many pieces make the whole thing complex. Here's the quote from his book "Just for Fun":

"The theory behind the microkernel is that operating systems are complicated. So you try to get some of the complexity out by modularizing it a lot. The tenet of the microkernel approach is that the kernel, which is the core of the core of the core, should do as little as possible. Its main function is to communicate. All the different things that the computer offers are services that are available through the microkernel communications channels. In the microkernel approach, you’re supposed to split up the problem space so much that none of it is complex. I thought this was stupid. Yes, it makes every single piece simple. But the interactions make it far more complex than it would be if many of the services were included in the kernel itself, as they are in Linux. Think of your brain. Every single piece is simple, but the interactions between the pieces make for a highly complex system. It’s the whole-is-bigger-than-the-parts problem. If you take a problem and split it in half and say that the halves are half as complicated, you’re ignoring the fact that you have to add in the complication of communication between the two halves. The theory behind the microkernel was that you split the kernel into fifty independent parts, and each of the parts is a fiftieth of the complexity. But then everybody ignores the fact that the communication among the parts is actually more complicated than the original system was—never mind the fact that the parts are still not trivial. That’s the biggest argument against microkernels. The simplicity you try to reach is a false simplicity."


> "...The simplicity you try to reach is a false simplicity."

Also applies to some microservice architectures I've seen. People completely disregard the complexity (and overhead!) of the interactions between microservices.


I feel like I've become a crusader against microservices for the same reason.

They're so easy to set up and they immediately solve problems. But they also create many more, which aren't immediately obvious. And very few people are willing to say, "I was totally wrong to move to them, and let's spend some more precious time rolling them back."


As somebody who has built some stuff with microservices before, I think the key is that you don't view any architectural pattern as a silver bullet. All things come with their own trade-offs.

It can make total sense to break out certain parts into their own microservices if you thought long and hard about the interface and which data is going to be passed around – but if you break things out into their own microservices just for the heck of it, you will end up in a mess very quickly.

Using microservices to solve problems which don't demand them is similar to using OOP patterns in places where they are known to bring pain: you are holding a hammer and you think the world is made of nails.

That being said, I am sure this has nothing to do with the real practicality of the underlying patterns; it just shows how easily people can lie to themselves.


This is a poor takeaway. Even in a monolith, you still want to separate your concerns, yes? Your code should be almost as abstracted in a monolith as it is in microservices. In microservices the code is just deployed across multiple instances.

RPCs and local method calls both need to be fault tolerant and race condition free. As you break up datastores, transactions become more complex but certainly you had a specific reason to do that so the complexity isn't a choice.

Sure, the communication layer is added complexity, but that too should be abstracted into boilerplate such that you shouldn't have to think about it. Overall the added complexity requires more work, but it shouldn't really make your business logic problems more complicated.


Micro services are a different kettle of fish though

Firstly, the benefits they offer have often little to do with the architecture itself, but with the bigger picture (separating teams, CIs, allowing different stacks, managing costs, scaling, etc).

Secondly, unlike microkernels, not all microservices have to talk to every single other microservice. If you have a service to send emails, say, there'll be a few services that interact with it, but the majority won't. The same for an image resizing service.

So what you say doesn't necessarily hold


Does anyone here do Microservices well? And keep them in a monorepo?


We (~150 eng) build microservices in a polyglot environment (mostly Python, JS, and Go), all in a monorepo! We also build + deploy in containers with Jenkins, etc.

The structure looks something like this:

    |-- third_party (third party libs)
    `-- python
        |-- libs (internally-written, common dependencies)
        `-- application_name
            |-- client (react app, connects to flask)
            |-- server (flask app, connects to services)
            `-- services (microservices)
We use a dependency-management + build tool (https://www.pantsbuild.org/index.html, we started before Bazel was public) to manage dependencies. Without pants, our repo would be a mess.

Let me know if you have any questions, I'm happy to answer them! I'm super happy about our setup and eager to share it :)


> Without pants, our repo would be a mess.

Filed under sentences I never thought I'd hear.


Hahaha, because pants is a bad tool? Or because it sounds funny? I’m sympathetic to both :)

In defense of pants, I meant that our repo would be a mess without a versioned dependency graph + reproducible builds. Of course other tools give you that too, and definitely do it better than pants does.

I guess I should have said “without some build tool”, our repo would be a mess.


Organising apps by language seems weird...


Hierarchical layout demands choices be made, but there are advantages in grouping source packages by language: 1) it can naturally reflect the packaging model of the target language (e.g. Python, JVM); 2) it can encourage reuse of packages across projects.


I've seen this and it works rather well in that one can have per-language bottom level build setup (with autoconf for C, setuptools for Python, ...).


I agree with the GP that it seems weird, and with you that it has its benefits. Of course, other approaches have their own benefits.

Personally, I feel like the top-level directory ordering for a monorepo is somewhat arbitrary, in that you can argue for anything, but it probably doesn’t matter; especially if you have a decent build tool.


Exactly, they each have a maintenance cost which isn't shared when they're all separated.


It boils down to gut instinct again, since there is no clear line for how much complexity to accrue before splitting something up. I think he likes to keep Linux in one repo because that makes it easier for him to watch the project and manage it. Currently, if Linux gets separated into 10 components, Linus will still have to keep an eye on all 10, so from his point of view, complexity is still the same, or worse. But if he could actually let go and let someone else completely own another component, this would not be the case. Bottom line is that he is smart enough to be able to manage a repo that large, which only proves the author's point.


If I'm not mistaken, another argument for microkernels was the isolation of modules. For example, if I'm using some driver X and it crashes, the rest of my system will continue to work fine. That's not the case with a monolithic kernel. I think this safety is pretty cool to have.

Although, Linux still probably does the best job of being stable compared to the rest of the OSes I use (Windows, macOS). I can't recall the last time I got a kernel panic or crash (despite worse drivers in some cases).


I've been positively surprised by Windows in the presence of some issues with bad graphics drivers. On Windows, the screen flickers, the "guilty" app dies, and a popup appears in the corner: "sorry, the graphics subsystem had to be restarted". Whereas GPU driver issues on Linux typically leave you at the text console at best. (A driver going totally haywire can of course bring down both entirely.)


This is mostly because a long history of crummy graphics drivers on Windows led to countless bluescreens. In the past, if your Windows machine bluescreened, it was a safe bet that the graphics driver was the cause.

Microsoft had enough telemetry telling them this that they spent a large effort restructuring the graphics driver subsystem so that it could crash and burn and be restarted without affecting the rest of the system.

Although Linux already has the isolation, it doesn't have the clean recovery. Since the year of the Linux desktop hasn't arrived yet, Linux is yet to make this journey.


> The theory behind the microkernel is that operating systems are complicated.

Seems like a blatant straw man to me.


The advantage of microkernels is that they can be extended with “untrusted” code like hardware drivers or file systems. This runs in user space and thus any bugs in such code will not crash the kernel process.

So I agree with you that Linus is presenting a straw man and your comment shouldn’t have been downvoted.


> The advantage of microkernels is that they can be extended with “untrusted” code like hardware drivers or file systems. This runs in user space and thus any bugs in such code will not crash the kernel process.

Did this advantage play out in practice? If your filesystem module goes down then every module that talks to the file system module needs to gracefully handle the failure or it will still effectively crash the system.

Or the module core dumps and the system keeps chugging on, but everything is locked up because they're waiting for the return from the crashed module. Did MINIX have a way to gracefully restart crashed modules?


> Did this advantage play out in practice? If your filesystem module goes down then every module that talks to the file system module needs to gracefully handle the failure or it will still effectively crash the system.

If the file system process crashes then in theory the OS would simply relaunch it.

But your core services should be stable, it’s more about extensions, for example you may want to have virtual file systems (ftp, sshfs, etc.), which until FUSE wasn’t possible in the non-microkernel world.

As for how it played out in practice: I think microkernels lost early on because of performance and things like FUSE were created to allow the most obvious extension mechanisms for the otherwise non-extendable monolithic kernels.


That's the theory yes, but I was asking about real life. Did those early microkernel systems actually deliver?

Also, for anything stateful, like a filesystem, simply relaunching it may not be sufficient. You need to make sure it hasn't lost any data in the crash and possibly rewind some state changes in related modules.


> That's the theory yes, but I was asking about real life. Did those early microkernel systems actually deliver?

According to Wikipedia “[MINIX] can also withstand driver crashes. In many cases it can automatically restart drivers without affecting running processes. In this way, MINIX is self-healing and can be used in applications demanding high reliability”.

While this kernel was originally written to teach kernel design, all Intel chipsets post-2015 are running MINIX 3 internally as the software component of the Intel Management Engine.

Another widely deployed microkernel is L4, I assume this has similar capabilities, as it also puts most things in user space and is used for mission critical stuff.

> Also, for anything stateful, like a filesystem, simply relaunching it may not be sufficient.

True, but simply rebooting when the kernel process crashes due to buggy driver code won’t be sufficient either :)

FYI when Apple introduced extended attributes their AFP (network file system) did have a bug that made the kernel (and thus entire machine) crash for certain edge cases involving extended attributes.

In that case, had their AFP file system been a user space process, I may still have lost data, but it would have saved me from dozens of reboots.


My nvidia driver regularly hangs my system every ~90 minutes or so, so I can certainly empathize with the goals & vouch that they still have a role today.


Please note that Linus wrote an operating system that in practice showed greater reliability than competing commercial microkernels. I do not believe that the principles that he came to believe in that process should be dismissed as straw man arguments.


> […] showed greater reliability than competing commercial microkernels

What is your basis for this claim?

I am only aware of QNX as a commercial microkernel (and real-time OS) and that is widely used in cars, medical devices, etc. with a strong reputation for reliability.

But for many tasks, Linux is good enough and free, which is hard to beat. But that does not mean that Linus is automatically correct in his statements.


According to the public advertising at the time, Windows NT, GNU Hurd, and Mach were all designed as microkernels. Mach of course is the basis for OS X.

At the same time that Windows NT was being claimed as a microkernel, Linux was outperforming and had a reputation as being more reliable. Ditto with Mach. And GNU Hurd famously was hard to get running at all.

QNX is highly reliable, but is also a specialized use case.


Source? Speed I can imagine, but not reliability.


Tell that to my (lack of) graphics drivers. You can say it's political, but as it stands it's nowhere near apples to apples in terms of what Windows supports vs what Linux supports.


Which video card do you have that lacks drivers for Linux? Or do you need fully open source drivers that fully support 3D acceleration and computation?


And yet somehow Linux manages to run on a greater variety of hardware than Windows does.

I am of course including supercomputers, embedded hardware, and hand-held phones. Admittedly Windows has greater support for running consumer desktop hardware. But that has to do with how small the Linux marketshare is, and is hardly an indictment of Linus' work.


It's not an indictment at all. I'm just pointing out that there is no apples to apples comparison and it's misleading to imply there is.


So similar in principle to FUSE, but applied more broadly? Seems like a neat idea.


In practice it was less useful than people assumed, because:

1. Things like drivers and filesystems are usually written by a small handful of vendors, who already have rigorous engineering cultures (hardware is a lot less forgiving than say web design), and a large base of demanding users who will rapidly complain and/or sue you if you get it wrong. When was the last time you personally had a crash due to a driver or filesystem issue? It used to happen semi-frequently in the Win95 days, but there was a strong incentive for hardware manufacturers to Fix Their Shit, and so all the ones who didn't went out of business.

2. You pay a hefty performance price for that stability - since the kernel is mediating all interactions between applications and drivers, every I/O call needs to go from app to kernel to driver and back again. There's been a lot of research put into optimizing this, but the simplest solution is just don't do this, and put things like drivers & filesystems in the kernel itself.

3. The existence of Linux as a decentralized open-source operating system took away one of the main organizational reasons for microkernels. When the kernel is proprietary, then all the negotiations, documentation, & interactions needed to get code into an external codebase become quite a hassle, with everyone trying to protect their trade secrets. When it's open-source, you go look for a driver that does something similar, write yours, and then mail a patch or submit a pull request.


It's only a false simplicity if you still need to track the interaction between everything and everything else.

When you break up a problem the goal is to find clear bottlenecks of complexity such that you can abstract a thing to its inputs and outputs and ignore the complexity within. You reduce the amount of knowledge required from any given perspective, thus reducing peak cognitive load.

Sure the system is as or possibly slightly more complex, but there is a distinct advantage to reducing the peak complexity of any given sub-problem.


This is analogous to the debate about when to break long functions into shorter ones. The simplicity argument usually doesn't consider the increased complexity of all the interactions between the new, shorter functions. If you omit that, you get a misleading picture, or as Linus puts it, a false simplicity.


> They don't like to have to clone lots of other repos, and to then worry about their versions (in part because the tooling support for this might be less than great).

I think this is the argument around which the whole post is made. Everyone does want to work in a small space where they control everything. I want to see git log with just my code commits - so I'll make a microservice out of it.

All other arguments are just there to wrap this one. I think it's wrong.

At an organisational level, a monorepo is more good than bad because it simplifies dependency management and makes for a low-ego team.


Counterpoint: monorepos make it easier to neglect packaging and dependency management since it's mechanically possible to assume any files in source control are on-hand. This makes it trivial to have implicit dependencies and see the uglier implications of Hyrum's Law.


It's not that hard to make a library/module, and then type:

    git log .


I don't agree with the author that the quality of the team is what determines if a mono-repo is appropriate or not. It doesn't really matter if you use a mono-repo or not, what matters is what individual engineers are empowered to do.

If engineers or even team-leads don't have permission to create a repo themselves, well, then you're probably going to see benefits from a monorepo.

At the same time, if you have say ~100 people sharing a repo, then you have to make sure that you have tooling that allows each team to customize their build and test environments themselves, which is hard because many CI solutions assume that One Repo = One Build status. Implicit in the author's reasoning is the principle that good engineers don't make mistakes; they don't break the build, not ever. But of course they do, everyone does, and if you have a hundred developers builds will be broken and people's productivity ruined.

Perhaps because we're an industry so prone to failure we keep looking for that one solution, that given a good team makes all problems go away. Agile, XP, Monorepos, Containers, Microservices: we tell ourselves will solve our problems and get those pesky business people off our backs for good. But they won't and never will.

What really matters is enablement: how can we get our code into production doing what it was intended to do without having our toes stepped on all the time? If you design your processes and tooling around enablement, not the cargo-cult flavour-of-the-month buzzword invented in the modern, beautifully architected, yet completely open office spaces of one of the FAANG companies, then maybe you can get some actual results.


> Implicit in the author's reasoning is the principle that good engineers don't make mistakes; they don't break the build, not ever.

They don't, ever, because the VCS refuses a push if it breaks the build.

That's the problem with all those single-repo discussions. It works perfectly well if you have all the tooling that makes a single repo work like a multi-repo.

And it's great because you can enforce behind the scenes that everything is coherent... Except that you can enforce the same thing on a multi-repo if you write the equivalent tooling. All the points are completely moot, except the one that if you don't have a ton of tooling, a single-repo won't work at all, while a multi-repo will just be not great.
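A toy local approximation of that kind of gate, for illustration only (real setups enforce it server-side with required CI checks or a merge queue; the build/test commands here are placeholders):

    #!/bin/sh
    # .git/hooks/pre-push -- refuse the push unless the build and tests pass
    make build && make test || {
        echo "build or tests failed; refusing to push" >&2
        exit 1
    }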


I actually tend to find the opposite - with a multi repo, CI/CD is often just plain broken and people waste significant amounts of time cargo culting working setups (and often do so badly or don't keep up with upstream changes).


> Monorepo is great if you're really good, but absolutely terrible if you're not that good.

> Multiple repos, on the other hand, are passable for everyone – they're never great, but they're never truly terrible, either.

The calculus is trickier than this.

He thinks the above is true because of this other thing he says:

> With multiple repos, modularity is the norm.

But if this were true, being "Good" would be easy. I wish programming tools were that capable. Then I could go to the pool every day!

But just because you're using some feature of a build or programming system -- like modules or classes or namespaces -- doesn't mean you get the win. Certainly doesn't mean you know how to wield these tools.

In the end the technical feature doesn't save you. You actually have to have a hard-earned skill, which is how to properly modularize code into stable components with narrow stable interfaces and all of that. This skill is very rare ime.

Now back to monorepo vs modules.

If you use modules but you suck at modularization, you're going to be paying a huge tax. Because you'll be creating volatile code/interfaces and you'll have to go through a process for each change. You will be amplifying the tax from your lack of skill in a way that you wouldn't if you were just in a single monorepo.

On the other hand if you use a monorepo and you suck, you won't experience ^^this^^ pain and you'll be at a much higher probability of staying sucking.

In short, programming language and build features don't bestow skills.


It's funny and makes some good points, but I don't think the distinction is between good and bad teams; it's whether you're one team or more.

If one team has one application that is split across multiple repositories it can be a productivity boost and a simplification to unite them into a single repo with some single tools and norms.

If you have two teams working primarily on two sets of repos and two different systems or applications, by all means split them into two (or more) repos. Just because it's called a "monorepo" doesn't mean you can't have more than one!

It may be simpler to have one, it may be simpler to have many. Do what's simpler for you! I happen to think that it primarily depends on how your teams are organized more than on who is in the teams or their "badness" levels.


I think this only works if the teams are actually working on different products. If Team A and Team B are both writing to and reading from the same data stores, then I’d say they’re likely both working on the same product, and you have a multirepo.


> What do you do when you have a branch working on Android and another branch working on iOS and you have deliveries on both platforms? You postpone the merge, and keep the fork.

Honestly, it never occurred to me that you'd deploy from more than one branch. If you can't merge the branches into <your main branch that releases are built from>, then what's in the branch doesn't make it into a release (from my experience).


Seriously, this part threw me hard. Having multiple active forks is not something that's ever been considered as an option anywhere I've worked. Worse, I'm not convinced multi-repo even fixes this issue if you already have a culture that allows multiple active forks.

For instance, big app rewrite, half-new REST API on the backend. Oh, but we need to maintain the old app APIs for those who can't update (like SuperImportantCustomer). Better fork!


I suppose it depends on your definition of "active" here but having release branches for previous major versions that are still supported with things like backported security fixes etc. is a pretty common setup.


> Honestly, it never occurred to me that you'd deploy from more than one branch.

Release branches! Deploy 1.1 from the 1.x branch on the same day you deploy 2.2 from the 2.x branch. 1.x merges into 2.x, which merges into master.
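A minimal sketch of that flow (branch names as above; the fix commit is a placeholder):

    # ship a fix on the 1.x line, then merge it forward so 2.x and master get it too
    git checkout 1.x && git cherry-pick <fix-sha> && git tag v1.1.1
    git checkout 2.x && git merge 1.x
    git checkout master && git merge 2.x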


Does anyone here have experience using submodules to tie individual repositories together? We've been using this in our small startup (only a few developers) and so far it works nicely. It allows us to check out and develop repositories individually but at the same time maintain an exact dependency graph for our entire system. You can for example have a single repository that ties together different projects and has submodules that contain specific references e.g. to the frontend, backend and deployment repositories. You can then use this master repository e.g. for deployment and integration testing. Gitlab makes CI/CD in such a setup very easy and checks out submodules recursively during a build, even across groups.

The drawback is that not many people are familiar with submodules and they can be a bit tedious to set up, though working with a submodule is almost like working with a normal file in git. One danger is of course that branching between individual submodules can get messy. Another nuisance might be that you have to commit recursively, i.e. if you have one repository with a submodule to which you make changes you need to first commit these changes in the submodule and then create a new commit in the parent repository that adds the new version of the submodule. Maybe this is a good thing though as it forces you to commit changes individually in each submodule before committing a larger change into your main repository. In general I would avoid nesting submodules more than two levels deep, as this can quickly get confusing.
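To make the "commit recursively" part concrete, the flow looks like this (the submodule path is made up):

    # 1. commit and push the change inside the submodule first
    cd libs/shared                     # hypothetical submodule path
    git commit -am "Fix logging format"
    git push origin master
    # 2. then record the submodule's new commit in the parent repo
    cd ../..
    git add libs/shared
    git commit -m "Bump libs/shared submodule to pick up the logging fix"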

In the past I've also worked on a large mono-repository and enjoyed it as well; just curious to hear if anybody has used submodules in a larger team.


I have bad experiences with submodules. It was only one submodule just one level deep, and it only contained something like a convoluted configuration shared among multiple repos.

As you say, getting a change first into the child and then into the parent requires double PRs and testing. But that's the same as if you had it as a package dependency. Only that instead of a version, you have a SHA-1, and you never know what it points to.

I prefer package dependency, because it forces you to explicitly make a release, where it should have passed PR and some integration tests. Also merge conflicts are clearer.


Yes, I have experience, and it has always seemed like a great idea with a handful of developers and then come back to be a disaster when the team scaled. It becomes a nightmare to keep everything properly up to date when you have a lot of people working on all of it. Git sub repos worked a bit better, but overall I really recommend submodules exclusively when the submoduled repo changes very infrequently.


That is very, very subjective

> With multiple repos, modularity is the norm. It's not a must - you technically can have a repo depending on umpteen other repos. But your teammates expect to be able to work with their repo with a minimal set of dependencies.

You'd think so... but no. I am working with a multi-repo project where some repos have dozens of dependencies, all developed locally, and interdependent on each other. Bumping the base repo is very hard and frustrating. I miss my monorepo every day, where I could just make a PR and fix all consumers at once, and where I had a CI which would test all modules at once.


One interesting new thing in this post that I've not seen in this debate is that polyrepos tend to enforce acyclic dependency graphs.

All monorepo projects I've ever worked on enforced the same either through the language involved or mechanically, and it was universally a good thing.


Does anybody know some monorepo horror stories?

I have heard plenty of people complain about their many-repo structure and wishing for a monorepo. I would like to hear some concrete story where a monorepo went wrong. This article is just abstract opinion.


Oh yes. #1 problem I've experienced at multiple places: delaying integration with tons of branching.

Usually the PM or PO forces everyone into using some vague "product version" they track for public releases. Places that use monorepos well tend to have very few branches (like, maybe 2 or 3), which have nothing to do with your public product versioning scheme, and instead use "branch by abstraction" to stay integrated.

But what I end up seeing is that the product team and middle management get involved with dictating version control, e.g., "this will be version 1.2, and then that should be 2.3, ok let's cut those branches...", and then they change their minds as some team has to delay before another is ready. And then bugs start rolling in from testing on both, and they don't know what to do, and they start asking people to just "get it done", and then things really start falling apart. You add 10+ teams trying to use branches for their own work based off of god knows what and it becomes a mess of crazy integration problems.

I seriously think that a huge benefit of multiple repositories is that it scares the pseudo-technical managers and product people into not bothering to track or dictate usage of the version control system.


My experience is similar. The newly promoted CTO used a botched merge and delayed release to justify the move to a complex but half-assed monorepo branching strategy with minimal tooling and documentation, trashing any systems teams had in place. We've started botching releases, losing prospective clients, and now are in release limbo.


Or you could just tell them to stay off versioning/release/branching discussions, even in a monorepo. All they need to know is what release/deploy has what features, and when the release happens.


Easy to say when you're starting from scratch, but in all of my cases, these were "pre-existing conditions", i.e., was in place before I joined, and it took a long time to get people to see the light. It gets real tricky when you're _not_ a manager, and you're basically asking other managers to stop having such a loud voice in things.

Just another case of "culture eats strategy for breakfast". Once people start using any particular versioning strategy, it becomes canon, even if it's a terrible way to organize.


Yeah sorry if I came off as dismissive. It's usually a long process, if that's how things run.


Not horror stories, but I've used mono repos in a company with 50+ projects and the results tend to be tight coupling, macro level spaghetti and libraries being left to atrophy due to fear of change.

For example, I can be working on project B and need to make a change to lib A, so I make the change, commit my work, and now project Z broke. Now I have to learn whatever the hell project Z is, because it's not my responsibility and we may not even have anyone responsible for it. Then I have to work out if the changes to lib A need to be reverted or made backward compatible, or if project Z needs to be updated. Multiply this sort of thing across 10 libraries and 40 apps and the complexity that every individual developer has to deal with goes off the charts.

Separate repos with versioned packages don't necessarily fix this, but they do let you manage it a lot better: whoever is working on project Z can update its version of lib A at an appropriate time (or never).


Sounds like a feature and not a bug. It prevents irresponsible breaking changes to lib A's interface and just hoping that some other Z team will clean up after you.

Postponing the required Z change could be seen as beneficial in some scenarios, but what if the change you made to lib A was a security fix? Then you would want all apps using that lib to be force-updated right away. Then your change should be backwards compatible, monorepo or not.

If you want to have reusable components then make sure they are reusable; if you want a special version of lib A that only works with lib B, you are essentially forking lib A, making it no longer a reusable lib, just a subdir for project B. Interface versioning could help with such non-backwards-compatible changes; in a monorepo you normally do this with a /2.0 directory.
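That layout tends to look something like this (a sketch, names made up), with consumers migrating from 1.x to 2.0 at their own pace:

    libs/liba
    |-- 1.x    <- existing consumers keep depending on this
    `-- 2.0    <- the incompatible new interface lives here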


> It prevents irresponsible breaking changes of lib A interface

But sometimes breaking changes are necessary and a mono repo provides no responsible way to make them. We keep things backward compatible where possible, but this creates just as many problems as it solves when cruft builds up over years.

> just hoping that some other Z team will clean up after you.

Ideally you notify them and they'd keep somewhat up to date, but yes, it's up to the team that works on that project to do so. I can't be an expert on everything, but the mono repo assumes I am.

> in a monorepo you normally do this with a /2.0-directory.

Now you've reinvented version control and package management.


Not sure if it's a horror story, but I was on a team considering breaking up a monorepo mainly due to build times. Basically our CI would build from scratch a lot more often than we'd like rather than incrementally. It'd also run the full suite of tests more often than we would like. This made builds often take something like half an hour, which if there were multiple PRs open could force a wait of an hour or more before an approved PR could be merged (since CI would need to re-build and re-test a PR if another one was merged before it). This was exacerbated by a version of the branching problem mentioned in a sibling comment.

This wasn't an insurmountable problem. Something like Bazel probably would've done wonders here rather than our homegrown incremental testing logic (as well as nailing down why incremental builds weren't happening). Personally that's where I would've invested time rather than splitting up the repo. I'm not sure what ended up happening. I moved onto another project before seeing the conclusion of that conversation.

FWIW, a younger me led a break-up of another monorepo into a multi-repo that I now regret and think caused more pain than it was worth (likely because I split the repo along the wrong lines). And so I disagree with the premise of the article. If you split your repos incorrectly, you can cause more pain than not splitting your repos at all. Long build times are annoying and a velocity-killer. Moreover in the long run you can get ball-of-mud problems that repo boundaries make harder (that was probably the biggest impetus for why I wanted to break up the repo in the first place). However, incorrect version linking due to miscoordination of fast-moving dependencies in different repositories is a production-services-killer, and that caused us no end of frustrations. This was in addition to the annoyances around the fact that we had several different JVM languages that all had different build systems in each repo meaning that cross-repo edits were even more difficult than usual to corral together locally on a developer's machine since the build artifacts all depended on each other, but this was expressed in different ways in different repos.

Just as a bad abstraction is worse than no abstraction, I believe bad modularity is worse than no modularity.

Note that tooling helps this as well; tooling that exposes the transitive dependency chain of production services can reveal inconsistencies in what you thought was the version of a dependency that was deployed and what was ultimately deployed. But that means that both multirepos and monorepos need tooling.


I asked a similar question a previous time the mono- vs many- repo question came out, and the few responses I got were roughly "The repo became many tens of gigabytes which was unwieldy"


Unfortunately, breaking a 10 gig repo into 5 repos often means you have to now download five 2 gig repos.


That can still be better, e.g. when part of the problem is tools that work on a per-repo level and fall over/get slow if the repo gets too large.


How do these tools work when it comes to working on multiple repos? I imagine they either don't, or that it's worse than working on a single monorepo of the same size.


I think one should not underestimate the powers of the not-so-good-team to make anything go sour. Who says the not-so-good-team isn't going to make sure that any change needs to touch at least 3 repos and quite regularly as many as 10?

As for the philosophical issue that yosefk raises, I generally advocate solutions that work for the case that the team is good. I tend to think that if the team is not good you would be cooked anyway. Also, if you raise the stakes people might actually start learning a bit.


> In a Good team, you don't have multiple concurrent branches from which actual product deliveries are produced, and/or where most people get to maintain these branches simultaneously for a long time.

These sorts of Black and White, naked assertions drive me nuts. Buried in this statement is an assumption that the only software model worth even discussing is SaaS software - all copies of the code being run are run by your team, so master@HEAD is our ground truth at all times (except during deployments, which BTW are happening at least 10 minutes out of every hour of every day...)

Teams that sell applications or allow self-hosting, or even some SaaS shops with large enough customers are going to have to maintain multiple release branches. Possibly for years. From personal experience, anything above 3 seems to become unsustainable. But having 3 repos (a monorepo with 2 active branches + master) may be the right answer for you. One can't work, and 100 is murder. Stop the pendulum in the middle.


> If you agree with the above, the choice is up to your personal philosophy. To me, for instance, it's a no-brainer

The most obvious issue with this post is that it fails to acknowledge that for any production-scale company, you can't blindly say that a decision like this is a choice of personal philosophy (unless you're starting a brand-new project from scratch, in which case spending tons of time structuring repos well probably isn't your first priority, since new projects have very little code).

I'd love to see more articles that discuss repo structure in the context of a pre-existing codebase with hundreds of thousands of lines of production code and 10+ engineers collaborating on it.

For anyone reading: I'd be interested to hear anecdotes from people working at companies that have successfully (or unsuccessfully) re-structured a monorepo, the reasons you did it, how much time was invested in the restructure, and whether you think it was a net positive long-term.


I think the whole idea of simplicity vs. complexity, and the linked article "Worse is Better", is fundamentally ill-conceived. Simple vs. complex is contextually relative; literally every observer brings their own opinion of a design. Success in the wild = success. And I think success in the wild, especially in our age of information overload, depends on being "convenient to understand and operate".

A 'simple' solution may not be easy to grok. Few people think at the level of axioms.

A survivable solution must be passed along to many people across generations.

For some, monorepos are simple because that is what they know, for others multirepos are the norm. The survival of the firms that adopt these strategies will somewhat dictate what repo strategy propagates in the world, not the best design.


What the author misses is that multi-repo is also not dumbass-proof. Our team of about 60 people owned about 5-6 git repos that compiled into one shared object (.so). Every simple bug fix required multiple diffs to multiple repos.

The real argument that the author is making is that worse is better. Monorepo requires good tooling and a disciplined team. Multi-repo is worse but it is easier to manage when dealing with inexperienced programmers who want to have their own repo for their shiny little microservice.

IMO, the difference is that multi-repo has limitations and mono-repo has challenges. You will never get atomic commits and precise versioning in multi-repo. With mono-repo, there are a lot of challenges that can be solved with good engineering.


Ask your doctor if Monorepo is right for you.


In the real world, it doesn't work that way.

In the real world you ask your doctor if it's okay and he says "sure" because he's read a few things about it and it's what you seem to want. But he doesn't really know, because no one is capable of understanding the human body in its full complexity.

So you just end up taking Monorepo and hoping it doesn't make you severely depressed or give you seizures that send you to the emergency room.


Individual results may vary. Don't take Monorepo if you are bad at programming, or if you may become bad at programming.

Side effects include but are not limited to your repo growing into a single giant ball of circular dependencies.


Monorepo may cause blindness, sleep deprivation and suicidal thoughts. If you experience any of these symptoms, stop taking monorepo and consult your manager.


If you experience a build lasting more than six hours, seek rockstar ninja help immediately.


Ask not what your monorepo can do for you - ask what you can do for your monorepo.


Let's assume without proof that Glibc developers are good. Similarly, kernel.org developers are good as are the gcc.gnu.org people.

Should glibc, gcc and the kernel be in a monorepo?

(Cue laugh track ...)


That is actually one of the advantages of the BSDs. Kernel and base system are in one repo and released together. It simplifies development and testing, and makes upgrades easier for end users.


This is the red-tinged future Linux is headed towards anyway.


Did you just invent the unikernel?


AFAIK, Unix did this, so BSD Unix did this, and the modern BSD derivatives still do this.


One thing that's nice about a monorepo is when the language has modules. If it supports modules well, you can isolate your code better but still keep it in one large repo.

I dislike having all my code under one 'src' directory; it's nice to have modules like foo-ui and foo-util and so forth. Knowing that a module doesn't use UI components is nice because then you can use it on the backend, for example.

But it's all fully integrated and tested together.
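
For what it's worth, the "knowing that one module doesn't use UI components" part can be enforced mechanically even inside one repo. Here's a minimal sketch in Python; the foo_util/foo_ui names just mirror the modules above and the directory layout is assumed. It fails CI if the backend-safe module ever imports the UI one:

    import ast, pathlib, sys

    FORBIDDEN = "foo_ui"                  # the UI module that foo_util must never touch
    CHECKED_DIR = pathlib.Path("foo_util")

    def imported_modules(path):
        """Yield every module name imported by the given Python file."""
        tree = ast.parse(path.read_text())
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                for alias in node.names:
                    yield alias.name
            elif isinstance(node, ast.ImportFrom) and node.module:
                yield node.module

    violations = [
        (path, mod)
        for path in CHECKED_DIR.rglob("*.py")
        for mod in imported_modules(path)
        if mod == FORBIDDEN or mod.startswith(FORBIDDEN + ".")
    ]
    for path, mod in violations:
        print(f"{path}: imports {mod}")
    sys.exit(1 if violations else 0)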


Always makes me think of a Philips radio my dad told me about, called the "Mono Knob". It had a complex design that allowed control via just one knob. He said they were always in the shop for repair when he worked in a radio store as a teenager.

https://www.thevalvepage.com/radios/philips/785ax/785ax.htm


One of the tenets of XP that survived (or should I say, is amplified?) into CI/CD is the idea that you should build up calluses for painful activities instead of trying to avoid them.

In that context, if a thing is tough but has value, you make a path to it. First make the tools consistent, and then make people consistently use the tools. The more predictable the system becomes (predictability is the opposite of magic!), the more you insist on people using it. Pushback is a kind of feedback, and you have to address at least some of the concerns of people who refuse ('meet me halfway here').

Someday it will shock no-one to say that Git is not the best of all possible version control tools. If this is difficult, it may not be the people. Maybe it's time to start thinking about the next version control system?

SVN had some pretty decent facilities for monorepos. Some people will tell you that Git traded some of these features for others, but looking through the information architecture documentation for git, I don't think I can agree. Some of that information is there, it's just maybe not packaged for consumption.


> bad fork

How is this an argument? You merge the unforked projects and it's equivalent to a multirepo. I don't follow the argument at all.

...I started writing rebuttals to the others but I guess when your argument is "yeah you can do it right but you _could_ do it wrong and I have defined the question in such a way that we err on success for multi-repo and err on failure on mono-repo" I can't really fight that.


You're just pushing your trouble into a shadow layer that's hidden. You still have just as much complexity. It's just not encoded.

If branch A on repo X will only work with branch B on repo Y, you're holding that relationship in an uncoded way. It's true and unrepresented, and you never want that.
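
One way to encode it (a sketch in Python; the manifest file name, repo URL, and ref are all hypothetical) is to commit a manifest that pins each sibling repo to the ref it is known to work with, and have a script materialize exactly that combination instead of relying on people remembering which branch goes with which:

    import json, pathlib, subprocess

    # repos.lock.json, committed next to the code in repo X:
    # { "repo-y": {"url": "git@example.com:org/repo-y.git", "ref": "branch-B"} }
    manifest = json.loads(pathlib.Path("repos.lock.json").read_text())

    for name, pin in manifest.items():
        dest = pathlib.Path("deps") / name
        if not dest.exists():
            subprocess.run(["git", "clone", pin["url"], str(dest)], check=True)
        subprocess.run(["git", "-C", str(dest), "fetch", "origin"], check=True)
        subprocess.run(["git", "-C", str(dest), "checkout", pin["ref"]], check=True)

Now "X@branch-A needs Y@branch-B" lives in a reviewed, versioned file rather than in someone's head; this is roughly what git submodules or repo manifest files give you.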


I've experienced 3 large codebases with different approaches:

1. Monorepo: Google3

2. Non monorepo, large pile of #@$: Microsoft Exchange

3. Non monorepo, Amazon

After working with them, from my personal experience, the monorepo was the best. Yes, Google has a ton of internal stuff and can get away with not using many external dependencies, but when everything works, it works like a charm. The convenience of defining protobufs/contracts, the ease of referencing them, and a ton of other things are given to you when you're in the system.

At Google I never felt that the system was hostile to you. It was extremely easy to start hacking on something if you wanted to. Yes, it's not only the monorepo but the overall quality of the tools available; still, the monorepo is quite a significant part of it.


> With multiple repos, modularity is the norm. (...) With a monorepo, modularity is a mere ideal.

> ...

> In a not-so-good team, your monorepo will grow into a single giant ball of circular dependencies.

Sounds like the author was lucky enough to not encounter a not-so-good team with multiple repos.... :D


I’m firmly on team monorepo, and I agree great tooling is an absolute requirement. We use Bazel + AWS CodeBuild with local caching. We have an average incremental CI build of our monorepo that’s under 45 seconds. Clean build 30+ minutes.


There is also a way to simultaneously reap the drawbacks of monorepos and multirepos: consolidate numerous small repos into somewhat big monorepos and then don't merge everything into one big repo.

This way, you'll have to deal with

* difficult tooling

* dependency hell between the mono repos (which are now way more tightly coupled due to the dependency graph between them being denser)

* long living branches causing way more collateral damage as described in the original article

* cross-repo changes have become even harder for all the reasons above

You get all the bad things and avoid those advantages! Welcome to my world :)


I feel like I'm missing something. I've never used a monorepo before, but I'm looking into it for our Node code base. It seems like the combination of a monorepo and individually publishing the packages to a private npm server such as Nexus eliminates many of these issues. One project can remain on an older version while another can take a breaking change. Pin your versions and follow semver and you should be fine. What am I missing here?
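
For what it's worth, the "pin your versions and follow semver" half can be made concrete. A rough Python analogue (using the packaging library rather than npm; the project names, ranges, and versions are made up): each consumer declares a compatible range, and a breaking 2.0.0 release simply doesn't satisfy the old range until that consumer opts in.

    from packaging.specifiers import SpecifierSet
    from packaging.version import Version

    # project-a pins the 1.x line; project-b has already moved to 2.x.
    pins = {
        "project-a": SpecifierSet(">=1.4.2,<2.0.0"),
        "project-b": SpecifierSet(">=2.0.0,<3.0.0"),
    }
    published = [Version("1.4.2"), Version("1.9.0"), Version("2.0.0")]

    for project, spec in pins.items():
        accepted = [str(v) for v in published if v in spec]
        print(f"{project} accepts: {accepted}")
    # project-a keeps resolving 1.x; only project-b picks up the breaking 2.0.0.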


Coming from a small company with a monorepo to a large team with a monorepo, it seems to work fine. But maybe I need to be exposed more to other workflows.


My primary language now is Common Lisp and I have one mother of all mono repos for all of my personal Common Lisp code. I set the root of this repo as a Quicklisp load point so all my libraries and applications are available with a Quicklisp load. Life is good.

I don’t quite get the point of this article though. I am a very enthusiastic but not rockstar programmer, and I don’t have problems with a large mono repo.


But you're one person. The article argues that a single repo is unmanageable when you have many devs working at the same time.


He is talking about big teams.


I find this article more compelling than Yossi's article: http://blog.shippable.com/our-journey-to-microservices-and-a...

Yossi's comes off extremely pretentious, without really explaining why people consider multi-repo projects to begin with.


There are some nuggets of good things to look out for in here, but "don't do monorepo because your team is full of dumbasses" feels like a useless argument, maybe even a little adversarial. Monorepos solve versioning problems very effectively, which would be a pro if you're worried about working with dumbasses.


> like Google, the ultimate force for Good in technology

Hahaha...

https://www.reddit.com/r/degoogle/

https://twitter.com/hashtag/degoogle


> What do you do when you have a branch working on Android and another branch working on iOS and you have deliveries on both platforms?

Stop it and focus on one single web app that anyone can run instead. Unless your app is going to be one of the top 10 that people can't miss downloading.


Is being demeaning and rigid just the cool thing to do in blogs today? I feel like I keep reading articles like this every day now: "Well, if your team is full of dumbasses..." or "If you use OO instead of functional programming...".


Have you not noticed that this is the new norm everywhere?


This reminds me of some the trade-offs I've looked at with monorepos: https://epage.github.io/dev/monorepos/


So glad Yosef keeps writing on his blog. This is a well-articulated gem. I had an intuition as to why the monorepo style was not great for most shops, and his clever argument nails it.


Ugh, I don't like his use of the word dumbass. Coding is hard and it's easy to break things; we've developed lots of tools and strategies so we don't have to rely on people getting it right every time.

> Branching: getting forked by your worst programmer

This example seems contrived. I've never worked anywhere where having a fork that works for some scenarios and not others is tolerated for long; fixing it would be given the highest priority.

> Modularity: demoted from a norm to an ideal

This is basically saying that a multi-repo makes it hard to reuse your own code, which increases modularity. I find devs on very large projects are already reticent to reuse code from other projects/teams, and this pushes them even farther toward rewriting domain logic that should probably be shared. In most cases I'd trade a little bit of modularity for increased domain-logic consistency.

> Tooling: is yours better than the standard?

There are few organizations that have so much code they break available source control solutions but simultaneously don't have the technical expertise to manage a monorepo that large. For those, I guess it makes sense to break it up into manageable pieces based on the relationships between your projects.

I've worked on subpar teams that decided against a monorepo and it was a nightmare. It took forever to get set up, the build time was days, and cross-repo edits were painful. They regretted going multi-repo.


The branching example is very tempting when you have a product that is used on site by multiple customers. Customer A asks for some change that impacts customer B. Without good tests the easiest thing to do is fork the world for A and B, so the changes only impact one. Then you gradually drift apart. Add customer C, D, E... and it gets really fun.

With multiple repos it gets even crazier. You have forks for A and B that work with some other software repo, maybe common, maybe forked itself. Soon you have wiki pages with compatibility matrices, common libraries that mysteriously break with minor changes despite being battle tested ... for some set of versions.


Re:Branching, having lived this, monorepos are very much a blessing, not a curse.

If software is big enough to force considering monorepos, then cross-cutting dependencies will happen, and then one diffable long-lived monobranch is much better than a bunch of interleaved ones.

Incrementally building, landing, etc., cross-cutting deps becomes much less of a slog. Ex: skip concerns about versioning of _internal_ APIs.

The other issue I had was the sleight-of-hand on build modules. Yes, you don't need good devs to speed up incremental builds b/c you can search/build in each project. But if you want to run 20 modules together to test/experience them, good luck, esp. for interactive modes. (Congrats, you reinvented the monorepo!)


> cross repo edits were painful.

We are moving to a mono-repo for this very reason. Debugging something where you had to make changes to multiple projects, wait for those changes to go through the build pipeline, pull the updated packages into the projects that depended on them, then change those projects (and so on) was a complete nightmare.

We're a bit over halfway there (the most painful half) and I expect us to get the rest done sometime in the next 3 months or so. Not looking back. At all.


> There are few organizations that have so much code they break available source control solutions but simultaneously don't have the technical expertise to manage a monorepo that large.

Facebook's Mononoke (https://github.com/facebookexperimental/mononoke) pretty much removes that argument. They outgrew their current source control, and that's their path forward for the next couple orders of magnitude.


Not sure what you're implying about Mononoke, but the readme says:

> The version that we provide on GitHub does not build yet.

So, maybe eventually. They found that Git didn't do well at their scale, so they modified Mercurial instead.

I'm all in favor of a company-wide mono-repo if it doesn't have scaling issues.


Most of the monorepo tools for Mercurial are upstreamed already and available in a standard Mercurial install.

I don't think anyone in this thread is the CTO of a company the size of Facebook. For a medium-sized company, you can easily use Mercurial or Perforce.


> In most cases I'd trade a little bit of modularity for increased domain logic consistency

In my experience that is not the entire trade-off. At scale you also get circular dependencies between modules, which make it impossible to do refactorings, migrations, deprecations, and other improvements incrementally. Sometimes this happens unintentionally, by adding an "upcall" to a module that happens to be the best tool for the local job at hand.

In the case of several repos, you will notice the extra work needed to pull in the extra project. In the case of a monorepo... it might look like any benign change.
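
A cheap guard against that in a monorepo is a CI check over the module dependency graph. A minimal sketch in Python; the module names are hypothetical, and in practice you'd extract the graph from your build files rather than hard-code it:

    # Module -> modules it depends on (normally extracted from build files).
    deps = {
        "core":    [],
        "storage": ["core"],
        "api":     ["storage", "core"],
        "ui":      ["api"],
        # An accidental "upcall" such as core -> ui would close a cycle.
    }

    def find_cycle(deps):
        """Return one dependency cycle as a list of modules, or None."""
        WHITE, GREY, BLACK = 0, 1, 2
        color = {m: WHITE for m in deps}
        stack = []

        def visit(m):
            color[m] = GREY
            stack.append(m)
            for d in deps.get(m, []):
                if color.get(d, WHITE) == GREY:          # back edge: cycle found
                    return stack[stack.index(d):] + [d]
                if color.get(d, WHITE) == WHITE:
                    cycle = visit(d)
                    if cycle:
                        return cycle
            stack.pop()
            color[m] = BLACK
            return None

        for m in deps:
            if color[m] == WHITE:
                cycle = visit(m)
                if cycle:
                    return cycle
        return None

    cycle = find_cycle(deps)
    if cycle:
        raise SystemExit("circular dependency: " + " -> ".join(cycle))
    print("no cycles")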


> We've developed lots of tools of strategies so we don't have to rely on people getting it right every time.

I think one of the points of the article is that this isn't (sufficiently) true; people will use tools incorrectly and make excuses for deviating from the strategy often enough that it's a problem.

> Ugh I don't like his use of the word dumbass

Substitute "well-meaning person who makes a totally understandable process mistake"



