Let's face it: monorepos are just huge monoliths. If you want modular software, you WILL have to pay the price of fragmentation.
Someone needs to think about the boundaries between different parts of a system. Whether those boundaries are defined by functions, classes, packages or services doesn't matter that much. Yes, it is always a pain.
Shipping the entire forest of modules in a single repo seems like a good compromise. It stops being a good compromise when you start creating a complicated network of soft dependencies between those modules. And when you have everything in a single repository, it's hard NOT to do that.
By the way, there's nothing wrong with monoliths. They're just better if you design them as such. There's nothing wrong with microservices as well. It shouldn't be a surprise though that neither of them are actually silver bullets.
I understand why these solutions were created. I see it as an ephemeral thing. Once someone figures out the tooling and practices to automate all of that painful management locally, developers will always prefer that. Then when it's fast and easy, we'll break it once again (like we did with classes, packages, containers, etc).
Monorepos are a tooling nightmare. Monorepos are a bandwidth black hole.
Monorepos make some things possible that are just not possible or very very hard when you have multiple independent repos that are built and tested independently.
Namely,
a) they allow you to test the effects of a change on all the components that depend on you, before you merge your change. This reduces the noise caused by regressions (API or behaviour) introduced in one component that is depended on by many consumers.
b) they are a practical way to ensure that all components have up-to-date internal dependencies: by placing the burden of API and behaviour breakage on the author of the change, you don't end up with hundreds of teams each struggling to keep up with dependencies that keep breaking their builds on every update, and consequently hating the teams that release those changes.
None of these things is a big deal unless you're a huge company with hundreds of teams.
I think in theory some tooling and workflow could provide all or most of the benefits of monorepos without the downsides.
Until then, monorepos are likely going to be a bad choice for small companies.
I agree with the other commenter - in my view, a monorepo is the _best_ choice for a small company. I guess this depends on what tooling is available for your language / ecosystem of choice though. In my experience of TypeScript and Java with monorepos, you definitely need to know how to configure the tooling properly (which is certainly "overhead"), but it massively reduces the maintenance cost and increases the consistency of your tooling config. Spreading out over loads of repos means you need to share artefacts, which means package managers and package manager hosts, and a whole suite of release CI/CD which gets out of sync almost immediately.
It's also getting a lot better: Gradle works amazingly well for a monorepo even with dozens of developers committing to it every day, thanks to shared caching, and Nx/Turborepo/others are making the story for front-end/TS much better too.
The biggest issue is that every CI provider speaks with "repo=project" language, so your way forward with monorepos is using stuff like Bazel.
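To make the "repo=project" friction concrete, the usual workaround short of Bazel is per-project path filtering; a hypothetical GitHub Actions workflow (service and path names invented) might look like:

```yaml
# Hypothetical per-service pipeline: runs only when files under the
# service's directory (or shared libs it uses) change.
name: billing-ci
on:
  push:
    paths:
      - "services/billing/**"
      - "libs/common/**"   # hand-maintained dependency list -- drifts easily
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make -C services/billing test
```

The catch is that those path lists are maintained by hand and drift out of sync with the real dependency graph, which is exactly the bookkeeping Bazel automates.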
Bazel is extremely cool! But you end up with, like, a handful of people on the team who can write Bazel and everyone else cargo-culting their way through it.
There are a lot of JS "monorepo management" tools but they all seem concerned about the release phase for a lot of libraries that really should just be one library instead of 300 npm packages or whatever.
> Bazel is extremely cool! But you end up with, like, a handful of people on the team who can write Bazel and everyone else cargo-culting their way through it.
It's not that difficult, and the docs are outstanding. It can and will be worse with your own Bash-isms, which likely won't have any docs at all.
I think Bazel is doing a lot of good stuff... but I think it suffers a bit from the same thing as Angular, where it makes a lot of its own terminology.
There is another aspect where if you are using a language like Python or JS then you have to kind of swim upstream to get an existing project onto Bazel. Far from impossible but if you just look at the default Bazel stuff without pulling in third-party libs it's pretty tedious to get a project with a good amount of dependencies working.
>Can you explain how a monorepo for microservices is a tooling nightmare and bandwidth sink?
bandwidth:
I meant "bandwidth" literally. When your codebase becomes huge, your VCS repo size becomes enormous and it gets harder and harder to keep a full checkout on all the development machines, especially if they are over the WAN (e.g. at home on your laptop).
This has fuelled solutions like sparse checkouts (e.g. Microsoft's VFS for Git, now Scalar) and remote development (like TFA, but also Google's Cider and srcfs, etc.).
tooling:
Naïve monorepo tooling (which I've seen in various companies I worked for) simply performs a full build of the whole monorepo for each CI execution. At first this is just fine, since you can parallelize builds and call it a day; but after a while the builds just don't scale anymore, flaky tests become an increasing frustration, etc.
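The alternative to full rebuilds is change-based test selection: compute the set of modules transitively affected by a commit and build only those. A minimal sketch of that graph walk (the module names are made up for illustration):

```python
from collections import defaultdict

# Toy dependency graph: module -> modules it depends on.
deps = {
    "app":     ["api", "ui"],
    "api":     ["core"],
    "ui":      ["core"],
    "core":    [],
    "scripts": [],
}

def affected(changed, deps):
    """Return every module that transitively depends on a changed module."""
    # Invert the graph: module -> modules that depend on it directly.
    rdeps = defaultdict(set)
    for mod, ds in deps.items():
        for d in ds:
            rdeps[d].add(mod)
    todo, seen = list(changed), set(changed)
    while todo:
        mod = todo.pop()
        for dependent in rdeps[mod] - seen:
            seen.add(dependent)
            todo.append(dependent)
    return seen

print(sorted(affected({"core"}, deps)))     # ['api', 'app', 'core', 'ui']
print(sorted(affected({"scripts"}, deps)))  # ['scripts'] -- nothing else to rebuild
```

This is roughly what `bazel query "rdeps(...)"` computes from BUILD files, except Bazel derives the graph from declared dependencies instead of a hand-written dict.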
The tooling that can help scale large monorepos does exist, but it requires buy-in and comes with its own learning curve and tradeoffs. One well-known such tool is Bazel (https://bazel.build), with its remote builds and remote caches. These tools are hard to set up, although the folks at https://www.buildbuddy.io/ can help smaller startups by offering a managed service.
(Again, I'm talking about really large monorepos. A monorepo which includes a dozen or so modules, for which you can easily perform a full re-build on a single CI worker and on your laptop is not the kind of repo that creates tooling nightmares.)
Thank you. Monorepos of _that_ size are definitely far more complex and come with the issues you mention. Although, I feel like you'll only hit that wall once you are _very_ large as a company, and at that point you'll also have the resources to climb it.
For small-to-medium scale, a monorepo has been a blessing after dealing with multi-repo systems for years.
That said, I cannot pretend I don't see the problems with suboptimal tooling when working in a medium-size codebase. "Big", "medium" and "small" are quite subjective things.
At $work we have a monorepo whose git repo grew to 1GB, and where CI turn-around is so slow, and glitches so frequent, that it often takes hours or even days to land some code to prod. Developers instinctively react by making bigger and bigger changes, because the very thought of going through the PR/review/CI/merge cycle once again terrifies them. It's all compounded by a security policy that forces a code review approval every time the source code changes, including when you have to apply fixes to build failures induced by a component you don't own.
All of these things can and should be fixed. But this is work, and not urgent work, so it's often not done at the same pace as other stuff. This induces fatigue in the team and, as things usually go, people tend to blame the easiest thing to blame: the monorepo.
That's why I try to phrase the problem to be a tooling problem and not a monorepo problem.
Clearly if you don't have a problem with your monorepo, you either already have good tooling, or you don't need good tooling.
You don't need a monorepo to have integration tests. You can very well have a CI that clones a bunch of stuff and builds and tests them all together.
The issue that monorepo is solving is regarding write operations: FOO depends on BAR, which depends on BAZ. If you need to change BAZ to develop what you want on FOO and they're all on multiple repos, you'd have to pull request your way from the bottom to the top of the dependency graph. This is what causes the friction that monorepos avoid, and this is the hard part of such workflow to automate with multiple repos.
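To make that friction concrete: with one repo per component, every change at the bottom of the chain requires a serialized sequence of release/version-bump PRs, one per level. A toy sketch using the names from the comment above:

```python
# Toy multi-repo dependency chain: each component depends on the next one.
# FOO -> BAR -> BAZ, so changing BAZ forces a release of BAZ, then a
# version-bump PR in BAR, then another in FOO -- each waiting on the last.
chain = ["FOO", "BAR", "BAZ"]

def pr_sequence(changed, chain):
    """PRs needed, bottom-up, after changing one component in the chain."""
    i = chain.index(changed)
    # Release the changed component first, then bump each dependent above it.
    return list(reversed(chain[: i + 1]))

print(pr_sequence("BAZ", chain))  # ['BAZ', 'BAR', 'FOO'] -- three serialized PRs
print(pr_sequence("BAR", chain))  # ['BAR', 'FOO']
```

In a monorepo the same change is a single atomic commit touching all three components, which is precisely the write-side friction the comment describes.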
That's exactly how I understand the benefits of monorepo and it seems like a terrible idea.
You might spend one week building the next version of your component and three months updating all of the dependents. Then do it again. And again. You are working at a fraction of your productivity. I wonder if that's why Google needs thousands of engineers.
Whereas I just updated Stripe from 2.x.x to 5.x.x in one of the projects I'm working on, because the new version has features that I needed. I never wasted time updating to the intermediate versions until I had a need to.
It also limits your ability to break backwards compatibility in new versions, because of course we all come up with great designs right from the beginning.
I get the security and performance benefits of keeping all dependencies up to date, but man, the time sink and limitations seem so not worth it.
> you start creating a complicated network of soft dependencies between those modules. And when you have everything in a single repository, it's hard NOT to do that.
That is what the visibility [1] in Bazel solves. You can't import other people's code unless they say you can by making their code visible to yours.
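A hypothetical BUILD file sketch of that mechanism (the target and package names here are invented): only the packages listed under `visibility` may depend on the target; anyone else gets a build-time error instead of a silent coupling.

```starlark
# libs/billing/BUILD
cc_library(
    name = "billing",
    srcs = ["billing.cc"],
    hdrs = ["billing.h"],
    # Only these consumers may import this library; everyone else is
    # rejected by Bazel with a visibility error.
    visibility = [
        "//services/payments:__pkg__",          # that one package
        "//services/invoicing:__subpackages__",  # that package and below
    ],
)
```

This turns the soft dependencies the parent comment worries about into explicit, reviewable allow-lists.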