Should Cloud-Native Applications Use a Monorepo?

When we look at a typical modern cloud-native application, we see a variety of distinct components: services, databases, load balancers, interfaces and maybe even a mobile app or two.

Each of these components has some source code and other configuration information associated with it, which is stored in a code repository, usually using a variation of the open source code management system Git. While nearly every system or component has code or configuration to manage, does every system or component need its own code repository? Or can they all share a single code repository?

In most situations, each individual component of an application would have its own set of repositories. This model is called polyrepo because it involves a large-scale application having multiple independent code repositories.

In recent years, a few companies, most notably Google, have advocated putting all the code for all the components of an application into a single large code repository. This code management model is called a monorepo.

While there are advantages and disadvantages to each model, in my mind, there is one option that is superior to the other by far: The polyrepo option is the best option.

The traditional polyrepo model wins handily over the monorepo model because the monorepo model encourages many bad habits and bad processes and makes application scaling—at least the scaling of the development organizations and the complexity of the application itself—substantially more challenging.

Why are polyrepos superior to monorepos? A few reasons:

Monorepos go against single-team ownership principles. I’m a firm believer in the Single Team Oriented Service Architecture (STOSA) model of service ownership for organizations. In this model, ownership of each of the many services, systems, and modules belongs to a single development team. Different services can belong to different teams, and each team can own more than one service, but every service has exactly one owning team. A single, identified owning team for each service drives improved quality and accountability in all the software your teams develop. This single-team ownership model means the codebase for a service needs to be managed by the same team that owns the service. The owning team needs approval, moderation and review capabilities for all changes to the codebases it is responsible for. If a single monorepo contains all of the application’s components together, it is much more difficult, almost impossible, for each team to maintain ownership control and management over its services’ source code.
Monorepos encourage bad global code management practices. When you talk to monorepo proponents, one of the advantages you will hear is that monorepos make it easier to refactor extremely large sections of code in a single change request. However, I consider this type of massive change to be an anti-pattern. Rather than doing a single massive refactoring that crosses multiple team boundaries and responsibilities, each team should be responsible for its own part of the refactoring. Each team needs to coordinate their work with other teams, use proper internal API management procedures and ensure necessary backward compatibility while each team performs its part of the refactoring. This might make the refactoring job itself take a lot longer to implement, but that is a small price to pay to ensure that all issues associated with such a massive refactoring are successfully understood by all parties involved rather than just those who happen to be involved with a particular change request. This level of coordination and management can’t be done with a single massive change request.
Monorepos can become massive. Large applications have large repositories. If the hundreds of components of a large application all share the same code repository, that repository will be huge. Google’s monorepo is gigantic. Their single repository holds all of the primary Google code, containing over 2 billion lines of code, and is over 85 terabytes in size. This is 40 times the size of the entire Microsoft Windows operating system. And remember that with the most commonly used Git clients, a standard clone downloads the entire repository to each developer’s laptop. Imagine having to check out and manage an 85-terabyte repository on your local development machine. No single person would ever be able to do this, so special software is required to manage the massive repository; Git alone is no longer sufficient. One way this special software works is by isolating relevant parts of the codebase from other parts, allowing developers to download only the parts they care about. This task is unnecessary with polyrepos, and the process goes against the everything-in-one-place philosophy that the monorepo was meant to embody.
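To put the 85-terabyte figure in perspective, here is a rough back-of-the-envelope sketch of how long a full download would take. The link speeds are assumptions chosen purely for illustration, and the calculation ignores protocol overhead, compression and disk speed, so treat the results as order-of-magnitude estimates rather than measurements.

```python
def transfer_days(size_terabytes: float, link_gbps: float) -> float:
    """Days needed to move `size_terabytes` over a link of `link_gbps`.

    Uses decimal units (1 TB = 10**12 bytes, 1 Gbps = 10**9 bits/s) and
    ignores protocol overhead, compression and disk write speed.
    """
    size_bits = size_terabytes * 10**12 * 8
    seconds = size_bits / (link_gbps * 10**9)
    return seconds / 86_400  # 86,400 seconds in a day


REPO_SIZE_TB = 85  # repository size quoted in the article

# Assumed sustained link speeds, for illustration only.
for gbps in (0.1, 1.0, 10.0):
    days = transfer_days(REPO_SIZE_TB, gbps)
    print(f"{gbps:>5} Gbps: {days:.1f} days")
```

Under these assumptions, even a sustained 1 Gbps connection would need roughly eight days for a single full copy, before any local storage concerns come into play.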

Large repositories have other problems, too. The larger the repository, the harder it is for each individual engineer to manage it while trying to develop code for inclusion in it. The more engineers you have working on a single repository, the greater the number of changes made to it. More ongoing changes mean more maintenance tasks (merges, pulls, rebases, etc.) that each individual developer using that repository has to deal with. In Google’s case, there are over 45,000 changes made to their monorepo every single day. This code management overhead grows rapidly as the number of developers working on an application grows and the number of components within the application expands. From a practical standpoint, a rule of thumb I use is that an average of five to 15 developers per repository strikes a good balance between encouraging cooperation and maintaining a reasonable level of manageability. At that level, repositories can be easily managed. For a large application and organization, that means multiple repositories, and therefore a polyrepo model.
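The numbers in the paragraph above can be turned into a quick sanity check. The sketch below is only illustrative: it assumes changes are spread evenly over 24 hours and that each developer works in exactly one repository, neither of which holds exactly in practice. The function names and the 300-developer headcount are hypothetical.

```python
import math
from typing import Tuple

CHANGES_PER_DAY = 45_000   # daily change count quoted in the article
DEVS_PER_REPO = (5, 15)    # the article's suggested range per repository


def seconds_between_changes(changes_per_day: int) -> float:
    """Average gap between changes, assuming an even spread over 24 hours."""
    return 86_400 / changes_per_day


def repo_count_range(developers: int,
                     devs_per_repo: Tuple[int, int] = DEVS_PER_REPO) -> Tuple[int, int]:
    """(fewest, most) repositories implied by the developers-per-repo range.

    Assumes each developer is counted against exactly one repository.
    """
    low, high = devs_per_repo
    return math.ceil(developers / high), math.ceil(developers / low)


gap = seconds_between_changes(CHANGES_PER_DAY)
print(f"One change lands about every {gap:.2f} seconds")

fewest, most = repo_count_range(300)  # hypothetical 300-developer organization
print(f"300 developers -> between {fewest} and {most} repositories")
```

At 45,000 changes a day, a shared repository receives a new change about every two seconds on average, while the same rule of thumb applied to a hypothetical 300-developer organization suggests somewhere between 20 and 60 separate repositories.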

A monorepo, in use by many tens or hundreds (or even thousands) of developers, is simply unwieldy.

There isn’t a single right or wrong answer here; both the monorepo and the polyrepo models have pros and cons. Google, in particular, has invested heavily in building tools to support its monorepo worldview. However, in my opinion, in a modern development organization, the disadvantages of the monorepo far outweigh any perceived advantages.

Polyrepos encourage scalable processes that can expand as your organization and products expand. Monorepos encourage bad global systemic processes that should be avoided. As such, the polyrepo model, in my opinion, is the best solution for most companies in most situations.

About the Author: Lee Atchison
