Build systems and package management in 2024

Written on 2024-05-23

I've recently had the dreadful realization that modern build systems and package management are not a solved problem: let's discuss.

This article does not propose groundbreaking solutions to your problems. Instead, it presents potential approaches, as well as the challenges one might face implementing them. My goals are to organize my own thoughts, as well as to provide items that I believe someone working in this space should consider. If you are looking for actionable suggestions, you might want to look elsewhere instead.

My (relatively uncontroversial) premises are that:

This article assumes some familiarity with the above (or similar) systems.

From there, it would seem fairly reasonable for someone to want to combine those in one of their projects (whether personal, open-source or professional) to get a good developer experience. Having worked professionally with both Bazel and Buck2, and being a fairly heavy Nix user, this sounded obvious to me, until I started to seriously look into implementing this.

The problem

"Ambient environment vs hermetic build", which can be rephrased as "use the package manager vs import everything into the build system".

To get dependencies into a build (including third-party libraries, language toolchains, sysroots, etc.) a build system can either pick them up from the ambient environment (which would resolve to a user-activated environment, a user-wide setup, the system, or a mix of all of those) or take responsibility for managing those itself. Reproducible package managers have to deal with non-reproducible build systems, so they create a hermetic/sandboxed environment that provides everything required for a build and leave the building to the build system.

The most obvious approach is to just create some sort of environment with the package manager and execute the build within it. There are a number of issues with this approach however:

On the other hand, we can import the world into the build system: once it has first class targets for each of the packages that we depend on, the previous concerns mostly vanish. I consider this to be the right approach but:

Therefore, there remain open questions:

Approaches, potential solutions

Just wrap it!

This is where most of the real world is at: wrap your build in an environment that is hopefully reproducible and hermetic. It has the downsides presented earlier, but it also "kind of works", for better and for worse.

A major upside of this approach is that it works well with existing tooling, which typically takes this workflow for granted.
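To make the "wrap it" idea concrete, here is a minimal sketch, assuming the build is a plain command and that the environment is fully described by a list of tool directories plus a few variables. The names `wrapped_env` and `run_wrapped` and the `/opt/toolchain/bin` path are hypothetical. Note that this only scrubs environment variables: the process can still read any file on the system, which is part of why this approach is only "hopefully" hermetic.

```python
import os
import subprocess
import sys


def wrapped_env(tool_dirs, extra=None):
    """Build an environment from declared inputs only, ignoring os.environ."""
    env = {"PATH": os.pathsep.join(tool_dirs)}
    env.update(extra or {})
    return env


def run_wrapped(argv, tool_dirs, extra=None):
    """Run argv so that only the declared variables are passed to the process.

    argv[0] should be an absolute path: the caller's PATH is deliberately
    not consulted, and nothing here restricts filesystem access.
    """
    return subprocess.run(
        argv,
        env=wrapped_env(tool_dirs, extra),
        capture_output=True,
        text=True,
        check=True,
    )
```

For example, `run_wrapped([sys.executable, "-c", "import os; print(os.environ.get('HOME'))"], ["/opt/toolchain/bin"])` runs a child process that does not inherit the caller's `HOME`.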

Just don't use package managers!

A viable approach in megacorp monorepos, but not satisfying for most projects, mostly because of how much maintenance it requires. The cost can easily be justified when it is shared across a large organization, but it does not scale down very well.

With that being said, it would be interesting to see if new tooling could help with this: the vast majority of work being done (in the open anyway) on dependency management presumes working with individual packages, presumably served from a more or less external repository, rather than vendoring everything. With the recent popularity of static linking and support for multiple simultaneous versions/variants of packages, some might come to the conclusion that far better support for vendoring is worth working on. This is somewhat explored by tools like Reindeer, but it breaks down when other languages are involved (e.g. calling out to C code from Rust). Build rules need to be provided for those, which need to somehow be integrated in the main build system, and the work scales linearly with the number of ecosystems involved. This is partially true for other approaches as well, but at least a package manager will get you the code, build it, and expose its runnables and libraries in a hopefully convenient way: the problem is reduced to consuming the world rather than building it.

Get the package manager and build system to communicate

A lot of the problems presented here would be much easier to solve if the build system and package manager could communicate with more structure than just calling one another.

However, not only would it be challenging technically, it would be even more so socially, as we would need various package managers and build systems to agree on a standard way to communicate. The design would likely require at least a few iterations to get right, which is slow when many parties are involved.

For context, the C++ community is trying to achieve this with the CPS project. This is a relevant comparison because the C++ ecosystem is diverse and has not had the luxury of standardizing around a common toolchain like most languages. Integrating package managers and build systems in a language-agnostic way would be a far greater endeavor, which is concerning given that it is not going particularly well in the case of just C++ (I don't have specific opinions on CPS, but this problem has been a pain point in the ecosystem for decades, and it is at best unclear that it will be solved any time soon).

Link out to the package manager's store

By which we mean that the package manager installs all dependencies in a directory that it controls (the store), and the build system has symlinks into it in its own store. First class targets are then created in the build system for each dependency that is provided by the package manager.

This approach is not ideal for reasons discussed previously, but it is relatively cheap to implement and can be good enough with a decent package manager.

Notable pain points include the build system not having enough information about dependencies to materialize them during remote execution, and having to define targets manually if the package manager does not expose enough metadata about its packages to derive the targets automatically in more complex cases. The cost of defining those targets by hand could be prohibitive, especially if regular users are supposed to interact with this system. This is particularly true if those definitions need to be edited when updating dependency versions.
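As an illustration of the "link out" approach, here is a minimal sketch, assuming the package manager can hand us a small metadata dictionary per package. The metadata shape (`store_dir`, `libs`, `deps`) and the function name `link_out` are hypothetical and do not correspond to any real package manager's output.

```python
from pathlib import Path


def link_out(pm_store: Path, build_store: Path, packages: dict) -> dict:
    """Symlink each package from the package manager's store into the build
    system's store, and return one target description per package."""
    build_store.mkdir(parents=True, exist_ok=True)
    targets = {}
    for name, meta in sorted(packages.items()):
        source = pm_store / meta["store_dir"]
        link = build_store / name
        # Replace a stale link, e.g. after a dependency version bump.
        if link.is_symlink():
            link.unlink()
        link.symlink_to(source, target_is_directory=True)
        targets[name] = {
            "path": str(link),
            "libs": [str(link / lib) for lib in meta.get("libs", [])],
            "deps": sorted(meta.get("deps", [])),
        }
    return targets
```

The symlinks only make sense on a machine where the package manager's store is populated, which is exactly the remote execution pain point mentioned above. The `libs` and `deps` fields are also the part that has to be written by hand when the package manager does not expose that metadata.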

Import everything in the build system

In this approach, we copy all the dependencies from the package manager into the build system's store. This is more costly than the previous approach because it reaches deeper into the package manager: we don't need to just know the layout of the files that we care about, but also how to import them in an isolated way, how to fix them up so that they are usable in this new environment, etc. It is also even more challenging to automate, given that dependencies must be handled transitively, properties like rpaths must be fixed up, etc. However, this automation is also more crucial, as specifying all this by hand and redoing it on updates is a non-starter.

With that being said, it is also the one that arguably gets us the most correct builds: the build system is in control of all dependencies, which unlocks its potential, e.g. for hermeticity and remote execution.
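The "handled transitively" part of importing can be sketched as follows. This is a minimal sketch, assuming the direct dependencies of each package are already known as a plain mapping. The names `transitive_closure` and `import_plan` are hypothetical, and the harder parts (isolated copying, rpath fix-ups) are deliberately left out.

```python
def transitive_closure(roots, direct_deps):
    """Return every package reachable from roots, dependencies first.

    direct_deps maps a package name to the names it depends on directly.
    """
    order, seen = [], set()

    def visit(pkg):
        if pkg in seen:
            return
        seen.add(pkg)
        for dep in sorted(direct_deps.get(pkg, ())):
            visit(dep)
        order.append(pkg)

    for root in sorted(roots):
        visit(root)
    return order


def import_plan(roots, direct_deps, store_paths, build_store):
    """List (source, destination) copies needed to import roots and all of
    their transitive dependencies into the build system's store."""
    return [
        (store_paths[pkg], f"{build_store}/{pkg}")
        for pkg in transitive_closure(roots, direct_deps)
    ]
```

Ordering dependencies first means that each package is imported only after everything it refers to already exists in the build system's store, which is convenient if a later fix-up step needs to rewrite references to those locations.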

Having standard solutions in this space would be very valuable.

Merge the two

Modern build systems have many of the characteristics of modern package managers, so we could use a build system as a package manager, but:

I do not think that the other way works as well though, because package managers are designed for more "macro" use cases. There are counterexamples in the Nix ecosystem, particularly for building Rust and Haskell, where each dependency is built as its own package by Nix, rather than relying on `cargo`/`cabal` to build the whole project from source code and pre-downloaded dependencies. I won't elaborate as I'm not very familiar with this approach, but my understanding is that it results in a relatively poor experience, including in terms of performance (especially due to lack of remote execution). A new package manager could be built with those considerations in mind, but at this point, would it not be a build system already?

Think outside the box

Perhaps we need to drastically move away from the status quo to reach a less local maximum. One idea that is currently floating around in the build system space is using a virtual file system to access source code, be it first or third party.

This idea lifts some of the problems outside of the build system/package manager by making providing dependencies the job of the filesystem instead, but the rest of a package manager's job (e.g. version compatibility resolution) still needs to be figured out.

While I'm not suggesting that this is a better solution than the ones discussed previously, it is certainly healthy to keep an open mind to alternative solutions, as the shape of the problem of building software keeps evolving.

The situation

Having presented various approaches, how widely are they adopted, and which tools enable them?

I think that it is fair to say that most of the world either uses no solution, or wraps their build in some sort of environment (oftentimes using Docker to wrap the whole system rather than just providing build dependencies hermetically).

Some megacorps "just" vendor whatever they need and build them with their build system.

As for the other approaches... while I naively thought there would be a good way to achieve the rest now that we are firmly in the 2020s, there simply is not. As someone who takes Nix for granted and has used Bazel (and now Buck2) professionally, I am quite shocked that the state of the art is not further along.

There are projects to use various package managers with Bazel and similar work is being done on Buck2 as well, but those projects are neither stable nor standard, and come with plenty of caveats. There sure is no simple, fast and hermetic solution that works out of the box.

So what, and what is next?

I'm not sure. Like I said at the beginning of this post, I don't have great answers to most of the questions that are asked here.

I'm experimenting in this space in my free time, but do not expect to deliver anything, especially not anything groundbreaking. At this point, I have Nixpkgs-based rules that let me fairly trivially build toolchains using Buck2, which is not nothing, but it is still "only" using the "link out to the package manager's store" approach, which is not the endgame, as far as I'm concerned.

I'm currently professionally involved in finding a suitable solution for a much bigger project, and looking forward to progress in this area in general. On the one hand, there are enthusiastic and skilled people working on those problems, so I am hopeful that there will be some progress, but on the other hand the needle does not seem to have moved much over the past half decade or so.