
Rust compile time reflections

Since my last post, I have experimented with uv and uvx for Python, and I am very happy with them for the most part. I am still using twine for uploading Python packages to PyPI, but that is mostly out of laziness. This led me to want to install uv, uvx and also ruff (a nice Python formatter and linter from the same group of people) on my OpenBSD system.

I first installed uv using pip, which also brings uvx along. I could then install ruff using uv, which is said to be faster than pip. Sounds straightforward, right? However, these tools are not written in Python, but in Rust ...

Installing Rust packages from PyPI

Rust is a very cool language - I like it very much. As a professional C++ developer who also likes Haskell and OCaml, I get to enjoy fast binaries, deterministic memory allocation and a modern type system. My Firefox browser has Rust code in it, large tech companies like Microsoft and Google are using it, and it is even being integrated into the Linux kernel.

But Rust is also known for being slow to compile, while also eating a lot of memory at build time. Firefox is no longer built for i386 OpenBSD because of this: 32-bit systems often cannot satisfy Rust's memory hunger anymore.

For me, this means that if I know that I will install a Rust package from PyPI, I have to raise my memory limits beforehand:

ulimit -d 8000000

This allows Rust to almost fully utilize the feeble memory capacity of my ThinkPad X240. I had already tried to install uv once, and it failed because of the default memory limits on OpenBSD. So I knew that I needed this, and I had to hope that my chosen limit would be enough.

Installation output for uv and ruff

Here is an excerpt of my shell output during the installation:

((uv) ) [12:20] $ ~/projects/python$ pip3 install uv
Collecting uv
  Downloading uv-0.7.13.tar.gz (3.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.3/3.3 MB 6.9 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: uv
  Building wheel for uv (pyproject.toml) ... done
  Created wheel for uv: filename=uv-0.7.13-py3-none-openbsd_7_7_amd64.whl size=17182463 sha256=fe7e3b23b3c8582d6dfd94ba58a74f85e2a3f27dd89f4246b7450810a6413866
  Stored in directory: /home/astharoshe/.cache/pip/wheels/b1/2f/47/acef72fa6b01c8accdb0f874cf9bdb46a3d7e28f64a7d2377e
Successfully built uv
Installing collected packages: uv
Successfully installed uv-0.7.13

[notice] A new release of pip is available: 25.0.1 -> 25.1.1
[notice] To update, run: pip install --upgrade pip
((uv) ) [13:07] $ ~/projects/python$
((uv) ) [13:10] $ ~/projects/python$ uv pip install ruff
Using Python 3.12.10 environment at: uv
Resolved 1 package in 774ms
      Built ruff==0.11.13
Prepared 1 package in 35m 30s
Installed 1 package in 7ms
 + ruff==0.11.13
((uv) ) [13:46] $ ~/projects/python$

As you can see here, uv took about 47 minutes to install, while ruff took about 36 minutes. I know that my hardware is not high-end, but it is absolutely futuristic compared to some of the hardware I used for commercial software development, which built proprietary software with more features in less time. I am not really happy with that.

At least I have the binaries now, right?

Yes, I now have my compiled uv, uvx and ruff binaries. They are fast, they work well, and I can just keep them and not suffer these compilation times again ... at least until OpenBSD breaks binary compatibility.

You see, other operating systems might try to keep compatibility with existing binaries for as long as possible. Windows is really famous for it, Linux binaries still work after a long time (if you can get the matching dynamic libraries), FreeBSD has many compatibility options for running old binaries ...

But OpenBSD is different. At OpenBSD, they do not want to burden themselves with compatibility crutches, which lead to larger code sizes and more complicated logic - in short, to more possibilities for bugs and security vulnerabilities. At OpenBSD, they deem this a huge price to pay, one which for the most part only really helps closed-source software. For open source, if binary compatibility gets broken, you can just recompile.

I absolutely and without reservation support the OpenBSD way here. Code bases tend to grow too big all too easily, and closed-source software annoys me since I cannot just recompile it for a different architecture or operating system. And OpenBSD does not break binary compatibility just for fun, so even on my OpenBSD-current systems it is rarely an annoyance to me.

But here, with Rust binaries, the long compile times annoy me even more when I consider that my binaries can stop working after some system update and I will have to build them again. That means another long wait. And I have no hope that future versions will compile faster, since future versions are bound to contain more code.

Rust compile time mistakes

I understand that I cannot demand that everyone write their code in fast-to-compile C (or even faster-compiling languages). But I know that as I grow older, I am getting more and more grumpy at the thought that perfectly fine hardware like mine should be replaced with high-end hardware, which in turn will be obsolete a few years later. I am very pro-environment and find this a ridiculous waste of our resources.

And, as stated before, I like Rust a lot myself. I can totally understand why one might want to use it for tools like uv and ruff. However, I see two potential mistakes here that cost us compile time. Both mistakes also have their upsides, so do not take my choice to call them mistakes as an attack on Rust developers. As always, this is my subjective opinion, which I hope you may find worthy of consideration.

Mistake 1 is the NPM-ish dependency jungle.

Mistake 2 is that the crate is the translation unit in Rust, and not each single file.

What is wrong with Rust dependencies?

I am not the first one to criticize the fact that many Rust projects have many dependencies. Joe Armstrong (creator of Erlang and a personal IT hero of mine) had a nice quote about object-oriented programming, which I will steal to talk about dependencies here:

You wanted a banana but what you got was a gorilla holding the banana and the
entire jungle.

This is what I feel when I install any Rust executable. Let's take typst-cli as an example: it seems to depend on 170 crates. That is a lot!

Now, dependencies are not automatically bad. If you really need the functionality, you have to decide between using a dependency and implementing it yourself. If you need something, there will have to be code for it and I will have to compile it.
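
To make that trade-off concrete, here is a minimal sketch (my own toy example, not code from uv or ruff): if all you would use from a hypothetical encoding crate is lowercase hex output, a hand-rolled version is only a few lines.

// Hand-rolled hex encoder instead of a full encoding dependency.
// Only worth it when this really is all you need from that crate.
fn to_hex(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        out.push_str(&format!("{:02x}", *b));
    }
    out
}

fn main() {
    // "hi" is the bytes 0x68 0x69, so this prints "6869".
    println!("{}", to_hex(b"hi"));
}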

But I feel that we often overlook something important here: many libraries contain functionality we do not need. Let's say that we utilize 75% of each crate (which I believe is unrealistically high for 170 crates). We could then do everything we need with 128 crates of the same relative size (0.75 × 170 ≈ 128). If every dependency contained only the code we actually require, each crate would be faster to compile, and maybe some crates would not be needed at all (together with their own dependencies). Compile times and memory usage would likely be much smaller if we did not include so much functionality that we never use.
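
On the library side, Rust does offer a mechanism for shipping less code to consumers who do not need it: Cargo features. The following is a hypothetical sketch (the feature name and module are invented for illustration, and the feature would have to be declared in the library's Cargo.toml); a consumer that leaves the feature disabled never compiles the gated module at all.

// Hypothetical library crate: optional functionality behind a Cargo feature.
// A consumer using `default-features = false` and not enabling "yaml"
// never pays the compile time for this module.
#[cfg(feature = "yaml")]
pub mod yaml {
    /// Stub stand-in for a real YAML parser, for illustration only.
    pub fn parse(input: &str) -> Vec<String> {
        input.lines().map(str::to_owned).collect()
    }
}

/// Functionality every consumer gets unconditionally.
pub fn byte_len(input: &str) -> usize {
    input.len()
}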

I recommend these recent blog entries from flak.tedunangst.com for a further perspective on the cost and benefit of dependencies:

another tale of go.mod bloat
sometimes the dependencies are useful

Why is it a mistake that the crate is the translation unit?

Other languages often have translation units corresponding to files, which means that multiple files can be compiled concurrently. In Rust, the crate is the translation unit. We can compile multiple crates concurrently, but parallelism inside a single crate is more limited.
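
As a toy illustration (my own sketch, not the layout of any real project): both modules below live in one crate and therefore in one translation unit. rustc can still split the crate into codegen units internally, but cargo can only schedule whole crates onto different cores, so one huge final crate easily becomes the serial tail of the build.

// Toy sketch: both modules are part of the same crate, hence the same
// translation unit. Splitting them into separate workspace crates would let
// cargo build them in parallel, at the cost of more Cargo.toml plumbing.
mod lexer {
    pub fn tokenize(source: &str) -> Vec<String> {
        source.split_whitespace().map(str::to_owned).collect()
    }
}

mod parser {
    pub fn count_fns(tokens: &[String]) -> usize {
        tokens.iter().filter(|t| t.as_str() == "fn").count()
    }
}

fn main() {
    let tokens = lexer::tokenize("fn main ( ) { }");
    println!("{} fn tokens, all compiled in one translation unit", parser::count_fns(&tokens));
}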

Why could that be a problem? Does it matter if code is distributed over one or many crates?

I believe that there is a human factor at work here. We have learned that files eventually get too large to work with. If you open a C or Rust file that spans 100,000 lines, you most likely will not like that. Generated code might be an exception, but in general, even your development environment might slow down or fail with large files. Also, during a code review, a 100,000-line file is obvious, while yet another small file in a large crate might not be noticed.

With crates, I believe that we do not have the same sensibilities. Adding a new module to a large Rust crate feels better than adding a set of new types and functions to a large file, since the former is still encapsulated by the module system. There is no way that this module can disturb other unrelated modules. What is more, using submodules, you can keep each individual folder in the crate small.

For my installations of uv and ruff, more than 50% of the total time was spent compiling the final crates (named "uv" and "ruff"). I expected that linking could be a factor here. But I also wonder whether these crates might simply be too large. If you need more functionality, I expect that the default is to add it to the main crate instead of immediately spinning off a new helper crate. With files, that default might be different.

Summary

I am grateful that installing uv and ruff is so simple, and I really like using these tools. But I am unhappy with Rust compile times, especially when I am forced to endure them in situations where I am not willingly developing Rust. I question whether the same tools would not be even better if they were written in Python, maybe with a small sprinkle of (low-dependency) Rust or C code for the hot paths.

I believe that as a Rust developer it is really important to be mindful of the dependencies used by your own project. Consider whether each dependency you are using really carries its own weight. As a compromise, maybe it would be best to copy a dependency into your own project and strip away all the code you do not actually need? This has its own downsides, but it should not be discarded thoughtlessly.

I feel that the Rust way of using crates as the translation unit might be a regression compared to other languages that use files as translation units. I have not looked into it deeply, so take this with a pinch of salt.

I hope that this article helps a tiny bit to improve our resource usage (both computational resources and physical resources like hardware) by getting people to reflect on what they achieve and how much it actually costs.