
ccache is used together with distcc at my current workplace. I started digging into how these two work, since I thought there was still room for improvement in our build times, which vary between 10 minutes and an hour. It's a huge code base: easily more than a million lines and around 18k files. But I had to stop, as there were way too many features to develop and bugs to fix. Also, management doesn't see that kind of work as useful, so there's no point fighting those battles.


My codebase is significantly larger than yours (mine's a mix of mostly C++ and some C), perhaps 10–12 million lines. Clean builds are ~10 minutes; clean builds with ccache are ~2 minutes; incremental builds are milliseconds.

I know this probably won't help with your current project, but you should think of your compiler as an exotic virtual machine: your code is the input program, and the output executable is its output. Just like with a "real" CPU, there are ways to write a program that are fast, and ways that are slow.

To continue the analogy: if you have to sort a list, use `qsort()`, not bubble sort.

So, for C/C++ we can order the "cost" of various language features, from most expensive to least expensive:

    1. Deeply nested header-only (templated/inline) "libraries";
    2. Function overloading (especially with templates);
    3. Classes;
    4. Functions & type definitions; and,
    5. Macros & data.
That means if you were to look at my code base, you'd see lots and lots of "table driven" code, where I've encoded huge swathes of business logic as structured arrays of integers, and even more as macros-that-make-such-tables. This code compiles at ~100 kloc/s.

We don't use function overloading: in one place, removing it reduced compile times from 70 hours to 20 seconds. Function overloading requires the compiler to walk a list of candidate functions, perform ADL, and then decide which is best. Functions that are "just C-like" require a single hash lookup. The difference is about a factor of 10,000 in speed. You can do "pretend" function overloading by using a template plus a switch statement, and letting template instantiation sort things out for you.

The last thing is that we pretty much never allow "project" header files to include each other. More importantly, templated types must be instantiated once, in one C++ file, and then `extern`ed. This gives all the benefit of a template (write once, reuse), with none of the holy-crap-we're-parsing-this-again issues.


I love your comment and it's 100% spot-on. `extern`ing template instantiations is the magic sauce for making anything fast.

The only downside is that it adds a ton of boilerplate and a lot of maintenance overhead. You need separate compilation units for everything, and then you need a sub-struct to use the pimpl approach. Fast pimpl (placement-new into reserved space in the parent struct itself) gets rid of the heap allocation, but you still have a pointer indirection, and you prevent the compiler from stripping out unused code across translation units (that's where LTO comes in these days).

Really, the problem is just that it’s a PITA to write compared to sticking everything in the header file.

(It’s ironic that Rust meets the first two rules by design but is still much slower than C++ to compile, though that does imply what’s already known: there’s a lot of room for improvement.)


You're probably aware, but for others: in ccache this is called "cache sloppiness", which is my favourite term.

You can set this via config; by default, ccache is paranoid about being correct, but you can relax it. You can also set a base directory, which is great for me: I'm the only user, but I compile things in, say, `/home/josh/dev/foo` and `/home/josh/dev/bar`, and the cache is shared between them. (See https://ccache.dev/manual/latest.html for all the wonderful knobs you can turn and tweak.)
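A `ccache.conf` along those lines might look like this (the path and the particular sloppiness flags are just examples; check the manual before relaxing correctness checks):

```
# ~/.config/ccache/ccache.conf

# Rewrite absolute paths under base_dir to relative paths, so builds in
# /home/josh/dev/foo and /home/josh/dev/bar can share cache hits.
base_dir = /home/josh/dev

# Relax selected correctness checks ("sloppiness").
sloppiness = include_file_mtime,time_macros,pch_defines
```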

Fantastic tool, and the zstd compression is great as well.

I played with distcc (as I have a "homelab" with a couple of higher-end consumer desktops), but found it not worth my time, as compiling locally was faster. I'm sure with much bigger code bases (like yours) it would be great. The reason I used it is that Arch Linux makes it super easy to use with makepkg (their build tool script that helps build packages).


The best use of distcc "at home" is when you have one or more "big iron" machines (desktop, server, whatever) and a few tiny machines that work just fine but don't have much processing power.

For example, with some work, you can set up distcc to cross-compile on your massive amd64 box for your Raspberry Pi.
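A sketch of such a setup (hostnames, job slot counts, and the toolchain name are examples, not a tested configuration):

```
# ~/.distcc/hosts -- the big amd64 box takes most jobs, with lzo
# compression over the wire; a few jobs stay local for preprocessing.
bigbox/16,lzo localhost/4

# Then point the build at the cross toolchain, e.g.:
#   CC="distcc arm-linux-gnueabihf-gcc" make -j20
```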


For builds that large, I personally start evaluating Bazel, which has distributed builds and a shared cache built in. But in any large C or C++ code base I've worked on, I've always just dug into reducing build times, regardless of what management says. That said, the switch to Bazel can be costly in effort, and it may be difficult to get team buy-in.



