
Having written, benchmarked, and maintained C and C++ compilers for decades, I know why the compiles are slow:

1. phases of translation

2. constant rescanning and reparsing of .h files (see the sketch after this list)

3. cannot parse without doing semantic analysis

4. the preprocessor has its own tokens - so you gotta tokenize the .h file, do the preprocessing, convert it back to text, then tokenize the text again with the C/C++ compiler. This is madness. (Although with the C compiler I did manage to merge the preprocessor lexer with the compiler lexer, which made it the speed champ.)
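
A quick sketch of point 2, with hypothetical file names: every translation unit that includes a header re-reads and re-parses its text from scratch.

    /* common.h -- nothing special, just a declaration */
    int add(int a, int b);

    /* a.c -- the compiler lexes and parses common.h here... */
    #include "common.h"
    int use_a(void) { return add(1, 2); }

    /* b.c -- ...and lexes and parses it again here, from scratch */
    #include "common.h"
    int use_b(void) { return add(3, 4); }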

This experience fed into D which:

1. uses modules instead of .h files. No matter how many times a module is imported, it is lexed/parsed/semanticed exactly once.

2. module semantics are independent of who/what imports them

3. no phases of translation

4. lexing and parsing is independent of semantic analysis



Having done some casual benchmarking recently, I found that GCC is about 15 times slower when optimizing than when not. In both cases the compiler is scanning the same header files, so that work fits entirely within the unoptimized compile time, i.e. within 1/15th of the optimized compilation time.

It used to be common wisdom that the character-level processing of code took the most time, much like the old advice that floating point is slow, so always use integers when possible.

Also note that the ccache tool greatly speeds up C and C++ builds. Yet the input to ccache is the preprocessed translation unit! When you're using ccache, none of the preprocessing is skipped. ccache hashes the preprocessed translation unit (plus the compiler command-line options, the path of the compiler executable, and such) in an intelligent way and then checks its cache. If there is a hit, it pulls the .o out of its cache; otherwise it invokes the compiler on the preprocessed translation unit.
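
A rough sketch of the idea, not ccache's actual code (the names and the key computation below are simplifications):

    #include <cstddef>
    #include <functional>
    #include <optional>
    #include <string>
    #include <unordered_map>

    using ObjectFile = std::string;  // stand-in for the bytes of a cached .o

    std::unordered_map<std::size_t, ObjectFile> object_cache;

    // The real tool hashes more inputs (environment variables, compiler mtime, etc.).
    std::size_t cache_key(const std::string& preprocessed_tu,
                          const std::string& compiler_path,
                          const std::string& flags) {
        return std::hash<std::string>{}(preprocessed_tu + '\0' + compiler_path + '\0' + flags);
    }

    // Hit: reuse the cached .o. Miss: the caller runs the real compiler and stores the result.
    std::optional<ObjectFile> lookup(const std::string& preprocessed_tu,
                                     const std::string& compiler_path,
                                     const std::string& flags) {
        auto it = object_cache.find(cache_key(preprocessed_tu, compiler_path, flags));
        if (it == object_cache.end()) return std::nullopt;
        return it->second;
    }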

If most of the time were spent in preprocessing, a much more modest speedup would be observed with ccache.


Generally when benchmarking compile speeds, the unoptimized build is used, as that is the edit-compile-debug loop. It's always been true that a good optimizer will dominate the build times.

Back in the Bronze Age (the 1990s) I endeavored to speed up compilation in the manner you describe ccache as doing. After the .h files were taken care of, the compiler would write its state out to disk. (It could also do this for individual .h files.) Then, instead of processing all the .h files again, it would just memory-map in the precompiled .h file.

And yes, it resulted in a dramatic improvement in compile times, as you describe.

The downside was one had to be extremely careful about compiling the .h files the same way each time. One difference could affect the path through the .h files, and invalidate the precompiled version.

It took quite a lot of careful work to make that reliable, and I expect ccache is also a complex piece of work.

What I learned from that is that it's easier to just fix the language so none of that is necessary. C/C++ can be fixed this way; the proof is ImportC, a C compiler that can use imports instead of .h files and can compile multiple .c files in one invocation, merging them into a single .o file.


There is a reason why

> unoptimized build is used, as that is the edit-compile-debug loop

is no longer true.

Modern C++ has a lot of metaprogramming abstractions, and they are zero-cost only in optimized builds.
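
A hypothetical micro-example of what that means: with optimization both functions typically compile down to the same tight loop, but at -O0 the abstracted version keeps its iterator and function-call overhead.

    #include <cstddef>
    #include <numeric>
    #include <vector>

    // Plain loop: carries little abstraction overhead even in an unoptimized build.
    int sum_raw(const int* p, std::size_t n) {
        int s = 0;
        for (std::size_t i = 0; i < n; ++i) s += p[i];
        return s;
    }

    // The iterators and std::accumulate only inline away in optimized builds.
    int sum_abstracted(const std::vector<int>& v) {
        return std::accumulate(v.begin(), v.end(), 0);
    }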

In my years of gamedev work I have not come across a sizeable project that used unoptimized builds, even for debug purposes. Unoptimized builds only worked for unit tests or small tools.


I think at that point the real solution is to seriously consider all of the language constructs you use and their compile times as well. It's not a given that using more of C++ is always better; real, sustainable improvements in compile times can be had by moving more and more towards C in many ways while keeping some of the safety C++ provides.

(I'm sure you've been there, though; gamedev is one of the areas I would expect people to be more sensible about their C++ feature usage in.)


If you are implying that we can go back to force-inlining everything and only using small wrappers around memcpy, then I have to say that ship sailed years ago. I do not know anyone who wants to go back for more than the brief moments when a changed header causes a cascade of rebuilds.

Now, the elephant in the room of build times that no one wants to talk about is the 'optimized' build with PGO+LTO. I think none of the projects I worked on that got used to it ever had a local pipeline to produce it xD. But if you ask people whether they want to ship a build without it, the answer is a clear 'no'.

I will totally understand if the authors of the linked article also do not like to talk about it. What I am trying to do here is clear up the confusion about its importance. Pretending that IWYU is more than polishing the last 5% of build times helps almost no one. YMMV ofc.


There are plenty of constructs in C++ that are safer than C and still don't impact compile times that much, and some that, while safer and better in some regards, are murder for compile times. I'm saying there is a tradeoff to be made, and faster iteration speed is oftentimes more valuable for end-result quality than (often merely perceived) safety.


"Just like the old floating-point is slow; always use integer when possible."

I know this was only an aside - but it took me the longest time to properly internalize that floats were fast these days. I'm still getting used to the idea that double precision isn't a preposterous extravagance. :D


It depends on what you're doing. Doubles are still slower than floats because they place twice the demands on memory bandwidth and cache size. So if you're doing "calculator"-style work, there's not much difference, but if you're processing large arrays of data, it's still something you should think about.
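
A back-of-the-envelope illustration: for the same element count, doubles need twice the memory traffic and cache space.

    #include <cstddef>
    #include <cstdio>

    int main() {
        const std::size_t n = 1000000;  // one million samples
        std::printf("float  array: %zu bytes\n", n * sizeof(float));   /* ~4 MB */
        std::printf("double array: %zu bytes\n", n * sizeof(double));  /* ~8 MB */
    }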


Yeah, exactly that. I was getting artifacts processing long vectors, and it was 32 bit float precision that was the culprit.


On certain platforms this rule still holds. Recently I have been working with the ESP32. Although some versions have an FPU, floating-point math is still slower than integer math. Also, the FPU can only process 32-bit floats; 64-bit doubles are emulated in software and are terribly slow.
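
One illustrative consequence (a small made-up example): an unsuffixed floating-point literal is a double, so it can silently drag a whole expression into emulated, slow double-precision math on such a target.

    float scale(float x) {
        return x * 0.5;    /* 0.5 is a double: x is promoted, math runs in software double */
    }

    float scale_fast(float x) {
        return x * 0.5f;   /* stays in single precision, can use the hardware FPU */
    }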


Precompiled headers eliminate some of these issues, if they are used correctly. At one of my former companies, every available build optimization had been bolted onto the code base over the years by the DevOps guys without any thought about how the pieces would work together, and builds just got slower and slower. In the end, the initial 5-minute full build time had grown to about 35 minutes; about 10 minutes of that could be attributed to the various refactors and the extremely heavy use of templates, but we couldn't find a cause for the remaining 20-minute increase. It took about 15 minutes to test a single change.
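
For reference, a rough sketch of the GCC-style precompiled-header workflow (file names here are illustrative): the big, rarely-changing header is compiled once to a binary .gch, and later includes of it load that instead of reparsing the text, with the same "compile it the same way every time" caveat mentioned above.

    /* all.h -- the expensive, rarely-changing includes gathered in one place */
    #include <map>
    #include <string>
    #include <vector>

    /*
     * Compile the header itself once:
     *     g++ -x c++-header all.h -o all.h.gch
     * Subsequent compiles that #include "all.h" with compatible options pick up
     * all.h.gch automatically instead of re-lexing and re-parsing the header text.
     */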


This is the same for C and C++, yet C compile times are dramatically shorter than C++'s.

Also, doing semantic analysis during parsing should save time compared to doing additional tree walking later (and it does, in my experience).


Isn't `#pragma once` helpful for avoiding reparsing headers?


`#pragma once` is to prevent a header from being reparsed repeatedly for the same translation unit when ten different headers all include a common one transitively.

It replaces the older pattern of guarding the header's contents behind a macro that is defined only within that same block, to avoid double parsing:

    /* if it isn't unique, you're going to have a bad time */
    #ifndef SOME_HOPEFULLY_UNIQUE_DEFINITION
    #define SOME_HOPEFULLY_UNIQUE_DEFINITION
    
    ...code...
    
    #endif /* SOME_HOPEFULLY_UNIQUE_DEFINITION */
This, however, doesn't stop those same headers from needing to be reread and reparsed, and reread and reparsed, for every single .cpp file in the project.
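
For comparison, the `#pragma once` form of the same header (non-standard but widely supported) needs no unique macro name:

    #pragma once
    
    ...code...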



