
Having written, benchmarked, and maintained C and C++ compilers for decades, I know why the compiles are slow:

1. phases of translation

2. constant rescanning and reparsing of .h files (see the sketch after this list)

3. cannot parse without doing semantic analysis

4. the preprocessor has its own tokens - so you gotta tokenize the .h file, do the preprocessing, convert it back to text, then tokenize the text again with the C/C++ compiler. This is madness. (Although with the C compiler I did manage to merge the preprocessor lexer with the compiler lexer, which made it the speed champ.)
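
A quick sketch of point 2, with hypothetical file names: every translation unit that includes a header re-reads and re-parses its text from scratch.

    /* common.h -- nothing special, just a declaration */
    int add(int a, int b);

    /* a.c -- the compiler lexes and parses common.h here... */
    #include "common.h"
    int use_a(void) { return add(1, 2); }

    /* b.c -- ...and lexes and parses it again here, from scratch */
    #include "common.h"
    int use_b(void) { return add(3, 4); }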

This experience fed into D which:

1. uses modules instead of .h files. No matter how many times a module is imported, it is lexed/parsed/semanticed exactly once.

2. module semantics are independent of who/what imports them

3. no phases of translation

4. lexing and parsing is independent of semantic analysis



Having done some casual benchmarking recently, I found that GCC is about 15 times slower when optimizing than when not. In both cases the compiler is scanning the same header files, so that work fits entirely within the unoptimized compile time, i.e. within 1/15th of the optimized compilation time.

It used to be common wisdom that the character-level processing of code took the most time, much like the old advice that floating point is slow, so always use integers when possible.

Also note that the ccache tool greatly speeds up C and C++ builds. Yet the input to ccache is the preprocessed translation unit! When you're using ccache, none of the preprocessing is skipped. ccache hashes the preprocessed translation unit (plus the compiler command-line options, the path of the compiler executable, and such) in an intelligent way and then checks its cache. If there is a hit, it pulls the .o out of its cache; otherwise it invokes the compiler on the preprocessed translation unit.
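
A rough sketch of the idea, not ccache's actual code (the names and the key computation below are simplifications):

    #include <cstddef>
    #include <functional>
    #include <optional>
    #include <string>
    #include <unordered_map>

    using ObjectFile = std::string;  // stand-in for the bytes of a cached .o

    std::unordered_map<std::size_t, ObjectFile> object_cache;

    // The real tool hashes more inputs (environment variables, compiler mtime, etc.).
    std::size_t cache_key(const std::string& preprocessed_tu,
                          const std::string& compiler_path,
                          const std::string& flags) {
        return std::hash<std::string>{}(preprocessed_tu + '\0' + compiler_path + '\0' + flags);
    }

    // Hit: reuse the cached .o. Miss: the caller runs the real compiler and stores the result.
    std::optional<ObjectFile> lookup(const std::string& preprocessed_tu,
                                     const std::string& compiler_path,
                                     const std::string& flags) {
        auto it = object_cache.find(cache_key(preprocessed_tu, compiler_path, flags));
        if (it == object_cache.end()) return std::nullopt;
        return it->second;
    }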

If most of the time were spent in preprocessing, a much more modest speedup would be observed with ccache.


Generally when benchmarking compile speeds, the unoptimized build is used, as that is the edit-compile-debug loop. It's always been true that a good optimizer will dominate the build times.

Back in the Bronze Age (the 1990s) I endeavored to speed up compilation in the manner you describe ccache as doing. After the .h files were taken care of, the compiler would write its state out to disk. (It could also do this for individual .h files.) Then, instead of processing all the .h files again, it would just memory-map in the precompiled .h file.

And yes, it resulted in a dramatic improvement in compile times, as you describe.

The downside was one had to be extremely careful about compiling the .h files the same way each time. One difference could affect the path through the .h files, and invalidate the precompiled version.

It took quite a lot of careful work to make that reliable, and I expect ccache is also a complex piece of work.

What I learned from that is that it's easier to just fix the language so none of that is necessary. C/C++ can be fixed this way; the proof is ImportC, a C compiler that can use imports instead of .h files and can compile multiple .c files in one invocation, merging them into a single .o file.


There is a reason why

> unoptimized build is used, as that is the edit-compile-debug loop

is no longer true.

Modern C++ has a lot of metaprogramming abstractions, and they are zero-cost only in optimized builds.
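
A hypothetical micro-example of what that means: with optimization both functions typically compile down to the same tight loop, but at -O0 the abstracted version keeps its iterator and function-call overhead.

    #include <cstddef>
    #include <numeric>
    #include <vector>

    // Plain loop: carries little abstraction overhead even in an unoptimized build.
    int sum_raw(const int* p, std::size_t n) {
        int s = 0;
        for (std::size_t i = 0; i < n; ++i) s += p[i];
        return s;
    }

    // The iterators and std::accumulate only inline away in optimized builds.
    int sum_abstracted(const std::vector<int>& v) {
        return std::accumulate(v.begin(), v.end(), 0);
    }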

In my years of gamedev work I have not come across a sizeable project that used unoptimized builds, even for debug purposes. Unoptimized builds only worked for unit tests or small tools.


I think at that point the real solution is to seriously consider all of the language constructs you use and their compile times as well. It's not a given that using more of C++ is always better; real, sustainable improvements in compile times can be had by moving more and more towards C in many ways while keeping some of the safety C++ provides.

(I'm sure you've been there, though; gamedev is one of the areas I would expect people to be more sensible about their C++ feature usage in.)


If you are implying that we can go back to force-inlining everything and only using small wrappers around memcpy, then I have to say that ship sailed years ago. I do not know anyone who wants to go back for more than the brief moments when a changed header causes a cascade of rebuilds.

Now, the elephant in the room of build times that no one wants to talk about is the 'optimized' build with PGO+LTO. I think none of the projects I worked on that got used to it ever had a local pipeline to produce it xD. But if you ask people whether they want to ship a build without it, the answer is a clear 'no'.

I will totally understand if the authors of the linked article also do not like to talk about it. What I am trying to do here is clear up the confusion about its importance. Pretending that IWYU is more than polishing the last 5% of build times helps almost no one. YMMV ofc.


There are plenty of constructs in C++ that are safer than C and still don't impact compile times that much, and some that, while safer and better in some regards, are murder for compile times. I'm saying there is a tradeoff to be made, and faster iteration speed is oftentimes more valuable for end-result quality than (often merely perceived) safety.


"Just like the old floating-point is slow; always use integer when possible."

I know this was only an aside - but it took me the longest time to properly internalize that floats were fast these days. I'm still getting used to the idea that double precision isn't a preposterous extravagance. :D


It depends on what you're doing. Doubles are still slower than floats because they place twice the demands on memory bandwidth and cache size. So if you're doing "calculator"-style work, there's not much difference, but if you're processing large arrays of data, it's still something you should think about.
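
A back-of-the-envelope illustration: for the same element count, doubles need twice the memory traffic and cache space.

    #include <cstddef>
    #include <cstdio>

    int main() {
        const std::size_t n = 1000000;  // one million samples
        std::printf("float  array: %zu bytes\n", n * sizeof(float));   /* ~4 MB */
        std::printf("double array: %zu bytes\n", n * sizeof(double));  /* ~8 MB */
    }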


Yeah, exactly that. I was getting artifacts processing long vectors, and it was 32 bit float precision that was the culprit.


On certain platforms this rule still holds. Recently I have been working with the ESP32. Although some versions have an FPU, floating-point math is still slower than integer math. Also, the FPU can only process 32-bit floats; 64-bit doubles are emulated in software and are terribly slow.
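
One illustrative consequence (a small made-up example): an unsuffixed floating-point literal is a double, so it can silently drag a whole expression into emulated, slow double-precision math on such a target.

    float scale(float x) {
        return x * 0.5;    /* 0.5 is a double: x is promoted, math runs in software double */
    }

    float scale_fast(float x) {
        return x * 0.5f;   /* stays in single precision, can use the hardware FPU */
    }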


Precompiled headers eliminate some of these issues, if they are used correctly. At one of my former companies, every available build optimization had been bolted onto the code base over the years by the DevOps guys without any thought about how the pieces would work together, and builds just got slower and slower. In the end, the initial 5-minute full build time had grown to about 35 minutes; about 10 minutes of that could be attributed to the various refactors and the extremely heavy use of templates, but we couldn't find a cause for the remaining 20-minute increase. It took about 15 minutes to test a single change.
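
For reference, a rough sketch of the GCC-style precompiled-header workflow (file names here are illustrative): the big, rarely-changing header is compiled once to a binary .gch, and later includes of it load that instead of reparsing the text, with the same "compile it the same way every time" caveat mentioned above.

    /* all.h -- the expensive, rarely-changing includes gathered in one place */
    #include <map>
    #include <string>
    #include <vector>

    /*
     * Compile the header itself once:
     *     g++ -x c++-header all.h -o all.h.gch
     * Subsequent compiles that #include "all.h" with compatible options pick up
     * all.h.gch automatically instead of re-lexing and re-parsing the header text.
     */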


This is the same for C and C++, yet C compile times are dramatically shorter than C++'s.

Also, doing semantic analysis during parsing should save time compared to doing additional tree walking later (and it does, in my experience).


Isn't `#pragma once` helpful for avoiding reparsing headers?


`#pragma once` is to prevent a header from being reparsed repeatedly for the same translation unit when ten different headers all include a common one transitively.

It replaces the older pattern of guarding the header's contents behind a macro that is defined only within that same block, to avoid double parsing:

    /* if it isn't unique, you're going to have a bad time */
    #ifndef SOME_HOPEFULLY_UNIQUE_DEFINITION
    #define SOME_HOPEFULLY_UNIQUE_DEFINITION
    
    ...code...
    
    #endif /* SOME_HOPEFULLY_UNIQUE_DEFINITION */
This, however, doesn't stop those same headers from needing to be reread and reparsed, and reread and reparsed, for every single .cpp file in the project.
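
For comparison, the `#pragma once` form of the same header (non-standard but widely supported) needs no unique macro name:

    #pragma once
    
    ...code...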



