> But that's the thing: dereferencing an invalid pointer is undefined behaviour, which means the compiler is allowed to assume it never happens
It would be more helpful for the compiler to assume that it does not know what will happen. This is how C actually worked for many years.
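To make that concrete, here is the classic shape of the problem (a minimal sketch; the exact code generated depends on the compiler, version, and flags):

```c
#include <stddef.h>

/* Hypothetical example: because *p is UB when p is NULL, the optimizer
   may assume p != NULL after the load... */
int first_byte(const char *p) {
    int c = *p;
    if (p == NULL)    /* ...and delete this check as provably dead code */
        return -1;
    return c;
}
```

Many programmers expect the check to survive; an optimizing compiler applying the "UB never happens" assumption is entitled to remove it.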
> a C program executing undefined behaviour is _invalid C_.
That was not formerly the case, and it is not always helpful to redefine C in this way. Sometimes you really are not trying to write portable code, and you really do want the behavior you know that the target machine will give you, even if the C spec doesn't require it.
> It would be more helpful for the compiler to assume that it does not know what will happen. This is how C actually worked for many years.
If we don't know what will happen, that is Undefined Behaviour.
The contradiction you have within yourself is that you know what you want to happen, but that's not what the specification says. If you want specific behaviour, you need to specify what it is - not mumble and wave vaguely at "behavior you know that the target machine will give you" when you have no promise of any such thing. That guarantee would come at a cost, and of course you don't want to pay that cost, but that means you can't have what it buys.
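To be fair, C does give you a way to specify wrapping behaviour when you actually want it. A minimal sketch (note that converting an out-of-range value back to int is itself implementation-defined, though it wraps on every mainstream implementation):

```c
/* Ask for two's-complement wrapping explicitly rather than relying on UB. */
int wrapping_add(int a, int b) {
    /* Unsigned arithmetic wraps modulo 2^N by definition. */
    return (int)((unsigned int)a + (unsigned int)b);
}
```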
That is certainly one perspective one can have. The point here is that the language and its usage precede the specification, and that a pedantic, narrow-minded adherence to a particular interpretation of a document -- a document which was itself a post-hoc rationalization of existing practice -- has made the language less useful for certain applications.
The C standard could easily make dereferencing a null pointer implementation-defined behavior.
And even more critically: signed integer overflow should be implementation-defined, with each implementation doing something sane (as opposed to assuming it doesn't happen). This would have saved us many security vulnerabilities and unnecessary program crashes.
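A sketch of the kind of check that the current rule breaks (observed behavior varies by compiler and optimization level):

```c
/* A well-intended post-hoc overflow check. Because signed overflow is UB,
   optimizers such as GCC and Clang at -O2 may fold the comparison to
   false and delete the "overflow detected" path entirely. */
int increment_checked(int x) {
    if (x + 1 < x)   /* only true if x + 1 overflowed, which is UB */
        return 0;
    return x + 1;
}
```

With wrapping semantics this check would work as written; under the UB interpretation it quietly disappears.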
> If we don't know what will happen, that is Undefined Behaviour.
Implementation-defined behaviour is a thing; "not knowing what will happen" is not an accurate description of undefined behaviour. What the compiler does is assume that undefined behaviour doesn't happen. When it does happen, that assumption yields a contradiction, and logically every sentence is a consequence of a contradiction (see e.g. the classic proof that "Bertrand Russell is the pope"). That is what produces all those infamous bugs: just as every sentence follows from a contradiction, every program state can result from UB. This is untenable.
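A well-known demonstration of that "anything follows" principle, observed with some versions of Clang at -O2 (a sketch; other compilers and flags behave differently):

```c
#include <stdio.h>

static void (*handler)(void);     /* file scope, so initially NULL */

static void surprise(void) {
    puts("never registered, yet called");
}

void register_handler(void) {     /* never called anywhere in this program */
    handler = surprise;
}

int main(void) {
    handler();                    /* UB: call through a null pointer */
}
```

The only executions free of UB are those in which register_handler ran first, so the optimizer is entitled to assume it did, and to turn the indirect call into a direct call to surprise.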
A compiler that makes incorrect assumptions is a bad compiler.
In fact, I remember reading in the rationale of the original spec that the C standard was expressly designed to be a minimal spec, and that merely being compliant with it was insufficient for the resulting compiler to be fit for purpose.
And of course the original spec did describe a range of acceptable behaviors, and that language is, in fact, still in the standard; it was just made non-binding. Pretending it is not there seems disingenuous at best.
There are many optimizations a compiler can perform without relying on the optimization level to determine how to pervert your program that day. If different optimization levels produce different observable results, that is a bad thing, something to be avoided, not encouraged.
If it is really necessary to generate random code when some anomalous situation is encountered, that should be a special opt-in option enabling dangerous, non-deterministic, if-you-made-a-mistake-we-will-delete-parts-of-your-program type behavior. I wouldn't consider that optimization, though; it is more like disabling all of your compiler's safety features.
Loop unrolling for loops that have a static or range-bounded number of iterations is a good example. Others include constant-expression evaluation, dead-code elimination, common-subexpression elimination, and static function inlining.
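For instance, unrolling a loop with a known trip count needs no UB-based reasoning at all. This sketch shows the transformation conceptually, not literal compiler output:

```c
/* Before: the trip count is a compile-time constant. */
int sum4(const int *a) {
    int s = 0;
    for (int i = 0; i < 4; i++)
        s += a[i];
    return s;
}

/* After (conceptually): the fully unrolled form, with identical
   observable behavior on every input. */
int sum4_unrolled(const int *a) {
    return a[0] + a[1] + a[2] + a[3];
}
```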
Looking at the GCC docs, it seems it isn't possible to have zero optimizations at any point, even at the lowest optimization levels. To quote the docs: "Most optimizations are completely disabled at -O0." So it seems you can't assume you can force correct behavior just by turning off optimization passes.
Part of the difficulty here is working out which transformations are specifically "optimisations". Some compiler passes are required for correct (or indeed any) code generation -- for example, in the compiler I was employed to work with, the instruction selector was a key pass for generating optimal code, but we only had one: if you "turned off optimisations" then we'd run the same instruction selector, merely on less-optimal input. So we'd disable all the passes that weren't required for correctness or completeness, but we'd not write deliberately non-optimal equivalents for the passes that were required.
Beyond that, you've got a contradiction in your statement -- you can't "force correct behaviour" from a compiler at any point. The compiler always tries to generate correct behaviour according to what the code actually says. If you lie to the compiler, it'll try its best to believe you.
C compilers are intended to accept every correct C program. But they can only do this by also accepting a wide range of incorrect C programs -- if we can prove that the program can't be correct then we can reject it, otherwise we have to trust the programmer. Contrast this with Rust, where the intent is to reject every incorrect Rust program. Again, not every program can be clearly judged correct or incorrect, but in this case we'll err on the side of not trusting the programmer. Of course, "unsafe" in Rust and various warnings that can be enabled in C mean you can tell the Rust compiler to trust the programmer and tell the C compiler to disallow a subset of possibly-correct but unprovable programs, but the general intent still stands.
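As a small illustration of that asymmetry (a sketch; warning behavior varies by compiler and flags):

```c
/* A C compiler accepts this, perhaps with a warning such as GCC's
   -Wreturn-local-addr, even though the returned pointer is dangling.
   The analogous Rust function does not compile: the borrow checker
   rejects returning a reference to a local variable. */
int *dangling(void) {
    int x = 42;
    return &x;
}
```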
So if you want to write in a language that's like C but with "correct behaviour" then ultimately you'll have to procure yourself a compiler to do that. Because the authors of the various C compilers try very hard to have correct behaviour, and just because you want to be able to get away with lying to their compilers doesn't magically make them wrong.