> But that's the thing: dereferencing an invalid pointer is undefined behaviour, which means the compiler is allowed to assume it never happens
It would be more helpful for the compiler to assume that it does not know what will happen. This is how C actually worked for many years.
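To make that concrete, here is the classic shape of the problem (a minimal sketch; the exact code generated depends on the compiler, version, and flags):

```c
#include <stddef.h>

/* Hypothetical example: because *p is UB when p is NULL, the optimizer
   may assume p != NULL after the load... */
int first_byte(const char *p) {
    int c = *p;
    if (p == NULL)    /* ...and delete this check as provably dead code */
        return -1;
    return c;
}
```

Many programmers expect the check to survive; an optimizing compiler applying the "UB never happens" assumption is entitled to remove it.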
> a C program executing undefined behaviour is _invalid C_.
That was not formerly the case, and it is not always helpful to redefine C in this way. Sometimes you really are not trying to write portable code, and you really do want the behavior you know that the target machine will give you, even if the C spec doesn't require it.
> It would be more helpful for the compiler to assume that it does not know what will happen. This is how C actually worked for many years.
If we don't know what will happen, that is Undefined Behaviour.
The contradiction you have within yourself is that you know what you want to happen, but that's not what the specification says. If you want specific behaviour, you need to specify what it is - not mumble and wave vaguely at "behavior you know that the target machine will give you" when you have no promise of any such thing. That guarantee would come at a cost, and of course you don't want to pay that cost, but that means you can't have what it buys.
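To be fair, C does give you a way to specify wrapping behaviour when you actually want it. A minimal sketch (note that converting an out-of-range value back to int is itself implementation-defined, though it wraps on every mainstream implementation):

```c
/* Ask for two's-complement wrapping explicitly rather than relying on UB. */
int wrapping_add(int a, int b) {
    /* Unsigned arithmetic wraps modulo 2^N by definition. */
    return (int)((unsigned int)a + (unsigned int)b);
}
```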
That is certainly one perspective one can have. The point here is that the language and its usage precede the specification, and that a pedantic, narrow-minded adherence to a particular interpretation of a document -- a document which was itself a post-hoc rationalization of existing practice -- has made the language less useful for certain applications.
The C standard could easily make dereferencing a null pointer implementation-defined behavior.
And even more critically: signed integer overflow should be implementation-defined, with each implementation doing something sane (as opposed to assuming it doesn't happen). This would have saved us many security vulnerabilities and unnecessary program crashes.
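A sketch of the kind of check that the current rule breaks (observed behavior varies by compiler and optimization level):

```c
/* A well-intended post-hoc overflow check. Because signed overflow is UB,
   optimizers such as GCC and Clang at -O2 may fold the comparison to
   false and delete the "overflow detected" path entirely. */
int increment_checked(int x) {
    if (x + 1 < x)   /* only true if x + 1 overflowed, which is UB */
        return 0;
    return x + 1;
}
```

With wrapping semantics this check would work as written; under the UB interpretation it quietly disappears.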
> If we don't know what will happen, that is Undefined Behaviour.
Implementation-defined behaviour is a thing; "not knowing what will happen" is not an accurate description of undefined behaviour. What the compiler does is assume that undefined behaviour doesn't happen. When it does happen, that assumption yields a contradiction, and logically every sentence is a consequence of a contradiction (see e.g. the classic proof that "Bertrand Russell is the pope"). That is what produces all those infamous bugs: just as every sentence follows from a contradiction, every program state can result from UB. This is untenable.
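A well-known demonstration of that "anything follows" principle, observed with some versions of Clang at -O2 (a sketch; other compilers and flags behave differently):

```c
#include <stdio.h>

static void (*handler)(void);     /* file scope, so initially NULL */

static void surprise(void) {
    puts("never registered, yet called");
}

void register_handler(void) {     /* never called anywhere in this program */
    handler = surprise;
}

int main(void) {
    handler();                    /* UB: call through a null pointer */
}
```

The only executions free of UB are those in which register_handler ran first, so the optimizer is entitled to assume it did, and to turn the indirect call into a direct call to surprise.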
A compiler that makes incorrect assumptions is a bad compiler.
In fact, I remember reading in the rationale of the original spec that the C standard was expressly designed to be a minimal spec, and that merely being compliant with it was insufficient for the resulting compiler to be fit for purpose.
And of course the original spec did describe a range of acceptable behaviors, and that language is, in fact, still in the standard; it was just made non-binding. Pretending it is not there seems disingenuous at best.
There are many optimizations a compiler can perform without relying on the optimization level to determine how to pervert your program that day. If different optimization levels produce different observable results, that is a bad thing, something to be avoided, not encouraged.
If it is really necessary to generate random code when some anomalous situation is encountered, that should be a special opt-in option enabling dangerous, non-deterministic, if-you-made-a-mistake-we-will-delete-parts-of-your-program type behavior. I wouldn't consider that optimization, though; it is more like disabling all of your compiler's safety features.
Loop unrolling for loops that have a static or range-bounded number of iterations is a good example. Others include constant-expression evaluation, dead-code elimination, common-subexpression elimination, and static function inlining.
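For instance, unrolling a loop with a known trip count needs no UB-based reasoning at all. This sketch shows the transformation conceptually, not literal compiler output:

```c
/* Before: the trip count is a compile-time constant. */
int sum4(const int *a) {
    int s = 0;
    for (int i = 0; i < 4; i++)
        s += a[i];
    return s;
}

/* After (conceptually): the fully unrolled form, with identical
   observable behavior on every input. */
int sum4_unrolled(const int *a) {
    return a[0] + a[1] + a[2] + a[3];
}
```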
Looking at the GCC docs, it seems it isn't possible to have zero optimizations at any point, even at the lowest optimization levels. To quote the docs: "Most optimizations are completely disabled at -O0." So it seems you can't assume you can force correct behavior just by turning off optimization passes.
Part of the difficulty here is working out which transformations are specifically "optimisations". Some compiler passes are required for correct (or indeed any) code generation -- for example, in the compiler I was employed to work with, the instruction selector was a key pass for generating optimal code, but we only had one: if you "turned off optimisations" then we'd run the same instruction selector, merely on less-optimal input. So we'd disable all the passes that weren't required for correctness or completeness, but we'd not write deliberately non-optimal equivalents for the passes that were required.
Beyond that, you've got a contradiction in your statement -- you can't "force correct behaviour" from a compiler at any point. The compiler always tries to generate correct behaviour according to what the code actually says. If you lie to the compiler, it'll try its best to believe you.
C compilers are intended to accept every correct C program. But they can only do this by also accepting a wide range of incorrect C programs -- if we can prove that the program can't be correct then we can reject it, otherwise we have to trust the programmer. Contrast this with Rust, where the intent is to reject every incorrect Rust program. Again, not every program can be clearly judged correct or incorrect, but in this case we'll err on the side of not trusting the programmer. Of course, "unsafe" in Rust and various warnings that can be enabled in C mean you can tell the Rust compiler to trust the programmer and tell the C compiler to disallow a subset of possibly-correct but unprovable programs, but the general intent still stands.
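As a small illustration of that asymmetry (a sketch; warning behavior varies by compiler and flags):

```c
/* A C compiler accepts this, perhaps with a warning such as GCC's
   -Wreturn-local-addr, even though the returned pointer is dangling.
   The analogous Rust function does not compile: the borrow checker
   rejects returning a reference to a local variable. */
int *dangling(void) {
    int x = 42;
    return &x;
}
```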
So if you want to write in a language that's like C but with "correct behaviour" then ultimately you'll have to procure yourself a compiler to do that. Because the authors of the various C compilers try very hard to have correct behaviour, and just because you want to be able to get away with lying to their compilers doesn't magically make them wrong.