First, there is reading an uninitialized variable (i.e. something which does not necessarily have a memory location); that should always be a compile error. Period.
You're out of luck. You have to solve the halting problem to statically determine, in general, whether a variable will be read before it's initialized. The reason this is not solved well is that it's impossible to solve perfectly! Java took another approach that I hate:
For example if you have
    Object a;
    for (int i = 0; i < 1; ++i) a = new Object();
    a.toString();
then Java will give you
error: variable a might not have been initialized
even though it's plain and obvious that a is initialized. Things like that make me mad; it's a non-solution.
I'm not asking for the compiler to consistently identify undefined behavior, only to have a mode where it refuses to silently make 'optimizations' when it does identify UB.
The parallel to your example would be if the for loop was "for(int i = 0; i < j;++i)". If the compiler was able to determine that there is a code path whereby j might be undefined, should it be allowed to remove the body of the loop, even in those cases where the programmer knows by other means that "j >= 1"?
My request is that it either keep the loop body, or complain about the undefined behavior, but not silently make 'optimizations' based on the fact that it has identified the potential for undefined behavior to occur.
Note that I'm just using an 'uninitialized variable' as a hypothetical example. Given a chance, I always compile with -Wall -Wextra, and in practice, GCC, Clang, and ICC (the compilers I use) do a good job of issuing warnings for the use of uninitialized variables. I like this current behavior, but would prefer a philosophical approach that makes warnings like this more rather than less common.
I agree in cases where the compiler knows that undefined behavior is taking place. A lot of the silent optimizations LLVM and GCC make are in cases where the compiler isn't really sure it has identified undefined behavior, though.
To put it in classical logic terminology, one case is modus ponens reasoning. Undefined behavior implies the compiler can do whatever it wants. The compiler finds undefined behavior. Therefore it does whatever it wants. This is the case where it'd be better for the compiler to error out than do something nutty.
But many of the optimizations are doing modus tollens reasoning. If X were true, then the program would perform undefined behavior. Conforming programs do not perform undefined behavior. Therefore NOT-X must hold in conforming programs, and this fact can be used in optimizations.
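As a rough illustration of the modus tollens case, here is a minimal C sketch (the function names are made up for this example). In the first function the dereference comes before the null check, so a compiler is permitted to reason that `p` cannot be null and may drop the check. Whether a given compiler actually does so depends on its version and optimization flags.

```c
#include <stddef.h>

/* Hypothetical example: dereference first, check second.
 * If p were NULL, the dereference would already be undefined behavior,
 * so an optimizer is allowed to assume p != NULL and delete the check. */
int read_then_check(const int *p)
{
    int value = *p;
    if (p == NULL)      /* may be removed as unreachable */
        return -1;
    return value;
}

/* Check first, dereference second: no path performs UB, so the check
 * has to be preserved. */
int check_then_read(const int *p)
{
    if (p == NULL)
        return -1;
    return *p;
}
```

In neither function has the compiler "found" undefined behavior; in the first one it only concludes that a conforming caller never passes NULL.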
> If the compiler was able to determine that there is a code path whereby j might be undefined, should it be allowed to remove the body of the loop, even in those cases where the programmer knows by other means that "j >= 1"?
No; rather, the correct logic is that the compiler must preserve the body of the loop if there exists the possibility that it can be reached by a valid code path without any undefined behavior (j is defined, and so forth). Only if the compiler can prove that no well-defined execution path can reach the body can it remove it.
(A bad idea to do without any warning, though. If undefined behavior is confirmed, it should be diagnosed.)
> only to have a mode where it refuses to silently make 'optimizations' when it does identify UB.
That's not always possible. It's not that it makes the optimization when it identifies UB. It's that it makes an optimization that is valid to make if UB doesn't occur, but if UB were to occur then that optimization could cause all kinds of unexpected problems. But the compiler can't necessarily identify those cases.
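A small C sketch of that situation, using a hypothetical function name: nothing here is flagged as UB by the compiler; it merely relies on signed overflow never happening.

```c
/* Hypothetical example. Signed overflow is undefined in C, so a compiler
 * may simplify (x * 2) / 2 to plain x. That is correct for every x where
 * x * 2 does not overflow. For larger x the simplified and unsimplified
 * code can give different answers, and the compiler cannot know at
 * compile time which values the caller will pass. */
int double_then_halve(int x)
{
    return (x * 2) / 2;
}
```

There is nothing sensible to warn about at this point: almost every arithmetic expression on signed integers would get the same warning.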
Please read the "What Every C Programmer Should Know About Undefined Behavior" series of articles on the LLVM blog; they describe the reason why they can't, in general, provide warnings or errors for these cases in which optimizations rely on the absence of undefined behavior.
The third article describes why the compiler can't, in general, warn about those cases in which it's relying on lack of UB, but you should read the first two as well.
Note that for some of those cases, Clang and GCC have recently added undefined behavior sanitizers, invoked via "-fsanitize=undefined", which can help even more than the warnings they can add. However, what they do is add extra instrumentation to the executable, and then either log a warning or crash when you hit undefined behavior. The runtime aspect helps avoid the "getting this right would involve solving the halting problem" aspect of why they can't, in general, provide appropriate warnings, but it does mean that this is generally only appropriate in test builds, and that you will only find the undefined behavior that you can trigger during testing, while there may be more hiding that only shows up in obscure circumstances.
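For instance, a minimal sketch of how that might look in practice. The file name and function name are hypothetical; "-fsanitize=undefined" is the flag mentioned above.

```c
/* Hypothetical example: build a test binary with GCC or Clang using
 *     cc -g -fsanitize=undefined example.c
 * so that signed overflow is checked at run time. */
int next_id(int current)
{
    /* Undefined when current == INT_MAX. The instrumented build can
     * report the overflow only if a test actually passes INT_MAX here. */
    return current + 1;
}
```

A test suite that never calls `next_id` with INT_MAX will see no report at all, which is the coverage limitation described above.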
If you really don't want undefined behavior, it's best to use a language, like Rust, which does not have any undefined behavior (outside of "unsafe" blocks). The problem with any kind of warnings that are tacked on after the design of the language is that you are either going to get lots of false positives, lots of false negatives, or both. With a language that is designed not to allow undefined behavior, you know that if the code compiles, it doesn't invoke UB.
Not a Rust programmer, so forgive me if this is a dumb question.
Sometimes in C one might initialize a variable by passing a pointer to it to an init function:
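    #include <stdbool.h>

    bool init( int *out );   /* defined elsewhere, possibly in another .c file */
    void do_stuff( int i );

    void f( void )
    {
        int i;
        bool success;
        success = init( &i );
        if ( success )
            do_stuff( i );
    }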
That "init" function might be located in a separate .c file, so there's no way for the compiler to know whether init actually initializes the memory whose address it is passed. So how can Rust "solve" the problem? Does Rust simply not allow taking addresses of variables? Or does it not use .o files, compile all source files at once, and actually analyze globally for uninitialized variables?
The `init` function you use would not be valid. You would instead write something like this:
    fn f() {
        if let Some(i) = init() {
            do_stuff(i);
        }
    }
In this case, the `init` function would return an `Option<i32>`. On failure it would return `None`, the pattern match would fail, and the body would be skipped. On success it would return `Some(i)`, where `i` corresponds to the variable you describe.
The Rust pattern is not only safer, but briefer than yours. It describes the code flow such that you can't remove or repeat a part and end up with inadvertently broken code, and it's memory safe. There is no way for `init` to blow up the stack (whereas in your example, a malicious or buggy init can use the address of i to smash the stack.)