*First, there is reading an uninitialized variable (i.e. something which does no...

nkurz · on June 29, 2015

I'm not asking for the compiler to consistently identify undefined behavior, only to have a mode where it refuses to silently make 'optimizations' when it does identify UB.

The parallel to your example would be if the for loop was "for(int i = 0; i < j;++i)". If the compiler was able to determine that there is a code path whereby j might be undefined, should it be allowed to remove the body of the loop, even in those cases where the programmer knows by other means that "j >= 1"?

My request is that it either keep the loop body, or complain about the undefined behavior, but not silently make 'optimizations' based on the fact that it has identified the potential for undefined behavior to occur.

Note that I'm just using an 'uninitialized variable' as a hypothetical example. Given a chance, I always compile with -Wall -Wextra, and in practice, GCC, CLang, and ICC (the compilers I use) do a good job of issuing warnings for the use of uninitialized variables. I like this current behavior, but would prefer a philosophical approach that makes warnings like this more rather than less common.

mjn · on June 29, 2015

I agree in cases where the compiler knows that undefined behavior is taking place. A lot of the silent optimizations LLVM and GCC make are in cases where the compiler isn't really sure it has identified undefined behavior, though.

To put it in classical logic terminology, one case is modus ponens reasoning. Undefined behavior implies the compiler can do whatever it wants. The compiler finds undefined behavior. Therefore it does whatever it wants. This is the case where it'd be better for the compiler to error out than do something nutty.

But many of the optimizations are doing modus tollens reasoning. If X were true, then the program would perform undefined behavior. Conforming programs do not perform undefined behavior. Therefore NOT-X must hold in conforming programs, and this fact can be used in optimizations.

kazinator · on June 29, 2015

> If the compiler was able to determine that there is a code path whereby j might be undefined, should it be allowed to remove the body of the loop, even in those cases where the programmer knows by other means that "j >= 1"?

No; rather, the correct logic is that compiler must preserve the body of the loop if there exists the possibility that it can be reached by a valid code path without any undefined behavior (j is defined, and so forth). Only if the compiler can prove that no well-defined execution path can reach the body can it remove it.

(A bad idea to do without any warning, though. If undefined behavior is confirmed, it should be diagnosed.)

lambda · on June 29, 2015

> only to have a mode where it refuses to silently make 'optimizations' when it does identify UB.

That's not always possible. It's not that it makes the optimization when it identifies UB. It's that it makes an optimization that is valid to make if UB doesn't occur, but if UB were to occur then that optimization could cause all kinds of unexpected problems. But the compiler can't necessarily identify those cases.

Please read the "what every C programmer should know about undefined behavior" series of articles from LLVM; they describe the reason why they can't, in general, provide warnings or errors for these cases in which optimizations rely on lack of undefined behavior:

http://blog.llvm.org/2011/05/what-every-c-programmer-should-...

The third article describes why the compiler can't, in general, warn about those cases in which it's relying on lack of UB, but you should read the first two as well.

Note that for some of those cases, clang and GCC have recently added undefined behavior sanitizers, invoked via "-fsanitize=undefined", which can help even more than the warnings they can add. However, what they do is add extra instrumentation to the executable, and then either log a warning or crash when you hit undefined behavior. The runtime aspect helps avoid the "getting this right would involve solving the halting problem" aspect of why they can't, in general, provide appropriate warnings, but it does mean that this is generally only appropriate in test builds, and that you will only find the undefined behavior that you can trigger during test, while there may be more hiding that only show up in obscure circumstances.

If you really don't want undefined behavior, it's best to use a language, like Rust, which does not have any undefined behavior (outside of "unsafe" blocks). The problem with any kind of warnings that are tacked on after the design of the language is that you are either going to get lots of false positives, lots of false negatives, or both. With a language that is designed not to allow undefined behavior, you know that if the code compiles, it doesn't invoke UB.

markus2012 · on June 29, 2015

Fyi Rust has solved this problem; the compiler forbids the use of undefined variables. Compile error example:

    let x;
    println!("x:{}", x);

xamuel · on June 29, 2015

Not a rust programmer so forgive me if this is a dumb question.

Sometimes in C one might initialize a variable by passing a pointer to it to an init function:

  void f( void )
  {
    int i;
    bool success;

    success = init( &i );

    if ( success )
      do_stuff( i );
  }

That "init" function might be located in a separate .c file, so there's no way for the compiler to know whether or not the memory whose address is passed to init is initialized or not. So how can Rust "solve" the problem? Does Rust simply not allow taking addresses of variables? Or does it not use .o files, compile all codefiles at once and actually analyze globally for uninitialized variables?

AaronFriel · on June 29, 2015

The `init` function you use would not be valid. You would instead write something like this:

    fn f() {
        if let Some(i) = init() {
            do_stuff(i);
        }
    }

In this case, the `init` function would return an `Option<i32>`. In a failure state, this would return `None`, and the pattern match would fail. In a success state, this would return `Some(i)`, where i corresponds to the variable you describe.

The Rust pattern is not only safer, but briefer than yours. It describes the code flow such that you can't remove or repeat a part and end up with inadvertently broken code, and it's memory safe. There is no way for `init` to blow up the stack (whereas in your example, a malicious or buggy init can use the address of i to smash the stack.)

arielby · on June 29, 2015

Rust doesn't allow this direct use-case. You can have conditionally-successful initialization by returning an `Option` or `Result`, however.

anon4 · on June 30, 2015

This should be an error and you should be required to write

    int i = 0; // or some other default value

SamReidHughes · on June 30, 2015

And dozens of other languages.

anon4 · on June 30, 2015

I love Java's behaviour. It gives a few false positives as you've noticed, but you can fix that just by doing

    Object a = null;

and go on your way. You do run the risk of a null pointer exception if you don't assign it a valid object reference, though.