Why doesn't GCC optimize a*a*a*a*a*a to (a*a*a)*(a*a*a)? (2013) (stackoverflow.com)
227 points by Doublon on March 27, 2014 | 83 comments



Wow! Thank you. I didn't realize there were such legit programming podcasts around. Any other similar ones I need to know about?


Check out The Haskell Cast:

http://www.haskellcast.com


I also follow the Debug podcast, some C9 efforts, and some vendor podcasts. It's rare for me to discover the kind of high-quality audio content you used to find at places like ITConversations back in the day, mostly because people use video/YouTube these days. But I try to keep an ear out ;-)


Because, of course, floating-point addition and multiplication are not associative. This turns out to be surprisingly easy to demonstrate:

    0.1 + (0.2 + 0.3) = 0.6
    0.1 + 0.2 + 0.3 = 0.6000000000000001
and the same for multiplication:

    0.1 * 0.2 * 0.3 = 6.000000000000001e-3
    0.1 * (0.2 * 0.3) = 6.0e-3
It actually isn't "surprising" if you understand how the format works. It essentially uses scientific notation but in binary, with a set number of bits for both the mantissa and the exponent as well as a few changes thrown in for better behavior at its limits (like denormalization). This means that it can't directly express numbers which are very easy to write in decimal form, like 0.1, just like we can't express 1/3 as a finite decimal. It's designed to manage this as well as possible with the small number of bits at its disposal, but we still inevitably run into these issues.
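
To make that concrete, here's a quick way to peel the IEEE-754 fields out of a double in Python (a rough sketch; the slicing below assumes the standard binary64 layout):

    import struct

    # reinterpret the 64 bits of the double 0.1 as an unsigned integer
    bits, = struct.unpack('>Q', struct.pack('>d', 0.1))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7ff        # biased by 1023
    mantissa = bits & ((1 << 52) - 1)      # implicit leading 1 is not stored
    print(hex(bits))                       # 0x3fb999999999999a
    print(sign)                            # 0
    print(exponent - 1023)                 # -4
    print(hex(mantissa))                   # 0x999999999999a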

Of course, most programmers only have a vague idea of how floating point numbers work. (I'm certainly among them!) It's very easy to run into problems. And even with a better understanding of the format, it's still very difficult to predict exactly what will happen in more complex expressions.

A really cool aside is that there are some relatively new toys we can use to model floating point numbers in interesting ways. In particular, several SMT solvers including Z3[1] now support a "theory of floating point" which lets us exhaustively verify and analyze programs that use floating point numbers. I haven't seen any applications taking advantage of this directly, but I personally find it very exciting and will probably try using it for debugging the next time I have to work with numeric code.

A little while back, there was an article about how you can test floating point functions by enumerating every single 32-bit float. This is a great way of thinking! However, people were right to point out that this does not really scale when you have more than one float input or if you want to talk about doubles. This is why SMT solvers supporting floating point numbers is so exciting: it makes this sort of approach practical even for programs that use lots of doubles. So you can test every single double or every single pair of doubles or more, just by being clever about how you do it.

I haven't tried using the floating point theories, so I have no idea how they scale. However, I suspect they are not significantly worse than normal bitvectors (ie signed/unsigned integers). And those scale really well to larger sizes or multiple variables. Assuming the FP support scales even a fraction as well, this should be enough to practically verify pretty non-trivial functions!

[1]: http://z3.codeplex.com/
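
For the curious, here's roughly what that looks like through Z3's Python bindings (a minimal sketch; the z3-solver package and the fpAdd/fpEQ/RNE names are how I remember the z3py API, so treat it as an approximation rather than gospel):

    from z3 import FP, FPVal, Float32, RNE, Solver, fpAdd, fpEQ, sat

    # ask for a 32-bit float x that completely absorbs an added 1.0f --
    # the solver effectively searches all 2^32 bit patterns for us
    x = FP('x', Float32())
    s = Solver()
    s.add(fpEQ(fpAdd(RNE(), x, FPVal(1.0, Float32())), x))
    if s.check() == sat:
        print(s.model())   # e.g. a large-magnitude float or an infinity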


As the answer on stackoverflow mentions, there is the -ffast-math option to gcc that tells the compiler to treat fp operations as associative. I've done some testing:

  $ clang -v
  clang version 3.3 (tags/RELEASE_33/final)
  Target: x86_64-apple-darwin12.4.0

  double pow(double);
  
        LLVM bitcode                   Assembly

  // a*a*a*a*a*a                 
  %1 = fmul double %a, %a       |  movaps %xmm0, %xmm1
  %2 = fmul double %1, %a       |  mulsd  %xmm1, %xmm1
  %3 = fmul double %2, %a       |  mulsd  %xmm0, %xmm1
  %4 = fmul double %3, %a       |  mulsd  %xmm0, %xmm1
  %5 = fmul double %4, %a       |  mulsd  %xmm0, %xmm1
  ret double %5                 |  mulsd  %xmm0, %xmm1
                                |  movaps %xmm1, %xmm0

  // a*a*a*a*a*a -ffast-math    
  %1 = fmul fast double %a, %a  |  mulsd  %xmm0, %xmm0
  %2 = fmul fast double %1, %1  |  movaps %xmm0, %xmm1
  %3 = fmul fast double %1, %2  |  mulsd  %xmm1, %xmm1
  ret double %3                 |  mulsd  %xmm0, %xmm1
                                |  movaps %xmm1, %xmm0

  // (a*a*a)*(a*a*a)            
  %1 = fmul fast double %a, %a  |  movaps %xmm0, %xmm1
  %2 = fmul fast double %1, %a  |  mulsd  %xmm0, %xmm0
  %3 = fmul fast double %2, %2  |  mulsd  %xmm1, %xmm0
  ret double %3                 |  mulsd  %xmm0, %xmm0


I wrote GCC's reassociation pass.

LLVM and GCC use the same algorithm for reassociation (and as far as I know, there is still no good literature on the tradeoffs one way or the other), so you will get the same results, modulo some small implementation differences.

The reassociation passes exist mainly to promote redundancy elimination, and so the factorizations they perform are tilted towards making binary operations that look the same. Factorization and transformation into pow calls is also done, but this is just because it's easy to do :)

Note that the operations it forms may be "less than ideal" from a register pressure perspective, since it does not take this into account at the level reassociation is being performed (it assumes something will later reassociate them some other way if it wishes).


Just to make your significant digits clearer:

    0.1 + 0.2 + 0.3   = 0.6000000000000000900
    0.1 + (0.2 + 0.3) = 0.5999999999999999800

    0.1 * 0.2 * 0.3   = 0.0060000000000000010
    0.1 * (0.2 * 0.3) = 0.0060000000000000001
Using -ffast-math doesn't affect these results.


I find it illuminating to look at the exact values of the floating point numbers. This takes away an air of mystery: floating point numbers do in fact have very specific values, which we can inspect exactly.

Here's 0.1 + 0.2 + 0.3:

        0.1 (0.1000000000000000055511151231257827021181583404541015625)
      + 0.2 (0.200000000000000011102230246251565404236316680908203125)
           = 0.3000000000000000166533453693773481063544750213623046875
    rounded: 0.3000000000000000444089209850062616169452667236328125

      + 0.3 (0.299999999999999988897769753748434595763683319091796875)
           = 0.600000000000000033306690738754696212708950042724609375
    rounded: 0.600000000000000088817841970012523233890533447265625
The operands can't be represented exactly as floating point so they are rounded to the nearest floating-point number before the addition even happens. And once the addition does happen, it is rounded to the nearest representable floating point value.

Now we can look at the same for 0.1 + (0.2 + 0.3):

        0.2 (0.200000000000000011102230246251565404236316680908203125)
      + 0.3 (0.299999999999999988897769753748434595763683319091796875)
           = 0.500000000000000000000000000000000000000000000000000000
    rounded: 0.500000000000000000000000000000000000000000000000000000

      + 0.1 (0.1000000000000000055511151231257827021181583404541015625)
           = 0.6000000000000000055511151231257827021181583404541015625
    rounded: 0.59999999999999997779553950749686919152736663818359375
It's interesting to note that the first addition ends up being exact! This is a bit lucky, because 0.5 is exactly representable in floating point and the initial operands got rounded in opposite directions. It's also interesting that 0.6... only turns into 0.599999... at the point that we round the answer.

You can play around with this stuff conveniently using the Python "Decimal" module.

    $ python
    Python 2.7.5 (default, Aug 25 2013, 00:04:04)
    [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from decimal import Decimal, getcontext
    >>> getcontext().prec = 1000  # To prevent truncation/rounding
    >>> Decimal(0.1)
    Decimal('0.1000000000000000055511151231257827021181583404541015625')
    >>> a = Decimal(0.1) + Decimal(0.2)
    >>> a
    Decimal('0.3000000000000000166533453693773481063544750213623046875')
    >>> Decimal(float(a))  # get the value rounded to the nearest fp number
    Decimal('0.3000000000000000444089209850062616169452667236328125')
You can also use my tool "dumpfp" (https://github.com/haberman/dumpfp) to inspect the value of a floating-point number in detail:

    $ ./dumpfp 0.1
    Single Precision (IEEE 32-bit):
               raw = 0x3dcccccd
              sign = 0x0
          exponent = 0x7b (-4)
       significand = 0x4ccccd

       VALUE CALCULATION =
           significand   (1 + 5033165/2^23  (1.60000002384185791015625))
         * 2^exponent    (2^-4)
         = VALUE         (13421773/2^27  (0.100000001490116119384765625))

    Double Precision (IEEE 64-bit):
               raw = 0x3fb999999999999a
              sign = 0x0
          exponent = 0x3fb (-4)
       significand = 0x999999999999a

       VALUE CALCULATION =
           significand   (1 + 1351079888211149/2^51  (1.600000000000000088817841970012523233890533447265625))
         * 2^exponent    (2^-4)
         = VALUE         (3602879701896397/2^55  (0.1000000000000000055511151231257827021181583404541015625))


Note that recent Python versions use Gay's correctly-rounded conversions, so .1+(.2+.3) displays as .6, since that's equivalent when converted back to binary floating point. http://bugs.python.org/issue1580


If you want an exact representation, C99 has the %a format specifier for base-16 floating point. It seems that Python 3 doesn't support it but Lua 5.2 does.

    Lua 5.2.2  Copyright (C) 1994-2013 Lua.org, PUC-Rio
    > print(string.format("  0.1 (%a)\n+ 0.2 (%a)\n= %a\n+ 0.3 (%a)\n= %a", 0.1,0.2,0.1+0.2, 0.3, 0.1+0.2+0.3))
      0.1 (0x1.999999999999ap-4)
    + 0.2 (0x1.999999999999ap-3)
    = 0x1.3333333333334p-2
    + 0.3 (0x1.3333333333333p-2)
    = 0x1.3333333333334p-1
    > print(string.format("  0.2 (%a)\n+ 0.3 (%a)\n= %a\n+ 0.1 (%a)\n= %a", 0.2,0.3,0.2+0.3, 0.1, 0.2+0.3+0.1))
      0.2 (0x1.999999999999ap-3)
    + 0.3 (0x1.3333333333333p-2)
    = 0x1p-1
    + 0.1 (0x1.999999999999ap-4)
    = 0x1.3333333333333p-1
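
Small aside: Python won't take %a in a format string, but float.hex() spits out the same C99-style notation (and float.fromhex() parses it back):

    print((0.1).hex())            # 0x1.999999999999ap-4
    print((0.1 + 0.2).hex())      # 0x1.3333333333334p-2
    print(float.fromhex('0x1.999999999999ap-4'))   # 0.1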


I don't know if you realise that by converting a double "0.1" into the python decimal of a much larger size, you're bound to get garbage.

Edit: I see what you're doing now. I didn't appreciate enough how converting to decimals had such unpleasant effects :) Thanks.


I'm not sure what you're trying to say. 0.1000000000000000055511151231257827021181583404541015625 is the precise value of float(0.1). It is not "garbage" or an approximation in any way. It is exact.


> Of course, most programmers only have a vague idea of how floating point numbers work.

From my experience, most programmers have a vague idea of how basic integers work, so that's not so surprising. (I'm in the group who only knows a little more than average about floating-point, but for what I do, I don't use FP math much either.)


I've always thought it would be helpful to have floating point values that tracked the upper and lower bounds of the possible error. That would make analysis of the results much easier.


That exists and is called interval arithmetic:

https://en.wikipedia.org/wiki/Interval_analysis


That's called ball arithmetic, and there are many libraries for it, such as arb. http://fredrikj.net/arb/ It's useful in mathematical calculation, but usually with floating point calculations you have some error bounds coming from your algorithm which are "good enough" if the result is going to be the value of something with 5% tolerance.


But what about the accuracy of the error bounds?


It is easy to track the real result of the computation with floating-point intervals: round down when computing the lower bound, and up when computing the upper bound of each interval.
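
Python can't switch the hardware rounding mode, but you can fake a conservative version of this by widening each bound one ulp outward with math.nextafter (Python 3.9+). A rough sketch, with interval endpoints as plain tuples:

    import math

    def interval_add(a, b):
        # round-to-nearest is off by at most half an ulp, so stepping a
        # full ulp outward gives a safe (if slightly pessimistic) bound
        lo = math.nextafter(a[0] + b[0], -math.inf)
        hi = math.nextafter(a[1] + b[1], math.inf)
        return (lo, hi)

    x = (0.1, 0.1)   # degenerate intervals around the stored doubles
    y = (0.2, 0.2)
    print(interval_add(x, y))   # guaranteed to bracket the exact sum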


A common and simple way to treat FP mathematically is by bounding the result with the machine epsilon u [1]. For example, given FP numbers x and y:

x <fp_mul> y = (1+e)xy ; abs(e) <= u

The u ideally depends only on the FP format and rounding mode. I say ideally because library functions like sqrt may be less precise than the ideal (which is the result being equal to the exact value rounded according to the rounding mode). Note that the abstraction holds only as long as you don't go outside the range of normal exponents (it breaks when you reach infinity or subnormals).

EXAMPLE. Suppose we have an FP value L representing the length of something, and we want to split it into segments no longer than the FP value M. Assume both are integers small enough to be represented exactly in FP. The answer in exact arithmetic is:

N = ceil(L/M).

However, if we evaluate this in FP, we may get a result N' < N for certain edge values, which means one of the segment lengths we output will be greater than M. So the result of the algorithm is incorrect!

We can fix it by multiplying (L/M) with a number sufficiently greater than one before the ceil, at the cost of sometimes deciding to split into one more segment than mathematically necessary. The fixed FP algorithm may then be:

N' = ceil(1.00005 <fp_mul> (L <fp_div> M))

After we apply the machine epsilon formulas we come to the conclusion that:

N' = ceil( (1+e1)(1+e2)1.00005 (L/M) ) ; abs(e1)<=u, abs(e2)<=u

We then show that:

(1+e1)(1+e2)*1.00005 >= 1.

This proves that what the algorithm gives to the ceil function is not less than the exact value L/M, and that N' >= N. And ceil() can't make errors by definition, assuming the integer value is representable. QED

[1] http://en.wikipedia.org/wiki/Machine_epsilon
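
Spelled out as code, the two variants look something like this (a sketch with function names of my own):

    import math

    def segments_naive(l, m):
        # the exact-arithmetic formula evaluated in FP: for edge values it
        # can come out one too small, leaving a segment longer than m
        return math.ceil(l / m)

    def segments_guarded(l, m):
        # nudge the quotient up before ceil(): occasionally one segment
        # more than necessary, but never a segment longer than m
        return math.ceil(1.00005 * (l / m))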


The “malicious” rounding that you are attempting to protect against cannot actually occur in IEEE-754 floating-point. You are jumping at shadows, making lots of results less accurate to protect against something illusory.

Why is this the case?

In order for ceil(L/M) to produce an incorrect result, we would need to have the mathematically exact result be of the form N + epsilon, with 0 < epsilon <= ulp(N)/2, and N an exact integer. But, can that ever actually happen? No. If it did, we would have a floating-point number L = M(N+epsilon) exactly, or L = MN + delta, with 0 < delta < ulp(MN)[1], which can never happen by the definition of ulp.

This illustrates a difficulty with naive backwards-error analysis as commonly taught in numerical analysis courses. While it provides reasonable conservative bounds, which is often sufficient for engineering needs, it misses much of the subtlety of floating-point arithmetic, and leads people to take unnecessary steps to guard against “errors” that aren’t really there.

[1] Lemma: For any binary floating-point type, ulp(xy) > x ulp(y)/2.

Proof: First note that rounding of the product xy will only ever make ulp(xy) larger, so we can safely ignore it. Use the formula ulp(x) = 2^(floor(log2(x)) - P) where P is the precision in bits and expand:

    ulp(x*y)  =  2^(floor(log2(xy)) - P)
              =  2^(floor(log2(x) + log2(y)) - P)
             >=  2^(floor(log2(x)) + floor(log2(y)) - P)
              =  2^floor(log2(x)) ulp(y)
              >  2^(log2(x) - 1) ulp(y) = x ulp(y)/2.


I suspect a bug in your proof right at the beginning, where you say "exact result be of the form N + epsilon, with 0 < epsilon <= ulp(N)/2, and N an exact integer". Assuming you're referring to the result of the entire expression including ceil, can we still say that, considering that ceil() is far from being a continuous function?


I’m referring to the mathematically exact result L/M. Sorry for being unclear.


Anyway, my example was a bit too theoretical. A better version is one where L and M are integers which are not necessarily representable. Here: http://ideone.com/1Xkhjp


Yes, of course it’s possible to exhibit errors if you change the problem (though it should be noted that the source of error here is entirely in the fact that l is not representable). As soon as you allow representation error, of course, your tolerance-based approach goes out the window; what if the representation error in l and m is larger than your tolerance accounts for?

If I wanted a correct bound for your revised problem, I would require that l be an upper bound on the true value (by forcing it to be rounded up in earlier computations, for example) and that m be a lower bound on the true value. Then the simple computation delivers a correct result regardless of the magnitude of the error in l or m.


"This illustrates a difficulty naive backwards-error analysis"

It's not naive. It's conservative, the exact opposite.


I pointed part of this out:

> it provides reasonable conservative bounds

It is the technique of backwards-error analysis that is naive, not the resulting bound. A “naive” method is not “simple” or “dumb” (or “aggressive” or whatever you think the opposite of “conservative” might be). It is simply a method that doesn’t use some a priori knowledge that’s available for the specific problem at hand. Backwards-error analysis is a naive technique in that it doesn’t benefit from specialized theorems of floating-point that allow one to establish tighter bounds, like the one I proved.


Though, as nice as the examples are, FP rounding problems in practice are rarely due to binary/decimal conversions, which are only relevant when reading and writing decimal encoded numbers. These are normally absent when doing tasks such as moving motors based on sensor readings.


Are there cases where a program says a * a * a but evaluating it as a * (a * a) would be wrong?

I gather (at least in your examples) that dealing with bigger numbers first typically affords more 'real' accuracy?


There are no values of a where a * a * a differs from a * (a * a), because floating-point multiplication is commutative.

With arbitrary numbers a, b, and c, it would be a different story for a * b * c. Without further knowledge of a, b, c, there would be no particular reason to do the operations in one order or another, but the result of a * b * c and a * (b * c) could definitely differ by one ULP, probably by two in ordinary cases, by more if underflow occurred (but one of the variables would already have to be very small for this to happen).
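
Both points are easy to check with the values from the top of the thread:

    a, b, c = 0.1, 0.2, 0.3
    print((a * b) * c)                  # 0.006000000000000001
    print(a * (b * c))                  # 0.006
    print((a * a) * a == a * (a * a))   # True: same operands either way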


> floating-point multiplication is commutative.

It's not, except in the special case of one and the same number.


You have to be kidding. Otherwise, read up on the definition of “commutative” here: http://en.wikipedia.org/wiki/Commutative_property

Look for the first occurrence of the word “commutative” in this page: http://en.wikipedia.org/wiki/Floating_point

But remember, if both the Wikipedia editors and I are wrong, you only need to produce one pair of operands a and b such that a * b does not produce the same result(1) as b * a to be vindicated.

(1) I suggest we consider all NaN values to be “the same result” for the purpose of this challenge.


You're right, of course. I was thinking about associativity, not seeing what was really written. Sorry. I didn't expect commutativity to come up in this context (it's the non-associativity that doesn't allow all the "automatic" optimizations expected by some), where the formula in the same sentence has three elements and a different order of computation. Which doesn't change the fact that I've made an error.


No, fp (IEEE-754) + and * are both commutative. They aren't associative or distributive in general.


Yep, grouping can lead to better real accuracy. But then you lose exact IEEE compatibility (you get slightly different results that will fail a comparison).


Thanks, I now see more clearly why this is important for comparisons :)


Speaking of comparisons... -Wfloat-equal will warn whenever you attempt to compare floats or doubles for equality.


Rule number one of floats: don't use == on them.


Pure cargo-culting. There are circumstances in which it’s perfectly appropriate to compare floating-point data for equality. There are also circumstances in which a comparison with some tolerance is more appropriate. Understand what situation you’re in, and understand the tools that you’re using.


Comparing floats for equality is equivalent to comparing them with a tolerance that you don't control. I don't see a good reason to take that risk.


Comparing floats for equality is equivalent to a comparison with zero tolerance, not an uncontrolled tolerance.

Consider storing a timestamp as a number of seconds since the epoch. We might use == to check if the timestamp has changed.


Read the wikipedia entry on IEEE754-1985 (it has better diagrams than the main 754 entry): http://en.wikipedia.org/wiki/IEEE_754-1985#Representation_of...

Knowing is better than fearing.


No, it's not equivalent to that at all. Why would you think such a thing?


Oh, there must be a Common Lisp predicate that addresses this. prettymuchequalsp?

More seriously, I remember APL\360 SETFUZZ from long before many of you whippersnappers were born, which told the system to disregard so-many bits in an 'equals' evaluation.

FP implementations are generally evil, and errors cascade in irritating fashion. 50% of a numerical analysis class I took once was dedicated to error estimation. Stick to rationals if you can - it'll save your hairline.


Comparing floats is sometimes useful. For example, if you are testing that you have copied a data structure correctly.


Basically:

floating point is hard. Programmers get it wrong, compilers get it right.

(normally).


In the case of C compilers, they get floating-point right by re-defining what the language definition means: http://arxiv.org/abs/cs/0701192

There has been some progress since the time this report was an accurate description of the situation, but the situation is still mostly terrible: http://stackoverflow.com/questions/17663780/is-there-a-docum...

And soon, fused multiply-add will be common enough that compilers will start generating the instruction by default for code that contains additions and multiplications, causing a regression with respect to the current situation.


Using fused-multiply-add unless the user has given the compiler option to deviate from strict IEEE754 results would be a compiler bug. I'm pretty sure gcc gets this right -- it will use FMAC on ARM (for instance) only if you pass it the fast-math option.


A C99 compiler pragma, FP_CONTRACT, lets the programmer decide locally whether using an FMA instruction for multiplication and addition is allowable.

The debate is whether the pragma should be on or off by default. I am a strong proponent of the “off” default. Some argue for the “on” default. Even if the “on” camp wins at some point for some compiler, you can always use the line below to make a standard-compliant compiler that is otherwise respectful of IEEE 754 treat multiplications and additions as separate, individually rounded operations:

#pragma STDC FP_CONTRACT OFF

But the point is that you will have to use the incantation explicitly.

If you are an ordinary programmer who does not know about the incantation, you will eventually encounter a situation where, say, (a * b + c == a * b + c) evaluates to false despite the results being finite, and it will reinforce the superstition that floating-point is black magic. And since this example is remarkable, you will tell others about it, the example will be repeated and distorted, and the superstition will never die.


I think the most compelling argument for “off” being the default is x*x - y*y. If FP_CONTRACT defaults to “on”, most uses of this expression will become fma(x, x, -y*y), which will result in the expression not being exactly zero when x == y.

The most compelling argument for “on” being the default is that most users don’t care, and “on” allows compiler optimizations that save energy. Numerical consistency or independence of Russian gas[1]?

[1] Yeah, right. "Numerical consistency or 20 seconds longer playtime of flappy bird on a charge?” is a way better argument.


Slightly off-topic here, but floating point is even much harder inside the kernel, mainly because of the trapping mechanism. In fact most OSes either do not support floating-point operations inside the kernel or turn that option off by default. The good thing is that you do not really need floating-point operations inside the kernel; but when you do, and need to modify your kernel code to achieve it, it is extremely implementation-intensive and hard to get right (and the runtime cost is expensive too).


Hm, in our kernel, when we need to do floating point, we just set a flag on the thread control block which preserves the floating registers on context switch, and then do our floating point operations. Maybe there is some API abstraction hiding the messy details :).


Always a great read: "What Every Computer Scientist Should Know About Floating-Point Arithmetic"

PDF rendering: http://www.cse.msu.edu/~cse320/Documents/FloatingPoint.pdf

HTML rendering: http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.ht...


One thing I always wondered about is why are we using floating point arithmetic at all, instead of fixed point math with explicitly specified ranges (say, "here I need 20 bits for the integer part and 44 for the fractional part")? What is the practical value of having a floating point that would justify dealing with all that complexity and conceptual problems they introduce?


Have you used fixed-point arithmetic much? You need to specify the width and the position of the dot for each intermediate variable (for instance, multiplying two numbers that have 20 bits for the integer part may require 40 bits for the integer part of the result). Since the operands in 20.44 format had only 64 bits of precision, it only makes sense to represent the result to 64 bits, so the format of this result should probably be 40.24.

If you don't do this, your program may still work but you are wasting space. For instance, if the multiplication of your two 20.44 numbers does not overflow the 20.44 format, good for you, but it means that between the two operands, you were wasting 20 bits in total for leading zeroes that carried no information.

Consider floating-point as a way not to have to make all these decisions at compile-time and still use space extremely efficiently to represent as much precision as possible.
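
To make the bookkeeping concrete, here's a toy Q20.44 multiply in Python (a sketch of my own; Python's unbounded ints stand in for the 128-bit intermediate, and the Q40.24 output format follows the comment above):

    FRAC_IN = 44    # Q20.44 operands

    def to_fixed(x, frac=FRAC_IN):
        return round(x * (1 << frac))

    def fixed_mul(a, b, frac_in=FRAC_IN, frac_out=24):
        product = a * b                               # Q40.88 raw product
        return product >> (2 * frac_in - frac_out)    # rescale to Q40.24

    a = to_fixed(3.25)
    b = to_fixed(1.5)
    print(fixed_mul(a, b) / (1 << 24))   # 4.875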


Also: when people screw with FP, 99% of the time the answer is still vaguely sane or makes it very obvious that something went wrong. When people screw up with fixed point, 99% of the time the result is indistinguishable from noise.


Because of hardware, and the fact that fixed point doesn't solve the core problems with floating point as it stands today. It's hard enough to get good, stable/fast floating-point hardware, much less add the complexities of variable-sized words with variable-sized mantissas/exponents.

Really, what you're probably asking for is decimal floating point, which has its own set of limitations when implemented in hardware.

And if you happen to have an application where what you really need is more precision there are a number of bignum libraries like GMP (https://gmplib.org/) which provide that. But, throwing more precision at a problem doesn't alleviate the need to do error estimation.


Dynamic range. A double-precision variable has the same precision at 1e+10 as at 1e-10, your proposal does not. This is not a big problem if you have a good idea about the numbers in your application (e.g., digital signal processing), but it is a big problem if you write a matrix library (or anything else which deals with real numbers) for arbitrary inputs.


But OTOH, having "the same precision" means losing accuracy. This is a common case in computer games, especially with outdoor worlds like space simulators. When you move away from the world center, you start losing significant digits on the fractional side and your calculations soon turn into nonsense. When you go far enough, the difference between two consecutive numbers can get as high as 0.1, at which point everything falls apart.

KSP scene has even a fun name for it, Deep Space Kraken [0].

This can of course be mitigated by subdividing the world (octrees and stuff) and stacking coordinate frames (though for some reason nobody seems to be doing that), but you said that floats are good "for arbitrary inputs". And if you have small and big floats in your computations, the result might quickly stop making any sense.

I admit I'm confused about the topic, any clarification is welcome.

[0] - http://wiki.kerbalspaceprogram.com/wiki/Deep_Space_Kraken
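
The effect is easy to see with math.ulp (Python 3.9+), which gives the spacing from a double to the next representable one:

    import math

    for x in (1.0, 1e6, 1e12, 6e15):
        print(x, math.ulp(x))
    # by 6e15 consecutive doubles are already a full 1.0 apart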


Any particular fixed point format will have slightly more precision/accuracy at the edges than a floating point format with the same number of bits, but then it will hit a brick wall and stop working entirely. And it's easy for it to have less precision/accuracy in practice because it's hard to keep each intermediate result at the exact right exponent.

Fixed point won't save you from the space kraken. Instead of going so deep into space that consecutive numbers gradually become .1 apart, your ship will spontaneously explode a fiftieth of that distance out.

It's probably much easier to use two coordinate frames than to properly use fixed point numbers. Then a ship can run physics relative to its center of mass or a nearby planet/moon and have very high precision/accuracy.


On one hand, FP lets you program without thinking too hard about the numerical properties of your program - until you hit a gotcha. On the other hand, FP is pretty powerful for many domains if you're an FP wizard. Like physics simulation.

There have been other approaches besides fixed that people have tried out, such as interval arithmetic and exact (rational) arithmetic, but they all have their own problems.


> Um... you know that a*a*a*a*a*a and (a*a*a)*(a*a*a) are not the same with floating point numbers, don't you?

Why would so many people upvote such a condescending comment?


The exact same reason I stopped asking/answering questions on SO years ago - too many condescending trolls trying to act smart.


Random sidenote, but I've never seen an SO answer be upvoted in realtime until now. Didn't even know they programmed such functionality.


It's a pretty impressive system. If you're interested in their architecture, here's a YouTube video worth watching: "Marco Cecconi The Architecture of Stack Overflow - Developer Conference 2013" [0]

[0] https://www.youtube.com/watch?v=OGi8FT2j8hE



About 3 years ago...


3 years is a loooooong time for the internet. I am going to put a note on my calendar to post this again in 3 years. What is the median age of an active HN account anyway?


Perhaps you'd like to tell us about FP on the Mill.


The non-associativity of addition is obvious, but for multiplication, I understand why it does not always give the same answer; I just do not see why a change of the order could change the accuracy.


The result depends on the number of multiplications, and hence roundings, that are done: a*a*a*a*a*a takes five rounded multiplications, while (a*a*a)*(a*a*a) takes only three, so less rounding error can accumulate.


How would a functional language compiler deal with this case? like GHC?


It doesn't. If we forget about FPU flags, floating-point multiplication is pure. That does not change the fact that a * (b * c) is different from (a * b) * c.

The problem is not side-effects in multiplication. The problem is that * is not associative.


You could try it out!


I already knew about the lack of associativity of IEEE floats, but TIL that `pow(x,n)` is more precise than chained multiplications.

Good to know.


Technically it is not a matter of precision but of accuracy.


Interesting, thanks for the correction.

I'm neither a native English speaker nor an engineer, and I didn't know about the distinction.

https://en.wikipedia.org/wiki/Accuracy_and_precision


Also a crap ton faster, right?


Yes, if by faster you mean slower.


Ha! Yes, for floating point. I meant my post to more communicate that moving it behind a library/method would allow you to make that decision elsewhere easily. I did not even come close to making that point. (I got distracted reading some interesting hacks that approximate this in insane speed.)


There is a discussion from today and yesterday on llvm-dev that deals with floating point guarantees and optimizations: http://thread.gmane.org/gmane.comp.compilers.clang.devel/358...


Floating point, in short: a computer representation of scientific notation(1) with a sign bit, exponent(2), and coefficient(3) crammed into the same word.

1. http://en.wikipedia.org/wiki/Scientific_notation

2. base 2, biased by 127, 8 bits (IEEE float)

3. base 2, implicit leading 1, 23 bits (IEEE float)

http://en.wikipedia.org/wiki/Single-precision_floating-point...


It actually does, if a is an integer.



