Haskell as fast as C: working at a high altitude for low level performance (donsbot.wordpress.com)
123 points by Menachem on March 25, 2012 | hide | past | favorite | 55 comments


This is very cool, but it still makes me extremely skeptical. In brief, the more posts of this type I see, the more I become convinced that writing performant code in high-level languages requires a series of tricks that get around the "natural" or "naive" way to write the code. The impression I get is that high-level languages allow you to express your thoughts concisely, or they allow you to mimic the performance of low-level languages, but not both at the same time.

Notice that it takes an entire blog post to use Haskell to emulate the performance of a straightforward 20-line C program that can be written in under 60 seconds. There are high-performance tasks which are practically trivial in an imperative language but which merit a conference paper when accomplished in Haskell.

So it's hard to see something like this as an argument that higher-level languages are simplifying anything.


You should think of this post as being aimed at Haskell library writers. The benefit to the application developer is that more and more optimizations are available through identical or nearly identical, familiar interfaces such as those for lists.

In my case (joe average haskell dev) that means I can just stick to normal constructs and switch out the underlying data structure to get the speed benefit. This article actually might lead to such a change in a computer vision package I work on.

You can expect to see this pattern over and over with Haskell, so that by the time you get to data structures that abstract away GPU-based calculations, nobody will be wishing people would just write the darn code in C.


Exactly. At this point you can just write idiomatic haskell-98 list code and do an

    import qualified Data.Vector.Unboxed as V
and get blazing-fast performance.

Even with the complication of laziness, getting great (for most reasonable definitions of great) performance out of pure haskell code was a solved problem years ago. Doing high-performance IO looks like it's getting there too.
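To illustrate the point, the switch-over can be as small as this (a sketch assuming the `vector` package; the article's `Data.Array.Vector` plays the same role, and the names here are mine):

```haskell
import qualified Data.Vector.Unboxed as V

-- List-style code over unboxed vectors: stream fusion compiles the
-- enumeration, sum, and length into a single loop over unboxed Doubles,
-- with no intermediate vector ever allocated.
mean :: Double -> Double
mean n = V.sum xs / fromIntegral (V.length xs)
  where
    xs = V.enumFromTo 1 n :: V.Vector Double
```

The calling code never sees that it is iterating over a fused stream rather than a list.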


Solved how? Most Haskell performance advice (like Tibell's recent presentation) is to add strictness annotations, which is both non-Haskell98 and non-lazy. So it is solved, but not elegantly or hidden in core libraries.


Fair point. I meant solved in that it's not difficult with a bit of experience and profiling. I actually don't mind the workflow of writing naive (or non-prematurely-optimized) code and then adding strictness hints where the compiler needs it.


Ah, also meant to say that adding strictness with `seq` is haskell 98, and there's nothing so strange about forcing evaluation in a lazy language.
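For what it's worth, a minimal sketch of that Haskell-98 style of forcing (names are mine, not from the article):

```haskell
-- Summing with an accumulator forced by `seq` on every step, so the
-- loop runs in constant space instead of building a chain of thunks
-- that would only collapse (or overflow the stack) at the end.
sumTo :: Double -> Double
sumTo n = go 0 1
  where
    go acc x
      | x > n     = acc
      | otherwise = acc `seq` go (acc + x) (x + 1)
```

`seq` is in the Haskell 98 report, so no extensions are needed for this.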


Meant to upvote, fat-finger downvoted instead, my bad. Hopefully somebody compensates.


This is wildly off-topic, but an idea caught my interest. I upvoted the parent of your comment to compensate for your downvote and it made me wonder how many other people have done the same. It put a bunch of thoughts in my head that I can't seem to form into coherent words, but those thoughts have to do with gaining quite a bit of comment karma from one person's accident, karma trolling, etc. I can't imagine that occurring on HN to any extent, but it's an interesting thought.


Tangentially, there are people in CS who study trust/reputation systems (like karma), and how to design those systems so that people want to participate and "do the right thing". Maybe related.


It's simpler to write efficient low-level code in C, but you have to pay the cost of efficiency (in effort, clarity, and LOC) everywhere in your program, in every single function you write. For some kinds of programming, such as kernel coding, almost all of your code needs to be as efficient as you can make it, so that's fine. For a lot of application programming, though, low-level efficiency only matters for a small percentage of the code, so it's a good trade-off to use a more elegant and expressive language, if optimizing the performance-sensitive parts isn't so difficult that it cancels out the savings elsewhere.

That caveat makes analyses like this one interesting, because the trade-off hinges on how hard it is to write efficient code when you need to. For me, the significance of this article is that it's possible to write high-performance Haskell in a simple and readable style if you're a Haskell expert who knows a lot about how the compiler optimizes code. That's a useful thing to know if you're considering using Haskell for a project. (The next thing I would ask is whether you're screwed if you're just an intermediate programmer who knows the language but not the implementation.)


Not to take away from Haskell, but this is exactly why many people in academia use Python. It's easy to express an idea in it and test it, even if performance is poor. Then when you know where your bottlenecks are, you can rewrite that tiny bit in Cython and gain 90% of the speed improvement of writing the whole thing in C.


As someone who knows a bit of haskell but not a lot I don't see this so much as a series of tricks. Ignoring all the explanatory pieces it comes down to "use the fusible Data.Array.Vector library and unbox your doubles". The unboxing is a given, optimization 101 in haskell, and the fusion stuff is the cool bit.

You can probably make the argument that the tail recursive function is not the "natural" way to write this code but conceptually that's on the same level as loop unrolling in C.
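The tail-recursive shape in question looks roughly like this (a reconstruction, not the article's exact code; bang patterns are a GHC extension, not Haskell 98):

```haskell
{-# LANGUAGE BangPatterns #-}

-- Two strict accumulators threaded through an explicit loop, which GHC
-- compiles to a tight loop over unboxed values in registers.
mean :: Double -> Double
mean n = go 0 0 1
  where
    go :: Double -> Int -> Double -> Double
    go !s !len !x
      | x > n     = s / fromIntegral len
      | otherwise = go (s + x) (len + 1) (x + 1)
```

Structurally this is the same while-loop a C programmer would write, just spelled as recursion.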

It doesn't take an entire blog post unless you are explaining the compiler tricks working behind the scenes and comparing the assembly produced.

So basically I agree that there is some express concisely vs. low level performance tension here but the takeaway is how little of it there is. This is probably lost if you are also having to try and parse this weird haskell stuff at the same time.


The thing is, barring the intervention of marketing departments, compilers don't get more dumb over time. If this is possible now, it ought to be easy soon (for some value of "soon").

Haskell's main benefit, to my mind, is its type safety. That's what saves programmer time in the long run. If that can be fused (hah!) with decent performance - even if it's not quite up to C without contortions - it's a net win.


I've always wondered why language designers don't strive to optimize for the trivial cases. There is a natural way to design code for a beginner, and having that map to the most efficient constructs seems to be one of the areas of programming language research that is lacking. Maybe it's the need for the 'sufficiently smart compiler' that is holding things back, but given that PLs are abstractions, couldn't the underlying abstraction be completely different than the user visible abstractions?


I believe they do optimize for the simple cases, but just in very different ways. Haskell optimizes for composition, abstraction and expressiveness in a function-application sense. C optimizes for imperative loops, simple functions and controlling your memory layout and execution precisely [There is probably a better characterization of C, please comment].

I think there are plenty of tricks involved in writing performant C code that aren't obvious. Things like cache behavior, memory access patterns, etc. The job of the compiler and PL is to help us by making it unnecessary to worry about such things unless we really need to.

It's a testament to the power of modern programming languages and computer speeds that there are many programmers who don't understand registers, caches, assembly, virtual memory, etc.


In the design of Rust, as a rule of thumb, we've always tried to make the simplest code also the fastest code, to minimize performance footguns. (Writing the Rust compiler in Rust and constantly running it through Instruments.app helps to keep us honest.)


There is a natural way to design code for a beginner

Smalltalk was purposely aimed at children. The thing is, our culture is quite complicated, and there are expectations set up in different fields. A grade school child might find operator precedence confusing, while an engineer would view it as second nature and a non-negotiable feature. Likewise, someone steeped in the Unix way of doing things might find a language incorporating Sed/Awk syntax to be very "intuitive" while someone from a different discipline might find it to be cryptic.

given that PLs are abstractions, couldn't the underlying abstraction be completely different than the user visible abstractions?

They almost always are for PL.


Having seen hundreds of these types of blog posts, touting faster-than-C (superluminal?) benchmarks in arbitrary languages, I'm extremely skeptical. Real-world applications are rarely limited solely by small chunks of code that are simple enough to be optimized independently of the rest of the program. Accurate comparisons should be done on large, complex code bases that mirror an equivalent C program.

Unfortunately, those don't exist, so in my mind the true performance potential of haskell is still unknown. I do have high hopes for the language, especially since whole-program optimization and aggressive inlining/code folding should yield very, very efficient code, but as of yet the only large programs in haskell remain GHC and darcs, and darcs is extremely slow.

Still: a single benchmark showing a good result is better than one showing a bad result.


At 15 thousand lines of haskell, git-annex is only 5 thousand lines smaller than darcs. I happen to know, since I wrote git-annex. I'm not sure where you're coming from with your statement about there only being two large haskell programs.

For that matter, I don't know if I'd consider darcs's 20 kloc very large. Or that I'd consider another haskell program I wrote, github-backup, to be small -- that 2 kloc program sits at the apex of a lot of libraries, and probably combined they have more lines of code than darcs. Your whole premise about lines of code feels thoroughly flawed to me.

Anyway, as a git extension, git-annex is expected to run quite fast. I've never had any difficulty, in writing git-annex, with the speed of haskell code. I'm sure darcs is slow due to its patch theory thing, not due to its implementation language.


> I'm not sure where you're coming from with your statement about there only being two large haskell programs.

I concede the point, but I'd rather say that there's only one large haskell program.

Let's consider large C/C++ programs:

1. Chromium clocks in around 4 million lines of code.

2. LLVM/clang together has a (fuzzy) estimate of ~1165539 LOC on my system.

3. GHC has unknown lines of code, and I am unsure as how to count it, whether to include the standard library, etc, but it has at least 100k, so we'll just call that sufficiently large.

Now, I know haskell can be fast—I've spent a lot of time myself hand-tuning GHC-generated code. But I've noticed that unless I can trace the bottleneck to a single function or small set of functions, optimization to near-C levels is extremely difficult. Now, even 2-3x slower than C—far slower than hand-tuned GHC-generated code—is still very fast, so I'm not calling haskell slow. Far from it.


Actually, there are packages on Hackage with > 100 kloc: CHXHtml, KiCS-debugger, HaRe


> Real-world applications are rarely solely limited by small chunks of code that are simple enough to be optimized independently of the rest of the program.

Two reactions to this:

1. If they aren't, maybe they should be. Not sure the large applications of the past are a good guide to the future. Perhaps a large pile of functional code is a clear software engineering win for some value of "large," and we're still discovering this.

2. In my experience, large applications are often performance-limited by small chunks of code. The results of actually profiling never cease to amaze me -- my guesses are so often wrong. Of course, once you fix the bottleneck in one spot, the bottleneck moves to another, but hey that's the game of optimization in any language.


A big reason for this is that a big chunk of Haskell adoption is in software you will never know about: financial risk systems for major investment banks (one of which has over 400 KLoC in Haskell), crypto work for the NSA, backends powering parts of large sites (facebook, bump, etc) and the full stack on some smaller ones (planely, yesodweb, and quite a few others I can't mention).


There are several million lines of code in Haskell on Hackage now - so any non-trivial Hackage-based app is going to pull in 100k+ lines.

The current project I'm working on commercially has 600k lines of Haskell, though I expect we'll hit the 1M mark in the next year or two.

It benefits massively from the compiler being able to see the forest from the trees.

From my point of view, this stuff was all solved years ago. There's nothing much left to prove -- so get on with building great things in Haskell.


The benchmark is pretty simple and artificial.

I think what we should be looking at is that the performance can asymptotically approach C, where C is used because it's the closest to the theoretical optimum assembly that is still practical for writing large programs.

If a language can get arbitrarily close to C in performance, it starts to become a much better design trade-off to use.


One of the touted benefits of pure strongly typed code in languages like Haskell or OCaml is that they can compile to assembler that is faster than C, because the compiler can guarantee certain behaviors and thereby increase sharing of computations and insert multicore parallelism.


It's just part of the prescription though. The idea is to use GHC profiling code to find the real bottlenecks and then use techniques like stream fusion to cause those inner loops to compile to near C-optimal assembly.

That said, it's a tough process.


I was going to go into a series of optimizations that could make the C code faster, starting by comparing against the integer counter rather than the double, then switching to vector math if applicable...

Then I realized that all this program does is compute an approximation of `(d + 1) / 2`. If you need the exact same result as it computes (500000000.067109), it's hard to add any parallelism because it depends on the imprecision of the intermediate add results, but if you can settle for the mathematically correct solution (500000000.500000), it's hard to distinguish between "fair" optimizations and "unfair" ones - indeed, if you go ahead and just change all the doubles to long longs (they can fit), the compiler will automatically use the formula.

In general, I think that to get maximum performance on many problems like this one, you must make assumptions the compiler cannot, which means your code cannot be naive; but the difference between naive and non-naive code in C is much less than in Haskell.
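The closed form is easy to check against the naive accumulation (a sketch in Haskell rather than C, with names of my own; at small d the doubles stay exact, and the divergence the parent describes only appears at large d):

```haskell
import Data.List (foldl')

-- The benchmarked loop just computes mean [1..d], whose closed form is
-- (d + 1) / 2. The two only disagree at large d, where the naive
-- left-to-right accumulation picks up floating-point rounding error
-- (hence 500000000.067109 versus 500000000.500000 at d = 1e9).
naiveMean :: Double -> Double
naiveMean d = foldl' (+) 0 [1 .. d] / d

closedMean :: Double -> Double
closedMean d = (d + 1) / 2
```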


A year or so ago I tried for a couple days to write a 1-D neutron transport solver based on this post, and found the experience frustrating in the extreme. Obviously, I'm no Haskell expert, but I was really turned off of using Haskell for numerical work.

I'd really like to give it a go again, but I feel like I'm missing a lot of the knowledge required to solve P-(I)DEs in any kind of moderately clever way in Haskell. If anyone can point me towards a nice resource for Haskell numerics I'd be grateful (repa is not flexible enough for my needs).


A lot of really clever folks monitor the Haskell tag on stackoverflow, including dons. I would post there.


Getting languages to run as fast as (or faster than) C for things like tight floating point or int calculations isn't that difficult - at the end of the day, any decent compiler, JIT or ahead-of-time, is going to produce roughly the same asm.

What's difficult is getting the whole application as fast, as opposed to just a few functions. The hot points of an application are rarely due to CPU throughput or lack of asm instruction optimisation - they're normally due to memory allocation inefficiencies or cache thrashing, or bad thread concurrency, and this is where using C/C++/(ADA in embedded world) shine, as you have complete control over pretty much everything, from struct bit packing, allocation size, allocation location (stack/heap), when to deallocate memory (if ever in real-time's case), optimising for memory access patterns, etc, etc.


The hot points of an application are rarely due to CPU throughput or lack of asm instruction optimisation - they're normally due to memory allocation inefficiencies or cache thrashing, or bad thread concurrency, and this is where using C/C++/(ADA in embedded world) shine, as you have complete control over pretty much everything

What you are saying is that C/C++/ADA are wonderful because compilers optimize one thing (CPU throughput/instruction optimisation) whereas other factors are more important (memory allocation inefficiencies or cache thrashing, or bad thread concurrency).

Whenever a programmer says they need manual control of [X] -- it's time to start looking at automation of [X]. (It may not work in all contexts, however.)


There's a reason languages with garbage collectors aren't used where speed is important - because they normally get in the way.


I wasn't thinking of GC. GC's been around awhile, and is quite advanced, yet is not desired for certain purposes.

Yet, there must be something that programmers conceptualize when they allocate a struct, do whatever they do with it, then free it. Everything that I know about how programmers' minds work tells me that, most likely, 80% of this work is fairly mechanical.

Then again, you might want to look at iGC. (Granted, phones are pretty ridiculously powerful in comparison to lots of embedded devices.) By tailoring their GC to the particular way it's used, they can do interesting optimizations. (Use comparisons to addresses to drastically reduce the number of roots for tracing.)


> The fix is straightforward: just use a strict pair type for nested accumulators:

Uf. Haskell impresses me a lot, but it seems that performance-wise, it would be better if it were strict by default.
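The quoted fix, sketched out (my names; the strictness flags on the fields are the whole trick):

```haskell
-- A strict pair for nested accumulators: the ! on each field means that
-- whenever a P is evaluated, both components are evaluated too, so no
-- thunks pile up inside the pair across loop iterations.
data P = P !Double !Int

meanP :: Double -> Double
meanP n = case go (P 0 0) 1 of
            P s len -> s / fromIntegral len
  where
    go p@(P s len) x
      | x > n     = p
      | otherwise = go (P (s + x) (len + 1)) (x + 1)
```

So the program stays pure and lazy overall; strictness is opted into exactly where the accumulation happens.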


But the rest of the fusion and other transformations work better when lazy.


(2008)

...not that things have gotten slower since then.


Thank you. This [2008] tag indicates I probably read it when it came out, so I didn't click on it now. And you probably have saved some of my time.

Just wanted to inform you and others how much these little bits of information help us.


Please don't make any claims about high performance based on micro-benchmarks.


I tried to reproduce the results, but the Haskell bit failed to compile on my Mac with the -fvia-C flag. Without the -fvia-C flag, the Haskell version takes about 11 times as long to run as the C version. A simple Go version runs at the same speed as the C version.

I posted some code to run the comparison at https://github.com/ijt/fast-as-c-article. Here are the results:

    [ issactrotts ~/haskell/fast-as-c-article ] ./compare
    == cmean ==
    gcc -O2 -o cmean cmean.c
    mean: 500000000.500000

    real    0m0.774s
    user    0m0.770s
    sys 0m0.002s
    == gomean ==
    6g gomean.go
    6l -o gomean gomean.6
    mean: 500000000.500000

    real    0m0.776s
    user    0m0.769s
    sys 0m0.003s
    == hsmean ==
    ghc -O2 hsmean.hs -optc-O2 -fvia-C --make
    [1 of 1] Compiling Main             ( hsmean.hs, hsmean.o )
    In file included from /usr/local/Cellar/ghc/7.0.4/lib/ghc-7.0.4/include/Stg.h:230,

                     from /var/folders/g2/ylbqfw5533n65z6qkxljg0_h0000gn/T/ghc23078_0/ghc23078_0.hc:3:0:
     

    /usr/local/Cellar/ghc/7.0.4/lib/ghc-7.0.4/include/stg/Regs.h:177:0:
         sorry, unimplemented: LLVM cannot handle register variable ‘R1’, report a bug
    make: *** [hsmean] Error 1
    == hsmean_nollvm ==
    ghc -O2 hsmean.hs -optc-O2 --make -o hsmean_nollvm
    [1 of 1] Compiling Main             ( hsmean.hs, hsmean.o )
    Linking hsmean_nollvm ...
    ld: warning: could not create compact unwind for .LFB3: non-standard register 5 being saved in prolog
    mean: 500000000.067109

    real    0m8.886s
    user    0m8.863s
    sys 0m0.018s


These days you'd use the -fllvm flag


I started reading this and wondered, "why is dons back to using via-C instead of LLVM". Then I realized that this article is 4 years old.


If you are new to programming languages, please don't listen to those nonsense "X language faster than C" articles that come from people who are religious about high-level languages.

If you have only a hammer, everything is a nail.

It is a very dangerous meme that will make you incompetent in the real world, where real-life high-level languages are >100 times slower than programs written in C (by people who know what the computer is doing, as they knew assembler first).

Microsoft geniuses fell for this meme and as a result created Windows Vista, where a 50KB file transfer could take you 20 minutes.

In Android, the garbage collector will start collecting memory whenever it wishes and will visually break the continuity of the screen, making it unresponsive at times. This was unacceptable for Apple, which used C for this reason (yes C, not Obj-C). Samsung and HTC started using C for this too.

C has its place, high-level languages have their place. You trade abstraction for control and (if you know what you are doing) performance.

Go and learn low level and high level and decide for yourself which one is appropriate for what circumstance.

E.g. the fast Python in Python, things like numeric Python, is written in C for this reason (once people discovered how slow high-level programming was in real life).

If you are going to spend the same time optimizing C that you would spend optimizing language X, you can make the C super fast as well: not 10% faster; with C you should get 100%, 1000% or 10000%.

Sorry, I feel super dumb having to say the obvious, but good programmers are busy coding and the void is filled with nonsense.


To be honest, your comment strikes me as far more "religious" than the article, which (since you obviously didn't RTFA) documents optimizing a particular bit of haskell code to be on par with an equivalent implementation in C. The article is not saying that haskell (the language) is faster than C.

If all you have is a hammer...


when real-life high-level languages are >100 times slower than programs written in C (by people who know what the computer is doing, as they knew assembler first)

There is an important truth here, in the article, and in other comments here: Speed doesn't come free.

You don't get the very best speed and efficiency by just using a certain language, a certain library, or a certain algorithm. It's essential that the programmer has some specialized knowledge. (How a particular GC works, how a particular library is put together, or exactly what your compiler is doing under the covers.)

In the end, it's all a trade-off between hardware resources, human capital, and time. Have an excess of hardware resource? Then why not spring for GC? Have very tight and fixed hardware resource, well you're going to have to pay in another way!


You can say all the same things if you replace 'C' with 'assembly' and 'high-level languages' with 'C'.


C is "portable assembler".

If I do something like

if (foo = bar) { }

I know that the computer is comparing one value to another (which means an op, e.g. subtraction), then doing a jump.

I don't need to code the assembler, but I can estimate how long everything takes really well (to orders of magnitude). This is invaluable when coding.

If you do: [object foo] in objective c

or object.foo()

You lose all your control; sometimes classes will use iterative methods or alloc and free memory every time you call a method, from nested methods, making it super slow.

Sometimes I want abstraction, sometimes I want to know what is happening.


This just isn't true. C is not portable assembler. It was never intended to be. I hear it claimed, and it is wrong every time somebody calls C a low-level language close to assembler. You can make some roughly reasonable assumptions about what comes out of the compiler, but often it is not what you think it is.

Let's challenge this specific claim, that when you do "if (foo == bar)" -- I corrected the syntax error, which is a symptom of C's high-level syntax and not of the underlying assembly code -- you compare one value to another and then jump. For this challenge, I will write some trivial code that we should be able to make easy assumptions about, and I will compile it with debugging enabled so that I can dump the results with gdb.

  $ gcc -g example.c

  1       #include <stdio.h>
  2
  3       int main() {
  4          int foo = 10;
  5          int bar = 20;
  6          if (foo == bar) {
  7             printf("Fun\n");
  8          }
  9          return 0;
  10      }

  Dump of assembler code for function main:
  0x0000000100000ef8 <main+0>:    push   rbp
  0x0000000100000ef9 <main+1>:    mov    rbp,rsp
  0x0000000100000efc <main+4>:    sub    rsp,0x10
  0x0000000100000f00 <main+8>:    mov    DWORD PTR [rbp-0x4],0xa
  0x0000000100000f07 <main+15>:   mov    DWORD PTR [rbp-0x8],0x14
  0x0000000100000f0e <main+22>:   mov    eax,DWORD PTR [rbp-0x4]
  0x0000000100000f11 <main+25>:   cmp    eax,DWORD PTR [rbp-0x8]
  0x0000000100000f14 <main+28>:   jne    0x100000f22 <main+42>
  0x0000000100000f16 <main+30>:   lea    rdi,[rip+0x19]        # 0x100000f36
  0x0000000100000f1d <main+37>:   call   0x100000f30 <dyld_stub_puts>
  0x0000000100000f22 <main+42>:   mov    eax,0x0
  0x0000000100000f27 <main+47>:   leave  
  0x0000000100000f28 <main+48>:   ret    

We see that in the very basic version of this code with absolutely no optimizations and doing the silliest things that we can, we store our two values into some memory locations, perform a comparison (cmp), and jump if not equal. We can see that the jump leads us to the puts() call.

Now, let's get smarter. The variables foo and bar do not change value, and we only work with two variables in the routine. Therefore, we could optimize by storing those values in temporary registers instead of using expensive memory transfers. Further, since our two constants are being compared and will always compare false, we actually have a section of code -- the printf -- that is dead code and can be completely removed from final compilation. Well, that's simple, and everyone who uses C in production at least turns on some minor optimization:

  $ gcc -g -O1 example.c  # the only difference is the -O1

  Dump of assembler code for function main:
  0x0000000100000f34 <main+0>:    push   rbp
  0x0000000100000f35 <main+1>:    mov    rbp,rsp
  0x0000000100000f38 <main+4>:    mov    eax,0x0
  0x0000000100000f3d <main+9>:    leave  
  0x0000000100000f3e <main+10>:   ret    

This does not look like our C code at all! And thankfully so! What a waste of space and CPU time it would have been had we treated C like an interpreted language! C is a high-level language with numerous compiler implementations that can intelligently convert the human-readable code into the binary code that represents the real situation behind the code.

The point here is that you are not properly guessing the assembler code that will be produced. The compiler is doing a better job of that; that is the compiler's job. As a programmer, you can just focus on the algorithm. C is not an assembler macro language. For that, you would use things like "gas".


C is not assembly and hasn't been for a very long time. But I think when people use the phrase "portable assembler" they really mean that in C you both control the memory layout of data types very finely and that code maps very directly to an equivalent assembly construct. True, optimizers frequently change the actual executed code from what we expect, but C gives a very intuitive feel of what the "upper bound" assembly output is.

For example in C "array[0] = (x + y);" will never be more than a couple assembly instructions long. In many languages, including Haskell (and in the case of operator overloading, C++), the equivalent construct might map to hundreds if not thousands of instructions. Or it might map to the same one or two that C would emit. It's impossible to know and there is no reasonable upper bound on what could happen.


might map to hundreds if not thousands of instructions. Or it might map to the same one or two that C would emit. It's impossible to know and there is no reasonable upper bound on what could happen.

Over every possible piece of code that could be compiled anywhere, this might well be true. But for a properly informed programmer for a given piece of code, not so much.


>But for a properly informed programmer for a given piece of code, not so much.

There are a couple of reasons why, even for "informed" programmers, this is still important:

1) For most dynamic languages, even simple operations can take a highly variable amount of time to execute. How many instructions does an array access take in Javascript? The answer depends on everything from the state of the JIT to the types involved, both of which are usually impossible to know beforehand. In C we can answer this pretty easily.

2) The modern trend is towards writing more and more generic code. Even for statically compiled languages like C++ and Haskell, the actual underlying operations are purposely* abstracted away from you. Unless you know every possible instance that your code could be used it is impossible to know how long any operation will take.

And all this is assuming that the programmer knows everything about their compiler, assembler, standard library, imported libraries, etc., which isn't true for all but the most expert programmers.

*Admittedly, the actual length of time it takes is dependent on the state of the processor which can be very difficult to predict, but we will have a lot more information than we would have had otherwise.


You need to take both the "informed" and "given." Not all pieces of code are "cross platform" and even within that, there's different levels.

In other words, you're talking about one end of the spectrum. You are right, though, that things are moving in that direction.


In general, I agree with you. I just feel the need to expand on a few points.

> in C you both control the memory layout of data types very finely

True to an extent.

> code maps very directly to an equivalent assembly construct

True but less and less relevant.

Here's where C disconnects you from the processor in the ways that matter most:

1. malloc()/free() are too high-level: You can't control where the allocation subsystem gets your next chunk from, you can't see whether your malloc arena is getting full, you can't see whether you're about to double-free something, and you have no way to recover from a failure to allocate (if that's even possible on your OS).

2. C has no concept of cache; admittedly, assembly usually tries to hide it from you to an extent as well, but assembly language at least has hooks into the cache hardware in the form of memory barriers. C doesn't even have that much.

3. C completely hides the processor status word from you. A minor concern, usually, except in precisely the kind of tight loops people most advocate C for.

4. C has no concept of out-of-order execution or opcode pairing or pipelining in general. Just hope your compiler does.

So, added up, that means C is farther and farther from the hardware all the time. It was reasonably close on the PDP-7 where it was born, was fortuitously even closer to the PDP-11 where it was later implemented, and remained fairly good for a while after, but once you get to dual-core superscalar designs with cache hierarchies and SIMD hardware, you have to rely on the compiler to turn your C into good assembly. Which is, really, a lot like what you do when you write Haskell.


"I know that the computer is comparing one value to another"

Actually it isn't.


It seems to me that machine learning has advanced to the point, where those "obligatory" points made again and again in things like language efficiency threads could actually be automated.

What if we used a Bayesian filter for threads where such "obligatory" points on both sides of a heated debate are made again and again, then used machine learning to post a well-written and well-curated set of "obligatory" comments? This would save a lot of man-hours online.



