Dead Code Elimination for Beginners (blog.mozilla.com)
121 points by kinetik on Nov 18, 2010 | hide | past | favorite | 57 comments


I highly recommend reading this article.

Although it does point out one reason that JS is a poor language to be the foundation for client-side web computing: It's extremely difficult to optimize a language this dynamic. I honestly think we could be leaving 50% perf improvement on the floor due to the dynamic nature of the language.

And this is on top of the fact that you already have to reduce the amount of time spent optimizing since you don't compile the code ahead of time (most optimistically using a JIT of some sort).

Can we please move away from JS and move to a nice IL?


> JS is a poor language to be the foundation for client-side web computing

> I honestly think we could be leaving 50% perf improvement on the floor

Unfortunately, humans have not gotten significantly faster (or smarter) in the last couple thousand years, whereas computers get twice as fast every 18 months or so.

Dynamic languages, by placing the burden of optimization on the compiler/runtime, are making the right bet. Static-typing is little more than a human doing the machine's work, often at the expense of introducing subtler bugs (the compiler catches the obvious mistakes, and that may leave programmers with a false sense of security).

> Can we please move away from JS and move to a nice IL?

As for moving to precompiled, possibly obfuscated scripts, I would much rather concede a little less dynamism on the language and move the compiler into the browser. You could, conceivably, declare the type of arguments and return values and rather easily, in that case, perform some optimization as the code is parsed.

I have no love for JavaScript. It's a little tortured language that I am sure passed through a lot more marketing specialists than needed before taking its current form and bears the scars from that process. It has, however, some good parts that are very good.


This is a false dichotomy. You can have a dynamic language that isn't as insane to optimize as Javascript (or Python).

Do you really need to be able to override the valueOf method on an integer?

Lua manages to be both dynamic, and to play with the big compiled boys on the Compiler Shootout with its JIT. I don't know the details but I suspect it is because its dynamicity is carefully designed with an eye to the cost/benefit tradeoffs, instead of always choosing to make things dynamic regardless of the costs. How much time does a straightforward implementation of a conformant Javascript interpreter actually need to spend looking up and executing "valueOf", anyhow? The vast bulk of those will be wasted.
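To make that cost concrete, here's a hypothetical snippet (not from the article): arithmetic on a boxed number consults a user-replaceable valueOf, so an engine can't fold the multiplication without first proving the prototype chain is untouched.

```javascript
// Sketch with made-up values: overriding valueOf on Number.prototype
// changes what arithmetic on a boxed Number produces, so an optimizer
// must guard against exactly this before constant-folding.
let calls = 0;
Number.prototype.valueOf = function () { calls += 1; return 42; };
const n = new Number(7);  // a boxed Number, not a primitive
console.log(n * 2);       // 84 — the override ran, not 14
console.log(calls);       // 1
```

Note that primitives (`7 * 2`) skip valueOf entirely, which is part of why engines special-case them.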

With the nature of Javascript, even "optional type declarations" are easier said than done; you need to specify not just "an int", but "an int whose prototype you have not mucked with, nor any of the prototypes of its parent objects", and that's actually a declaration that intrinsically fights against the nature of the language. Actually propagating such things out will eventually create an implicit dichotomy matching the much-maligned Java difference between objects and primitive types, only in a language where it was bolted on a decade after the fact instead of at least being in there since day one (of public release).

It's an interesting enough language but it's not really suitable for use as the base level of a tech stack. Nevertheless, it will be forced into that role. (It's not like there's actually a choice of a really well-designed tech stack out there that can replace the web, it's all various incoherent pieces bashed on by monkeys until it mostly works most of the time. Not really trying to be sarcastic here, this is a realistic assessment, IMHO.)


> Dynamic languages, by placing the burden of optimization on the compiler/runtime, are making the right bet.

It is a good idea to let the compiler/runtime do work for you.

But highly dynamic structures make some kinds of static analysis (which the compiler needs to do to optimize) really hard and often undecidable. We need a pretty good (hotspot like) JIT in the runtime to compensate.
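A tiny illustration of the problem (hypothetical code, made-up names): whether a write is dead can depend on a value only known at runtime, which no static pass can decide in general.

```javascript
// Whether `obj[name] = 99` overwrites `a` (making the initializer a
// dead store) depends on a runtime string, so a static compiler cannot
// eliminate anything here without speculation and a guard.
function readA(name) {
  const obj = { a: 1, b: 2 };
  obj[name] = 99;
  return obj.a;
}
console.log(readA("a")); // 99 — the write was live
console.log(readA("b")); // 1  — the write was dead with respect to the result
```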


> Static-typing is little more than a human doing the machine's work

Interesting. I look at (good: e.g. ML or Haskell) static typing as the machine doing the work that a human would have to do in a dynamic language. Especially type inference.


I was going to write the exact same thing. As long as I don't have to explicitly declare the type of variables and functions, I don't really care whether it's because the compiler infers the types or because the language is dynamic.

I don't write much Javascript, but I do use Python quite a bit. I'd say that most of the code I write is probably just statically typed code without declarations. That's not a bad thing, because it makes it easy to take advantage of a JIT (either psyco or pypy) or, if the JIT approach isn't fast enough, to statically compile with cython by adding declarations.


There are cases when it makes sense to use static typing or, at least, to stick to certain types for input and output.

In my experience however, those cases are not many and doing it when you don't have to creates unneeded complexity.


There are cases when it makes sense to use dynamic typing (e.g. dealing with XML and friends).

In my experience however, those cases are not many and doing it when you don't have to creates an unneeded burden on the programmer.

(sorry -- this is probably not to HN's taste, but I just wanted to show how an entirely different perspective can also be valid)


Unless you intend to strictly enforce types (rather than validity) on your dynamic code, there is no burden on the developer.

Now I am curious: what do you do? I do mostly server-side web stuff, but I have done embedded/driver and can see situations where you really need a 3-byte unsigned integer to stay a 3-byte unsigned integer. But, apart from that, I can't imagine other cases.


Unless you intend to strictly enforce types (rather than validity) on your dynamic code, there is no burden on the developer.

Of course there is. I program more in dynamic languages (particularly JS) than in statically-typed ones, and I always keep worrying about passing representation A of some data rather than representation B. I find myself having to look at the source code of callees far more than I think I should be.

On the other hand, my academic life revolves around statically-typed languages (Haskell). The way Haskell's type system is designed means that there are entire programming paradigms that are not just available but production-ready in Haskell, but are virtually impossible in any other popular language. For a concrete example, see software transactional memory.

I'm also a huge fan of pushing as much lower-level drudgery as possible to machines to free up human minds for higher-level thinking, and a good type system combined with good type inference goes a long way.


I am not familiar with Haskell typing - Haskell is on my "languages to learn" list right after Clojure. I just need an excuse to move it up. Lately I bumped the project in favor of a "learn to use Google's App Engine" project.

It's interesting you feel like this with JS. I spend most of my time with Python and seldom have any doubts about what I should do. I usually make few assumptions about the type of the object I am receiving and don't think much about it.


Do look at Haskell whenever you can. Haskell (and ML/F# to an extent) pushed me from the "static typing is a PITA" camp to the "static typing done right is a magnificent benefit to programming" camp.


I wonder to what degree this depends on the individual programmer. I had a class in University that used both Ruby and O'Caml (only the more functional aspects were allowed for use there) for different projects.

We did Ruby first and O'Caml afterwards (we were to implement a scheme interpreter for the O'Caml project). Anyway, the impression I got was that most students hated O'Caml and loved Ruby, but while it was harder to get used to working in O'Caml I loved what the typing system did for me.

Every time I'm working in a language like Ruby and suffer from a bug a type system would have caught, it frustrates me. Perhaps other programmers get frustrated every time they have to do extra work to satisfy type inference that they wouldn't have to do in a language like Ruby.

I prefer my frustration up front, and honestly, found that O'Caml told me, "You didn't think of this nimrod" mostly when I needed it, and rarely when it was just being a pain.


I don't understand why you need monads to get STM. orElse is neat, but there are other ways to do the same thing in imperative languages.


I quote Simon Peyton-Jones: "STM requires runtime logging of every memory read and write. In an imperative language (such as C#), in which the very fabric of computation involves reads and writes, STM is bound to be expensive. [Lots of work has been done to omit redundant logging based on program analysis, but it's still expensive.] When do you need readTVar/writeTVar? Answer, precisely when you are manipulating state shared between threads."

http://www.0x61.com/forum/programming-haskell-cafe-f245/is-t...

It looks like STM works better for functional languages: you have to mark shared data somehow, and then you have to mark the code that manipulates shared data somehow (so it won't prematurely launch missiles). Then you've reinvented the monad.

I reserve the right to be wrong, so I'd like to see what the options for imperative languages are.


Oh, tons of work has been done in trying to bring STM to imperative languages, most notably with STM.NET (which is what SPJ's referring to there).


With the same performance, static protection against irreversible side-effects, and clever aversion of the privatization problem? I'm not aware of any. While STM can be theoretically done in any memory-safe imperative language, Haskell -- mainly because of its purity and type system -- seems to be the only language in which it can be fast enough and correct enough for practical use.

edit: well, Clojure restricts mutability too, so STM is sort of tractable there, but it doesn't get the other benefits that Haskell's type system gives you.


Static typing (and especially type inference) is useless in a web scripting language. Every time you load a new script, you would have to re-compile all the previously loaded scripts as a unit, and in the worst case throw away all existing state (pretty much equivalent to reloading the page).


I don't think it is possible to replace JS with a nice IL in the near future, due to the momentum it has gathered all these years. However, is it possible to suggest a strict subset of JS that is easier for the interpreter to optimize? If that can be done, I am happy to write code with less flexibility that runs thousands of times faster.


Neat idea. Which subset of the language could be better optimized by interpreters?


A subset is hard to define, I think; it seems that would have many repercussions. I like JoachimSchipper's suggestion above, which adds a construct in which the programmer promises not to do any funny stuff to certain variables, so that the optimizer can go to town on them.


So like ECMAScript 5 strict mode?


Looks something like it; I didn't know about it. I don't know if it protects against the things mentioned in the article, though: the descriptions I've read (admittedly only cursorily) don't say that those things (like the Object.prototype.valueOf = function() example given) are disallowed under strict mode.

Maybe it is, in which case this would be great it seems. Not that I have intimate knowledge of it, but still ;)
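For what it's worth, a quick check (hypothetical snippet) suggests the concern is justified: strict mode does not forbid mutating built-in prototypes, so the article's valueOf trick still works under it.

```javascript
"use strict";
// Strict mode bans implicit globals, `with`, octal literals, etc., but
// assigning to a writable property of Object.prototype is still allowed.
Object.prototype.valueOf = function () { return 42; };
const obj = {};
console.log(obj + 0); // 42 — the override runs even in strict mode
```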


You can get most of the benefits of not being ridiculously dynamic without switching to a new language. Take a look at http://www.iro.umontreal.ca/~gambit/doc/gambit-c.html#Defini...: Gambit Scheme (a Lisp) has a (declare (block)) form to tell the compiler that all variables defined in the current file will not be overwritten outside the current file, and a (declare (standard-bindings)) form to tell the compiler that the standard Scheme functions are not overwritten. With proper use of these options, Gambit is quite fast.

Javascript could have a "fixate foo bar qux" command that did something similar. Of course, for best results, it would need to fixate the current foo bar qux, as Javascript frameworks seem quite keen on overriding the standard functions. I'm sure it would be a pain to implement, but...
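Until then, a closure can approximate the "fixate" idea by hand (a sketch with made-up names): capture the current bindings once, so later monkey-patching can't reach them and the engine sees stable local bindings.

```javascript
// "fixate"-by-hand: Math.sqrt is captured at definition time, so the
// inner function's binding is immune to later global mutation.
const hypot = (function (sqrt) {
  return function (x, y) { return sqrt(x * x + y * y); };
})(Math.sqrt);

Math.sqrt = function () { return 0; }; // later overriding...
console.log(hypot(3, 4));              // 5 — ...doesn't affect the capture
```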


>Can we please move away from JS and move to a nice IL?

A thousand times this. The web is playing a bigger and bigger role in computing, but IMO it's hamstrung a bit by having Javascript as its "assembler". I think a lot of developers are just going to use some other language that compiles to Javascript anyway, so why not give us an "assembler" option?


You don't get the problem that needs solving. JS is hard to optimize, so even if you compile to it from another language, you're going to end up slow because JS is the bottleneck. If we had an IL that is easy to optimize for languages to compile to, we could get much faster speeds in the browser.


I think you didn't get the point of jbjohns' post. You're just restating his message (more clearly, though). Did you maybe overlook the 'hamstrung' and the 'why not give us an "assembler" option' at the end?


> I think a lot of developers are just going to use some other language that compiles to Javascript anyway

That's a disgusting thought...

What happens is usually developers who don't want to program in more than one language (or One True Language, as many of them may think) rely on such tools to generate an unreadable, sub-optimal and utterly messed up tangle of JavaScript code that kills a baby jesus every time it runs.

And who then blame JavaScript for their problems. And downvotes anyone who disagrees with them ;-)


That's just stupid. You can make a good source-to-source compiler that generates faster code than a normal dev would write.


Fine. Name one anything-to-JavaScript compiler that generates better, faster code than a "normal" developer would do.

I assume we have different interpretations of what's normal.


GWT


Have you ever read GWT-generated code? Have you ever debugged it?


Yeah, and I have been working with GWT for years. Faster code is not maintainable code, which is precisely why a mechanical optimizer makes sense. You wouldn't try to work directly with your minified JS, would you? But you do minify it, because it is a better experience for your users.


Minifying the code does not optimize it in any way. It just makes it more compact (and less readable).

I am not sure we are talking about the same thing here. What we are talking about here is you turning Java (and the most readable Java code is not that readable to begin with) into utterly unreadable JavaScript that's also less efficient than what a human experienced in JavaScript would be able to write, and larger than the minified version of whatever that experienced human would write.


"Minifying the code does not optimize it in any way. It just makes it more compact (and less readable)."

The thing with JavaScript is that the initial time for loading and parsing a script in the browser counts as part of its perceived performance. This is why every implementation (except for CL-JavaScript: http://svrg.net/CL-Javascript%200.10.05.html) has moved to JITs rather than ahead-of-time compilers. So minifying JS does have some performance benefit, in that it reduces load and parse time.


Unless your JS parser is absurdly broken, ignoring whitespace should not be such a computationally intensive task. The improvements you see must derive mostly from the size reduction and download time improvement.

And no. Minification of JS would not count as code optimization in any self-respecting comp sci course.


"Unless your JS parser is absurdly broken, ignoring whitespace should not be such a computationally intensive task."

It is, because every character the parser reads is a loop iteration at best and a state machine transition at worst. Minification does more than just deleting whitespace - identifiers are shortened too.

"The improvements you see must derive mostly from the size reduction and download time improvement."

The former yes, because more code fits in cache and you need less time to parse it; the latter no, because gzip compression does a better job than minification.

"And no. Minification of JS would not count as code optimization in any self-respecting compsi course."

This is why you learn about compilers by writing them instead of taking courses.
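To make the identifier-shortening point concrete, a made-up before/after (not output from any particular minifier):

```javascript
// Original: descriptive names, whitespace, one statement per line.
function computeTotal(itemPrices, taxRate) {
  let runningTotal = 0;
  for (const price of itemPrices) runningTotal += price;
  return runningTotal * (1 + taxRate);
}
// A minifier would emit something like this (same semantics, fewer bytes):
// function c(a,b){let t=0;for(const p of a)t+=p;return t*(1+b)}
console.log(computeTotal([10, 20], 0.1)); // ~33
```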


Thanks for the laugh


People using CoffeeScript aren't doing it to use a One True Language for everything.

[People using node.js however.. ;-)]


Well, Silverlight claims to give me the ability to write client-side Ruby and Python, but the last time I tried Moonlight it consistently crashed Firefox...


Silverlight != Moonlight



This is pretty damning for IE9 JS performance, isn't it? They are going to have to rework their static code analysis, and it sounds like they don't have the JS know-how. What else might they be incorrectly optimizing?


> This is pretty damning for IE9 JS performance, isn't it?

I think the whole point of the post is to poke fun at how Microsoft is cheating the benchmark and how it was caught. The part on the definition of Angles, where the "optimizer" should throw an exception but doesn't (most probably because it's not actually optimizing anything), is hilarious.


Mmm but this goes beyond cheating on the benchmark - that func(1,2) example is atrocious. You could (possibly) understand them having a direct match on the math-cordic benchmark to deliver a super-optimization, but they've also got general dead code removal wrong.


From what I see, it seems like they want to have a dead code remover for ie9 but don't know how to do it properly (because in the case of dynamic languages it is tricky). In the meantime, they have a wrong implementation that also detects a specific benchmark and cuts some corners with it. I could almost bet this has something to do with a performance evaluation being linked to a benchmark score.


The part on the definition of Angles, where the "optimizer" should throw an exception but doesn't (most probably because it's not actually optimizing anything), is hilarious.

What are you talking about? That code is getting optimized away into a no-op when it shouldn't be, which is why no exception is thrown. This means that

a. the optimization is definitely happening

b. it is unsound


Unsound in a very specific and suspicious way - yesterday it was pointed out adding a single, do-nothing, line to the middle of the function makes the "optimizer" fail.

I also think the "for beginners" in the title is somewhat directed towards whoever implemented the "optimizer".


Unsound in a very specific and suspicious way - yesterday it was pointed out adding a single, do-nothing, line to the middle of the function makes the "optimizer" fail.

That isn't unsoundness, that's incompleteness. An optimization being incomplete means that something that could be optimized isn't -- one being unsound means that something that shouldn't be optimized is. These terms come from mathematical logic.


Sorry. I am an engineer. I took it as "not solid", "wrong", "flaky". It appears all three apply perfectly to the optimizer at hand. It being both unsound (as it optimizes too much of a benchmark) and incomplete (as it doesn't recognize the invalidity of the code being optimized - the Angles thing - and fails to optimize when a no-op is added), we could just be more concise and settle for "broken".


I think you're still mixing up what "unsound" means. If I say that applying an optimization is unsound, it means that if I apply that optimization, I can no longer guarantee that the resulting code is semantically equivalent to the original code.

Being incomplete means the code you generate is not as fast as possible. Being unsound means the code you generate may be wrong. We don't normally call an optimizer "broken" if it does not optimize something even though we can recognize the optimization could be applied - no optimizer does every conceivable thing. But generated code that is not semantically equivalent to what was given violates correctness.


I think the "Angles" removal demonstrates the optimization is unsound. The "optimized" code gives out a result when a correctly optimized one would give a ReferenceError. This optimizer ignores invalid code (that references a non-existent thing) and happily short-circuits it.

The only reason to trust this code to do other optimizations well is all the evidence it's targeted at one specific benchmark.
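The behavior difference is easy to state in code (a stand-in for the article's example, not its exact source):

```javascript
// Calling this must throw a ReferenceError, because NoSuchIdentifier is
// undeclared. An optimizer that deletes the whole body as "dead" returns
// normally instead, changing observable behavior — that is unsoundness.
function shouldThrow() {
  NoSuchIdentifier; // hypothetical stand-in for the article's `Angles`
  return 0;
}
let threw = false;
try { shouldThrow(); } catch (e) { threw = e instanceof ReferenceError; }
console.log(threw); // true in a correct engine
```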


The "optimization" looks like a peephole replacement of a delimited range of tokens, rather than an actual optimization of a particular code pattern. I think it's a bit of a stretch to call it an optimization; it really is more of a hack.


What are the real benefits of dead code elimination on actual production code in real applications? It seems to me that only synthetic tests contain dead code (by intent), whereas dead code in production is by definition unnecessary and, by most coding standards, a bug.


It's important to be able to delete dead code even if your developers never write any, because earlier compiler optimization passes can create dead code. For example, consider this simple function written in a hypothetical dynamic language:

  function add(a,b) { return a + b }
Our hypothetical compiler turns this function into some IL that looks like this:

  function add1(a,b)
  {
    if(is_null(a)) throw ArgumentNullException()
    if(is_null(b)) throw ArgumentNullException()
    return low_level_generic_add(a,b)
  }
...where low_level_generic_add sums a and b if they're integers, or looks for a suitable overloaded operator+ and invokes it if they're not.

Suppose further that our hypothetical language has a tracing JIT, and that most of the time when add is called, a and b are both integers. In that case, the compiler might perform the following transformation:

  function add2(a,b)
  {
    if(is_integer(a) && is_integer(b)) return add_integers(a,b)
    else return add_generic(a,b)
  }
...where add_generic is just a copy of add1 above, and add_integers is a copy of add1 with low_level_generic_add() replaced by the less-expensive perform_integer_addition()

  function add_integers(a,b)
  {
    if(is_null(a)) throw ArgumentNullException()
    if(is_null(b)) throw ArgumentNullException()
    return perform_integer_addition(a,b)
  }
But suppose one more thing: let's say integers in this language are value types; that is, that is_null(some_integer) is never true. Then you can get another performance win in add_integers by removing the two null checks - but you can only find out about that if you have a dead code detector.
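Granting that value-type assumption, a sketch of what the dead code detector would leave behind:

  function add_integers(a,b)
  {
    return perform_integer_addition(a,b)
  }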

This has been your woefully incomplete dynamic compilation moment.


Not true: a compiler can find dead code where a developer could not.


The compiler itself may introduce a fair amount of dead code related to temporary variables created during code generation, and some optimizations can reveal dead code that previously looked live.



