Just-in-time compilation... I think it should be called "continuous profile-guided compilation" instead; that better describes the awesomeness that happens...
JITs do more than just profile-guided optimizations. Their secret weapon is speculative optimizations that mean they don't need to work hard (and often fail) to prove the soundness of certain optimizations. They're allowed to guess and be wrong.
Another advantage they have is that they can focus their optimization cycles on hot code.
AOT compilers can't afford to run optimization passes in a loop (inline, optimize, inline, optimize, ...) until they reach a fixed point, because that would blow up compile times if applied to the whole program.
This almost never matters. You just start at the leaves and go up and then you're done. Most people aren't interested in complicated superoptimizations, because a predictable compiler is more important.
That's not true. A lot of very simple and useful optimizations are very hard to prove correct (e.g. devirtualization) and so can't be done with AOT compilers. It doesn't matter to people using languages that require the programmer to carefully control the compiler -- like C/C++ or Rust -- but it matters a great deal to languages that offer a smaller number of more general abstractions. It is virtually impossible to compile, say, JavaScript efficiently with an AOT compiler, but when compiled with a JIT it can have excellent performance.
This isn't speculation as in speculative execution, it's speculation as in speculating that a condition is true that cannot be proved to be true, so it's not the same thing.
An example of this kind of speculation is speculating that there will only ever be one thread in a system, and removing locks. If that speculation ever proves to be wrong - a second thread is created - the locks are put back into the system.
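The shape of that speculation can be sketched in a few lines of Python. This is a toy model only (and not race-free): real lock elision happens at the compiled-code level (cf. the JVM's old biased locking); the class and its names here are made up to illustrate the guard-and-deoptimize shape.

```python
import threading

class SpeculativeLock:
    """Toy sketch of lock elision: speculate that the process stays
    single-threaded and make locking a no-op; if a second thread ever
    shows up, "deoptimize" by installing a real lock.
    (Illustrative only -- not a correct concurrent lock.)"""

    def __init__(self):
        self._owner = threading.get_ident()  # the thread we speculate on
        self._real_lock = None               # None = speculation still holds

    def acquire(self):
        if self._real_lock is None:
            if threading.get_ident() == self._owner:
                return                       # fast path: no locking at all
            # speculation failed: a second thread appeared -> deoptimize
            self._real_lock = threading.Lock()
        self._real_lock.acquire()

    def release(self):
        if self._real_lock is not None:
            self._real_lock.release()
```

As long as only one thread ever touches the lock, acquire/release cost almost nothing; the first foreign thread pays the deoptimization cost once.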
That doesn't relate to Spectre: when the speculation is reversed, the whole program is first brought to a safe halt, so it isn't fine-grained enough to be useful for Spectre.
Unrelated, it is true that compilers need to be aware of Spectre-like vulnerabilities, and Graal does include experimental support for that.
Ok, but in a multi-user system (e.g. a webserver), if user 1 triggers a (de-)optimization, then user 2 can tell that a previous user was in that code path. Now I don't know how to extract useful information from that fact, but it shows that at least some information spills over the user boundaries.
Oh I see what you mean - Spectre-like rather than specifically Spectre. Yes I suppose specialisation (more generally than speculation) could leak information, in the same way as cache status can leak information.
For example, a JIT compiler can, based on runtime information plus inference, discard branches of a function that are effectively dead, and possibly even inline the result if the function is deterministic for those inputs.
Of course, the compiled function won't actually work with different but still valid arguments, but that's not really a problem: as the function is called, the JIT will simply detect that the already-compiled version doesn't apply and compile a new version of the function for the new types just beforehand. A pure ahead-of-time compiler couldn't optimize so aggressively, since it would lead to an exponential explosion of possible input combinations, most of which will very likely never happen.
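That guard-and-recompile shape is easy to sketch in Python (a toy model with made-up names; a real JIT emits machine code, this just models the control flow):

```python
def generic_add(a, b):
    # the general, slow path (think: the interpreter)
    return a + b

# Suppose profiling shows this call site has only ever seen ints.
# The speculatively "compiled" version bakes that in, behind a guard:
def compiled_add(a, b):
    if type(a) is int and type(b) is int:  # guard: speculation holds
        return int.__add__(a, b)           # specialized fast path
    # guard failed: fall back to the generic code; a real JIT would
    # also recompile a new version for the newly observed types
    return generic_add(a, b)
```

The compiled version is still correct for all inputs; it is merely fast only for the inputs it speculated on.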
I guess it depends on the perspective and interpretation of soundness. If a JIT compiler AOT-compiles your entire program but infers that a function of yours, whose logic works for every number (as you defined it using the Number interface within the rules of its type system), will only ever see 32-bit integers, then it will compile code that effectively does not uphold the property that was established. The fact that it will stop execution the moment it reaches an invalid path and correct it doesn't change that.
It could be and that would be uninteresting. But it's not hard to come up with a (contrived, limit-casey) optimization approach that does actually make guesses about soundness.
Let's say you wanted to optimize a short instruction sequence with a small domain of inputs. You could try to generate all (or at least, zillions) of similarly-sized possible instruction sequences and check them for soundness and performance. Now you're really making soundness guesses. Do real JITs actually make that sort of soundness guess (not that kind of attempt at optimization, obviously)?
All the time, and BTW, I didn't say JITs sacrifice soundness but that they don't require proof of soundness. That's different as I'll show.
Let me give you two common examples: virtual calls and branches. A JIT will speculatively devirtualize and inline a virtual call at a particular callsite if it has only encountered one or a small number of concrete instances, even if it can't prove that those are the only instances that can be encountered at that callsite. This is still sound because the JIT will emit a trap that will trigger if an unknown target is ever encountered, in which case it will deoptimize the compilation, go back to the interpreter and then compile again under new assumptions. Another example is branch elimination. If a JIT only ever encounters the program taking one side of a branch, it will only compile that branch (and introduce a trap), even if it can't prove that only that side will ever be taken.
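The devirtualization case can be modeled in a few lines of Python (purely illustrative; the classes and function names are made up, and a real JIT does the class check and inlining in generated machine code):

```python
class Dog:
    def speak(self):
        return "woof"

class Cat:
    def speak(self):
        return "meow"

def call_speak_interpreted(animal):
    # full virtual dispatch on every call
    return animal.speak()

# Profiling showed this call site has only ever seen Dog, so the JIT
# speculatively devirtualizes: one cheap class check, then the inlined body.
def call_speak_compiled(animal):
    if type(animal) is Dog:        # guard: the only receiver seen so far
        return "woof"              # Dog.speak inlined into the caller
    # trap: an unknown receiver appeared -> deoptimize to the generic
    # path (a real JIT would then recompile under new assumptions)
    return call_speak_interpreted(animal)
```

Note the speculation can't be *proved*: nothing stops a `Cat` from arriving later. The trap is what keeps the whole thing sound anyway.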
Thanks! I did (eventually) figure it out, I initially misread it as something like:
1. Jettison soundness
2. ???
3. Performance profit.
Which seems like witchcraft, then again JITs are full of witchcraft. But it's also not what you wrote. I've now come to understand the two chief weapons of the JIT remain surprise, fear, ruthless efficiency and an almost fanatical devotion to the Pope.
A JIT compiler could detect that an instruction sequence (or function) is pure (for some range of valid inputs) and auto-memoize it for performance gains. But if you want the JIT to evolve the compiled representation by profiling some fitness measurement (performance) and condition (soundness), that will most likely not happen any time soon. The JIT compiler has to balance compile-time execution against runtime execution: if it spends 5s generating a program that runs in 2s when it could spend 1s generating one that runs in 4s, then it is not a good compiler at all. And above all else, JIT compilers, even the notoriously aggressive ones, still need some degree of predictability. If users of a language can't predict its performance, they can't reliably improve their code's performance.
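The auto-memoization idea is easy to sketch, assuming the purity speculation has already been made (here just with the stdlib's `lru_cache`; a real JIT would also need a guard or trap for the case where the purity assumption turns out wrong):

```python
from functools import lru_cache

def slow_pure(n):
    # stand-in for an expensive but pure (side-effect-free) computation
    total = 0
    for i in range(n):
        total += i * i
    return total

# If the compiler can prove (or speculate) purity, memoization is just
# wrapping the compiled code in a cache keyed by the arguments:
memoized = lru_cache(maxsize=None)(slow_pure)
```

Repeated calls with the same argument then hit the cache instead of re-running the loop.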
I'm mostly talking about production-ready stuff, though such work certainly makes for a fun playground. The Julia JIT (one of the notoriously aggressive JITs, for good and bad) allows users to add new context-aware behaviors to the compiler at runtime [1], and people have used it, for example, to experiment with auto-parallelization of code and generally to manipulate the code generated by the compiler. That was basically what got me into the language. So you could probably make a library that injects some weird, risky optimization that abuses the type system.
Maybe I'm getting tripped up in the terminology here, but to me this case is still a JIT jittin': you look at runtime data and decide it's worth it to crank out a special-case optimization for the input of 2. You produce that optimization, which is sound, along with a check to make sure it is applied only in the special case. You get to defer other optimization. The advantage here still seems to come from the runtimeness of things rather than from being clever about soundness, and there's really no guessing about soundness. So perhaps that's not it.
I think the point is that some JITs never do this kind of optimisation - they just produce the same code an AOT compiler would, but at runtime. Such as the .NET JIT.
I don't think that's the point the comment I'm replying to is making, or at least, it's not the point I'm asking about.
Edit: Your example in the other comment about the locks is the sort of thing I'm asking about. There, an optimization is made which is sound under some specific conditions and then unmade when those conditions change.
I think that is indeed the point pron was making, or at least similar. You can't actually ignore soundness, but JVMs sometimes go farther than I'd expect. (Example: don't check for null, just handle SIGSEGV if null is "very rare")
Yes, on re-reading the thread again it might be entirely (or almost) about language. As in, it's really something along the lines of 'The power of the JIT approach comes from runtime information and dynamism. But you can also be a 'just' a JIT without making use of any of that'. And I'm getting stuck on 'secret weapon [...] soundness' and imagining some unfathomable-to-mortals ninja something.
Maybe I'm only familiar with "the main one" and mono... Are there other .NET VMs?
If I recall correctly, it will do constant folding, but won't speculate that a certain parameter, while not constant at compile time, is essentially always constant at runtime.
An easy example is a config loaded from a file as the server boots but never changes for the lifetime of the process. That won't constant fold without speculation.
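A toy Python sketch of that speculation (the config and all names here are hypothetical; a real JIT would bake the constant into machine code and invalidate the compilation on change):

```python
# A "config" read once at startup. An AOT compiler can't fold it,
# since the value isn't known at compile time, but a JIT that has
# only ever observed one value can speculate that it's constant.
CONFIG = {"multiplier": 3}   # hypothetical config loaded at boot

def scale_interpreted(x):
    return x * CONFIG["multiplier"]   # dict lookup on every call

# Speculatively "compiled" version: the observed value is baked in,
# guarded by a cheap check in case the config ever does change.
def scale_compiled(x, _seen=CONFIG["multiplier"]):
    if CONFIG["multiplier"] == _seen:  # guard: speculation still holds
        return x * 3                   # the constant, folded in
    return scale_interpreted(x)        # deoptimize on change
```

In the common case (the config never changes), the guard is the only overhead, and in real machine code even that check can be replaced by invalidating the compiled method.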
There is the old style JIT, RyuJIT introduced with .NET Framework 4.6, MDIL, .NET Native, Mono, .NET CF, IL2CPP, and the research ones from Singularity and Midori.
So while it is hard to state what each AOT/JIT compiler is capable of, naturally they aren't 100% all the same.
15 years ago there was no RyuJIT (which replaced the JIT you learned from), no MDIL (Windows 8/8.1), no .NET Native (UWP), no IL2CPP (Unity), and none of the research ones from Singularity and Midori.
As for the need for it: they have been trying to make C# more relevant for the kinds of workloads where C++ dominates, and to get among the first places at TechEmpower.
So .NET has been getting Modula-3-like low-level handling of value types within a GC environment, and RyuJIT is now tiered and supports SIMD and some automatic vectorization.
.NET Framework 4.6 got the first version of what is the .NET way of doing AppCDS.
There are a couple of blog posts regarding RyuJIT improvements with each release after its introduction.
So which of these implementations does speculation? I remember when RyuJIT came out it still wasn't speculative - has that now changed?
If you read the blog posts, they always talk about speculation being something they may try in the future. I've not seen anything where they say they went ahead and implemented it.
Seems like they started trying speculative optimizations about six months ago. Speculative optimizations are not only the foundation of Graal but also of C2, BTW.
Some people differentiate between just-in-time compilation, which is for example what .NET does (or did, last time I checked): literally compiling the code as an AOT compiler would, just at the last second before executing it for the first time; and dynamic compilation, which is for example what Graal does: compiling based on runtime conditions, possibly multiple times with different results as the program executes.
.NET is the platform. There are different implementations for it doing different things.
JIT compilation is still different from AOT even without profile-guided optimizations. Simple example: in AOT code you can't easily embed pointers, which is often solved with indirection (e.g. something like the GOT in ELF).