This looks much nicer than the wasm2c output for that binary. I compiled it with `clang wasm.c -c -target wasm32 -O2` just like in the instructions (I'm on LLVM 10), and used the latest wasm2wat with `wasm2wat -f wasm.o` and got this instead:
Fair enough. I’m still surprised at just how unreadable (for me) the wasm2c output was, though. The compiler must have done quite a bit of optimizing that wasm2c was unable to undo.
It doesn't actually try to undo anything, it just translates Wasm instructions 1:1 (they're in your link at lines 205-221). wasm-decompile does try to "undo" some thing, but it is generally impossible given LLVM's optimized output and how low-level Wasm is (see also article).
This is fascinating. For various reasons, WASM is less like a target bytecode format and more like a peculiar IR for compilers. I’m sure this has all sorts of effects on the tooling.
WASM instead provides traditional control structures. So the compiler either has to preserve control structures through to the IR, or has to work backwards from basic blocks to control structures. Both options are undesirable, from the perspective of compiler writers, and would be unnecessary if the VM were a greenfield project.
Nested, structured control can be broken into basic blocks without flattening it into if/goto first. It sounds easier, in fact.
In fact, the construction of a basic block from flat code is a form of recovery of control structure. If the original code had nested loops, nesting will emerge in the basic block graph.
A recursive traversal of the original structure could produce that graph more directly; it doesn't have to walk a flat list of instructions asking, "does this have a label on it which is the target of a branch".
without having to walk a flat list of instructions asking questions like "is this the target of a branch".
For instance, if we walk an if/then/else AST node, we can recursively get the basic block graph for the test, the then and else part, and then integrate that into a larger basic block graph according to a rigid pattern.
I get the impression WASM is really a very clunky representation, far from what any greenfield project would ever have arrived at. That is to say, its decisions aren't just tradeoffs that people disagree about, they're simply inappropriate for what it's trying to do. More than once I've encountered comments and blog-posts lamenting its fundamentals, e.g. [0].
I don't think that's true. It was designed for its purpose by people that knew what they were doing based on experience gathered from implementing previous incremental steps including emscripten's original JavaScript output and asm.js (a formalization of some techniques emscripten employed). The design rationale is right there in the spec repository:
https://github.com/WebAssembly/design/blob/master/Rationale....
It was built to be a better target than asm.js for compiled languages to run in JavaScript VMs and it seems to have succeeded on that front. That it's not a perfect fully-general bytecode format doesn't seem like a real knock against it, that's a much harder problem to solve. In any event it seems to be getting a lot more traction than any previous entries in the space, which is exciting!
The criticisms in the linked article are simply not grounded in fact, and the article was obviously written by someone without any expertise in how modern compilers work. There are also a number of basic errors in the article.
Liveness information simply doesn’t belong in the bytecode. SSA is trivial to recreate from local mutable variables (it would make a good homework assignment for someone in an undergrad “intro to compilers” class). WebAssembly is obviously a register machine.
> With this, it becomes possible to get rid of locals entirely.
Both factually incorrect and pointless. There is no tangible benefit in getting rid of locals entirely. You are merely changing out one representation (register machine) for a different one (stack machine).
There are valid criticisms of WASM, but the linked article doesn’t have any.
The weird part of WASM is the control structures. The rest of it is a fairly sensible, actually rather nice register machine. You can see that older bytecode systems like the JVM are stack machines, but newer ones tend to be register machines. This isn’t because people are getting stupid, it’s because there are legitimate reasons to prefer register machines, and on the balance of things, my observations are that people with experience in the field tend to prefer register machines.
I would still consider WASM a stack machine and not a register machine. Yes, there are mutable local variables in WASM but Java bytecode has them as well - which you consider a stack machine. BTW the designers of WASM explicitly call WASM a stack machine here: https://github.com/WebAssembly/design/blob/master/Rationale..... With WASM's MVP it was necessary to store e.g. loop state in local variables, thanks to recent changes this doesn't seem to be necessary anymore. I think this was the main argument that blog post considered WASM to be a register machine. javac also makes heavy use of variables in bytecode, but somehow no one considers the JVM a register machine.
> my observations are that people with experience in the field tend to prefer register machines
That's actually the opposite of my observation, they seem to prefer stack machines.
The clang disassembly given in the article sure makes it look like WASM is a nested expression tree, which leaves the choice of stack versus register to the implementation.
The stack machine is a model for the semantics of wasm, in the sense that the safety properties of wasm are defined in terms of stack types rather than SSA of register properties (for those that are unfamiliar with stack machines, this stack is a different type of concept from the call stack, that in wasm is used store the local variables). Whether during execution it is better implemented as a register machine or a stack machine is an implementation detail.
Java has instructions like dup, swap etc. To me, that is the critical difference here, and where I draw the line between “stack machine” and “register machine”.
I have to admit this line seems arbitrary to me. So WASM is a register machine to you but if they would simply add those 2 instruction would it suddenly become a stack machine then? Those instructions would actually be trivial to add. I think those terms are relatively well defined and when you argue that WASM is a register machine even though the inventors explicitly claim it's a stack machine you should have really good arguments for that. Personally I would be surprised if you could point me to any literature that supports your definition.
Turing completeness sounds like an arbitrary distinction to those outside of the field of CS, but it’s not.
To me, the distinction here is that the stack machine in WASM is restricted to the point that it corresponds 1:1 with an expression tree—not even a graph, just a tree. This means that every function in Web Assembly can be thought of as a collections of statements and expressions, and the stack machine abstraction is nothing more than a serialization format for the expressions.
Maybe dial it back a bit with the challenge to point at literature. The literature has not really caught up with the existence of WASM yet.
> Maybe dial it back a bit with the challenge to point at literature. The literature has not really caught up with the existence of WASM yet.
It wasn't me who claimed that WASM is "obviously" a register machine, despite the inventors saying otherwise. They even explicitly state that they decided against a register machine. I guess it's then reasonable for me to ask on what definition of stack vs register you are basing this opinion on. Let me be clear: I was not asking here for literature about WASM specifically but a definition of register/stack machines that supports your claim.
WASM's instruction encoding is very much based on a stack machine. Even with the initial limitations you mentioned I don't think it qualifies as "obviously a register machine". As already mentioned in multiple comments those restrictions were already lifted with the multi value proposal.
I understand that there is a grey area, but simply claiming "obviously a register machine" doesn't seem right to me. Implementation-wise WASM is a stack machine even if it needs/needed locals to be turing-complete.
The multi-value proposal breaks the ability to turn Wasm into expressions easily, and thus makes it even more of a stack machine than it already is. Dup and swap may still be added in the future.
A defining feature of a register machine is that the actual instruction encoding has direct references to source and destination registers in it. Wasm doesn't have those, it has explicit get_local instructions instead.
That said, if you turn off LLVM's WebAssemblyRegStackify pass, all LLVM IR's values will end up in locals, with little to no stack usage. Still no register machine, but a bit more of a grey area :)
Open any compilers book, WASM will look alien. Control flow analysis is normally done through basic blocks. If you make a toy language and don’t do control flow analysis you won’t notice what’s so weird about WASM. If you take an existing compiler and port it to WASM, you’ll cry, then gather yourself together and read how the relooper algorithm works.
The reason wasm is like this is that with "normal" control flow chrome would have had to implement itself the relooper algorithm internally before the MVP release.
Overall the main negative point seems to be that compilers toward wasm are more complex, but under the constraints of the web moving complexity from the client looks like a valuable positive point.
Two other negative points would possibly be performance and code size (which is the reason for a if-then-else specific instruction for example) about which I know very little. Do you think it would have made a difference here?
There’s no good reason for WebAssembly to be SSA. That is a good enough reason for it not to be.
The purpose of SSA is to make it easier to write optimizations. However, you wouldn’t be optimizing WASM anyway, you would necessarily translate it first into some kind of IR suitable for optimization passes. Might as well convert to SSA at that point, rather than bloat the bytecode by exposing implementation details of the target.
Remember that WASM’s purpose is to be portable and safe. Making it more complicated just in order to make sophisticated back-ends slightly faster is a net loss. Using SSA would also make naïve/simple backends slower.
The control structure as nested block are a must be for wasm, because chrome compilation model of Javascript is incompatible with an LLVM-like goto-based control flow.
This is super handy. Pseudocode is very useful for understanding flow - so much more than actual assembly. I've always found it an order of magnitude to understand bad asm-to-C decompilation from IDA or Ghidra over perfect disassembly.
AssemblyScript would certainly do worse at #2, and possibly also at #1. To be translate to Wasm or from Wasm lead to different optimal designs, see for example how these two systems deal with loads and stores.
It would be nice if the decompiled output were runnable through an interpreter so you could step through it with a debugger of some kind and rename or annotate the variables and functions as you reverse engineer what is going on.
I notice that your code supports the 'name' custom section as expected, and furthermore you support a few other custom sections too - 'dylink' for example. Where did you find the documentation for these sections? The reason I ask is that I don't believe the official webassembly specs talk about those sections, so I guess they are somewhat compiler specific perhaps?
One more question please - does this tool support naming of global variables? The official wasm documented 'name' section only supports local variable names I think?
It can pick names from the name section, linking section, or import/export name, in that order of preference (see https://github.com/WebAssembly/wabt/blob/master/docs/decompi...). In the case of globals, the only way to name a global is thus if its imported or exported.
I'm new in wasm code base. Where is export code generation located? How complex to rewrite export?
I want to make export wasm to .acpul programming language for run wasm modules on animation cpu platform. Link to architecture schemas & docs will be a great.
Loving the tooling around wasm getting better. I've been debugging my compiler output with hexl-mode and reading the binary format and while it's not that bad, it'd be nice to do more advanced debugging with a text format.
There was a project I saw too that intended to visualize WebAssembly's execution. That'd be extremely helpful too
> reading the binary format ... it'd be nice to do more advanced debugging with a text format.
Do you know about `wasm2wat` (from the WebAssembly binary toolkit, "WABT")? It produces a 1-to-1 text representation of the bytecode and is meant to always roundtrip via `wat2wasm` back to the same bytecode.
Ah, no, probably doesn't do parsing recovery. But `wasm-validate` from the same toolkit will at least tell you the offset at which your wasm file has an error (I just flipped some bits in a wasm file to test this), which may be helpful!
When I first started learning JavaScript in the late 90s, the primary way I learned new things was from reading other peoples code in my browser. Nowadays this isn't as easy since you often have to run obfuscated code through a prettifier to get it back into a human readable format, but it is still possible with some effort. I was concerned that WASM would make this impossible (despite the stated goal of "Be readable and debuggable — WebAssembly is a low-level assembly language, but it does have a human-readable text format (the specification for which is still being finalized) that allows code to be written, viewed, and debugged by hand."), but WASM-decompile gives me hope.
> Its #1 goal is readability: help guide readers understand what is in a .wasm with as easy to follow code as possible. Its #2 goal is to still represent Wasm as 1:1 as possible, to not lose its utility as a disassembler. Obviously these two goals are not always unifiable.
> This output is not meant to be an actual programming language and there is currently no way to compile it back into Wasm.
Actually I thought about implementing a programming language which is a bit more pleasant to work with than raw wat format, but which still translated roughly 1 to 1 to wasm. Something like in this link, actually. But that seems outside of my capabilities and I'm not sure if it's really useful to anyone.
The great thing about AssemblyScript is it makes it possible to share some of the same code and interfaces and data and tools between JavaScript and WebAssembly.
If you're already developing in TypeScript, WebAssembly is a good way to generate WASM code that interoperates nicely with it, which you can't do with plain old JavaScript.
I've implemented one but still need to get back to it to finish it... WASM is one of the easiest targets out there, hence so many languages already targeting it. But that will change once some of the WASM proposals become a standard, especially things like GC support and WASI, which will take a team (and long term investments) rather than a lone dev to implement: