I am partly responsible for why WebAssembly is this way. You can thank/blame me for the if-else bytecodes. They are indeed a form of compression, as they don't add expressive power. I measured carefully and they make a big difference in code size. That's why they're there!
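To make the code-size point concrete, here is a rough sketch (not the original measurement) that encodes the same two-armed conditional twice: once with `if`/`else`, once with only `block`, `br_if` and `br`. The opcode bytes are the ones in the Wasm binary format; the helper names are made up for illustration, and the sketch assumes the arm bodies contain no branches to outer labels (their label indices would otherwise need adjusting in the blocks-only form).

```python
# Opcode bytes from the WebAssembly binary format.
BLOCK, IF, ELSE, END = 0x02, 0x04, 0x05, 0x0B
BR, BR_IF = 0x0C, 0x0D
EMPTY = 0x40  # block type: produces no value


def with_if_else(cond: bytes, then_body: bytes, else_body: bytes) -> bytes:
    """cond; if; then_body; else; else_body; end"""
    return (
        cond
        + bytes([IF, EMPTY])
        + then_body
        + bytes([ELSE])
        + else_body
        + bytes([END])
    )


def with_blocks_only(cond: bytes, then_body: bytes, else_body: bytes) -> bytes:
    """Same behaviour using only block/br_if/br (hypothetical lowering).

    The inner block holds the else arm. A true condition leaves the inner
    block via br_if 0 and lands on the then arm that follows it.
    """
    return (
        bytes([BLOCK, EMPTY, BLOCK, EMPTY])
        + cond
        + bytes([BR_IF, 0])  # condition true: skip the else arm
        + else_body
        + bytes([BR, 1])     # else arm done: leave the outer block
        + bytes([END])       # closes the inner block
        + then_body
        + bytes([END])       # closes the outer block
    )


cond = bytes([0x20, 0x00])  # local.get 0
arm = bytes([0x01])         # nop
print(len(with_if_else(cond, arm, arm)), len(with_blocks_only(cond, arm, arm)))
```

In this sketch the `if`/`else` form costs 4 bytes of control overhead per conditional and the blocks-only form costs 10, before any general-purpose compression is applied.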
The structured control flow requirement is to benefit all consumers. It is only a burden on producers that come from CFGs, not from ASTs and other tree-like IRs. If you have a dominator tree, then you can generate structured control flow in linear time. LLVM does this a particular way, but there are straightforward algorithms.
No, this wasn't Google throwing around its veto power or something like that. There is a good reason why control flow is structured, as hinted in comments here.
1. Structured control flow rules out irreducible loops. Irreducible loops cause problems for all JIT compilers in all browser engines, not just V8, and even in JVMs. Things get really complicated, particularly in register allocation. [1]
2. Structured control flow guarantees a stack discipline for the use of labels, mirroring the stack discipline for values. This is not only a nice symmetry, it means that a consumer that needs to allocate space per label can reuse that space as soon as a control construct is closed. That is essentially optimal for use of consumer space resources.
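A small sketch of what point 2 means for a consumer, under simplifying assumptions: the instruction stream is modelled as plain strings and `("br", depth)` tuples, the implicit function-body label is left out, and `peak_label_slots` is a hypothetical helper, not any engine's code. Labels are pushed when a construct opens and popped at its `end`, so the space needed is bounded by nesting depth rather than by the total number of constructs.

```python
def peak_label_slots(ops):
    """Walk a structured op stream and return the peak number of live labels.

    ops holds "block", "loop", "if", "else", "end", or ("br", depth) /
    ("br_if", depth). Raises ValueError on malformed nesting.
    """
    labels = []  # stack of currently open constructs
    peak = 0
    for op in ops:
        if op in ("block", "loop", "if"):
            labels.append(op)          # a new label becomes live
            peak = max(peak, len(labels))
        elif op == "else":
            if not labels or labels[-1] != "if":
                raise ValueError("else without matching if")
        elif op == "end":
            if not labels:
                raise ValueError("end without open construct")
            labels.pop()               # its slot can be reused right away
        elif isinstance(op, tuple) and op[0] in ("br", "br_if"):
            depth = op[1]
            if depth >= len(labels):
                raise ValueError("branch depth out of range")
            # The branch target is labels[-1 - depth]: always an enclosing
            # construct, never a label that has already been closed.
        else:
            raise ValueError(f"unknown op: {op!r}")
    if labels:
        raise ValueError("unclosed construct")
    return peak


ops = ["block", "end", "block", "end", "block", "loop", ("br", 1), "end", "end"]
print(peak_label_slots(ops))
```

Here four constructs are opened in total, but at most two labels are ever live at once.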
[1] No kidding. If you have an irreducible loop in Java bytecode, which is possible, you will never be JITed and will get stuck running 100x slower in the interpreter. We thought this through very carefully in V8. If you allow irreducible loops in Wasm, you force all engines to either stick to their lowest execution tier and run 2-100x slower, do relooping themselves, or handle the general case of irreducible loops, spending multiple person-years complicating their optimizing tiers' backends for a case that is incredibly rare (and probably introducing lots of bugs). In V8 we would probably have gone for the relooper option because the other two options are bad. So that's a loss, because now the engine is more complicated, doing a rewrite of the code that could just as well be done better and more efficiently offline by a producer. And there is no benefit because the engine's code would be no better than what the producer would have come up with. So we'd choose the lesser of the complexity options, but get no performance benefit, in order to avoid the absurdly bad performance hit of not being able to use the optimizing tier. Bad tradeoff no matter how you slice it, IMHO.
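For readers unsure what "irreducible" means in practice, here is a minimal sketch of the textbook T1/T2 reducibility test: repeatedly drop self-loops (T1) and merge any node with exactly one predecessor into that predecessor (T2). The graph is reducible exactly when it collapses to a single node. The CFG representation (a dict of successor lists) and the function name are assumptions for illustration. This is not how V8 or any particular engine does it, and it favours clarity over speed.

```python
def is_reducible(succs, entry):
    """Return True if the CFG reachable from entry is reducible.

    succs maps each node to an iterable of its successor nodes.
    """
    # Keep only nodes reachable from the entry.
    reachable, work = set(), [entry]
    while work:
        node = work.pop()
        if node not in reachable:
            reachable.add(node)
            work.extend(succs.get(node, ()))
    graph = {n: set(succs.get(n, ())) for n in reachable}

    changed = True
    while changed and len(graph) > 1:
        changed = False
        # T1: remove self-loops.
        for node in graph:
            graph[node].discard(node)
        # Recompute predecessors for the current graph.
        preds = {n: set() for n in graph}
        for node, targets in graph.items():
            for target in targets:
                preds[target].add(node)
        # T2: merge one node that has a unique predecessor into it.
        for node in list(graph):
            if node != entry and len(preds[node]) == 1:
                (pred,) = preds[node]
                graph[pred] |= graph.pop(node)
                graph[pred].discard(node)
                changed = True
                break
    return len(graph) == 1


# A normal loop: the only way into it is through "head".
loop = {"entry": ["head"], "head": ["body", "exit"], "body": ["head"], "exit": []}
# A two-entry cycle: "a" and "b" can each be reached without passing the other.
two_entry = {"entry": ["a", "b"], "a": ["b"], "b": ["a"]}
print(is_reducible(loop, "entry"), is_reducible(two_entry, "entry"))
```

The second graph is the classic shape that `goto` can produce and structured control flow cannot: a cycle with more than one entry point.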
I am fully convinced we made the right choice here.
We should have communicated better, and the relooper algorithm and tools should be textbook, off-the-shelf stuff.
It's bizarre to me that there are complaints about the if-else bytecodes. You can call them "weird" but they're very easy to reason about, make it easy to write toy examples, are easier to generate from a compiler, and are easier to read and understand in disassembly. At the point where the author started talking about customized compression to recover the size gains they should have realized why the bytecodes exist! Early on in the spec process it was very, very useful to have them.
Anyone questioning whether the lack of goto was the result of an invocation of veto power can look at the design repo and see that lots of control flow consideration and discussion happened in the open before the group finally agreed upon a solution. Many of the players involved were not Google employees at the time of the decision (I don't know if they are now):
I definitely recall that people had disagreements about how control flow should work and had different goals or priorities but it was a pretty detailed and drawn-out decision-making process. I don't think it was really possible for everyone to walk away from the table happy.
As context, at the time of those issues, the control-flow restrictions were a compromise to help get the "MVP" off the ground, with the understanding that "more expressive control flow" was expected to be added later:
Unfortunately, we don't actually have data which supports this. The only data that was collected at the time showed that if/else compress better than what wasm has without if/else. There are other ways we could have compressed control flow, but we didn't do the experiments.
> If you have an irreducible loop in Java bytecode, which is possible, you will never be JITed and will get stuck running 100x slower in the interpreter.
As far as I understand, the Java language and many other high-level languages can't express such loops (no arbitrary gotos), so JITing them would be a specific optimization for otherwise-crafted bytecode. Also, it probably doesn't happen often for bytecode generation libraries.
So basically a lot of potential for bugs to optimize a super-rare case that could also be solved instead by whoever produced that code.