An instruction set that is supported by all major browsers sounds enticing. I tried the hello_world demo with Emscripten a couple of years ago and was stumped that the generated page was multiple megabytes. In the first example on this page, I read
This will output a pkg/ directory containing our wasm module, wrapped in a js object.
So I'm guessing that the result is the same. Why is it so? Hello world requires a print function, which I suppose needs a small subset of some implementation of libc. Why so much space? Why the need for a .js object? Shouldn't we be bypassing the JS engine?
To really understand WASM, you should try to write it by hand! That's right, it's possible: it even has a text format (WAT) that some standalone runtimes let you run directly, and that tools can assemble into the binary format for the rest.
Once you know what WASM really does, it's obvious why you need JS (or whatever the host language is in your runtime, which could be anything) to provide anything related to IO, and why there's zero chance you'll get DOM access as it currently stands... until they finally finish specifying a whole lot of APIs (they did GC already, which was pretty fundamental for that to work, but there are many more things needed for complete access to IO and DOM).
If you use a compiler like Rust, it's going to pull in its standard library and, depending on the target, WASI, which is a sort of POSIX for WASM (and currently completely underdefined, just try to find the spec and you'll see it's mostly being made up as implementers go), which is why you'll get ridiculous amounts of code in your WASM file. If you write it by hand, importing some made-up function like `console_log` which you then provide from the JS side, then your WASM should be just a few bytes! Literally. I wrote a little compiler to do this, as it's not hard to generate WASM bytecode, but making anything useful is very painful due to the complete lack of specifications for how things should work. E.g. printing a string is actually very hard: you can only give JS or your host a pointer into linear memory, so you first need to write the actual bytes to linear memory (WASM itself has no notion of UTF-8 or even ASCII characters), and the JS then has to decode them before it can call its own `console.log()`. So I am waiting (a few years by now) until this improves to continue.
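To make the pointer-into-linear-memory dance concrete, here is a rough sketch in TypeScript. The module is hand-assembled by me for illustration (it is not the output of any compiler mentioned here), and the names `helloBytes`, `runHello`, the `env` import namespace and the exported `main` are all made up. It imports `console_log(ptr, len)`, keeps the bytes of "hello" in a data segment at offset 0, and comes in well under a kilobyte.

```typescript
// Hand-assembled module (illustrative), roughly equivalent to this text-format sketch:
// (module
//   (import "env" "console_log" (func $log (param i32 i32)))
//   (memory (export "memory") 1)
//   (data (i32.const 0) "hello")
//   (func (export "main") i32.const 0 i32.const 5 call $log))
const helloBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic "\0asm", version 1
  // type section: type 0 = (i32, i32) -> (), type 1 = () -> ()
  0x01, 0x09, 0x02, 0x60, 0x02, 0x7f, 0x7f, 0x00, 0x60, 0x00, 0x00,
  // import section: "env"."console_log" is a function of type 0
  0x02, 0x13, 0x01, 0x03, 0x65, 0x6e, 0x76,
  0x0b, 0x63, 0x6f, 0x6e, 0x73, 0x6f, 0x6c, 0x65, 0x5f, 0x6c, 0x6f, 0x67,
  0x00, 0x00,
  // function section: one locally defined function of type 1
  0x03, 0x02, 0x01, 0x01,
  // memory section: one memory, minimum 1 page (64 KiB)
  0x05, 0x03, 0x01, 0x00, 0x01,
  // export section: "memory" (memory 0) and "main" (function 1)
  0x07, 0x11, 0x02,
  0x06, 0x6d, 0x65, 0x6d, 0x6f, 0x72, 0x79, 0x02, 0x00,
  0x04, 0x6d, 0x61, 0x69, 0x6e, 0x00, 0x01,
  // code section: i32.const 0, i32.const 5, call 0, end
  0x0a, 0x0a, 0x01, 0x08, 0x00, 0x41, 0x00, 0x41, 0x05, 0x10, 0x00, 0x0b,
  // data section: the 5 bytes of "hello" at memory offset 0
  0x0b, 0x0b, 0x01, 0x00, 0x41, 0x00, 0x0b, 0x05, 0x68, 0x65, 0x6c, 0x6c, 0x6f,
]);

function runHello(): string[] {
  const logged: string[] = [];
  let memory: WebAssembly.Memory | undefined;

  const imports = {
    env: {
      // WASM only hands over two integers; the host has to turn them into a string.
      console_log: (ptr: number, len: number): void => {
        const raw = new Uint8Array(memory!.buffer, ptr, len);
        const text = new TextDecoder().decode(raw); // host decides the bytes are UTF-8
        logged.push(text);
        console.log(text);
      },
    },
  };

  const instance = new WebAssembly.Instance(new WebAssembly.Module(helloBytes), imports);
  memory = instance.exports.memory as WebAssembly.Memory;
  (instance.exports.main as () => void)();
  return logged;
}
```

Calling `runHello()` should log the decoded string once. Note that the "it's UTF-8" convention lives entirely in the host code here; nothing in the module says so, which is the underspecification being complained about.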
You need (a little) JS to run Wasm in the same way you need (a little) HTML to run JS; it's a hosted platform. JS handles loading and invoking the wasm code, and because it's close to a pure instruction set there's very little you can do without calling JS APIs, which in turn requires support code to translate across the boundary.
The WASI project specifies wasm-native APIs (modelled on posix) for running locally without JS, so you could imagine something similar for the browser. But the complexity of the DOM is forbidding.
I've not tried Emscripten hello world for a while, but I imagine it depends on things like optimisation level, dead code elim etc. In general to compile C code you'll need a malloc, string support and so on as you say. You can make the wasm file tiny if you lean on JS strings, but that increases the amount of support code again. Languages other than C will have an easier time reusing parts of the JS runtime (like strings or GC).
Yeah. And hello world is (thankfully) much smaller now than it used to be. Bigger than you think if you use printf, which is a quite complex C function. But at a guess, 10kb or something including support files. There are some great guides and tooling around to help shrink wasm size. Eg for rust:
You do need some JS code that asks the browser to run the wasm blob. You can't, e.g., just have a script tag that refers to a wasm blob yet.
libc does help with things like having an allocator or string operations etc., or for using C libraries that use libc. And that's where emscripten becomes helpful.
Browser functionality like the console or making html elements is exposed through JS interfaces, and the wasm needs to call out to those. But they may be directly exposed to wasm later (or they may already be at this point in new / experimental browser features).
The hello world in this guide doesn't actually use console.log at all. It adds 2 numbers and sets the page content to the result. All it does is expose an add function from rust and call it from the javascript side.
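For a sense of how little is needed for exactly that, here is a sketch in TypeScript with a hand-assembled module that exports a single `add` function. This is not the guide's Rust build output; the bytes, and the names `addBytes` and `loadAdd`, are my own illustration.

```typescript
// Hand-assembled module (illustrative), equivalent to:
// (module
//   (func (export "add") (param i32 i32) (result i32)
//     local.get 0 local.get 1 i32.add))
const addBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic "\0asm", version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type 0 = (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // one function of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add" = function 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code section, one body, no locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                   // local.get 0, local.get 1, i32.add, end
]);

// The JS/TS side: compile, instantiate, and pull the export out as a plain function.
function loadAdd(): (a: number, b: number) => number {
  const instance = new WebAssembly.Instance(new WebAssembly.Module(addBytes));
  return instance.exports.add as (a: number, b: number) => number;
}

const wasmAdd = loadAdd();
const sum = wasmAdd(2, 3);
// In a browser you would then do something like:
//   document.body.textContent = String(sum);
```

The arguments and result are 32-bit integers, so values wrap around on overflow rather than behaving like JS numbers.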
(1.4 KB for the .html, 8.8 KB for the .js, 14.5 KB for the .wasm - and a whopping 5.5 KB for the 404 page returned by GitHub Pages for the missing favicon - wtf...)
The .js file is basically glue code between the WASM code and the browser runtime environment.
Without the Emscripten "convenience runtime" you can also go smaller, but at a few dozen KBytes more or less it's pretty deep in diminishing returns territory.
The C library is provided via musl these days (in Emscripten). But there's a small 'syscall layer' which is implemented in JavaScript (basically if you want to run in the browser and access web APIs, at some point you need to go through JavaScript).
Can't speak to the size issue as I don't use emscripten, but I agree that most WASM output is waaay too large. Regarding JS, you need a small amount of JS to bootstrap your WASM program. In the browser, JS is the host environment and WASM is the guest. The host instantiates the WASM module and passes in capabilities that map to the module's declared imports.
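A small TypeScript sketch of that host/guest relationship, using a hand-assembled module of my own (the `env.log` import, the exported `run`, and the helper names are made up for illustration). The module declares one import; the host can inspect the declaration and must supply a matching function, otherwise instantiation fails.

```typescript
// Hand-assembled guest module (illustrative), equivalent to:
// (module
//   (import "env" "log" (func $log (param i32)))
//   (func (export "run") i32.const 42 call $log))
const runBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,             // magic "\0asm", version 1
  0x01, 0x08, 0x02, 0x60, 0x01, 0x7f, 0x00, 0x60, 0x00, 0x00, // types: (i32) -> (), () -> ()
  0x02, 0x0b, 0x01, 0x03, 0x65, 0x6e, 0x76,                   // import module "env"
  0x03, 0x6c, 0x6f, 0x67, 0x00, 0x00,                         // field "log", function of type 0
  0x03, 0x02, 0x01, 0x01,                                     // one local function of type 1
  0x07, 0x07, 0x01, 0x03, 0x72, 0x75, 0x6e, 0x00, 0x01,       // export "run" = function 1
  0x0a, 0x08, 0x01, 0x06, 0x00, 0x41, 0x2a, 0x10, 0x00, 0x0b, // i32.const 42, call 0, end
]);

const runModule = new WebAssembly.Module(runBytes);

// The guest's declared imports are visible to the host before instantiation.
const declaredImports = WebAssembly.Module.imports(runModule);

// The host decides which capabilities to hand over under the "env" namespace.
function instantiateRun(env: WebAssembly.ModuleImports): () => void {
  const instance = new WebAssembly.Instance(runModule, { env });
  return instance.exports.run as () => void;
}

const received: number[] = [];
instantiateRun({ log: (value: number): void => { received.push(value); } })();
// received is now [42]

// Withholding the capability: instantiation itself throws, nothing runs.
let missingImportError: unknown;
try {
  instantiateRun({});
} catch (err) {
  missingImportError = err;
}
```

The guest never sees `console`, the DOM, or anything else unless the host passes it in, which is also why the same module could run under a non-JS host that supplies its own `log`.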