> By the way, SCRIPT tag tokenization is a hell of an effort. I had to draw a graph [...] Next in turn are the CSS parser and Renderer.
CSS parsing should be ok, but layout computation is hard, especially with all the latest specs. The graph presented in the article will be the size of a postage stamp on an aircraft carrier deck.
The great thing about writing the layout computation code is that specs are mostly additive. You can start by supporting only a few properties and then as you progress support more and more.
I've used this approach for css-layout[1], and in 2 weeks I got enough implemented to support most of the use cases we needed for building mobile apps.
Also, Cassowary won't really help you there. It's going to take a much bigger effort to translate CSS into constraints than just reimplementing the steps themselves.
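To make the "additive" point concrete, here is a minimal sketch of what a first iteration might look like. It is not taken from css-layout; the node shape (plain dicts with `style`, `children` and `layout` keys) and the function name `layout` are assumptions made up for illustration. It only understands `width`, `height` and `padding`, stacks children vertically, and silently ignores every other property, which is what lets you add support one property at a time.

```python
def layout(node, x=0, y=0, avail_width=0):
    """Toy block layout: stack children vertically inside the parent's padding.

    Hypothetical node format: {"style": {...}, "children": [...]}.
    Unknown style properties (margin, float, ...) are simply ignored.
    """
    style = node.get("style", {})
    pad = style.get("padding", 0)
    # An explicit width wins; otherwise fill the width offered by the parent.
    width = style.get("width", avail_width)

    cursor = y + pad  # y position where the next child will be placed
    for child in node.get("children", []):
        layout(child, x + pad, cursor, width - 2 * pad)
        cursor += child["layout"]["height"]

    content_height = cursor - (y + pad)
    # An explicit height wins; otherwise shrink-wrap the children plus padding.
    height = style.get("height", content_height + 2 * pad)

    node["layout"] = {"x": x, "y": y, "width": width, "height": height}
    return node


tree = {
    "style": {"width": 100, "padding": 10},
    "children": [
        {"style": {"height": 20}},
        {"style": {"height": 30, "margin": 5}},  # margin is not supported yet
    ],
}
layout(tree)
```

Supporting `margin` later would mean touching only the two lines that advance `cursor`, without rewriting what is already there.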
Flexbox is a mostly self-contained and powerful spec (if I understand correctly).
However, when you don't need to account for floats, relative layout, mixed box-sizing, negative margins, complex overflow conditions and interaction with a ton of older specs, you vastly simplify the problem space for yourself.
It makes perfect sense for a modern system, but it is quite far from a general implementation that can compute layout from HTML+CSS unconditionally. The article starts with:
"Once I got an X idea, but its implementation required a calculated DOM with all its styles and goodies"
So the goal is not "the most useful subset". Flexbox is currently the least-used (and least-supported) layout, so for the author's purposes, which sound like scraping existing markup, it would not help very much.
No idea, but it's ~40,000 LOC compared to Gumbo's 30,000. Hand-writing parsers in C in 2016 is nuts. Gumbo at least has the virtue of being gruelingly tested by a heavy hitter.
It is a really fun kind of nuts. I built one myself a couple of weeks ago, because I wanted to play around with some ideas in the "nanopass framework" for compiler design, but I don't speak Scheme.
Interesting to see. I just took on a handmade XML parser as a personal challenge, in Python though, and I've been hitting nasty performance issues compared to libxml2.
Basic XML parsing should be very simple: it's deterministic with one-symbol lookahead. There are a number of small C parsers out there, and even one written in assembly. If you want to validate it or parse DTDs, though, that's a different story.
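As a rough illustration of the one-symbol-lookahead claim, here is a sketch of a recursive-descent parser for a tiny XML subset (elements, double-quoted attributes, text). Everything in it is an assumption for the sake of the example: the name `parse_xml`, the `(tag, attrs, children)` tuple output, and the decision to skip comments, entities, CDATA, processing instructions and DTDs entirely. The point is only that every branch is decided by `peek()`, a single character, with no backtracking.

```python
def parse_xml(src):
    """Parse a tiny XML subset into nested (tag, attrs, children) tuples."""
    pos = 0

    def peek():
        # The single symbol of lookahead; "" signals end of input.
        return src[pos] if pos < len(src) else ""

    def expect(ch):
        nonlocal pos
        if peek() != ch:
            raise ValueError(f"expected {ch!r} at offset {pos}")
        pos += 1

    def skip_ws():
        nonlocal pos
        while peek().isspace():
            pos += 1

    def name():
        nonlocal pos
        start = pos
        while peek().isalnum() or peek() in ("_", "-", ":", "."):
            pos += 1
        if start == pos:
            raise ValueError(f"expected a name at offset {pos}")
        return src[start:pos]

    def element():
        # Called right after "<" has been consumed.
        nonlocal pos
        tag = name()
        attrs = {}
        skip_ws()
        while peek() not in (">", "/", ""):
            key = name()
            expect("=")
            expect('"')
            start = pos
            while peek() not in ('"', ""):
                pos += 1
            attrs[key] = src[start:pos]
            expect('"')
            skip_ws()
        if peek() == "/":  # self-closing element: <tag/>
            pos += 1
            expect(">")
            return (tag, attrs, [])
        expect(">")
        children = []
        while True:
            if peek() == "":
                raise ValueError(f"unclosed <{tag}>")
            if peek() == "<":
                pos += 1
                if peek() == "/":  # closing tag: </tag>
                    pos += 1
                    closing = name()
                    if closing != tag:
                        raise ValueError(f"</{closing}> does not match <{tag}>")
                    skip_ws()
                    expect(">")
                    return (tag, attrs, children)
                children.append(element())
            else:  # character data up to the next "<"
                start = pos
                while peek() not in ("<", ""):
                    pos += 1
                children.append(src[start:pos])

    skip_ws()
    expect("<")
    root = element()
    skip_ws()
    if peek() != "":
        raise ValueError("trailing content after the root element")
    return root
```

For example, `parse_xml('<a x="1"><b/>hi<c>t</c></a>')` yields `('a', {'x': '1'}, [('b', {}, []), 'hi', ('c', {}, ['t'])])`. A character-at-a-time loop like this is also the kind of code that tends to be slow in pure Python next to a C library.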
DTDs really are the thing that should never have been in XML. Namespaces in XML make them pretty useless, given that they only have a concept of QNames and not of namespace URL & local name tuples, and they don't compose in any sensible way. They are such a large part of the complexity of XML that it's just sad.
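A small sketch of the QName versus tuple distinction, using Python's standard `xml.etree.ElementTree`. The two documents and the `urn:example:shop` namespace are made up for the example, and `expanded_name` is a hypothetical helper. A namespace-aware parser treats both roots as the same element, while a DTD, which matches on the literal QName, would see `a:item` and `b:item` as unrelated names.

```python
import xml.etree.ElementTree as ET

# Same namespace URI, same local name, different prefixes.
doc_a = ET.fromstring('<a:item xmlns:a="urn:example:shop"/>')
doc_b = ET.fromstring('<b:item xmlns:b="urn:example:shop"/>')


def expanded_name(elem):
    """Split ElementTree's '{uri}local' tag into a (uri, local) tuple."""
    if elem.tag.startswith("{"):
        uri, local = elem.tag[1:].split("}", 1)
        return (uri, local)
    return (None, elem.tag)


# Namespace-aware comparison: the prefix is irrelevant.
same_element = expanded_name(doc_a) == expanded_name(doc_b)

# QName comparison, which is all a DTD can do: the prefix is part of the name.
same_qname = "a:item" == "b:item"
```

Here `same_element` is `True` and `same_qname` is `False`, which is the mismatch the comment above is describing.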
CSS parsing should be ok, but layout computation is hard, especially with all the latest specs. The graph presented in the article will be the size of a postage stamp on an aircraft carrier deck.
Take a look at the Cassowary constraint solver, btw: http://overconstrained.io/
> I'm writing them all by myself, still full of energy.
I wish the author the best of luck.