The title made me think this could actually lay out and paint HTML, but I couldn't find anything remotely layout-related in the source tree. Then I found this comment saying even block sizing isn't done: https://github.com/lexbor/lexbor/issues/219#issuecomment-207.... Looks like nice groundwork, though. It's nice to see things like parsing and Unicode being part of the same source tree.
We have a decent chunk of layout and paint implemented in an HTML renderer I'm working on (https://github.com/DioxusLabs/blitz), which is targeting the "electron" use case (but with a rust scripting interface rather than a JS one).
The implementation is currently very immature and there are a lot of bugs and missing features. I only got a first cut of inline layout working yesterday, although we already have flexbox and grid implemented. Still, we're already seeing pretty decent results on a bunch of real-world web pages and hope to be at the point where we can render most of the web (excl. JS) in the next 6-12 months.
I interpreted your comment as saying this is unfinished, but then I heard that PHP has already switched from libxml2 to Lexbor, so I guess it's production-ready.
We have been using https://github.com/rushter/selectolax as a faster alternative to BeautifulSoup with html5lib because many malformed webpages in the wild don't work with lxml.
The problem is that libxml2's 20-year-old HTML parser never supported HTML5 [1], leading to more and more problems with downstream consumers like lxml, PHP or Nokogiri. PHP recently switched to Lexbor [2] and Nokogiri to libgumbo [3]. That said, I'm hopeful to receive enough funding to implement an HTML5 parser in libxml2.
Speaking of which, I don't understand why not. It seems like it would have been trivial to keep HTML5 true XML, and I don't understand what the actual technical reason for not doing that was. Naively, it just seems like breaking compatibility out of disdain rather than making actually useful progress. Saving a couple of characters every once in a while does not justify the change, so I presume there must be a better reason?
There was XHTML, and HTML5 is a direct result of finding out that it was not the right solution. The main issue being solved was that browsers did not parse invalid plain HTML consistently, which XHTML solved by requiring invalid XHTML to be rejected outright. This did not work. HTML5 solves this by defining the parsing rules such that there is no concept of a document being invalid: every sequence of bytes deterministically maps to one particular DOM tree. This feature essentially precluded basing HTML5 on either XML (simply impossible) or SGML (that might be possible, but it is in fact a redundant formalism, and describing the syntax in prose makes more sense, as everybody is going to hand-craft the parser anyway).
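To make the "XML rejects, HTML always yields something" contrast concrete, here is a minimal sketch using only Python's standard library. Big caveat: `html.parser` is just a lenient tokenizer, not an implementation of the HTML5 tree-construction algorithm, so it only stands in for the "never reject the input" idea. The `TagLogger` class and the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical helper: records start/end tags in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def xml_accepts(markup: str) -> bool:
    """Return True only if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # The XML approach: one well-formedness error rejects the document.
        return False


def lenient_events(markup: str) -> list:
    """Tokenize tag soup without ever rejecting it."""
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger.events


# Typical tag soup: an unclosed <p>, and a <b> that is never closed.
TAG_SOUP = "<p>one<p>two <b>bold</p>"

print("XML parser accepts it:", xml_accepts(TAG_SOUP))
print("Lenient tokenizer sees:", lenient_events(TAG_SOUP))
```

A real HTML5 parser goes one step further than this tokenizer: the spec also fixes how those events become a tree (for example, that the second `<p>` implicitly closes the first), which is what makes the resulting DOM the same in every conforming implementation.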
I felt XHTML had fairly limited adoption on the web, and in many cases web page authors seem to have preferred the »render tag soup« approach, which in most cases did the intended thing, rather than having to deal with XML namespaces, proper nesting and escaping, etc. That said, HTML nowadays mostly seems to be authored as if it were XML, with every element painstakingly closed and often even with elements that need no closing written as self-closing.
Probably because XML would need to be extended quite a bit to accommodate all of the multimedia stuff, attributes without values or quotes, special names for certain characters, optional or disallowed closing tags and whatnot that's in HTML5.
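A few of those differences are easy to poke at with Python's standard library. This is only a rough sketch: `html.parser` is a lenient tokenizer rather than a full HTML5 parser, and `AttrCollector` plus the helper names are invented for the example.

```python
import html
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: collects (tag, attributes) for each start tag."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # Attributes written without a value are reported with value None.
        self.seen.append((tag, dict(attrs)))


def html_attrs(markup: str) -> list:
    collector = AttrCollector()
    collector.feed(markup)
    collector.close()
    return collector.seen


def is_well_formed_xml(markup: str) -> bool:
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# HTML style: unquoted value, valueless attribute, <br> with no closing tag.
HTML_STYLE = "<form><input type=checkbox checked><br></form>"
# The same content spelled the way XML demands.
XML_STYLE = '<form><input type="checkbox" checked="checked"/><br/></form>'

print(html_attrs(HTML_STYLE))
print("HTML style is well-formed XML:", is_well_formed_xml(HTML_STYLE))
print("XML style is well-formed XML:", is_well_formed_xml(XML_STYLE))

# Named character references: HTML predefines many of them, while plain XML
# without a DTD treats &nbsp; as an undefined entity.
print(repr(html.unescape("&nbsp;")))
print("XML accepts &nbsp;:", is_well_formed_xml("<p>&nbsp;</p>"))
```

The XML spelling is still accepted by HTML parsers, which is probably why so much hand-written HTML looks like XHTML even though nothing requires it.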
I think pushing in both the layout design conveniences and the strictness of XML data transfer in the same standard would be quite bulky at best. In practice we'd likely see a lot of nasty security issues in implementations and so on.
Step 1 is a bit of a "draw the rest of the owl" step in that it's either done for you on your specific platform with default settings already or you have to go do all of the actually hard stuff of building the app (and sure enough that's where the typical cmake build step is hidden as well). Step 2 is just "and remember to link your code against the hard part when you compile it, by the way here's a single minimal example".