What are you using for server-side DOM manipulation? jsdom? apricot? node-xml? libxmljs? I spent a lot of the weekend working on a web crawler, but couldn't find an XML parser that didn't choke on the internet at large.
I haven't seen their code, but I'm not sure why you'd need server-side DOM manipulation for this. I'd implement all of that in the browser and just let the server handle passing events back and forth.
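A rough sketch of that split, in Python rather than Node. It assumes that each browser keeps its own page state and the server only fans events out. `EventRelay`, `join`, `publish`, and `drain` are hypothetical names, and the transport (long polling, WebSockets, etc.) is deliberately left out:

```python
from collections import deque


class EventRelay:
    """Server-side fan-out only: browsers own the page state, and the
    server forwards each event to every *other* connected client."""

    def __init__(self):
        self._outboxes = {}  # client_id -> deque of pending events

    def join(self, client_id):
        self._outboxes.setdefault(client_id, deque())

    def leave(self, client_id):
        self._outboxes.pop(client_id, None)

    def publish(self, sender_id, event):
        # Queue the event for everyone except the client that sent it.
        for client_id, outbox in self._outboxes.items():
            if client_id != sender_id:
                outbox.append(event)

    def drain(self, client_id):
        """Return and clear the events waiting for client_id, e.g. from a
        long-poll handler or a WebSocket send loop (transport not shown)."""
        outbox = self._outboxes.get(client_id)
        if outbox is None:
            return []
        events = list(outbox)
        outbox.clear()
        return events
```

With this design, the server never needs to parse or modify the page. It only needs to know who is connected.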
Since they're proxying the page, they replace every <a> with a proxied <a>, and they also append a <script> and a <div> with some content to the bottom of the page. Try viewing the source of http://starcraft2destroyedmymarrage.no.de:3000/app/4
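For illustration, here is a rough sketch of that kind of rewrite using Python's standard-library `html.parser`, which tolerates sloppy markup instead of raising errors. The proxy URL scheme (`PROXY_PREFIX`), the names `ProxyRewriter` and `rewrite_page`, and the choice to inject just before `</body>` are all assumptions. This is not how the linked app actually does it:

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import quote, urljoin

PROXY_PREFIX = "/app/proxy?url="  # hypothetical proxy endpoint


class ProxyRewriter(HTMLParser):
    """Re-emits a page, pointing each <a href> at the proxy and
    inserting extra markup just before </body>."""

    def __init__(self, base_url, injected_html):
        super().__init__(convert_charrefs=False)  # keep &entities; verbatim
        self.base_url = base_url
        self.injected_html = injected_html
        self.out = []
        self.injected = False

    def _emit_tag(self, tag, attrs, self_closing=False):
        if tag != "a":
            self.out.append(self.get_starttag_text())  # copy other tags as-is
            return
        parts = []
        for name, value in attrs:
            if name == "href" and value is not None:
                absolute = urljoin(self.base_url, value)
                if absolute.startswith(("http://", "https://")):
                    value = PROXY_PREFIX + quote(absolute, safe="")
            if value is None:
                parts.append(f" {name}")
            else:
                parts.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<a{''.join(parts)}{' /' if self_closing else ''}>")

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, self_closing=True)

    def handle_endtag(self, tag):
        if tag == "body" and not self.injected:
            self.out.append(self.injected_html)
            self.injected = True
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")

    def unknown_decl(self, data):
        self.out.append(f"<![{data}]>")


def rewrite_page(html_text, base_url, injected_html):
    parser = ProxyRewriter(base_url, injected_html)
    parser.feed(html_text)
    parser.close()
    if not parser.injected:  # no </body> found: append at the very end
        parser.out.append(injected_html)
    return "".join(parser.out)
```

Streaming the tags back out, rather than building a full DOM, keeps the original markup mostly byte-for-byte. Only the `<a>` start tags are regenerated.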
I don't recall the URLs. I was recursively crawling from a user-provided seed URL and had trouble with apricot, node-xml, and libxmljs. I had the least time to play with libxmljs because of hassles with Joyent: I had to compile V8 and Node to link against, and SCons wasn't playing nice with the build environment.
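For comparison, here is a minimal sketch of crawling from a seed URL. It is written as a breadth-first loop with a "seen" set rather than literal recursion, which avoids deep call stacks. Links are pulled out with Python's lenient `html.parser` instead of a strict XML parser. The `fetch` callable is a hypothetical stand-in for whatever HTTP client you use, and `max_pages` is an assumed safety limit:

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags. html.parser tolerates unclosed
    and mis-nested tags instead of raising, unlike a strict XML parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html_text):
    parser = LinkExtractor()
    parser.feed(html_text)
    parser.close()
    links = []
    for href in parser.links:
        # Resolve relative links and drop #fragments so pages aren't revisited.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute.startswith(("http://", "https://")):
            links.append(absolute)
    return links


def crawl(seed_url, fetch, max_pages=100):
    """Breadth-first crawl from seed_url.

    fetch(url) is caller-supplied (hypothetical here) and returns the
    page HTML as a str, or None on failure. Returns the successfully
    fetched URLs in visit order.
    """
    seen = {seed_url}
    queue = deque([seed_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html_text = fetch(url)
        if html_text is None:
            continue
        visited.append(url)
        for link in extract_links(url, html_text):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Passing `fetch` in makes the crawl logic testable with canned pages, and you can swap in a real HTTP client later.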
--
Edit: I should add that I'm using libxmljs in production (http://newsbasis.com/news) as an RSS parser (a streaming SAX push parser), and it works quite well for that sort of XML.
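To show what "streaming SAX push parser" means in practice, here is a sketch using Python's `xml.sax`. It does not use libxmljs. The parser returned by `make_parser()` accepts `feed()` calls, so you can push chunks in as they arrive and handle elements without building a tree. The handler assumes plain RSS 2.0 `<item><title>/<link>` elements, and `RssItemHandler` and `parse_rss_chunks` are hypothetical names:

```python
import xml.sax


class RssItemHandler(xml.sax.ContentHandler):
    """Collects (title, link) pairs from RSS 2.0 <item> elements as SAX
    events arrive, without keeping the whole document in memory."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_item = False
        self._field = None
        self._buf = []
        self._current = {}

    def startElement(self, name, attrs):
        if name == "item":
            self._in_item = True
            self._current = {}
        elif self._in_item and name in ("title", "link"):
            self._field = name
            self._buf = []

    def characters(self, content):
        # Text can be delivered in several pieces, so accumulate it.
        if self._field is not None:
            self._buf.append(content)

    def endElement(self, name):
        if self._field == name:
            self._current[name] = "".join(self._buf).strip()
            self._field = None
        elif name == "item":
            self.items.append((self._current.get("title"), self._current.get("link")))
            self._in_item = False


def parse_rss_chunks(chunks):
    handler = RssItemHandler()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    for chunk in chunks:  # e.g. byte chunks read from a socket
        parser.feed(chunk)
    parser.close()
    return handler.items
```

Well-formed RSS suits this approach. On arbitrary HTML from the open web, however, a strict XML parser like this will raise on the first malformed tag.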
Any chance you'd consider open-sourcing it?