
> haha, cue firefox going back to using ridiculous amounts of memory in 5, 4, 3, 2, 1.

I just had to force-kill Firefox (v. 33.0.2, on OS X v. 10.8.1 with 4GB of memory) when it stopped responding after I had tried (silly me) to open a 14MB XML file stored on disk. According to Activity Monitor, that XML file caused FF to load one of the CPUs at 100% and to use 1.5GB of memory. That's roughly two orders of magnitude more memory than the size of the file; I cannot even understand how this can still happen in 2014.

Don't get me wrong, I've stuck with FF for more than 10 years now, but they need to get their s.it together.

Later edit: I also checked in Chrome (v. 38.0.2125.111) and saw the same behavior when trying to open that 14MB XML file. At least in Chrome's case killing the tab was much faster (in FF's case I had to kill the whole window from inside Activity Monitor). After a quick look I found bug reports like this one (for FF: https://bugzilla.mozilla.org/show_bug.cgi?id=291643), opened in 2005 and still not closed, which mentions something like "it all happens because of XML processing". My stupid question is: why in God's name does a browser need at least two orders of magnitude more memory in order to process XML? Isn't XML just marked-up text? I don't get it. What does the browser need to "process"? This is just stupid.




>I don't get it. What does the browser need to "process"? This is just stupid.

All of the code is open source; this question is amenable to research. It might be better to do a little bit of that research before calling present implementations stupid. Real people put a lot of work into them.


Most browsers will show a raw XML file as an element tree, allowing you to open and close branches/elements by clicking on a "+" or "-" symbol to the left of each element.

That requires either building a memory-intensive tree model of the XML (DOM) or repeatedly reparsing the XML every time the user wants to open/close part of the view.

The standard "DOM" methodology requires building a tree node for every element and text chunk between the elements, and maybe for the attributes on the elements (or at least attaching an associative array to each element for its attributes). All of these little pieces fragment memory, unless a very clever compaction and/or string interning setup is used.
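A minimal Python sketch of the kind of tree described above: one object per element, one per text chunk, attributes in a dict, and tag/attribute names interned with `sys.intern` so repeated names share one string. The class and function names (`ElementNode`, `TextNode`, `build_tree`, `count_nodes`) are hypothetical, this is not how any browser actually lays out its DOM, and it leans on `xml.etree.ElementTree` only to do the parsing.

```python
import sys
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class TextNode:
    text: str


@dataclass
class ElementNode:
    tag: str
    attrs: Dict[str, str] = field(default_factory=dict)
    children: List[Union["ElementNode", TextNode]] = field(default_factory=list)


def build_tree(elem: ET.Element) -> ElementNode:
    """Copy a parsed element into one node object per element and one per
    non-empty text chunk (whitespace-only chunks included)."""
    node = ElementNode(
        # Interning makes every "item" tag share a single string object.
        tag=sys.intern(elem.tag),
        attrs={sys.intern(k): v for k, v in elem.attrib.items()},
    )
    if elem.text:  # text before the first child
        node.children.append(TextNode(elem.text))
    for child in elem:
        node.children.append(build_tree(child))  # recursive: fine for a sketch
        if child.tail:  # text between this child and the next sibling
            node.children.append(TextNode(child.tail))
    return node


def count_nodes(node: Union[ElementNode, TextNode]) -> int:
    """Total number of node objects in the tree."""
    if isinstance(node, TextNode):
        return 1
    return 1 + sum(count_nodes(c) for c in node.children)


doc = '<root a="1"><item id="x">hi</item><item id="y">there</item></root>'
tree = build_tree(ET.fromstring(doc))
print(count_nodes(tree))  # 5: root, two items, two text chunks
```

Even this tiny document turns into five separately allocated objects plus a dict per element, which is where the memory goes on a large file.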


> Real people put a lot of work into them.

I know this, I've been involved in open-source projects myself (some good years ago), but how does that address the fact that the project has what appears to be quadratic time "performance"?


...but they are stupid :)




I bet if you work out how many nodes and attributes are in that XML file and multiply that by the size of a C++ object (what's a good size? 256 bytes?), it comes to a substantial fraction of that 1.5GB.
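The estimate above can be sketched in Python: stream through the file once, count elements, attributes and text chunks, and multiply by a per-node size. The 256 bytes is the guess from the comment, not a measured figure for Firefox or Chrome, and the function names are hypothetical. Each element's children are cleared once they have been counted, so the counter itself does not keep the whole tree around.

```python
import io
import xml.etree.ElementTree as ET

# Assumption: the ~256 bytes per node guessed above; NOT a measured browser value.
BYTES_PER_NODE = 256


def count_xml_pieces(source) -> dict:
    """Count elements, attributes and text chunks in an XML file
    (path or binary file object) with a single streaming pass."""
    counts = {"elements": 0, "attributes": 0, "text_chunks": 0}
    for _event, elem in ET.iterparse(source, events=("end",)):
        counts["elements"] += 1
        counts["attributes"] += len(elem.attrib)
        if elem.text:  # text before the first child (or the only text)
            counts["text_chunks"] += 1
        for child in elem:
            if child.tail:  # text after this child's end tag
                counts["text_chunks"] += 1
            child.clear()  # child is fully counted; free its contents
    return counts


def estimate_tree_bytes(counts: dict, bytes_per_node: int = BYTES_PER_NODE) -> int:
    """Treat every element, attribute and text chunk as one node-sized object."""
    return sum(counts.values()) * bytes_per_node


sample = b'<root a="1"><item id="x">hi</item><item id="y">there</item>tail</root>'
counts = count_xml_pieces(io.BytesIO(sample))
print(counts, estimate_tree_bytes(counts))
# {'elements': 3, 'attributes': 3, 'text_chunks': 3} 2304
```

For the real file you would call something like `count_xml_pieces("big.xml")` (hypothetical path). For scale: at 256 bytes per node, 1.5GB corresponds to roughly six million nodes.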


Exactly. If you have to have a "DOM" of an XML, it's going to take A LOT of memory.

I recently replaced some generalized Java code (my own, alas) that built a DOM of some XML with substring operations that find the text of the only element in the XML that mattered. (Fortunately, I could guarantee that the text had no markup-related special chars in it.) This made about 6 GC cycles (in a 32-bit JVM) disappear from this process.

However, if you have to display or navigate the document tree, you are stuck with the memory hogging DOM.


nah, the guy is right: there's been better tech since 2010, it's just that nobody has bothered to swap out the old-style DOM stuff for pugixml/rapidxml/vtdxml.

http://pugixml.org/benchmark/

specifically: http://pugixml.files.wordpress.com/2010/10/dom-memory-compar...


I meant "DOM" as in the general "tree in memory" data structure. You did find an implementation that uses about 1/3 the memory of some of the piggier ones, though.


closer to 1/5th.

also, in actual use: those are peak values, so while libxml will hog the memory until the DOM is freed, the streaming-style parsers hold on to a smaller amount of memory for a shorter time.
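To illustrate the "smaller amount of memory for a shorter time" idea, here is a Python sketch using `ElementTree.iterparse` as a stand-in for a streaming-style parser (it is not one of the libraries named above). Each record is handled as soon as its end tag arrives and is then detached, so the in-memory tree never holds more than a few records. The `<item>` tag and `price` attribute are hypothetical, and the sketch assumes `<item>` elements are direct children of the root.

```python
import io
import xml.etree.ElementTree as ET


def total_price(source) -> float:
    """Sum the (hypothetical) price attribute of every <item>, streaming:
    each <item> is handled when its end tag is seen, then dropped."""
    total = 0.0
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # first event: the root element's start tag
    for event, elem in context:
        if event == "end" and elem.tag == "item":
            total += float(elem.get("price", "0"))
            # Detach finished items from the root so they can be garbage
            # collected instead of piling up in one big tree.
            root.clear()
    return total


print(total_price(io.BytesIO(b'<items><item price="1.5"/><item price="2.5"/><item/></items>')))  # 4.0
```

Peak memory here tracks the size of one record rather than the whole document, which is the trade-off the comment is pointing at.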


In the problem above, I did try using the SAX parser that the WebLogic-JVM "factory factory factory" returned, but the element text was about 2 MB, and it wanted to return it in pieces by repeatedly firing the event handler.
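The "pieces" behavior is part of the SAX contract: a parser may deliver one contiguous run of text as several `characters` callbacks (Java's `ContentHandler.characters(char[], int, int)` documents the same thing). A minimal Python sketch with `xml.sax`, standing in for the WebLogic JVM parser, that buffers the pieces and joins them once at the end tag; the `ElementTextCollector` class and the `payload` element name are hypothetical.

```python
import xml.sax


class ElementTextCollector(xml.sax.ContentHandler):
    """Collect the text of the first element named `target`.
    Text may arrive in several characters() calls, so the pieces
    are buffered in a list and joined once at the end tag."""

    def __init__(self, target: str):
        super().__init__()
        self.target = target
        self.depth = 0  # > 0 while inside the target element
        self.pieces = []
        self.text = None

    def startElement(self, name, attrs):
        if self.depth or (name == self.target and self.text is None):
            self.depth += 1

    def characters(self, content):
        if self.depth:
            self.pieces.append(content)

    def endElement(self, name):
        if self.depth:
            self.depth -= 1
            if self.depth == 0:  # just left the target element
                self.text = "".join(self.pieces)


handler = ElementTextCollector("payload")
xml.sax.parseString(b"<doc><a>x</a><payload>hello &amp;\nworld</payload></doc>", handler)
print(repr(handler.text))  # 'hello &\nworld'
```

Joining a list once avoids repeated string concatenation, but for a 2MB element you still pay for the callbacks and the buffered pieces.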

Manually finding the index of the open/close elements and doing a substring to get the element text was SO incredibly much faster and smaller, albeit something that only worked for a VERY specific situation.
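A Python sketch of the index-and-substring trick (the original was Java, where `String.indexOf` and `substring` play the roles of `str.find` and slicing). The function name is hypothetical, and it is only correct under the same kind of narrow guarantees the comment relied on, listed in the docstring.

```python
def element_text_by_substring(source: str, tag: str) -> str:
    """Return the raw text between the first <tag> and the following </tag>.

    Only correct under narrow assumptions: the open tag is written exactly
    as <tag> (no attributes or whitespace), the element does not contain
    another element with the same name, and the text has no markup, CDATA
    or entity references (e.g. &amp;) that would need decoding."""
    open_tag = f"<{tag}>"
    close_tag = f"</{tag}>"
    start = source.find(open_tag)
    if start == -1:
        raise ValueError(f"{open_tag} not found")
    start += len(open_tag)
    end = source.find(close_tag, start)
    if end == -1:
        raise ValueError(f"{close_tag} not found")
    return source[start:end]  # one slice: no tree, no per-chunk callbacks


doc = "<env><meta>m</meta><payload>some big blob</payload></env>"
print(element_text_by_substring(doc, "payload"))  # some big blob
```

Two linear scans and one copy is about as cheap as it gets, which is why it beats both a DOM and SAX here; the price is that any change to the document's shape silently breaks it.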


I wonder why those haven't got traction with things like Mozilla though. Presumably -someone- would at least have looked at them if they're that big a win?



