Hacker News new | past | comments | ask | show | jobs | submit login

I bet if you work out how many nodes and attributes are in that XML file, multiply that by (what's a good size for a C++ object? 256 bytes?) and I bet it's a substantial fraction of that 1.5GB.



Exactly. If you have to have a "DOM" of an XML, it's going to take A LOT of memory.

I recently replaced some generalized Java code (my own, alas) that built a DOM of some XML with substring operations to find the element text in the only element in the XML that mattered. (fortunately, I could guarantee that the text had no markup related special chars in it) This caused about 6 GC cycles (in a 32 bit JVM) to disappear from this process.

However, if you have to display or navigate the document tree, you are stuck with the memory hogging DOM.


nah, the guy is right there's better tech since 2010, just nobody has bothered to swap out the old style DOM stuff for pugixml/rapidxml/vtdxml.

http://pugixml.org/benchmark/

specifically: http://pugixml.files.wordpress.com/2010/10/dom-memory-compar...


I meant "DOM" as in the general "tree in memory" data structure. You did find an implementation that uses about 1/3 the memory of some of the piggier ones, though.


closer to 1/5th.

also, in actual use -> those are peak values, so while libxml will hog the memory until the DOM is freed, the streaming style parsers hold on to the smaller amount of memory for a shorter time.


In the problem above, I did try using the SAX parser that the WebLogic-JVM "factory factory factory" returned, but the element text was about 2 MB, and it wanted to return it in pieces by repeatedly firing the event handler.

Manually finding the index of the open/close elements and doing a substring to get the element text was SO incredibly much faster and smaller, albeit something that only worked for a VERY specific situation.


I wonder why those haven't got traction with things like Mozilla though. Presumably -someone- would at least have looked at them if they're that big a win?




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: