SitePoint swallowed my comment, so I'll just post it here:
I found some of your claims questionable, so I cherry-picked one to test for myself.
This code converts arguments[i] to a string object using the
String conversion function. This is possibly the slowest way to
perform such a conversion, although it would be the most
obvious to many developers coming from other languages.
Much quicker is to add an empty string ("") to the value you
wish to convert:
I compared the time it took to run 100,000 iterations each of String(fn) and fn + "", and found the results far less clear-cut than your assertion suggests. In fact, fn + "" was 6.6% faster on Firefox 3.5, but 15.8% slower on a recent Chromium build. In any case, even over 100,000 runs the difference was a few milliseconds, so this particular case hardly seems like an area to focus on for optimization.
I wonder how many of your other strong claims would stand up to similar testing. Test code and results: http://gist.github.com/232944
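A test along these lines is enough to reproduce the comparison (a rough sketch; the choice of fn and the run count are arbitrary, and the numbers vary a lot by browser):

// Rough sketch of the String(x) vs. x + "" comparison; numbers vary a lot by browser.
var fn = function () { return 42; }; // arbitrary value to stringify
var RUNS = 100000;

function timeIt(label, body) {
  var start = new Date().getTime();
  for (var i = 0; i < RUNS; i++) {
    body();
  }
  console.log(label + ": " + (new Date().getTime() - start) + " ms");
}

timeIt("String(fn)", function () { return String(fn); });
timeIt("fn + \"\"", function () { return fn + ""; });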
I've done a lot of testing on creating strings by concatenation and it turns out that + is actually one of the slowest. It's much faster, for example, to create an array and then join it with "". Maybe try that in your test and see how it compares.
Example:
var x = [];
x.push("latortuga");
x.push(" is so right");
var str = x.join("");
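A quick way to put numbers on that claim (a rough sketch; the relative ordering varies a lot between engines and with the size of the strings involved):

// Sketch: building one big string with += vs. pushing parts and joining once.
var PARTS = 10000;

var start = new Date().getTime();
var s = "";
for (var i = 0; i < PARTS; i++) {
  s += "chunk" + i;
}
console.log("+= concatenation: " + (new Date().getTime() - start) + " ms");

start = new Date().getTime();
var parts = [];
for (var j = 0; j < PARTS; j++) {
  parts.push("chunk" + j);
}
var joined = parts.join("");
console.log("push + join: " + (new Date().getTime() - start) + " ms");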
Some Sitepoint Article: How not to Critique a Javascript Library
for (var i = fromIndex; i < arr.length; i++) {
This for loop looks up the .length property of the array (arr)
each time through the loop. Simply by setting a variable to
store this number at the start of the loop, you can make the
loop run much faster:
From six lines above, in array.js:
if (arr.indexOf) {
  return arr.indexOf(obj, opt_fromIndex);
}
if (Array.indexOf) {
  return Array.indexOf(arr, obj, opt_fromIndex);
}
It only falls through to there if Array has no indexOf prototype method on it at all. In that case, the performance of the 'for' loop is the least of your concerns, because you're using a terribly slow browser where it won't matter anyway. They used a cached length property later on in the very same file, in places where it might actually matter.
Google’s developers seem to have figured this trick out
later on in the same file. From array.js, line 153:
var l = arr.length; // must be fixed during loop... see docs
...
for (var i = l - 1; i >= 0; --i) {
This loop is better in that it avoids a property lookup each
time through the loop, but this particular for loop is so
simple that it could be further simplified into a while loop,
which will run much faster again:
var i = arr.length;
...
while (i--) {
On V8 and SquirrelFish Extreme, these two loops are exactly the same speed (and indeed probably generate near-identical native code). On Firefox and older Chrome/WebKit, the average difference on an array of 10,000 elements is something like the difference between 0.068 ms and 0.072 ms to iterate (I don't remember the exact figure from my machine; it's negligible). In V8, Google's own native iterators are implemented in JavaScript using that incrementing 'for' loop.
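For reference, the comparison boils down to something like this (a sketch with empty loop bodies; any real work inside the loop dwarfs the difference):

// Sketch: cached-length incrementing for loop vs. decrementing while loop.
var arr = [];
for (var n = 0; n < 10000; n++) {
  arr[n] = n;
}

var start = new Date().getTime();
for (var run = 0; run < 1000; run++) {
  var l = arr.length;
  for (var i = 0; i < l; i++) {
    // empty body: measuring loop overhead only
  }
}
console.log("for, cached length: " + (new Date().getTime() - start) + " ms");

start = new Date().getTime();
for (var run2 = 0; run2 < 1000; run2++) {
  var j = arr.length;
  while (j--) {
    // empty body
  }
}
console.log("while (j--): " + (new Date().getTime() - start) + " ms");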
But not all of Closure Library’s performance woes are due
to poorly optimized loops. From dom.js, line 797:
switch (node.tagName) {
  case goog.dom.TagName.APPLET:
  case goog.dom.TagName.AREA:
  case goog.dom.TagName.BR:
  case goog.dom.TagName.COL:
  case goog.dom.TagName.FRAME:
  case goog.dom.TagName.HR:
  case goog.dom.TagName.IMG:
  case goog.dom.TagName.INPUT:
  case goog.dom.TagName.IFRAME:
  case goog.dom.TagName.ISINDEX:
  case goog.dom.TagName.LINK:
  case goog.dom.TagName.NOFRAMES:
  case goog.dom.TagName.NOSCRIPT:
  case goog.dom.TagName.META:
  case goog.dom.TagName.OBJECT:
  case goog.dom.TagName.PARAM:
  case goog.dom.TagName.SCRIPT:
  case goog.dom.TagName.STYLE:
    return false;
}
return true;
This kind of code is actually pretty common in Java, and
will perform just fine there. In JavaScript, however,
this switch statement will perform like a dog each and
every time a developer checks if a particular HTML element
is allowed to have children.
Experienced JavaScript developers know that it’s much
quicker to create an object to encapsulate this logic:
var takesChildren = {}
takesChildren[goog.dom.TagName.APPLET] = 1;
takesChildren[goog.dom.TagName.AREA] = 1;
...
This code can be further bulletproofed against outside
interference using hasOwnProperty (see below for a full
explanation of this).
return !takesChildren.hasOwnProperty(node.tagName);
Really? You tested it? Because a 'switch' does not necessarily generate the same code as a dynamically constructed set of object properties. In Google's original example, if none of the cases match, it immediately returns 'true'. In SitePoint's example, if the property does not match on the highest level, it must crawl the property chain on the object until it eventually reaches 'undefined'. Not to mention that you are making two inner function calls and a fully dynamic property lookup (!(), hasOwnProperty, and node.tagName), none of which I believe can be calculated statically.
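If anyone wants to actually measure it rather than guess, a self-contained comparison of the two shapes looks roughly like this (a sketch with an abbreviated tag list, outside of Closure; results will differ by engine):

// Sketch: switch-based check vs. object + hasOwnProperty, mirroring both versions.
function canHaveChildrenSwitch(tagName) {
  switch (tagName) {
    case "APPLET": case "AREA": case "BR":
    case "IMG": case "INPUT": case "SCRIPT": // abbreviated list
      return false;
  }
  return true;
}

var takesChildren = {}; // same (oddly named) lookup table as the article's version
takesChildren["APPLET"] = 1;
takesChildren["AREA"] = 1;
takesChildren["BR"] = 1;
takesChildren["IMG"] = 1;
takesChildren["INPUT"] = 1;
takesChildren["SCRIPT"] = 1;

function canHaveChildrenObject(tagName) {
  return !takesChildren.hasOwnProperty(tagName);
}

var tags = ["DIV", "IMG", "SPAN", "SCRIPT", "P", "BR"];
var start = new Date().getTime();
for (var i = 0; i < 1000000; i++) {
  canHaveChildrenSwitch(tags[i % tags.length]);
}
console.log("switch: " + (new Date().getTime() - start) + " ms");

start = new Date().getTime();
for (var j = 0; j < 1000000; j++) {
  canHaveChildrenObject(tags[j % tags.length]);
}
console.log("hasOwnProperty: " + (new Date().getTime() - start) + " ms");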
Actually, I'm going to end this rant early, because I just read more of the article, and it looks like the author just keeps pulling random shit that he remembers reading from "JavaScript: The Good Parts" or whatever and vomiting it back up onto his keyboard.
Agreed, very poorly written article. You can't approach optimisation in isolation like that. There's a whole range of factors, like readability, situational awareness, concurrency, etc., that need to be considered.
Plus, this is the first release of what has previously been an internal-only library. It needs a little time to mature. jQuery wasn't perfect out of the gate either.
it must crawl the property chain on the object until it eventually reaches 'undefined'
If it were a normal property lookup, yes, but not with 'hasOwnProperty' - that's the same as any old hash table lookup. So efficiency here really depends on the implementation, and Google's is probably faster. Regardless, we're really splitting hairs - these optimizations will save a few milliseconds, on average. As much as I am a perfectionist, it's all completely negligible in terms of UX.
I'm talking about the class tracing V8 does, not prototype chains. "Property chain" meaning in V8, it must climb the series of what it calls "hidden classes" in order to find all of the properties. Other engines do various things when you pass a totally dynamic property lookup in. Some slower engines probably use naïve hash tables, I don't know.
Feel free to correct me if I'm wrong on this one, though. I've never really ripped open V8 or any of the other engines to look inside, I just go by what I read from their design docs.
From my reading of the V8 docs, the hidden class climbing is done when properties are added to the class. Each added property causes a transition to the next hidden class. The final hidden class has all the properties as offsets, and a failed lookup might chain through prototypes (not sure there) but does not walk back through the other hidden classes.
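As an illustration of that model (a sketch only; hidden classes are engine-internal and not observable from script):

// Each property assignment moves the object along a chain of hidden classes (in V8's model).
function Point(x, y) {
  this.x = x; // transition: empty hidden class -> class with x
  this.y = y; // transition: class with x -> class with x and y
}

// a and b go through the same transitions, so they end up sharing a final
// hidden class in which x and y live at fixed offsets, so lookups are fast.
var a = new Point(1, 2);
var b = new Point(3, 4);

// c takes a different path by growing an extra property, so it ends up
// with a different hidden class than a and b.
var c = new Point(5, 6);
c.z = 7;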
It also seems like the author forgot that code brevity is very important when maintaining such a large code base. Everyone can understand that switch, but the suggested object replacement is not so obvious.
Might some of these little idiomatic optimizations be automatically performed by the Closure Compiler, making their absence from the original source irrelevant?
Perhaps, but it does seem that the Google Closure developers are unaware of these optimizations. Unless the compiler was written by an entirely different group, I would be surprised.
Closure was written by many, many different people. I can attest that the core developers (the ones whose names are on the project) are very aware of these optimizations - one of them interviewed me when I was being hired, and another was in charge of my JS readability review. But Closure's a big chunk of code, the core developers tend to be very busy people, and not everyone can review everything.
Well, I learned something new from most of the points mentioned.
Only thing I would take exception to is:
>Those 1.0s are telling. Languages like Java represent integers (1) differently from floating point numbers (1.0). In JavaScript, however, numbers are numbers. (1 - factor) would have worked just as well.
It works just as well, but the code seems clearer (without any loss of functionality or performance) when you explicitly make it a '1.0'.
Yes this point was particularly ridiculous. Yes, this is a convention from Java (and C++), but I actually do this in my JS code as well. Every minifier will easily fix this for you, and in my opinion it makes things clearer. In fact, there exists an argument that this form of clarity is more important in a language like JavaScript precisely because you don't have a type system (in other words, seeing a 1.0 when an operation should clearly be dealing with ints can flag incorrect logic).
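For what it's worth, the two literals are indistinguishable at runtime; the choice is purely about readability:

// JavaScript has a single number type, so 1 and 1.0 are the same value.
console.log(1 === 1.0);                   // true
console.log(typeof 1 === typeof 1.0);     // true, both are "number"
console.log((1 - 0.25) === (1.0 - 0.25)); // true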
I agree, but his example is strong. While calculating something every time the mouse moves would be expensive and memoization is probably a good idea, the mouse pointer in most situations has some locality, so some LRU cache is probably a better approach. There are probably examples where you'll get more bang for your buck using a different cache storage as well.
Does that mean their memoization function is flawed in general because it doesn't support that? No. If you need such specialization for optimization, you're free to roll your own, which is why I agreed with you.
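Rolling your own for a case like that might look something like this (a sketch of a bounded, LRU-style memoizer for single-argument functions; this is not Closure's goog.memoize, just an illustration of the idea):

// Sketch: memoizer with a bounded, least-recently-used cache.
function memoizeLru(fn, maxEntries) {
  var cache = {};
  var order = []; // keys ordered from least to most recently used
  return function (arg) {
    var key = String(arg);
    if (cache.hasOwnProperty(key)) {
      // Move the key to the most-recently-used end.
      order.splice(order.indexOf(key), 1);
      order.push(key);
      return cache[key];
    }
    var value = fn(arg);
    cache[key] = value;
    order.push(key);
    if (order.length > maxEntries) {
      delete cache[order.shift()]; // evict the least recently used entry
    }
    return value;
  };
}

// Usage: keep at most 100 recent results of an expensive calculation.
var expensive = function (x) { return Math.sqrt(x) * Math.log(x + 1); };
var cached = memoizeLru(expensive, 100);
cached(42); // computed
cached(42); // served from the cache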
I don't know what definition of memory leak the author is working with, but a real memory leak occurs when all references to a dynamically allocated chunk of memory are lost without the memory being free()d. An unbounded cache does not fit this definition.
The term "memory leak" is also sometimes applied, particularly in GC environments, to mean "memory that is still referenced, but will never be used again".
Poorly-thought-out caching is an easy way to do that, especially if the cached values themselves have a chain of references to other values.
With caching I don’t think you can assume any values are sure to never be used again.
Yes, there are some situations where circular references to DOM elements and the like will cause reference count based GC engines to leak memory. But an unbounded cache does not fit that description. It could be poor memory usage but not a memory leak.
The section "The Slow Loop" posits that many for-loops in the library are much slower than they could be because they "[look] up the .length property of the array ... each time through the loop".
I wonder what the performance difference is in terms of referencing a local var vs. referencing an object property? Surely it's negligible?
I wondered that too. I wonder if they just don't notice it because the modern VMs they mainly use would cache or inline it, but older implementations do a search for the property each time? (I'm speculating on both those points)
I think it's an oversight. The coding conventions I work with ban that form of a for-loop, precisely because the .length access can potentially be very costly. (I'm in a different part of the company, one that doesn't use Closure, but our latency constraints are even tighter than theirs are.) If you're iterating over a NodeList and not just an array, it can actually turn an O(N) loop into an O(N^2) one.
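A concrete example of the NodeList case (a sketch; getElementsByTagName returns a live collection, and in some implementations reading its .length is much more expensive than reading an array's):

// Sketch: iterating a live NodeList with and without a cached length.
var divs = document.getElementsByTagName("div"); // live NodeList
var name;

// .length is re-read from the live collection on every pass; in older
// implementations this can make the loop effectively O(N^2).
for (var i = 0; i < divs.length; i++) {
  name = divs[i].tagName;
}

// One length lookup up front, then a plain integer comparison each pass.
for (var j = 0, len = divs.length; j < len; j++) {
  name = divs[j].tagName;
}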
Checking the length of an array on each iteration of a loop is an amateur mistake, in any language. Some of these points were pretty nitpicky, but I thought this was a valid complaint.
In compiled languages it's usually not worth breaking convention to factor it out, because the compiler will apply loop-invariant code motion and pull it out of the loop body anyway. The exception is if you're calling a function that the compiler can't prove to be referentially transparent, e.g. strlen.
Really interesting, thanks. I don't spend a lot of time with compiled languages these days.
Question: how can the compiler be sure an array's length wouldn't have changed? I suppose it could examine the code within the loop, but then the same could be said for strlen. What's the difference?
Most languages don't have functionality to resize an array in place. If you realloc in C, you get back a pointer that may or may not be the same as what you passed in. If you 'new' in Java, you must assign that pointer to your initial variable. If the loop body never assigns to the array, the compiler can be sure that it still points to the same block of memory, and its length will be the same.
This is a specific example of dataflow analysis, which is a fascinating area of compiler design. Usually you'll use something like a use-def chain to figure out all the points where a variable might've been defined. If all of those are outside of the loop, then the computation itself may be moved outside of the loop too, and the loop just refers to the temporary that's the result. I'd encourage you to pick up a book on compiler design if you're interested; here's the Wikipedia article:
This explicitly doesn't apply to "smart" containers like Vectors, which automatically resize themselves. If you were to inline these methods, you'd see that the internal buffer pointer may change inside the loop (assuming that you call functions that mutate the object; if you don't, and the language has a notion of const-correctness, a sufficiently smart compiler could probably optimize them too), and so the compiler can't move any code outside. Same with strlen; its return value depends upon the contents of the memory buffer pointed to, so unless the compiler can prove that that memory buffer never changes (hard; any function may reach in through a pointer and set a character to null), it can't optimize the call away.
Brilliant, thank you for the explanation and the Wikipedia link. I'm going to stop by the library this weekend to see if they have any books on compiler design, do you have any recommendations there?
I'm also partial to Appel's "Modern Compiler Implementation in ML", because it goes into more detail about techniques for compiling functional languages:
With 10M entries in the array, the "slow" way takes 509 ms and the "faster" way takes 362 ms. So we can save roughly 15 nanoseconds per iteration. Or a 30% speedup - assuming nothing happens inside the loop.
(Of course the numbers may vary wildly per browser.)
Somehow I doubt this is the performance problem behind slow web pages.
In a case like this, it doesn't really matter how much of a performance benefit you get, because it's ridiculously trivial to implement. This in particular is one of those optimizations that I remember being told about back in CS101.
A commenter at sitepoint noted the positive point about Google open sourcing Closure. I think it's a good point. It leaves room for people to review, critique, and help improve things.
I thought the more interesting thing about Closure was the optimizer (and I wonder if the optimizer offsets any of the issues described in this critique.)
Regarding the library, I think the interface is more important at this point than the implementation. The latter can be refactored, while the former will always be there.
For reference for anyone else who doesn't know how this should be done correctly (I had to go and look at the dojo source).
dojo.isString = function(/*anything*/ it){
  // summary:
  //    Return true if it is a String
  return (typeof it == "string" || it instanceof String); // Boolean
}
I write a lot of JavaScript and I'm not even sure this is preferable. The "object wrappers" like String are considered deprecated in some circles as a matter of style, e.g. by Crockford. From this point of view, a stricter check is cleaner. And contrary to what the article implies, String(foo) is a cast to a primitive string, and doesn't create an object like 'new String' does.
So the Google function correctly returns whether String(foo) is a no-op.
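The distinction is easy to see in a console; nothing here is specific to Closure or Dojo:

// String() produces a primitive; new String() produces a wrapper object.
typeof String(42);                    // "string"
String(42) instanceof String;         // false
typeof new String(42);                // "object"
new String(42) instanceof String;     // true

// Which is why a typeof-only check treats the two differently:
typeof "abc" == "string";             // true
typeof new String("abc") == "string"; // false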
This function checks if a particular variable has a value defined. Or it does, unless a 3rd party script sets the global undefined variable to something else. This single line of code anywhere in the page will bring Closure Library crashing down:
var undefined = 5;
Relying on the global undefined variable is another rookie mistake for JavaScript library authors.
Granted, yes, I think libraries in particular should code defensively, and I originally saw the var undefined; trick in jQuery some time ago, but realistically I think they are making a mountain out of a molehill. In my mind it's more of a rookie mistake to trample the undefined value in your own code than it is to write code that assumes it hasn't been trampled. I'm no elite JS interpreter wizard, but I believe the other reason libraries use the var undefined; trick is to gain a slight speed boost.
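For reference, the defensive pattern in question looks something like this (a sketch; jQuery wraps its whole source in a closure and declares a local undefined in much the same way):

// Sketch: a local undefined is guaranteed to really be undefined, even if some
// other script on the page has done "var undefined = 5;" at global scope.
(function () {
  var undefined; // declared but never assigned

  function isDef(val) {
    return val !== undefined;
  }

  // An alternative that needs no local variable at all:
  function isDefTypeof(val) {
    return typeof val != "undefined";
  }
})();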
Some code is so bad that it's hard to have sympathy for the people who wrote it when stuff breaks. Overriding undefined is one of those things.
If we're getting that nitpicky, having an isDef method at all is adding an unnecessary function call that will slow down your program. And while you certainly could create an undefined variable for checking against each time, I'd hardly say not doing so is a "rookie mistake".
The blogger complains that there is a permanent lack of some optimizations in the code because the authors have chosen an intuitive way of expressing things in the language. Maybe the problem is not the authors, but the fact that the efficient way of programming in JS is usually counter-intuitive.
Wow, sometimes reading around HN is a little scary! The sheer amount of expertise that the community can bring to bear on nearly any topic is pretty crazy. Even more so with something like this that impinges on a core subject for many of the people here...
I'd say it isn't. As you said, most of the problems are minor. Some of the performance assumptions are false. Very little of this seems to be of any consequence.
Agreed. I was expecting a critique of the library API. JavaScript APIs (including, especially, the browser APIs) have a long history of being overly verbose and designed as if they were Java.
Has anyone used the closure library API enough to comment?