I've actually had a similar question about JS for a while now (with Chromium 34).

chewxy · on May 20, 2014

I could be wrong (and if so, pie my face), but I believe it's mostly due to one of the many inline cache optimizations that v8 employs.

Let's consider the receiver (i.e. the `this` value) of Example 1 and 2. The receiver of Example 1 is Benchmark, if invoked normally. The receiver of Example 2 is the empty function object `function(){}`.

When you call makeKeyCodepointObj.makeKey() - the VM looks up the object's prototype chain and finds the function. This call site is cached (think of it as a K:V store, where the key is "makeKeyCodepointObj.makeKey" and the value is the call site of the function.)

When you call makeKeyCodepoint(), the VM has to, for each call, look up the prototype chain until it finds the variable. The variable is then resolved into the function call site. Because of scoping issues in JS, I don't think this is cached (or if it's cached, it'd be invalidated a lot), and a lookup has to happen every time. (I know in my JS engine, I tried to perform caching optimization for global object properties and I gave up).

TL;DR: Function lookups happen all the time when the function is a method of the global object. When a function is a method of an object, the lookup is cached.

If I am talking out of my arse, please feel free to correct me.

Stratoscope · on May 20, 2014

I don't think a global variable lookup is the reason for the difference. Here is the code that jsperf generates for the function version of the test:

    (Benchmark.uid1400600789397runScript || function() {})();
    Benchmark.uid1400600789397createFunction = function(window, t14006007893970) {
        
        var global = window,
            clearTimeout = global.clearTimeout,
            setTimeout = global.setTimeout;
            
        var r14006007893970, s14006007893970, m14006007893970 = this,
            f14006007893970 = m14006007893970.fn,
            i14006007893970 = m14006007893970.count,
            n14006007893970 = t14006007893970.ns;
        
        // Test Setup
        var makeKeyCodepoint = function(word) {
            var len = word.length;
            if (len > 255) {
                return undefined;
            }
            var i = len >> 2;
            return String.fromCharCode(
                (word.charCodeAt(    0) & 0x03) << 14 |
                (word.charCodeAt(    i) & 0x03) << 12 |
                (word.charCodeAt(  i+i) & 0x03) << 10 |
                (word.charCodeAt(i+i+i) & 0x03) <<  8 |
                len
            );
        };
        
        s14006007893970 = n14006007893970.now();
        while (i14006007893970--) {
            // Test Code
            var key;
            
            key = makeKeyCodepoint('www.wired.com');
            key = makeKeyCodepoint('www.youtube.com');
            key = makeKeyCodepoint('scorecardresearch.com');
            key = makeKeyCodepoint('www.google-analytics.com');
        }
        r14006007893970 = (n14006007893970.now() - s14006007893970) / 1e3;
        
        return {
            elapsed: r14006007893970,
            uid: "uid14006007893970"
        }
    }

The test setup and the test itself are all part of the same function, and makeKeyCodepoint is a local variable in that function.

chewxy · on May 20, 2014

Variable and property lookups do go through different processes (and hence are optimized differently).

Variables are stored on activation records (a Context object in v8), while properties are stored in, well, a magic hidden-class type of thing (for v8).

The latter can be cached, the former not so much. Plus, the former also creates quite a bit of garbage, so gc should theoretically kick in more often.

comex · on May 20, 2014

I would expect any local variables to be stored in registers once the optimizer kicks in. I bet there's a different explanation.

Stratoscope · on May 20, 2014

That is true, but I don't see how it explains the difference either. The function version and method version both reference a local variable, named makeKeyCodepoint or makeKeyCodepointObj respectively. The method version doesn't appear to have any fewer variable references than the function version.
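For anyone without the jsperf page open, the two shapes being compared can be sketched roughly like this. It is an illustration, not the generated harness: `harness` is a hypothetical stand-in for the jsperf-generated function, and the object layout is only inferred from the `makeKeyCodepointObj.makeKey()` call mentioned earlier in the thread.

```javascript
// Hypothetical stand-in for the jsperf-generated function: setup and test
// code share one function scope, so both variants are local variables.
function harness(count) {
    // Function variant, body copied from the setup quoted above.
    var makeKeyCodepoint = function(word) {
        var len = word.length;
        if (len > 255) {
            return undefined;
        }
        var i = len >> 2;
        return String.fromCharCode(
            (word.charCodeAt(    0) & 0x03) << 14 |
            (word.charCodeAt(    i) & 0x03) << 12 |
            (word.charCodeAt(  i+i) & 0x03) << 10 |
            (word.charCodeAt(i+i+i) & 0x03) <<  8 |
            len
        );
    };

    // Method variant (assumed shape): same body, stored as an object property.
    var makeKeyCodepointObj = {
        makeKey: function(word) {
            var len = word.length;
            if (len > 255) {
                return undefined;
            }
            var i = len >> 2;
            return String.fromCharCode(
                (word.charCodeAt(    0) & 0x03) << 14 |
                (word.charCodeAt(    i) & 0x03) << 12 |
                (word.charCodeAt(  i+i) & 0x03) << 10 |
                (word.charCodeAt(i+i+i) & 0x03) <<  8 |
                len
            );
        }
    };

    var viaFunction, viaMethod;
    while (count--) {
        // Reads the local, then calls it.
        viaFunction = makeKeyCodepoint('www.wired.com');
        // Reads the local, loads the property, then calls it.
        viaMethod = makeKeyCodepointObj.makeKey('www.wired.com');
    }
    return { viaFunction: viaFunction, viaMethod: viaMethod };
}
```

Both call sites start from a local variable read; the method form only adds a property load on top of it.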

thedufer · on May 20, 2014

The first two tests on that jsperf don't show the same behavior, though, and they differ in the same way.

tantalor · on May 20, 2014

A perf test without side effects is suspect because the compiler can remove dead code. You should add asserts on the return values.
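A minimal sketch of what that could look like, assuming the `makeKeyCodepoint` body from the jsperf setup quoted above. `runWithSink` and the checksum are hypothetical names made up for this illustration. Folding every return value into a result that escapes the loop gives the calls an observable effect, though it does not by itself stop a JIT from constant-folding calls with literal arguments.

```javascript
// Body copied from the jsperf setup quoted earlier in the thread.
function makeKeyCodepoint(word) {
    var len = word.length;
    if (len > 255) {
        return undefined;
    }
    var i = len >> 2;
    return String.fromCharCode(
        (word.charCodeAt(    0) & 0x03) << 14 |
        (word.charCodeAt(    i) & 0x03) << 12 |
        (word.charCodeAt(  i+i) & 0x03) << 10 |
        (word.charCodeAt(i+i+i) & 0x03) <<  8 |
        len
    );
}

// Hypothetical helper: runs the test body `iterations` times and folds each
// returned key into a checksum, so the results are actually consumed.
function runWithSink(iterations) {
    var checksum = 0;
    var key;
    while (iterations--) {
        key = makeKeyCodepoint('www.wired.com');
        checksum = (checksum + key.charCodeAt(0)) | 0;
        key = makeKeyCodepoint('www.youtube.com');
        checksum = (checksum + key.charCodeAt(0)) | 0;
        key = makeKeyCodepoint('scorecardresearch.com');
        checksum = (checksum + key.charCodeAt(0)) | 0;
        key = makeKeyCodepoint('www.google-analytics.com');
        checksum = (checksum + key.charCodeAt(0)) | 0;
    }
    return checksum;
}

// The "assert": one pass and two passes must agree, otherwise something is off.
var once = runWithSink(1);
if (runWithSink(2) !== 2 * once) {
    throw new Error('checksum mismatch');
}
```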

gorhill · on May 20, 2014

I thought that was an interesting comment. I did wonder originally if this could be something like that, but didn't follow up.

So now I took the time to try to go around this by rearranging the calls, and all of a sudden results make more sense:

http://jsperf.com/makekey-concat-vs-join/10

Results:

1. Firefox 29 makeKeyConcat / makeKeyConcatObj = ~440 Mops/s

2. Firefox 29 makeKeyCodepoint / makeKeyCodepointObj = ~64 Mops/s

3. Chrome 34 makeKeyCodepoint / makeKeyCodepointObj = ~5.7 Mops/s

4. Chrome 34 makeKeyConcat / makeKeyConcatObj = ~2.2 Mops/s

bzbarsky · on May 21, 2014

For what it's worth, as far as I can tell SpiderMonkey is still more or less optimizing away the makeKeyConcat / makeKeyConcatObj testcases in Firefox 29, and all of them on trunk. I bet it's inlining the functions, discovering the arguments are constant strings and hence the return values are constant, constant-folding the if conditions, etc...

I tried to work around that in http://jsperf.com/makekey-concat-vs-join/11 but clearly that's not good enough.

Microbenchmarking a decent optimizing compiler is hard; it will typically be able to figure out that the ubench is pointless and optimize it all away...

tantalor · on May 20, 2014

Cool! Now it looks like the object case is slightly slower, which is exactly what I'd expect.

mike-cardwell · on May 20, 2014

On my 64bit Linux Firefox 29 desktop:

  codepoint    : 1,172,182,887 ops/sec
  codepoint obj: 1,168,116,461 ops/sec

No significant difference.

acdha · on May 20, 2014

Chrome 34:

    function:      5,572,574
      method:    903,375,064

Firefox 29:

    function:  1,747,475,085 
      method:  1,727,244,041

bzbarsky · on May 20, 2014

Any time you see jsperf numbers over about 1e9, that means that the testcase was completely optimized out by the JIT (presumably because it has no detectable side effects, so it got dead-code eliminated). 1.7e9 iterations per second on typical modern 3 GHz hardware means 2 clock ticks or less per iteration. That's about enough time to increment the loop counter and compare it to the loop limit and nothing else.
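The arithmetic can be checked with a quick back-of-the-envelope sketch. The 3 GHz clock is the assumed figure from the comment, not a measurement, and `cyclesPerIteration` is a hypothetical helper name.

```javascript
// Clock cycles available per test iteration = clock rate / iterations per second.
function cyclesPerIteration(clockHz, opsPerSecond) {
    return clockHz / opsPerSecond;
}

var clockHz = 3e9; // assumed "typical" 3 GHz core

// A result around 1.7e9 ops/sec leaves fewer than 2 cycles per iteration.
console.log(cyclesPerIteration(clockHz, 1.7e9).toFixed(2));

// For contrast, the Chrome 34 "function" figure reported above (5,572,574 ops/sec)
// leaves several hundred cycles per iteration, which is plausible for real work.
console.log(cyclesPerIteration(clockHz, 5572574).toFixed(0));
```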

Stratoscope · on May 20, 2014

Indeed, if you add a completely empty test case, it is only a tiny bit faster than the method version that appears to have such high performance. (jsperf doesn't allow a completely empty test, but you can use // to fool it.)

thedufer · on May 20, 2014

As bzbarsky points out, tests in the realm of 1e9 ops/sec look like the entire function is being optimized away because there are no side effects. Something about the method allows it to do this, while it doesn't think it's okay for the function version.

One thing I found out is that dropping `String.fromCharCode` in favor of a local function (`var fromCharCode = String.fromCharCode.bind(String);`) causes neither of them to be optimizable. See http://jsperf.com/makekey-concat-vs-join/9
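Roughly, the change looks like this. It is a sketch only: `makeKeyCodepointBound` is a made-up name, and the body is the one from the jsperf setup quoted earlier with the `String.fromCharCode` call swapped for the bound local.

```javascript
// Bound copy of String.fromCharCode, as described in the comment above.
var fromCharCode = String.fromCharCode.bind(String);

// Hypothetical name; same body as makeKeyCodepoint, except the final call
// goes through the bound local instead of String.fromCharCode directly.
var makeKeyCodepointBound = function(word) {
    var len = word.length;
    if (len > 255) {
        return undefined;
    }
    var i = len >> 2;
    return fromCharCode(
        (word.charCodeAt(    0) & 0x03) << 14 |
        (word.charCodeAt(    i) & 0x03) << 12 |
        (word.charCodeAt(  i+i) & 0x03) << 10 |
        (word.charCodeAt(i+i+i) & 0x03) <<  8 |
        len
    );
};
```

The returned keys are unchanged; only the way the engine sees the call differs.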

ciupicri · on May 20, 2014

On Fedora 20 x86_64 with midori-0.5.8-1 (webkitgtk-2.2.7-1):

   concat    | concat obj | codepoint | codepoint obj
   ----------+------------+-----------+--------------
   2,243,214 | 1,983,801  | 1,823,882 | 1,746,316

On Fedora 20 x86_64 with epiphany-3.10.3-1 (webkitgtk3-2.2.7-1):

   concat    | concat obj | codepoint | codepoint obj
   ----------+------------+-----------+--------------
   2,515,750 | 2,280,291  | 2,187,448 | 1,957,199

vishal0123 · on May 20, 2014

If this is not enough, try http://jsperf.com/single-vs-multiple-times-2. Running the function 4 times is faster than running it a single time.

amalcon · on May 20, 2014

Which JS VM are you using to test this? That matters a lot for this sort of thing.

I ran it past the one in my browser (current Firefox, Linux) and didn't see a significant difference.

acdha · on May 20, 2014

Try Chrome 34 – the difference is massive. What's interesting is that the best result in Chrome is 45% slower than the worst result in Firefox 29 so it's probably a question of why v8 is failing to JIT the first 3 versions.

gorhill · on May 20, 2014

Argh sorry, forgot to mention the browser. It's Chromium 34/Linux 64-bit.

NDizzle · on May 20, 2014

Interesting. Chrome 34 here as well, and the scores from the jsperf link are 1.6M, 1.6M, 6.3M, 960M.

SixSigma · on May 20, 2014

They are not exactly the same ergo they are different.

gorhill · on May 20, 2014

The "two pieces of code" I am referring to are obviously the body of the function and method. (Following your comment I had to look again, I thought I missed something).