Three JavaScript performance fundamentals that make Bluebird fast

macspoofing · on Sept 5, 2016

Interesting article now take these suggestions and toss them out the window. Architecture, correctness, maintainability and readability of code are much more important than optimizing for these extreme edge cases.

If there are performance issues, profile and refactor only the affected areas. Most of the time your code isn't going to get exercised enough to make any difference. You're also fighting JIT assumptions, since all VM devs will attempt to optimize against common patterns - so though these performance 'bottlenecks' may be true today, that may not be the case tomorrow.

Anectode: I had a co-op student preemptively apologize for using a double for-loop brute force search as non-optimal and not 'clean', during his code-review. The lists in question were guaranteed to have a size of 5, and that code was executed once per life-time of the app.

dimal · on Sept 5, 2016

That's not a fair reading. I don't see anywhere that the article makes a any suggestion about when to optimize, it only suggests techniques that you could presumably use when you actually need to. Just as it's silly to prematurely optimize, it's silly to toss these suggestions out the window. If you hit a point where you need to optimize, these suggestions can be very useful.

> You're also fighting JIT assumptions, since all VM devs will attempt to optimize against common patterns

Optimization nearly always means that you're optimizing against a particular implementation, which may not be the same in the future. But your app lives today, in today's world. If you actually do hit a bottleneck, should you throw up your hands and say, "Don't worry, I'm sure it won't matter in a few years?" Of course not. Just as you shouldn't prematurely optimize, you should be willing to do it when necessary. Having some knowledge about what might work is definitely helpful.

bastawhiz · on Sept 5, 2016

In fairness, unless you only care about Chrome, there are many implementations already. Many of these types of optimizations do not necessarily have a net positive effect on performance across the board.

Rapzid · on Sept 5, 2016

Or node.js running V8...

macspoofing · on Sept 5, 2016

>That's not a fair reading. I don't see anywhere that the article makes a any suggestion about when to optimize, it only suggests techniques that you could presumably use when you actually need to.

Then you don't have a fair reading of my comment. I didn't say to never use it. There may be time-sensitive areas that perform a lot of work, very often. In those cases, go nuts, squeeze out every ms per frame you can. In the vast majority of cases, don't bother. Focus on maintainability and architecture and choose the right data structures (that's where your bottlenecks are going to be). Don't bother with low level details.

huckyaus · on Sept 6, 2016

> I didn't say to never use it.

"now take these suggestions and toss them out the window" seems pretty unambiguous. dimal's reading of your comment was entirely fair, I think.

Klathmon · on Sept 5, 2016

Re: your last paragraph.

I tend to do something similar in the form of comments. I'll use what is most likely horribly inefficient (but quick and easy to reason about) way of doing things, then i'll throw a comment in the docblock about how it's probably pretty inefficient.

My reasoning is that i've had parts of a program that were very lightly used become very heavily used as the program evolves, and taking a few seconds to comment that this is a low hanging fruit if i need it has helped future-me at least once.

As with most things, it's not a hard "rule", just something I'll do when it makes sense.

mattmanser · on Sept 5, 2016

Why bother? The comment is a waste of time. Performance profile it and go from there.

If you're doing anything other than using a performance profiling tool on real code with vaguely realistic data you're completely and utterly wasting your time taking stabs in the dark.

Leaving comments that will become obsolete and misleading as the code evolves seems just as bad as pre-optimizing code.

I've been working on a project at the moment where exactly what you've suggested was done, but they'd used todo comments. Last winter they'd had to up to an azure large SQL instance and it'd still max out DTU every now and again (DTU being MS's db performance measure on azure). Client was hoping I might fix a couple more problems as the volume is expected to triple or more this winter.

In 6 days of simply running a local profiler I found and fixed 3 major performance problems, none in any way related to the pointless comments claiming X method needed optimising, none of which were even 1/10000th of the program's execution time. One of those problems was even caused by a previous "optimisation". I confirmed the problem spots by checking the web logs and avg page response times to confirm production was having the same problems.

The site now potters along on a small SQL instance never going above 5% DTU, the worst pages are all being served 5 times faster.

Done the same thing to a site with slow rendering (js problems). Chrome's profiler is good enough.

If you don't know how to use your language's profiler, learn, now.

Klathmon · on Sept 5, 2016

Don't get me wrong, I do use the profiler a ton, and it's the first thing i reach for when i start noticing issues, but having that little comment about why i did it that way lets me relax a bit. I'd never mark it as a TODO or something that would clutter a tool up with crap, but having it there could help.

A comment out of something I did recently where it's a loop that iterates over an array that's expected to be a few hundred long, and each iteration loops over the array again:

// This is O(n^2) out of pure laziness. Might cause issues if the array gets really big.

It's not hurting anyone, took like 10 seconds to write, and if it ever becomes a problem, the first thing i'll see is that in the function docblock which lets me (or someone else) know that there is no technical reason it needed to be O(n^2).

tlarkworthy · on Sept 5, 2016

Profilers help identify hot spots, but don't really explain the dynamic behaviour of why something is a hot spot. So comments saying that something is O(n^2) when it could be O(n) is very helpful when you see the inner function hogging 90% of resources. Profilers don't get complexity classes.

mattmanser · on Sept 6, 2016

Not really, comments like this are often wrong or misleading. After all, the programmer hasn't actually tested anything, they're just making assumptions. Otherwise they'd have actually fixed it.

It's just pointless noise.

A profiler will tell you that a particular function was run 10,000 times costing 0.5ms per run. You're not going to have a problem tracking down the problem, the comment won't help.

macspoofing · on Sept 5, 2016

>// This is O(n^2) out of pure laziness. Might cause issues if the array gets really big

It just feels lazy. You're passing the buck to another dev who may be working on the code, weeks, months or years down the line. What are they supposed to do about it? Randomly rewrite working code and risk introducing a bug?

If you think the input might reasonably scale to a size that will cause an issue then do some preemptive defensive coding and fix it. Otherwise don't. Stand by your decision. TODOs and FIXMEs tend to linger and get ignored.

If I'm looking for performance problems, I'm not scanning for source code TODOs. I load a profiler and I'll see which methods take too much time.

Klathmon · on Sept 5, 2016

>It just feels lazy. You're passing the buck to another dev who may be working on the code, weeks, months or years down the line. What are they supposed to do about it? Randomly rewrite working code and risk introducing a bug?

Yes. If it's a performance problem weeks, months, or years down the road, the developer then is supposed to rewrite it, possibly introducing bugs, but also solving the performance problems that manifested in the past weeks, months, or years.

>If you think the input might reasonably scale to a size that will cause an issue then do some preemptive defensive coding and fix it. Otherwise don't.

I don't know about you, but I'm not omniscient. I can make some pretty good guesses, but I can't predict the future. Optimizing the code now to be super efficient when I'm pretty sure it won't be necessary would be a massive waste of time. But on the off chance it is necessary, I'd like to give a hint to myself (or another dev) about why I did what I did, and give some assurances that it's a place that can be pretty easily optimized if needed.

>If I'm looking for performance problems, I'm not scanning for source code TODOs. I load a profiler and I'll see which methods take too much time.

Again, I'm not leaving TODOs I'm leaving comments explaining the code a little. But I completely agree with that whole statement, and it's exactly what I do as well. But when I get to a method that is taking too much time, I can now see a small comment saying it's running in O(n^2) and there is no technical reason why it needs to. Now I can spend less time checking if it needs to run that slowly, and can get right into fixing it.

The handful of bytes taken up by that comment, and the few seconds it took me to write it isn't going to make anything worse for anyone, and it might make life easier for someone in the future, so i'll do it.

macspoofing · on Sept 5, 2016

>I can now see a small comment saying it's running in O(n^2) and there is no technical reason why it needs to.

What technical reason?!!? That was my point! In your comment there is an implicit assumption that an O(N^2) algorithm is a potential problem simply because it is an O(N^2) algorithm. That's just wrong. Would you put a comment like: "This is O(n). Might cause issues if the 'c' gets too big"? It's useless. __IF__ it is a potential problem, your laziness simply created a time-bomb that will blow up in somebody's face later. Your comment isn't helping anybody.

For example, when your product is new, you may have 10 users, taking 10 courses in various permutations. You write your naive O(N^2) search and everything looks great, and you put your comment in. Everything is great for months, then you hit it big, and now you have 100 courses and 10,000 users and now you're dead. Thanks for your comment, it was really helpful!

Devs tend to be allergic to O(N^2) because it is beaten into us in school and during interviews. On the one hand, that's good, it gets people thinking about runtime performance. On the other hand, if you just have a shallow view of it it's a crutch. Those same people may for example, wrap that algorithm in an a lambda that has an internal time-complexity of O(N^2) anyway (but at least you don't see the double for-loop) or use a chain of lambda calls, getting their O(N) but nevertheless losing performance because of constant costs around execution and allocation.

>and it might make life easier for someone in the future, so i'll do it.

That's where I disagree. It really doesn't.

Klathmon · on Sept 6, 2016

>For example, when your product is new, you may have 10 users, taking 10 courses in various permutations. You write your naive O(N^2) search and everything looks great, and you put your comment in. Everything is great for months, then you hit it big, and now you have 100 courses and 10,000 users and now you're dead. Thanks for your comment, it was really helpful!

You seem to think that it's inevitable that my O(n^2) function will become a problem, it won't. I'm using it in the context of tasks in a build system. There is no way that any human is going to have 100,000 tasks, 100,000 independent targets. It's not going to happen, so optimizing this now is completely pointless. The time savings on the CPU would literally never add up to the amount of extra time i'd spend on it.

But in the future let's say that the platform pivots and the dependency mapping system gets pulled out and used for something else, or we switch to another architecture where each file involved in the task becomes a "task" itself, now the chance that there are 100,000 tasks is not only possible, but pretty likely, and that O(n^2) algorithm is now a problem. Switching to something like a depth first search of the dependency tree and some creative use of tags could probably do the same job in linear time, but it adds complexity to the code, takes longer to develop, will be harder to test, and will provide 0 benefit to the codebase as it is.

You can see how optimizing for a possibility that the whole system might change purpose and that code is still there is a silly rabbit hole to go down. (how would you ever actually produce anything?)

>>and it might make life easier for someone in the future, so i'll do it.

>That's where I disagree. It really doesn't.

Then there is no harm no foul. Nobody is harmed by the extra comment, and life goes on. There isn't a scenario where a one-line comment explaining why the code is the way it is causes anyone an issue, so there's no reason to not do it.

But the point of the code is to give assurance to the future developer that the only reason why the code is that way is because there was no reason to optimize it that way at the time. On the other hand, if the algorithm is required to be O(n^2) (like in a maximum matching graph), then i'd put a comment explaining that it's needed. Again, it might be entirely useless (in which case, nobody loses anything), or it might help the dev skip over that in the preliminary searches and look deeper for where any possible speedups can be found, saving time and money.

jononor · on Sept 5, 2016

Same, I'll throw in "// PERF: foo" comments for future. I like it because don't have to think about it anymore if it is written down: I've made a choice to not prematurely optimize it, documented this fact and any potential nugget of insight I might have had that could maybe help in the future. The peace of mind / keeping in the flow is the most important.

macspoofing · on Sept 5, 2016

>I like it because don't have to think about it anymore if it is written down ... The peace of mind / keeping in the flow is the most important.

Exactly. It's lazy. Let the guy down the road deal with it.

drtz · on Sept 6, 2016

One coder's "lazy" is another Engineer's tradeoff of efficiency for design simplicity and development resources.

yid · on Sept 5, 2016

> and taking a few seconds to comment that this is a low hanging fruit if i need it has helped future-me at least once.

The problem with "fruit" like this is that it depends heavily on the exact version of the JS runtime being used. If the next iteration of V8 introduces an automatic optimization or (worse) de-optimization for your low-hanging fruit, you're left with sometimes odd-looking code for a behavior that no longer exists.

As we've learned from the history of C++, if you depend on undefined/undocumented behavior quirks of a compiler or runtime, you're going to have a bad time. It's better to bump micro-optimizations like this down to a lower-level language. At best, you're buying a constant factor in whatever scaling curve you're trying to beat.

Klathmon · on Sept 5, 2016

I'm not talking about JIT tricks or JS engine quirks, but more like "Heads up, this function is O(n^2) because i'm lazy"

rektide · on Sept 6, 2016

I strongly tend to think the hard part is community, making a practice. I believe that we could optimize around the kind of practice spoken here- having flyweight objects, with little depth to them, few captured functions- believe is possible and doable. It's a very mixin style architecture, but more about structures than what a lot of focus is- methods.

We are struggling to figuring out the answers to what you are challenging here- finding broad architecture, evident correctness, general maintainability, patterned readability- it's all hard. I tend to think the actual decisions factor in little though, that only at Facebook scale to mixins- particularly data-centric ones, rather than function centric ones- become hard to practice well, intra-organizationally.

And different patterns will have some different balance. But overall I think it's mostly exposure- the good, advantageous patterns need leadership and example for them. Most advantages are portable, but take practice to borrow. Having your flyweight have a length property on it is not going to be a serious barrier to any of these ideas- architecture, correctness, maintainability, reliability. your side.

I really really like the ideas suggested in this article- use a flyweight context, don't capture functions, don't nest. It's fast, bleedingly fast JS, and many folks could emulate and practice this, and not complicate their lives, and spread a practice that is fast.

pizlonator · on Sept 5, 2016

As a VM dev what I find intriguing about these three optimizations is that they are unlikely to be obviated by any VM optimizations anytime soon.

- VMs will indeed optimize the heck out of object allocations, but I think that hoisting out functions to reduce the rate of allocation is always going to be some kind of win. It definitely reduces work on the front-end, so if your app has a lot of code, it's a great idea. Remember, the VM may optimize away allocations once your code gets super hot and you end up in the optimizing JIT - but that doesn't happen for a lot of web framework code since it's only used for brief interactions. Avoiding allocations in that code is still worthwhile, because the more of that you do the lower the probability of a GC during page load or other important interaction. I'm surprised that function object allocation in particular was a hot spot for them, because I've struggled to find benchmarks where this is a huge deal. But I still think that optimization of theirs is a solid idea and it will continue to be a good thing for as long as I can foresee.

- Rolling your own bitfields is always a win. The proof burden for the VM to do this optimization for you is very high, so it's totally not like register allocation or some other optimization that reliably Just Works. If a VM did infer that a cluster of boolean fields could be compacted into a bitfield, then this optimization would likely either happen late (so it wouldn't benefit short-running code) or it would incur early costs (so the presence of the optimization would penalize all code, including code to which the optimization does not apply). WebKit can infer that properties are boolean, but it won't build a bitfield out of your properties because of these concerns - the benefit is low and the cost is high. Also, lets not forget that this optimization is still not done by much more mature languages like C++. Last I checked, JVMs also didn't do this for you, because concurrency means that this optimization would be a perf fail in the common case. So, if you're writing code and you see an opportunity for a bitfield, please roll it yourself! That's a great idea and it will continue to be a great idea for a long time to come!

- The no-op function trick is amazeballs. This isn't working around the VM - this is working with the VM to get the VM to perform some code specialization for you. I would equate it to using templates in C++ to specialize your code for two distinct cases (bad thing doesn't happen vs bad thing does happen, for example). Except here, you can specialize a lot of different things simultaneously. At least in WebKit, this will scale because even if you start overwriting the no-op functions, we'll recompile with exponential back-off, so the worst case isn't bad. Also, the back-off means that if you overwrite multiple no-op functions at a time, it will probably only cause one recompile because we'll wait before recompiling. The no-op function thing is just smart hacking, and I'm super happy that people are doing this in JS. Also, I'd be curious to hear exactly how anyone would think that this is a worse idiom than any other even ignoring perf. The hook that starts empty and is replaced with non-trivial functionality feels to me like an idiomatic use of the functional part of JS.

albinofrenchy · on Sept 6, 2016

I'm finding it hard to believe that the bit field thing is ever really worth it. Even if you could collapse a dozen booleans into a single bit field, will this have a measurable impact?

The other two optimizations I could see either having no impact on code maintenance, or maybe even improving it. The bitfields thing though I'd have to see super compelling benchmarks to be sold on.

pizlonator · on Sept 6, 2016

Imagine an object with 10 Boolean property. In WebKit, you pay 8 bytes per property. That's 80 bytes, plus the 16 bytes for the object header, so 96 bytes total.

Now imagine the same object with a bitfield. That's one property (it'll happily hold 32 bits without hacks). So, with a header, that's 24 bytes.

That's a 4x savings. This will add up quick if you allocate the object frequently.

the_other · on Sept 7, 2016

How much does the machinery that makes this usable (the accessors and consts) eat into your "4x saving"?

pizlonator · on Sept 7, 2016

The savings I'm referring to are space savings and reduction in GC frequency. The one-time cost of allocating accessors in a prototype somewhere, and having some consts in global variables, is going to be insignificant by comparison to the accumulated per-object space wins.

But maybe you're talking about speed. After warm-up, accessors and consts carry no cost. Before warm-up, it's probably still a net win to have bitfields.

After warm-up, in WebKit at least, accessors will get inlined and the compiler will be smart enough to constant-fold references to global variables (or almost any other kind of variables) that were effectively constant.

As for before warm-up, it depends on how frequently the object in question is allocated. Accessing through an accessor before warm-up carries some cost, but on the other hand, allocating 4x fewer bytes could be more important. It depends on the application. If your app uses too much memory to run reliably on some devices, then you'll probably be fine with losing some perf before warm-up in exchange for being able to not run out of memory. Also, GC is expensive and it is triggered by the combination of how much you allocate (in bytes) and how much is live at any time (in bytes). So reducing the sizes of objects reduces the amount of GCing you do, which causes your program to run faster. Probably in lots of large apps, the speed-up from GCing less will be greater than the slow-down from accessor calls during warm-up.

samth · on Sept 5, 2016

- I'm surprised that no current JS VM does lambda lifting (which is the transformation being done manually here). The analysis is easy, the wins are big for function-heavy code, and the tech is easy and well-known. JSC should try it! ;)

- I was just last week sketching such an optimization, but in a more complicated case, and with types to guide you. Seems very hard to do in JS.

pizlonator · on Sept 6, 2016

We do most of lambda lifting. The part we don't do is to address the observability of function identity:

function foo() { function bar() { } return bar; } foo() !== foo() // this must be true

Therefore, we will at a minimum allocate a function object that just says "hey man I'm not like any other function!".

We could avoid the allocation if:

- We proved that no identity operations ever happen. That's hard!

- We used some other UID system to identify different functions. But that would need a GC anyway!

samth · on Sept 6, 2016

Proving that identity isn't needed here is easy: if the function is only used in a first-order way (ie, it's always called) then the identity can't possibly matter. That should work in most cases that matter.

pizlonator · on Sept 6, 2016

You're right, it's easy in strict mode. I hadn't realized this before!

In sloppy mode, there's function.arguments. You can walk the stack to find the damned function instance.

samth · on Sept 8, 2016

In that case, the only identity has to come via function.arguments. Could you therefore allocate the identity lazily, since there's nothing else to compare to?

pizlonator · on Sept 11, 2016

Yes, you could. I want to implement this eventually.

gonyea · on Sept 5, 2016

Yes, yes, yes, yes. I am SO TIRED of seeing pre-optimizations from people who should know better.

These optimizations are a neat FYI. But they do not belong anywhere near the business logic of a production app. If you have a massive app (doubtful) with bottlenecks that warrant these optimizations (doubtful), extract them into a library that is isolated, documented, and tested. Business logic should describe the needs of the business in the clearest and simplest terms.

zokier · on Sept 5, 2016

Did you read the article? It describes optimizations applied inside a relatively low-level building block library, not anywhere near business logic. Heck, the author goes as far to say:

> I apply it in libraries and platform code because applications that I don't control or will not know about will depend on them and therefore I cannot make the assumption whether performance will matter or not for them. Everyday applications I work on typically don't have tight performance requirements so optimization is less prioritized

gonyea · on Sept 6, 2016

Yes, I did. There's a lot of people who will ignore his caveat and try to use these optimizations where they don't belong.

I see it a lot and it's painful. Hence the "throw it out the window" advocacy.

pjmlp · on Sept 6, 2016

I see this a lot in the C community as well.

Never had a use case of program being too slow due to the better type safety of the language being used.

And on the few cases it actually mattered, like 1% of the code base after profiling, only there was the safety level turned off if no other means of optimization were possible.

Apparently some programming language cultures are bound to the "micro-optimize every code line" cargo cult, even if it doesn't matter the difference to the use case at hand.

tehlike · on Sept 5, 2016

premature optimization is the root of all evil.

mpweiher · on Sept 5, 2016

The actual quote is a bit different:

"We should forget about small efficiencies, say about 97% of the time; premature optimization is the root of all evil."

It goes on:

"Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified."

Just before that section:

"The conventional wisdom shared by many of today's software engineers calls for ignoring efficiency in the small; but I believe this is simply an overreaction to the abuses they see being practiced by penny- wise-and-pound-foolish programmers, who can't debug or maintain their "optimized" programs. In established engineering disciplines a 12 % improvement, easily obtained, is never considered marginal; and I believ the same viewpoint should prevail in software engineering. Of course I wouldn't bother making such optimizations on a one-shot job, but when it's a question of preparing quality programs, I don't want to restrict myself to tools that deny me such efficiencies"

From "Structured Programming with go to statements" Knuth, 1974.

tehlike · on Sept 5, 2016

I think my emphasis was still fair, even though it is probably a misquote.

There are usually many bottlenecks in any given system, be it lack of proper parallelization, waiting on xyz while you could use that opportunity to do other tasks and so on that language tend to account for small part of these.

Sure, if you are hitting the limit of that kind of optimizations, language could be the next thing you want to use to optimize further, but usually systems are far from that.

pizlonator · on Sept 5, 2016

Wow, the function overwriting trick is really great. WebKit is likely to inline the no-op function in the third and fourth optimizing tiers, so if the rare thing does not happen, the no-op hook call is basically free. Cool!

I wasn't aware that function object allocation was such a big deal. It's a big deal if the function is used as a constructor, but other than that, function objects are some of the cheapest objects we allocate, at least in WebKit. Also, WebKit's fourth tier will eliminate nested function allocations if all of the calls to those functions are inlined. From helping with that work, I know that the cost of function allocation is small because even in the best case, the speed-ups from this optimization weren't earth-shattering.

Pelam · on Sept 5, 2016

I always appreciate when libraries and tools are carefully engineered for performance, provided the abstraction holds and the thing solves the problem it is supposed to.

It seems to make sense to invest the most energy in optimizing and otherwise improving the tools that are more general and used widely. In which case this library clearly fits the bill.

On the other hand in application code, possibly while still experimenting or developing ideas it makes most sense to aim for maximum readability and simplicity. Optimization can be left for when there is a proven need for it.

tonyle · on Sept 5, 2016

He also has a nice page on his github with lots of info on making stuff fast on node.

https://github.com/petkaantonov/bluebird/wiki/Optimization-k...

romanovcode · on Sept 6, 2016

I don't get the obsession with benchmarking and making node "faster".

If you want your application to be blazing fast why pick relatively slow language like JS in the first place?

qwertyuiop924 · on Sept 5, 2016

Okay, so we're supposed to avoid function allocations like the plague, and using bitfields to represent sets of data. They 80's called, they want their optimizations back.

Using bitfields, in particular, is a heavy sacrifice in clarity for relatively few wins.

As for function allocations, internal functions can be hidden by binding the actual function to the return value of a closure that instantiates both the main and internal functions, and returns the main. You can't get rid of the cost of functions in the general case, though: a function's lexical environment has to be recreated each time the function is called (otherwise, all the variables are what C calls "static"), and JS's object semantics mean you can't just make one function, and rebind it to the appropriate parent lexical environment at call time (I have no idea if that's a useable optimization, but it occurs as a way to cut down on allocations) although you could bind the function code to multiple function objects with different parent lexical environments, which would largely accomplish the same thing (but now I'm just thinking out loud).

If you really need these optimizations, go ahead, use them, but don't take these tips unreservedly, and use them in your next project.

userbinator · on Sept 6, 2016

Using bitfields, in particular, is a heavy sacrifice in clarity for relatively few wins.

Only to those who don't understand bitfields, which IMHO means they don't even understand the basics of how a computer works, let alone how to effectively program one. JS is a high level language, but it still has bitwise operators for a reason.

albinofrenchy · on Sept 6, 2016

I disagree. Bitfields of somewhat unrelated fields is a sacrifice in clarity even if you understand bitfields perfectly well -- I'd argue anything requiring boilerplate sacrifices something in the way of being able to understand a code base.

qwertyuiop924 · on Sept 6, 2016

I disagree. Bitfields mean that you have to do a lot of boilerplate operations which obscure the actual meaning of what you're doing, and can confuse even those who know what's going on, as an error can creep in easily.

catnaroek · on Sept 5, 2016

Aren't most of these “optimizations” things that a decent programming language implementation should automatically do for you in 2016? (Lambda lifting, using efficient representations for variant types, etc.) Rolling them by hand is awfully low-level, and completely defeats the point to using a high-level language.

---

I can't properly reply to your replies, because “I'm submitting too fast”, so I'l reply here.

@leppr: I'm not denying the importance of these optimizations. I'm saying that they're being performed by the wrong programmer, or rather, by a programmer performing the wrong role (language user, rather than language implementor).

@pizlonator:

> Avoiding allocations is generally good advice. Compilers have a hard time eliminating them

The most effective way to avoid allocations is to make the language value-oriented (where values are merely represented in physical memory, but not identified with those representations) rather than object-oriented (where the physical identity of every object matters). An implementation of a value-oriented language is free to store the entire representation of an arbitrarily large compound value in a single memory block. By comparison, the gross amount of micromanagement shown in the article is a lot of effort, whose benefits have only limited local significance.

> Using bitfields is a classic trick that is harder to do for the implementation than it is for the user.

Then provide an abstract bitfield type as a library, and let that type be used by everyone else as if it were a primitive type.

> Finally, I love the no-op function trick. This is not a low-level optimization at all!

It is. What you really want here is an abstract type of lazy suspensions. Then, again, that type can be used by everyone else, without caring about its internal implementation.

@macspoofing

Wholeheartedly agree. I have no idea why you're being downvoted.

leppr · on Sept 5, 2016

>Rolling them by hand is awfully low-level, and completely defeats the point to using a high-level language.

The problem is that you can't have "compiler/interpreter plugins" on the web, you have to use the language itself in order to extend it. This isn't React, Angular or even jQuery we're talking about, it's a library implementing a "fundamental" language feature that may be used tens of thousands of times in an app. This is why these kind of extremely low level optimizations matter for the makers of such a low-level library.

pizlonator · on Sept 5, 2016

I don't think this article is rolling anything too low-level by hand.

Avoiding allocations is generally good advice. Compilers have a hard time eliminating them for you and so you can't be guaranteed that the compiler will eliminate a pointless optimization. Any allocation that isn't eliminated brings you one step closer to a GC, and it makes sense to optimize GCs away.

Using bitfields is a classic trick that is harder to do for the implementation than it is for the user. If an implementation wanted to turn a bunch of boolean fields into a bitfield, it would have to first prove that they are all boolean, then it would have to recompile all accesses to those fields into bitfield accesses, and then it would have to walk the whole heap to make sure that there weren't any more objects that used the old shape (separate fields) instead of the new one (bitfield). You could skip the last step if boolean fields were always eagerly represented as bitfields (which would surely slow down start-up time, something that JS implementations don't want to do because page loading is so important) or if you were happy to live with objects having both representations simultaneously (which would slow down every access). I'm guessing that a good implementation would wait for the next GC after realizing that a cluster of fields is proved boolean, and then it would reshape those objects while GCing. So, it would be one of those "maybe you'll get it if you're lucky" optimizations that make you look good on benchmarks but won't really help developers. Even if an implementation did have any version of this trick - even one that worked eagerly, there are so many parts to the proof burden for the optimization that the smart play for users would still be to do the optimization themselves rather than hope that the compiler gets it right.

Finally, I love the no-op function trick. This is not a low-level optimization at all! It's just a simple and clever way to structure your code that results in you getting to hook into the JS implementation's existing watchpoint optimizations: JS VMs will certainly set a watchpoint on a global function pointer that has been immutable so far, and they will definitely inline calls to that function via that global pointer subject to that watchpoint. So, by doing something that looks perfectly clean at a high level you are getting some really gnarly optimizations for free.

macspoofing · on Sept 5, 2016

Even if it doesn't, how much time is this actually saving? If you're doing heavy work on every frame ... maybe (and you'll have to profile to figure out if it's worth). If you're handling a user input event, or some IO event (like in the author's example) it makes no difference at all. It'll confuse some developer (possibly a junior or co-op tasked with clearing the bug queue) down the road, and probably will lead to a bug.

yxhuvud · on Sept 5, 2016

I don't understand how value oriented vs object oriented matter considering the allocations in question was functions. Functions are hard to optimize away due to the nature of functions being closures with pointers to the defining scope. Can you please elaborate what you mean?

catnaroek · on Sept 5, 2016

If the identity of a function object doesn't matter, then a compiler can automatically lambda-lift inner functions that don't capture variables, without introducing new arguments.

dalbin · on Sept 5, 2016

Are there people working on JS Engine ?

I would like to have more info on why JS create unnecessary new function objects if there are only called in the parent function scope.

vickychijwani · on Sept 6, 2016

The author answers this in the comments [1].

> Functions are also first class objects with observable identity, just like any other object. Identity can be observed in many ways, for example by doing an equality comparison or assigning a property. In the first example optimization is indeed possible as it's easy to see that the functions cannot be used for their identity. Optimizations have maintenance costs however and if similar simple scenarios are not common in the real world the optimization might not be worth implementing.

[1]: https://reaktor.com/blog/javascript-performance-fundamentals...

rastapasta42 · on Sept 6, 2016

Wouldn't bit-shifting to store booleans in 1 bit actually decrease performance, for minimum size improvement?

mannykannot · on Sept 7, 2016

When you have caching and also when you have dynamic allocation, size has its costs. Not every time and certainly not linearly, but more likely than the opposite.

z3t4 · on Sept 5, 2016

u have to measure and pick your battles. Im gonna try the int bool trick later to see if there are any gains.

marknadal · on Sept 5, 2016

I can attest strongly to these optimizations - and for everybody who is saying "don't worry about it" in these threads you would be surprised at how wrong/bad JS perf is without these.

Every function call in JS (even named functions that do not have to be created at execution time) have logarithmic decrease in performance (this means your 2nd function call can half your performance).

Using these techniques and others, we've been able to get some unbelievable results (30M to 50M ops/sec) for our JavaScript library. Check out this link for a podcast on how we did it as well as more information: https://github.com/amark/gun/wiki/100000-ops-sec-in-IE6-on-2... .

franze · on Sept 5, 2016

An article about performance on a 2.2 MB [1] page (~40% of this is JS[2]). With a start render event at about 6.6 seconds on 3G Fast / US [3].

The medium is the message.

[1] https://tools.pingdom.com/#!/cBcAKw/https://reaktor.com/blog...

[2] https://www.webpagetest.org/result/160905_P8_P45/

[3] https://www.webpagetest.org/result/160905_P8_P45/1/details/#...

Klathmon · on Sept 5, 2016

Would you feel the same way about a book on performance optimizations?

The medium says nothing about the message. The message isn't even about load time performance, it's about runtime optimizations.

tuxracer · on Sept 5, 2016

> The message isn't even about load time performance, it's about runtime optimizations.

There tends to be a fetishization of load time and time to paint performance on web, and gross neglect of run time performance and user experience once the web app loads. Some devs seem to forget someone is going to want to actually interact with this thing once it loads and 5fps needlessly jittery, janky UI (among other things) has contributed to the death of mobile web altogether in favor of native apps.

underwater · on Sept 5, 2016

Hand optimizing hot code paths is a completely different challenge than minimizing JS overhead for a web page.

Your complaint has very little to do with the content of the article. And, even if it did, it wouldn't make the tips in the post any less valid.