That's a very clear and accessible explanation of what big O complexity is all about.
But (and that's a pretty big but), 'Big O' is not all there is. Once you've picked the algorithm with the best expected runtime according to 'Big O', you really have to make sure that:
- your algorithm is executed as infrequently as possible
- you concentrate on those pesky constant factors and lower-order terms that you eliminated during analysis, to make sure that you don't end up wasting all your CPU time on some little detail somewhere
- you take into account the effects of your code on caches and virtual memory (there's a sketch of this right below)
- you profile your code afterwards to make sure that all your assumptions regarding the above are correct
It is very easy to pick the 'right' algorithm and still get crappy runtime if you skip those steps.
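To make the cache point above concrete, here's a rough sketch (my own illustration, not something spelled out in the comment; it assumes NumPy and a matrix too big for the CPU cache): traversing a 2D array row by row touches memory sequentially, while going column by column jumps around and misses the cache far more often.

    import time
    import numpy as np

    n = 4000
    a = np.random.rand(n, n)        # C order: rows are contiguous in memory

    def sum_by_rows(m):
        total = 0.0
        for i in range(m.shape[0]):
            total += m[i, :].sum()  # contiguous, cache-friendly reads
        return total

    def sum_by_cols(m):
        total = 0.0
        for j in range(m.shape[1]):
            total += m[:, j].sum()  # strided, cache-unfriendly reads
        return total

    for f in (sum_by_rows, sum_by_cols):
        t0 = time.perf_counter()
        f(a)
        print(f.__name__, time.perf_counter() - t0)

Both functions compute the same sum and both are "O(n^2)", yet on most machines the column version is noticeably slower.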
Over the years I've spent a lot of time profiling and optimizing code, and one thing I've noticed is that most developers' intuition about where the bottlenecks are in their code is usually completely wrong.
So if you are going to optimize, do it on the basis of cold empirical data, not intuition.
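One way to gather that cold empirical data, as a minimal sketch assuming Python (the comments don't prescribe any tool), is the standard library's cProfile:

    import cProfile
    import pstats

    def slow_part(n):
        # deliberately does the work the long way
        return sum(i * i for i in range(n))

    def fast_part(n):
        return n * (n - 1) // 2

    def work():
        for _ in range(200):
            slow_part(10_000)
            fast_part(10_000)

    cProfile.run("work()", "profile.out")
    pstats.Stats("profile.out").sort_stats("cumulative").print_stats(5)

The report usually points somewhere other than where intuition says.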
Even though Winograd's algorithm for matrix multiplication is theoretically faster than Strassen's, the constant is so high that Winograd's is only faster for matrices so large that you couldn't practically compute with them in the first place.
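To see why, a bit of illustrative arithmetic (the exponents are the well-known ones, roughly 2.807 for Strassen and 2.376 for the Coppersmith-Winograd family; the constant here is completely made up, purely to show the effect):

    # made-up constant; only the exponents are real
    c_fast, e_fast = 1.0e6, 2.376          # asymptotically faster, huge constant
    c_strassen, e_strassen = 1.0, 2.807

    # c_fast * n**e_fast < c_strassen * n**e_strassen  <=>
    # n > (c_fast / c_strassen) ** (1 / (e_strassen - e_fast))
    crossover = (c_fast / c_strassen) ** (1.0 / (e_strassen - e_fast))
    print(f"the 'faster' algorithm only wins for n > {crossover:.2g}")

With that made-up constant the crossover lands around n = 10^14, i.e. matrices nobody is actually going to multiply.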
Quicksort is a more familiar example. Although its worst case complexity is O(n^2), many techniques have been invented to avoid the worst cases and execute in just O(n log n) (average case), and it's usually faster than merge sort in practice.
Technically, you can't ever guarantee that you'll avoid the worst case of quicksort. All the techniques to make it O(n log n) are probabilistic, and you can always engineer a pathological input that will make quicksort run in quadratic time. Mathematically this is expected, but I'm not disagreeing that quicksort is in practice even faster than the true O(n log n) comparison sorts.
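For reference, this is the kind of probabilistic mitigation being discussed, as a minimal sketch (not production code): pick the pivot at random, so the expected running time is O(n log n) for every input, even though any single run can still get unlucky.

    import random

    def quicksort(xs):
        if len(xs) <= 1:
            return xs
        i = random.randrange(len(xs))          # random pivot choice
        pivot = xs[i]
        rest = xs[:i] + xs[i + 1:]
        left = [x for x in rest if x <= pivot]
        right = [x for x in rest if x > pivot]
        return quicksort(left) + [pivot] + quicksort(right)

    print(quicksort([5, 3, 8, 1, 9, 2, 7]))

No fixed input is bad for it, but an adversary who can observe the random choices, or lots of duplicate keys (as discussed below), can still push this particular sketch toward the quadratic case.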
I don't consider this a limitation of big O notation at all, but rather a common misconception among students when they first learn about the notation.
Is it true that quicksort is worst-case O(n lg n) if you always select the median? What if there are a lot of duplicate values at the median? My understanding has always been that for any quicksort algorithm on given hardware you can construct a pathological input that will be sorted in quadratic time.
Hardware is irrelevant. The analysis is mathematical and abstract. The model that people typically work in when doing this analysis is the comparison model: one < comparison costs 1 (http://en.wikipedia.org/wiki/Comparison_sort). There are other models that more closely resemble computers. For example, in the cell probe model (http://www.itl.nist.gov/div897/sqg/dads/HTML/cellProbeModel....) you have a word size (e.g. 32 bits, but the exact number is irrelevant) and you can construct radix sort in that model.
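A minimal sketch of the radix-sort idea (my own illustration, assuming fixed-width non-negative integers): it never compares two keys against each other, which is exactly why the comparison model can't express it.

    def radix_sort(xs, key_bytes=4):
        # LSD radix sort: bucket by one byte at a time, least significant first
        for shift in range(0, key_bytes * 8, 8):
            buckets = [[] for _ in range(256)]
            for x in xs:
                buckets[(x >> shift) & 0xFF].append(x)   # stable bucketing
            xs = [x for bucket in buckets for x in bucket]
        return xs

    print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))

Its running time grows linearly with n for a fixed key width, rather than as n log n, which only makes sense once the model knows about word sizes.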
If all values in your set of numbers to sort are unique, then quicksort is indeed O(n lg n) worst case if you pivot about the median. If there are duplicate values, can you think of a way to still guarantee worst case O(n lg n)?
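One answer that's often given (an assumption on my part about what's being hinted at): three-way partitioning, which sets all keys equal to the pivot aside so duplicates are never recursed on again. Combined with an exact median (found in linear time with median-of-medians in a real implementation), each recursive call then gets at most about half the elements, so the worst case stays O(n lg n). A rough sketch:

    import statistics

    def quicksort3(xs):
        if len(xs) <= 1:
            return xs
        # stdlib median for the sketch; a true worst-case O(n lg n) version
        # would use a linear-time selection (median of medians) here
        pivot = statistics.median_low(xs)
        less    = [x for x in xs if x < pivot]
        equal   = [x for x in xs if x == pivot]
        greater = [x for x in xs if x > pivot]
        return quicksort3(less) + equal + quicksort3(greater)

    print(quicksort3([5, 3, 5, 5, 1, 5, 9, 5, 2]))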
Err, no it doesn't. Big O is a tool for saying how quickly an algorithm runs based on the size of n, and it's usually used to measure real things, like what will happen when the filesystem has to handle 100,000 files, then 1,000,000, etc.
Nope. You're ignoring that the constant factor is removed (i.e. everything but the term that is dominant as n goes to infinity).
More precisely, it tells you how the relative speed difference between two algorithms changes as n goes toward infinity. An O(n) algorithm will eventually win over an O(n^2) one, but the former might have a huge constant factor that makes it slower than the latter for, say, n <= a million.
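To put made-up numbers on it: if one algorithm takes 1,000,000 * n steps and another takes n^2 steps, the 'worse' O(n^2) one is actually faster for every n below a million.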
This is true for O(n) but not for o(n).
f(n) is in the class O(g(n)) if g(n) dominates f(n), up to a constant factor, for sufficiently large n. o(g(n)) is stricter: g(n) has to dominate f(n) by more than any constant factor, i.e. f(n)/g(n) goes to 0 as n grows.
Big O notation, or Landau notation, is not just O(g(n)). It's also o(), Ω(), ω() and Θ(). It's understood, though, that n is usually taken to be rather large - another name for this notation is, after all, "asymptotic notation." In any case where you're really worried about speed, you should probably be measuring the speed of your algorithm directly rather than relying on mere asymptotic generalizations.
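For reference, the usual definitions behind those symbols (a quick summary in standard form, not the commenter's wording):

    f(n) = O(g(n)):  f(n) <= c * g(n) for some constant c > 0 and all sufficiently large n
    f(n) = o(g(n)):  f(n) / g(n) -> 0 as n -> infinity
    f(n) = Ω(g(n)):  f(n) >= c * g(n) for some constant c > 0 and all sufficiently large n
    f(n) = ω(g(n)):  f(n) / g(n) -> infinity as n -> infinity
    f(n) = Θ(g(n)):  f(n) is both O(g(n)) and Ω(g(n))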
Aside: I can't think of any algorithms with huge constants like you described; in theory you're correct, but in practice the asymptotic generalizations already apply by n = 1000, and often even by n = 100.
That explanation is not right either (at least I don't find it intuitive). It's not about the size of n at all, but about how the running time changes when n changes.
Actually, it's not meant for making estimates; it's meant to compare algorithms. You can't look at a single quadratic algorithm and try to estimate how long it will take to run on a certain input and certain hardware. That would be pointless, because any quadratic algorithm would result in exactly the same estimate.
I read his blog post (the entire one, which is unusual for me) and he writes, early on, that Google discovered him via his prolific presence on Stack Overflow.
Understand the effect of the constant factor hidden within the Big O. Often an O(n) algorithm is faster than an O(1) algorithm because n is small - e.g. arrays vs hashmaps.
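A rough way to check the arrays-vs-hashmaps claim for yourself (a sketch assuming Python; the exact crossover point, and whether the linear scan ever wins, depends heavily on the language and the data):

    import timeit

    for n in (4, 16, 64, 256):
        as_list = list(range(n))
        as_dict = {k: True for k in range(n)}
        target = n - 1                                    # worst case for the scan
        t_list = timeit.timeit(lambda: target in as_list, number=100_000)
        t_dict = timeit.timeit(lambda: target in as_dict, number=100_000)
        print(f"n={n:4d}  list: {t_list:.4f}s  dict: {t_dict:.4f}s")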
Make sure you know what n is in each case. For example, a graph algorithm that is O(n^2) with n being the number of nodes may actually be linear in the size of the graph (the number of edges).
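To make the "what is n" point concrete (my own toy example, not from the comment): the same loop is Theta(V^2) counted in vertices, but linear counted in the size of the adjacency-matrix input it's given.

    def count_edges(adj_matrix):
        V = len(adj_matrix)
        edges = 0
        for i in range(V):          # V^2 iterations in total...
            for j in range(V):
                if adj_matrix[i][j]:
                    edges += 1
        return edges // 2           # ...but each matrix cell is read exactly once

    # 4 vertices, undirected edges (0,1), (1,2), (2,3)
    m = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
    print(count_edges(m))           # 3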