That's a very clear and accessible explanation of what big O complexity is all about.
But (and that's a pretty big but), 'Big O' is not all there is. Once you've picked the algorithm with the best expected runtime according to 'Big O', you really have to make sure that:
- your algorithm is executed as infrequently as possible
- you concentrate on those pesky constant factors and lower-order terms that you eliminated during analysis, to make sure that you don't end up wasting all your CPU time on some little detail somewhere
- you take into account the effects of your code on caches and virtual memory (there's a sketch of this right below)
- you profile your code afterwards to make sure that all your assumptions regarding the above are correct
It is very easy to pick the 'right' algorithm and still get crappy runtime if you skip those steps.
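To make the cache point above concrete, here's a rough sketch (my own illustration, not something spelled out in the comment; it assumes NumPy and a matrix too big for the CPU cache): traversing a 2D array row by row touches memory sequentially, while going column by column jumps around and misses the cache far more often.

    import time
    import numpy as np

    n = 4000
    a = np.random.rand(n, n)        # C order: rows are contiguous in memory

    def sum_by_rows(m):
        total = 0.0
        for i in range(m.shape[0]):
            total += m[i, :].sum()  # contiguous, cache-friendly reads
        return total

    def sum_by_cols(m):
        total = 0.0
        for j in range(m.shape[1]):
            total += m[:, j].sum()  # strided, cache-unfriendly reads
        return total

    for f in (sum_by_rows, sum_by_cols):
        t0 = time.perf_counter()
        f(a)
        print(f.__name__, time.perf_counter() - t0)

Both functions compute the same sum and both are "O(n^2)", yet on most machines the column version is noticeably slower.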
Over the years I've spent a lot of time profiling and optimizing code, and one thing I've noticed is that most developers' intuition about where the bottlenecks are in their code is usually completely wrong.
So if you are going to optimize, do it on the basis of cold empirical data, not intuition.
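One way to gather that cold empirical data, as a minimal sketch assuming Python (the comments don't prescribe any tool), is the standard library's cProfile:

    import cProfile
    import pstats

    def slow_part(n):
        # deliberately does the work the long way
        return sum(i * i for i in range(n))

    def fast_part(n):
        return n * (n - 1) // 2

    def work():
        for _ in range(200):
            slow_part(10_000)
            fast_part(10_000)

    cProfile.run("work()", "profile.out")
    pstats.Stats("profile.out").sort_stats("cumulative").print_stats(5)

The report usually points somewhere other than where intuition says.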
Even though Winograd's algorithm for matrix multiplication is theoretically faster than Strassen's, the constant is so high that Winograd's is only faster for matrices so large that you couldn't practically compute with them in the first place.
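To see why, a bit of illustrative arithmetic (the exponents are the well-known ones, roughly 2.807 for Strassen and 2.376 for the Coppersmith-Winograd family; the constant here is completely made up, purely to show the effect):

    # made-up constant; only the exponents are real
    c_fast, e_fast = 1.0e6, 2.376          # asymptotically faster, huge constant
    c_strassen, e_strassen = 1.0, 2.807

    # c_fast * n**e_fast < c_strassen * n**e_strassen  <=>
    # n > (c_fast / c_strassen) ** (1 / (e_strassen - e_fast))
    crossover = (c_fast / c_strassen) ** (1.0 / (e_strassen - e_fast))
    print(f"the 'faster' algorithm only wins for n > {crossover:.2g}")

With that made-up constant the crossover lands around n = 10^14, i.e. matrices nobody is actually going to multiply.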
Quicksort is a more familiar example. Although its worst case complexity is O(n^2), many techniques have been invented to avoid the worst cases and execute in just O(n log n) (average case), and it's usually faster than merge sort in practice.
Technically, you can't ever guarantee that you'll avoid the worst case of quicksort. All the techniques to make it O(n log n) are probabilistic, and you can always engineer a pathological input that will make quicksort run in quadratic time. Mathematically this is expected, but I'm not disagreeing that quicksort is in practice even faster than the true O(n log n) comparison sorts.
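For reference, this is the kind of probabilistic mitigation being discussed, as a minimal sketch (not production code): pick the pivot at random, so the expected running time is O(n log n) for every input, even though any single run can still get unlucky.

    import random

    def quicksort(xs):
        if len(xs) <= 1:
            return xs
        i = random.randrange(len(xs))          # random pivot choice
        pivot = xs[i]
        rest = xs[:i] + xs[i + 1:]
        left = [x for x in rest if x <= pivot]
        right = [x for x in rest if x > pivot]
        return quicksort(left) + [pivot] + quicksort(right)

    print(quicksort([5, 3, 8, 1, 9, 2, 7]))

No fixed input is bad for it, but an adversary who can observe the random choices, or lots of duplicate keys (as discussed below), can still push this particular sketch toward the quadratic case.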
I don't consider this a limitation of big O notation at all, but rather a common misconception among students when they first learn about the notation.
Is it true that quicksort is worst-case O(n lg n) if you always select the median? What if there are a lot of duplicate values at the median? My understanding has always been that for any quicksort algorithm on given hardware you can construct a pathological input that will be sorted in quadratic time.
Hardware is irrelevant. The analysis is mathematical and abstract. The model that people typically work in when doing this analysis is the comparison model: one < comparison costs 1 (http://en.wikipedia.org/wiki/Comparison_sort). There are other models that more closely resemble computers. For example, in the cell probe model (http://www.itl.nist.gov/div897/sqg/dads/HTML/cellProbeModel....) you have a word size (e.g. 32 bits, but the exact number is irrelevant) and you can construct radix sort in that model.
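A minimal sketch of the radix-sort idea (my own illustration, assuming fixed-width non-negative integers): it never compares two keys against each other, which is exactly why the comparison model can't express it.

    def radix_sort(xs, key_bytes=4):
        # LSD radix sort: bucket by one byte at a time, least significant first
        for shift in range(0, key_bytes * 8, 8):
            buckets = [[] for _ in range(256)]
            for x in xs:
                buckets[(x >> shift) & 0xFF].append(x)   # stable bucketing
            xs = [x for bucket in buckets for x in bucket]
        return xs

    print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))

Its running time grows linearly with n for a fixed key width, rather than as n log n, which only makes sense once the model knows about word sizes.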
If all values in your set of numbers to sort are unique, then quicksort is indeed O(n lg n) worst case if you pivot about the median. If there are duplicate values, can you think of a way to still guarantee worst case O(n lg n)?
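One answer that's often given (an assumption on my part about what's being hinted at): three-way partitioning, which sets all keys equal to the pivot aside so duplicates are never recursed on again. Combined with an exact median (found in linear time with median-of-medians in a real implementation), each recursive call then gets at most about half the elements, so the worst case stays O(n lg n). A rough sketch:

    import statistics

    def quicksort3(xs):
        if len(xs) <= 1:
            return xs
        # stdlib median for the sketch; a true worst-case O(n lg n) version
        # would use a linear-time selection (median of medians) here
        pivot = statistics.median_low(xs)
        less    = [x for x in xs if x < pivot]
        equal   = [x for x in xs if x == pivot]
        greater = [x for x in xs if x > pivot]
        return quicksort3(less) + equal + quicksort3(greater)

    print(quicksort3([5, 3, 5, 5, 1, 5, 9, 5, 2]))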
Err, no it doesn't. Big O is a tool for saying how quickly an algorithm runs based on the size of n, and it's usually used to measure real things, like what will happen when the filesystem has to handle 100,000 files, then 1,000,000, etc.
Nope. You're ignoring that the constant factor is removed (i.e. everything but the term that is dominant as n goes to infinity).
More precisely, it tells you how the relative speed difference between two algorithms changes as n goes toward infinity. An O(n) algorithm will eventually win over an O(n^2) one, but the former might have a huge constant factor that makes it slower than the latter for, say, n <= a million.
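To put made-up numbers on it: if one algorithm takes 1,000,000 * n steps and another takes n^2 steps, the 'worse' O(n^2) one is actually faster for every n below a million.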
This is true for O(n) but not for o(n).
f(n) is in the class O(g(n)) if g(n) dominates f(n), up to a constant factor, for sufficiently large n. o(g(n)) is stricter: g(n) has to dominate f(n) by more than any constant factor, i.e. f(n)/g(n) goes to 0 as n grows.
Big O notation, or Landau notation, is not just O(g(n)). It's also o(), Ω(), ω() and Θ(). It's understood, though, that n is usually taken to be rather large - another name for this notation is, after all, "asymptotic notation." In any case where you're really worried about speed, you should probably be measuring the speed of your algorithm directly rather than relying on mere asymptotic generalizations.
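For reference, the usual definitions behind those symbols (a quick summary in standard form, not the commenter's wording):

    f(n) = O(g(n)):  f(n) <= c * g(n) for some constant c > 0 and all sufficiently large n
    f(n) = o(g(n)):  f(n) / g(n) -> 0 as n -> infinity
    f(n) = Ω(g(n)):  f(n) >= c * g(n) for some constant c > 0 and all sufficiently large n
    f(n) = ω(g(n)):  f(n) / g(n) -> infinity as n -> infinity
    f(n) = Θ(g(n)):  f(n) is both O(g(n)) and Ω(g(n))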
Aside: I can't think of any algorithms with huge constants like you described; in theory you're correct, but in practice the asymptotic generalizations already apply by n = 1000, and often even by n = 100.
That explanation is not right either (at least I don't find it intuitive). It's not about the size of n at all, but about how the running time changes when n changes.
Actually, it's not meant for making estimates; it's meant to compare algorithms. You can't look at a single quadratic algorithm and try to estimate how long it will take to run on a certain input and certain hardware. That would be pointless, because any quadratic algorithm would result in exactly the same estimate.
I read his blog post (the entire one, which is unusual for me) and he writes, early on, that Google discovered him via his prolific presence on Stack Overflow.
Understand the effect of the constant factor hidden within the Big O. Often an O(n) algorithm is faster than an O(1) algorithm because n is small - e.g. arrays vs hashmaps.
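A rough way to check the arrays-vs-hashmaps claim for yourself (a sketch assuming Python; the exact crossover point, and whether the linear scan ever wins, depends heavily on the language and the data):

    import timeit

    for n in (4, 16, 64, 256):
        as_list = list(range(n))
        as_dict = {k: True for k in range(n)}
        target = n - 1                                    # worst case for the scan
        t_list = timeit.timeit(lambda: target in as_list, number=100_000)
        t_dict = timeit.timeit(lambda: target in as_dict, number=100_000)
        print(f"n={n:4d}  list: {t_list:.4f}s  dict: {t_dict:.4f}s")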
Make sure you know what n is in each case. For example, a graph algorithm that is O(n^2) with n being the number of nodes may actually be linear in the size of the graph (the number of edges).
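To make the "what is n" point concrete (my own toy example, not from the comment): the same loop is Theta(V^2) counted in vertices, but linear counted in the size of the adjacency-matrix input it's given.

    def count_edges(adj_matrix):
        V = len(adj_matrix)
        edges = 0
        for i in range(V):          # V^2 iterations in total...
            for j in range(V):
                if adj_matrix[i][j]:
                    edges += 1
        return edges // 2           # ...but each matrix cell is read exactly once

    # 4 vertices, undirected edges (0,1), (1,2), (2,3)
    m = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
    print(count_edges(m))           # 3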