Big-O Misconceptions

btilly · on Oct 15, 2012

This is a worthy attempt, but the horse has already left the barn. Programmers everywhere understand big-O to mean what big-Theta should mean. And since O is easier to type than Θ and Θ is the more useful concept, this will not change.

Given that language is defined by usage, you need to figure out what usage the other person likely has, and work with that. I will happily use the correct notation with people who understand it, but don't bother complaining about incorrect use of the notation from people who will never have reason to care.

Homunculiheaded · on Oct 15, 2012

Just as important, nobody actually studying algorithms (where having this misconceptions would matter) actually has these misconceptions. It's completely fine for Big-O to mean one thing when chatting with a bunch of devs and another when sitting down for research.

The distinction between all the Big/Small letters is only really useful if you're trying to prove the bounds and something and proving an upper or lower bound is an easier place to start. Honestly practicing programmers shouldn't even be worried about these type of concerns since in practice there are some very small Ns and very large Cs that can make 2^n surprisingly workable and n surprisingly slow. Speaking in rough terms of 'it grows like this' is just fine.

tylerneylon · on Oct 15, 2012

I am curious to see how many HN-reading developers agree that big-O "really means" or "should mean" an expression of exact complexity (like theta does traditionally). I disagree.

There is an extremely active world of researchers and many (I believe) developers for which big-O definitely and clearly means "upper bound" and not the stronger theta notation. Basically all research papers in CS use O as an upper bound, and those, to me, are the definitive active voices. Beyond that, purposefully confounding O and theta is actually clashing different ideas together - ideas that have been separated for their utility. Often knowing the O notation requires some thought, and knowing theta is even harder; in practice what is most concerning is an upper bound, so that idea is important and useful. If the default object in language became exact complexity (not an upper bound), our lives would be much more difficult -- the practical thing to talk about would end up being whatever new notation replaced the idea of upper bound for complexity (aka the "traditional" big-oh).

sesqu · on Oct 15, 2012

I'm not sure I agree that Θ is more useful than Ο; rather, I think Ο∘E, the expected (time) complexity, is most useful.

Take quicksort for example (see misconception 4). Few people care that it's Ο(n²), and even fewer bother to ask if the implementation is Ω(1). It's not even Θ anything. Yet it's one of the canonical examples of complexity analysis, with the understanding that most people are really concerned with average performance.

btilly · on Oct 15, 2012

Ironic, you just illustrated misconceptions #3 and #4, while trying to educate me about why I was wrong to comment on misconception #2.

All of these notations are just notations for functions, and the functions can have any relation to reality that you want. Here are precise and accurate statements about quicksort using Θ notation. The average and best case performance of quicksort is Θ(n log(n)). The worst case performance of quicksort is Θ(n²).

I maintain that most decent hackers would make those same statements using O notation (which is true), but many would jump to correct someone who said that the expected performance of quicksort is O(n²). According to what is taught in CS, that "correction" is wrong. But given that language is established by usage, they aren't really wrong. And given that arbitrary upper bounds aren't very useful, the common usage makes sense.

Before disagreeing, tell me what value there is in the true statement that the expected run time of all major sorting algorithms is O(nⁿ).

scott_s · on Oct 15, 2012

I can tell you that all of the theorists I talk to say and write O, not Θ when discussing algorithms. I think this is because that the upper bound is generally more interesting, better understood, and more useful when reasoning. In fact, if one of them used Θ, my next question would be "Wait, do we even care about the lower bound?"

I'm not disagreeing with you, just pointing out it's not just hackers. And I think this behavior is because the lower bound generally isn't relevant.

sadga · on Oct 15, 2012

This is why you care about the lower bound:

If I say something is O(n^n)), you may come back and say "I wonder if it is O(n^2)

If you say Omega(n^2), I know not to bother.

scott_s · on Oct 15, 2012

In discussions with a theorist, they would never tell me that something is O(n^n) if it is not also Θ(n^n). My point is not on what people may say, but on what they do say. btilly made a point about programmers, not theorists.

I added the observation that even theorists - the people who are experts in this area - talk like that, and gave a possible reason why. My reason, which perhaps I did not communicate well, is that the lower bound is generally not of practical interest.

omaranto · on Oct 15, 2012

About: "In discussions with a theorist, they would never tell me that something is O(n^n) if it is not also Θ(n^n)". So you've never heard someone say, for example, that Strassen's algorithm shows that matrix multiplication for n x n matrices can be done with O(n^2.808) multiplications?

For some problems, the best known upper bounds don't match the best known lower bounds. Also, in cases where there are further improvements in the upper bounds, it doesn't automatically mean people stop talking about previous upper bounds.

(Ignore this whole comment if you specifically meant the function n^n, I have no examples relevant to that function.)

scott_s · on Oct 15, 2012

I literally meant n^n, which is also what I assumed sadga meant. That is, an upper bound which allows for absurdly fast growth.

I have discussed matrix multiplication's unknown lower bound with theorists, but more as a trivia item because there is a relatively recent result showing that it is O(n^2.3727) (http://www.cs.berkeley.edu/~virgi/matrixmult.pdf). That is, the unknown lower bound was the point of the conversation, and it was interesting because the lower and upper bound are not the same; it's still open research. The other point the theorist made during that conversation is similar to what I've said: getting a tighter bound on matrix multiplication is theoretically interesting, but there's not much practical application.

I was rather talking about when we discuss algorithm when trying to solve problems.

Aardwolf · on Oct 15, 2012

I agree. Nobody is ever going to attempt to write the theta in ASCII or bother typing it given that it's not on most keyboards. Math and practical programming are different worlds, each with their own notations. Programmers are quite used to using ASCII.

Also, if someone says "My algorithm is O(n log n)", doesn't it make sense that he means both the lower and upper bound? Even if technically correct, you wouldn't say "My algorithm is O(n^100)" when it's O(n log n). It's an English language thing...

P.S. I didn't read the article because it's not loading, but can kind of guess the context from your response and noticed this problem before.

B-Con · on Oct 15, 2012

I'm confused by what the article is saying about the meaning of Big-O... FTA:

> Misconception 4: Big-O Is About Worst Case

But it is. Big-O is an upper bound which is by definition a maximum (aka, "worst") case. "f = O(n)" means that at worst "f" will perform linear work, not that it is expected to. He gave this definition himself in the very beginning.

Then he states Misconception #4 and he says that Big-O is not about worst case. He doesn't sound like he is implying that programmers casually use Big-O this way, he seems to be saying this is how the real definition actually is. But it isn't.

tylerneylon · on Oct 15, 2012

Upper bound != worst case, though I can see why the language is confusing.

When we have a timing function f(n) we pretend f has one input, which is n = the size of the input to the algorithm. In reality, it's actually more like f(n, m) where m = a parameter specifying the _exact_ input to the algorithm. (Example: sorting algorithms have many possible input lists of length n.)

So when we write f(n) = O(g(n)), we are trying to make two simplifications: (1) Simplify away the actual input, replaced by only the _size_ of the input; and (2) simplify the actual function f to something that describes f but is much simpler.

I will make up an artificial example that is not perfect but is useful to explain the ideas. Suppose for any input size n we have parameters m in the range [-n-sin(n), n+sin(n)] and the exact timing function

f(n, m) = n + sin(n) + m

Then the first simplification is to determine how to ignore m. We could pick out the best m for n (the smallest f(n, m) for a fixed n), pick out the worst m, or take the average over all m. This is best-case, average-case, and worst-case complexity. For this function, we have:

f(n) = 0 [best case] f(n) = n + sin(n) [avg case] f(n) = 2(n + sin(n)) [worst case]

Then we could get an upper bound on _any_ of these.

f(n) = O(1) [best case] f(n) = O(n) [avg and worst case, not always the same, but they are the same in this example]

If we wanted a lower bound, we could say that

f(n) = Omega(n) [avg & worst case]

because f(n) >= n - 1 for all n (in general there could be more room for flexibility on the right-hand side, but it's not needed in this example).

The point here is that step 1 chooses a simplification across input instances, that we call worse case or average case, etc; and step 2 chooses a simplification in how to write down the function f(n), and this is what we call an upper bound for big-oh, or for theta, both an upper and lower bound.

B-Con · on Oct 16, 2012

Yes. Big-O is the worst case for the input, not for the algorithm. If the input is the expected average, then its Big-O is the upper-bound/worst-case for the average. But if the input is the worst case or just the algorithm itself, its Big-O will be an upper-bound/worst-case for the entire algorithm.

In other words, the original article's point should have been about vocabulary. Big-O is about worst-case, but sometimes it's implied that the Big-O analysis is about the algorithm's average case, not the algorithm's worst case.

btilly · on Oct 15, 2012

Let me try to say it simply.

Big-O is an upper bound on the growth of a function. The function being bounded can (per his item #3) represent anything of interest.

In practice it is common for the function to be the average running time of an algorithm. In this case the upper bound on the growth of that function says nothing about how badly the algorithm might perform upon rare occasion.

His example is quicksort. A naive quicksort sorts n things in on average O(n log(n)) time, but if presented with an already sorted set will take time O(n²).

Another one that I'd offer is a hash. The average lookup time for an element in a standard hash is O(1). The worst case lookup time is O(n).

If I am trying to figure out how long a batch job is that has to do a ton of work, including a lot of hash lookups, I care about the average lookup time, because I do not expect to encounter the worst case.)

If I am trying to construct a hard real-time system that absolutely must complete an operation in fixed time, despite 1000 trial runs that found the hash to be fast enough, the slow worst case scenario should scare me into using a more predictable data structure.

B-Con · on Oct 16, 2012

> Big-O is an upper bound on the growth of a function. The function being bounded can (per his item #3) represent anything of interest.

> In practice it is common for the function to be the average running time of an algorithm.

In other words, Big-O is the worst-case of the input. You do the complexity analysis and then assign it to a Big-O class, the Big-O class represents the worst-case for that given analysis. If you analyze the average, its Big-O class will be the worst-case/upper-bound of the average and you obviously can't expect it to say anything about the actual full algorithm's worst case scenario because, well, you didn't analyze the algorithm's worst-case scenario.

Big-O is worst-case, you just have to know what it's the worst case of. If the input is just "here is the quicksort algorithm", then Big-O would be the upper-bound for the algorithm itself, aka the worst case for the algorithm, namely O(n²).

This seems to be just an issue of convention, does "Big-O" refer to the upper bound on the algorithm or on the average algorithm performance? That's been discussed elsewhere, but it seems like programmers imply the average while the strict definition does not. So my point was that the original article should have just said to be careful about vocabulary, not asserted that Big-O isn't about worst case, since it is.

esrauch · on Oct 15, 2012

It's my understanding that the horse left the barn before Big-Theta was ever invented; Big-O was widely used even in formal papers more loosely and Big-Theta was only in dusty obscure text books until Knuth came along.

mturmon · on Oct 15, 2012

I think you're right. Moreover, the related Landau big-O notation was used in real analysis before the algorithm analysis big-O notation was in use (Landau died a month after Knuth was born, in the thirties).

Related number theorists had notations for big-Omega and little-o as well, but I think big-Theta was pretty much Knuth's idea.

Knuth's summary of the history of these notations is at:

http://www.phil.uu.nl/datastructuren/10-11/knuth_big_omicron...

(Parenthetically, one look at the above link and you know why Knuth was driven to write TeX.)

omaranto · on Oct 15, 2012

I think your "language is defined by usage" is a reasonable position to take, but in my experience this stuff about O really is a misunderstanding: the programmer's I've talked too that use O as Theta when asked what they mean by O will quote the standard mathematical definition, so they really are inconsistent.

anonimo · on Oct 15, 2012

> Given that language is defined by usage

What?! This is not natural language. It's a mathematical symbol with a well-known meaning.

One might argue that a lot of programmers don't need to worry about algorithmic complexity on a daily basis. But anyone who wants to talk about it should get the basics right.

sanderjd · on Oct 15, 2012

Arguing that language of any sort (including the mathematical sort) is not malleable depending on the context of its use is completely futile. This is a perfect example of the more common usage and the more correct usage of a "mathematical symbol with a well-known meaning" being totally at odds with one another.

When confronted with someone using different definitions for their language usage than those you are using you can either argue with them about what the correct usage is or allow that language is not absolute and cooperate to calibrate your terms so that you can understand one another. It seems that btilly has decided to do the latter in this specific case, which is laudable.

buro9 · on Oct 15, 2012

The biggest misconception I ever came across was during a job interview in which I was the interviewee.

After explaining skip-lists and sketching up the Big-O complexities for best and worst case for sorted and unsorted data... the interviewer turned to me and asked why wouldn't I use a skip-list (it should already have rung bells that it wasn't "in which instances might you not use").

My answer was that each problem should be understood in terms of the data and task before choosing a data structure and algorithm. That answer was rejected.

The interviewers answer: "No, you shouldn't use skip lists because the memory of all those extra pointers is too much for the JVM."

The biggest misconception I've come across is that Big-O says anything at all about the language you're coding in and your implementation.

I did point this out, but was shot down pretty fast.

The company in that scenario was AWS, the position was architect level. I didn't take the job. Interestingly they offered me another job different from the one that I had been interviewed devoid of any product dev at all on the basis that I was "too technical" for any role that included product.

Big-O doesn't care for your implementation.

jiggy2011 · on Oct 15, 2012

Weird, I don't quite understand how the pointers would be "too much for the JVM". I suppose this will depend on how many layers your skip list has and how big it is etc.

Is there some reason (related to GC or whatever?) that it is bad in particular to use pointer heavy data structures on the JVM?

buro9 · on Oct 15, 2012

I questioned the same thing, but he answered with "the list might have millions of items".

Java isn't my strong point but I struggled to think how it might be an issue. Interviews aren't the place to possibly end up making the interviewer look foolish. I just let it go.

If I was working with him I'd say "Show me". As I'm still curious as to whether it makes any difference. Wasn't the point though, Big-O still doesn't care about the implementation.

dbaupp · on Oct 15, 2012

Java isn't my strong point either, so this is more of a question than an answer. But Java is built around objects (and so everything is a pointer), so the JVM is designed to work with pointer-heavy code, I would think.

(Or is that incorrect?)

jiggy2011 · on Oct 15, 2012

Java does have primitive types like int , char etc which I believe are stack allocated. Java programmers do like to "box" these into classes like "Integer" etc though.

What I find confusing about the answer is that it implies that there is some reason that a skip list (compared to another structure) would be particularly inefficient in the JVM (compared to say native C++).

Basically a skip list will cost you some extra memory by storing more pointers which allow you to "skip" ahead the list but as you add extra layers each layer will have less pointers than the one below it.

A skip list might be a bad idea if you are working a severely memory constrained system and saving memory is more important than lookup speed on your data structure. They would also of course be a bad idea if there is another structure better suited to what you are doing (B tree or whatever).

But saying "Skip lists are bad because there are too many pointers for Java" just seems like a big wtf to me, unless there is something I am missing here.

Espressosaurus · on Oct 15, 2012

While he worded it poorly, he is correct. Kinda. In a skip list with millions of items, you're going to have something like n log(n) SkipListObjects. The references are virtually free. The objects are not.

On a typical JVM implementation, each object takes ~100 bytes. IIRC even an empty ArrayList will cost you something like 60-80 bytes.

If the extra log(n) of space complexity is going to make a difference, then a skip list might not make sense.

But honestly, if you've got millions of objects, you might want to consider a different language/environment. Java is pretty fast these days, but it will still chew up considerably more memory than most native solutions.

btilly · on Oct 15, 2012

The factor isn't log(n), it is a constant. Depending on the implementation, that constant is generally 2. (That's because each level is half the size of the previous. So n + n/2 + n/4 + ... = 2n.)

That said, I could well believe that the constant associated with a SkipList is worse than the constants with other data structures. What works well in C++ does not always translate directly to other languages.

DanWaterworth · on Oct 15, 2012

The alternative would be some sort of tree. The obvious choice, some sort of balanced binary search tree, also has two pointers per item, though half of them don't point to anything.

jimbokun · on Oct 15, 2012

"If I was working with him I'd say "Show me"."

An interview might be a good time to ask this question, too. Remember, a job interview is also your way of figuring out whether you want to work for the company.

vph · on Oct 15, 2012

Based on your testimony, it seems the interviewer did not ask you a question about big-O, and you thought he did.

buro9 · on Oct 15, 2012

No, he did. The question was roughly: "As you seem to like algorithms could you explain what skip lists are, draw on the whiteboard if necessary.".

After that came, "Now, using Big-O, explain why this could be better than a normal linked list".

Before finally the "Why wouldn't you use skip lists?" and that classic answer.

And there were 2 interviewers in the room. One was leading and the other was observing and learning how to interview.

cube13 · on Oct 15, 2012

>Before finally the "Why wouldn't you use skip lists?" and that classic answer.

That answer was a non-answer, though. Saying "You pick the best structure for the data" doesn't answer the question "Why would you not use a skiplist".

A few reasons, off the top of my head:

1. Size of the list structures can get gigantic, especially as the total size of the list increases. This is especially true with 64-bit pointers, and small data(ASCII names, integers, etc.).

2. While the list is very good(O(log n) ) for searches, additions and deletions can be problematic, especially if they're more common than lookups.

3. Depending on the implementation/language, it might be a good idea to pre-allocate the pointer array. Increasing the list size above the supported pointer array size can be a major headache.

flebron · on Oct 15, 2012

I wrote this as a comment there, but I'll repaste it here since it's not showing up.

-----------

One other thing I'd like to add, when considering asymptotic analysis, one needs to take into account both the cost model (i.e. what is assumed about the cost of operations) and the input size (i.e. what the variables mean).

So for instance, if I tell you that the following program to find a number's divisors is O(n) operations, I wouldn't necessarily be wrong:

  for(i = 1; i <= n; ++i) if(n % i == 0) printf("%d\n", i);

After all, this is doing a single loop, and it will loop exactly n times. So why isn't factoring a solved problem? We can factor numbers "in polynomial time!", I'll just check this list!

Factoring, as a problem, is measured using bit complexity. That is, the input size is the number of bits needed to store the number I want to factor. So my program is actually O(2^m) operations, where m is the input size. And so this isn't factoring "in linear time", I just misunderstood what the input size was.

Likewise, one can do analyses where the operation cost (also called computational model) is different from the uniform model. If I am doing crypto, using GMP, it would not be smart to say that adding any two GMP integers is O(1) operations, like we assume with our standard model of costs, where adding integers is O(1). We can certainly assume that, and call that a basic operation, but we will see the number of operations and time it takes to be wildly different, so one should use an analysis where the cost of adding, say, the numbers n and m, is O(log n + log m) operations for instance, if one wants operation-counting analyses to give a hint at runtime. It is important to remember that it is the analyst who defines what operations are basic, and one can do analysis of the same algorithm in different cost models, to get different information about it.

So asymptotic analyses of algorithms depend on both the input size, and the cost model. My divisors finding algorithm, for instance, is best measured using the logarithmic model of computation, and logarithmic input size.

As a trick, it's fun to try to construct two functions f:N -> N, g:N -> N, such that neither f is O(g), nor g is O(f). Bonus points if they're both monotonically increasing. :) It's useful to dispel the notion that functions are given a total order by big-O notation.

gsibble · on Oct 15, 2012

The biggest misconception of Big-O is that it actually matters at most start-ups. I get asked about Big-O all the time. My response is, "You haven't launched. You have zero traction. You have no users yet. Stop worrying about Big-O and just build the damn product."

jrockway · on Oct 15, 2012

This attitude is the reason why the most iconic image related to Twitter is the fail whale.

recursive · on Oct 15, 2012

Better to be twitter with a fail whale than app.ly-ify'r with no users, but awesome scalability.

xsmasher · on Oct 15, 2012

If a startup is thinking short-term, then they'll ask technical/tactical questions. Explain X command or language/os feature. Find out what the candidate knows now. Like you implied, strategy doesn't matter if you're dead.

If they're thinking long-term, they'll ask theoretical/strategic questions to assess general education level (traditional or self-taught) and ability to adapt. What will the candidate will be like in three to five years? Talking about algorithms and complexity can help there. You don't know what tech/language you'll be using in the future, but fundamentals will still apply.

Neither one should be asking you questions about light bulbs or where to bury the survivors.

tylerneylon · on Oct 15, 2012

For folks who genuinely love math (I do!), there's a great but little-known book by Hardy called Orders of Infinity that can help readers grok the possibilities of big-oh notation. It's not about algorithms, but about getting a solid understanding of the many ways functions can behave as n goes to infinity. As a bonus, it's a short (< 100 page) book.

If you're more of a CS person than a mathy, I recommend Sipser's Introduction to the Theory of Computation. This one is written in friendly yet precise terms, and I found it a pleasure to read.

ambrop7 · on Oct 15, 2012

Fun fact: there exist pairs of positive monotonic functions f(x) and g(x) such that neither f(x)=O(g(x)) nor g(x)=O(f(x)), i.e. the functions are not comparable asymptotically.

Example:

f(x)=gamma(floor(x)+1) (=floor(x)!)

g(x)=gamma(floor(x-0.5)+1.5)

There exists no such constant M>0 that f(x)<M*g(x) for x from some point on, and the converse. The two functions periodically cross each other in a way that neither is asymptotically bound by the other.

oskarkv · on Oct 16, 2012

The correct statement about lower bounds is this: "In the worst case, any comparison based sorting algorithm must make Ω(n log(n)) comparisons."

You are making the same mistake that people that have Misconception 1 make. "The algorithm must make <a set of functions> comparisons." is a nonsensical statement.

adrianr · on Oct 15, 2012

Misconception 4 should be the first.

bnegreve · on Oct 15, 2012

I find this a too informal and ultimately confusing.

> Misconception 1: “The Equals Sign Means Equality”

The equal sign does mean equals.

> The left-hand-side is a function, the right-hand-side is a … what, exactly? There is no help to be found in the definition. It just says “we write” without concerning itself with the fact that what “we write” is total nonsense.

This is not clear at all !

> If you take it at face value, you can deduce that since 5n and 3n are both equal to O(n), then 3n must be equal to 5n and so 3=5.

Yes well, f(x) = f(y) does not implies x = y. I think people know this ..

... and so on

esrauch · on Oct 15, 2012

> This is not clear at all !

He is using 'type check' to point out the flaw. x = y means x and y are the same thing. If x is a function and y is an integer, then it can't possibly hold that x = y.

Consider x = O(n). The implication would be that whatever type of thing O(n) is, x also is. This is trivially untrue.

If x = y, and x is a function, then x(K) = y(K). It would be ridiculous to say that x(K) = O(n)(k).

> Yes well, f(x) = f(y) does not implies x = y. I think people know this ..

I don't understand why you are mentioning that. The issue is that x = f(z) and y = f(z), then x = y. You can't get two different values by applying the same function to the same value.

Saying that f is O(n), the "is" is not an equality statement at all, "is" is some other function exactly as you mention that people understand. It is clearly some function that is non-transitive.

In this case the relationship is most easily expressed using existing operators by having O(n) be a set of functions and the "is" in that sentence be the "exists-in" set operator, which is a case where the transitive property doesn't hold. Using the = symbol in that context is just bizarre when = is a otherwise always used to mean equality, which is function with special properties that don't apply here.

lmkg · on Oct 15, 2012

> The equal sign does mean equals.

In the particular case of big-O notation, the equals sign does not mean equals. Trying to parse the equal sign literally leads to confusing and contradictory interpretations (which is why their explanation is confusion).

You're misunderstanding the 3=5 example. Let f(n)=3n, let g(n)=5n. Using big-O notation, f(n)=O(n), g(n)=O(n). Were the equal sign behaving normally, then O(n) would be a single well-defined object, and one could use transitivity to conclude that 3n=5n. But of course this is false, because O(n) isn't a single function, nor really something that other functions can equal. It's a class of functions, not a function itself.

It would be more appropriate to say that "f(n) belongs to O(n)," or more compactly, "f(n) ∈ O(n)." This is much closer to the real definition, and behaves appropriately when taken literally.

tsahyt · on Oct 15, 2012

> The equal sign does mean equals.

Yes. This is why using it with Big-O notation is formally wrong when we're being pedantic.

maeon3 · on Oct 15, 2012

The equals sign in the big-O notation makes about as much sense as the folowing mathamatical expression:

Pidgeons = arraylist(1, ..., n)

It is an exercise in nonsense, it helps young computer scientists unwravel nonsense in the world by making it part of the curriculum.

Sure big-o notation can be useful, but using equations to describe it is like trying to type with boxing gloves.