More: "Linked lists are much, much slower than you think they are and should not be used in performance-sensitive code." That fact isn't remotely obvious to most hackers, and simply brushing it aside as an area where linked lists are "wrong" is missing important details.
Even in performance-sensitive code, linked lists might be the right way to go. If you need to store an unknown amount of data, a resizable array probably does amortize to better performance than a linked list. But all of that 'slowness' happens at the same time, so it can be worth slowing down the average case to avoid the worst case. The most notable examples I can think of are video games, where FPS is king, and kernels, where you always want to exit quickly.
Linked lists can also work better in limited-memory environments because, with an overhead of only one pointer per element, you can make use of fragmented memory.
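To make the trade-off concrete, here is a minimal sketch (plain C, made-up names) of the kind of list being described: one pointer of overhead per element, every push costs the same small amount, and each node is a separate allocation that can land in whatever fragment of memory happens to be free.

    #include <stdlib.h>

    /* Each element carries exactly one pointer of overhead, and each node
     * is a separate small allocation, so it can live in whatever fragment
     * of memory the allocator happens to have available. */
    struct node {
        int          value;
        struct node *next;
    };

    /* Push is O(1) every single time: no reallocation, no bulk copy, so
     * there is no occasional worst-case spike like a dynamic array's
     * grow-and-copy when it runs out of capacity. */
    static struct node *push(struct node *head, int value)
    {
        struct node *n = malloc(sizeof *n);
        if (n == NULL)
            return head;          /* out of memory: leave the list as-is */
        n->value = value;
        n->next  = head;
        return n;
    }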
Or something between the two:
"Unrolled linked list": In computer programming, an unrolled linked list is a variation on the linked list which stores multiple elements in each node.
https://en.wikipedia.org/wiki/Unrolled_linked_list
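The node layout is roughly this (a sketch in C; the 16-element capacity is an arbitrary illustrative choice, not something taken from the Wikipedia article):

    #include <stddef.h>

    #define ELEMS_PER_NODE 16   /* capacity per node: arbitrary tuning knob */

    /* Unrolled list node: many elements share a single 'next' pointer, so
     * per-element overhead shrinks and a traversal touches far fewer cache
     * lines than a one-element-per-node list would. */
    struct unrolled_node {
        size_t                count;                 /* slots in use */
        int                   elems[ELEMS_PER_NODE]; /* stored contiguously */
        struct unrolled_node *next;
    };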
What the article misses is that asymptotic analysis does matter. Yes, you can find specific cases where arrays are faster even when you are doing a lot of insertions and deletions. But problem sizes often grow, and often grow without proportionate increases in cache sizes and the other factors mentioned in the article. I would much rather have an algorithm that scales well than one that performs well only for the specific problem size I was thinking about when I wrote the code.
But perhaps you should be counting the horrifically expensive operations (cache misses) in your asymptotic analysis, not the virtually-free operations (writes).
Only if those operations become more expensive as the problem size grows. The point of asymptotic analysis is the relationship between problem size and expense, not the expense of individual operations.
To put it another way, imagine a strange implementation of quicksort that caused a cache miss on every comparison and every swap, and an implementation of bubble sort that hardly ever caused a cache miss. Which would you use in your code?
Hm, glib answers aside, you're right. I think the real point is that counting writes makes no sense, because a write is neither the most expensive nor the most frequent operation involved. Really, the whole thing comes from looking at lists wrong: find-and-delete is not O(1), and converting half of an O(n) operation to O(1) doesn't make an asymptotic difference. If you are genuinely just performing inserts and removals of already-known elements, things will look different.
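A quick sketch of what "find-and-delete is not O(1)" means in code (assuming a toy singly linked node type; not anyone's production list):

    #include <stdlib.h>

    struct node {
        int          value;
        struct node *next;
    };

    /* Find-and-delete: the unlink itself is O(1), but the search that has
     * to precede it walks the list, so the operation as a whole is O(n). */
    static struct node *find_and_delete(struct node *head, int value)
    {
        struct node **pp = &head;           /* link we may need to patch */
        while (*pp != NULL) {
            if ((*pp)->value == value) {
                struct node *dead = *pp;
                *pp = dead->next;           /* the O(1) half */
                free(dead);
                break;
            }
            pp = &(*pp)->next;              /* the O(n) half */
        }
        return head;
    }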
In a real-world app that was actually subject to performance requirements? If the bubble sort meets the goal where quick sort doesn't, hell yes that's what'll get picked.
Not all optimization is premature optimization. Some apps care. And the point of this article is that some operations which people are trained to think of as "fast" (like the "O(1)" pointer operations in a list traversal) actually aren't.
I suspect that the real-world app using bubble sort would be fine...until the day it was not fine. By that point, you'll have a bunch of people who depend on the software, while you scramble to figure out why things became so slow.
Constant factors should not be ignored, but neither should scalability. That is the reason production-grade sorting algorithms switch from quicksort to a simple quadratic sort (typically insertion sort) when the sequence is short enough. That is also the reason we typically use quicksort rather than heapsort -- quicksort usually has a lower constant factor on modern architectures, which is what matters when the asymptotics are the same.
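The cutoff idea looks roughly like this (a sketch, with an arbitrary threshold of 16 and insertion sort standing in for the simple quadratic sort; real libraries tune the threshold and also bound recursion depth):

    #include <stddef.h>

    #define SMALL_CUTOFF 16   /* threshold is illustrative, not tuned */

    /* Simple quadratic sort used below the cutoff, where its low constant
     * factor and cache-friendly access pattern win. */
    static void insertion_sort(int *a, size_t n)
    {
        for (size_t i = 1; i < n; i++) {
            int v = a[i];
            size_t j = i;
            while (j > 0 && a[j - 1] > v) {
                a[j] = a[j - 1];
                j--;
            }
            a[j] = v;
        }
    }

    /* Hybrid quicksort: asymptotically good partitioning on large ranges,
     * falling back to the quadratic sort once a range gets short. */
    static void hybrid_sort(int *a, size_t n)
    {
        while (n > SMALL_CUTOFF) {
            int pivot = a[n / 2];
            size_t i = 0, j = n - 1;
            for (;;) {                       /* Hoare-style partition */
                while (a[i] < pivot) i++;
                while (a[j] > pivot) j--;
                if (i >= j) break;
                int t = a[i]; a[i] = a[j]; a[j] = t;
                i++; j--;
            }
            hybrid_sort(a, j + 1);           /* recurse on the left part */
            a += j + 1;                      /* loop on the right part */
            n -= j + 1;
        }
        insertion_sort(a, n);
    }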
The problem with focusing on the performance of your system for particular input sizes is that you are usually not guaranteed that the input size will not increase. It is not premature optimization if you have a good reason to believe that the input size will not grow, or that it will not grow enough to matter; such is the case with matrix multiplication. If you have no evidence of that, though, you are almost certainly better off choosing the asymptotically better algorithm first and coming back later to tune it / switch to different algorithms on smaller inputs / etc.
Good grief. Why the sarcasm? Again I'm seeing a deliberate attempt to read an insane absolutist stance into what is actually (if you read it and not the headline) a well-considered argument. Linked list traversal can be slow, and it can be slow in ways that (again, if you read the article) you will probably find surprising. Does that mean that some special applications (like malloc) can't find a use for them? Of course not.
But to respond to the technical point: the malloc case is very special. Free memory is by definition unused and probably cache-cold, so there is little value in improving its locality of reference. And it is "already memory", so there is no value in putting it into a "container" when you can just drop the link pointers right into the buffer. So you're right, but in a specious way that still misses the point of the linked article.
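Concretely, the "drop the link pointers right into the buffer" trick looks something like this toy sketch of an intrusive free list (made-up names, no size classes or coalescing; not how any particular malloc actually does it):

    #include <stddef.h>

    /* A freed block is, by definition, unused memory, so the allocator can
     * store the list link directly inside the block instead of allocating
     * a separate list node for it. */
    struct free_block {
        struct free_block *next;   /* overlaps the block's own storage */
    };

    static struct free_block *free_list = NULL;

    /* "free": push the block itself onto the list -- no extra container,
     * no extra allocation, O(1). Blocks must be at least
     * sizeof(struct free_block) for this to work. */
    static void toy_free(void *block)
    {
        struct free_block *b = block;
        b->next = free_list;
        free_list = b;
    }

    /* "malloc": pop the head block if we have one, else NULL. */
    static void *toy_alloc(void)
    {
        struct free_block *b = free_list;
        if (b != NULL)
            free_list = b->next;
        return b;
    }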
This! Linked lists / trees are great in applicable use cases; I think the author of the article is just being an attention whore... To be fair, I haven't actually needed to use binary trees or linked lists in ages...
Edit: I should note this article's advice is really good advice if your focus is performance.