My first guess would be that a FIFO queue is more efficient as a list than an array, but maybe there's something weird about ruby that makes that untrue. I'd love to hear why.
Unless you have an extraordinarily clever implementation of "linked list in Ruby" (in which case this is still a terrible interview question), no. Arrays are more efficient for every reasonable access pattern. If the links of your list are references to Ruby objects (which every online "linked list in ruby" page uses), they're not so much "more efficient" as they are "tractable" compared to linked-list's "intractable".
Something like that. rbx takes about 0.5 secs to start up, but after that it generally runs in the neighborhood of 10x faster, assuming you're spending enough time in actual ruby code to make that possible.
Can you show me your list implementation? I just installed rbx and tried this; the Array implementation ran in single-digit seconds, but I gave up and CTR-C'd the list one (which works fine for smaller lists).
Why do I get the sinking feeling I'm about to embarrass myself by implementing a linked list wrong? Anyway, here's the test case I used, where I'm emulating a FIFO.
class Link
attr_accessor :next, :v
end
i = 2000
j = 10000
k = 100
if true
p "linked list"
j.times {
head = Link.new
tail = head
i.times {
n = Link.new
n.v = 12
tail.next = n
tail = n
}
(i-k).times {
n = head
head = head.next
}
}
else
p "array"
j.times {
arr = []
i.times {
n = Link.new
n.v = 13
arr << n
}
(i-k).times {
n = arr.pop
}
}
end
1. Make a 1,000,000 node linked list of integers (I used doubly linked lists but I don't think it matters).
2. 1,000 times, insert into the middle of the list; I wrote a trivial O(n) stateless insert and called it with an offset of 500,000 1,000 times.
3. Make a 1,000,000 element Array of integers --- I just did "1000000.times.map".
4. 1,000 times call "insert" with an index of 500,000.
Step (2) takes so long I kill the process. Step (4) takes a barely perceptible amount of time. (Both in Rubinius).
It looks to me like:
* You're using much smaller data structures than I am
* Your "Array" case is still building Link objects, so still incurs the object management overhead
* You're inserting at the head of the list every time, which doesn't incur seek time. But inserts at the head or tail of a contiguous array don't need to seek or reallocate either.
If you have to seek into the list, of course it's going to suck, that's not a good application for a list. Your test would blow just as bad in C. I'm not sure that says anything interesting about lists vs arrays in ruby in particular.
I am using the same data structure, I guess it would be a little smaller without the next attr. Still seems fair. Building an array of objects is going to require allocating them. I build hand rolled linked lists by directly linking the objects of interest. Calling the class Link may have been a misnomer, it could have been called AnyClass.
It's easy to insert at the tail of an array (I'm actually inserting at the tail of the list, btw), but popping from the front means having to shift everything down. That's what makes the FIFO case interesting. If we're going to prove that ruby is too slow for linked lists to be viable, we need to be testing a scenario where linked lists generally are viable.
The most naive possible C implementation takes 2.5 seconds to run this same test. I just tried it.
The point of a list is "insert and delete from the middle", so I don't know how to respond to the idea that actually inserting and deleting from the middle of the list is a bad benchmark.
It's difficult to respond to your last point about FIFOs for a different reason. Popping 1000 times from a 1,000,000 Array is so fast that I'd have to write code to benchmark it. And Ruby doesn't even optimize for that case; in real code, I'd use a ring buffer so that pops are just pointer adds. Ruby is, to the best of my knowledge, actually copying every single time. Copies are just way way way faster than you seem to expect them to be.
The reason manipulating linked lists in the middle is faster than arrays is so you can save the O(n) work of moving elements. But you lose that if you still do O(n) worth of seeking. There is no big-O difference between inserting and deleting in the middle of an array and a plain linked list, if the linked list is forced to seek.
I'm sure this is as obvious to you as it is to me, but perhaps the perspective it comes from explains why people are thinking that your described operations on the list are a dodgy benchmark for judging linked lists vs arrays; absent constant factors, they should perform the same. (And yes, for reasonable input on modern machines, the constant factors will dominate.)
I'm sure I agree with everything you're saying here, but it carries a whiff of tautology. You lose the benefit of middle-insertion in a list if you have to seek, but part of the point of using an array is not having to seek ever.
Anyways the only thing that moved me to comment is the general inferiority of linked list data structures compared to arrays, which are what Ruby (sensibly) uses.
I'm reading this thread and kind of tearing my hair out.
One theoretical case where lists beat arrays is seeking to the middle (once) and inserting 1000000 items in that one spot.
If you want to test if lists can ever be useful in Ruby then you want to test the case that they are theoretically useful. Your test is a theoretical dead heat as arrays and lists both perform O(n) in it. All you learn is that arrays are faster in Ruby for the same complexity, which we knew already.
I probably shouldn't have replied and dragged it out, and I apologise for my tone but you seem to be almost wilfully missing the point.
There are problems where a linked list is theoretically better suited than an array. Your test is a problem where they are theoretically evenly matched. The interesting question is: do the benefits of native code etc, that you get when using an array outweigh the costs that you incur for using a non-optimal data structure for a given problem? To answer this you would need to come up with a situation where a list should be faster if the array and list were both native or both higher level ruby implementations. Then compare the actual running times to see what impact native code vs ruby code has.
All your test shows is that in a problem where neither data structure has an advantage in complexity, i.e. they both perform in O(n), the one backed by native code is faster. My assertion is that is a pretty boring thing to discover and "we", or most people, would guess that anyway.
I think we're not communicating well because we're deep in a tangential subthread.
I take your point that repeated insertion at a specific held reference to the middle of a list is faster than insertion into the middle of an array.
I'm just saying that in Ruby, where every list node incurs object overhead and where list iteration is done by tree walking and array referencing is done by pointer dereferencing in the background, lists underperform arrays even in cases where you'd expect the algorithmic complexity of a list to yield a big win.
Of course blitting half of an array will be much faster than dereferencing an equal number of list nodes. I don't think you would want to use a linked list in a situation that resembles your benchmark. If you were going to insert many more elements at a time than 1 (like, say, all 1,000), or if you had to traverse the entire list and also update some parts of it, then I would expect it to do significantly better.
The point of doing 1000 list middle-inserts isn't to see how fast 1000 list middle-inserts are; it's to capture how much faster those inserts are (or aren't) than the same number of Array middle-inserts.
As it turns out here: Arrays way faster than lists.
This whole thread has gone off the rails a bit (albeit in the most enjoyable possible way --- the kind that makes us write code to test assumptions). The real point is: linked lists in Ruby are pretty silly, and an even sillier interview question.
All I meant to say is that on MRI, you might as well give up on reference-based linked lists; they just don't work at scale.
I think that, sadly, this is true of almost any well engineered data structure. MRI's overhead is so vast that even if you perform an operation at, say, O(n) with the logical solution, using a naive built-in or iterative technique with retrieval of O(n * n) will be faster up until the often rather distant point where the lines cross.
It's just as bad on Rubinius and just as bad on JRuby. I think this has more to do with tree-walking (and using a data structure that intensively requires that) than simply MRI overhead.