> It's possible to gradually resize a hash table to prevent spikes.
Can you provide details? The only trick I know of is the same one as with arrays, where you overallocate. That still doesn't get rid of spikes, and the C++ STL and Wikipedia both list the worst-case time as O(size of hash table). It's also not enough to just have capacity - oftentimes hash tables will abort the search for a slot once they've probed far enough, in favor of resizing and rehashing the table.
> For collisions, if you use a good hash then you can adjust your load factor and chaining strategy to make sure it's never a problem. Where 'never' is 'less likely than the computer spontaneously self-destructing'.
Yes, but the vast majority of implementations don't do a good job here. Howard Hinnant's "Types Don't Know #"[1] paper is extremely well thought out, and I have yet to see it adopted (except by Rust - Rust has a good track record of adopting good ideas like this, having had the benefit of coming late to the game). I think it's safe to say that most hash tables don't have good worst-case times (which, combined with the resizing behavior of insertions, means you get super-linear worst-case insertion times).
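For anyone who hasn't read it, the gist is roughly this (a minimal sketch, not the paper's exact interface; the FNV-style seeded_hasher below is just a stand-in for a real keyed hash like SipHash):

    #include <cstddef>
    #include <cstdint>
    #include <string>

    // Toy FNV-1a-style algorithm with a runtime seed. The point is that the
    // algorithm (and its seed) is swappable without touching any of the
    // types below.
    class seeded_hasher {
        std::uint64_t state_;
    public:
        explicit seeded_hasher(std::uint64_t seed)
            : state_(seed ^ 14695981039346656037ull) {}   // FNV offset basis
        void operator()(const void* data, std::size_t len) {
            auto p = static_cast<const unsigned char*>(data);
            for (std::size_t i = 0; i < len; ++i)
                state_ = (state_ ^ p[i]) * 1099511628211ull;  // FNV prime
        }
        explicit operator std::size_t() const {
            return static_cast<std::size_t>(state_);
        }
    };

    // Types only describe *which bytes matter*; they never pick the algorithm.
    template <class Hasher>
    void hash_append(Hasher& h, int v) { h(&v, sizeof v); }

    template <class Hasher>
    void hash_append(Hasher& h, const std::string& s) { h(s.data(), s.size()); }

    struct point {
        int x, y;
        template <class Hasher>
        friend void hash_append(Hasher& h, const point& p) {
            hash_append(h, p.x);
            hash_append(h, p.y);
        }
    };

    // Usage: each table instance picks its own random seed at construction.
    std::size_t hash_point(const point& p, std::uint64_t table_seed) {
        seeded_hasher h(table_seed);
        hash_append(h, p);
        return static_cast<std::size_t>(h);
    }

The win is that once types only describe their bytes, every container instance can trivially get its own random seed, which is what kills precomputed collision attacks.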
> Can you provide details? The only trick I know of is the same one as with arrays, where you overallocate. That still doesn't get rid of spikes, and the C++ STL and Wikipedia both list the worst-case time as O(size of hash table).
When you want to grow your hash table, go ahead and allocate a bigger but blank chunk of memory. (You can either design your memory allocation system to make this fast, or you can pre-request the memory when you're x% away from full.)
Put your new item in the big chunk of memory.
Every time you search for something, check both chunks.
Every time the hash table has to allocate a new slot, move over four items from the small chunk of memory to the big chunk.
Eventually the big chunk of memory will be 65% full and the small chunk will be empty, and then you deallocate the small chunk.
And that'll work forever just fine.
The ability to shrink is only slightly more complicated; I'll leave it as an exercise for the reader.
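Roughly, in code (a sketch only: the four-items-per-insert rate and the 65%/2x numbers are the ones from above, and each chunk is delegated to std::unordered_map purely to keep it short; a real implementation would manage its own buckets):

    #include <algorithm>
    #include <cstddef>
    #include <optional>
    #include <string>
    #include <unordered_map>

    class incremental_map {
        std::unordered_map<std::string, int> small_;  // old chunk, draining
        std::unordered_map<std::string, int> big_;    // new chunk, filling
        bool migrating_ = false;

        void migrate_some() {
            // Move a fixed number of items per insert, so every insert does
            // O(1) work instead of one unlucky insert paying for a full rehash.
            for (int i = 0; i < 4 && !small_.empty(); ++i) {
                auto it = small_.begin();
                big_.insert(std::move(*it));
                small_.erase(it);
            }
            if (small_.empty())
                migrating_ = false;  // old chunk can now be deallocated
        }

    public:
        void insert(std::string key, int value) {
            if (!migrating_ && big_.size() + 1 > big_.bucket_count() * 0.65) {
                // Start a migration: demote the current table to "small" and
                // reserve the bigger chunk up front, so the allocation cost
                // is paid here rather than mid-flight.
                small_ = std::move(big_);
                big_ = {};
                big_.reserve(std::max<std::size_t>(16, small_.size() * 2));
                migrating_ = !small_.empty();
            }
            if (migrating_) {
                small_.erase(key);  // don't leave a stale copy in the old chunk
                migrate_some();
            }
            big_.insert_or_assign(std::move(key), value);
        }

        std::optional<int> find(const std::string& key) const {
            // During a migration, lookups check both chunks.
            if (auto it = big_.find(key); it != big_.end()) return it->second;
            if (migrating_)
                if (auto it = small_.find(key); it != small_.end()) return it->second;
            return std::nullopt;
        }
    };

The one tuning constraint is that the migration rate and the growth factor have to be matched so the small chunk always drains before the big chunk fills up again.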
> It's also not enough to just have capacity - oftentimes hash tables will abort the search for a slot once they've probed far enough, in favor of resizing and rehashing the table.
Either use methods that won't abort, or use a secure hash and tune the hash table so you can make a statistical guarantee that the chance of an emergency resize is a million times smaller than the chance of the computer spontaneously self-destructing.
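To put a rough number on that statistical guarantee (assuming the keyed hash behaves like a uniformly random function): the occupancy X of any one bucket in a table with n keys and m buckets is Binomial(n, 1/m), and a standard Chernoff bound gives

    \Pr[X \ge k] \;\le\; \left(\frac{e\alpha}{k}\right)^{k}, \qquad \alpha = \frac{n}{m}

At load factor alpha = 1, capping chains at k = 64 bounds the per-bucket overflow probability by (e/64)^64, about 2^-292; even union-bounded over 2^32 buckets you're still around 2^-260, which is comfortably below "the computer spontaneously self-destructs" territory.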
Edit:
> C++ STL and Wikipedia
It's no surprise the STL wouldn't do something as fussy as keeping two most-of-a-hash-tables behind the curtain. As for Wikipedia, it has a whole section talking about how you don't have to resize all at once. The O(n) at the top is an oversimplification.
[1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n398...