> It's possible to gradually resize a hash table to prevent spikes.
Can you provide details? The only trick I know of is the same one as with arrays, where you overallocate. That still doesn't get rid of spikes, and the C++ STL and Wikipedia both list the worst-case time as O(size of hash table). It's also not enough to just have capacity - oftentimes hash tables will abort the search for a slot once they've probed far enough, in favor of resizing and rehashing the table.
> For collisions, if you use a good hash then you can adjust your load factor and chaining strategy to make sure it's never a problem. Where 'never' is 'less likely than the computer spontaneously self-destructing'.
Yes, but the vast majority of implementations don't do a good job here. Howard Hinnant's "Types Don't Know #"[1] paper is extremely well thought out, and I have yet to see it adopted (except by Rust - Rust has a good track record of adopting good ideas like this, having had the benefit of coming late to the game). I think it's safe to say that most hash tables don't have good worst-case times (which, combined with the resizing behavior of insertions, means you get super-linear worst-case insertion times).
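For anyone who hasn't read it, the gist is roughly this (a minimal sketch, not the paper's exact interface; the FNV-style seeded_hasher below is just a stand-in for a real keyed hash like SipHash):

    #include <cstddef>
    #include <cstdint>
    #include <string>

    // Toy FNV-1a-style algorithm with a runtime seed. The point is that the
    // algorithm (and its seed) is swappable without touching any of the
    // types below.
    class seeded_hasher {
        std::uint64_t state_;
    public:
        explicit seeded_hasher(std::uint64_t seed)
            : state_(seed ^ 14695981039346656037ull) {}   // FNV offset basis
        void operator()(const void* data, std::size_t len) {
            auto p = static_cast<const unsigned char*>(data);
            for (std::size_t i = 0; i < len; ++i)
                state_ = (state_ ^ p[i]) * 1099511628211ull;  // FNV prime
        }
        explicit operator std::size_t() const {
            return static_cast<std::size_t>(state_);
        }
    };

    // Types only describe *which bytes matter*; they never pick the algorithm.
    template <class Hasher>
    void hash_append(Hasher& h, int v) { h(&v, sizeof v); }

    template <class Hasher>
    void hash_append(Hasher& h, const std::string& s) { h(s.data(), s.size()); }

    struct point {
        int x, y;
        template <class Hasher>
        friend void hash_append(Hasher& h, const point& p) {
            hash_append(h, p.x);
            hash_append(h, p.y);
        }
    };

    // Usage: each table instance picks its own random seed at construction.
    std::size_t hash_point(const point& p, std::uint64_t table_seed) {
        seeded_hasher h(table_seed);
        hash_append(h, p);
        return static_cast<std::size_t>(h);
    }

The win is that once types only describe their bytes, every container instance can trivially get its own random seed, which is what kills precomputed collision attacks.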
> Can you provide details? The only trick I know of is the same one as with arrays, where you overallocate. That still doesn't get rid of spikes, and the C++ STL and Wikipedia both list the worst-case time as O(size of hash table).
When you want to grow your hash table, go ahead and allocate a bigger but blank chunk of memory. (You can either design your memory allocation system to make this fast, or you can pre-request the memory when you're x% away from full.)
Put your new item in the big chunk of memory.
Every time you search for something, check both chunks.
Every time the hash table has to allocate a new slot, move over four items from the small chunk of memory to the big chunk.
Eventually the big chunk of memory will be 65% full and the small chunk will be empty, and then you deallocate the small chunk.
And that'll work forever just fine.
The ability to shrink is only slightly more complicated; I'll leave it as an exercise for the reader.
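Roughly, in code (a sketch only: the four-items-per-insert rate and the 65%/2x numbers are the ones from above, and each chunk is delegated to std::unordered_map purely to keep it short; a real implementation would manage its own buckets):

    #include <algorithm>
    #include <cstddef>
    #include <optional>
    #include <string>
    #include <unordered_map>

    class incremental_map {
        std::unordered_map<std::string, int> small_;  // old chunk, draining
        std::unordered_map<std::string, int> big_;    // new chunk, filling
        bool migrating_ = false;

        void migrate_some() {
            // Move a fixed number of items per insert, so every insert does
            // O(1) work instead of one unlucky insert paying for a full rehash.
            for (int i = 0; i < 4 && !small_.empty(); ++i) {
                auto it = small_.begin();
                big_.insert(std::move(*it));
                small_.erase(it);
            }
            if (small_.empty())
                migrating_ = false;  // old chunk can now be deallocated
        }

    public:
        void insert(std::string key, int value) {
            if (!migrating_ && big_.size() + 1 > big_.bucket_count() * 0.65) {
                // Start a migration: demote the current table to "small" and
                // reserve the bigger chunk up front, so the allocation cost
                // is paid here rather than mid-flight.
                small_ = std::move(big_);
                big_ = {};
                big_.reserve(std::max<std::size_t>(16, small_.size() * 2));
                migrating_ = !small_.empty();
            }
            if (migrating_) {
                small_.erase(key);  // don't leave a stale copy in the old chunk
                migrate_some();
            }
            big_.insert_or_assign(std::move(key), value);
        }

        std::optional<int> find(const std::string& key) const {
            // During a migration, lookups check both chunks.
            if (auto it = big_.find(key); it != big_.end()) return it->second;
            if (migrating_)
                if (auto it = small_.find(key); it != small_.end()) return it->second;
            return std::nullopt;
        }
    };

The one tuning constraint is that the migration rate and the growth factor have to be matched so the small chunk always drains before the big chunk fills up again.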
> It's also not enough to just have capacity - oftentimes hash tables will abort the search for a slot once they've probed far enough, in favor of resizing and rehashing the table.
Either use methods that won't abort, or use a secure hash and tune the hash table so you can make a statistical guarantee that the chance of an emergency resize is a million times smaller than the chance of the computer spontaneously self-destructing.
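To put a rough number on that statistical guarantee (assuming the keyed hash behaves like a uniformly random function): the occupancy X of any one bucket in a table with n keys and m buckets is Binomial(n, 1/m), and a standard Chernoff bound gives

    \Pr[X \ge k] \;\le\; \left(\frac{e\alpha}{k}\right)^{k}, \qquad \alpha = \frac{n}{m}

At load factor alpha = 1, capping chains at k = 64 bounds the per-bucket overflow probability by (e/64)^64, about 2^-292; even union-bounded over 2^32 buckets you're still around 2^-260, which is comfortably below "the computer spontaneously self-destructs" territory.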
Edit:
> C++ STL and Wikipedia
It's no surprise the STL wouldn't do something as fussy as keeping two most-of-a-hash-tables behind the curtain. As for Wikipedia, it has a whole section talking about how you don't have to resize all at once. The O(n) at the top is an oversimplification.
[1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n398...