Hacker News new | past | comments | ask | show | jobs | submit login

It depends on the implementation, but typically hash tables are used to store the elements and values of associative arrays:

https://www.gnu.org/software/gawk/manual/html_node/Array-Int...

I suspect that it's designed so that hash collisions are impossible until you get to an unrealistic number of characters per line.




I doubt it's designed to silently break in some cases. Unrealistic isn't realistic until one day it is and that is a bad day. I suppose it could just throw an error in the case of a hash collision, but I doubt it.


But what does it do, then? The page I links states that it uses a hash table. Hash tables apply a hash function to the key. Hash functions map arbitrary input data onto data of a fixed size. It's inevitable that collisions will occur. ~~even if you use some sort of clever workaround in the case of collisions, eventually you use up all the available outputs.~~ (my bad)

I'm not claiming that it will silently break! I'd be very interested in exploring the internals a little more and finding out how hard it is to get a collision in various implementations and how they behave subsequently.

EDIT: I've read chasil's comment and agree that it must be storing raw keys in the array. I guess awk uses separate chaining or something to get around hash collisions.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: