Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Ok, I can see how I didn't manage to fully explain myself - by replying to your "collisions" argument I actually meant something more than that and while it seemed obvious in my head what I wanted to say, I can agree that my wording was imprecise. Sorry about that.

What I pictured in my head is that the ingestion workloads in contrast to the read-only workloads are different in the fact that collisions in former can trigger hash-map resizing/rehashing and which is why I say that these are two distinctive workloads. Moving bunch of data to and from the buckets.

So, my hypothesis is the following: if I PGO the implementation using the dataset (volume, data_distribution) that it triggers a handful of such resizing/rehashing operations, and we observe the positive improvements in wall-time runtime performance, what are the chances that using the same binary but now over a completely different dataset will preserve the same runtime performance improvements? What you're saying, if I understand correctly, is that the improvements will be the same regardless of the dataset (and even type of workload) and I am saying I am not so sure about that.



Lets analyze one cpu instruction in your scenario: the comparison used to decide whether to resize. You have to remember that PGO operates to optimize each individual branching instruction, not the whole program.

So every time you add something to the hashtable, it will check the current size against the next size it needs to resize. 90% of the time, then 99% of the time, then 99.9% of the time and so on the answer will be 'no, there is space left'. This is just because the size goes up geometrically (in general) so a very low % of insert operations trigger a resize. The 'high probability branch' for "if current_size == max_size" should always be false. This is independent of workload. The compiler would not know whether that comparison is 'almost always true' or 'almost always false' which might lead the CPU's branch predictor to guess wrong, which will lead to pipeline stalls.

Read up on how pipelined, superscaler cpu's work to understand why this is very important.

https://en.wikipedia.org/wiki/Pipeline_stall

https://en.wikipedia.org/wiki/Predication_(computer_architec...




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: