This is a bit of a weird post. The author sets up benchmarks, shows that hstore appears to work best, then suggests using HyperLogLog despite it performing worse. The reason given is that it scales better, but the author didn't really discuss how HLL works, so I'm not sure why it scales better other than that it has a cool name.
Looks like I could have been much clearer, apologies!
The main idea behind picking HLL was that it's a good compromise between fast operation and the forget-nothing nature of the assoc table, and that it avoids the should-you-really-do-that nature of storing possibly hundreds of thousands or millions of numeric key-value pairs in an hstore column alone.
There’s definitely a whole ‘nother post in there on pushing it to its limits, but it was already long enough, I thought!
I agree; why not do the work to show that it scales better? It violates scientific integrity to have a hypothesis, run the experiment, find that your hypothesis was wrong, and then say it is probably right regardless.
They also point out that HLL doesn't have false positives but it does have incorrect counts (I don't really see how that's different tbh), and later say it has the same "absolute correctness" as an association table...
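For anyone else wondering what "incorrect counts" looks like in practice, here's a rough toy sketch of the HLL idea in Python. To be clear, this is not the Postgres extension's implementation; the class name, the `p` parameter and the choice of SHA-1 as the hash are all my own assumptions for illustration. Each item is hashed, the first `p` bits pick a register, and the register remembers the longest run of leading zeros seen in the rest of the hash.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog (hypothetical, for illustration only)."""

    def __init__(self, p=12):
        self.p = p                      # number of bits used to pick a register
        self.m = 1 << p                 # number of registers (4096 for p=12)
        self.registers = [0] * self.m
        # Standard bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # Take a 64-bit hash of the item (SHA-1 is an arbitrary choice here).
        digest = hashlib.sha1(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                  # top p bits: register index
        rest = h & ((1 << (64 - self.p)) - 1)     # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        # Re-adding the same item yields the same idx and rank: no change.
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        # Harmonic mean of 2^register across all registers.
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        # Small-range correction: fall back to linear counting.
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)
        return estimate


hll = HyperLogLog()
for user_id in range(50000):
    hll.add(user_id)
    hll.add(user_id)   # duplicates leave the registers untouched
print(round(hll.count()))
```

So adding the same item twice never double-counts it, but the total you get back is an estimate rather than an exact number. The typical relative error is about 1.04 / sqrt(m), so roughly 1.6% with 4096 registers, and the structure stays the same size no matter how many items go in.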
Since the author works for an organization that is built on Postgres, I imagine this started out as a content marketing idea for Postgres. Somewhere along the way it went off the rails and became a piece about HLL technology (which is independent of the underlying DB tech).
I do like articles like this that attempt to solve real world challenges though! Keep up the good work.