Hacker News

I felt like I didn't understand the blog post very well, either.

It spends a lot of time building a benchmark framework and measuring things. At the end, hstore is the choice supported by the data. But instead the author writes:

> If we go strictly with the data, the best way looks to be the hstore-powered solution, but I think the HLL is probably the right choice.

?!? Why bother defining a framework for measurement if you're just going to substitute your own judgment? Perhaps the framework didn't capture something important -- like the absolute number of people looking at any given post, concurrent updates, etc.

I'm also confused by the benchmark. The autocannon script has:

    // 10% of the time, get *all* the users
    request.method = 'GET'
    request.path = `/posts/${postId}/seen-by/users`

Does this imply there's a path it can hit to get the list of users who saw a post? HLL is a probabilistic data structure, so it doesn't support that with 100% accuracy. You could reverse engineer it from the list of users, but the performance will vary with the number of users, which didn't seem to be captured by the benchmark. I tried to look at the source code, but the GitLab URL was 404ing for me.
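To make the "HLL can't give you the list" point concrete: here's a toy HyperLogLog-style counter in JavaScript (illustrative only, not the post's implementation or the postgresql-hll extension; the hash and register count are made up for the demo). The sketch keeps only per-register maxima of leading-zero ranks, so individual user IDs are never stored and can't be enumerated:

```javascript
// Toy HLL-style sketch. It can estimate "how many distinct users",
// but the member list is unrecoverable -- it was never stored.
const REGISTERS = 16; // 2^4 registers; real HLLs use many more

// Simple 32-bit FNV-1a hash, just for the demo
function hash(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

function makeHLL() {
  return new Uint8Array(REGISTERS); // fixed size regardless of set size
}

function addUser(hll, userId) {
  const h = hash(String(userId));
  const idx = h & (REGISTERS - 1);   // low 4 bits pick a register
  const rest = h >>> 4;              // remaining 28 bits
  const rank = rest === 0 ? 29 : Math.clz32(rest) - 3; // position of first 1-bit
  hll[idx] = Math.max(hll[idx], rank); // only the max survives
}

function estimate(hll) {
  // crude harmonic-mean estimator (no small/large-range corrections)
  let sum = 0;
  for (const r of hll) sum += Math.pow(2, -r);
  const alpha = 0.673; // bias constant for m = 16
  return Math.round((alpha * REGISTERS * REGISTERS) / sum);
}

const seenBy = makeHLL();
for (let u = 1; u <= 1000; u++) addUser(seenBy, u);
console.log(estimate(seenBy)); // a rough estimate in the ballpark of 1000
// There is no addUser inverse: asking "which users?" is simply not a
// supported query, which is why that GET path would need a separate store.
```

So a benchmark that hits `/seen-by/users` against an HLL-backed implementation must be reading an exact membership list from somewhere else, and that "somewhere else" is what the benchmark may not be measuring fairly.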


> ?!? Why bother defining a framework for measurement if you're just going to substitute your own judgment? Perhaps the framework didn't capture something important -- like the absolute number of people looking at any given post, concurrent updates, etc.

Just guessing: the benchmark tells you the time complexity. It doesn't tell you the space complexity. The author is optimizing between the time- and space-complexity of the solutions, with the time-complexity benchmarks as an input. (Also, at a scale the benchmark doesn't reach, space-complexity starts to affect time-complexity, as large datasets become less likely to stay hot in the disk cache.)
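A back-of-envelope sketch of that tradeoff in JavaScript (the byte sizes are hypothetical round numbers, not measured from the post's schema): an hstore column stores one key/value entry per viewer inside the row, so it grows linearly, while an HLL column is a fixed-size blob no matter how many users are counted.

```javascript
// Hypothetical sizes for illustration -- not measured figures.
const HSTORE_BYTES_PER_ENTRY = 24; // rough guess: user-id key + value + overhead
const HLL_BYTES = 1280;            // e.g. 2^10 registers at a typical scale

function hstoreRowBytes(viewers) {
  return viewers * HSTORE_BYTES_PER_ENTRY; // grows with every new viewer
}

function hllRowBytes(_viewers) {
  return HLL_BYTES; // constant, regardless of viewer count
}

for (const n of [100, 10_000, 1_000_000]) {
  console.log(n, hstoreRowBytes(n), hllRowBytes(n));
}
// Under these assumed sizes, the hstore row passes the HLL's fixed cost at
// ~53 viewers, and a million viewers means ~24 MB in a single row's column
// (i.e. a bloated, TOASTed row), while the HLL stays at ~1.3 KB.
```

That row-growth curve is exactly the "bloated row problem" the author cites for preferring HLL despite the benchmark numbers.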


That sounds like it's a bad benchmark, then? I mean, it's still interesting, but a synthetic benchmark that fails to map to real use arguably has negative value, since it's misleading.


Yeah, this is probably the reason and the author even elaborates on this in the sentences following the bit quoted by cldellow:

> Even though the data says hstore, knowing that posts will be seen by more and more people over time, I might choose the HLL solution for an actual implementation. It's far less likely to pose a bloated row problem, [...]


Just posting here too, but yup this is exactly what I was trying to convey.

Hstore might have been the fastest, but the way I'm using it, or the scale the use case could grow to, might not work out.

Bigger benchmarks could have been done! Maybe a multi-part post would have been better so I could split apart methodology and results/blabbering about approach!


Yup this is it! You’ve said it much better than I did, thank you.


I don't think I'm getting 404s, but the repo's GitLab URL does keep bouncing me to a login page.

> You need to sign in or sign up before continuing.

Has GitLab always enforced that, even for public repos? Or is the linked repo not public?


Hey, sorry for that, we've changed the links to a public GitHub repo, so everyone can take a look or submit a PR: https://github.com/VADOSWARE/supabase-seen-by


I can follow the links on the post now - thanks for updating!


The linked repo isn't public; you can check through the author's public repos on GitLab.



