Scraping 6 million HN posts (out of a total of 40 million, 6M = some ~3 years of...

RGamma · on April 8, 2024

Having this natively here would be pretty useful. There's so much good stuff that "disappears" in the barrage of submissions and is hard to surface via simplistic search.

isoprophlex · on April 8, 2024

Interesting, what would the ideal user interface look like, according to you? As a user, how would you like to interact with a system like this?

RGamma · on April 8, 2024

I'd most definitely want it to mainly link to the source comments/threads, no regurgitation necessary.

UI could be anything really as I'm not picky, self-hosted would be a huge plus.

Example queries could look like: "(what's a) setup for X?" (where X might be "on-prem ETL" for instance), "troubleshooting Y", "books about Z" or "obscure web finds" (this one's tricky because it's subjective and perhaps implicit). I guess many more inspirations can be found in the Ask section. Pretty sure a lot of questions get repeated.

I'd already be happy if the results were close to my query in an "ontologically obvious" way, effectively searching phrases modulo synonyms.
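To make "searching phrases modulo synonyms" a bit more concrete, here is a rough toy sketch in Python. Everything in it is an assumption for illustration: the synonym groups and stopwords are hand-written, the comments are invented, and the item ids in the links are placeholders. A real system would more likely use embeddings or a proper thesaurus than a lookup table. It only returns links to the matching comments and never rewrites their content.

```python
import re

# Hypothetical, hand-written synonym groups (illustration only).
SYNONYM_GROUPS = [
    {"setup", "stack", "configuration", "config"},
    {"on-prem", "on-premises", "self-hosted"},
    {"etl", "pipeline"},
    {"books", "reading"},
]

# Tiny made-up stopword list so filler words do not count as query terms.
STOPWORDS = {"a", "about", "for", "s", "the", "what"}

# Map every word to one canonical representative of its group.
CANONICAL = {word: min(group) for group in SYNONYM_GROUPS for word in group}


def normalize(text):
    """Lowercase, tokenize, drop stopwords, and canonicalize synonyms."""
    tokens = re.findall(r"[a-z0-9][a-z0-9\-]*", text.lower())
    return {CANONICAL.get(tok, tok) for tok in tokens if tok not in STOPWORDS}


def search(query, comments, min_overlap=1.0):
    """Return links to comments that cover at least min_overlap of the query terms."""
    q = normalize(query)
    if not q:
        return []
    results = []
    for comment in comments:
        overlap = len(q & normalize(comment["text"])) / len(q)
        if overlap >= min_overlap:
            results.append((overlap, comment["link"]))
    # Best-covering comments first; only the links are returned.
    return [link for _, link in sorted(results, reverse=True)]


# Invented example data; the item ids are placeholders, not real threads.
COMMENTS = [
    {"link": "https://news.ycombinator.com/item?id=1",
     "text": "Our self-hosted pipeline stack is Airflow plus Postgres."},
    {"link": "https://news.ycombinator.com/item?id=2",
     "text": "Any good books about compilers?"},
    {"link": "https://news.ycombinator.com/item?id=3",
     "text": "We moved our ETL to the cloud last year."},
]

print(search("(what's a) setup for on-prem ETL?", COMMENTS))
```

In this toy data the first comment matches even though it shares no literal word with the query, which is the kind of result plain keyword search misses. Lowering `min_overlap` trades precision for recall.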

With search as it currently is, I have to rephrase my query because I can't conjure the exact wording to exclude the many false positives (or include any true positives ;)), usually resorting to site: queries on other search engines or giving up.

Also, I should really have curated better myself these past years... HN is a cool place, but it, like other feed-like sites, surely has a short memory.

Similarly, imagine the info that is stored in Discord chat history, never to be retrieved again unless the right users come together at the right time and the info is repeated (as if we're back to oral history, except this time with perfect recordings).

Hope this helped somewhat, trying to keep it brief.