
Precisely this. This article might seem reasonable to anybody who has never tried to organize something as simple as a local music collection.


Made me think about John Wilkins' "philosophical language", which I first heard about in Neal Stephenson's book Quicksilver.

https://en.wikipedia.org/wiki/An_Essay_Towards_a_Real_Charac...

I'm sure there have been countless similar attempts at categorizing knowledge.

One of the more successful ones is the Dewey Decimal System.

I have my doubts about whether the thing the OP alleges we have "failed" at is even possible.


Well, this runs straight into one of the massive, concrete pillars of computing: naming things.

Because that’s what a lot of this comes down to.

An overwhelming amount of stuff with no names. No categories, no nothing.

With extended file attributes we could hang all sorts of metadata off arbitrary files. But that’s very fragile, since many copy tools, archive formats, and filesystems silently drop those attributes.
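A minimal sketch of both halves of that, assuming Linux and Python's `os.setxattr`/`os.getxattr` (which only exist on Linux). The `user.tags` attribute name and the comma-separated format are hypothetical choices, not any standard. The tag is written to one file, then a plain content copy with `shutil.copyfile` shows the metadata not surviving.

```python
import os
import shutil
import tempfile

# Hypothetical attribute name; Linux only requires the "user." namespace prefix.
ATTR = "user.tags"


def set_tags(path, tags):
    """Store tags as a comma-separated xattr; return False if that isn't possible."""
    try:
        os.setxattr(path, ATTR, ",".join(tags).encode("utf-8"))
        return True
    except (AttributeError, OSError):
        # AttributeError: os.setxattr only exists on Linux.
        # OSError: this filesystem doesn't accept user xattrs.
        return False


def get_tags(path):
    """Read the tags back; a missing attribute (or no xattr support) yields []."""
    try:
        raw = os.getxattr(path, ATTR)
    except (AttributeError, OSError):
        return []
    return raw.decode("utf-8").split(",") if raw else []


with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "song.mp3")
    dst = os.path.join(d, "copy.mp3")
    with open(src, "wb") as f:
        f.write(b"placeholder audio bytes")
    ok = set_tags(src, ["jazz", "live"])
    shutil.copyfile(src, dst)  # copies file contents only, not extended attributes
    src_tags, dst_tags = get_tags(src), get_tags(dst)

print(ok, src_tags, dst_tags)
```

On a filesystem that accepts user xattrs the original keeps its tags and the copy comes back empty; on one that doesn't, the tags never get written in the first place. Either way the metadata lives outside the file contents, which is where the fragility comes from.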

So we ask the systems to make up names for data based on its content, which doesn't always work as well as we'd like.


Missing names are not the biggest problem. You just have to come up with a name. The problem is when things have multiple names, or, even worse, when people disagree on which names are appropriate. The world rarely lets you neatly categorize large datasets. There are always outliers.

For example, you have a set of balls and you want to sort them by color. Where does orange stop and red begin? What about striped balls or ones with logos printed on them? What if it is a hypercolor ball that changes based on heat? It gets messy very fast.
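A quick sketch of why the ball sorting gets messy, using Python's standard `colorsys` module to bucket RGB colors by hue. The bucket names and hue cutoffs are assumptions made up for illustration. Two nearly indistinguishable red-orange colors land in different buckets because they straddle an arbitrary line, and a striped or heat-reactive ball has no single RGB value to pass in at all.

```python
import colorsys

# Hypothetical hue buckets as (exclusive upper bound in degrees, name).
# The cutoffs are arbitrary choices; that arbitrariness is the point.
BUCKETS = [(15, "red"), (45, "orange"), (70, "yellow"), (170, "green"),
           (260, "blue"), (330, "purple"), (360, "red")]


def color_name(r, g, b):
    """Name an RGB color (0-255 per channel) by which hue bucket it falls in."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    if v < 0.15:
        return "black"
    if s < 0.15:  # too little saturation to have a meaningful hue
        return "white" if v > 0.85 else "gray"
    degrees = h * 360
    for upper, name in BUCKETS:
        if degrees < upper:
            return name
    return "red"  # unreachable in practice: colorsys returns h in [0, 1)


print(color_name(255, 63, 0))  # hue ~14.8 degrees -> red
print(color_name(255, 65, 0))  # hue ~15.3 degrees -> orange
```

Moving the cutoff doesn't fix anything; it just moves the pair of near-identical colors that get split.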


Not everything has to be named once and put into a hierarchy like a directory tree. Tags work well for data. A system like an LLM that understands synonyms and antonyms should be able to find and even update tags for concepts that don’t have a full set already, as long as the concept starts out with a few appropriate tags.
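The lookup side of this can be sketched without an LLM at all. Below, a small hand-written synonym table stands in for the model's knowledge of synonyms; the table and the `expand`, `find`, and `fill_in` helpers are hypothetical names for illustration. A search for one term matches items tagged with any of its synonyms, and `fill_in` adds the missing synonyms to an item that already carries one of them.

```python
# Hypothetical synonym groups; in the proposal above, an LLM would supply these.
SYNONYMS = [
    {"car", "automobile", "auto"},
    {"song", "track", "tune"},
    {"photo", "picture", "image"},
]


def expand(tag):
    """Return the tag plus every synonym in its group (just the tag if none)."""
    tag = tag.lower()
    for group in SYNONYMS:
        if tag in group:
            return set(group)
    return {tag}


def find(items, query):
    """Return sorted names of items whose tags match the query or a synonym of it."""
    wanted = expand(query)
    return sorted(name for name, tags in items.items()
                  if wanted & {t.lower() for t in tags})


def fill_in(tags):
    """Extend a tag set with all synonyms of the tags it already has."""
    result = set()
    for t in tags:
        result |= expand(t)
    return result


items = {
    "vacation.jpg": {"picture", "beach"},
    "commute.mp3": {"track"},
    "garage.jpg": {"image", "auto"},
}
print(find(items, "photo"))          # ['garage.jpg', 'vacation.jpg']
print(sorted(fill_in({"track"})))    # ['song', 'track', 'tune']
```

This only works because each item starts with at least one tag that lands in a known group, which is the same precondition the comment sets.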


In practice, if you're making up tags on the fly, it's not much better than untagged data. An LLM that can figure out what the tags mean can probably just infer the same information from the data anyway.


In practice, applying tags from a curated list is nothing like making up new tags on the fly.
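To make the distinction concrete, here is a minimal sketch of a controlled vocabulary in Python: input is normalized and either matched against the curated list or rejected with a suggestion from `difflib.get_close_matches`, rather than being minted as a new tag. The vocabulary and the `apply_tags` helper are hypothetical.

```python
import difflib

# Hypothetical curated vocabulary; in practice someone maintains this list.
VOCABULARY = {"jazz", "classical", "rock", "hip-hop", "electronic", "live"}


def apply_tags(raw_tags):
    """Split user input into accepted curated tags and rejected input with suggestions."""
    accepted, rejected = set(), {}
    for raw in raw_tags:
        tag = raw.strip().lower()
        if tag in VOCABULARY:
            accepted.add(tag)
        else:
            # Suggest the closest curated term instead of creating a new tag.
            rejected[raw] = difflib.get_close_matches(tag, sorted(VOCABULARY), n=1)
    return accepted, rejected


accepted, rejected = apply_tags(["Jazz", "jaz", "Live ", "chill vibes"])
print(sorted(accepted))  # ['jazz', 'live']
print(rejected["jaz"])   # ['jazz']
```

The design choice is that unrecognized input never becomes a tag by itself; the list only grows when someone deliberately adds to it, which is what keeps the tags meaningful.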


