Having no names is not the biggest problem. You just have to come up with a name. The problem is when things have multiple names, or, even worse, when people disagree about which names are appropriate for something. The world rarely allows you to neatly categorize large datasets. There are always outliers.
For example, you have a set of balls and you want to sort them by color. Where does orange stop and red begin? What about striped balls or ones with logos printed on them? What if it is a hypercolor ball that changes based on heat? It gets messy very fast.
Not everything has to be named once and put into a hierarchy like a directory tree. Tags work well for data. A system like an LLM that understands synonyms and antonyms should be able to find and even update tags for concepts that don’t have a full set already - as long as there are a few appropriate tags on the concept to start.
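To make the tag idea concrete, here is a minimal sketch of a tag index that treats synonyms as the same tag. Everything in it is an assumption for illustration: the `SYNONYMS` groups are hand-written stand-ins for whatever an LLM or thesaurus would supply, and `TagIndex`, `canonical`, and the ball names are hypothetical.

```python
from collections import defaultdict

# Hypothetical synonym groups; a real system might ask an LLM or a
# thesaurus for these instead of hard-coding them.
SYNONYMS = [
    {"red", "crimson", "scarlet"},
    {"striped", "banded"},
]


def canonical(tag):
    """Map a tag to one fixed representative of its synonym group."""
    tag = tag.lower()
    for group in SYNONYMS:
        if tag in group:
            return min(group)  # alphabetically first member, so it is stable
    return tag


class TagIndex:
    """Items carry any number of tags; there is no hierarchy."""

    def __init__(self):
        self._items = defaultdict(set)  # canonical tag -> set of item ids

    def add(self, item, tags):
        for tag in tags:
            self._items[canonical(tag)].add(item)

    def find(self, *tags):
        """Return the items that carry every requested tag (or a synonym)."""
        sets = [self._items.get(canonical(t), set()) for t in tags]
        return set.intersection(*sets) if sets else set()


index = TagIndex()
index.add("ball-1", ["crimson", "striped"])
index.add("ball-2", ["red", "logo"])
index.add("ball-3", ["orange", "banded"])

print(sorted(index.find("scarlet")))        # ['ball-1', 'ball-2']
print(sorted(index.find("red", "banded")))  # ['ball-1']
```

Note that this only sidesteps the hierarchy problem. It still depends on someone (or something) deciding that "crimson" and "red" mean the same thing, and it says nothing about where orange ends and red begins.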
In practice, if you're making up tags on the fly, it's not much better than untagged data. An LLM that can figure out what the tags mean can probably just infer them from the data anyway.