Hacker News

> I can't find anything on how to design and implement anything more than the barebones basics of a system.

All of this stuff (horse/horses etc) is extensively discussed, maybe look under "taxonomy" or "ontology".

Now, whether you want to use any of those solutions or not or find the discussion useful or not... if you aren't finding anything about it at all, you aren't looking in the right places.

(I learned about it in librarian school)



To be fair to OP, the biggest hurdle in learning anything is knowing what questions to ask. When you don't have ontology as part of your vocabulary it's hard to find literature regarding, say, "comparison of ontologies for user-generated text content".

I suppose this flows back into library science, which is all about systematizing where to look for answers to questions, but I'm always astonished to find that there's oceans of literature and research in questions I haven't even thought to ask.


I think OP is referring to finding software-engineering related design discussions surrounding tagging systems, but yes, I’m sure there is a great depth of ontology material and librarian knowledge that could add to software system designs.


> (I learned about it in librarian school)

As the rest of us learned during the first tagging boom, the librarian is the natural apex predator of tagging.


I've been a librarian for more than 15 years and I can only speak from personal experience when I say that I am the apex predator of nothing. Every once in a while I will get it in my head to systematize my personal knowledge base with a controlled vocabulary and ontology and I just fall on my face. I really want it for some twisted reason, though.

Turns out LC subject headings -- for all their failures -- are pretty good.


Library of Congress classifications and subject headings (those are two separate things, for those unfamiliar) are not perfect, but they're pretty good, apply to a huge corpus, and, to my mind most importantly, have evolved over a bit more than a century under numerous circumstances, including an absolute explosion of published materials, substantial changes in how the organisation and classification of knowledge are understood, and an awareness of the social and cultural aspects of these (as well as the institutional bias that's often embodied within them). That is, they have evolved a change management process.

The Classifications are substantively hierarchical, though that's really an outgrowth of the fact that they're used to locate books within physical shelf space, in which a record must occupy an address (physical space), and given that the Library's settled on subject classification as its storage and retrieval basis, this maps what's effectively a folded linear structure (shelf space) onto the multidimensional subject classification. It's not ideal, but it's workable. And many of the quirks of the LoCCS come out of the fact that it addresses both the composition (comprehensive, but still US-centred) and process (shelving, search, and retrieval) of the Library.

The Subject Headings are not hierarchical, though they're structured. In particular, they're relational, with numerous subject headings referring to others. There's some parent-child relations (though the top level hierarchy is broad), numerous retired classifications, and many "use that instead of this" notes.
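To make the "relational, not hierarchical" point concrete, here is a minimal sketch in Python. It is an assumption about how one might model it, not the actual LCSH record format: the `Heading` class, its field names, and the sample headings are all invented for illustration. It shows headings with several parents, retired headings that point at a replacement, and how you'd follow those pointers.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Heading:
    """One subject heading (hypothetical model, not the real LCSH schema)."""
    label: str
    broader: list = field(default_factory=list)   # parent headings; may be several
    related: list = field(default_factory=list)   # "see also" style links
    use_instead: Optional[str] = None             # set only on retired headings


def preferred(headings, label):
    """Follow "use that instead of this" notes until a current heading is reached."""
    seen = set()
    while headings[label].use_instead is not None:
        if label in seen:
            raise ValueError(f"circular reference at {label!r}")
        seen.add(label)
        label = headings[label].use_instead
    return headings[label]


def ancestors(headings, label):
    """Collect every broader heading reachable from label."""
    found = []
    stack = list(headings[label].broader)
    while stack:
        current = stack.pop()
        if current not in found:
            found.append(current)
            stack.extend(headings[current].broader)
    return found


# Invented sample data, purely illustrative.
headings = {
    "Livestock": Heading("Livestock"),
    "Equidae": Heading("Equidae"),
    "Horses": Heading("Horses", broader=["Livestock", "Equidae"]),
    "Draft horses": Heading("Draft horses", broader=["Horses"]),
    "Cart horses": Heading("Cart horses", use_instead="Draft horses"),
}

print(preferred(headings, "Cart horses").label)
print(sorted(ancestors(headings, "Draft horses")))
```

Because "Horses" has two parents here, the result is a graph rather than a tree, which is why a plain directory-style layout doesn't capture it.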

(I've made ... some progress ... at a structured parsing of the subject headings, though that work's been stranded Because Reasons.)


I've learned to accept that my personal life and knowledge management is going to be a mess. (I'm also a librarian). I just don't want to do more organizing when I get home. I do also feel the temptation to do it 'right' once in a while, but it never sticks. I'd wager a lot of it has to do with the fact that managing an ontology completely on your own just sucks.


> controlled vocabulary

Are you using English? English words can mean almost whatever you want them to. Perhaps design your own language that removes ambiguity. That probably requires a knowledge of philosophy to distinguish between, say, concrete and abstract. Good luck.

Maybe start with correcting the ontology of: https://cuberule.com/ (which takes a geometric approach to defining food types).

Also perhaps decide whether you want to work top-down like a directory tree (or Dewey Decimal?): resulting in standard book classification issues. Or bottom up: resulting in conflicts and discrepancies - https://news.ycombinator.com/item?id=33254025


> Perhaps design your own language that removes ambiguity.

That's what a controlled vocabulary is. It's essentially a set of tags which are clearly defined. So instead of #horses being defined purely by the word "horses," it has an attached definition along the lines of, "The category 'horses' includes equine biology, sports relating to horses, the cultural history of horses, and all other topics involving real horses. Metaphorical horses such as saw horses are not included." Tags like #horse would be redirected to #horses, since there is only one canonical horse tag in the vocabulary.
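As a rough sketch of that idea in Python (the class names, the `resolve` method, and the scope note wording are my own assumptions, not any standard's API): each canonical tag carries a definition, and variant tags redirect to it.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A canonical tag plus a scope note saying what it does and doesn't cover."""
    name: str
    scope_note: str


@dataclass
class ControlledVocabulary:
    """Hypothetical minimal vocabulary: canonical terms and variant redirects."""
    terms: dict = field(default_factory=dict)      # canonical name -> Term
    redirects: dict = field(default_factory=dict)  # variant name -> canonical name

    def add_term(self, name, scope_note, variants=()):
        self.terms[name] = Term(name, scope_note)
        for variant in variants:
            self.redirects[variant] = name

    def resolve(self, tag):
        """Return the canonical Term for a user-typed tag, or None if it isn't in the vocabulary."""
        key = tag.lstrip("#").lower()
        key = self.redirects.get(key, key)
        return self.terms.get(key)


vocab = ControlledVocabulary()
vocab.add_term(
    "horses",
    "Equine biology, sports relating to horses, the cultural history of horses, "
    "and all other topics involving real horses. Metaphorical horses such as "
    "saw horses are not included.",
    variants=["horse"],
)

term = vocab.resolve("#horse")
print(term.name)
```

Returning `None` for unknown tags is one design choice among several; a real system might instead queue the unknown tag for someone to review and either map or reject.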


Librarians are the people that we (technologists) should learn from. But all I see is programmers trying to invent things from first principles.


Eh, as the librarian who wrote the post you're replying to... I am actually ambivalent.

I wish librarianship as a field and industry were more what I'd fantasize it should/could be, but it's not so much.


How so?

What's missing / what would you remove and/or change?


The problem isn't knowing what the problem is (taxonomy and ontology), but how to implement it effectively.

I've seen enough of Hillel's posts over the years that I am fairly sure he is aware of taxonomy/ontology too.


Yeah, the content for learning has been around for over a decade or more.

Plus we have plenty of content for AI now

https://towardsdatascience.com/machine-learning-classifiers-...


Can you link some resources about it then?


This is a good basic overview, goes beyond tagging/indexing, was the textbook in LIS501 Information Organization and Access at UIUC-GSLIS (now the iSchool at Illinois) in 2006:

https://mitpress.mit.edu/9780262512619/the-intellectual-foun...

Controlled vocab standards:

https://www.niso.org/publications/ansiniso-z3919-2005-r2010

(this one is deprecated in favor of the one that follows)

https://www.niso.org/schemas/iso25964

https://www.w3.org/2004/02/skos/

The book we used in my thesaurus construction class at UIUC:

https://www.alastore.ala.org/content/essential-thesaurus-con...

My favorite intro to semantic modeling with RDF/OWL/SPARQL:

http://workingontologist.org/

Topic Maps are dead but I still have a soft spot for them:

https://www.isotopicmaps.org/

I also recommend Heather Hedden, linked in jrockhind's post.


I could, but honestly I'd just be googling "taxonomy". But ok that's not entirely true, I know how to refine my search and recognize when something is what I'm thinking of, from some familiarity with the field.

(But if you want to look around, in addition to "taxonomy" and "ontology", other good terms are "information architecture" and "controlled vocabulary").

These are not things I have vetted, this is literally just me googling and taking a quick skim...

https://blog.optimalworkshop.com/how-to-develop-a-taxonomy-f...

https://www.uxbooth.com/articles/introduction-to-taxonomies/

https://www.nngroup.com/articles/taxonomy-101/

http://accidental-taxonomist.blogspot.com/2020/11/what-it-th...

Or how about some textbooks:

https://narrowgaugebooks.indielite.org/book/9781627055802

https://www.hedden-information.com/accidental-taxonomist/


This is German, but I found it very good:

https://www.isi.hhu.de/fileadmin/redaktion/Fakultaeten/Philo...

Books:

* Cataloging the World

* Organising Knowledge. Taxonomies, Knowledge and Organisational Effectiveness

* The Intellectual Foundation of Information Organization

* The Oxford Guide to Library Research




