
> So lemmatizing is rules based, plus more.

Fundamentally, the rule of lemmatizing is that you encounter a word, you look it up in a table, and your output is whatever the table says. There are no other rules. Thus, the lemma of seraphim is seraph and the lemma of interim is interim. (I'm also puzzled by your invocation of "context", since this is an entirely context-free process.)
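A minimal Python sketch of the table-lookup view described above, assuming a hypothetical hand-built table. LEMMA_TABLE and lemmatize are illustrative names, not any library's API. Falling back to the token itself for words missing from the table is also an assumption, not part of the lookup principle.

```python
# Hypothetical toy lemma table; a real one would be compiled from a lexicon.
LEMMA_TABLE = {
    "seraphim": "seraph",
    "seraphs": "seraph",
    "interim": "interim",
    "geese": "goose",
}

def lemmatize(token: str) -> str:
    """Look the token up; whatever the table says is the output."""
    # Assumption: unknown tokens are returned unchanged.
    return LEMMA_TABLE.get(token.lower(), token)

print([lemmatize(t) for t in ["seraphim", "interim", "geese"]])
```

A suffix rule such as "strip -im" would turn interim into inter. The table avoids that because every entry is written out explicitly, with no rule applied at lookup time.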

There has never been any period in linguistic analysis or its ancestor, philology, in which this wasn't done. The only reason to do it on a computer is that you don't have a digital representation of the mapping from token to lemma. But it's not an approach to language processing, it's an approach to lack of resources.




We don't disagree. A lookup table with exact rules is a rules system to me, from an NLP/GOFAI perspective. I was aware of how the libraries tend to work because I often used NLTK and spaCy in the past to look up lemmas, word senses, and POS tags, and I know the libraries' code fairly well.

Context today may mean more (e.g. the whole sentence, the string, or the prompt context for an LLM), and context obviously has its own meaning in computational linguistics (e.g. "context-free grammar"), but the point here is that stemmers arbitrarily apply the same process to every word, with no second stage. If a stemmer encounters "best" and "good", it by definition has no stage that could map them to the same lemma. "Context" is just one of those overloaded terms, unfortunately.
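To illustrate the best/good point, here is a rough Python sketch contrasting a toy suffix-stripping stemmer with a lemma table. toy_stem and its suffix list are hypothetical and much cruder than a real stemmer such as Porter's; the table entries are also made up for illustration.

```python
# Hypothetical suffix list for a toy stemmer (far cruder than Porter's algorithm).
SUFFIXES = ("ing", "est", "er", "s")

def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least 3 characters of stem.

    It only ever looks at the surface string, so it cannot relate
    suppletive forms like "best" and "good".
    """
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hypothetical lemma table entries for suppletive comparatives.
LEMMA_TABLE = {"good": "good", "better": "good", "best": "good"}

def lemmatize(word: str) -> str:
    # Table lookup: the second stage a stemmer lacks.
    return LEMMA_TABLE.get(word, word)

for w in ["good", "better", "best"]:
    print(w, toy_stem(w), lemmatize(w))
```

The stemmer yields three different strings (good, bett, best), while the table maps all three words to good.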

Lemmatizing, at least in simple scenarios (let's imagine reviews), helps to lump those words together and to identify the proportion of term frequencies for the words we might be interested in more consistently than stemming can. It's still limited by relying on word breaks such as spaces or punctuation, of course.
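A small sketch of the reviews scenario, assuming a hypothetical lemma table and a crude regex tokenizer that splits on anything other than letters and apostrophes, so it shares the word-break limitation mentioned above. Counting lemmas rather than surface forms lets "best", "better" and "good" contribute to a single frequency.

```python
import re
from collections import Counter

# Hypothetical lemma table; a real one would come from a lexicon such as WordNet.
LEMMA_TABLE = {"better": "good", "best": "good", "loved": "love", "loves": "love"}

def tokenize(text: str) -> list[str]:
    # Word breaks approximated by runs of non-letter, non-apostrophe characters.
    return [t for t in re.split(r"[^a-z']+", text.lower()) if t]

def lemma_counts(reviews: list[str]) -> Counter:
    # Map each token through the table, then count lemmas instead of raw tokens.
    return Counter(LEMMA_TABLE.get(tok, tok) for review in reviews for tok in tokenize(review))

reviews = [
    "Loved it, best purchase!",
    "Good value; my kids love it.",
    "Better than the old one.",
]
counts = lemma_counts(reviews)
print(counts["good"], counts["love"])  # prints: 3 2
```

Dividing a lemma's count by sum(counts.values()) gives the proportion of term frequency for that lemma across the reviews.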


I see your point about context-free table lookup, but it looks to me as though authorfly's distinctions would apply to how the tables get written in the first place.



