Notice how "morpheme" gets broken into three tokens, and none of them matches "morpheme"'s two morphemes. "Lexical" and "units" are each a single token, when they have three and two morphemes respectively.
Or in French, the word "cafetière" gets chopped willy-nilly into "c/afet/ière". The canonical breakdown is "cafe/t/ière".
In either case, isn't this something we already do well?