I think, though, that most instructional texts do the reader a disservice by starting with LZW (and thus having to deal with the kwkwk case from the get-go). They should start with LZ78, which is much simpler and more elegant; then show how -- although it is unimportant asymptotically -- we can save some bits by using Welch's trick (the W in LZW), and show how that gives rise to the special case that needs to be handled.
Somewhat unexpectedly (for me anyway), if you write the decompressor as a memory stream decompressor rather than as a dictionary decompressor -- that is, instead of putting <w>,k pairs into your decoder dictionary while decoding, you note the output <offset> where that <w> first appeared -- there is no special case, and the implementation goes back to being LZ78-elegant and uniform.
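
To make that concrete, here is a rough sketch of what I mean, not a reference implementation. It assumes a byte alphabet with codes 0..255 reserved for literals, plain integer codes (no bit packing, no variable code width, no table reset). The names lzw_encode and lzw_decode_offsets are made up for illustration. One more assumption: each entry records the offset where the previous phrase was just written to the output, which holds the same bytes as <w> but is not necessarily where <w> first appeared.

```python
def lzw_encode(data: bytes) -> list[int]:
    """Textbook LZW encoder, included only so the decoder can be round-tripped."""
    table = {bytes([i]): i for i in range(256)}
    w = b""
    codes = []
    for b in data:
        wk = w + bytes([b])
        if wk in table:
            w = wk
        else:
            codes.append(table[w])
            table[wk] = len(table)  # new code for <w>,k
            w = bytes([b])
    if w:
        codes.append(table[w])
    return codes


def lzw_decode_offsets(codes: list[int]) -> bytes:
    """Decode LZW codes using (offset, length) entries into the output itself."""
    out = bytearray()
    entries = []   # entries[i] describes code 256 + i as (start offset in out, length)
    prev = None    # (start offset, length) of the phrase written by the previous code
    for code in codes:
        start = len(out)
        if prev is not None:
            # New entry: the previous phrase plus one more byte. That extra byte
            # is the first byte of the phrase we are about to write.
            entries.append((prev[0], prev[1] + 1))
        if code < 256:
            out.append(code)
            length = 1
        else:
            src, length = entries[code - 256]
            # Byte-by-byte copy on purpose: the source may overlap the bytes
            # being appended, which is exactly what happens in the kwkwk case.
            for i in range(length):
                out.append(out[src + i])
        prev = (start, length)
    return bytes(out)
```

For example, lzw_encode(b"aaaa") gives [97, 256, 97], where code 256 is used before a dictionary decoder would have finished defining it. Here the entry (0, 2) already exists when 256 arrives, and the overlapping copy reads the byte it has just written, so there is no separate branch for that case.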