As an aside, I've noticed that some modern compilers (Go and Clang are the ones I've studied recently) bundle the tokenization and lexing phases into a single higher-level lexer. That is, instead of emitting a generic token pair like (PUNCTUATION, "&&") for a later pass to classify, they produce (LOGICAL_AND, "&&") directly. It makes sense, but it surprised me: if you look at classical compiler books, they generally promote the more formal two-stage pipeline of tokenization before lexical analysis.
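The bundled approach can be sketched as follows. This is a minimal, hypothetical lexer written to illustrate the idea, not the actual implementation in Go's or Clang's compiler: the single pass classifies "&&" as LOGICAL_AND at scan time, rather than emitting a generic PUNCTUATION token for a later pass to refine.

```go
package main

import "fmt"

// Kind is a token kind. In the bundled design, the lexer emits
// specific kinds like LOGICAL_AND directly.
type Kind int

const (
	IDENT Kind = iota
	LOGICAL_AND
	LOGICAL_OR
)

func (k Kind) String() string {
	return [...]string{"IDENT", "LOGICAL_AND", "LOGICAL_OR"}[k]
}

// Token pairs a kind with the source text it was scanned from.
type Token struct {
	Kind Kind
	Text string
}

// lex scans src and classifies tokens in a single pass.
// (Hypothetical sketch: it handles only spaces, "&&", "||",
// and bare identifiers.)
func lex(src string) []Token {
	var toks []Token
	i := 0
	for i < len(src) {
		switch {
		case src[i] == ' ':
			i++ // skip whitespace
		case i+1 < len(src) && src[i:i+2] == "&&":
			// Classified immediately, not left as generic punctuation.
			toks = append(toks, Token{LOGICAL_AND, "&&"})
			i += 2
		case i+1 < len(src) && src[i:i+2] == "||":
			toks = append(toks, Token{LOGICAL_OR, "||"})
			i += 2
		default:
			// Treat any other run of non-space characters as an identifier.
			j := i
			for j < len(src) && src[j] != ' ' {
				j++
			}
			toks = append(toks, Token{IDENT, src[i:j]})
			i = j
		}
	}
	return toks
}

func main() {
	for _, t := range lex("a && b") {
		fmt.Printf("(%v, %q)\n", t.Kind, t.Text)
	}
}
```

The two-stage design would instead emit (PUNCTUATION, "&&") here and leave the LOGICAL_AND classification to a separate screening pass over the token stream.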