Could you substantiate this claim? It's not that I disbelieve you, but it's an easy claim to make without any quantifiable argument. It's also hard to test against e.g. clang, where preprocessing is performed in the same process as the compiler and assembler, and the performance of a preprocessor is really less interesting than total compile time.
Let me clarify: could you substantiate the performance claims? I have no doubt it's a correct preprocessor, but I do have doubts about performance claims in the context of compiling a whole project. cpp (and the rest of the GNU compiler toolchain) is a pretty terrible example of a well-written program, but clang is a well-written, holistic, performance-centered program, and it strikes me as difficult for an isolated preprocessor to improve significantly on clang's built-in one.
Or, to put it another way: you speak compellingly about both the algorithm side and the constant side (no tokenizing), but you don't speak about "real world" performance when interacting with many levels of caches, processes, and filesystems. Could you speak to this at all?