Let me clarify: could you substantiate the performance claims? I have no doubt it's a correct preprocessor, but I do have doubts about the performance claims in the context of compiling a whole project. cpp (and the rest of the GNU compiler toolchain) is itself a pretty terrible example of a well-written program, but clang is a well-written, holistic, performance-centered program, and it strikes me as difficult for a standalone preprocessor to improve significantly on clang's built-in one.
Or, to put it another way: you speak compellingly about both the algorithmic side and the constant-factor side (no tokenizing), but you don't address "real-world" performance, where the preprocessor has to interact with many levels of caches, processes, and filesystems. Could you speak to that at all?
I've written a couple of macro-processing programs, including a make implementation, so I know this kind of program can be made to cope with that.