Hacker News new | past | comments | ask | show | jobs | submit login

Without information about how identifiers are declared, you do not know how to parse this:

  (A)(B);
It could be a cast of B to type A, or function A being called with argument B.

Or this (like the puts(puts) in the article):

  A(B):
Could be a declaration of B as an identifier of type A, or a call to a function A with argument B.

Back in 1999 I made a small C module called "sfx" (side effects) which parses and identifies C expressions that could plausibly contain side effects. This is one of the bits provided in a small collection called Kazlib.

This can be used to make macros safer; it lets you write a #define macro that inserts an argument multiple times into the expansion. Such a macro could be unsafe if the argument has side effects. With this module, you can write the macro in such a way that it will catch the situation (albeit at run time!). It's like a valgrind for side effects in macros, so to speak.

https://git.savannah.gnu.org/cgit/kazlib.git/tree/sfx.c

In the sfx.c module, there is a rudimentary C expression parser which has to work in the absence of declaration info. In other words it has to make sense of an input like (A)(B).

I made it so that when the parser encounters an ambiguity, it will try parsing it both ways, using backtracking via exception handling (provided by except.c). When it hits a syntax error, it can backtrack to an earlier point and parse alternatively.

Consider (A)(A+B). When we are looking at the left part (A), that could plausibly be a cast or declaration. In recursive descent mode, we are going left to right and looking at left derivations. If we parse it as a declaration, we will hit a syntax error on the +, because there is no such operator in the declarator grammar. So we backtrack and parse it as a cast expression, and then we are good.

Hard to believe that was 26 years ago now. I think I was just on the verge of getting into Lisp.

I see the sfx.c code assumes it would never deal with negative character values, so it cheerfully uses the <ctype.h> functions without a cast to unsigned char. It's a reasonable assumption there since the inputs under the intended use case would be expressions in the user's program, stringified by the preprocessor. Funny bytes would only occur in a multi-byte string literal (e.g. UTF-8). When I review code today, this kind of potential issue immediately stands out.

The same exception module is (still?) used in the Ethereal/Wireshark packet capture and analysis tool. It's used to abort "dissecting" packets that are corrupt or truncated.






Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: