Hacker News new | past | comments | ask | show | jobs | submit login
Goto decorator for python (github.com/albertz)
127 points by phreeza on March 11, 2012 | hide | past | favorite | 28 comments



How this works, for the interested:

Python internally compiles to a bytecode for a stack-based virtual machine. If you import the module `dis` and execute `dis.dis` on a function, you can see this bytecode.

Like most bytecode VMs, there are jump instructions for compiling for and while loops to. For example,

  def foo(n):
      a, b = 0, 1
      for i in range(n):
          a, b = b, a + b
      return a
compiles to

  2           0 LOAD_CONST               3 ((0, 1)) 
              3 UNPACK_SEQUENCE          2 
              6 STORE_FAST               1 (a) 
              9 STORE_FAST               2 (b) 
  
  3          12 SETUP_LOOP              37 (to 52) 
             15 LOAD_GLOBAL              0 (range) 
             18 LOAD_FAST                0 (n) 
             21 CALL_FUNCTION            1 
             24 GET_ITER             
        >>   25 FOR_ITER                23 (to 51) 
             28 STORE_FAST               3 (i) 
  
  4          31 LOAD_FAST                2 (b) 
             34 LOAD_FAST                1 (a) 
             37 LOAD_FAST                2 (b) 
             40 BINARY_ADD           
             41 ROT_TWO              
             42 STORE_FAST               1 (a) 
             45 STORE_FAST               2 (b) 
             48 JUMP_ABSOLUTE           25 
        >>   51 POP_BLOCK    
  
  5     >>   52 LOAD_FAST                1 (a) 
             55 RETURN_VALUE                 
Note the `JUMP_ABSOLUTE` bytecode at position 48 (which jumps to the `FOR_ITER` at position 25). What we want to do is insert more of these instructions by manually modifying bytecode (something Python allows us to do).

The rest is a simple job of identifying the syntax (`goto .x` is really `goto.x`, an attribute access, which is simple to identify as the `LOAD_ATTR` instruction) and inserting the proper jumps. So in fact this is a pure-python modification, which works solely by changing how Python compiles to bytecode.


That's not exactly a new idea: http://entrian.com/goto/

Jokes aside, Kay Schluehr did a package called generator_tools: http://www.fiber-space.de/generator_tools/doc/generator_tool...

I used it to do iterators serialisation in Python, which is quite an ambitious task (BTW, it should be supported by the language really).


From the linked Activestate page:

This is the recipe for you if you are sick of the slow speed of the existing goto module http://entrian.com/goto/. The goto in this recipe is about 60x faster


How would you support a generator that reads lines from a file, or something else that depends on some volatile external state?


Obviously goto is a bad idea. However this code implies it's possible to add new syntax to Python from within Python. That is pretty cool.


"Obviously" goto is bad, but to quote one of Lua's designers, continuations are much worse, and yet it's considered "cool" to support continuations. It's programmer groupthink; one very smart person writes a famous article about how goto is bad and forevermore it's "obviously" bad. Goto is sometimes useful, and like lots of things it can be abused.


I can't imagine a clearer way of treating errors in C than the macros from Zed Shaw, which use goto (http://c.learncodethehardway.org/book/learn-c-the-hard-waych...):

        int test_check(char *file_name)
        34  {
        35      FILE *input = NULL; 
        36      char *block = NULL;
        37  
        38      block = malloc(100); 
        39      check_mem(block); // should work
        40  
        41      input = fopen(file_name,"r"); 
        42      check(input, "Failed to open %s.", file_name);
        43  
        44      free(block); 
        45      fclose(input);
        46      return 0;
        47  
        48  error:
        49      if(block) free(block); 
        50      if(input) fclose(input);
        51      return -1;
        52  }


Yes, the Linux kernel has used this error handling strategy for a long time (http://kerneltrap.org/node/553/2131). A variant on this is to have multiple error labels so that you uninitialize the correct set of things without the if statements (which may not be possible in cases where the resource is represented by an opaque handle).


And the reason the smart person decided Goto was bad has little relevance on programming today - he showed that Goto made it harder to prove, mathematically, that a program was correct.

When was the last time someone proved a program correct?


Several seconds ago I would assume. Just because real computer science doesn't get talked about much here doesn't mean that it doesn't exist.

As an addendum GOTO is still bad because it is much harder to parse (as a human reading the code) than using equivalent branching control statements. That was the whole point of Goto considered harmful, it's not mathematically needed so it's not syntactically needed either.


There's no new syntax. Notice what the labels and gotos look like:

   label  .error
   goto   .error
They are just method accesses on the global variables "label" and "goto". The OP just looks through the compiled byte code for access to them, saving the locations of the labels before replacing them with NOPs and replacing the gotos with jumps to the labels' locations. Cute.


I too am more interested in the bytecode hackery that's going on here than the fact that it is a goto implementation. You might be interested in byteplay (http://code.google.com/p/byteplay/), which is a module that tries to simplify this kind of process.

Two points: 1) Do not take the presented code as evidence that arbitrarily new syntax can be created. Any code that raises a SyntaxError will still be invalid.

2) I believe there is a legitimate reason to use bytecode hackery -- I've come across it many times: the slowness of Python.

For example, let's say I have three functions as follows.

def f(a): return a + 1

def g(a): return 2a

def h(a): return 2a + 1

It's clear that f(g(x)) == h(x) for all values of x. But h will run faster (marginally in this case, but try to imagine more complicated cases) because the Python interpreter will never make such an optimization. Function calls will be made, frames generated, etc. -- and that's where bytecode hackery comes in! Imagine pasting f and g together so that I never have to write h but still get the benefit of speed.

I do recognize that in many cases, it may be preferable to just use a more low-level language to optimize away such functions, but I always like to envision a pure Python solution first.


your example breaks down when objects overriding __mul__ and __add__ are passed as a. that's the most important reason python doesn't do anything with code like this. you want a jit doing specialization and inlining for you (see tracing jit in pypy or, to a lesser extent, psyco.)


For what value of __mull__ and __add__ does f(g(x)) produce a different result than h(x)? You could do this by making __mul__ or __add__ change behavior based on stack inspection, but I am having trouble thinking of another way.


A backtrace would be different, for a start. If __mul__ raises an exception, it would not show as coming from g(). Makes debugging a little bit harder.


Yeah, although it uses implementation-dependent bytecode, so it may not work on e.g. PyPy.


Writing an efficient lexer without goto is a complete nightmare.


I recall an article, possibly in one of the April (spoof) editions of Computer Language magazine, outlining a method of adding GOTO to Forth. Probably considerably easier in principle - a buffer containing unresolved jumps, a support function to check the buffer and fill in the actual BRANCH addresses, and a couple of immediate words GOTO and LABEL to populate the buffer and call the support function. So provided your buffer data structure was clever enough and your GOTOs always matched a LABEL somewhere or other, it would handle forward and backward jumps easily. Evil, evil, evil.


Neat. I'd sometimes wondered if munging the AST in a decorator to add some new control flow was practical. The code looks a bit fragile but not excessively painful.


The problem is that in the decorator, you just get the compiled function. So you can modify the bytecode but not operate on the AST anymore.

I thought about this and wrote a patch for CPython that it also stores the original AST of any compiled function in the function object.

Some discussion about this:

http://thread.gmane.org/gmane.comp.python.devel/126754

http://thread.gmane.org/gmane.comp.python.devel/126760


You can find the line number that the func was defined in and reparse the source. I do this in quite a few decorators that need source.

The other option is to wrap the thing decorated in a doc string

    @fancy_source_needing_decorator
    def foo(s):
        """
        put some haskell or DSL stuff in here
        """
Then you don't have to reparse the source, etc etc, but it still gets packaged as a function.


Would you mind linking to some examples? I'd love to read them.


I remember this being used in a unit testing framework of some sort.. maybe twisted's?


Much more ancient (2004) goto, comefrom and label keywords for Python: http://entrian.com/goto/


Wow, very creative ! Made my day :-D


Will this work in PyPy?


Yes. PyPy supports the CPython bytecode.


Wait, is it April 1 already?




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: