Goto decorator for python

pavpanchekha · on March 11, 2012

How this works, for the interested:

Python internally compiles to a bytecode for a stack-based virtual machine. If you import the module `dis` and execute `dis.dis` on a function, you can see this bytecode.

Like most bytecode VMs, there are jump instructions for compiling for and while loops to. For example,

  def foo(n):
      a, b = 0, 1
      for i in range(n):
          a, b = b, a + b
      return a

compiles to

  2           0 LOAD_CONST               3 ((0, 1)) 
              3 UNPACK_SEQUENCE          2 
              6 STORE_FAST               1 (a) 
              9 STORE_FAST               2 (b) 
  
  3          12 SETUP_LOOP              37 (to 52) 
             15 LOAD_GLOBAL              0 (range) 
             18 LOAD_FAST                0 (n) 
             21 CALL_FUNCTION            1 
             24 GET_ITER             
        >>   25 FOR_ITER                23 (to 51) 
             28 STORE_FAST               3 (i) 
  
  4          31 LOAD_FAST                2 (b) 
             34 LOAD_FAST                1 (a) 
             37 LOAD_FAST                2 (b) 
             40 BINARY_ADD           
             41 ROT_TWO              
             42 STORE_FAST               1 (a) 
             45 STORE_FAST               2 (b) 
             48 JUMP_ABSOLUTE           25 
        >>   51 POP_BLOCK    
  
  5     >>   52 LOAD_FAST                1 (a) 
             55 RETURN_VALUE

Note the `JUMP_ABSOLUTE` bytecode at position 48 (which jumps to the `FOR_ITER` at position 25). What we want to do is insert more of these instructions by manually modifying bytecode (something Python allows us to do).

The rest is a simple job of identifying the syntax (`goto .x` is really `goto.x`, an attribute access, which is simple to identify as the `LOAD_ATTR` instruction) and inserting the proper jumps. So in fact this is a pure-python modification, which works solely by changing how Python compiles to bytecode.

majke · on March 11, 2012

That's not exactly a new idea: http://entrian.com/goto/

Jokes aside, Kay Schluehr did a package called generator_tools: http://www.fiber-space.de/generator_tools/doc/generator_tool...

I used it to do iterators serialisation in Python, which is quite an ambitious task (BTW, it should be supported by the language really).

gaius · on March 12, 2012

From the linked Activestate page:

This is the recipe for you if you are sick of the slow speed of the existing goto module http://entrian.com/goto/. The goto in this recipe is about 60x faster

rcthompson · on March 11, 2012

How would you support a generator that reads lines from a file, or something else that depends on some volatile external state?

rmc · on March 11, 2012

Obviously goto is a bad idea. However this code implies it's possible to add new syntax to Python from within Python. That is pretty cool.

haberman · on March 12, 2012

"Obviously" goto is bad, but to quote one of Lua's designers, continuations are much worse, and yet it's considered "cool" to support continuations. It's programmer groupthink; one very smart person writes a famous article about how goto is bad and forevermore it's "obviously" bad. Goto is sometimes useful, and like lots of things it can be abused.

swah · on March 12, 2012

I can't imagine a clearer way of treating errors in C than the macros from Zed Shaw, which use goto (http://c.learncodethehardway.org/book/learn-c-the-hard-waych...):

        int test_check(char *file_name)
        34  {
        35      FILE *input = NULL; 
        36      char *block = NULL;
        37  
        38      block = malloc(100); 
        39      check_mem(block); // should work
        40  
        41      input = fopen(file_name,"r"); 
        42      check(input, "Failed to open %s.", file_name);
        43  
        44      free(block); 
        45      fclose(input);
        46      return 0;
        47  
        48  error:
        49      if(block) free(block); 
        50      if(input) fclose(input);
        51      return -1;
        52  }

haberman · on March 12, 2012

Yes, the Linux kernel has used this error handling strategy for a long time (http://kerneltrap.org/node/553/2131). A variant on this is to have multiple error labels so that you uninitialize the correct set of things without the if statements (which may not be possible in cases where the resource is represented by an opaque handle).

edanm · on March 12, 2012

And the reason the smart person decided Goto was bad has little relevance on programming today - he showed that Goto made it harder to prove, mathematically, that a program was correct.

When was the last time someone proved a program correct?

indiecore · on March 12, 2012

Several seconds ago I would assume. Just because real computer science doesn't get talked about much here doesn't mean that it doesn't exist.

As an addendum GOTO is still bad because it is much harder to parse (as a human reading the code) than using equivalent branching control statements. That was the whole point of Goto considered harmful, it's not mathematically needed so it's not syntactically needed either.

onedognight · on March 11, 2012

There's no new syntax. Notice what the labels and gotos look like:

   label  .error
   goto   .error

They are just method accesses on the global variables "label" and "goto". The OP just looks through the compiled byte code for access to them, saving the locations of the labels before replacing them with NOPs and replacing the gotos with jumps to the labels' locations. Cute.

stryker · on March 11, 2012

I too am more interested in the bytecode hackery that's going on here than the fact that it is a goto implementation. You might be interested in byteplay (http://code.google.com/p/byteplay/), which is a module that tries to simplify this kind of process.

Two points: 1) Do not take the presented code as evidence that arbitrarily new syntax can be created. Any code that raises a SyntaxError will still be invalid.

2) I believe there is a legitimate reason to use bytecode hackery -- I've come across it many times: the slowness of Python.

For example, let's say I have three functions as follows.

def f(a): return a + 1

def g(a): return 2a

def h(a): return 2a + 1

It's clear that f(g(x)) == h(x) for all values of x. But h will run faster (marginally in this case, but try to imagine more complicated cases) because the Python interpreter will never make such an optimization. Function calls will be made, frames generated, etc. -- and that's where bytecode hackery comes in! Imagine pasting f and g together so that I never have to write h but still get the benefit of speed.

I do recognize that in many cases, it may be preferable to just use a more low-level language to optimize away such functions, but I always like to envision a pure Python solution first.

baq · on March 11, 2012

your example breaks down when objects overriding __mul__ and __add__ are passed as a. that's the most important reason python doesn't do anything with code like this. you want a jit doing specialization and inlining for you (see tracing jit in pypy or, to a lesser extent, psyco.)

anonymoushn · on March 11, 2012

For what value of __mull__ and __add__ does f(g(x)) produce a different result than h(x)? You could do this by making __mul__ or __add__ change behavior based on stack inspection, but I am having trouble thinking of another way.

alextgordon · on March 12, 2012

A backtrace would be different, for a start. If __mul__ raises an exception, it would not show as coming from g(). Makes debugging a little bit harder.

icebraining · on March 11, 2012

Yeah, although it uses implementation-dependent bytecode, so it may not work on e.g. PyPy.

sambeau · on March 12, 2012

Writing an efficient lexer without goto is a complete nightmare.

etfb · on March 12, 2012

I recall an article, possibly in one of the April (spoof) editions of Computer Language magazine, outlining a method of adding GOTO to Forth. Probably considerably easier in principle - a buffer containing unresolved jumps, a support function to check the buffer and fill in the actual BRANCH addresses, and a couple of immediate words GOTO and LABEL to populate the buffer and call the support function. So provided your buffer data structure was clever enough and your GOTOs always matched a LABEL somewhere or other, it would handle forward and backward jumps easily. Evil, evil, evil.

tlb · on March 11, 2012

Neat. I'd sometimes wondered if munging the AST in a decorator to add some new control flow was practical. The code looks a bit fragile but not excessively painful.

albertzeyer · on March 12, 2012

The problem is that in the decorator, you just get the compiled function. So you can modify the bytecode but not operate on the AST anymore.

I thought about this and wrote a patch for CPython that it also stores the original AST of any compiled function in the function object.

Some discussion about this:

http://thread.gmane.org/gmane.comp.python.devel/126754

http://thread.gmane.org/gmane.comp.python.devel/126760

sitkack · on March 12, 2012

You can find the line number that the func was defined in and reparse the source. I do this in quite a few decorators that need source.

The other option is to wrap the thing decorated in a doc string

    @fancy_source_needing_decorator
    def foo(s):
        """
        put some haskell or DSL stuff in here
        """

Then you don't have to reparse the source, etc etc, but it still gets packaged as a function.

drx · on March 12, 2012

Would you mind linking to some examples? I'd love to read them.

cheatercheater · on March 13, 2012

I remember this being used in a unit testing framework of some sort.. maybe twisted's?

lloeki · on March 12, 2012

Much more ancient (2004) goto, comefrom and label keywords for Python: http://entrian.com/goto/

worldimperator · on March 12, 2012

Wow, very creative ! Made my day :-D

zacharyvoase · on March 12, 2012

Will this work in PyPy?

albertzeyer · on March 12, 2012

Yes. PyPy supports the CPython bytecode.

btown · on March 12, 2012

Wait, is it April 1 already?