Hacker News new | past | comments | ask | show | jobs | submit login
Nastiest Python list comprehension ever (garlicsim.org)
56 points by cool-RR on Feb 25, 2011 | hide | past | favorite | 52 comments



    >>> def mystery(n):
It's a buggy implementation of primes_below(n) through a sieve.

"There should be one - and preferably only one - obvious way to do it." And this ain't it.


> Does anyone have a suggestion on how to do it?

Yes, provide more details so I don't have to spend twenty minutes figuring out what it does.

(Going by my three-second-look gut feeling: It's the Sieve of Eratosthenes)


Looking at it a bit more, it's sort-of-obvious that that's what it is, if you've seen the Sieve before. However:

> i for a[::i] in <whatever>

Mind blown. How does assigning to an array even work? I'm guessing this is way-undefined-speak for setting elements, and I am simultaneously dazzled by its brilliance and horrified by its stupidity.


Yes, that was the desired effect :)


yeah, i figured out what it did by playing in the repl, but i still can't find it documented anywhere.


I tried a bit in the REPL and couldn't replicate it completely, but I didn't try hard. It seems to only work because of the CPython internals, so it seems to be implementation-specific and undocumented.


Not true, it's part of Python's syntax. Here it is in Pypy:

        Python 2.5.2 (79560, Nov 26 2010, 14:57:55)
	[PyPy 1.4.0] on win32
	>>> a = [1, 2 ,3]
	>>> for a[:] in [[6, 7, 8]]:
	...     pass
	>>> print(a)
	[6, 7, 8]


So what does it do? Run the expression first and assign to whatever's there? It seems that it would only work to assign to a slice, as anything else would be overwritten, but why?


     (Going by my three-second-look gut feeling: It's the Sieve of Eratosthenes)
A broken one, this seems like advertising that you hole in one'd a sand trap.


Yes, but I could not understand why one needs

  list(range(n)) as opposed to range(n)
and why the double indexing of a in

  i in a[:][2:] if a[i] == i].
Am I missing something ? I think it would work without those.

Edit: Yo La Tengo. Deviousness aside the code is quite nice. It would be fun to optimize it but without breaking its spirit.


list(range(n)) is probably because the author was using Python 3, in which range() returns an iterator (think xrange in 2.x). The double indexing is probably because a[:] returns a copy of a and the author doesn't want to iterate over the modified array, since he modifies it down the line (although the [2:] would copy it too, so no idea).

An alternative to a[:][2:] would just be range(2, n), but the author clearly doesn't want it to be readable.


This is probably a Python 3 program. The list() wrapped around range() makes a list out of an iterator -- range() now returns an iterator in Py 3. See http://diveintopython3.org/porting-code-to-python-3-with-2to...


list(range(n)) -- I sometimes do this because I never can remember if they have finally gotten around to making range a generator yet, and better safe than sorry.

a[:][2:] -- maybe the author doesn't know that any slice returns a new list? Maybe he just wants to be even more mysterious?


The `list(range(n))` is for Python 3 compatibility :)

The [:] you mentioned is indeed redundant, I removed it. Thanks.


it generates a list of prime numbers up to 'n', including some false positives.

The author wasn't kidding about it being nasty to read...


Can somebody from the Perl community please but this guy in the right place by supplying a one-liner Perl golf quadruple map call?


You mean like this implementation of the Sieve of Eratosthenes?

  sub sieve3 {
      grep{@_[map$a*$_,$_..@_/($a=$_)]=0if$_[$_]>1}@_=0..pop
  }
I shamelessly stole that from http://www.perlmonks.org/index.pl/?node_id=81769. If you just want a short solution, but don't care about the algorithm, then the following works:

  sub sieve {
      sub p{$_[0],@_>1?p(grep$_%$_[0],@_):1}p 2..pop
  }
See http://www.perlmonks.org/index.pl/?node_id=81771 for the original.


Wow... and I was happy with:

    sub mystery{for($t=3;$t*$t<$_[0];$t+=2){if(!$c[$t]){for($s=$t*$t;$s<$_[0];$s+=$t*2)
    {$c[$s]++}}}$r[++$#r]='2';for($t=3;$t<$_[0];$t+=2){$c[$t]||push(@r,$t)}return(@r)};
(Based on http://www.c2.com/cgi/wiki?SieveOfEratosthenesInManyProgramm... but with a fun push alternative and in function form)


Any time you see a three arg for loop in perl golf, it's probably Doing It Wrong.


Are you sure that's Perl? Are you sure it's not... nothing?

Seriously, I cannot fathom why this would work, let alone how. The syntax is completely alien to me.


Add some spaces:

    sub sieve {
        sub p {
           $_[0] , @_>1 ? p(grep$_%$_[0],@_) : 1;
        }
        p(2..pop)
    }
And if you get rid of some terse syntax:

    sub p {
        my ($first, @rest) = @_;
        if (@rest > 1){
            return $first, p(grep { $_ % $first } @rest);
        }
        return 1;
    }

    sub sieve {
        my $arg = shift;
        return p(2..$arg)
    }

BTW, you can even have Perl do this automatically for you, by running:

    $ perl -MO=Deparse
    <paste in the code>
    sub p {
        $_[0], @_ > 1 ? p(grep(($_ % $_[0]), @_)) : 1;
    }
    sub sieve {
        p 2 .. pop @_;
    }
But honestly, the first program is not incomprehensible if you've ever programmed in Perl. Sure, it ain't literate programming, but it also isn't beyond understanding if you have three minutes to spend on it.

It's an amusement, not your payroll system, after all.


Thanks, I have indeed never programmed in perl and was feeling a bit facetious.


I am sure that this is Perl. Let me build it up as it executes.

First, pop returns the last element of an array. If no array is specified, @_ is used, which is the default array into which function arguments are passed. Therefore pop gets at the last argument. If this function is called with one argument, it gets at the first argument.

  sub sieve3 {
      pop
  }
Next .. is the range operator. It builds up a list from one number to another. In our case it will be building up a list of the possible numbers we will be searching for primes in.

  sub sieve3 {
      0..pop
  }
As I mentioned before, @_ is the default array. Let's assign that list there so that it is available in a convenient way. To get at element $n (in Perl, '$' basically means 'the') you would write $_[$n]. If you have an array @things, you can create a slice with @_[@stuff].

  sub sieve3 {
      @_=0..pop
  }
Next what grep does is run some piece of code for every element in a list and returns the values that were true. The list element is always aliased to $_. (Default things are named _, and as before, $ means the. So $_ is "the default thing.")

  sub sieve3 {
      grep{ ... test $_'s primality ... }@_=0..pop
  }
We started with an array @_. The algorithm is that we'll go through that list, every time we encounter a prime we drop all of the multiples from the list and have a true value so we encounter it. Every time we encounter something that has been struck out we see that it has been struck out, and that test is false.

We will be striking out elements by assigning 0 or undef to them, this test for whether we're less than 2 eliminates 0, 1, and all previously struck out non-primes.

  sub sieve3 {
      grep{ ... filter out multiples of $_ ... if$_[$_]>1}@_=0..pop
  }
We'll do the striking by assigning 0 to a list of multiples. This sets the first to 0 and the rest to undef.

  sub sieve3 {
      grep{@_[ ... multiples of $_ ... ]=0if$_[$_]>1}@_=0..pop
  }
We need a list of all of the numbers from $_ to the number of elements over $_. Luckily @_ used in a calculation tells us the number of elements it has.

  sub sieve3 {
      grep{@_[... turn into list of multiples... $_..@_/$_]=0if$_[$_]>1}@_=0..pop
  }
Unfortunately the code that we'll use to make the list of multiples will reuse the default variable $_, so we need to alias this $_ to a new variable, $a.

  sub sieve3 {
      grep{@_[ ... turn into list of multiples ... $_..@_/($a=$_)]=0if$_[$_]>1}@_=0..pop
  }
And now we introduce map. Map is like grep, except that it transforms elements in place. The transformation here is that we want $a * $_. Note that there are 2 loops here, one from the grep and one from the map. $a is the variable from the grep, and $_ from the map.

  sub sieve3 {
      grep{@_[map$a*$_,$_..@_/($a=$_)]=0if$_[$_]>1}@_=0..pop
  }
And now for the nastiest bit (you thought we were done??). Grep evaluates that slice in scalar context. In scalar context an array tells you the size of the array. The slice we're assigning to always has size at least 1 because we're assigning 0 to it, so whether or not the list of indexes is 0 we get a true value. (Don't worry if you didn't understand that, the upshot is that we're returning a true value when we see primes, so that grep will return them.)

And there is the entire Sieve of Eratosthenes. In surprisingly little Perl.


Huh, it actually makes sense now, thank you.


Good point. "Least readable Python code" is a distinction similar to "smallest market cap on the Fortune 50" or "fattest Olympic decathlete."


I disbelieve. Which would you prefer to read to get the output of a system call?

  `your system call and args`

  subprocess.Popen(['your', 'system', 'call', 'and', 'args'], stdout=subprocess.PIPE).communicate()[0]
The former is the traditional syntax supported in various shells, Perl, and Ruby. The latter is what you need if you want your code to run in Python 2.6.

And this is before we get into the fact that the minutia of syntax and layout is only a minor factor in the maintainability of large projects.


Python still has (even in 3.2) os.system, which is the equivalent of the backticks from perl/shell. The subprocess module is preferred because you have more control and options than simply deferring to the C library's system() call.


Python still has (even in 3.2) os.system, which is the equivalent of the backticks from perl/shell.

Sorry, but you're wrong. The os.system call is the equivalent of Perl's system call. It runs an external command and gives you a return code, but does not give you the text generated by said system call. The officially recommended (as of Python 2.6) way to get said text is what I wrote.

The deprecated but possibly still working solution is to use os.popen. That is cleaner for this common use case, but I chose to stick to the the officially recommended solution instead.

Eventually the Python folks realized that the blessed One True Way to do it was actually pretty horrible, and so they added subprocess.check_output in 2.7.


You can pass shell=True if you want it too:

subprocess.check_output("ls -al", shell=True)

Note the possible resulting security vulnerabilities.


Yes, you're correct (it's been a while since I had to use that, should have checked, and I can't edit the comment).

os.popen() is still available in python3, and uses subprocess.Popen() internally.


That is certainly ugly, but sadly it's the option I prefer nowadays. In perl I use 3 argument open with a pipe instead of backticks. Especially if any of your args come from a variable--suddenly you have to think of shell quoting in your perl code. Yuck.

Perl desperately needs a built-in backtick() function:

    my $x = backtick(qw(my-program -l -V), "--option=$something", @files);
Maybe in perl you'd call it ``() or something instead of backtick().


You can do that quite cleanly in Python:

import commands print commands.getoutput("your system call and args")


You picked an example where Python took a step backwards in readability. That's fair but surely not typical. I agree it was a loss since I'm a huge fan of language usability-- in Python 2.5 os.popen is really rather simple to read and use.


shells were specifically made to operate with external programs (the Unix way) while Python is a more general programming language, so naturally it will be more verbose for this kind of task.

Of course the first syntax breaks when one of the arguments has a space or any other meta-character in it.


I'm fully aware of the reasons why things are as they are, and what the limitations of the different solutions are. My point still stands though, just because something is Python, written in the One True Way, does not guarantee readability.

To be absolutely clear, I actually like Python. I just strongly disagree with the belief that Python magically ensures readability. Particularly because my impression is that people who think that Python magic pixie dust automatically makes for good code are actually more likely to write bad code. (Because they have not thought deeply enough about what makes code good.)


Of course Python doesn't automatically ensure good code. But the Python code in your example is readable -- pretty match as readable as it gets, considering the constraints I mentioned.


I still disbelieve. Ruby is as much a general programming language as Python, yet supports a syntax that I find much more readable. I even find the deprecated os.popen interface in Python to be more readable than the official Python 2.6 interface. And, of course, the Python 2.7 syntax is better.

Therefore I can't agree that the Python that I presented is as good as is possible for a general purpose programming language.


I'd have to disagree. In this case, the backticks are quite idiomatic, whereas the Python is less so.

The Python example can be broken down by someone who knows OOP concepts in general, but not Python. It says:

Invoke function Popen on the object subprocess, giving the arguments (for example) 'ls' and '-al' in a list. Standard output should be handled in some way that is indicated by the value of PIPE. On the object returned from Popen, call communicate(), which returns an array; take the first element off that returned array.

Now, in order to make sense of this you would need to know what the arguments to Popen are, what communicate() means and what it returns.

Does that require a trip to the documentation? Yes. Is it less readable? No, it's more readable. The understanding of what was read is not immediately conveyed, however.

The Ruby/Perl/Bash backticks mean nothing even if you have an OO background. Maybe if the command is 'ls' and you know some Unix, you can work it out in your head to understand that it's invoking a command in the shell. What if it's calling something obscure with a string of arcane arguments appended? That will tell nothing, and the backticks will also add zero information.

Backticks are more convenient, but readability is the ability of someone who doesn't know that particular concept, or even that language, to still parse a statement and extract some level of understanding.


Disagree with which? That the Python version is as readable as it could be? Don't argue that with me, argue with the decision to add a utility method that makes Python 2.7 better.

As for readability, I emphatically disagree with you on how it should be defined. Readability is the result of the interaction between the programmer and the code. The interaction cannot be left out of this. There is such a thing as well-written, readable Greek. But if you don't speak Greek, it will all be Greek to you. COBOL was designed to be readable. Can you read any significant COBOL program? Old FORTRAN programmers complained that Smalltalk hideously unreadable because they didn't understand the basic concepts of OO.

That said, something that is long and full of moving parts is inherently a challenge to read. Even if you understand every piece of that Python example (I happen to), when you encounter it in code there is a lot of semantic noise to what is written. That complexity can and should be hidden. As the Python 2.7 API does.


I think you've mistaken my argument. I didn't say that the Python version looks good or is the best way to do it. It's not.

I was saying that "something that is long and full of moving parts" is easier to read than something that is short and idiomatic. I am not saying either of them represent the ideal way to do it.

My definition of readability includes the ability to convey the programmer's intention to the mind of the reader. The reader should be able to work out what is intended by the code with minimal requirements placed on them, and I think I detailed earlier why I think the Python code does a better job than simple surrounding a shell command with backticks.


The reader should be able to work out what is intended by the code with minimal requirements placed on them....

That's where the argument goes wrong, if the idea of "minimal requirements" means "avoid idioms".


Well, I'd welcome an explanation of your opinion here.


Unless I'm writing training material for novices, I write code with the assumption that anyone else maintaining it is reasonably competent with the languages, libraries, and tools in question. That to me is a minimal requirement.


I think it's a testimony to the readability of Python that I've never even programed in the language, nor am I familiar with the finer points of the list syntax (a[::2]?!), and yet this mucked up piece of shit nonetheless gave me the first impression of a prime number sieve. I actually still don't understand how the sieve works, but it somehow just looks like one. Or is it just the case that assuming every obfuscated loop is a prime number sieve is disproportionately likely to be true?


Does that even run? :)

Nested comprehensions are one of the things I hate about Python actually. The functions in itertools are so much clearer. I can't really think of a good reason to use nested comprehensions... they become unreadable really fast!


The nesting of list comprehension is just an aperitif in this function.


Corrected version:

    def mystery(n):
         a = list(range(n))
         return [[i for a[::i] in [([0]*n)[::i]]][0] for
             i in a[2:] if a[i]]


Isn't the point of a list comprehension - comprehension?

(It's a paraphrase of something Feynman once said to Murray Gell-Mann.)


Wow. I hadn't realized that syntax like "for a[::i] in" is valid Python. Attribute assignment (for x.foo in) seem to work too. I'm not sure if there will ever be any reason to use those in real code, but it's nice trivia to know for Python golfing.


This is why moving away from higher order functions and towards list comprehensions is a mistake.

Bad BDFL, bad!


What, you mean because HOFs can't be tortured into unreadability?

Readability requires that the programmer conspires with both the language and the reader. Unreadable code can be written in any style, any paradigm and any language. Sometimes it doesn't even take effort, talent or skill.


My favorite to date is:

bps = [map(int, bps.split('/')) if isinstance(bps, str) else bps for bps in bps if bps]

The clause "else bps for bps in bps if bps" made me incredibly incredibly happy.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: