Python 3.6 – Asynchronous Comprehensions

1st1 · on Dec 10, 2016

Surprised to see this here :) I'm the author of the PEP, feel free to ask questions.

sagnew · on Dec 10, 2016

Just wanted to say excellent work. This is some convenient wizardy!

1st1 · on Dec 11, 2016

Thanks ;)

orf · on Dec 11, 2016

Thank you so much for the pep, I've just finished writing a pretty neat tool using all the new 3.6 features and they are awesome, especially the comprehensions.

Thanks :)

1st1 · on Dec 11, 2016

Thanks! Did you use new async generators in your tool?

jchassoul · on Dec 11, 2016

I just want to say thank you for make this possible! this are the progressive little things that are not always gloriously announce that keep making python what it is step by step.

Animats · on Dec 11, 2016

When is this useful? There's no real multiprocessing concurrency in Python. So using this on a computational example such as the one given:

    result = [i async for i in aiter() if i % 2]

won't speed anything up or put multiple CPUs to work on it.

Python's new "async" features are useful mostly for servers running many network connections simultaneously. Too many for one thread per connection. That use case used to be handled by "Twisted Python", but now there's language support. That's useful. But a list comprehension doesn't make sense for that use case. The PEP lacks any useful example.

The motivation seems to be to make async comprehensions have the same capabilities as regular comprehensions, even though that's not particularly useful.

yladiz · on Dec 11, 2016

I could see it being useful for gathering data via an API and returning it in an array, like if an API doesn't return an array of JSON objects but just an object, this could be useful for that use case. In general as long as it's not processor bound like computational tasks but may not be synchronous this could be useful. It also naturally extends the async for syntax to list comprehension.

It would be really nice if somehow these kinds of tasks could be tied into the multiprocessing package, so that async could handle processor intensive applications well.

rjbwork · on Dec 11, 2016

This. This is essentially what you can do with the AsParallel() LINQ method, or iterating over an IEnumerable of tasks generated by async methods does in C#.

spangry · on Dec 11, 2016

What about 'as_completed' in concurrent.futures? From my understanding, they fixed the performance issues with ProcessPoolExecutor in 3.4.

m_mueller · on Dec 11, 2016

I recently tried to parallelize some non trivial code I have with multiprocessing.map/imap/imap_unordered - the state of parallel computing in python is really poor have to say. If the program's CPU runtime has gotten so long that you'd think about parallelizing, its datastructures tend to be too much to push through the pickle bottleneck and no gain would be had. Except of course number crunching stuff for which there's numpy. Bottom line is, you can do parallel computing, but trying to parallelize after the fact will usually fail, you have to plan for it from the very beginning due to no shared memory, which would add significant time until your first version is up and running

I long for a dynamic but strictly typed language with rich standard lib that gets this right. Nim?

bmh100 · on Dec 11, 2016

You should consider Clojure with its optional typing library. I used Clojure, and later converted a single-threaded application into multi-threaded with dramatic gains. I did not need to plan in advance for parallelization and did not do so. Despite that, the conversion was fairly pain free. The application did the following:

1. read csv 2. create data structure 3. calculate statistics for each column (max values, unique values, etc.) 4. combine those statistics with original columns to generate new columns 5. write new columns to new csv

Processing each type of statistics in its own thread, combined with promises, allowed the processors to independently work on the same data structure. Because of the need to handle non-numeric values in numeric columns, e.g. convert nulls to zeroes, matrix manipulations would not have been appropriate.

I just wanted to point out a type of situation from my experience that goes against your generalization, to give a more complete picture.

m_mueller · on Dec 11, 2016

Thanks, I'll consider clojure in the future. How would you say its standard lib and third party libraries hold up against python?

With my point about the need to plan ahead I specifically meant python by the way, and I think all runtimes that can't make use of concurrent multithreading have this in common. As soon as you can do multithreading with shared memory you become a lot more flexible in parallelising later on, and you can adjust your code style accordingly to accommodate this without much overhead. That's why the GIL IMO is so unfortunate, it's a real shame that Python 3 wasn't able to get rid of this when it started with a clean slate.

bmh100 · on Dec 11, 2016

I see now what you meant. I apologize for my misunderstanding in that to be a general statement. Knowing that, I agree with your statements much more now. From what I can tell, the multi-process architecture in Python would require everything you mentioned. Not removing the GIL is indeed unfortunate.

Clojure is niche in a way. It is a Lisp dialect. However, it is based on the JVM and has built-in conveniences for using Java functions, classes, and libraries. Therefore, comparing libraries for enterprise applications is more accurately Clojure+Java vs. Python. My expertise is not in either language ecosystem, but my impression is that Python has better libraries for data analysis and as a system scripting language, and Clojure+Java is better for general applications and everything else.

One benefit of Clojure that makes parallelization easy is the immutable data structures. New data structures are cheaply made by seamlessly referencing originating data structures. Example: making a copy of array A and appending an item as array B, means that array B is mostly a reference to array A. Immutable data structures also offer expectations of behavior that mutable state naturally offends, such as avoiding race conditions or locks. In the case of my earlier example, I simply added some "future" wrapper functions and "dereference" annotations [1]. The entire program would execute to reach the final output, then work in reverse order to trigger and plan the future functions. Order of execution was based on dependencies, not strictly on the order of the code. It was fascinating was to watch the logs for when functions would start or complete. If you are familiar with operations planning and critical path analysis, the overall run-time reflected the critical path.

[1]: https://www.conj.io/store/v1/org.clojure/clojure/1.8.0/clj/c...

srean · on Dec 11, 2016

Its a bit immature at the moment, but here's a shout out for Julia. If you are used to python, numpy you will not feel out of place.

BTW, as far as I recall Nim is statically typed, please correct me if I am wrong.

m_mueller · on Dec 11, 2016

Yes, Nim is static, however I've been told using lots of type inference and unions it can get almost the ease of use of dynamic languages. I never did something with it so far though, so I can't say whether that really holds up. My main concern would still be libraries - hard to beat python or the JVM in that regard I think.

nimmer · on Dec 11, 2016

Although the standard lib is still improving, Nim is doing it right.

sametmax · on Dec 11, 2016

async / await for on anything that implement the interface, not just asyncio. Which means it works for Future tied to multiprocessing pools, so yes, you can use async / await for you CPU bound program.

rickycook · on Dec 11, 2016

even if you assume it's not useful (i'm sure it is; arrays of API requests you have to process for example), it's probably a laudable goal and worth the effort to make things standard. i was surprised when i saw this and thought "oh you can't await in a comprehension?"; i expected you to be able to.

jfaucett · on Dec 10, 2016

This is somewhat tangential but could some of you pythonistas opine on the status of concurrency in python as of 3.6? Are we talking JVM level goodness or even some message passing builtins like erlang or is python still meandering around problems other scripting languages like ruby have had in this area? Basically, how easy is it to build concurrent code that actually does improve with added processors assuming you structure your stuff correctly.

dom0 · on Dec 10, 2016

~depends

Python is still fundamentally a n=1 language, where only one thread can run the interpreter at any time (GIL), like many scripting languages. Thus any gains in performance from using multiple threads has to stem from using external resources (syscalls, calls into C code) -- running pure Python code in multiple threads won't increase performance (but still provide concurrency).

Most libraries that do heavy lifting in native code and so on are written with this in mind, and allow thus ample performance gains, should these operations actually limit performance.

Of course, nothing keeps one from using entirely different means to concurrency, eg. message passing, like the ZeroMQ approach, RPC, shared memory and so on are all possible with Python.

So I'd say it's quite manageable, and that the GIL isn't really a significant limitation for most applications. However, it also means that in many cases process-level parallelism is preferable, eg. in the context of web applications this is the usual approach (and it doesn't have to cost a ton of memory - see uwsgi/pre-forking appservers), besides async.

paulddraper · on Dec 11, 2016

Everything true, though I would add that Python makes multiprocess parallelism fairly painless with the standard multiprocessing package.

lima · on Dec 10, 2016

Useful feature.

Another important async improvement in 3.6 are asynchronous iterators:

https://www.python.org/dev/peps/pep-0525/

(yielding from a coroutine)

tejasmanohar · on Dec 11, 2016

I can see a use case for `await` in comprehensions- e.g.

  responses = [await r() for r in requests]

... but, what's the 'practical' application of `async` in comprehensions? I can't think of one that shouldn't call an independent `async` function for readability.

dorianm · on Dec 11, 2016

I find the Ruby solution more elegant: https://github.com/bruceadams/pmap

e.g.: Replace:

    require 'open-uri'
    (1..100).map { |i| open("http://httpbin.org/get?a=#{i}").read }

To:

   require 'open-uri'
   require 'pmap'
   (1..100).pmap { |i| open("http://httpbin.org/get?a=#{i}").read }

From 17.55s to 0.72s.

lars512 · on Dec 11, 2016

Parallel map is built in to Python already with pools.

    import requests
    import multiprocessing as mp
    
    pool = mp.Pool()
    urls = ["http://httpbin.org/get?a=#{}".format(i) for i in range(100)]
    pool.map(requests.get, urls)

noobermin · on Dec 11, 2016

As a non-rubyist, this took me a minute to guess through. I prefer list comps.

hint: I'm implying familiarity over elegance.

dorianm · on Dec 11, 2016

I guess in Python it would be going from:

    list(map(lambda x: x**2, [1, 2, 3, 4]))

To:

   list(pmap(lambda x: x**2, [1, 2, 3, 4]))

noobermin · on Dec 11, 2016

Right, and neither of those are pythonic. There is no need for a lambda when you have comprehensions.

This comment may relate better to your point: I would personally prefer "afor" over "async for", FWIW. One thing I don't feel "async for" is being more explicit over being more verbose. The symmetry with "await foo" is broken though.

PudgePacket · on Dec 11, 2016

Offtopic but the python website is looking nice, last time I visited it definitely wasn't so shmick.

noobermin · on Dec 11, 2016

Hmm, I remember the design from a year or so ago if I'm not mistaken and it seemed the same.

jayajay · on Dec 10, 2016

This is str8 dope... when you are accustomed to seeing node.js callbacks pre-es6.

jayajay · on Dec 11, 2016

Trigger Warning, do not read if easily offended or if against introspection.

Downvotes? Would this have been upvoted?:

"Python 3.6 is really cool and looks fairly compact, especially when you're used to nesting callbacks, using promises, or fibers in Node.js. That said, arrow functions provide some of that compactness in ES6."

It's interesting how a certain verbal style resonates better or worse in certain communities, even if the content is held approximately fixed:

"Python 3.6 is really cool because of comprehensions."

"Python 3.6 is lit af bruh cuz dem list comps."

"dis lang is dope cuz ONE liners tho."

"python is fucking [i for its in me]"

For example, if I tried to teach a bunch of inner city kids about Python sounding like the average HackerNews user, they would never learn anything! For those of you that teach kids, you'll probably understand, everyone else will probably take offense to this, don't say I didn't warn you.

Communities like this (HackerNews) lack diversity -- we're all the same flavor of nerd, and that's boring as fuck, isn't it? We tend to agree with each other, and are diametrically opposed to people who try to burst our bubble. Clearly we're right about everything because we have been validated by our fancy BS's in honors physics, mathematics, chemistry, biology, and other sciences (oh, and engineering, if you're second-rate (shh! some people don't like mean jokes)).

There was a post on here about democracy after the Trump victory (or right before?). And the majority consensus of the nerds on this site was that government should not have unlimited power, and it should make itself easy to change, or replace, if needed. I.e., there should be no runaway governments, which we're stuck with. People should have the power to change things. For the most part, you guys agree with that, which I find incredibly ironic.

bushin · on Dec 11, 2016

HackerNews community is extremely diverse, that's why people strive to write more clearly, otherwise some C++ programmer from Russia or a hardware engineer from China wouldn't understand a thing.

sametmax · on Dec 11, 2016

I +1 for the overall insight, but it's logicial and desirable to have small communities of people alike. As long as you have many of them, that are all very different and their members are part of several of them. It know it's the case for me (I'm part active on HN, reddit, imgur, twitter but IRL in sport, charity and porn), and I believe a lot of people are too here.

So I think we are OK.

HN is indeed a bubble, but it's a fantastic one that would lose a lot of value if the liquid it came from was diluted.

coldtea · on Dec 11, 2016

Consider how you'd view a person you first meet wearing a normal t-shirt and jeans or whatever, and another dressed up and speaking like Vanilla Ice.

Sure, they might both be equally deep when you get to know them (yeah, right), but appearances do matter and/or can get in one's nerves.

zo1 · on Dec 11, 2016

>"For example, if I tried to teach a bunch of inner city kids about Python sounding like the average HackerNews user, they would never learn anything! "

I wouldn't try to do that myself, precisely because they're illiterate. They have much bigger problems than the accessibility of learning a programming language. They need to be functional members of society, and one important part of that is communication.