I agree with the portions of this dealing in obsolete libraries (e.g. asyncore, SimpleHTTPServer, array).
I also agree that __del__ and copy expose pain points and should be treated carefully.
isinstance() is often a code smell for sure, but I don't think it's potentially harmful in the same way as using os.system/os.popen; sometimes, INSIDE your own code, you want to fail early with a more informative message if a class does not explicitly contract to provide a certain set of interface behaviors. (I don't necessarily want to find out that a large/complex behavior is wrong only after it has occurred).
(I don't find it particularly nice to start writing tons of IWhatever objects and such to get behaviors already provided by built-in language constructs.)
"if __name__ == '__main__':" is ugly, but its purpose is not to be pretty; it is to make your module more fool-proof and explicit about what should happen in a script run vs an import. I don't see a good general substitute. If you are sure that only you or someone sensible will be importing/running your module, then I can agree with the advice just to keep those two kinds of scripts separate.
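A minimal sketch of what the guard buys you, assuming a hypothetical module saved as `greet.py`: the functions can be imported without side effects, and the printing happens only on a direct run.

```python
import sys


def greeting(name):
    """Library code: safe to import and reuse from other modules."""
    return f"hello, {name}"


def main(argv):
    """Script behaviour: only meant to run when the file is executed directly."""
    for name in argv or ["world"]:
        print(greeting(name))


if __name__ == "__main__":
    # Runs for `python greet.py ada`, but is skipped by `import greet`,
    # so importing the module never triggers the printing above.
    main(sys.argv[1:])
```

`python greet.py ada` prints `hello, ada`, while `from greet import greeting` prints nothing.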
And I agree that any Python code compiling giant strings is in serious need of fixing (including that nasty bit in namedtuple, as well as Simionato's otherwise very nice non-stdlib code for signature-preserving decorators; there's really no other way I know of yet for doing that without ugly hacks).
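For comparison, here is a sketch of the stdlib-only route in Python 3. This is not the `decorator` package's approach. `functools.wraps` sets `__wrapped__`, and `inspect.signature` follows it, so introspection reports the original signature without compiling any strings. The `logged` decorator and the `area` function are made up for illustration.

```python
import functools
import inspect


def logged(func):
    """Decorator that records calls; functools.wraps copies the metadata."""
    calls = []

    @functools.wraps(func)  # copies __name__, __doc__, and sets __wrapped__
    def wrapper(*args, **kwargs):
        calls.append((args, kwargs))
        return func(*args, **kwargs)

    wrapper.calls = calls
    return wrapper


@logged
def area(width, height=1):
    """Return width * height."""
    return width * height


# inspect.signature follows __wrapped__, so it reports the original
# parameters even though the wrapper itself takes *args, **kwargs.
print(area.__name__, inspect.signature(area))  # area (width, height=1)
```

This fixes introspection only. The wrapper still accepts `*args, **kwargs`, so a bad call such as `area(1, 2, 3)` runs the wrapper body and gets logged before the original function raises `TypeError`.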
The isinstance() criticism is in principle outdated after PEP 3119 [1] but in practice it is indeed mis(/over)-used more often than not.
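A small illustration of why PEP 3119 changes the picture, assuming Python 3's `collections.abc`. `Sized` and `Iterable` implement `__subclasshook__`, so `isinstance()` checks for the relevant methods instead of inheritance. `Mapping`, by contrast, still requires inheriting from it or calling `Mapping.register()`. `Bag` and `total_by_key` are hypothetical names. The second function shows the fail-early-with-a-clear-message use of isinstance() argued for earlier in the thread.

```python
from collections.abc import Iterable, Mapping, Sized


class Bag:
    """Inherits from nothing; it just implements __len__ and __iter__."""

    def __init__(self, *items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)


# Sized/Iterable recognise Bag structurally via __subclasshook__ ...
print(isinstance(Bag(1, 2), Sized), isinstance(Bag(1, 2), Iterable))  # True True
# ... but Mapping has no such hook, so Bag is not treated as one.
print(isinstance(Bag(1, 2), Mapping))  # False


def total_by_key(config):
    """Fail early, with an informative message, if config is not a Mapping."""
    if not isinstance(config, Mapping):
        raise TypeError(
            f"total_by_key() needs a Mapping, got {type(config).__name__}"
        )
    return sum(config.values())
```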
Compiling strings is usually a code smell, but at least in the case of namedtuple I'm pretty sure it is justified for at least one reason: performance. It could certainly be implemented in more idiomatic Python, but the result would probably be less efficient than, say, dicts or regular objects.
I am with you; it seems like a knee-jerk reaction to call all exec or eval code bad. Complexity has been pushed into a library, and the clarity that namedtuple adds is worth more than whatever it has "hurt" by its use of exec.
Yes; to clarify, I believe that if it is necessary to compile big old strings to get a certain sort of behavior, then that might be a point at which Python should be improved to allow similar code to be constructed dynamically (and more safely).
And if it is necessary for performance reasons, Python should probably be able to do a similar thing efficiently without compiling big old blobs of text.
It is true, of course, that this can be hidden as an implementation detail with relatively little impact to end users... but the mere possibility of relatively clean and judicious uses does not change the fact that, as a pattern, it is tricky, hard to read, and easy to mess up in horrible ways. It is not that it is fundamentally unworkable. But if it is the only/best way, then that probably indicates a place where Python could be incrementally improved...
It should be possible to create the exact same classes dynamically, without using exec. The only advantage I can see of the exec is that the class template string makes it a little clearer what the equivalent "normal" class definition would look like.
There are other drawbacks to exec, though: some people have disabled exec for security reasons, and it's opaque to things like PyPy.
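Here is a rough sketch of building such a class without exec. `type()` plus `property(operator.itemgetter(i))` produces a tuple subclass with named, read-only fields. This is a simplified stand-in, not the stdlib's actual namedtuple implementation. It has no defaults, `_replace`, `_asdict`, or name validation, and it makes no claim about matching namedtuple's speed. `make_record` is a hypothetical name.

```python
from operator import itemgetter


def make_record(typename, field_names):
    """Build a tuple subclass with named, read-only fields, without exec()."""
    field_names = tuple(field_names)

    def __new__(cls, *args):
        if len(args) != len(field_names):
            raise TypeError(
                f"{typename}() takes {len(field_names)} arguments, got {len(args)}"
            )
        return tuple.__new__(cls, args)

    def __repr__(self):
        pairs = ", ".join(f"{n}={v!r}" for n, v in zip(field_names, self))
        return f"{typename}({pairs})"

    namespace = {
        "__slots__": (),        # no per-instance __dict__, like namedtuple
        "_fields": field_names,
        "__new__": __new__,     # type() turns this into a staticmethod
        "__repr__": __repr__,
    }
    # Each field name becomes a read-only property that indexes into the tuple.
    for index, name in enumerate(field_names):
        namespace[name] = property(itemgetter(index), doc=f"Alias for field {index}")
    return type(typename, (tuple,), namespace)


Point = make_record("Point", ["x", "y"])
p = Point(1, 2)
print(p, p.x + p.y)  # Point(x=1, y=2) 3
```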
> I agree with the portions of this dealing in obsolete libraries (e.g. asyncore, SimpleHTTPServer, array).
The existence of third-party libraries that are arguably more featureful does not render core libraries obsolete. Handling the 80% case easily is a virtue, not a vice.
While this is a good point, it's not unusual for the third-party library to be more Pythonic and easier to use than the core lib analogue. For example, Requests.
Most of this I agree with, with a few comments/caveats:
> Easy Stuff First
The unfortunate part about this particular example is that subprocess's interface isn't really that well-designed, and so people resort to os.system simply because it is less complicated than subprocess.
> Ducks In A Row
I don't think isinstance is really that bad, but checking based on something's exact type is definitely wrong.
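Here is the difference in a few lines. An exact `type(obj) is dict` check rejects perfectly good subclasses such as `collections.OrderedDict`, while `isinstance()` accepts them. The two function names are made up for the example.

```python
from collections import OrderedDict


def is_dict_exact(obj):
    # Exact type check: rejects subclasses such as OrderedDict.
    return type(obj) is dict


def is_dict_like(obj):
    # isinstance() also accepts subclasses of dict.
    return isinstance(obj, dict)


settings = OrderedDict(debug=True)
print(is_dict_exact(settings), is_dict_like(settings))  # False True
```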
> Toys are for Children
One of the problems of a batteries-included stdlib is that you have to support it just about forever. Though I would like to note that for basic async programming Tornado is surprisingly good, and less complex than Twisted.
> Foreign Concepts
Apparently this is especially useful for scientific and mathematical computing. I can't see how it's especially dangerous since just about everyone will use list instead.
> Apparently this is especially useful for scientific and mathematical computing. I can't see how it's especially dangerous since just about everyone will use list instead.
Actually, I use Python for mathematical computing, and I've never used Python's built-in arrays. NumPy arrays are much more convenient, and NumPy is written in C, so it's way faster than using a list.
It's a win-win: the speed of C, combined with the flexibility and syntax of Python.
The only reason array really exists in Python (this implementation, that is) is to support some other libraries that need O(1) indexing but don't want to depend on NumPy. For everything else, you should really just use NumPy.
Not a surprise. Python's `array` doesn't really do much; it just allows you to store and access homogeneous data without wasting memory. That's all. It doesn't really deserve to be on this list.
I do really miss the 'commands' module's convenience... so much that I often code my own version using subprocess, just so I can get at stdout, stderr, and the return code all in one place.
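The `commands` module was Python 2 only. In Python 3.7+, `subprocess.run` with `capture_output=True` collects all three pieces in one call. The `run_command` helper and `CommandResult` type below are a hypothetical sketch of the kind of wrapper described above.

```python
import subprocess
import sys
from typing import NamedTuple


class CommandResult(NamedTuple):
    returncode: int
    stdout: str
    stderr: str


def run_command(args):
    """Run args (a list, no shell) and return exit code, stdout and stderr together."""
    completed = subprocess.run(args, capture_output=True, text=True)
    return CommandResult(completed.returncode, completed.stdout, completed.stderr)


result = run_command([sys.executable, "-c", "import sys; print('out'); sys.exit(3)"])
print(result.returncode, result.stdout.strip())  # 3 out
```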
I think people might be complaining about how hard it is to do the equivalent of procs = `ps -ef` in ruby, not to mention crazy ways you can combine it with string interpolation.
One loads the subprocess documentation and is confronted by a (on my monitor/fonts) ~11 page morass.
The little part you link to starts off simple, sure, but then says "Calling the program through the shell is usually not required.", and proceeds to present a "more realistic example", leading to an immediate "WTF?" for anyone just learning of subprocess.
People who have been doing things the obvious way for years will not take well to a new mechanism whose function and implications they don't understand, and whose use is immediately discouraged by its own documentation.
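One bridge between typing a shell command line and passing subprocess an argument list is `shlex.split`, which applies shell-like quoting rules. `shlex.quote` goes the other way when a shell string really is needed. The command and file name below are made up.

```python
import shlex

# A shell-style command line, as a user might type it.
command_line = 'grep -n "hello world" notes.txt'

# shlex.split applies shell-like quoting rules and produces the argument
# list that subprocess.run(args) expects when no shell is involved.
args = shlex.split(command_line)
print(args)  # ['grep', '-n', 'hello world', 'notes.txt']

# Going the other way: quote untrusted values before building a shell string.
filename = "my notes; rm -rf ~"
print("cat " + shlex.quote(filename))  # cat 'my notes; rm -rf ~'
```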
You're talking about what you think programmers should do. I'm talking about what they actually do.
99% of system() calls (regardless of language) will never be so adorned. It's used for quick hacks, rarely anything more. If the return code is checked at all, people only pay attention to whether it's something other than 0.
> Foreign Concepts ... I can't see how it's especially dangerous since just about everyone will use list instead.
I think it's potentially dangerous if/when Python newbies from other languages pick up old idioms and (ab)use them thinking they're "Pythonic".
I haven't seen it with a Python codebase, but I have worked on a Java codebase that was developed from scratch by C developers new to Java. They used lots of arrays (same sin), minimal Collections, and lots of for(i = 0; i < arr.length; i++) instead of foreach. Urgh.
So it isn't Python's fault, but I see how it could pose a danger.
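Here is the Python equivalent of that `for(i = 0; i < arr.length; i++)` habit, next to the idiomatic form. The names list is made up.

```python
names = ["ada", "grace", "linus"]

# C-style: index arithmetic that Python rarely needs.
upper_c_style = []
for i in range(len(names)):
    upper_c_style.append(f"{i}:{names[i].upper()}")

# Idiomatic: iterate directly, with enumerate() when the index is needed.
upper_pythonic = [f"{i}:{name.upper()}" for i, name in enumerate(names)]

print(upper_pythonic == upper_c_style)  # True
```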
Some of that latter part could just be older Java code; foreach wasn't introduced until Java had been out for 8 years already (2004). I'm under 30 and some of my older Java code uses that style of loop, because it was written in 2002-03 before foreach was available.
Not this time, this project was written from scratch in Java 6. :)
It's possible they learned Java at some earlier point in their lives, but my understanding was the team came from embedded/systems C backgrounds and I think they just did what felt right.
I completely agree about subprocess. One thing I like about Perl is that it's dead easy to make a system call. I wish subprocess had a shortcut that's easy to remember/use.
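The closest stdlib shortcuts in Python 3 are probably `subprocess.check_output` and `subprocess.getstatusoutput`. `check_output` behaves like backticks: it returns stdout and raises on failure. `getstatusoutput` moved into subprocess from Python 2's `commands` module. The sketch below assumes a POSIX shell for the second call.

```python
import subprocess
import sys

# Closest analogue of Perl/Ruby backticks: returns the command's stdout as a
# string and raises CalledProcessError if the exit status is non-zero.
greeting = subprocess.check_output([sys.executable, "-c", "print('hello')"], text=True)
print(greeting.strip())  # hello

# Shell string in, (exit code, combined stdout+stderr) out, with the
# trailing newline stripped. This goes through the shell, so never pass it
# untrusted input.
status, output = subprocess.getstatusoutput("echo hello; exit 2")
print(status, output)  # 2 hello
```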
Okay, here's one no one has disagreed with yet: copy.
If I write a Python class that is going to be used with someone else's code, then it is up to me to make sure that my class behaves reasonably -- in the context of Python's normal behavior. For example, if I define a bracket operator, then Python will make me an iterator; I need to be sure that iterator behaves well.
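Concretely, defining only `__getitem__` is enough for `iter()` to fall back to the old sequence protocol. Python calls it with 0, 1, 2, and so on, and stops at the first `IndexError`. `Countdown` is a made-up class. If its `__getitem__` never raised `IndexError`, iterating over it would never end, which is the "behaves well" obligation in question.

```python
class Countdown:
    """Defines only __getitem__; iter() falls back to the sequence protocol."""

    def __init__(self, start):
        self.start = start

    def __getitem__(self, index):
        if index >= self.start:
            raise IndexError(index)  # IndexError tells iteration to stop
        return self.start - index


print(list(Countdown(3)))  # [3, 2, 1]
```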
And similarly with "copy". Someone might want to write a generator that spits out instances of my class. If we apply list() to the output of that generator, then the result will not be correct unless the generated objects are actual separate objects. copy.copy is a good way to make things actually distinct, that might otherwise not be.
That means that instances of my class need to behave well if someone calls copy.copy on one of them.
I don't see that the issues here are much different from those involving regular old methods in a class. If I'm putting together a class, and I write a method that does something nasty, then code that uses that class and calls the method will have bad results. Similarly, copy.copy needs to avoid doing nasty things, too.
SO, I don't have a problem with expecting copy.copy to produce reasonable behavior, when I use it with a class written by someone else.
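Here is a sketch of the generator scenario described above, using a hypothetical `Cursor` class. A generator that mutates and re-yields one object produces a list of aliases once `list()` consumes it. Yielding `copy.copy(...)` instead gives independent snapshots.

```python
import copy


class Cursor:
    """Hypothetical mutable object reused by a generator."""

    def __init__(self):
        self.position = 0


def positions_shared(n):
    cursor = Cursor()
    for i in range(n):
        cursor.position = i
        yield cursor  # the same object every time


def positions_copied(n):
    cursor = Cursor()
    for i in range(n):
        cursor.position = i
        yield copy.copy(cursor)  # a distinct snapshot each time


print([c.position for c in list(positions_shared(3))])  # [2, 2, 2]
print([c.position for c in list(positions_copied(3))])  # [0, 1, 2]
```

Consuming the shared version lazily, one item at a time, would hide the bug, which is part of why it is easy to miss.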
In any case, this is a nice article. Thanks for posting.
I was surprised to see "copy" there. I'd agree that deepcopy (and its friends like pickle) could be tagged with a big "use with care" sign, but especially when one can implement __copy__ to control exactly how shallow copies are made, I don't think it's a problem at all.
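For example, a hypothetical `Document` class can define `__copy__` so that `copy.copy()` shares the immutable text but gives each copy its own edit-history list. The default shallow copy would share that list.

```python
import copy


class Document:
    """Copies share the (immutable) text but get their own edit history."""

    def __init__(self, text):
        self.text = text
        self.history = []  # mutable state that should not be shared

    def edit(self, new_text):
        self.history.append(self.text)
        self.text = new_text

    def __copy__(self):
        # copy.copy() calls this instead of the default attribute-by-attribute copy.
        clone = type(self).__new__(type(self))
        clone.text = self.text
        clone.history = list(self.history)  # new list, same (immutable) entries
        return clone


original = Document("v1")
original.edit("v2")
duplicate = copy.copy(original)
duplicate.edit("v3")
print(original.history, duplicate.history)  # ['v1'] ['v1', 'v2']
```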
I found myself nodding along with this until the bit about the array module. The array module is very well suited to doing real work. Maybe it seems arcane if you're coming from a web dev background, but given the space inefficiency of Python's lists, the array module is awfully useful in a lot of situations.
Not just space, but speed as well. I've often noted significant speedups in many operations by switching to array.array. In some cases, the speedups are comparable to or even slightly better than using numpy (which is much more heavy-weight).
I find numpy too heavyweight for a lot of stuff, and also not very good for communication/interchange with other programs. For a lot of my work, I often need to write data files to disk (for C programs to use) or mmap'ed files for concurrent access. In these situations, because array.array offers precise data alignment rules and tostring() and tofile() methods that are actually sane, I end up using it instead of numpy.
Of course, if I do significant computations in my python program itself, I just use numpy.
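Here is a sketch of that interchange pattern in Python 3. `tofile()` writes the raw machine values with no header, so a C program can read them as a plain `double[]`, assuming the same byte order and type size. `fromfile()` reads them back. Note that `tostring()` was renamed `tobytes()` in Python 3.2 and removed in 3.9. `byteswap()` helps when the other side uses a different endianness.

```python
import array
import os
import tempfile

# 'd' is a C double (8 bytes on mainstream platforms).
samples = array.array("d", [0.5, 1.5, 2.5])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "samples.bin")
    with open(path, "wb") as f:
        samples.tofile(f)  # raw machine values: no header, no framing
    size_on_disk = os.path.getsize(path)

    restored = array.array("d")
    with open(path, "rb") as f:
        restored.fromfile(f, len(samples))  # read exactly that many items

print(restored == samples, size_on_disk == len(samples) * samples.itemsize)  # True True
```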
The usual #python response to people using the array module was to instead consider NumPy (with its excellent arrays) or use normal lists and PyPy to make it fast.
> If you're debugging at the interactive prompt consider debugging with a small script instead.
I don't understand this suggestion. The ability to interact with my code via the REPL is a very important python feature for me, is the author suggesting I abandon that?
I think the author is just saying if you're debugging something in a REPL and you're finding you need to reload() frequently, then you might want to consider switching to debugging with a small script instead (and of course there's no reason the script couldn't just import the things you're debugging, and then dump you into a REPL).
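Here is one reading of that suggestion, sketched with a hypothetical `debug_session.py`. Put the setup in a script and run it with `python -i`, which executes the file and then leaves you at the `>>>` prompt with its names loaded. Re-running after an edit gives you fresh state, instead of calling `reload()` (`importlib.reload()` in Python 3) inside a long-lived session. `json` stands in for whatever module is being debugged.

```python
# debug_session.py (hypothetical name). Run it with:  python -i debug_session.py
# The -i flag runs the script, then leaves you at the >>> prompt with the
# module-level names below (raw, data, build_state) ready to inspect.
import json

# Stand-in for "the things you're debugging": in practice you would import
# your own module here instead of using json.
raw = '{"user": "ada", "roles": ["admin", "dev"]}'


def build_state(text):
    """Rebuild the state under investigation from scratch on every run."""
    return json.loads(text)


data = build_state(raw)
print("loaded keys:", sorted(data))
```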
This article / talk is a collection of relatively random python things of which only a few are unarguably good advice such as not using deprecated terms or checking for exact types. The rest is not a good basis for pythonic best practice that I would recommend.
With regards to "Ducks in a row", I don't think the answer is try..except. You either have a code smell, or you should use type-dispatch multimethods. (I think Guido's "Five-minute Multimethods in Python" was recently submitted here, but anyway: http://www.artima.com/weblogs/viewpost.jsp?thread=101605 )
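The stdlib's closest relative is `functools.singledispatch` (Python 3.4+; registering by type annotation needs 3.7+). Note the difference: it dispatches on the type of the first argument only, whereas the multimethods in Guido's post dispatch on all arguments. `describe` is a made-up example.

```python
from functools import singledispatch


@singledispatch
def describe(value):
    # Fallback for types with no registered implementation.
    return f"object of type {type(value).__name__}"


@describe.register
def _(value: int):
    return f"integer {value}"


@describe.register
def _(value: list):
    return f"list of {len(value)} items"


print(describe(3), "|", describe([1, 2]), "|", describe(2.5))
# integer 3 | list of 2 items | object of type float
```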
This is just a lot of nitpicking, parroting of common wisdom, plus uninformed opinions and misapplication of use cases.
To complain that SimpleHTTPServer is not fit for use as a production public web server is ridiculous. That was never its intended use, and most people understand this. Not every HTTP server is a public web server, however. It is very convenient to fire one up as a test server and to get around some of the browser security constraints on the file:// scheme. It is also useful for a lot of lightweight internal integration and configuration UIs. Deploying SimpleHTTPServer as a production web server is not a correct use. But neither is deploying a heavyweight HTTP server like Apache for testing or lightweight integration.
I agree JSON is preferable to pickle in most situations. But you have to understand the historical context. Pickle was the standard way to serialize objects well before JSON was popularized. Serialization is an important topic in general, and you will find that many other languages provide a similar facility. Also, JSON only addresses a subset of the serialization problem.
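Here is a short illustration of the "subset" point. pickle round-trips arbitrary Python objects such as sets and dates. `json.dumps` only handles dicts, lists, strings, numbers, booleans and `None`, so anything else needs an explicit conversion. The record below is made-up data. Since unpickling can execute code, pickle should only ever load trusted input.

```python
import json
import pickle
from datetime import date

record = {"tags": {"a", "b"}, "created": date(2012, 3, 1)}

# pickle round-trips arbitrary Python objects (only unpickle trusted data).
restored = pickle.loads(pickle.dumps(record))
print(restored == record)  # True

# JSON has no set or date type, so json.dumps refuses the record as-is.
try:
    json.dumps(record)
    json_ok = True
except TypeError as exc:
    json_ok = False
    print("json refused:", exc)

# Converting to JSON-friendly types is an explicit (and lossy) step.
as_json = json.dumps({"tags": sorted(record["tags"]), "created": record["created"].isoformat()})
print(as_json)  # {"tags": ["a", "b"], "created": "2012-03-01"}
```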
The array module is used not only for performance. It also supports tight packing of data. If you have a million integers, array packs them tightly as 1 million x 4 bytes. Storing them in a list has a much larger memory footprint, because they are stored as 1 million individual objects plus the list's own data structure. I happen to have written a proprietary database engine in Python where memory, disk and I/O footprint matter hugely. Go easy on berating it; there are use cases you are not aware of.
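Here is a rough way to see the packing difference, assuming CPython. The array stores one C `int` per element (`itemsize` is platform-dependent, commonly 4 bytes). The list stores a pointer per element plus a separate int object for each. The `sys.getsizeof` total is an estimate, not an exact accounting.

```python
import array
import sys

N = 1_000_000

packed = array.array("i", range(N))  # one C int per element
boxed = list(range(N))               # one pointer per element + one int object each

packed_bytes = packed.itemsize * len(packed)
# Estimate of the list's footprint: its pointer array plus every int object.
# (CPython caches small ints, so this slightly overcounts.)
boxed_bytes = sys.getsizeof(boxed) + sum(sys.getsizeof(n) for n in boxed)

print(f"itemsize={packed.itemsize}  array ~{packed_bytes:,} B  list ~{boxed_bytes:,} B")
```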
It is sad that the author does not understand namedtuple but chooses to mock it. I studied the code extensively and emulated it in my application. To appreciate namedtuple, think about the alternative to this implementation: try to write your own that serves the same function. Yes, you can do it more easily, and with more readable code, using __setattr__ and __getattr__. The only problem is that every time you access an attribute, you take a big overhead hit. The use of exec is not a hack but a deliberate decision to provide attribute access at a speed comparable to standard attribute access. I agree with the author: "do not do this at home". Leave the heavy lifting to the experts unless you really know Python inside out, like namedtuple's author, Raymond Hettinger.
`python -m SimpleHTTPServer` is a useful one-liner to serve a directory over HTTP. There's no good reason to have it in the standard library (mapping directories to HTML listings and HTTP paths to filesystem paths is a bit idiosyncratic), but now that it's there I'm happy to use it.
I've found it useful as a basis for handling HTTP requests in a test framework I'm working on. Not for testing HTTP, mind you, just as an adjunct for other tests. The 'one header line per packet' responses are a little idiosyncratic though!
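In Python 3 the module became `http.server`, and `python -m http.server` is the one-liner. Below is a sketch of using it as a throwaway test fixture, assuming Python 3.7+ for the `directory` argument and `ThreadingHTTPServer`. `serve_directory`, `fetch` and `QuietHandler` are hypothetical helper names. Binding to port 0 lets the OS pick a free port.

```python
import functools
import http.server
import pathlib
import tempfile
import threading
import urllib.request


class QuietHandler(http.server.SimpleHTTPRequestHandler):
    def log_message(self, format, *args):
        pass  # keep test output free of per-request log lines


def serve_directory(path):
    """Serve `path` over HTTP on a free localhost port, in a background thread."""
    handler = functools.partial(QuietHandler, directory=str(path))
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(server, name):
    host, port = server.server_address[:2]
    # An empty ProxyHandler keeps any http_proxy setting away from localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"http://{host}:{port}/{name}", timeout=5) as resp:
        return resp.read()


with tempfile.TemporaryDirectory() as tmp:
    pathlib.Path(tmp, "hello.txt").write_text("hello", encoding="utf-8")
    server = serve_directory(tmp)
    try:
        body = fetch(server, "hello.txt")
    finally:
        server.shutdown()
        server.server_close()

print(body)  # b'hello'
```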
> This leaves only one place where pickle makes sense — short lived data being passed between processes, just like what the multiprocessing module does.
> try treating the object as the first data type you expect, and catching the failure if that type wasn't that type, and then try the second. This allows users to create objects that are close enough to the types you expect and still use your code.
Is chaining try-catches as a trial-and-error way to figure out a type really the Right Way to do things these days?
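Here is roughly what the quoted advice looks like in code. The hypothetical `to_pairs` function first assumes a mapping and then falls back to an iterable of pairs. Whether this beats an explicit isinstance() check is exactly what the thread disputes; the sketch at least turns the final failure into one clear `TypeError`.

```python
def to_pairs(obj):
    """Accept a mapping or an iterable of (key, value) pairs."""
    try:
        items = obj.items()  # first guess: something mapping-like
    except AttributeError:
        items = obj          # second guess: an iterable of pairs
    try:
        return [(key, value) for key, value in items]
    except (TypeError, ValueError) as exc:
        raise TypeError(
            f"expected a mapping or pairs, got {type(obj).__name__}"
        ) from exc


print(to_pairs({"a": 1}), to_pairs([("b", 2)]))  # [('a', 1)] [('b', 2)]
```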
I agree with most of these, but a couple seem pretty specific to the use of Python as a language for serious, complex, fully packaged applications. Sorry, but Python has another use as well - to create quick utility scripts in a better language than bash or perl. For example...
* When discussing "__name__ == '__main__'", he complains about 13 non-alphanumeric characters, and then proposes a setuptools-based alternative that's 12 extra lines.
* He disses asyncore/asynchat, and then proposes the much more complex Twisted instead.
If your project is already big enough to have multiple files, already complex enough to require Twisted-level functionality (though even then Twisted sucks compared to Tornado or just about anything else), if it's already being packaged for general use via setuptools, then following these suggestions is almost free. OTOH, they're way overkill for other situations. About three years ago some colleagues and I wrote an asynchat-based server to coordinate certain administrative actions on a 1000-node system. It was stable, it performed as well as it needed to, and - even after working around some "infelicities" in asynchat - it was still only half as much code as would have been necessary in Perverted.
I don't think I'll be signing up for any idioms or coding standards that are based on an assumption of using Python as a direct replacement for Java. Neither should anyone else.
His criticism of asyncore/asynchat and simplehttpserver is clear: try to do anything nontrivial and immediately you hit barriers. Twisted, albeit a monstrosity, can get around most of those deficiencies.
When I tried using async{hat,ore} I came across a few pitfalls:
- the async* library is not thread safe: loop and poll* both use a global socket_map (read the code to see what I'm saying -- on my Mac it's at /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/asyncore.py)
- There's no way to integrate timers into the loop (there are constructs like timerfd in Linux to get around this deficiency in the Linux C API)
That being said, IRC bots (more generally, simple chat applications) are the type of application perfectly suited for asyncore, and I'm pretty sure someone has written a tutorial on it using an IRC bot as the goal.
What is meant by "timers"? Something like "execute this function in 5 minutes"?
My bot [0] has an idle function, which is called every 30 seconds and does background jobs like checking git repos for new commits. The 30 seconds are the timeout of the select/poll call asyncore does internally, which can of course be changed. Based on this the bot got a cron-like timing infrastructure.
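Here is the same idea, sketched with the `selectors` module, since asyncore was removed in Python 3.12. Instead of using a fixed 30-second timeout, the loop blocks in `select()` only until the next timer is due. `TimerLoop`, `call_later`, `run_once` and `every` are hypothetical names. They are not asyncore API, and this is not asyncio's event loop.

```python
import heapq
import selectors
import socket
import time


class TimerLoop:
    """Minimal select-based event loop with one-shot timers (a sketch)."""

    def __init__(self):
        self.sel = selectors.DefaultSelector()
        self._timers = []  # heap of (deadline, sequence number, callback)
        self._seq = 0      # tie-breaker so callbacks are never compared

    def call_later(self, delay, callback):
        self._seq += 1
        heapq.heappush(self._timers, (time.monotonic() + delay, self._seq, callback))

    def run_once(self, max_wait=30.0):
        # Wait for I/O, but no longer than until the earliest timer is due.
        timeout = max_wait
        if self._timers:
            timeout = min(max_wait, max(0.0, self._timers[0][0] - time.monotonic()))
        if self.sel.get_map():
            events = self.sel.select(timeout)
        else:
            time.sleep(timeout)  # nothing registered; avoid an empty select()
            events = []
        for key, _mask in events:
            key.data(key.fileobj)  # the I/O callback is stored as selector data
        # Run every timer whose deadline has passed.
        now = time.monotonic()
        while self._timers and self._timers[0][0] <= now:
            _, _, callback = heapq.heappop(self._timers)
            callback()


def every(loop, interval, job):
    """Re-arm a timer after each run, e.g. for a periodic 'idle' job."""
    def tick():
        job()
        loop.call_later(interval, tick)
    loop.call_later(interval, tick)


# Demo: a socket pair stands in for a network connection.
reader, writer = socket.socketpair()
received, ticks = [], []
loop = TimerLoop()
loop.sel.register(reader, selectors.EVENT_READ, lambda sock: received.append(sock.recv(100)))
every(loop, 0.01, lambda: ticks.append(time.monotonic()))
writer.send(b"ping")
for _ in range(200):
    loop.run_once(max_wait=0.1)
    if received and len(ticks) >= 3:
        break
loop.sel.unregister(reader)
reader.close()
writer.close()
```

`every(loop, 30, check_repos)` would then give a repeating 30-second job like the bot's idle function, with `check_repos` being whatever background task you have.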
It's a legacy of the import system and I'd rather see it go.
The non-alphanumeric argument is pretty weak though.
I don't know enough about asyncore/chat but SimpleHTTPServer is quite spot on. It's fine as a development tool, but really should be avoided in production environments.
(btw, Twisted is awesome. A comparison with Tornado only compares a small part of Twisted. Sure it was developed pre-PEP8 but it's still solid kit. Nothing quite like it on the Python market, afaik. Also the docs have gotten a lot better in recent years.)
> I don't think I'll be signing up for any idioms or coding standards that are based on an assumption of using Python as a direct replacement for Java. Neither should anyone else.
I am not sure, but if I'm using Java or C++ or another equivalent, say even Clojure, then the glue language for the rest is going to be Perl/Bash, not Python. The reason is that Perl does scripting far better than Python.
Python is more fashionable today courtesy of Django, Twisted and other frameworks.
And I think that's the problem here. Python is trying to be half serious in both the Java and Perl worlds. Java and Perl have nearly orthogonal goals. If you try to do half the work in both worlds, you end up pleasing neither. Try to be in either one camp or the other.
Python certainly imposes greater overhead if all you are doing is writing a short shell script of a few lines in a few minutes. I feel that Perl is closer to bash than Python is (whether that's good or bad just depends).
But if you want to be able to go back to that code later, or reuse it in a bigger system, or have things like robust error handling and testing, you are out of the domain of short shell scripts anyway. Now you have a choice of whether to write nice Perl code, which takes a little overhead and effort and thought more than just using Perl, and writing nice Python, which is pretty directly what Python's language design is about. I figure it's a matter of what you know best and what you like and what libraries you need.
I reckon the reason you won't use Python is that you are relatively uncomfortable with Python for whatever reason. That's legitimate, but it doesn't mean that Python won't be a great solution for someone else doing the same task.
Python is a 20-year-old language from the Unix/C world with mostly boring Algol-family syntax, so I don't find it very fashionable at all, and I think it is just weird to accuse Python of "trying to be half serious" in anything except being Python. If people prefer to write big apps in Python or Rakudo Perl, and those languages facilitate writing big apps, it doesn't really have anything directly to do with Java (except that all the involved languages have some shared heritage, concepts of objects, etc.) The same is not true of C# or Scala, which have really deep and undeniable debts to Java, nothing like the very general family similarity you see between Java and 90s interpreted languages like Perl and Python.
The problem with treating Python as Java is that Python is NOT Java (and has never attempted to be Java) - so the results of treating it like Java are often not as a Java programmer expects, giving the impression that Python is a bad Java. But it isn't supposed to be a Java at all; this is just Doing It Wrong. The same is undoubtedly true with other language combinations - it isn't fair to judge Ruby harshly for responding poorly to C idioms, for example, because it just ain't C.