Hacker News new | past | comments | ask | show | jobs | submit login
Widely useful Python utilities by Peter Norvig (cs.berkeley.edu)
95 points by madars on Sept 14, 2009 | hide | past | favorite | 22 comments



Newer version of these can be found on AIMA-Python site, which is an excellent read alone: http://code.google.com/p/aima-python/source/browse/trunk/uti...


Note that more of them have been merged to recent Pythons than just the top: the `dict` constructor can now take kwargs (and a first dict argument, which means you can create an extended copy of dict d with `dict(d, foo='bar', baz='buzz')`), there's a `collections.defaultdict` in 2.5, `namedtuple` in 2.6 is pretty similar to his Struct (though not exactly, `namedtuple` is a type builder, Norvig's `struct` is basically a dict with args instead of items).

`every` and `some` can be emulated by composing `map` (or a listcomp) with the builtin `any` and `all` (which, for some reason, don't take an optional predicate)


`every` and `some` can be emulated by composing `map` (or a listcomp) with the builtin `any` and `all` (which, for some reason, don't take an optional predicate)

No they can't - this version wouldn't short circuit evaluation, which is completely the wrong behavior. Map would have to apply the predicate to every element of the list before any/all kicked in.

As a trivial example, consider I had a million item list and want to find out if every element is prime. If 8 is the first element, 'every' would return false after looking at the first item - while your composition would apply the prime predicate to all million elements (which would be ridiculously slow ...) before even starting to apply 'all' (which would short circuit, after seeing the first false).


If you have enough elements to desire short circuit evaluation you can, as `masklinn` mentioned, use a list comprehension. Specifically a generator list comprehension.

http://www.python.org/dev/peps/pep-0289/

    big_list = make_big_list()
    if any( not prime( a ) for a in big_list ) :
      do_thing()
When you surround the list comprehension syntax with parenthesis instead of square brackets, either as a lone argument to a function or independently, you create a generator that will iterate once over the given data.

This allows for functions accepting sequences to short circuit without completing the function mapping.


> No they can't - this version wouldn't short circuit evaluation, which is completely the wrong behavior. Map would have to apply the predicate to every element of the list before any/all kicked in.

Just use the lazy map itertools.imap (or Python 3, where `map` is lazy) or replace listcomps by gencomps, any & all take any iterable if you're worried about that. Stop creating problems where there is none.


"(which, for some reason, don't take an optional predicate)"

This kills me pretty much every time I use those functions. It seems like 9 times out of 10, I need the ability to pass in a predicate function.

This always struck me as a major oversight... does anyone know if this design decision was made for a particular reason?


You want to be able to say all(items, lambda x: x.is_cool()) instead of all(item.is_cool() for item in items)? The second one seems clearer to me and isn't substantially longer.


Good point. I guess usually crops up when I've already got a predicate function handy and I want to say `all(items, f)`. I always just expect that to work for some reason. Using a generator expression works just as well, though, and is probably clearer in the long run.

Besides, I guess if `any` had a signature like

    def any(xs, pred=lambda x: bool(x))
or something, that would reverse the signatures of map, filter, reduce, etc.

On that note, I always want the arguments of filter and reduce, but not those of map, to be reversed for some reason.


If you already have a predicate function, it is not hard to say

  all(filter(f, items))
Composing calls like this looks cleaner than incorporating into the function interface all list manipulations that the implementor expects you to want.


Said before, but any(pred(x) for x in xs) is too complex?


Quoting myself, now: "Using a generator expression works just as well, though, and is probably clearer in the long run."

It's just that my first instinct is always to use all(xs, pred).


A useful utility I've rarely seen included in such a list is inverse_image(collection, function) which computes the inverse image of the function restricted to the collection.

A simple example:

    students_by_grade = inverse_image(students, lambda s: s.grade)
Then students_by_gender["A"] contains a list of all the A students (and so on).


Agreed - sometimes known as "group by".


On a semi-related note, the code for AIMA taught me a lot about Python, just from reading through it at trying to understand.

http://code.google.com/p/aima-python/source/browse/#svn/trun...


http://urchin.earth.li/~twic/twic.py

excellent. and great comment: """This is Tom's personal Python subroutine library. You are not expected to understand this.

Warning: this file contains HIGH MAGIC and DEEP VOODOO, as well as quite a bit of QUESTIONABLE CODE and very few COMMENTS. Use with caution.

ha!


so....what's excellent about it.... or do you mean you find it useful to use?

i'm personally very cautious about introducing "magic" and "voodoo" into my repositories. i many times have to tweak libraries (add features, fix bugs), so it's worth them making sense without me having to refactor them first (i did this with qunit, tho...it's hard to generalize, eh?).

more importantly, it's important to me that new developers ramp up fast. there's something about collaboration that many lone programmers don't grok. similarly, the code must make sense to me next month. extensibility and maintainability are almost inversely related to voodoo and magic.

good abstraction, handling complexity through elegance, now these are worth having.

it can be quite hard separating good programming from bad. it is even harder when the work itself is complex in nature, although that is when good programming really shines.


These are pretty good, some common sense or helper like things that I have implemented here or there at one time or another. Granted, some default programming styles may alleviate the need for some of these things, or ultimately you may want something different in your specific application. Good to see!


These are useful! It always helps to take the monotony out of writing code (especially with methods that check boundary conditions and overflows) and spend more time on the actual design.


somebody want to explain this?

infinity = 1.0e400


the largest representable float in Python is around 10^300 something or so. Anything bigger is considered infinite. Peter Norvig wanted a name for the float bit pattern that represents infinity.


float("inf")?


    >>> 1.0e400 == float("inf")
    True
I don't know, if there is a difference. Maybe the interpreter can evaluate 1.0e400 a little faster?




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: