One of my favorites: >>> print "* "* 50 to quickly print a separator on my termi...

Animats · on Nov 27, 2014

That's cute, but the result of a bad design decision.

Python overloads "+" as concatenate for strings. This also applies to lists. So

    [1,2,3] + [4,5,6]   yields  [1,2,3,4,5,6]

This is cute, but not what you want for numerical work.

Then, viewing multiplication as repeated addition, Python gives us

    [1,2,3]*4  yields [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]

This is rarely what was wanted.

Then there's numpy, which has its own array type, needed because Python's array math is slow. Numpy arrays have different semantics - add and multiply perform the usual numeric operations. You can mix numpy arrays and built-in lists with "+" and "*". Mixed expressions are evaluated using numpy's semantics. Since Python doesn't check argument types before a function runs, it's possible to pass the wrong kind of numeric array to a function and have it treated with the wrong semantics.
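To make that hazard concrete, here is a minimal sketch. `Vec` is a hypothetical stand-in for a numpy-style array (it is not numpy's API), and `double` is a made-up function; the same call quietly does two different things depending on which type it receives.

```python
class Vec:
    """Hypothetical stand-in for a numeric array with element-wise operators."""

    def __init__(self, items):
        self.items = list(items)

    def __mul__(self, k):
        # Element-wise scaling, unlike list repetition.
        return Vec(x * k for x in self.items)

    def __add__(self, other):
        # Element-wise addition, unlike list concatenation.
        return Vec(a + b for a, b in zip(self.items, other.items))

    def __eq__(self, other):
        return isinstance(other, Vec) and self.items == other.items

    def __repr__(self):
        return f"Vec({self.items})"


def double(xs):
    # No type check: the meaning of "*" depends on what the caller passed in.
    return xs * 2


print(double([1, 2, 3]))       # list semantics: repetition
print(double(Vec([1, 2, 3])))  # array-like semantics: scaling
```

Neither call raises, which is the point: the mix-up only shows up later as a wrong result.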

Moral: when designing your language, using "+" for concatenation is tempting, but generalizing that concept comes back to bite you.

dalke · on Nov 27, 2014

You went from "not what you want for numerical work" to "generalizing that concept comes back to bite you". I don't think you can make that step.

I do non-numeric scientific computing. (Meaning, I touch numpy about once a year.) My own code does things like

    [0] * N  # could be replaced with something like numpy.zeros()
    
    [QUERIES_FPS] * 501
    
    to_trans = [None]*256  # constructing a 256 byte translation table
        # (I could use zeros(), but used None to force an exception
        # if I missed a cell)
    
    self.assertEqual(self._get_records(simple.record*2),
                     [(simple.title, simple.record)]*2)
        # I parse a string containing two records and test I should
        # be able to get the (id, record) for each one
    
    ["--queries", SIMPLE_FPS, "-k", "3", "--threshold",
       "0.8"] + self.extra_args  # Construct a command-line from 2 lists

These idioms also exist in the standard library, like:

    webbrowser.py:  cmdline = [self.name] + [arg.replace("%s", url)
    sre_compile.py: table = [-1] + ([0]*len(prefix))
    ntpath.py: rel_list = [pardir] * (len(start_list)-i) + path_list[i:]
    traceback.py: list = ['Traceback (most recent call last):\n']
                  list = list + format_tb(tb, limit)

So while I think you are correct, in that "+" causes confusion across multiple domains with different meanings for "+", I think the moral is that operator overloading is intrinsically confusing and should be avoided for all but the clearest of use cases.

There is no best generalization to "+". For example, if you pick the vector math meaning, then regular Python would have that:

    ["A", "B"] + ["C", "D"] == ["AC", "BD"]

which has its own logic, but is likely not what most people who haven't done vector math expect.
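For comparison, a small sketch of the two readings side by side. The pairwise version is spelled with `zip` here because that is how you would write it in today's Python; the variable names are only for illustration.

```python
left = ["A", "B"]
right = ["C", "D"]

concatenated = left + right                      # what Python actually does
pairwise = [a + b for a, b in zip(left, right)]  # the "vector math" reading

print(concatenated)
print(pairwise)
```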

throwaway283719 · on Nov 27, 2014

You think it's a "bad" design decision because you think that a Python list should represent a vector of numbers (not even an array - a mathematical vector).

But a list is much more than that - conceptually it's any ordered collection of things, and not necessarily even the same type of thing. Overloading `+` to mean concatenation and `*` to mean replication means that the operators can work with any kind of list, not just lists that are mathematical vectors.

If you do want a mathematical vector, you should use a numpy array - not only are you making it clear that you have a vector of numbers, but your operations will be more efficient (because using a numpy array guarantees that the elements in it are all of the same type, so you don't have to do type dispatching on every element).

wodenokoto · on Nov 27, 2014

Then what would you have

    [1,2,'q',[1,('a',2)]] + 4

yield? The reason numpy lets you do math operations on each element in an array is that you can safely assume each element is a number. You can assume absolutely nothing about the types of the elements in a list.

Animats · on Nov 27, 2014

"TypeError: cannot add 'str' and 'int' objects."

Just because you can define semantics for nonsense doesn't mean you should.

dalke · on Nov 27, 2014

I'll modify the example slightly to something which doesn't have a type error:

    [1,2,'q',[1,('a',2)]] * 4

With element-by-element operations, that would be

    [1 * 4,2 * 4,'q' * 4,[1,('a',2)] * 4]

giving

    [4, 8, 'qqqq', [1 * 4, ('a', 2) * 4]]

applying that again, assuming that tuple * scalar is also applied element-wise gives:

    [4, 8, 'qqqq', [4, ('a' * 4, 2 * 4)]]

and ends up with

    [4, 8, 'qqqq', [4, ('aaaa', 8)]]

I can't think of any case where that's meaningful.
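The expansion above can be reproduced with a short sketch. `elementwise_mul` is a hypothetical function written for this example; it assumes element-wise "*" recurses into both lists and tuples, as in the walk-through.

```python
def elementwise_mul(value, k):
    """Hypothetical element-wise '*': recurse into lists and tuples."""
    if isinstance(value, list):
        return [elementwise_mul(v, k) for v in value]
    if isinstance(value, tuple):
        return tuple(elementwise_mul(v, k) for v in value)
    return value * k  # leaves: ints scale, strings repeat


data = [1, 2, 'q', [1, ('a', 2)]]
print(elementwise_mul(data, 4))
print(len(data * 4))  # the built-in meaning: the 4 items repeated 4 times
```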

Also, what should this do:

    x = [1]
    x.append(x)
    print(x + x)
    print(x * 4)

? Currently these print:

    [1, [1, [...]], 1, [1, [...]]]
    [1, [1, [...]], 1, [1, [...]], 1, [1, [...]], 1, [1, [...]]]

because the print function knows how to handle recursive definitions. Do all of the element-wise operations need to handle cyclical cases like this? I think numpy can get away with not worrying about this precisely because, as wodenokoto pointed out, it can assume a flat structure.
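As a rough illustration of the cyclical case, here is a naive recursive element-wise multiply with no cycle detection. `naive_elementwise_mul` is hypothetical, written only for this example; on a self-containing list it recurses until the interpreter gives up, while the built-in "+" never looks inside the elements.

```python
def naive_elementwise_mul(value, k):
    """Hypothetical element-wise '*' with no cycle detection."""
    if isinstance(value, list):
        return [naive_elementwise_mul(v, k) for v in value]
    return value * k


x = [1]
x.append(x)  # x now contains itself

try:
    naive_elementwise_mul(x, 4)
    outcome = "finished"
except RecursionError:
    outcome = "RecursionError"

print(outcome)
print(x + x)  # concatenation only copies references, so the cycle is harmless
```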

ubernostrum · on Nov 27, 2014

You apparently want lst * intval to be equivalent to map(lambda n: n * intval, lst) or [n * intval for n in lst]. Since Python has a convenient built-in and even syntactic sugar for doing what you want, why not let the operator overloading handle a different case?

(also, your issue is not with "nonsense semantics", it's with "my idea of how this operator should've been overloaded is different from their idea", and perhaps is even a beef with the idea of operator overloading in general, though if you like numpy I think you wouldn't like losing operator overloading)
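A quick sketch of the three spellings, using `lst` and `intval` as in the comment. Note that in Python 3 `map` returns a lazy iterator, so it is wrapped in `list()` here to compare results.

```python
lst = [1, 2, 3]
intval = 4

via_map = list(map(lambda n: n * intval, lst))  # map() is lazy in Python 3
via_comprehension = [n * intval for n in lst]
via_operator = lst * intval                     # repetition, not scaling

print(via_map)
print(via_comprehension)
print(via_operator)
```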

nikki93 · on Nov 27, 2014

This is because you're assuming 'array' is supposed to mean 'vector' (as in the linear algebraic vector). It isn't, and it's a list -- it's meant to be a container. In this case, add meaning concatenate and multiplication meaning self-concatenate multiple times makes sense.

philsnow · on Nov 27, 2014

Even worse IMHO is the semantics of strings being implicitly iterable. Often it ends up that you're intending to iterate over something

    for item in orders:
        do_something_with(item)

Suppose the iterable (call it `foo`; it is `orders` in the snippet above) is usually `[Order(...), Order(...), ...]` but, due to a bug elsewhere, is sometimes "some string". Then you get a mysterious exception somewhere down in `do_something_with` or one of its callees at run time, all because the above snippet calls do_something_with('s'), do_something_with('o'), etc.

In my experience, this behavior is so seldom what is wanted that it should be removable (with a from __future__ style declaration) or just off by default.
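Short of a language change, one possible workaround is an explicit guard at the boundary. This is only a sketch: `process` and the stub `do_something_with` are hypothetical, and rejecting `str`/`bytes` up front is an assumed convention, not something Python does for you.

```python
def do_something_with(item):
    # Hypothetical handler: just reports what it was given.
    return f"processed {item!r}"


def process(orders):
    # Assumed guard: reject a lone string early, since iterating over it
    # would yield one-character strings instead of orders.
    if isinstance(orders, (str, bytes)):
        raise TypeError(
            f"expected a collection of orders, got {type(orders).__name__}"
        )
    return [do_something_with(item) for item in orders]


# Unguarded loop: one call per character of the string.
print([do_something_with(item) for item in "some"])

# Guarded version with a real collection.
print(process(["order-1", "order-2"]))
```

With the guard, the failure happens where the wrong value enters rather than somewhere down in the callees.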

dalke · on Nov 27, 2014

I use "for c in s", to read characters in a string, pretty often. Here's an example from Python3.2's quopri.py:

    def unhex(s):
        """Get the integer value of a hexadecimal number."""
        bits = 0
        for c in s:
            c = bytes((c,))
            if b'0' <= c <= b'9':
                i = ord('0')
            elif b'a' <= c <= b'f':
                i = ord('a')-10
            elif b'A' <= c <= b'F':
                i = ord(b'A')-10
            else:
                assert False, "non-hex digit "+repr(c)
            bits = bits*16 + (ord(c) - i)
        return bits

Here's another example of iterating over characters in a string, from pydoc.py:

    if any((0xD800 <= ord(ch) <= 0xDFFF) for ch in name):

It seems like a pretty heavy-weight prohibition for little gain. After all, you could pass foo = open("/etc/passwd") and still end up with a large gap between making the bug and its consequences.

unwind · on Nov 27, 2014

Shouldn't unhex() just be int(s, 16)?

Not sure what it adds, but I don't quite understand it yet and perhaps there's something magic in the context of MIME quoted printable that I'm missing.

dalke · on Nov 27, 2014

That is an excellent point!

Based on my reading, there's nothing magic. The context is:

    elif i+2 < n and ishex(line[i+1:i+2]) and ishex(line[i+2:i+3]):
        new = new + bytes((unhex(line[i+1:i+3]),)); i = i+3

I tweaked it to

        new = new + bytes((int(line[i+1:i+3], 16),)); i = i+3

and the self-tests still pass. (I also changed the 16 to 15 to double-check that the tests were actually exercising that code.)

It's not part of the public API, so it looks like it can simply be removed.

Do you want to file the bug report? Or perhaps it's best to update http://bugs.python.org/issue21869 ("Clean up quopri, correct method names encodestring and decodestring")?
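A quick way to check the equivalence outside the test suite, assuming the input is always exactly two hex digits (which the `ishex` guards in the context above ensure). `unhex` below is copied from the snippet quoted earlier in the thread. This is not a proof for arbitrary input: `int` is more permissive in general (it accepts a leading sign, for example).

```python
import itertools


def unhex(s):
    """Get the integer value of a hexadecimal number."""
    bits = 0
    for c in s:
        c = bytes((c,))
        if b'0' <= c <= b'9':
            i = ord('0')
        elif b'a' <= c <= b'f':
            i = ord('a')-10
        elif b'A' <= c <= b'F':
            i = ord(b'A')-10
        else:
            assert False, "non-hex digit "+repr(c)
        bits = bits*16 + (ord(c) - i)
    return bits


HEX_DIGITS = b"0123456789abcdefABCDEF"

# Every two-character combination of hex digits, as bytes objects.
pairs = [bytes(p) for p in itertools.product(HEX_DIGITS, repeat=2)]
mismatches = [p for p in pairs if unhex(p) != int(p, 16)]

print(len(pairs), "pairs checked,", len(mismatches), "mismatches")
```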

yoha · on Nov 27, 2014

> It's not part of the public API, so it looks like it can simply be removed.

https://docs.python.org/3/library/functions.html#int

So that is actually standard. Maybe I just don't know what you mean by public API though.

dalke · on Nov 27, 2014

"it" == "unhex", not "int"

yoha · on Nov 27, 2014

Oh, right! I had not read the sentence correctly.

ajuc · on Nov 27, 2014

* on lists can also mean elementwise multiplication, dot or cross product if you treat them as vectors. There's no way to choose the objectively best meaning. I'd even argue that vector math isn't the most popular use for lists in Python, not because of + and * semantics, but because of performance.

So it was a good design decision not to bother with math semantics for a general-use data structure.

And besides, Python has a nice general syntax for elementwise operations if you don't care about performance:

    [x*y for (x,y) in zip(xs,ys)]
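The same `zip` pattern covers the dot product too, which shows how each "vector" meaning of * gets its own explicit spelling. A small sketch with made-up values for `xs` and `ys`:

```python
xs = [1, 2, 3]
ys = [4, 5, 6]

elementwise = [x * y for (x, y) in zip(xs, ys)]  # element-wise product
dot = sum(x * y for (x, y) in zip(xs, ys))       # dot product

print(elementwise)
print(dot)
```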

I agree it would be better not to implement + for lists at all.

_optl · on Nov 27, 2014

"This is rarely what was wanted."

I don't know what else you would have expected...

TheEzEzz · on Nov 27, 2014

[4,8,12]?

ubernostrum · on Nov 27, 2014

If you want to perform an operation on each item of an iterable, do that :)

    [n * 4 for n in [1, 2, 3]]

or

    map(lambda n: n * 4, [1, 2, 3])

_optl · on Nov 27, 2014

With that logic, they should have expected [1,2,3] + [1,2,3] == [2,4,6]

euphemize · on Nov 27, 2014

I'm very aware of what you mentioned but...all I "wanted" in this case is a visual separator in my terminal when I'm working with lots of output. I don't care whether each "* " refers to the same object, I just want a line :)

With that being said, if I want to merge two lists and apply an operation on each, I don't see what's the issue with:

    In [1]: a = [1,2,3]
    In [2]: b = [5,6,7]
    In [3]: c = a+b
    In [4]: c
    Out[4]: [1, 2, 3, 5, 6, 7]
    In [5]: d = [x*4 for x in c]
    In [6]: d
    Out[6]: [4, 8, 12, 20, 24, 28]
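On the aside about whether each repeated item refers to the same object: it doesn't matter for strings or numbers, but it does for mutable items. A minimal sketch (the names `grid` and `independent` are just for illustration):

```python
row = [0, 0]
grid = [row] * 3  # three references to the same inner list
grid[0][0] = 99   # visible through all three references

independent = [[0, 0] for _ in range(3)]  # three separate inner lists
independent[0][0] = 99                    # only the first one changes

print(grid)
print(independent)
```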

sullyj3 · on Nov 27, 2014

I really like haskell's "++" for list concatenation. Makes a lot of sense.

willvarfar · on Nov 27, 2014

Although `++` is associated with increment by anyone coming to Python from the C-family languages.

It's tricky; if you want to do vectors, use numpy.

dllthomas · on Nov 27, 2014

Haskell also uses <> for combining any monoid, but of course in Python that was once not-equal... Maybe a dot? It's string concatenation in Perl, and function composition in Haskell. Interestingly, both of those are monoids...

horb · on Nov 27, 2014

Multiplication _is_ repeated addition.
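For Python's sequences that reading holds literally, with "+" meaning concatenation. A small sketch using `functools.reduce` to spell out the repeated addition:

```python
from functools import reduce
import operator

base = [1, 2, 3]

# base + base + base + base, written as a fold over four copies.
repeated_sum = reduce(operator.add, [base] * 4)

print(repeated_sum == base * 4)
print("ab" + "ab" + "ab" == "ab" * 3)
```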