This is awesome in terms of avoiding all of the weird things when a person typed pip rather than pip3 and the module didn't seem to get installed anywhere. That said, watching Perl try to kill perl5 with perl6 (unsuccessfully) and Python try to kill Python 2 with Python 3 (more successfully), it struck me how ridiculous it is that open-source languages have to put up with this. Clearly "major" numbers are insufficient; the only real answer is to rename the entire freaking language when you make incompatible changes to it.
I think a lot of important lessons got learned in both cases. Clearly perl6 should have had a different name. But I think python2->python3 could've been much less painful if they'd known to prioritize single-codebase compatibility from the very beginning. I think you can see that lesson applied with e.g. Rust editions, which as far as I can tell have been a complete success.
Hindsight might very well be easy in this case, but I cannot help finding the Python developers ridiculously naïve in how they handled this, and foolish to have even begun it in the first place.
The minor improvements of Python 3 did not warrant breaking backwards compatibility, and most could have been handled with opt-in directives in a way that would not have broken it.
The very swarms of users that chanted "just upgrade", as if that did not incur a significant cost, also seemed ridiculously naïve to me, not understanding the real cost that large projects face in having to rewrite very extensive codebases and deal with the potential regressions that that might involve.
Everything about the switch, from its very conception to its execution, was handled in a veritably disastrous way by the team, which really did not seem to appreciate even a fraction of what is obviously involved with projects that have millions of lines of code and would of course rather not have to rewrite it all.
This is why many projects such as Linux, Windows, Rust, Cobol, Fortran, C, and C++ take backwards compatibility quite seriously. Serious enterprises do not like to invest in something if it means that 10 years later they will have to rewrite their entire codebase again.
Even on my home computer, I simply do not have the time to rewrite the many Python 2 scripts that I have written over the years that run my computer. It is cumbersome enough that once in a while part of my desktop stops functioning because my distribution removed a Python 2 library which I had relied upon as a system library and that I now have to install as a user library; hitherto that has been quite easily fixed.
I have migrated tons of Python codebases from 2 to 3, I guess starting with the release of Python 3.4, which was when Python 3 reached a kind of production readiness (and had also gained enough trust; IIRC it had also reestablished compatibility in some parts).
I think the incompatibilities between Python 2 and Python 3 fell into two categories:
1. Trivial and totally avoidable API changes by the Python developers (like `iteritems()` being renamed to `items()`, with the Python 2 `items()` behaviour removed from the language). The bet on the python-dev side was that `2to3` would take care of these, and there they totally underestimated that libraries couldn't and wouldn't just make a Python 3 migration in lockstep with the primary Python release. (A sketch of the shim this forced on libraries follows below the list.)
2. Change to unicode-strings by default with a clear distinction between unicode-strings and byte-buffers for all data encoded in any other fashion.
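To make change no. 1 concrete, here is the kind of compatibility shim that single-codebase libraries ended up carrying (a sketch; the `six` library shipped an equivalent `six.iteritems`):

    def iteritems(d):
        """Iterate over (key, value) pairs on both Python 2 and 3."""
        try:
            return d.iteritems()    # Python 2: lazy iterator
        except AttributeError:
            return iter(d.items())  # Python 3: items() is already a lazy view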
Most people on Python 3 nowadays won't actually know how beneficial change no. 2 was overall for the health of the Python ecosystem and the stability of their codebases. But it was also the tricky part of the migration for codebases that did a lot of string / file-content plumbing (mercurial being a prominent example).
Change no. 1 was a PITA and a lot of it could have been avoided, but it wasn't a huge problem. The huge problem for the ecosystem was the unicode change, but I don't think anyone questions its usefulness (except maybe Armin Ronacher, who is the most prominent voice with a dislike for it).
Well, a breaking change, but harmless in most instances; the other way around would be more harmful. Also, that was, I think, a `from __future__ import` option, so you could enable it while on Python 2.7 on a per-file basis.
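If memory serves, the per-file opt-in looked like this on 2.7:

    # at the top of a Python 2.7 module:
    from __future__ import unicode_literals

    s = "abc"    # now a unicode object, as in Python 3
    b = b"abc"   # byte strings must be marked explicitly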
> The minor improvements of Python 3 did not warrant breaking backwards compatibility, and most could have been handled with opt-in directives in a way that would not have broken it.
There were large changes to fundamental parts of the type system, as well as to core types. Pretending this isn't the case betrays ignorance, or at the very least cherry-picking.
How would you have handled the string/bytes split in a way that’s backwards compatible? Or the removal of old-style classes?
Let's not pretend that py3's string changes weren't fundamentally wrong and didn't create years of issues from trying to decode, as utf-8, things that could properly be arbitrary sacks of bytes.
So my answer is that it was a deeply misconceived change that shouldn't have been made at all, let alone been taken as the cornerstone of a "necessary" break in backward compatibility.
The string changes were both necessary and correct. There is a difference between bytes and strings, and treating them as the same led to so many issues. Thank god I’ve not seen a UnicodeDecodeError in decades.
You're not making an argument about backward compatibility here, you're making a strong claim that representing text as a sequence of Unicode code points is fundamentally wrong. I have never heard anyone make this point before, and I am inclined to disagree, but I'm curious what your reasoning is for it.
Indeed, representing text as a sequence of Unicode code points is fundamentally wrong.
There are no operations on sequences of Unicode code points that are more correct than an analogous operation on bytes.
(Everyone's favourite example, length, actually becomes less correct—a byte array's length at least corresponds to the amount of space one might have to allocate for it in a particular encoding. A length in codepoints is absolutely meaningless both technically and linguistically. And this is, for what little it's worth, close to the only operation you can do on a string without imposing additional restrictions about its context.)
Uppercasing/lowercasing cannot be done on Unicode code points, because that fails to handle things like "ﬁ" -> "FI", where the uppercased version does not consist of the same number of Unicode code points. Slicing and splitting cannot be done on Unicode code points because they may separate a character from a subsequent combining character. "startswith" cannot be done on Unicode code points because some distinct code points need to be treated as equivalent. These are pretty much the same problems you have when you perform those same operations on bytes. You might encounter those problems in fewer cases when you perform operations on code points rather than on bytes, but you won't have solved the problems entirely.
Worse, you'll have pushed the problematic cases out of the realm of obviously wrong and not sensible to do, into subtly wrong and will break down the line in ways that will be hard to recognize and debug.
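These failure modes are easy to reproduce in Python 3 itself; a quick illustration (a sketch, results from a recent 3.x):

    import unicodedata

    "\ufb01".upper()                           # 'FI': one code point becomes two
    len("\xc5")                                # 1 (precomposed A-with-ring)
    len(unicodedata.normalize("NFD", "\xc5"))  # 2 (base letter plus combining ring)
    "\xe9" == "e\u0301"                        # False, though both render identically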
None of those operations are correct on Unicode codepoints. Your statement is only just barely tenable if you only care about well-edited and normalized formal prose in common Western languages.
> There are no operations on sequences of Unicode code points that are more correct than an analogous operation on bytes.
Wow. I wonder how you arrived at this point. You can't, for example, truncate a UTF-8 byte array without the risk of producing a broken string. But this is only the start. Here are two strings, six letters each, one in NFC, the other in NFD, and their byte-length in UTF-8:
"Åström" is 8 bytes in UTF-8
"Åström" is 10 bytes in UTF-8
If your software tells the user that one is eight and the other is 10 letters long, it is not "less correct". It is incorrect. Further, if searching for "Åström" won't find "Åström", your software is less useful than it could be if it knew Unicode. (And it's sad how often software gets this wrong.)
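In Python terms, what "knowing Unicode" buys you here is essentially one library call (a sketch using the standard unicodedata module):

    import unicodedata

    nfc = "\xc5str\xf6m"                     # "Åström" in NFC: 6 code points, 8 UTF-8 bytes
    nfd = unicodedata.normalize("NFD", nfc)  # same text in NFD: 8 code points, 10 bytes

    nfc == nfd                                # False: naive comparison misses the match
    unicodedata.normalize("NFC", nfd) == nfc  # True: normalize before comparing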
> If your software tells the user that one is eight and the other is 10 letters long, it is not "less correct". It is incorrect.
In fact, if the software tells you that either of the strings is either 8 or 10 letters long, then either way the software is incorrect - those are both obviously 6-letter strings.
Now, does UTF-8 help you discover that they are 6-letter strings better than other representations? There are certainly text-oriented libraries that can do that, but not ones that simply count code points - they must have an understanding of all of Unicode. Even worse, the question "how many letters does this string have" is not generally meaningful - there are plenty of perfectly valid Unicode strings for which this question doesn't have a meaningful answer.
However, the question "how many unicode code points does this string have" is almost never of interest. You either care about some notion of unique glyphs, or you care about byte lengths.
> then either way the software is incorrect - those are both obviously 6 letter strings.
What I wanted to get at is that in Unicode, I have a chance to count letters to some useful degree. Why should I consider starting at byte-arrays?
> there are plenty of perfectly valid Unicode strings for which this question doesn't have a meaningful answer.
I don't get it. Why does the existence of degenerate cases invalidate the usefulness of a Unicode lib? If I want to know how many letters are in a string, I can probably get a useful answer from a Unicode lib. Not for all edge-cases, but I can decide on the trade-offs. If I have a byte-array, I start at a lower level.
> What I wanted to get at is that in Unicode, I have a chance to count letters to some useful degree.
You do not. You merely happen to get the right answer by coincidence in some cases, same as bytes-that-probably-are(n't)-ASCII. To throw your own words back at you:
"Åström" is 6 code points in Unicode
"Åström" is 8 code points in Unicode
If your software tells the user that one is six and the other is 8 letters long, it is not "less correct". It is incorrect. Further, if searching for "Åström" won't find "Åström", your software is less useful than it could be if it knew text. (And it's sad how often software gets this wrong.)
You can't truncate a sequence of Unicode codepoints without the risk of producing a broken string, either. What do you get if you truncate "Åström" after the first "o"? What do you get if you truncate 🇨🇦 after the first codepoint?
Normalization is not a real solution unless you restrict yourself to working with well-edited formal prose in common Western languages.
Sorry, we're mixing two layers. Of course, if I truncate a string, it may lose its meaning. And having accents fall off is problematic. But it's not the same as truncating a byte-array, because then an invalid sequence of bytes may result.
Stop treating these cases as equivalent. They're not.
They are equivalent. The only reason you find it problematic that a sequence of bytes is "invalid" (read: can't be decoded in your preferred encoding) is because you've manufactured the problem.
In the end, the only layer at which it really matters whether your byte sequence can be decoded is the font renderer, and just being valid utf-8 isn't good enough for it either.
> In the end, the only layer at which it really matters whether your byte sequence can be decoded is the font renderer
Ok that explains how we ended up here. I'm considering some other common uses! A search-index for example greatly profits from being able to normalize representations and split words.
Here's the thing: I don't want to work in UTF8. I want to work in Unicode. Big difference. Because tracking the encoding of my strings would increase complexity. So at the earliest convenience, I validate my assumptions about encoding and let a lower layer handle it from then on.
I understand you're arguing about some sort of equivalency between byte-arrays and Unicode strings. Sure there are half-baked ways to do word-splitting on a byte-array. But why do you consider that a viable option? Under what circumstances would you do that?
How would this look if strings were byte-arrays? How would `normalize()`, `lower()`, and `split()` know what encoding to use?
The way I see it: If the encoding is implicit, you have global state. If it's explicit, you have to pass the encoding. Both is extra state to worry about. When the passed value is a Unicode string, this question doesn't come up.
It looks pretty much the same, except that you assume the input is already in your library's canonical encoding (probably utf-8 nowadays).
I realize this sounds like a total cop-out, but when the use-case is destructively best-effort tokenizing an input string using library functions, it doesn't really matter whether your internal encoding is utf-32 or utf-8. I mean, under the hood, normalize still has to map arbitrary-length sequences to arbitrary-length sequences even when working with utf-32 (see: unicodedata.normalize("NFKC", "a\u0301 ﬃ") == "\xe1 ffi").
So on the happy path, you don't see much of a difference.
The main observable difference is that if you take input without decoding it explicitly, then the always-decode approach has already crashed long before reaching this function, while the assume-the-encoding approach probably spouts gibberish at this point. And sure, there are plenty of plausible scenarios where you'd rather get the crash than subtly broken behaviour. But ... I don't see this reasonably being one of them, considering that you're apparently okay with discarding all \W+.
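Concretely, the two approaches I am contrasting look something like this (a minimal sketch; the function names are mine):

    import re
    import unicodedata

    def tokens_decode_first(raw):
        # always-decode approach: crashes early on undecodable input
        text = raw.decode("utf-8")  # raises UnicodeDecodeError on bad bytes
        text = unicodedata.normalize("NFKC", text).lower()
        return [t for t in re.split(r"\W+", text) if t]

    def tokens_assume_encoding(raw):
        # assume-the-encoding approach: never crashes, may spout gibberish
        text = raw.decode("utf-8", errors="replace")
        text = unicodedata.normalize("NFKC", text).lower()
        return [t for t in re.split(r"\W+", text) if t]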
I agree with you. I wish Python 3 had strings as byte sequences mainly in UTF-8 as Python 2 had once and Go has now. Then things would be kept simple in Japan.
Python 3 feels cumbersome. To handle a raw input as a string, you must decode it in some encoding first. It is a fragile process. It would be adequate to treat the input bytes transparently and put an optional stage to convert other encodings to UTF-8 if necessary.
So in one case, the text becomes corrupted and unreadable (i.e. loses its meaning), and in the other, it becomes corrupted and unreadable. What's the difference?
Having "accents fall off" has gotten people murdered [0]. Accents aren't things peppered in for effect, they turn letters into different letters, spelling different words. Analogously, imagine that a bunch of software accidentally turned every "d" into a "c" because some committee halfway around the world decided "d" should be composed of the "c" and "|" glyphs. That's the kind of text corruption that regularly happens in other languages when dealing with text at the code point layer.
[0] https://languagelog.ldc.upenn.edu/nll/?p=73 . Note that this is Turkish, which has the "dotted i" problem, meaning that this was more than likely a .toupper() gone wrong rather than a truncation issue.
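For a concrete flavour of the toupper problem: Python 3's built-in str casing is locale-independent, so (illustrative snippet, results from a recent 3.x):

    "i".upper()            # 'I': correct for English, wrong for Turkish (should be 'İ')
    "\u0130".lower()       # an 'i' followed by a combining dot above, not a plain 'i'
    len("\u0130".lower())  # 2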
The difference is that for truncating, I can work within Unicode to deal with the situation. I can accept the possibility of mutilated letters, I can convert to NFC, I can truncate on word-boundaries, I have choice.
If I have a byte-array, I can do none of these things short of implementing a good chunk of Unicode. If I truncate, I risk ending up with an invalid UTF-8 string. End of story.
And what is wrong with an invalid UTF-8 string? Why were you truncating the string in the first place?
Basically, I believe the point here is that a Unicode-aware truncation should be done in a Unicode-aware truncate method. There is no good reason to parse a string as UTF-8 ahead of time - just keep it as a blob of bytes until you need to do something "texty" with it. It is the truncate-at-word-boundaries() method that should interpret the bytes as UTF-8 and fail if they are not valid. Why parse it sooner?
> If I have a byte-array, I can do none of these things short of implementing a good chunk of Unicode. If I truncate, I risk ending up with an invalid UTF-8 string.
Yes, and? You can have an invalid sequence of Unicode code points too, such as an unpaired surrogate (something Python's text model actually abuses to store "invalid Unicode" in a special, non-standard way).
If you truncate at the byte level, you are just truncating "between code points"; it's a closer granularity than at the code point layer, so you can also convert to NFC, truncate on word boundaries, etc. You just need to ignore the parts of the UTF-8 string that are invalid; which isn't difficult, because UTF-8 is self-synchronizing.
> All functions that return `bytes` continue to do so unless specifically opted in on a per file basis, then they return `unicode`.
Nothing in py2 returns bytes. They all return strings. That is the issue. What about subclasses or type wrappers? What about functions that return bytes or utf8 strings? How would you handle code that then calls “.startswith()” on a returned string/bytes value?
A language pragma that fundamentally alters a built in type across all the code you have in a program is never going to work and pushes the burden onto library authors to support a large matrix of different behaviours and types.
It would make the already ridiculous py2 str/bytes situation even more ridiculous.
> They would obviously not be removed and still be available but depræcated.
Having two almost separate object models in the same language is rather silly.
> Nothing in py2 returns bytes. They all return strings. That is the issue.
No, that is not an issue, that is semantics.
What one calls it does not change the behavior. And besides, the system could perfectly well be designed so that this pragma changes whether `str` is synonymous with `bytes` or with `unicode`, depending on its state.
> What about subclasses or type wrappers? What about functions that return bytes or utf8 strings? How would you handle code that then calls “.startswith()” on a returned string/bytes value?
You would know which is which by using the pragma or not.
Not using the pragma defaults to the old behavior, as said, one only receives the new, breaking behavior, when one opts in.
Python could even support always opting in by a configuration file option for those that really want it and don't want to add the pragma at the top of every file.
> A language pragma that fundamentally alters a built in type across all the code you have in a program is never going to work and pushes the burden onto library authors to support a large matrix of different behaviours and types.
Opposed to the burden they already had of maintaining a 2 and 3 version?
Any new code can of course always return `unicode` rather than `str` which in this scheme is normally `bytes` but becomes `unicode` with the pragma.
> It would make the already ridiculous py2 str/bytes situation even more ridiculous.
> Having two almost separate object models in the same language is rather silly.
Yes, it is, and you will find that most languages are full of such legacy things that no new code uses but are simply for legacy purposes.
“It is silly.” turns out to be a rather small price to pay to achieve “We have not broken backwards compatibility.”
I don’t really have the time or inclination to continue arguing, but I will point out that you say all this as though the approach the team took failed. It worked. The ecosystem is on py3.
You can imagine some world with a crazy context-dependent string/bytes type. Cool. In reality this would have caused endless confusion, especially with beginners and the scientific community, and likely killed the language or at the very least made the language a shadow of what it is now.
They made the right choice given the outcome. Anything else is armchair postulation that was discussed previously and outright rejected for obvious reasons.
Because they're doing everything they can to force py2 to go away. It's not that it's dying a natural death out of disuse. Exhibit A is everyone else in this post still wanting to use it.
If you think strings "work" under py3, my guess is you've never had to deal with all the edge cases, especially across all 3 major desktop platforms. Possibly because your applications are limited in scope. (You're definitely not writing general-purpose libraries that guarantee correctness for a wide variety of usage.) Most things Python treats as Unicode text by default (file contents, file paths, command-line arguments, stdio streams, etc.) are not guaranteed to contain only Unicode. They can have invalid Unicode mixed into them, either accidentally or intentionally, breaking programs needlessly.
Consider a content-agnostic program (like `cat`, `printf`, etc.): with a decent standard library implementation, you would expect it to be able to pass arbitrary data through just fine. But it doesn't, because Python insists on treating arguments as Unicode strings rather than as raw data, and it behaves worse on Python 3 than Python 2. You really have to go out of your way to make it work correctly—and the solution is often pretty much to just ditch strings in many places and deal with bytes as much as possible... i.e., you realize Unicode strings were the wrong data type. But since you're still forced to deal with them in some ways, you get the worst of both worlds; that increases the complexity dramatically, and it becomes increasingly painful to ensure your program still works correctly as it evolves.
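Here's roughly what that ends up looking like for an argv-passthrough program (a sketch, POSIX-flavoured; os.fsencode recovers the bytes that the surrogateescape decoding smuggled in):

    import os
    import sys

    # Python 3 decodes sys.argv with the surrogateescape error handler,
    # representing undecodable bytes as lone surrogates.
    for arg in sys.argv[1:]:
        data = os.fsencode(arg)  # round-trip back to the original bytes
        sys.stdout.buffer.write(data + b"\n")  # raw bytes, bypassing the text layer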
I say all these because I've run into these and dealt with them, and it's become clear to me that others who love Unicode strings just haven't gone very far in trying to use them. Often this seems to be because they (a) are writing limited-scope programs rather than libraries, (b) confine themselves to nice, sanitized systems & inputs, and/or (c) take an "out-of-sight -> out-of-mind" attitude towards issues that don't immediately crop up on their systems & inputs.
> You're definitely not writing general-purpose libraries that guarantee correctness for a wide variety of usage.
At the risk of sounding like a dick, I’m a member of the Django technical board and have been involved with its development for quite a while. Is that widely used or general purpose enough?
If you want a string then it needs to be a valid string with a known encoding (not necessarily utf8). If you want to pass through any data regardless of its contents then you use bytes. They are two very different things with very different use cases.
If I read a file as utf8 I want it to error if it contains garbage, non-text contents because the decoding failed. Any other way pushes the error down later into your system to places that assume a string contains a string but it’s actually arbitrary bytes. We did this in py2 and it was a nightmare.
I concede that it’s convenient to ignore the difference in some circumstances, but differentiating between bytes/str has a lot of advantages and makes Python code more resilient and easier to read.
> I’m a member of the Django technical board and have been involved with its development for quite a while. Is that widely used or general purpose enough?
That's not quite what I was saying here. Note I said "wide variety of usage", not "widely used". Django is a web development framework—and its purpose is very clear and specific: to build a web app. Crucially, a web framework knows what its encoding constraints are at its boundaries, and it is supposed to enforce them. For example, HTTP headers are known to be ASCII, HTML files have <meta ...> tags to declare encodings, etc. So if a user says (say) "what if I want to output non-ASCII in the headers?", your response is supposed to be "we don't let you do that because that's actually wrong". Contrast this with platform I/O, where the library is supposed to work transparently without any knowledge of any encoding (or lack thereof) for the data it deals with, because that's a higher-level concern and you don't expect the library to impose artificial constraints of its own.
"If I read a book as Russian, I want it to error if it contains French, non-Russian contents because the decoding failed. Any other way pushes the error down later into your system to readers that assume a Russian passage contains Russian but it's actually arbitrary text. We did this in War and Peace and it was a nightmare."
“If I expect a delivery of war and peace in English, I want it to error if I actually receive a stone tablet containing Neanderthal cave paintings thrown through my window at night”. They are two very different things, even if they both contain some form of information.
You are engaged in some deep magical thinking about encodings, to believe that knowing the encoding of a so-called string allows you to perform any operations on it more correctly than on a sack of bytes. (Fewer, in fact—at least the length of a byte array has any meaning at all.)
It's an easy but very confused mistake to make if the text you work with is limited to European languages and Chinese.
> You are engaged in some deep magical thinking about encodings, to believe that knowing the encoding of a so-called string allows you to perform any operations on it more correctly than on a sack of bytes.
Not really. How would “.toupper()” work on a raw set of bytes, which would either contain an MP3 file or UTF8 encoded text?
Every single operation on a string-that-might-not-be-a-string-really would have to be fallible, which is a terrible interface to have for the happy path.
How would slicing work? I want the first 4 characters of a given string. That’s completely meaningless without an encoding (not that it means much with it).
How would concatenation work? I’m not saying Python does this, but concatenating two graphemes together doesn’t necessarily create a string with len() == 2.
How would “.startswith()” work with regards to grapheme clusters?
Text is different from bytes. There’s extra meaning and information attached to an arbitrary stream of 1s and 0s that allows you to do things you wouldn’t have been able to do if your base type is “just bytes”.
Sure you could make all of these return garbage if your “string” is actually an mp3 file, aka the JavaScript way, but... why?
> Not really. How would “.toupper()” work on a raw set of bytes, which would either contain an MP3 file or UTF8 encoded text?
It doesn't. It doesn't work with Unicode either. No, not "would need giant tables", literally doesn't work—you need to know whether your text is Turkish.
> How would slicing work? I want the first 4 characters of a given string. That’s completely meaningless without an encoding.
It's meaningless with an encoding: what are the first four characters of "áíúéó"? Do you expect "áí"? What are the first four characters of "ﷺ"? Trick question, that's one unicode codepoint.
At least with bytes you know that your result after slicing four bytes will fit in a 4-byte buffer.
> How would concatenation work? I’m not saying Python does this, but concatenating two graphemes together doesn’t necessarily create a string with len() == 2.
It doesn't work with Unicode either. I'm sure you've enjoyed the results of concatenating a string with an RTL marker with unsuspecting text.
It gets worse if we remember try to ascribe linguistic meaning to the text. What's the result of concatenating "ranch dips" with "hit singles"?
> How would “.startswith()” work with regards to grapheme clusters?
It doesn't. "🇨" is a prefix of "🇨🇦"; "i" is not a prefix of "ij".
> Text is different from bytes. There’s extra meaning and information attached to an arbitrary stream of 1s and 0s that allows you to do things you wouldn’t have been able to before.
None of the distinctions you're trying to make are tenable.
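A few of those, reproduced in Python 3 (illustrative, from a recent 3.x):

    import unicodedata

    s = unicodedata.normalize("NFD", "áíúéó")  # 5 letters, 10 code points
    s[:4]                     # 'áí': four code points are only two letters
    "🇨🇦".startswith("🇨")     # True, though 🇨 alone is not "part of" the flag
    "ĳ".startswith("i")       # False: the ij ligature is a single code point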
It is not clear to me whether there is a material difference here. Any text string is a sequence of bytes for which some interpretation is intended, and many meaningful operations on those bytes will not be meaningful unless that interpretation is taken into account.
The problem that you have raised here seems to be one of what alphabet or language is being used, but that issue cannot even arise without taking the interpretation into account. If you want alphabet-aware, language-aware, spelling-aware or grammar-aware operators, these will all have to be layered on top of merely byte-aware operations, and this cannot be done without taking into account the intended interpretation of the bytes sequence.
Note that it is not unusual to embed strings of one language within strings written in another. I do not suppose it would be surprising to see some French in a Russian-language War and Peace.
This implies that you should have types for every intended use of a text string. This is, in fact, a sensible approach, reasonably popular in languages with GADTs, even if a bit cumbersome to apply universally.
A type to specify encoding alone? Totally useless. You can just as well implement those operations on top of a byte string assuming the encoding and language &c., as you can implement those operations on top of a Unicode sequence assuming language and culture &c..
To implement any of the above, while studiously avoiding anything making explicit the fact that the interpretation of the bytes as a sequence of glyphs is an intended, necessary and separable step on the way, would be bizarre and tendentious.
I see you have been editing your post concurrently with my reply:
> You can just as well implement those operations on top of a byte string assuming the encoding and language &c., as you can implement those operations on top of a Unicode sequence assuming language and culture &c..
Of course you can (though maybe not "just as well"), but that does not mean it is the best way to do so, and certainly not that it is "totally useless" to implement the decoding as a separate step. Separation of concerns is a key aspect of software engineering.
> To implement any of the above, while studiously avoiding anything making explicit the fact that the interpretation of the bytes as a sequence of glyphs is an intended, necessary and separable step on the way, would be bizarre and tendentious.
Codepoints are not glyphs. Nor are any useful operations generally performed on glyphs in the first place. Almost all interpretable operations you might want to do are better conceived of as operating as substrings of arbitrary length, rather than glyphs, and byte substrings do this better than unicode codepoint sequences anyway.
So I contest the position that interpreting bytes as a glyph sequence is a viable step at all.
Fair enough, codepoints, but the issue remains the same: you keep asserting that it is pointless - harmful, actually - to make use of this one particular interpretation from the hierarchy that exists, without offering any valid justification for why this one particular interpretation must be avoided, while both lower-level and higher-level interpretations are useful (necessary, even.)
Going back to the post I originally replied to, how would going down to a bytes view avoid the problems you see?
Let me rephrase. Codepoints are even less useful than abstract glyphs, cf. https://manishearth.github.io/blog/2017/01/14/stop-ascribing... (I don't agree 100% with the write-up, and in particular I would say that working on EGCs is still just punting the problem one more layer without resolving it; see some of my other posts in this thread. But it makes an attempt at clarifying the issue here.)
The choice of the bytes view specifically is just that it's the most popular view from which you can achieve one specific primitive: figuring out how much space a (sub)string occupies in whatever representation you store it in. A byte length achieves this. Of course, a length in bits or in utf-32 code units also achieves this, but I've found it rather uncommon to use utf-32 as a transfer encoding. So we need at least one string type with this property.
Other than this one particular niche, a codepoint view doesn't do much worse at most tasks. But it adds a layer of complexity while also not actually solving any of the problems you'd want it to. In fact, it papers over many of them, making it less obvious that the problems are still there to a team of eurocentric developers ... up until emoji suddenly become popular.
Now, I can understand the appeal of making your immediate problems vanish and leaving it for your successors, but I hope we can agree that it's not in good taste.
While all the facts in this post appear correct, they do not seem to me to amount to an argument either for the proposition that an implementation at the utf-8 level is uniquely harmful, or that a bytes-level approach avoids these problems.
For example, working with the utf-8 view does not somehow foreclose on knowing how much memory a (sub)string occupies, and it certainly does not follow that, because this involves regarding the string as a sequence of bytes, this is the only way to regard it.
For another, let's consider a point from the linked article: "One false assumption that’s often made is that code points are a single column wide. They’re not. They sometimes bunch up to form characters that fit in single “columns”. This is often dependent on the font, and if your application relies on this, you should be querying the font." How does taking a bytes view make this any less of a potential problem?
Is a team of eurocentric developers likely to do any better working with bytes? Their misconceptions would seem to be at a higher level of abstraction than either bytes or utf-8.
You are claiming that taking a utf-8 view is an additional layer of complexity, but how does it simplify things to do all your operations at the byte level? Using utf-8 is more complex than using ascii, but that is beside the point: we have left ascii behind and replaced it with other, more capable abstractions, and it is a universal principle of software engineering that we should make use of abstractions, because they simplify things. It is also quite widely acknowledged that the use of types reduces the scope for error (every high-level language uses them.)
The burden of proof is on showing that the unicode view is, in your words, a more capable abstraction. My thesis is that it is not. This is not because it necessarily does anything worse (though it does). It must simply do something better. If there were actually anything at all it did better—well, I still wouldn't necessarily want it as a default but it would be a defensible abstraction.
The heart of the matter is that a Unicode codepoint sequence view of a string has no real use case.
There is no "universal principle" that we use abstractions always, regardless of whether they fit the problem; that's cargo-culting. An abstraction that does no work is, ceteris paribus, worse than not having it at all.
> The burden of proof is on showing that the unicode view is, in your words, a more capable abstraction. My thesis is that it is not.
The quote, as you presented it, leaves open the question: more capable than what? Well, there's no doubt about it if you go back to my original post: more capable than ascii. Up until now, as far as I can tell, your thesis has not been that unicode is less capable than ascii, but if that's what your argument hangs on, go ahead - make that case.
What your thesis has been, up to this point, is that manipulating text as bytes is better, to the extent that doing it as unicode is harmful.
> It must simply do something better. If there were actually anything at all it did better...
It is amusing that you mentioned the burden of proof earlier, because what you have completely avoided doing so far is justify your position that manipulating bytes is better - for example, you have not answered any of the questions I posed in my previous post.
> The heart of the matter is that a Unicode codepoint sequence view of a string has no real use case.
Here we have another assertion presented without justification.
> There is no "universal principle" that we use abstractions always, regardless of whether they fit the problem...
It is about as close as anything gets to a universal principle in software engineering, and if you want to disagree on that, go ahead, I'm ready to defend that point of view.
>... that's cargo-culting.
How about presenting an actual argument, instead of this bullshit?
Furthermore, you could take that statement out of my previous post, and it would do nothing to support the thesis you had been pushing up to that point. You seem to be seeking anything in my words that you think you can argue against, without regard to relevance - but in doing so, you might be digging a deeper hole.
> An abstraction that does no work is, ceteris paribus, worse than not having it at all.
Your use of a Latin phrase does not alter the fact that you are still making unsubstantiated claims.
Put it this way: claim a use-case you believe the unicode view does better on than an array of bytes. Since you're making the positive claim, this should be easy.
I guarantee you there will be a quick counterexample to demonstrate that the claimed use-case is incorrect. There always is.
You may review the gish gallop in the other branch of this thread for inspiration.
I agree with this, but I would make one important tweak: make the new behavior opt-out, instead of opt-in, with a configuration file option for switching the default.
You're still breaking code by default this way, but no one would have trouble updating.
My concern is that, if you don't make the preferred behavior clear, a lot of people would simply never adopt it. I don't think that Python's userbase in particular is going to spend time reading documentation on best practices.
I do believe that such a trivial change would indeed be fine. If one can go to the effort of installing the new version, one can add one line in a configuration file to depend upon old behavior.
I think some modular approach could have solved the incompatibility issue, such as "from future import ...". Shorthands could have been invented to define everything in a single line.
Perl 5 has similar flags ("use strict"), and Racket takes it even further, letting "#lang racket/gui" define the whole fucking language of the rest of the file. Having the language be choosable by the user is against the "zen of python", I guess. In other words: such an attempt does not feel "pythonic".
No, it’s the same language but with different semantics around a specific type. That’s not a different language and code can co-exist with a bit of thought.
Every language goes through this at some point in its development: flaws that limit future development have to be fixed. Should every language rename itself and split its community at that point? That seems like an extreme response to a common problem.
That people can make an initial plan that is self-consistent, logical, and foresees and provides for all future use-cases is a basic tenet of waterfall-style development. The history of software engineering does not uphold that principle. Why would it be different for language designers?
Yes, yes it is. And, like "Perl 6" and to a lesser extent "C++", that name is misleading (and therefore bad), because there is already a different language called "Python" (respectively "Perl", "C"), with significant superficial similarities that it could be confused with.
Please note that the misleading part of Perl 6 has been fixed by renaming it to the Raku Programming Language (https://raku.org using the #rakulang tag on social media).
> How would you have handled the string/bytes split in a way that’s backwards compatible?
My understanding is that the corresponding types are available in both 2 and 3; they're just named differently. The one that differs is "string". So you could have had some kind of mode directive at the top of the file which controlled which version that file was in, and allowed files from 2 and 3 to run together.
Actually think about it. bytes is str in Python 2. There is no bytes type in py2. How would a per-file directive (of all things) help?
What if one function running in “py2 mode” returned a string-that-is-actually-bytes, how would a function in “py3 mode” consume it? What would the type be? If different, how would it be detected or converted? What if it returned a utf8 string OR bytes? What if that py3 function then passed it to a py2 function - would it become a string again? Would you have two string types - py2string that accepts anything and py3string that only works with utf8? How would this all work with C modules?
> What if one function running in “py2 mode” returned a string-that-is-actually-bytes, how would a function in “py3 mode” consume it? What would the type be?
It would be bytes. Because py2 string === py3 bytes.
> What if that py3 function then passed it to a py2 function - would it become a string again?
Yes
> Would you have two string types - py2string that accepts anything and py3string that only works with utf8?
Yes. You already have those two types in python3. bytes and string. You'd just alias those as string and utf8 or whatever you want to call it in python2.
> How would this all work with C modules?
They'd have to specify which mode they were working with too.
But all this would require huge rewrites of code and would never be backward compatible. You’re trading “py2 vs py3” with “py2 mode vs py3 mode”.
So you’d have some magic code that switches py2str to bytes. Which means every py3 caller has to cast bytes into a string to do anything useful with it, because returning strings is the most common case. Then that code has to be removed when the code it’s calling is updated to py3 mode. Which is basically the blue/green issue you see with async functions but way, way worse.
Then you’d need to handle subclasses, wrappers of bytes/str, returning collections of strings across py2/py3 boundaries (would these be copies? Different types? How would type(value[0]) work?), ending up with mixed lists/dicts of bytes and strings depending on the function context, etc etc.
It would become an absolute complete clusterfuck of corner cases that would have killed the language outright.
> You’re trading “py2 vs py3” with “py2 mode vs py3 mode”.
Yes, that's the whole point. Because compatible modes allow for a gradual transition. Which in practice allows for a much faster transition, because you don't have to transition everything at once (which puts some people off transitioning entirely - making things infinitely harder for everyone else).
Languages like Rust (editions) and JavaScript (strict mode) have done this successfully and relatively painlessly.
> So you’d have some magic code that switches py2str to bytes. Which means every py3 caller has to cast bytes into a string to do anything useful with it, because returning strings is the most common case. Then that code has to be removed when the code it’s calling is updated to py3 mode. Which is basically the blue/green issue you see with async functions but way, way worse.
Well yes, you'd still have to upgrade your code. That goes with a major version bump. But it would allow you to do it on a library-by-library basis rather than forcing you to wait until every dependency has a v3 version. Have that one dependency that keeps you stuck on v2? No problem: upgrade everything else and wrap that one lib in conversion code.
> Then you’d need to handle subclasses, wrappers of bytes/str, returning collections of strings across py2/py3 boundaries (would these be copies? Different types? How would type(value[0]) work?), ending up with mixed lists/dicts of bytes and strings depending on the function context, etc etc.
I'm not sure I understand the problem here. The types themselves are the same between python 2 and 3 (or could have been). It's just the labels that refer to them that are different. A subclass of string in python 2 code would just be a subclass of bytes in python 3 code.
The problem with this approach is that they wanted to reuse the `str` name, which requires a big "flag day", where it switches meaning and compatibility is effectively impossible across that boundary (without ugly hacks).
What they could have done instead would have been to just rename `str` to `bytes`, but retain a deprecated `str` alias that pointed to `bytes`.
That would keep old scripts running indefinitely, while hopefully spewing enough warnings that any maintained libraries and scripts would make the transition.
Eventually they could remove `str` entirely (though I'd personally be against it), but that would still give an actual transition period where everything would be seamlessly compatible.
Same thing with literals: deprecate bare strings, and transition to having to pick explicitly between `b"foo"` and `u"foo"`. Eventually consider removing bare strings entirely. DO NOT just change the meaning of bare strings while removing the ability to pick the default explicitly (in contrast, 3.0 removed `u"asdf"`, and it was only reintroduced several versions later).
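A toy sketch of the alias idea in pure Python (hypothetical, of course; the real change would live in C):

    import warnings

    class str(bytes):  # hypothetical: keep "str" as a deprecated alias for bytes
        def __new__(cls, *args, **kwargs):
            warnings.warn("str is deprecated; write bytes or unicode explicitly",
                          DeprecationWarning, stacklevel=2)
            return super().__new__(cls, *args, **kwargs)

    s = str(b"legacy code keeps running")  # works, but emits a DeprecationWarning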
What made me personally lose faith in the Python Core team wasn't that Guido made an old mistake a long time ago. It wasn't that they wanted to fix it. It was the absolutely bone-headed way that they prioritized aesthetics over the migration story.
> Would you have two string types - py2string that accepts anything and py3string that only works with utf8?
Yes. A single naive one for py2, and two separate ones for py3: bytes and unicode. All casting between the two would have to be made explicit.
> How would this all work with C modules?
In non-strict mode, you'd be able to use either py2 strings or py3 bytes with these, and gradually move all modules to strict mode which requires bytes.
And then, gradually after a decade or so attempt to get rid of all py2 types.
> How would you have handled the string/bytes split in a way that’s backwards compatible? Or the removal of old-style classes?
I'm not sure it's the best way to handle it, but I would have been fine with:
from __python2__ import *
for full backward compatibility; or, more explicitly:
from __python2__ import ascii_strings, old_style_classes, print_statement, ...
As the parent poster mentions, several other popular languages and systems (C++, Java, etc.) have done a pretty decent job preserving backward compatibility, for good reason: it saves millions of hours of human effort. It's embarrassing and disappointing that Python simply blew it with the Python 2 to 3 transition.
Maybe we could still evolve pypi to support a compatibility layer to allow easy mixing of python2 and python3 code, but I get the feeling that Python 3 has poisoned the well.
When I was learning Python 6 years ago I was the only one using Python 3 in my group because I use arch linux. It was very basic code and everyone basically solved the same problem. Everyone else's code didn't work on my machine because print is not a statement in Python 3.
That's just plain stupid. Just print a warning and add a python2 flag that hides the warning. Don't release a major version because of something trivial like this.
Python gave everyone 12 years to deal with version 3 being the way forward. There are many fundamental changes.
The fact that people seem to complain exclusively after Python 2's end of life a year ago feels a little telling. Perl's community rofl-stomped their previous vision for Perl 6. The Python community wasn't vocal about this being a bad change. Rather the opposite: very loud support.
Keep in mind, I dislike Python either way, but I'm not one of the devs that complains about continuing education requirements, or language adding things over each 10 year period. I can work in Python just fine, but that doesn't mean it feels nice & hygienic to use for me personally.
The Python core devs did not have the time or motivation to support the old codepaths in the CPython runtime, and the legacy code was getting in the way of a lot of longtime wants and needs for improving performance, runtime maintainability, language ergonomics, and the standard library. They also specifically increased the major revision number to signal their intent to move on from that legacy.
But you kind of addressed this in your own spiel: hindsight is exceedingly easy. They didn't realize how inadequate their migration tooling was, or how very entrenched Python 2 was in various places. It's hard when you don't know what you don't know and you're highly motivated by hopeful aspirations.
>The Python core devs did not have the time or motivation to support the old codepaths in the CPython runtime, and the legacy code was getting in the way of a lot of longtime wants and needs for improving performance, runtime maintainability, language ergonomics, and the standard library.
They could have fixed most of this legacy code without changing the external user-facing API so much.
It's an open source project. Is there really much of a difference between "I'm not going to work on this system because it's terrible" and "I'm forking this system and I'm not going to support the previous version"?
In both cases you can say "well someone else will just come along and support it", and for py2 they did, for a bit. In fact I believe you can still pay if you happen to want py2 support.
But if you're not paying, you're saying "hey, this thing you work on and provide to me for free - why are you working on it in the way you want rather than the way I want??"
> In both cases you can say "well someone else will just come along and support it", and for py2 they did, for a bit
Was python-2 handed off to new maintainers? News to me.
> why are you working on it in the way you want rather than the way I want
Is "it" python-2 or python-3?
This isn't users demanding py3 devs support py2 - it's users asking that devs who no longer want to support py2 to hand it off to those that do, rather than blocking it.
When I say "the developers of X could have done Y better" I don't mean that they owe it to me in any way to have done so.
I'm just judging their technical decision making. They are perfectly entitled to delete the whole project and start a new one and I have absolutely no right to say they shouldn't.
But I do have a right to critique their decisions from a technical standpoint.
Yes it's interesting if this is cited as one of the motivations.
That's a problem of a language being oriented around a single implementation. Is it even defined by this implementation?
Compare to eg. C or C++.
Diversity and interoperability are important, as they are significant contributors to longevity.
I do like that you've used the term "API", as I think that sums it up. Rather than thinking of "Python" as a language agreed upon by multiple implementors, the behaviour here is that of a "library" with an "API".
It would have taken considerable effort, regardless. This cost was offset onto the development teams in companies doing migrations. It was the decision made and if you don't like it, consider using another programming language. Perhaps consider that it's an open source project with a lot of contributors essentially working for free.
How many downstream, Python dependent companies were funding its development? Everyone is entitled to their opinion. But if they build their system on a platform outside their control then they'll have to roll with the changes, fork it, or move to a different platform / fork.
I don’t think hindsight can be claimed here. It was a decision that was not made from ignorance. The Python developers chose to sacrifice backwards compatibility. Other languages do not typically make such choices and if they do they make updating codebases relatively easy.
Nothing about python versioning is easy. It’s a disaster and the key reason I do not start any projects in python.
> The Python developers chose to sacrifice backwards compatibility.
And it is quite clear that that choice was not based on accurate estimates and insights.
The original e.o.l. was laughably short and then had to be doubled. It was quite clear they based their choice on the assumption that consumers would all have switched to 3 at a time when 2 was still used by 80%.
They made that choice based on what can only be seen as complete ignorance of the cost of rewriting software.
Right now, the biggest reason to drop Python 2 for most serious consumers is not any of the improvements that Python 3 brings, but that it is e.o.l..
> Other languages do not typically make such choices and if they do they make updating codebases relatively easy.
I want to understand what was so hard about porting code from Python 2 to 3. I ported a few tens of thousands of lines of Python 2 code to Python 3 and it was pretty trivial. In my experience the only thing that made porting hard was when a package you depended on had not been ported to Python 3 yet. But maybe my experience does not reflect some other cases. Can you elaborate on what was so hard about porting code from Python 2 to 3?
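For context, the mechanical part of such a port is mostly changes of this sort, which the stock 2to3 tool automates (a runnable Python 3 sketch, with the Python 2 forms in comments):

    # print "hello"                  # Python 2 print statement
    print("hello")                   # Python 3 function call

    d = {"a": 1}
    # d.has_key("a")                 # Python 2 only
    "a" in d                         # works in both

    # for k, v in d.iteritems():    # Python 2 only
    for k, v in d.items():           # Python 3 (lazy view)
        pass

    try:
        int("x")
    # except ValueError, e:          # Python 2 syntax
    except ValueError as e:          # Python 3 (and 2.6+)
        pass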
How do I regression test five different pieces of DAQ hardware? My best plan is to pull them from working systems and deal with them missing. I don’t think it’s a good use of resources to buy extra DAQ cards just for a regression test bed.
Regardless of that, moving from python 2.5 to 2.7 is not trivial because not all used libraries were even updated to 2.7 from 2.5. Some that were broke backwards compatibility. How far do I have to bend backward just to get in the right place to update to python 3? I see many comments trivializing the effort needed to update to python 3 because they know of narrow use cases and expect large amounts of resources to maintain code. That isn’t the reality for most users.
The hardship of porting from 2 to 3 very much depends on how critical the software is. Porting 1000 lines of python 2 that deals with files encoded in various ways where it’s impossible to test all edge cases and where a failure might lead to huge liability charges is hard not because it’s hard to run 2to3 and do some random tests, but because you don’t know what you have missed. And still, a 300k lines of code project might be fine to just run 2to3 on and then find the bugs as you go. It’s a matter of context.
And the opposite -- tons of little unimportant scripts sitting around that add a lot of value as a whole, but just aren't worth rewriting because of the poor decision-making of the Python developer team....
I have Python code dating back to when I was an undergrad. It's sad to see the Python team decide to nuke that. My C code from then (mostly) runs fine still.
The team decided to externalize a massive cost onto its community without much benefit. That was sad to see at the time, and it continues to be sad to see.
As someone who does a decent part of development in python, I'd say you are using the wrong language, if you can't test your edge cases and have huge liabilities.
Python code is inherently almost-untestable and fragile. These days, when coding something critical and non-trivial, I choose a memory-safe language with static typing and type inference, ADTs, pattern matching and try to write simple yet pure functional code with well defined semantics, that works almost by definition.
> If you have a liability situation, maybe you could work to rectify it
Well yes, sure, of course.
And like you said, maybe Python isn't the right language in the first place for mission-critical life-is-on-the-line software.
But if you have already gotten yourself into a position where some piece of your business infrastructure is dependent on an obscure bit of hard-to-port-to-Python-3-and-maintain-exact-behaviour Python code, then it is exactly the "2to3 transition that's shaking up your house of cards", no?
And, furthermore, like you said, if you find yourself in this position, you should be looking at some other language entirely rather than porting to Py3, eh?
Note that I am not against using python in mission-critical code.
I was referring to untestable code with a myriad of edge-cases, in which case you have a problem that will surface sooner or later, be it 2to3 transition or something else.
If the code is truly static, you can ignore the transition and deprecation. Otherwise you should probably work on documentation/testing/refactoring and/or porting to another language.
2to3 transition was handled badly, up to about 2.7 and 3.4 or so, but the pains described here seem mostly self-inflicted, and I don't see it as an argument against the needed changes.
These are exactly the concerns of serious enterprises that the Python developers have missed, and what made them seem as though they were hobbyists who had never dealt with software that actually powers infrastructure.
Python was intended for education originally. It's possible that some uses are just too far outside that wheelhouse to expect it to work well forever. Doubtful I'll ever write desktop GUIs in PHP for example, though it appears some have already done it.
The standard library of 2 already came with many facilities that go well beyond that.
They targeted business; it came to be adopted by business; and then they were surprised that business was not enthusiastic about updating currently working code with all the potential regressions and downtime that might come from it.
Could you explain how "Python was intended for education originally."?
As I recall, Python was designed for the Amoeba operating system, and drew on experience from implementing ABC; ABC was definitely designed for education.
But ABC != Python. Checking now, the first Usenet post for Python 0.9 says:
> Python can be used instead of shell, Awk or Perl scripts, to write prototypes of real applications, or as an extension language of large systems, you name it.
There are certainly some cases where even the smallest backward incompatible change would cause serious problems on some systems. Thanks for giving an example, instead of just downvoting.
The problem was that you could only port once all the libraries you use had ported, but libraries didn't want to commit to abandoning Python 2 quickly.
Agreed, that was my experience as well: the hardest part was not changing our codebase but depending on packages that had not been ported to Python 3 yet.
> The Python core devs did not have the time or motivation to support the old codepaths
Then it sounds like they didn't want to be python devs anymore; good luck on their new project..
Instead they held onto the reins and drove python into the ground so that their new code could devour the remains of the old.
> They didn't realize how inadequate their migration tooling was
A shame then that they decided that migration was mandatory. They don't need to know either; they just have to encourage users to migrate, rather than force them to. Saying "They didn't realize ... how very entrenched Python 2 is" is basically saying "we didn't think we'd encounter (significant) resistance". Their "hopeful aspirations" were that everybody (that mattered) would be on board, which is why they didn't bother to ask..
There are a billion blog posts about the python core developers acknowledging their mistakes and saying they would handle future changes much differently.
This post might be true, but it's roughly 10 years late in terms of hitting the intended audience. Everyone gets this now, and "beating a dead horse" might be an understatement
Some dead horses need a serious beating every now and then to remind people that they can resurrect if you're not careful. All of the lessons the python team did not put into practice were well known at the time, but they knew better and here we are.
The day after tomorrow, someone who still needs to learn this lesson will make breaking changes to some API, framework, language or OS; maybe we'll get to them in time.
The lessons have not arrived at the current Python cabal. They just deprecated unittest and are seriously considering breaking parts of the C-API again.
For the people who work at the right companies this will generate many billable hours for no gain.
For others it will be a lot of unpaid work again. At this stage Python should be forked.
I seriously (I mean seriously) thought about forking Py2 (Tauthon is great BTW) but then I found out that PyPy has a Python2 mode and will for the foreseeable future. Just to be clear: PyPy runs Python 2 code, and always will. (As far as I know. Although it occurs to me that I have no idea what it's like if you're trying to work with the C API.)
(Also I got into Prolog, but that's another story.)
Apparently not everyone gets it, seeing that many are arguing against it, and every time this subject lands there are many ignorant users that say “Just upgrade your code.” as if that be free.
Python 2.7 still works as a binary. You can vendor all your requirements. The rug is not being pulled out from anyone, we’re looking at 10+ years of this.
PyPI is a mostly volunteer-only endeavor, so it’s tough to support stuff forever. And even there, older pips will still work!
Python 2 still works! It’s still there! Nobody is taking it away from you in any real sense. But Python developers don’t want to continue developing in that environment so are choosing to not handle it for future stuff.
Python 2 works. You can use it forever if you want. Nobody is forcing you to upgrade... except if you want the free labor from the community. And you have had years and years and years.
> Serious enterprises do not like to invest in something if it mean that 10 years later they would have to rewrite their entire codebase again.
Python 2 to Python 3 was nothing like rewriting an entire codebase. Most of the difficulty was if you depended on a package that only supported Python 2, other than that it was pretty easy to port a Python 2 codebase to Python 3. If you have millions of lines of code it might take more time understandably, but still it was nothing like rewriting a whole codebase.
> The very swarms of users that chanted “just upgrade” as if that not incur a significant cost also seemed ridiculously naïve to me, not understanding the real cost that large projects have by having to rewrite very extensive codebases and dealing the potential regressions that that might involve.
And yet we haven't heard of this being an actual, real problem, or are there any high profile examples?
I had to migrate multiple small projects (~10k loc) myself. That should be the typical use case for python (power law etc.) The whole thing took about half an hour per 1000 loc, and I had more than 10 years to plan it.
> The minor improvements of Python 3 did not warrant breaking backwards compatibility and most could have been handled in a way that would not break it opt-in directives.
There were serious issues in Python 2 that could not be fixed in any backward compatible way, and would have made further progress forward impossible.
It wasn't done lightly and a lot of smart people thought about it for a long time.
---
And your old Python 2 scripts will continue to work forever, so I'm not quite sure what your beef is.
Progress needs to be made, and sometimes dropping support for stuff you no longer want to spend time supporting makes sense.
That said, I still think the situation was mishandled for this reason: py3 is basically another language, similar to py2. Calling it py3 is an exercise in marketing - instead of creating a new language to compete with py2 (along with all similar languages e.g. Julia), the existing py2 community was leveraged/shamed* into supporting the new thing, and most importantly, py2 was killed by its maintainers (rather than handed off) so it couldn't compete with py3, and so that users would be forced to move somewhere else - py3 being the easiest.
If it had properly been a new language, they could have taken more liberties (compat breaking) to fix issues, like a single official package manager. And migration to py3 would have been more by consent, than by force.
Very much this. It's a separate language that, if it hadn't been pushed by BDFL and co., if it had appeared as an independent project (like e.g. Stackless Python or something), would have had to live or die on its own merits.
- - - -
An additional aspect that I see as an old Python user is the "poisoning of the well" of the inclusive and welcoming spirit of the community. We (I'm speaking as a Pythonista here) have had problems with this in the past (remember how grouchy effbot could be? He's a sweet person IRL though.)
We made great progress and got a lot of acceptance in the educational and academic worlds.
Now just read this very thread and you'll find so many people making curt dismissive comments to folks who aren't on board with Python 3.
I still love and respect GvR (I once, with his permission, gave him a hug!) even though I think he messed up with this 2->3 business (and in any event, the drama around language innovation eventually pushed him to resign, as we all know.) He's a human being. And a pretty good one.
I guess what I'm trying to say is Python 3 won. Let us (all of us) be gracious about it.
While I also find the timeline totally reasonable, I think most "I don't have the time" complaints are probably less about being able to finish it in time, and more about wanting to spend time doing something other than rewriting otherwise finished or stable code to satisfy a backwards-incompatible change.
> It’s open source, you can fund some program to keep supporting python 2.
No, actually, you can't - last time I checked, they were specifically threatening[0][1][others] to sue anyone who tried to continue developing Python 2, for trademark infringement (despite that they are the ones using the trademark for something other than what it got its reputation from).
Written by someone who, for some reason, did not decide to maintain their own fork of python2. If time isn't free, why is it expected of maintainers to support other companies' lifestyles with their own time?
If you don't like the laws, are you a hypocrite for not starting your own country?
Arguments in this thread seem to miss a discrepancy:
"We don't want to support py2, and so why should we? Our time isn't free and we do what we want!"
"We know you don't want to migrate your py2 code, but you have to."
Forks aren't easy, especially when you get no support from the "official" python-2 maintainers. At the very least, a fork would not own the name.
Here's a question - why isn't python-3 a fork of python? Answer: because forks are hard, and the devs wanted to keep all the momentum/resources of python-2.
The fork comment is not meant to be a realistic suggestion; it just points out that there is work needed to maintain compatibility. The thing is, you can't both complain about the time it takes to migrate your project _and_ expect maintainers to spend an incommensurate amount of time maintaining stuff for you, free of charge.
I understand that some people and companies are now caught between a rock and a hard place right now. But honestly, that rock has been coming for 12 years now, and the alternative is to put other people in that situation.
Sure, but the assumption that py3 devs must do the work is being used to dismiss the idea and suggest people are entitled.
py3 devs don't need to do the work, they just need to hand it off.
> The thing is, you can't both..
Yes you can, if "maintaining" is handing it off, as opposed to the straw man of forcing py3 devs to do it. Why do the gatekeepers only allow for themselves to do the work?
> that rock has been coming for 12 years
notice is not consent.
> the alternative is to put other people in that situation
12 years is enough time to hand off to people who are happy to maintain py2. But there was no choice given.
I've already tried running old C++ projects and every time something breaks, so it's not as clear cut as you make it out to be.
Some things in Python 2 were not fixable by keeping it backwards compatible. Print as a statement? Sure. But strings/byte arrays, no way.
Of course they could have made the Py2 implementation less broken and less stupid (yes please do use ASCII as the default, ignore the existence of unicode, be trigger-happy about errors, etc)
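To make the unfixable part concrete, here's a minimal sketch (mine, not from the thread) of the text/bytes boundary Python 3 enforces:

    raw = b"caf\xc3\xa9"          # UTF-8 bytes, e.g. read from a socket
    text = raw.decode("utf-8")    # explicit decode at the boundary
    assert text == "café"
    assert text.encode("utf-8") == raw
    # In Python 3, text + raw raises TypeError; Python 2 would silently
    # coerce via ASCII and blow up later on non-ASCII data.

No backwards-compatible change to Python 2's str could have introduced that hard separation without breaking implicit coercions everywhere.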
The whole string/bytearray disaster could have been prevented if strings had always been UTF-8 encoded. That way strings and bytearrays could continue to be different views on the same data. The great divide between byte- and string-data was completely pointless, especially in 2008 when python3 was started, because by that time UTF-8 had already been firmly established for at least a decade (it would have been an excusable design fault only in the 1990s).
> by that time UTF-8 was already firmly established for at least a decade
For text files, maybe, but various APIs like the Windows API and the Java String API still use UTF-16.
UTF-8 dependence is also a major pain for many where the local character set conflicts with UTF-8. For example, there's still a lot of Japanese files out there in SJIS that need to be decoded accordingly. The country of Myanmar officially switched to unicode less than two years ago so if you still need to operate on older data, you're going to need to support their old character set.
UTF-8 as a fixed encoding only works if you manage to write mappers from and to alternative character sets for practically any language outside US English. Instead of breaking compatibility with most libraries, python3 would have broken compatibility with most libraries and a few countries instead.
Just like the rest of the world has to deal with three countries refusing to switch to metric, python3 needed to deal with countries refusing to switch to UTF8.
> UTF-8 as a fixed encoding only works if you manage to write mappers from and to alternative character sets for practically any language outside US English.
Huh? I've been using UTF-8 exclusively for string data for around 20 years in C and C++ and never had to deal with language specifics (this is also true for non-European languages; we need to deal with various East Asian languages, and Arabic for instance). You need to convert from and to operating-system-specific encodings when talking to OS APIs (like UTF-16 on Windows), but that's it (and this is not language specific; code pages are an "8-bit pseudo-ASCII" thing that's irrelevant when working with UTF encodings).
When dealing with "vintage" text files with older language-specific encodings, you need to know the encoding/codepage used in those files anyway, and do the conversion from and to UTF-8 while writing or reading such files. Those conversions shouldn't be hardwired into the "string class".
> UTF-8 as a fixed encoding only works if you manage to write mappers from and to alternative character sets for practically any language outside US English.
From a European perspective, this sounds very unlikely. Sure, you may have to deal with deprecated _encodings_, but I’d like to hear about mainstream languages with writing derived from the Latin alphabet that aren’t supported by UTF-8.
I don't buy the "discouragement" part there; if anything they could have made it mandatory or at least set it to UTF-8.
> For example, there's still a lot of Japanese files out there in SJIS that need to be decoded accordingly.
Yes, but you would have had to work on those cases anyway, and ASCII would have made it blow up anyway. But convert it to UTF-8/16 and it works.
EDIT: the reason is apparently that "(setdefaultencoding) will allow these to work for me, but won't necessarily work for people who don't use UTF-8. The default of ASCII ensures that assumptions of encoding are not baked into code"
Really. I can't explain my anger at how this is such an idiotic excuse. Yes, your program will fail if you use Latin-1 encoding, duh. Configure your environment correctly and it will work. Sounds like the kind of pedantry that made Guido quit over the walrus operator
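(For anyone who never saw it, the Python 2 hack that quoted rationale is arguing against looked roughly like this; site.py deletes setdefaultencoding at startup precisely to discourage it:

    # Python 2 only; widely cargo-culted, widely regretted
    import sys
    reload(sys)                      # resurrects the deleted function
    sys.setdefaultencoding("utf-8")  # implicit str<->unicode coercion now uses UTF-8
)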
Another typically ungrateful and entitled comment about opensource on HN. Color me surprised.
> Do these men think that time is free?
Do you think they have endless time to plan a migration with minute detail for every possible usecase?
Users have had about a decade to migrate their codebases and stop writing new projects in Python 2. Do you need another decade? Or are you personally going to take over the maintenance of the python2 runtime?
Does anybody actually pay the core dev team for support? Do you? Does your company? Have they been coordinating all these years with the core devs and are unhappy with the result they paid for? I kinda doubt it.
It would be really nice if people were just thankful for all the free stuff they got and built their enterprises on.
> Do you think they have endless time to plan a migration with minute detail for every possible usecase?
My point is that for these small changes from 2 to 3, there should have never been a migration to begin with.
It's not an accusation of lack of effort; it's an accusation of ignorance on their part.
The migration has not only cost everyone else time and money; it has cost them time and money that was better spent elsewhere.
It has been a net detriment to all parties, including them, because they severely underestimated the cost of rewriting software and dealing with the regressions it might lead to.
I will damned well call a man foolish for pointing a gun at his foot and getting shot in it because he underestimated how easily the trigger would go off by accident, instead of being thankful that he was willing to put in the effort to aim it at his foot.
You may be right or blinded by hindsight. Personally I'd rather see more breaking changes in languages like PHP that have a lot of baggage holding them back.
It’s the 21st century. Questions like “do these men think that time is free?” have no place in this century. Why not be accurate and say ‘people’? You chose to use a number of big words when simple ones would suffice. Why not take the same care in addressing people fairly?
For many non-native speakers, 'man' tends to be used as 'person' instead of 'male', probably because the translation of many common idioms involving 'man' uses a neutral word in their language.
For example, when translating between English and Romanian, 'man' often gets translated to 'om', which doesn't imply a gender in modern Romanian.
Even in English, 'man' is sometimes used without a gendered connotation. For example, if I say 'man is evil', I am unlikely to be referring to males, but rather people. Similarly, 'hey, man!' is not reserved for males.
I know that, but take a moment to read OP’s finer use of the language and more precise word choice. Every single word is perfectly precise, right down to the tone.
That was a pointlessly gendered comment and has no place in our industry. You can keep defending it but I’m not going to stop calling it out.
Is that so? I find I used various open phrases such as “Python developers” without going into the exact semantics of which ones, as I'm sure many objected to it, or “serious enterprises” without naming them, an incomplete list of programming languages, and so forth.
It was certainly an informal statement I made, not a formal specification.
There was nothing gendered about that statement, and most do not seem to have interpreted it as such, nor was it so intended by me.
You don’t have to get defensive, just be aware. The times have changed, the world is different, and our default language absolutely has to adapt.
Language adapts or dies. That’s how English got here. You can adapt too - it’s not as hard as you’re making it out to be. Heck, if you spent an eighth of the time thinking about inclusivity as you do about individual words, I wouldn’t have had to say this.
However, again, I’m glad I said something and regardless of what you claim, I’m not going to stop calling these kinds of grammatical monstrosities out. Language is important. Full stop.
My point is that, on an international site where English is only used as a means of communication, you should generally be more sensitive to cultural differences in the use of a language such as English. It is often used as a common language between people who don't speak English natively, and so idioms and nuances from their own languages seep into this common English.
The finer points about the semantics of a word such as 'man'/'men' and when it can be taken to refer to people unambiguously vs when it may accidentally imply you are talking about adult males are likely to be lost on a non-native speaker, especially if they come from a culture/language where this distinction and its implications are not subjects of general interest. Even if they are well-versed in the use of English in general.
So it's better to follow HN guidelines and assume the best intentions where meaning is unclear, instead of calling people out on their use of English.
Now, if you know for a fact that the GP is a native English speaker, and especially if you know that they are American, then what I'm saying is not very relevant.
I get your point completely and I’m glad you shared it. When I was in University, I worked with ESL (English as a second language) students and they were always really happy to hear about idiomatic quirks like that. I should rethink how I approach this online. I don’t want to be patronizing because lots of non-native speakers have better written English than I do, but I’ll think about it and find a new line.
“Men are evil.” can also refer to humans in general.
I just searched for the phrase and it's about half split between either meaning from context inference. Yet, the meaning pertaining to the species comes mostly from discussions by educated philosophers, and the other half is annoying identity politics arguments about why one's North American dating life is disappointing, — not exactly the audience I am ever interested in reaching, frankness be.
I'm simply disputing the claim that “Men are evil.” would be construed by English speakers to automatically refer to males.
The reason I'm not what you call “kind” is simply because this is how English works, and how it has always worked and how English speakers would interpret and parse that word.
I see no reason to avoid using a word in a perfectly acceptable, current, and historic use simply because you find that it has a different, secondary use. You call that “not being kind”. I call it “You don't own the English language any more than I do.”
You may speak as you will. I do not deny that the current usage of the word “man” has acquired a secondary meaning of “adult male human” as opposed to its historical meaning of “human”, and if you wish to use it as such, then I'm confident I can usually discriminate by context. I merely ask that I be allowed the same and speak as I will and use the word in its original meaning, which obviously still sees current use.
> You may speak as you will. I do not deny that the current usage of the word “man” has acquired a secondary meaning of “adult male human” as opposed to its historical meaning of “human”, and if you wish to use it as such, then I'm confident I can usually discriminate by context.
To be fair, while I consider your original wording to be pretty clear, this is wrong. According to Wikipedia, the word 'man' has adopted the meaning of 'adult male human' as its primary meaning starting with Middle English, when it displaced Old English 'wer'. There are still uses where it retains the much older meaning, but its primary meaning today is 'adult male human', and has been for a good few hundred years.
The way I look at it, the usage therein of the word “man” to specifically discriminate sex is very rare but definitely occurs. What does occur is the use of the word “man” to refer to a specific individual, which would typically be male, but in most cases where the word “man” is used indeterminately to refer to a class, it seems to be used without regard to sex.
Apart from that the most common usage seems to simply be vocatively as address, which is also gender neutral.
I would agree that it is rare, outside of compounds, to use the word “man” in a determinate sense for a female man, such as “that man over there” which would mostly be used in a military context, but in an indeterminate context to speak of “a man in general” or “men in general”, the most common usage from context seems to be sexless to this day.
Reading through the first 100 results, I see it mostly used to refer specifically to adult male individuals, or to "a man" meaning specifically an adult male ("would've flipped out if a weird man said some creepy remarks"). There are some uses where it may or may not be gender neutral ("you are a Spammier man than I" - may refer to a man or a woman, but it is probably used because the author is male; a woman might have written "a Spammier woman than I" instead, while also addressing both men and women).
There are also clear cases where "a man" is used to refer to "a human", such as "wheat growing taller than a man".
Rather more interestingly, if you instead search for "men", you'll see that it is used essentially exclusively to mean "adult males". The only exception I found was "and because the greed of a few men is such that they think it is necessary that they own everything", and even there I'm not sure.
> Reading through the first 100 results, I see it mostly used to refer specifically to adult male individuals, or to "a man" meaning specifically an adult male ("would've flipped out if a weird man said some creepy remarks"). There are some uses where it may or may not be gender neutral ("you are a Spammier man than I" - may refer to a man or a woman, but it is probably used because the author is male; a woman might have written "a Spammier woman than I" instead, while also addressing both men and women).
I disagree; the first uses of “man” in an indeterminate sense are these:
> down the economy, Here is the truth the republicans feel uncomfortable with a black man in the with house and a lot of voters are riding the republicans coat tail
> someday you might ask me to help you move. Or, to kill a man. # Leonard: I'll doubt he'll ask you to kill a man
> say, in 35 years of working I have almost always had at least one man who I felt " wrong " about. (the exception? Disney Studios!
> boyfriend, well husband, but either way would've flipped out if a weird man said some creepy remarks regarding me at a christmas party. To me this says
I have specifically included up till your reference, which was the first of an indeterminate usage of the word “man” that by implication is most likely gendered, whereas all the others are most likely not.
So there are three sexless ones before the first gendered one.
I would argue that the one about 'a black man in the white house' was in fact gendered, though it is somewhat debatable. It was referencing Barack Obama specifically. If there had been a black woman president, the phrase would definitely have been written to specifically say 'a black woman in the white house'. On the other hand, if it had been written before either a black man or a black woman had (tried to) become president, it may have still used 'man' in a genderless way.
> You may speak as you will. I do not deny that the current usage of the word “man” has acquired a secondary meaning of “adult male human” as opposed to its historical meaning of “human”, and if you wish to use it as such, then I'm confident I can usually discriminate by context.
That’s not even the same structure as ‘all men are evil.’ Instead what you wrote is gendered and thus completely inaccurate.
So again, you could have used ‘people’ to be respectful and inclusive but you’re choosing to stick with ‘man’ because that’s what you know.
That’s unkind. You know that this is an issue within our community but you are fully choosing to go against the norms because of ‘your language’?
I’m sorry but I thought we could have a conversation. This many replies in and I realize that you don’t actually have much sympathy, understanding or even basic caring.
> That’s not even the same structure as ‘all men are evil.’
Indeed it is not. I merely separately disagreed with that the statement “All men are evil.” would also by necessity be interpreted as such. Either can be, depending on context, but this is not such a context.
> Instead what you wrote is gendered and thus completely inaccurate.
You seem to be of the minority that has interpreted it as such. I would not quickly use votes for an argument except when they pertain to popular opinion, and this is a matter of which interpretation is more common.
I certainly didn't mean any gendered statement, and I also believe that most readers did not read any gender into it.
> So again, you could have used ‘people’ to be respectful and inclusive but you’re choosing to stick with ‘man’ because that’s what you know.
I could, and you could also change your language to avoid any and all possible ambiguities that would not be a problem in practice due to the power of contextual inference.
You seem to ask that this specific word be given special treatment above all others.
> That’s unkind. You know that this is an issue within our community but you are fully choosing to go against the norms because of ‘your language’?
Such as here, the word “our community” is quite vague. You used the word “our” which is ambiguous in English as it's unclear whether it includes the listener or not, and on top of that also what it includes.
I can however perfectly well infer from context that this is an “our” that includes the listener, and can make a reasonable guess to the extent of the “community” you refer to.
Finally, I do not know that it is “an issue” and I certainly do not know that there are “norms” about this. It very much seems that the majority sides with me on this issue given the votes, at least here. I do not believe I am going against any norms, not that I would consider an argumentum ad populum a strong one, but you were the one that raised it here.
> I’m sorry but I thought we could have a conversation. This many replies in and I realize that you don’t actually have much sympathy, understanding or even basic caring.
Well, frankness be, it seems from your language as though your default expectation is that your arbitrary whims, at least on this particular issue, should be accommodated, and that everyone who disagrees with you is unkind or lacks sympathy.
You call it a conversation, but it seems as though you started it from the assumption that you are right, and everyone who disagrees is wrong.
> Be better. It’s easy.
It is your opinion that this is better, indeed. Not everyone has to agree with you on that matter, and not everyone does.
Nobody ever has to agree with me and I’m proud to be a minority of one.
However, you’re a beautiful writer and beautiful writers can cause immeasurable pain. I’ll always speak out in case another minority of one feels pain but is too ??? to speak out.
Seriously, take good care. This has been a wonderful thread and again, you’re a really beautiful writer. :)
Not terribly pertinent, then. One is more likely to fall into conversations about mundane topics with uneducated people than to stumble upon existential conversations with educated philosophers, even though the latter might produce a large corpus.
One would also think that “man is evil” would be preferred by the erudite philosopher to the more ambiguous “men are evil”, although one can never overestimate the fondness that an educated person might have towards pedantry, frankly.
> Not terribly pertinent, then. One is more likely to fall into conversations about mundane topics with uneducated people than to stumble upon existential conversations with educated philosophers, even though the latter might produce a large corpus.
“Mundane people” is an entirely different segment than “raging identity politics aficionados complaining about their romantic life”.
The common man on the street will think nothing ill of the word being used as such, even when he be a blue collar construction worker, and will normally interpret it as intended.
I have never met such a raging identity politics aficionado in real life. I would assume not living in the U.S.A., where most of them seem to be centred, reduces my chances. But even there, it seems to be a rather small segment that is isolated to weblogs, as even newspaper columns do not seem to find it mainstream enough to dedicate segments to it.
I'd gander that if I were to find myself in New York and strike a conversation with a blue collar local and say something such as “A beautiful city isn't it? all these millions of men, working as an organized beehive.”, that he'll not interpret me wrongly or even think much of it.
>I'd gander that if I were to find myself in New York and strike a conversation with a blue collar local and say something such as “A beautiful city isn't it? all these millions of men, working as an organized beehive.”, that he'll not interpret me wrongly or even think much of it.
Actually I think there's a very good chance she'll object.
The problem is that in your mind, males are the "default" human, and using sexist language reinforces this. This is not a recent opinion confined to "raging identity politics aficionados" or "weblogs" - at this point it's the wrong side of history for the better part of half a century. Consider this piece of satire by Douglas Hofstadter, written in 1985, which substitutes racist language for sexist language in a precisely analogous way:
> Actually I think there's a very good chance she'll object.
If you mean to suggest that this position runs across gender lines, then I very much object and find that a naive, but common, assumption.
It reminds me of a Canadian act that sought to introduce the word “fisherwoman” as a sign of good faith to the female fishermen, but it revealed that, overwhelmingly, the fishermen, male or female, did not like this change and found the word to sound silly.
I have noticed no correlation with the gender as to what position one takes on this, as many females as males seem to either favor, or object to, innovations such as “chairwoman” or “councilwoman”.
> The problem is that in your mind, males are the "default" human
No, that would be in the mind of those that read the word “man” and must compulsively attach a gender to a statement containing it.
I've certainly noticed that those so interested in gender language police invariably seem incapable of abstractly thinking of a person without attaching a gender thereto.
> and using sexist language reinforces this
The sexist history is to take the word that has always simply meant “human” and give it a gendered, ageist meaning. — you reverse the history of the word here.
> at this point it's the wrong side of history for the better part of half a century.
What would you mean with “wrong side of history”? It is undeniable that the meaning of the word “man” to mean “human” is the original meaning of the word and that the secondary usage to mean “adult male human” is a later innovation.
No, you missed the point entirely. The point is that you pictured this "blue collar local" as a man, as evidenced by your use of the pronoun "he". Don't tell me that it's about the word "man" and its historical role to mean "human".
>I've certainly noticed that those so interested in gender language police invariably seem incapable of abstractly thinking of a person without attaching a gender thereto.
The irony. Next time say "they" instead of "he".
> No, you missed the point entirely. The point is that you pictured this "blue collar local" as a man, as evidenced by your use of the pronoun "he".
No I didn't. The pronoun “he” in English is also very often used to refer to an indeterminate, hypothetical person of irrelevant and unspecified sex.
I didn't picture him as anything in particular, given that I am partially aphantasic and never draw mental pictures of such scenarios.
> The irony. Next time say "they" instead of "he".
There is no irony here; you infer that he is male because of the pronoun and I find such usage to not be universal at all.
The pronoun “he” has a very long history in English for use with a hypothetical person, from which the listener is not meant to infer any particular gender. It is also true that some use the pronoun “they” in that case, but that is not a universal behavior and either may be encountered.
Use of “she” for such hypothetical persons has also seen recent use, and was probably innovated deliberately; some authors deliberately alternate both in even distribution.
All of this is how the English language is used by different speakers. I am not telling you which is better and how you should use it; I am telling you that if you are denying that all have currency, you are but certainly being willfully ignorant because you do not like the descriptive truth about how English is used by its speakers.
Well, the other perspective would be that you should stop using, for human beings in general, a word that has for so long been used only for the adult male members thereof.
My perspective is that context is usually sufficient and that this is not the only word in English that is used as such. I never find such passionate debates about the word “chess” for instance which can be used for every game that descended from the Indian game, the European variant specifically, or simply any exercise of great tactical planning.
Such interesting objectivity men are awarded when politics not be in play.
I agree with you and the sentiment but your tone really discredits the argument. Instead of putting others down it's better to assume good faith, and educate in an elevating way. This way feminism gets a good reputation.
>Rust editions, which as far as I can tell have been a complete success.
Rust's commitment to backward compatibility is certainly extremely commendable, but I don't think the language went through anything resembling the switch from Python 2 to Python 3 in terms of breakage.
Some of the changes in Python 3 are very fundamental. Imagine if Rust had shipped without String/str and they were added after the fact, would Rust manage to avoid splitting the ecosystem? That's an open question as far as I'm concerned.
And I also hope that we never find out. Rust's fundamentals have proven to be very solid so far. Having things like OsStrings (something missing from most programming languages, including Python 3 AFAIK) shows a great amount of foresight and understanding of the problem space. Contrast that with Go which seems very intent on completely ignoring 30 years of programming language evolution.
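(Worth noting, as a hedge on the "missing from Python 3" part: Python 3 does cover some of this ground via the surrogateescape error handler, even without a dedicated OsString type. A quick sketch, assuming a UTF-8 filesystem encoding:

    import os
    # arbitrary non-UTF-8 bytes in a filename survive the round-trip through str
    raw = b"caf\xe9"                  # latin-1 bytes, not valid UTF-8
    name = os.fsdecode(raw)           # decodes with errors="surrogateescape"
    assert os.fsencode(name) == raw   # lossless round-trip

It's not a separate type, though, so the text-vs-OS-string distinction is easy to lose.)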
Single codebase compatibility meaning that you can have python2 and python3 code in the same application? Isn't that significantly harder with an interpreted language or am I missing something?
It's a mostly solved problem with Racket. What they probably should have done was have python 2 code somehow declare itself as python 2 (Racket does this with #lang at the top of files). Then, just have a python 2 compatibility layer that works in two steps. First step is to compile/parse it into a similar form as python 3. Additionally, provide a small python 2 runtime which provides different versions of the functions which changed from 2 to 3. I think the two steps are important because some stuff is easier to solve via compilation like "print" while other stuff may be only possible at runtime like strings being unicode.
You would still have some differences which can't be papered over, but it would have made writing code that works in both python 2 and 3 much easier.
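For illustration, a tiny slice of what such a runtime shim might look like; these function names are hypothetical, not anyone's actual proposal:

    def iteritems(d):
        # py2's dict.iteritems() expressed on top of py3's items() view
        return iter(d.items())

    def py2_text(b, encoding="latin-1"):
        # py2 str was bytes; give ported code an explicit, lossless text view
        return b.decode(encoding) if isinstance(b, bytes) else b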
Single-codebase compatibility means that the same code can run under both Python 2 and Python 3.
The initial expectation was that translation tools would solve this problem, but it didn’t really work out that way. Adding language features and library shims to make it possible to write pidgin Python that would run under either version meant that you could migrate libraries and parts of large codebases one at a time until the whole thing ran under Python 3.
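A minimal example of such pidgin Python, runnable unmodified under both 2.7 and 3.x:

    from __future__ import print_function, unicode_literals

    try:
        from urllib.parse import urlparse   # Python 3 location
    except ImportError:
        from urlparse import urlparse       # Python 2 location

    print(urlparse("https://example.com/x").netloc)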
That's the main working solution I found: code works in both versions.
The problem is that it's way trickier than it should be. Had they made that relatively easy, the Python 2->3 transition would have had a much smoother "normal" upgrade process.
Python2 was poised to take over the world and be the next Java. 3 is losing ground to Node, Go, Rust, even Lua. 3 is a really fun and productive language to work in as long as you don't need to think about bytes.
It causes issues with running low overhead multithreaded code. I get by with the multiprocessing library, but then again I don't have a lot of threads (10 is the most I've ever needed), some people want to run hundreds of "light threads" depending on the type of programming that you are looking to do.
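The multiprocessing pattern being described is roughly this (a sketch; the worker count of 10 mirrors the comment):

    from multiprocessing import Pool

    def square(n):
        return n * n

    if __name__ == "__main__":
        # processes, not threads, so the GIL doesn't serialize the work
        with Pool(processes=10) as pool:
            print(pool.map(square, range(20)))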
I am still not sold on Rust editions and think in the long run they won't be much different from solutions on other platforms.
There will come the day when something will change semantics, or require different kinds of runtime support across editions, and then the headaches of how to link binary crates from different editions will start.
Editions to me appear only to work, provided everything is compiled from source code with the same compiler, aware of all editions that came into use.
Read the damn Editions RFC. The community agreed that no semantics or ABI breaking changes will land in Rust,
EVER.
This is not a lesson from Python, but from C++, which introduces breaking changes every single release, which are much smaller than Python's but still a pain in million-LOC code bases.
If that ever happens, it was agreed that the result would be a different language, with a different name.
That is, editions don’t have this problem because the FUD that you are trying to spread every single time this issue comes up cannot happen, by design.
Your argument “Rust editions don’t solve this problem because they don’t handle semantic or ABI changes” is false,
because in Rust there CANNOT be any semantics or ABI changes, and editions handle this situation just fine.
In the context of Python 2 vs 3 this argument makes even less sense, because editions allow combining libraries from different editions in a forward and backward compatible way without issues. Source: I work on multiple >500kLOC Rust code bases and one >1 million LOC, and they all use crates from all editions, and mix & match them, without any issues, doing LTO across them, using dynamic libraries, and all possible combinations of binaries across 4 major platforms.
The problem is there; the fact that you choose to ignore the expectations of C and C++ enterprise developers using binary libraries across language versions is another matter.
You call it FUD, I call it hand waving.
I want Rust to succeed and one day be available as an official Visual Studio installer language, but apparently unwelcome points of view are always dismissed as FUD and attacks.
When cargo does finally support binary crates, I will be glad to be proven wrong when linking 4 editions together into the same binary, just like I do with DLLs and COM today.
I think you misunderstood. C++ doesn't even have an official ABI, nevermind having a stable one. ABI changes can and do happen in many C++ implementations (and there is no compatibility across implementations - you can't link a library compiled with clang to one compiled with MSVC). You can't generally expect to link together libraries compiled with different major versions of the same toolchain, though this may be supported by some toolchains.
Instead, Rust has defined an ABI and has committed to never breaking that ABI. Editions support API-level changes, but the ABI won't change.
Rust has not defined an ABI. You're misunderstanding how the edition mechanism works. Each compiler knows how to turn source code of any given edition into the same internal IR, but that's purely internal. You still cannot compile an rlib with Rust compiler version 1.X and use it in a program compiled with Rust compiler version 1.Y. You can compile an rlib with Rust compiler version 1.Z that uses edition 2015 and use it in a program compiled with Rust compiler version 1.Z that uses edition 2018.
Rust actually supports multiple ABIs and you can pick which one to use.
The one I use for maximum portability is the C ABI defined in the ISO C standard and in the platform docs (e.g. the Itanium ABI, specified in the x86 psABI document on Linux).
I didn’t choose to ignore that. I compiled a Rust binary library 4 years ago; it still works without recompiling today on a dozen operating systems and toolchains that did not exist back then.
Try doing the same with C++.
I really believe you when you say that you are clueless about how to do this, since you don’t seem to have any idea about what you are talking about, and all your posts start with a disclaimer about that.
But at this point the only thing I have to tell you is RTFM. Doing this is easy. Hacker News isn't a “Rust for illiterates” support group. Go and read the book.
There's no ABI compatibility between different Rust compiler versions as it is, so I don't see how editions will break a compatibility that doesn't exist.
Python is the best example I can point to for how important it is to get the versioning and dependency management story right.
Python is one of the most "accessible" languages in terms of the actual programming experience, but making a project reproducible is a nightmare. There doesn't seem to be a real "right way" to manage dependencies, and getting a project running often starts with figuring out how the author decided to encapsulate or virtualize the environment their project runs in, since changing your system python for one project can break another.
I know it's an older language, so many lessons have been learned, but after working with Rust, or even NPM it seems amazing that developers tolerate this situation.
Re: Dependencies - There are at least two well known, well supported, rock solid ways of managing dependencies that are in very common use in the python deployment world.
1. Containers - That's it. You control everything in it.
2. virtualenv - Every environment comes with its own version of python and its own set of packages and versions of those packages. Add virtualenvwrapper and you can create/switch trivially between them.
Both pair with some combination of requirements.txt (which lets you dial in, with great precision, each of the libraries you need, and is trivially created in < 50msec with `pip freeze`).
It's been at least 2 years since I've run into an issue with python and dependencies that wasn't solved by either of those approaches.
I mean I think you can get a workable setup, it just seems really clunky to me. Like you cobble together some solution out of pip, docker, venv, and if you're jumping into someone else's project you better hope they documented it (wait I have to call `source` on which file?).
It's a far cry from being able to download any git repo and call `cargo build`/`cargo run` or `npm install`/`npm run` with confidence that it's just going to work.
I guess part of it depends on the teams/people you work with. I agree - that it would be nice in the python world if we all just agreed, "virtualenv + requirements.txt - Done." - Instead, as you noted, the python ecosystem has split into venv, pyenv, pipenv, Poetry, Conda, ....
Where I work - life is simple. You build your project in a virtualenv so it only has the libraries it needs, generate a clean requirements.txt, and check it into git - everyone can run it, and, because we have day-1 onboarding to teach everyone virtualenv/virtualenvwrapper, the first thing a person does before installing the application is mkvirtualenv.
I see a lot of references to Poetry here - but I've never been able to interest any of our senior developers into looking at it - they are pretty happy with our existing system.
To be honest, I do expect to find a project provided with a pipenv/poetry setup, just like I would expect a haskell codebase to have a cabal/stack setup, and a java codebase to have a maven/gradle setup.
It is true that most recent languages ship with these from day 1, but ecosystems rarely lack this kind of stuff. I mean, even my vim has a package manager nowadays.
As for whether you want to salvage old code that isn't provided with package management, it's up to you. But you would have this kind of problem with any old, unmaintained codebase.
> It's been at least 2 years since I've run into a issue with python and dependencies that wasn't solved by both of those approaches.
The problem is that this leaves us with 2 years of documentation that's reliable and addresses easily-solved problems, and 28 years of everything else that will confuse anyone new to the language. Not ideal when accessibility is one of the language's primary selling points.
One of the major problems with fixing design choices or odd behaviours in software is that all of the old threads and posts don't just disappear, and people are now going to be led down paths that are not only so convoluted and ridiculous that they were eventually changed, but often paths that don't even work any more.
It's very very tough to fix that problem retroactively.
Forget containers. The actual "right" way to manage python dependencies in a project is Poetry. It's very solid and super reliable and uses virtualenvs internally.
Major Payne: Want me to show you a little trick to take your mind off that arm?
[Marine nods and Payne grabs the private's pinky finger]
Major Payne: Now you might feel a little pressure.
[Major Payne breaks the Marine's pinky]
Marine Private: AUGGGGH! My finger, my finger!
Major Payne: Works every time.
====
That's kind of how I feel about Docker. Before, you had a problem. With Docker, you have a new, bigger problem (and most of your old problem hasn't gone away; it's just been masked for a while).
On a more serious note, most uses of Docker that I've seen push problems back, and have accumulating technical debt (with interest).
* Robust systems shouldn't be tied to pinned versions. If your code works with PostgreSQL 9.6.19, and doesn't work with 9.6.20 or 9.6.18, that's usually the sign of something going very, very wrong.
* In particular, robust systems should always work with the latest versions of libraries. In most cases, they should work with stock versions of libraries too (whatever comes with Ubuntu LTS, Fedora, or similar). It's okay if you have one or two dependencies in a system beyond that, but if it's a messy web, that's a sign of something going very, very wrong.
* Even if that's not happening, as much as I appreciate having decoupled, independent teams, your whole system should work with the same versions of tools and libraries. If one microservice only works with PostgreSQL 11.10, and another with 12.7, that's a sign of something having gone way off the rails.
These aren't hard-and-fast rules -- exceptional circumstances come up (e.g. if you're porting Python 2->Python 3, everything might not land at the same time) -- but these should be rare enough to be individually approved (and usually NOT approved) by your chief architect/architecture council/CTO/however you structure this thing.
For the most part, I've seen Docker act as an enabler of bad practices:
* Each developer can have an identical install, so version dependencies creep in
* Each team has their own container, and it's easy for versions and technologies to diverge
* With per-team setups, you end up with an uncontrollable security perimeter, since you need to apply patches to a half-dozen different versions of the same library (or worse, libraries performing the same function)
The docker/microservices/etc. mode of operating gives a huge short-term productivity boost, but I haven't actually seen a case on teams I've been on where the benefits outweigh the long-term costs. That's not to say they don't exist, but they're in the minority.
For the most part, I use Python virtual environments and similar, but by the time you hit docker, I back away.
> What are the issues with using docker to solve this problem ?
Docker alone doesn't solve the problem and neither does pip unless you take extra steps.
Here's a common use case to demonstrate the issue:
I open source a web app written in Flask and push it to GitHub today with a requirements.txt file that only has top level dependencies (such as Flask, SQLAlchemy, etc.) included, all pinned down to their exact patch version.
You come in 3 months from now and clone the project and run docker-compose build.
At this point in time you're going to get different versions than I had 3 months ago for many sub-dependencies. This could result in broken builds. This happened multiple times with Celery and its sub-dependency Vine, and with Flask and its sub-dependency Werkzeug.
So the answer is simple, right? Just pip freeze your requirements.txt file. That works, but now you have 100 dependencies in this file when really only about 8 of them are top level dependencies. It becomes a nightmare to maintain that as a human. You basically need to become a human dependency resolution machine, tracing every pinned package back to the top-level dependency that pulled it in.
Fortunately pip has an answer to this with the -c flag but for such a big problem it's not very well documented or talked about.
It is a solvable problem though: you can keep a separate lock file with pip without using any external tools, and the solution works with and without Docker. I have an example of it in this Docker Flask example repo https://github.com/nickjj/docker-flask-example#updating-depe..., but it'll work without Docker too.
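The shape of that setup, with illustrative (not recommended) package names and versions:

    # requirements.txt -- only the ~8 top-level dependencies, hand-maintained
    flask==1.1.2
    celery==5.0.5

    # constraints.txt -- the full `pip freeze` output, which pins
    # sub-dependencies like vine and werkzeug without promoting them

    # install both together:
    #   pip install -r requirements.txt -c constraints.txt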
Throwing more technology on a problem means more complexity and more things that can go wrong. Doing it once or twice is fine, but the complexity increases exponentially.
Also, Docker is not a universal and secure solution. It works great as an "universal server executable format and/or programming environment" on Linux, but less so on Windows, macOS and especially FreeBSD.
Imagine you get a new phone with a new phone number, and you have a problem because some people still contact you on your old number. So instead of getting everyone to use the same number, you hire someone to take both of your phones and forward you the messages from each one.
Yes, at some level it solves your problem, but it adds a lot of complexity which doesn't need to exist. You also now depend on someone in the middle who takes effort to manage, and might not do exactly what you want.
If you are developing software that you run on your own servers, none. It works fine.
If you are developing open source software that people can install on their machines, it's a terrible solution. In that case you should package it correctly and distribute it via pip, so people can easily install it on their systems.
> it struck me how ridiculous it is that open source languages have to put up with this. Clearly "major" numbers are insufficient, the only real answer is to rename the entire freaking language when you make incompatible changes to it.
Perl and Python are the only two examples of this to my knowledge: most open source languages do fine introducing breaking changes in major versions.
The question is why Perl and Python had such problems while for example NodeJS, PHP (comparable webserver scripting languages) have had no such issues.
I wonder is it anything to do with the areas they're used in (Python & Perl are popular local/cli scripting languages in addition to web—has Bash had similar version woes?), or is it purely that the changes they made were more significantly breaking than others'? That's probably true of Perl6/Raku at least.
Perl never had a problem (at least with the Perl 5 and Perl 6 distinction) besides marketing. No one has ever been confused why their perl script doesn't work because their perl was version 6 instead of 5, and they never will be. No one has ever had to worry about writing a perl program to be compatible with both, unlike python, where you never know if "/usr/bin/python" is 2 or 3.
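Which is why so many dual-era python scripts carried a guard like this near the top (a sketch; it runs under either interpreter):

    import sys
    if sys.version_info[0] < 3:
        sys.stderr.write("This script requires Python 3, got %s\n"
                         % sys.version.split()[0])
        sys.exit(1)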
Marketing is important. I did have people ask me if there was any point to learning Perl 5 when Perl 6 was just around the corner, and people buying the butterfly book instead of the llama book for the same reason.
Then the Python people made the same mistake. Beginners learned Python 3 and then had trouble with App Engine or some other platform. The most popular question in Python forums for many years was whether someone should learn 2 or 3. Some probably just went with Go instead.
In some fields, Python is embedded as an interpreter into major binary platforms or commercial apps.
So in many of these cases the end user doesn't have a choice to use Python 3 until it's on offer.
And the vendor has usually integrated Python at a binary level into C code; that's why they provide a Python API.
The answer could even be "Red Hat Enterprise Linux 6"; consider that Python 2 is the default in this OS, which ended official support only at the end of last year. Many enterprises _chose_ this platform for its longevity, along with 3rd party vendors of commercial software.
Likely a key factor is leaving support in place for the old version. If they had immediately deprecated python 2, users would have quickly updated their packages to a supported version.
Java and PHP I could see as being similar to the Python as they all shared the pain of widespread adoption by vendors that were reluctant to update (Java: enterprise organisational internal, PHP4: bad cheap webhosts, Python2: everyone?).
With Node 0.12 though, I don't see it. io.js was a pretty momentary internal political issue that many users didn't even register on their radars. It certainly didn't have any long-lived impact on version adoption within the community.
And: the important point, they've all had very successful major bumps since. So even if there are pains, they can be overcome. There's nothing fundamentally un-doable about major version releases for open-source languages.
A better comparison would be to Typescript which is a breaking change from JS but is branded differently. I'd love native typescript in the browser but browser vendors aren't going to go pushing a massive breaking change like that.
PHP6, similar to ES4/ESX, was a language version that ended up in spec hell. Nothing to do with release woes (neither was ever released), so I don't see any real relevance to this discussion.
PHP7 on the other hand was a pretty seamless migration from PHP5, and PHP8 looks likely to be similar.
The general point is that the original commenter was posing this as some fundamental issue of open-source languages: clearly there are plenty of examples of success, so it can't be.
> This is awesome in terms of avoiding all of the weird things when a person typed pip rather than pip3 and module didn't seem to get installed anywhere
This won't change that at all; pip/pip3 is a distro packaging thing, and any distro that packages legacy python2 as "python" and Python 3 as "python3" will probably continue packaging legacy pip-for-python2 as "pip" and pip-for-python3 as "pip3".
I like using `<python_executable> -m pip` to avoid all ambiguity about what python version I'm running pip with/installing things for. Usually `python3 -m pip`, or `python3.8 -m pip`.
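You can also sanity-check which interpreter a given pip belongs to, since the output names both (the path and versions here are illustrative):

    python3.8 -m pip --version
    # pip 20.3.4 from /usr/lib/python3.8/site-packages/pip (python 3.8)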
I like using pyenv to manage Python versions which will then always symlink "pip" and "python" to whichever version is my system default, directory tree default, shell default, virtualenv etc.
The PEP [0] has been revised since then, and the current recommendation regarding /usr/bin/python is "equivalent to python2 OR equivalent to python3 OR not available at all OR configurable by user/sysadmin".
`pip` is still going to work - all the weird mistakes are going to keep happening. It's just that no new pip releases will support 2.7 - existing installs will keep working.
For the record, Perl6 has been officially renamed to Raku.[1]
IIRC the language known as "Perl", version 5, when it gets to the point where it bumps its major version, will skip 6 and go right to Perl 7.
Yes, but it took so many years of confusing people not deeply familiar with the situation, who assumed perl6 would be a better perl5 (as perl5 was a better perl4, perl4 a better perl3, etc.), while in reality it was a very different beast.
It would've taken 45 seconds of reading the Wikipedia article to figure out that Perl 5 and Perl 6 are different languages. Java went from 1.4 to 5. With standardized languages like C/C++ you have to keep track of not only the toolchain version but also the standard version (like C99, C++11, etc.). Programmers routinely keep track of these just fine.
Renaming to a completely different name is not necessary; everyone understands that a major version breaks compatibility. Python3 is still very close to Python2 both in syntax and spirit.
But there was a sort of broken promise from the Python creators: Python3 was almost like Python2, but every library author had to review and repackage their libraries anyway.
At that point, Python 3 should have been unambiguously incompatible with Python 2:
- the only allowed file extension should be py3
- all environment variables should have been duplicated with a "3" (it shouldn't read or modify Python 2 env vars)
- all installation folders should have been duplicated with "3"
- all tools like pip should be suffixed with "3"
- and most importantly, it shouldn't try to optimistically run previous Python2 code or previous v2 tools
The mistake was that you could use "pip" or "python" in bash scripts/shell, and not know if python2 or python3 were going to run.
Still today, you can run "python" in a recent version of Ubuntu or Fedora, and it will be Python 3. Only "python3" should be possible. Distros are repeating the same mistake as with Python 2, and we will struggle again with Python 4, if there ever is a Python 4.
Many headaches wouldn't have happened if "python" was reserved to Python 2.
Pro tip to language and distro maintainers: make the major version part of the language name and executable, from version 1.
That would have made cross-version codebases impossible, and that's what ultimately allowed migrating. One-shot migrations were not convenient, or successful, or even effectively feasible for complex enough projects.
What allowed the migration was community experiment in cross-version sources, as well as reintroduction of "compatibility" features into Python 3.
I don't buy this argument at all. Any sufficiently complex Python 2 project does not work with Python 3 without modification; there is _no_ cross-version compatibility.
And if you wanted to try that yourself anyway, changing all .py to .py3 in a directory is one unix command... It could easily have been part of a 2to3 tool.
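For the record, that "one unix command" is roughly this (a quick sketch; it doesn't recurse into subdirectories):

    for f in *.py; do mv "$f" "${f%.py}.py3"; done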
> I don't buy this argument at all. Any sufficiently complex Python 2 project does not work with Python 3 without modification; there is _no_ cross-version compatibility.
I was personally and solely responsible for migrating a >250kLOC project from Python 2 to Python 3; doing so without cross-version compatibility would not have been feasible. We literally picked the earliest P3 version we decided to support based on cross-compatibility features.
The issue is third-party libraries: they need to simultaneously support both versions of Python during the transition period. If you unilaterally migrate to v3, you break lots of existing projects. On the other hand, if you stay v2-only, you're holding up your dependent projects' migration efforts.
I understand the benefit in theory, but in practice you had two options:
- the codebase had to be modified to work on both versions at the same time
- you had to maintain two versions in different branches for a while
Is there any data to show which option was most often chosen amongst all PyPI packages? I suspect that the second option was more popular for the most important packages of the ecosystem.
The first option was by far the most popular and was used for years (including pip), only recently packages started dropping python 2 support. I'm not even aware of any packages which went with the second.
> I suspect that the second option was more popular for the most important packages of the ecosystem
You're 100% wrong. The few packages which decided on option 2 early on (e.g. dateutil) ended up having to roll back to option 1 because it was such a pain in the ass, both for the maintainer and for downstream users. The migration only really started happening once 2.7 dropped, projects like Six[0] started appearing, and the community started ignoring 2to3 and building up experience with cross-version projects and idioms (e.g. [1], [2])
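For anyone who never lived through it, the cross-version idioms looked something like this (a minimal sketch using six; the dict is just for illustration):

    from __future__ import print_function
    import six

    d = {"a": 1, "b": 2}
    for key, value in six.iteritems(d):    # dict.iteritems() on 2, dict.items() on 3
        print(key, value)

    if isinstance(key, six.string_types):  # str/unicode on 2, str on 3
        print("it's text")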
That is not an elegant solution. That way you would forever have 3 in the name, even for future versions of Python (e.g. python4, which would not break compatibility with old source code, so a .py4 extension wouldn't make sense).
In your shell you don't run "java8" or "java11", you just run "java", and then it's a matter of which version of the JDK you have in your PATH. The same with all other language interpreters and compilers: you don't run gcc9 or node14. Why do something different for python?
Really, the mistake of python3 was to break compatibility with past programs. A lot of changes could have been done more gradually: in the first version require a __future__ import, then gradually deprecate the old features, and then remove them completely, making the new way the default and no longer requiring the __future__ import. And I think that will be the way for the next python versions, so in theory we will never have the same problem again.
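That opt-in mechanism did exist; a sketch of what the gradual path looks like in practice:

    # Python 2.7 code opting in to Python 3 semantics, feature by feature
    from __future__ import print_function, division, unicode_literals

    print(1 / 2)   # 0.5 even on Python 2, thanks to "division"
    s = "hello"    # a unicode string on Python 2, thanks to "unicode_literals"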
Also, to me it was an error for distros to continue packaging python2 as python. Other distributions, like ArchLinux, switched everything to python3 as the default a long time ago; it's mostly Ubuntu that continued to ship python2 as python, which led a lot of programmers to rely on it. It would make sense for the unversioned command to refer to the latest version, not the legacy one.
Why try to imagine how it should have worked, when it shouldn't have happened in the first place? Python 3 was invented because Guido felt it looked nice, and the economic value of all the labor that went into pleasing him is likely equal to that of a small country.
OTOH PHP managed to migrate 4→5→7, and knew when to back out of v6. JS managed to migrate to ES6.
I think Python's failure is unique. It needlessly broke too much back-compat at once, provided too few benefits to make up for it, and let everyone drag their feet on the upgrade for a decade.
"We have nearly endless money and therefore we can make any new feature by treating older versions of our langage as bytecode and write transpilers for it".
Regarding renaming the framework: it did and did not work with .NET Framework and .NET Core. There are a lot of issues still there (and their branding decision now is to go ahead with ".NET"... which is also not good).
However, I do not disagree. Renaming is the right thing. A version number is easily omitted.
Perl 6 was never meant to 'kill' perl 5. It's a completely separate language, has been from the beginning, and it's been renamed Raku recently. Unlike with python, perl devs realized even 20 years ago that there was too much legacy perl 5 code for 'replacement' to be practical. The result is that perl 5 is very backwards compatible, and Raku is at the very least an interesting language worthy of some attention.
Compare this to Python 2/3. It's basically an incompatible fork that doesn't add enough for many projects to consider upgrading, and adds the overhead of having to worry about two versions. All it really accomplished was guaranteeing that "Python 4" will never, EVER be a thing.
Perl 6 *was* to be the next version of Perl. That was the intent when the whole effort started in 2000. That it didn't turn out that way has many causes. Enough has been said about that already. But to say that it was a completely separate language from the beginning is historically incorrect in my opinion.
I disagree. Perl 6 changed fundamental low-level syntax and semantics in the language. The 4 -> 5 transition in contrast was mostly syntax compatible, and in fact Perl 4 scripts are out there in the wild running on the Perl 5 interpreter just fine. Yes, it should've been called "Larry's next crazy experiment language" from the start.
The most that was ever promised about the 5->6 transition was that there'd be ways of using 5 modules in 6 (which more or less works for 'pure-perl' 5 modules, within reason).
> It is our belief that if Perl culture is designed right, Perl will be able to evolve into the language we need 20 years from now. It’s also our belief that only a radical rethinking of both the Perl language and its implementation can energize the community in the long run.
> (which more or less works for 'pure-perl' 5 modules, within reason)
Are you misunderstanding what has been achieved?
Using the "Best First" view of replies to the 2008 PerlMonks question What defines "Pure Perl"?[2]
> "Pure Perl" refers to not using the C-extensions ("XS") and thus, not requiring a working C compiler setup.
Inline::Perl5 lets Raku code use Perl modules as if they were Raku modules, XS or pure perl.[3]
Not all of course. Some make no sense whatsoever in Raku (eg source filters). Some don't yet work but could if someone cared to deal with issues that arise. But if you're thinking that Rakudo only imports pure perl modules for the above definition of "pure perl", please know that Rakudo is light years ahead of that due to niner's amazing work. And if you mean some other definition of "pure perl" it would help me if you shared it. :)
Agree that it did change low level syntax. What was promised initially was that a "use v5" in a lexical scope, would switch to a Perl 5 compatible parser. This project was started, but became pre-empted by the Inline::Perl5 module, which now allows you to use 99.9% of CPAN's Perl 5 modules (even with XS components) seamlessly in Raku. And yes, that is stable enough to be used in production.
The more accurate thing would be to say it ended up killing all perl, which is something the Python 2/3 transition hasn't done to Python, warts and all.
What difference could that make? There are lots of languages shipped in various distributions that are no longer 'living' languages in the sense that they have little mindshare and little new development is done in them. Perl wasn't one of those languages and now it is. If you want to comment on that or perhaps dispute it, sure. But the 'delete perl and see what happens' thing is beside-the-point nitpickery.
Perl has "little mindshare" and "little new development" in the same way as Bash. It's there, people who understand unix will reach for it when it's appropriate and it's as indispensable.
I want to know which distro removed perl because that's quite a drastic step and am interested in studying it. Sorry if that offends you.
It doesn't offend me, it just isn't related to the point I was making. You replied to me as if that sort of test is relevant and I don't think it is. The analogy to bash is similarly not suitable for this kind of argument - bash has never had any ambition to be a general purpose programming language, perl very much did. Unfortunately for perl, perl's own efforts in that regard pretty much eliminated the possibility of that ever happening. Python's trajectory hasn't been that, even with the travails of the 2/3 transition.
'look at how perl did things!' is just a really strange approach in a discussion about the Python 2/3 thing. That operation was successful, the patient died.
The pip/pip3 thing is partly the fault of distros and partly due to lazy training and insufficient tooling for python.
The only distro I'm aware of which tries to protect users shooting themselves in the foot with pip is Gentoo. Most of the others will happily let you "sudo pip install" stuff and lead people to think that's the correct way to do things.
Unfortunately pipx has come too late. Pipx provides a real way for users to install arbitrary python tools, but too many docs out there tell users to use pip. Then you've got all the users who want to "play around" with python and install libraries. Even things like jupyter have crap support for virtualenvs and make it far too easy for users to have all their projects in a single env. It's a mess.
Regular users should never have been exposed to pip2/pip3. They should never have even been interacting with the OS python interpreter. Pip should only exist in a "global" context to support bootstrapping virtualenvs and nothing else. Poetry does a lot of this right.
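Concretely, the workflow regular users should have been taught from day one looks something like this (pipx and the venv module both ship today; httpie is just an example tool):

    # tools get their own isolated environments
    pipx install httpie

    # per-project libraries go into a virtualenv, never the OS python
    python3 -m venv .venv
    .venv/bin/pip install requests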
I'm in an awkward chain of dependencies. You see, in my industry there are very few players, and the current version of their software depends on much functionality and rather exacting specifications from products which are made using ArcGIS 10.2.1. Not 10.2. Not 10.2.2.
This is a very conservative hunk of software because this is not a "move fast and break things" industry. This is a "people die when we screw up" industry. So they haven't moved from ArcGIS 10.2.1 for our version. Change is coming but they have to be careful.
So ArcGIS 10.2.1 comes with a python install of 2.7.2 (I am pretty sure it is .2). And you are very strongly advised NOT TO UPGRADE THAT in big bold letters on the relevant help pages. So I need to use 2.7.2 to manipulate some stuff out of ArcGIS 10.2.1 to do "stuff."
I work for Esri, but I specifically work on ArcGIS Pro which has shipped with python 3.x for quite a while. (since before I joined the company anyway)
ArcGIS 10.2.1 comes with 2.7.5, [1] and is very, very old. It was released in Jan 2014 and hit EOL in 2019. [2]
All not-EOL versions of ArcGIS Server support python 3.x. But if you're using ArcMap, upgrading to a later version will not move you to python 3.x: you'll have to migrate to ArcGIS Pro for that. (note for the unaware: this is not a "well just buy the more expensive thing" post. ArcMap and ArcGIS Pro share the same license.) I recommend migrating to Pro for this and other reasons, but I understand that a lot of orgs don't have that on their roadmap.
Out of curiosity, is there a technical reason why your org isn't using a supported version, or is it an "ain't broke" situation? I'd ask about migrating from ArcMap to Pro, but if you're on 10.2.1... yeah.
It can be "very, very old" in its context. If a new release comes out every year, support duration is 5 years and you're 2 years past EOL, I'd agree that the software can be called "very, very old".
And that's where things go off the rails. A release that is only supported for five years is a hobby project.
The kind of software projects that make the world go round continue past the lives of their original authors and can easily span decades. 5 years is just enough for the original shake-out.
I mean, I am sure we would do that if any of the user companies wanted to pay the price for it. None of them do. You'd probably multiply the price by at least 10 (I would say add at least +1.5 to the multiplier for every year of support, and that is probably highly conservative).
Totally agreed, but check upthread for a comment that literally says this tool is unsuitable for work where lives depend on it. And for telcos that's pretty much a given.
Yes, ok, thanks, the Python foundation will immediately spend a lot of time and money to support releases for 2 decades. How could they not know!? /s
That out of the way, you're right that some niches require much longer cycles, but that's the big big biiiig advantage of FOSS: downstream can maintain it for as long as they wish. As you said, things got shaken out by the community basically for free, and if some serious software is so, so serious that upgrading and retesting/certifying is somehow more expensive than trying to airgap an EOLed pile of libs (while at the same time it needs support), then the stakeholders can do it.
No sarcasm needed, Red Hat happily invests time and money in supporting it until 2024.
That might have been an easy way to provide upstream releases too, had the Python maintainers not been intent on using deprecations as an instrument to get the community moving.
That strategy doesn't work very well, however, as we saw when TLS 1.2 was held back from Python 2.
> A release that is only supported for five years is a hobby project.
I believe what you are trying to say is that you chose the wrong tool for the job. It is really condescending to lash out at other projects like that just because they don't share the same needs as you. They owe you nothing. Python is free and open source; just fork the damn language spec and support it yourself.
The discussion is about a specific version of ArcGIS that only supports a version of python that reached its end of life, so I beg to differ. My point stands: if you had the need for a really long supported tech, in the order of decades, choosing a tech based on a dependency that had a clear life span of 5 years or so was a poor choice. If, still, you need that, you can support it yourself, by forking your own version, but python owes ArcGIS nothing.
A finished software product that gets regular updates is old by the time it turns 7, I'd agree on that.
Dependencies on libraries are a different story; there are only so many ways you can implement a piece of functionality, and some of these happen to be decades old!
Yes, the technical reason was outlined above. Our Big Software Vendor in the field depends on specifically the products of 10.2.1. They're extraordinarily clear on that.
So I assume that the geodatabases, the address locators, and such are all the sticking points in terms of the "products" of ArcMap. I suspect that sometimes their software dips into that specific version of arcpy, as well.
Until the vendor hops, we cannot hop. It's a problem that will be eventually fixed, but I bring it up because many people here at HN think of greenfield development often, and also in industries where tons of iteration are met with approval.
That sounds very familiar. My company deployed a patch so that their webmaps continued working after Flash reached end of life. We still use 10.2.1 for most things. Thankfully I have ArcGIS Pro on my laptop. I had to move heaven and earth to get it.
But that’s beyond conservative, Python3 isn’t new, it’s more than a decade old at this point.
I can understand that in some industries you don't easily upgrade, because of testing and certifications. It's however also possible to put off upgrades for so long that it becomes reckless. If you’re stuck on Python 2.7 in 2021, and your code could kill people, you’re putting people in danger by being reckless.
I've also seen people claiming the need for a stable platform as a reason not to upgrade. Well, running Windows Server 2008 in 2021 you indeed have a "stable" platform, but it should come as no surprise when modern protocols and encryption aren't working.
> If you’re stuck on Python 2.7 in 2021, and your code could kill people, you’re putting people in danger by being reckless.
It's comments like this that make me very sad for the software industry as a whole.
No, he's not being reckless and he's not putting people in danger. The people who make breaking changes to languages, APIs, operating systems and libraries are the ones that put people in danger and that cause others to stop upgrading because they quite reasonably fear breaking changes that could have far reaching consequences.
The price of backwards incompatibility is poorly understood.
Breaking changes to languages, APIs, operating systems and libraries are not putting lives at risk. All of the above have expressed their compatibility guarantees in clear terms. No one advertises Python as suitable for life-critical applications. It's the responsibility of the person who chooses to use it for life-critical applications to understand its change lifecycle and factor that into their evaluation.
While I agree that in general we should care more about backwards compatibility, don't put the blame on the maintainers for bad choices in life-critical domains. Most domains are not life-critical, and it's entirely reasonable to make and meet a set of compatibility guarantees which applies to the domains you're working in.
> No one advertises Python as suitable for life-critical applications.
I think you just relegated python to the status of 'toy language'.
But that's the wrong attitude. Just like everything will sooner or later be connected to the internet, which is why it matters that it is secure, with 'software is eating the world' pretty much the mantra of HN, you should realize that any piece of software, no matter how trivial, has the potential to affect lives, sometimes in the most direct way, with the ultimate price of someone else's life as the consequence of our collective failure.
And that's because of the very nature of software: it gets re-used all the time. One person's application is another person's library, one person's application is another person's service, and so on.
Leaky abstractions are bad enough, leaky broken abstractions can sooner or later turn out to be dangerous.
Of course whoever deploys the software has the responsibility to check it for suitability. But in principle any language, OS or library has the potential to be applied in ways the original author did not foresee and if that potential can extend to life critical or life affecting purposes then that should come with a similar standard of care.
How many machine learning systems are written in Python? How many of those make decisions every day regarding people's lives, some of those life critical? Are you seriously suggesting that all those people made the wrong decision in picking the language the application is written in?
If so then you are most likely at odds about this with just about every developer out there who - rather than arguing with their manager about whether language 'x', 'y' or 'z' is suitable for the job will simply be told what language is on the menu and to do the job in that one based on availability of programmers and whatever is 'hip' at the moment. These decisions are not necessarily made in the most rational manner and whether you agree or disagree with that in principle it is the reality that we live in.
And so 'toy' languages such as Python, PHP, Perl, Visual Basic, FileMaker and so on are all going to be used for mission critical stuff, whether you add a disclaimer or not.
Better be aware of that before you throw your brainchild out into the world: you own it.
If the actual expectation of every piece of software published is that the author is responsible for every harebrained choice made with it, then nothing would ever get published. That simply isn't practical.
My responsibility is to meet whatever guarantees I make when I publish a tool, and to open source the code if I plan to stop maintaining it such that it no longer meets the original guarantees.
If you want to lay at my feet the consequences of everyone who chooses to use it, whether it's an individual dev or some executive mandating its use to their whole org, I think you need to reconsider your perspective.
The general attitude to software development is why we really should not be calling it software engineering. More like software plumbing. And even plumbers tend to carry liability insurance.
If you have a plumber fix your toilet, and multiple years later it breaks, and you go to them expecting their insurance to pay out for a replacement, how do you think that conversation will go?
If you hire an engineer to build you a bridge, and years later you are jealous that every other city has two-level bridges, do you imagine they will rebuild it for you for free?
Python 3 was always going to be backwards incompatible. That was the point. It was also why you were given a decade to migrate.
This is the same stupidity you see in hospitals, with medical equipment running Windows 7 or older. The companies knew when they sold the equipment that it would have a service life beyond the EOL date of the OS, yet made no plans to provide updates. The customers were also given all the details, could have asked about the support lifecycle before buying, and apparently also didn't care.
The sad part about the software industry is how little some people care about providing their customers with the updates and support plan they should be entitled to.
I think our industry has too much sympathy for companies running obsolete software for decades because they can't be bothered to spend on keeping their software current and up to date. If it's mission critical and obsolete, you messed up as a company. If you allowed that to fester for a decade plus, you've been negligent.
Now a lot of python stuff isn't really that mission or safety critical. So, a hopelessly outdated red hat server that hasn't been updated in years running some software developed over a decade ago may be convenient to keep up and running but if it's mission critical you could have acted by now. The good news is, you haven't updated in years in any case so why start caring about that now? Either way, it's your problem; own it and deal with it or not.
The source and binaries are still there and will still work a decade down the road. A lot of banks do just fine running software written in languages that stopped being fashionable half a century ago, where the developers have long retired/died, suppliers have long disappeared. Running their stuff on (typically emulated by now) hardware and OS that has gone out of production decades before the end of last century. Same thing. If push comes to shove, you can pay people to fix things for you. Banks do this all the time. But expect to pay for the privilege. Or just fix it properly, finally.
Saying that it's been more than a decade is not exactly fair. Yes, the language was out then, but it took a while for even basic practices like 2-and-3-compatible code bases to emerge, and for the most popular libraries to support 3. Even 5 years ago, maybe even 3 years ago, the "is it compatible with py3 yet" page still had some non-trivial gaps.
So it’s not like it has been feasible for most projects to migrate for a full decade, it’s only been a few years.
Well, the good news is, Python 2.7.2 still works, as does pip 20.
If you haven't already, I'd recommend setting up a PyPI proxy and backing it up. I don't know what PyPI's policy is around keeping old packages or old versions of the package manager available, but I have learned the hard way from working on other legacy software stacks that things have a way of disappearing from the Internet when you're not looking.
At my current company we're in an awkward situation with python 2.7.1 and 2.7.5. I've found that an easy solution to circular dependencies caused by the new resolver is installing pip==20.2.4 and an older setuptools (I don't have the version memorized) before doing anything.
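Roughly like this, presumably (the setuptools bound is my assumption: setuptools 45 was the first release to drop Python 2, so anything below that should be safe):

    python -m pip install 'pip==20.2.4' 'setuptools<45'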
That's a version of python that does not send SNI when opening TLS connections, so sites that don't have a dedicated IP address for the domain cannot support TLS with such clients, and most sites that do have a dedicated IP address won't support TLS "just because" (no SNI - get self signed dummy cert or closed connection).
So the first thing I would have to do with such Python is monkeypatch ssl.
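One classic workaround, assuming you can get pyOpenSSL and a recent-enough urllib3 installed at all, is to route TLS through pyOpenSSL (which does SNI) instead of the old stdlib ssl:

    # make urllib3 (and anything built on it, like requests) do SNI via pyOpenSSL
    import urllib3.contrib.pyopenssl
    urllib3.contrib.pyopenssl.inject_into_urllib3()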
It's telling that there are a number of commercially available Python libraries targeting enterprise customers for which the Python 3 version is free, but there's a hefty license fee for the Python 2 version. Maybe that's an appropriate tax on sluggish software/IT departments.
You assume they will move forward, when they most likely won’t. It’s not a tax, it’s a product targeted at a guaranteed line of revenue. If 3.x were ever deprecated, they would start charging for that version. There probably aren’t enough users for their 3.x version yet, so it’s free to try and gain adoption.
That is quite an interesting OSS funding method: charge for legacy patches and support which no one else will provide. It is likely a boring venture, but that is the price.
Potential security vulnerabilities? If I have a library and it works on 2 & 3, but I stopped working on keeping it 2-compliant because 2 is no longer supported, then I will never bring a single fix to 2, even security fixes.
Due to code divergence it may not even be easy for me to tell whether the issue reproduces on 2.
That creates a good business model for someone to come along and charge a premium to fix security bugs in old code. I think it’s more likely something like that will happen, than everyone moving their code to work on 3.x.
There have been several threads on COBOL here on this forum, and the anecdotal consensus is that COBOL developers don't actually get paid much more than developers in any other language.
It sounds like the tax is from choosing to use python at all. Either pay to keep legacy codebases or pay to update. Given a choice between two evils, I choose another language.
Similar to how we dealt with IE6 back in the day: not supported by default, and any time spent on workarounds for IE6 was billed at double the hourly rate, and mind you, these often required a lot of time.
Quickly made a lot of our clients untick the IE6 requirement from the wishlist.
I've managed to avoid Python 2 for nearly 3 years now in my projects. I think it is time to get rid of it for good and use the new freedom to improve the tooling.
It is so ironic that in Python the most "unpythonic" thing is the version management of the language and libraries etc itself.
Things like python poetry, venv etc. are great steps in the right direction, but there should be an official solution that "just works" and is predictable.
When I give python courses I usually skip dependency management and explain it at the very end, because it would probably (rightfully so) break the will of everybody involved. It shouldn't be like that.
I think that currently the biggest nightmare when it comes to packaging is that installations are usually not self-contained. If you can create a virtualenv or run Poetry, your headaches are over and you'll have a very pleasant experience. I don't think I've ever had a problem running stuff with Poetry and the way it does locking.
All the problems come before that point, with multiple installations making it so you don't know which installation you're using, which version, etc.
Not sure what I’m doing “wrong,” but I have no nightmares packaging Python, it all just works. I suppose some ecosystems have done it better, but no complaints from me.
Are you on Linux? People with Macs and Windows seem to be having more problems, because on Linux there's usually one canonical Python installation, whereas on Mac you usually have one from brew, one from Apple, maybe one you downloaded from the site, and whenever you try to install things it's a gamble which Python was used.
Yes, Linux. Also, I know how to use a shebang, which command, install to user, and/or pythonX.Y -m ... to avoid problems. Perhaps newbies don’t get enough of that advice?
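For anyone following along, that advice boils down to a handful of habits (the paths and package name here are illustrative):

    head -1 script.py    # check the shebang, e.g. #!/usr/bin/env python3
    which python3        # see which interpreter is first on PATH
    python3.8 -m pip install --user requests   # pip tied to one exact interpreter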
I try my stuff on macOS without issue, but perhaps I'm not there enough to have big problems.
The worst thing that happened to Python was that Linux distros not only shipped it but used it pervasively for scripting. That made replacing Py 2 a huge problem.
Until recently it was easier to set up for Python dev on Windows because you could download a clean Python and not worry about broken Pythons on your path.
Microsoft wrecked it all last year by including a system Python, and they'll probably play games to keep it in front of your Python on the PATH, so you will be typing pip3.7, pip3.8, pip-3.6-pypy, etc. for the rest of your life.
You are partially right. Distros like Debian/Ubuntu used Python 2 extensively as a system dependency, so replacing /usr/bin/python became a big undertaking. But since at least 2009 it has been easy to install alternative versions and use them with virtualenv. pyenv has made it even easier since then, and made it platform agnostic.
Yet no matter how much I try to educate users, some of them still feel entitled to use the system Python, and I find it disheartening to try to support those people.
It would be nice if virtualenv type of functionality was just a first class citizen for Python. Basically if you could just specify the root of your project and have ./site-packages/ automatically included in the path or similar. Yes of course you can just update sys.path but it would be neat to do that with no code necessary. Or do it with a config file like how PlatformIO does library management.
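A hand-rolled version of that is only a couple of lines today (hypothetical layout with a site-packages directory next to the entry point):

    import os
    import sys

    # prepend <project root>/site-packages to the import path
    _root = os.path.dirname(os.path.abspath(__file__))
    sys.path.insert(0, os.path.join(_root, "site-packages"))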
I wonder if it would have helped to move the executable from /usr/bin/python to /usr/sbin/python . Or /usr/sbin/python-for-distro-use-only-dont-run-apps-with-this ...
In the last 2 years or so Win 10 has gotten aggressive about restoring configuration changes (e.g. the fruits of all that telemetry research.)
I have found that my PATH entry reverted at least once after I tried to hide the system Python, and I also wonder whether Microsoft will use their 'monkey patch the standard library' approach to foreclose any attempt to stop the insanity.
It’s not even in System32, it’s in %localappdata%\WindowsApps. Which is also where other appx packages you install put their PATH shortcuts. Upside is, you can disable the behavior (by binary) in Settings, but then you have to hope that isn’t overridden next time someone needs to bump up their Python download numbers.
Since the Python2->3 migration discussion is being had again here, I'd just like to advertise my compatibility library that can help with migrating old projects: https://github.com/mbarkhau/lib3to6
I'm more annoyed by the "Drop support for Python 3.5" part than the first one. There are a few codebases I maintain where I already had to pin the pip version because the machine runs python 3.4, and then pin other package versions that dropped support for older python3 versions.
Why can't you ship your own interpreter to the machine? It's a mistake to ship python source code out to some distro maintained interpreter. Think of your interpreter, dependencies and source code as one atomic unit.
You can, of course, think of it this way; and you probably should. But it also means more work, and you might as well update your distro if you go down that path. I'm reluctant to build & ship my own python for a very simple reason: you then take on the roles & responsibilities of the distribution maintainers. I've taken over these roles in the past, and I recognize that this is hard, thankless work. You have to weigh whether or not you can do better than the major distributions (you probably can't), and whether it's even worth your time.
Python 3.5 is the latest you can get on Debian Stretch (oldstable) from official repositories. Stretch is still in the Long Term Support program until around 2022. Dropping support for Python 3.5 means you can't use latest pip on Stretch anymore.
I have a background in systems engineering, working on a commercial operating system. I code in several languages. To me, it's just a given that adoption extends the lifetime of something, even if it is not under active development. There's a strange phenomenon with Python in particular, where it is upsetting for some reason to be told that Python 2.x will still be in use after we retire. To me this is just the natural way things work.

There is so much stuff out there using Python 2.x where there's no programmer around to do porting work, and because things work, there's no reason to invest. Normally when this sort of thing has happened to, say, a language like VB, it doesn't really matter, because the end result is something compiled with a runtime. As long as the environment the app runs in has consistent behavior (app compat), there's nothing to do. But with Python, which isn't precompiled, you need the toolchain and dependencies.

Python 2.x isn't going anywhere, so I think Pip will ultimately have to keep and "freeze" support for Python 2.x, rather than not supporting it. If you are reading this and infuriated, perhaps ask some old hands you know, instead of taking my word for it. Flash is perhaps the closest example I can think of. Flash was "over" ages ago but browsers have had to keep support for an awful long time.
I give their decision a week before they revert and come up with an alternative.
Flash is a terrible example because there is literally no place for flash apps to go except be rewritten from scratch in some entirely different platform. Python 2 programs simply get ported to python 3. And we are already ten years into python 3, so yes, python 2 is going away at least as much as say Perl 4 or MySQL 3 or Oracle 7 has gone away. Which is pretty damn away.
Also, they aren't withdrawing pip from being available for python 2. You just use an older version of pip. Programs that need pip's newer features won't install on python 2 anyway. Because it's really going away very quickly now.
AFAIK even the most modern Perl 5.32 runs Perl 4 programs just fine, so "porting" programs from 4 to 5 was "trivial" in the strongest sense.
Perl 7 finally is going to break this compatibility, at least according to the current plan, but the "porting" still is supposed to be trivial, in most cases just adding a pragma or two. 27 years after Perl 5 was released...
Old versions of pip do support Python 2.x. Nothing is preventing you from using them.
There is also a straightforward way of "compiling" your code - install all your dependencies into a virtualenv, tar up that virtualenv, and untar it to the same path on the next system. I've seen this system used to work around pip and Python upgrade complexities that already exist, even within Python 2 alone, and it works fine. Then you're insulated from both changes to pip and changes to the actual packages you're trying to install from pip.
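A sketch of that flow (paths illustrative; the untar target must match the original path because the venv's scripts hard-code absolute shebangs):

    virtualenv /opt/myapp/venv
    /opt/myapp/venv/bin/pip install -r requirements.txt
    tar -czf myapp-venv.tar.gz -C /opt/myapp venv
    # on the target machine:
    tar -xzf myapp-venv.tar.gz -C /opt/myapp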
Edit to answer your question because someone downvoted me and now HN is preventing me from replying: I don't think you'll need to do anything special. The common case, honestly, is that you already have some version of pip shipped with your system (e.g., you're using Python and pip from your OS) - just keep using that version, because by definition it's a Python 2-compatible pip. The official Python 2 binaries bundle a version of pip which is also by definition Python 2-compatible.
If you need to upgrade pip for some reason, you might need to specify a version constraint on it, e.g.
pip install -U 'pip<21'
But you also might not, because there's a way for packages to declare what Python versions they're compatible with, and so pip can take that into account when deciding what to resolve. (The trouble is that very old versions of pip, such as those bundled with LTS-age OSes, don't know how to do that, so they're going to try to upgrade to the newest possible version of pip. Less-old versions of pip running on Python 2 should, I believe, not upgrade to an incompatible version.)
That's correct, and I've verified it with modern versions of pip for Python 2.7 and 3.x. `pip2.7 install -U pip` just grabs pip 20.3.4 and installs it, ignoring pip 21.0.
pip and the legacy easy_install both access the "simple" list for a package when determining download options. This is a basic HTML page with links to all public versions of a package. Here's pip's: https://pypi.org/simple/pip/
These links contain Python version specifier information, which pip can use to select an appropriate version. For pip 21.0, that specifier is ">=3.6", so any modern pip will know it can't be used on a Python prior to 3.6. It will therefore fall back to the nearest version that provides a compatible specifier.
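Concretely, the entries on that page look roughly like this (abridged; the attribute comes from PEP 503, and its value is HTML-escaped in the real page):

    <a href="https://files.pythonhosted.org/packages/.../pip-21.0.tar.gz"
       data-requires-python="&gt;=3.6">pip-21.0.tar.gz</a>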
Looks like this was implemented starting in Pip 9.0 (at the end of 2016). From experience (my product is written in Python), there are plenty of enterprise installs that still use much older versions of pip than this. Those will grab pip 21.0 or newer (I just confirmed with a copy of pip 8).
I know for us, that'll be important to document. We still support Python 2.7 in part due to slow-moving enterprise installs, and I'm sure we're not alone in that. So geofft's example for forcing installs to <21 is exactly what a not-insignificant number of people will ultimately need to do.
Thanks for testing! And just to rephrase for others, this is a one-time (per environment/per virtualenv) thing - you want to upgrade pip to a version new enough to know not to upgrade too much, and then you're fine. Once you run the command I gave, a later "pip install -U pip" will keep you on 20.x and not upgrade to 21.
Don't forget about native dependencies. Any Python module that embeds native libraries needs to remain binary compatible with the host system. And because Python doesn't control this in any way whatsoever, it's a lottery whether things break on a current system, let alone an obsolete virtualenv one where the chances are even higher.
Would I have to do anything special aside from installing an older version? I’m not familiar with the back end of pip - in particular whether it talks to a package server or not.
It does talk to a package server, but you'll want to grab all the source code/wheels for the relevant packages before they are no longer available anyway, especially as many projects are no longer going to be available under Python 2.
Interesting. I imagine what may have to happen is someone will (perhaps for profit) do all of the heavy lifting here so companies can use their server for old packages.
I get expecting that everyone will do this on their own, I just don’t think it will happen that way.
The default backend of pip is pypi. Pypi does not have a policy of removing previous releases (though package authors do have the ability to do this I believe). There are packages going back a really long time. You can find releases for Python 2.6 (EOL 2013) that were published in 2012 still on pypi (and accessible by pip).
In addition, paid extended support for Python 2.7 already exists from 3rd-party vendors (I know of ActivePython). Python 2 EOL just means you don't get free updates and support.
PyPI may exist and hold old packages, but even that will eventually bitrot, and the API that the old version of pip uses to download files may no longer exist in the future (as an example).
For those cases you will want to make sure you have a local mirror.
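Pointing pip at such a mirror is one small config file (the hostname is made up; devpi and bandersnatch are common tools for actually running a mirror):

    # ~/.config/pip/pip.conf (or /etc/pip.conf)
    [global]
    index-url = https://pypi-mirror.example.internal/simple/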
I've definitely worked on projects where people want to just have it working and leave it working until after I'm long gone. I've had clients that I'm sure are still on Java 6, CentOS 5, maybe older, and will be for years to come.
But the Python communities I've interacted with seem to care deeply about the ecosystem, updated libraries, new features. Web apps, data science, etc. Is my experience here warped? Are there people here seeing Python get used for the kind of stuff that might live as long as Cobol without moving forward with the language?
One of the companies I used to work at had a several million line python 2 codebase supporting their product stack. They have decided to stick with python 2 at least for the foreseeable future, because the work of porting to python 3 is seen as too expensive and risky. This is a company that sold for over $2 billion to a much larger, much more enterprisey company. I wouldn’t be even remotely surprised if they’re still on python 2 in another ten years.
I think the largest chunk of Python usage is what you’re talking about. But there’s still a significant number of people (and importantly, companies) just using it as a scripting language to move around and parse data or files. They’ve gotta do it with something after all, so why not Python. And like other people were saying, these are the types of scripts that don’t get updated.
For future projects, sure. For ongoing ones, maybe. For projects that are working I'm not touching... why?
A tool is a tool. When I bought a new drill I used it to put together the playhouse for my kids rather than tearing down my fence just so I could put it back together.
Self-contained containers (like Singularity) have come up to fill this gap. For a few years now we have been investing in creating these containers at the end of every project, as it is becoming increasingly hard to count on dependencies remaining available online indefinitely.
That is a fabulous idea. I would love it if that became the de facto for newly published scientific papers - here is the source code and here is the container that we used to run it. Ultimate reproducibility.
> The error here is you think the people doing this even have programmers to port to 3.x or admins with the wherewithal to make sure an older version of pip is used.
What wherewithal does that take? PyPI will just provide the last version of pip that supports Python 2. It takes no extra effort to not install a Python 3 version of pip on Python 2.
> Nobody wants to keep Python 2.x in use or force the Pip maintainers to keep 2.x support.
I'm sure there are people who do want to do the former. And they can, but it has no bearing on the latter.
Well, yes, it's possible some future change to PyPI could break pip, but that would also break all versions of pip before the change, so there'd have to be a large transition window (because PyPI is how you update pip, too; if you break existing clients, no one can get the upgrade to the version that supports the backend changes), so it's not going to be sudden and without warning.
Hell, several of our projects are stuck on pip 9 because our requirements files have dependency conflicts that 10+ choke on but 9 happens to resolve to the right combination.
There isn't a separate set of servers. The same servers (https://pypi.org) are used for all Python versions, and they keep old versions of packages around. If everything works as it should (which, to be fair, it might not :) ), pip running on Python 2 ought to find the latest Python 2-compatible version of packages, including the latest Python 2-compatible version of itself if you ask it to upgrade itself.
It's theoretically possible that the API used could change in a way that old versions of pip can't handle, but a) the API is super simple and there isn't a strong reason to do this and b) doing that would break the ability of old versions of pip to upgrade themselves - that is, for released versions of Python with bundled pip to work - so they're very unlikely to break the existing interface, even if they add a new one.
Still, since PyPI is a free service with no support contracts, and especially because packages hosted on PyPI are uploaded by the individual volunteer authors who have the ability to delete old versions of their own packages, businesses who depend on Python 2 would be well-advised to download/mirror what they need locally to ensure continuity of business.
My understanding is that Pip uses PyPI so, yeah, if PyPI changes their APIs/URLs, this last version of Pip that supports Python 2 will stop working and then deployments will start breaking.
I've found 3 broad classes of code bases when migrating:
1. Trivial or easy. Probably started using best 2/3 compatibility practices early on, or the nature of the code lends itself to 2to3.
2. Not able to switch without effort, but able to be slowly ported to 2/3 with six and the like.
3. Hopelessly intractable. Either too many bad practices in string handling, or tied to a legacy system. No way to port it without risking subtle bugs. Usually this is due to playing fast and loose with string types, even though we have known better since well before python 3 (a tiny illustration below).
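The classic trap in that third category, in miniature (same source, silently different meaning):

    data = b"abc"
    # Python 2: data[0] == "a"  (a one-character str)
    # Python 3: data[0] == 97   (an int)
    # Code that mixed bytes and text "worked" on 2 and quietly changed on 3.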
I don't really use Python much any more, and haven't for a couple of years. I may have some Python projects coming up later this year though. What's the current recommended way to deal with Python versions and isolating dependencies? Are there any good tools that simplify virtual environments &c or am I best off just installing my required Python and packages in a Docker container?
Is pipenv still being patched? When I moved to Poetry, I did because there were outstanding issues for pipenv that made it bug out with my setup (pretty sure it was a WSL thing?) but the maintainer hadn’t committed anything in months and wasn’t accepting PRs.
It is popular, but it is not standard by any means. There are many voices against it in the Python community who prefer Poetry, or who have stuck with setuptools because there is no mature replacement yet.
> Every language has breaking changes at some point.
This is where you are quite wrong.
Many languages and other software projects take backwards compatibility quite seriously.
Backwards compatibility is far more interesting to many customers than the quality of the language itself; the idea that one would have to rewrite one's codebase again later, with all the potential regressions that might involve, is not an appealing prospect.
Planes would crash if the software that runs airports had to do this.
Is Go mainstream enough? They haven't made any breaking changes so far.
Which brings me to another point: not all breaking changes are equal. Sometimes, a change to a syntactic element is needed. But if there's an identical replacement for it, and it can be updated mechanically, there shouldn't be a problem (in theory).
Making constructions or the standard library do things (subtly) differently without a fall-back to the original functionality is a much bigger problem: then you have to inspect every occurrence and verify if it still works under all conditions.
That's a change in a library though, not in the language.
And if you want to never have changes in libraries, including for fixing security bugs, you'll end up with stuff like mysql_real_escape_string.
I guess you missed reading the comment I replied to.
"Making constructions or the standard library do things (subtly) differently without a fall-back to the original functionality is a much bigger problem: then you have to inspect every occurrence and verify if it still works under all conditions. "
Those are specific implementations; we are talking about languages, and in regards to C there are definitely a collection of breaking changes across ISO revisions.
Even minor ones aren't so minor, if the code happens to rely on them all over the place.
Perhaps of minor things that nobody actually used or to fix soundness issues.
What Python did is far beyond that; it knowingly broke what everyone was using, with the full knowledge that almost no Python 2 codebase could run under 3 unmodified.
Gcc will still run c89, c++98, and even fortran66 code. There has never been any suggestion that support for these older versions be removed.
Packaging Python 2 & 3 together, like gcc does, would have had another big advantage: it would have meant that early on I could assume everyone had Python 3. There are still plenty of Macs with 2 but not 3 by default.
Well, there hasn't been a C standard after c89 so it's hard to move forward. In theory there is a c99 standard but it was never adopted by MSVC so good luck with that (that is a long story on its own).
On the C++ side, C++ didn't get a new spec until C++11 (2011), and then it took a decade for compilers to fully support it. Imagine how long it takes to roll out a language/library change when compilers themselves are on a 3+ year release schedule.
Either way, the C++ world is heavy on proprietary extensions and tools. There is little care about standards. The standard itself is hard to consider a standard; all it does is suggest tens of new features (some of which are impossible to implement) that may be added one by one over the decade.
Hmm, is there any other major blocker in modern MSVC (which supports C11) to it being considered C99 compatible, other than variable-length local arrays? (Designated initializers came into C++20, so they were forced to put those in.) Blocking variable-sized arrays might've been a blessing in disguise from a security standpoint.
That's not really a compiler limitation though (since in C++ mode it will happily handle it) but rather a validation/mode thing. Since they now support C11, I'm sure that part is removed if you select C11 mode?
I literally just ran a script that will only work on python 2.3 or older (as in it won't run on python 2.4). I don't need to worry about pip support because I'm pretty sure the script is older than pip, but it's not like old code magically disappears and it's replaced with newer code.
Don't worry, at one point running the code will become more complicated than writing new one, until you reach that point you can carry on (constantly monitoring for vulnerabilities that have been fixed in newer python versions of course).
At this point, I swear, the main source of difficulty in Python 2->3 is all the foot-dragging in migrating to Python 3. I wouldn't be at all surprised to discover that vastly more Python 2 code was written in the past twelve years than in the preceding eight.
> the main source of difficulty in Python 2->3 is all the foot-dragging in migrating to Python 3.
My main difficulty was having targets that ran python 2 and wouldn't get an official python 3 package unless a miracle happened. Meanwhile most systems supporting python 3 also had a python 2 package. Ended up porting a few scripts to a statically linked c++ binary, those should keep on working until we get 128 bit systems that drop support for 64 bit binaries.
If you don't have a pipeline to update those static binaries, you're just giving yourself a different form of headache 5 years from now when you need to apply a bug fix.
That's not a deployment problem though. And if everything you did to build those binaries failed, you can still run those binaries. You aren't out of business.
We're talking about this versus the alternative of potentially not being able to deploy anything. Then you're dead.
In both cases you needed a build pipeline anyway...unless you're operating in a world where "it works on my machine" is a good enough answer.
With static linking you mostly have all the pieces updatable independently. Can move to new distro and keep the binaries. Can update some binaries with a bug fix to some parser library without releasing other binaries. Etc. except for glibc and stuff like 64-bit inode values.
That's very true, but making "static distributions" of Python apps is a fairly hacky/annoying problem with lots of room for subtle breakage. In practice one static package per major distro version seems advisable.
Curiously Linux, despite having an extremely stable user space interface, suffers from a high degree of user space incompatibility. This starts with glibc and its components and ends with graphical toolkits. Go seems to be very successful because it ditches all of that crud and goes for the stable interface instead.
I'm pretty sure that I was quite blatantly suggesting that Python fails to meet suitability. That doesn't mean we don't use it...because people like it. But it adds complexity and "business risk", whether you like it or not.
In some spaces (ML) you're going to have to ship some Python, but what I was saying was that I would rather not.
My role the past few years has been some mix of deployment engineering & systems engineering and the conclusion that I quickly came to was that deploying scripting languages like Python/Ruby/etc is the realm of assholes. ...which was a funny lesson because mainly through my career I've worked in smaller scale shops using almost exclusively scripting languages.
A friend of mine uses Python in his lab. Plenty of people there were still writing Python 2 code when they were learning the language two years ago (Feb 2019); in other words, they started from scratch with Python 2 that recently.
I don't think migration was given enough serious consideration until Python 3.3 in 2012, and even then there were some baffling and massively unnecessary migration-hostile decisions, like not allowing a separate PYTHON3PATH envvar.
Anyway, how hard would it have been to have a "from __past__ import old_strings" that could have worked for the first few releases to allow the single biggest issue to be smoothed over universally and then fixed file by file under the Python 3 environment? With that in place a lot of shops could have just migrated on day 1 and then iteratively worked to finish the job rather than delaying for so long.
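For what it's worth, the mechanism that actually shipped ran in the opposite direction: from Python 2.6 on, you could opt in to the new string semantics file by file. A minimal sketch of that real, shipped feature (the hypothetical "from __past__ import old_strings" would have been its mirror image, hosted in the Python 3 interpreter):

    # Opts this one file into Python 3 string semantics while running on Python 2.
    from __future__ import unicode_literals

    s = "abc"       # unicode on Python 2, str on Python 3
    b = b"abc"      # explicit byte string on both versions
    assert isinstance(s, type(u""))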
And some of the other changes should not have happened.
Personally I still have a sour taste from the urllib migration. All the imports were moved around for no apparent reason, completely breaking every usage of urllib and every "import urllib", with no fix in sight.
The solution came with six a million years later, adding a hundred aliases like six.moves.urllib.somefunction that dispatch to the right place.
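For readers who never used it, a minimal sketch of the six.moves shim in question (six is a real compatibility library; the URL here is a placeholder):

    # These imports resolve to urllib/urllib2 on Python 2 and to
    # urllib.request / urllib.parse on Python 3.
    from six.moves.urllib.parse import urlencode
    from six.moves.urllib.request import urlopen

    query = urlencode({"q": "python"})
    response = urlopen("http://example.com/search?" + query)  # placeholder URL
    body = response.read()   # bytes on both versions
    response.close()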
Right but it poisoned the well; people had already had four years to kick the tires and conclude "yeah nothing works, my dependencies aren't ready; this seems completely not worth it."
Saying that people sat on their butt for 8 years is insulting now? Come on.
At the end of the day, they had anywhere from 8 to 12 years to migrate. In the tech world, that's millennia. Most products and services will give you months when they deprecate or sunset.
I'm just saying that criticizing individual people or groups is irrelevant when it was so widespread. Clearly the problems are/were systemic, so it's more productive to focus on the systemic causes than blaming individuals.
Sure, but it would have to be a single executable with 2 interpreters inside of it to support some sort of incremental upgrade path, which is what the comment I replied to is talking about (they are asking for something that didn't get made). And then you have to decide if they can talk to each other or what.
Oh, a module level 2/3 split, or even finer than that. Okay, I agree that that would have helped the transition, at the cost of being a pain to implement (if it's even reasonable, ex. I'm not sure what to do if a py2 function passes a string to a py3 function)
Wouldn't have had to be the whole interpreter, really just a flag with single file scope which makes bare strings be interpreted as bytes instead of str, and maybe some other similar translations in stdlib functions.
That's exactly the point: some software will never be ported. That's why it's such a bad idea for languages to create a break point like the geniuses of Python decided to do.
No, there's an implicit assumption that you need to do maintenance on software you run. People who don't budget for it are just denying reality.
There's 0 difference between the states that were putting a call out for COBOL programmers to deal with their disaster of an unemployment system and the professor running some python script from 2003. Their lack of responsible ownership is their own fault.
Your comparison couldn't be worse. COBOL is having problems not because of the language, but because of the code created 50 years ago. For all its disadvantages, COBOL is in fact one of the most successful languages of all time because it has allowed businesses to run their software for the last 50 years. That's why it won't go away, even though no one in their right mind would write even a single new line of software in COBOL.
Python decided instead that their customers were of no value, and all the millions of lines of existing Python shouldn't continue to run, even though it would be trivial for them to continue supporting 2.x syntax along with 3.x. I think it is one of the most insane decisions ever made by a mainstream programming language.
There are still new Cobol versions being released (I think the current standard is Cobol 2018), along with new tooling, presumably because it's used by companies that care about maintenance even if they are unwilling/unable to switch to a more popular language. Compare that with Python 2 where you hear about lots of companies complaining about end-of-life issues or missing migration tools, yet somehow none of them seem willing to do anything about it.
> C came into being in the years 1969-1973, in parallel with the early development of the Unix operating system; the most creative period occurred during 1972. Another spate of changes peaked between 1977 and 1979
But let's do you a favour and consider 1972; that makes C a 48-year-old systems programming language, surpassed only by NEWP (1961) as the oldest systems programming language still in use in 2021.
Maybe it is about time to start talking about C the same way people talk about COBOL.
Your comment doesn't feel like it's doing me any favours. It makes me feel like not commenting on HN any more.
My last interaction, which felt similar, and put me off commenting for a while, was with someone who claimed that "most concert pianists and serious competitors have absolutely gigantic hands" then moved the goalposts around so they could be Right.
It's true that you need to budget for maintenance. It is a serious societal problem that maintenance is underfunded.
But that means that any language revision needs to work hard at making the transition easy and incremental. 2to3 was and is laughable; at no point was it a reasonable solution.
It's okay to say "we need maintenance money". But no one has an unlimited budget. People complain about the 2->3 transition because (1) it was extraordinarily steep, (2) it was not justified, and (3) its huge costs are repeatedly denied.
If anything, having such a long runway could cause the issue of people not migrating. 12 years presents no urgency in software, code is regularly rewritten every few years.
Outside of mere observational contribution to this topic, are you playing devil's advocate because it is easy, or do you actually believe that this is a real excuse for the Python community?
I'm not making excuses for them, I'm saying the upgrade path was too long. It's the same reason people don't care about climate change - many of them won't see the results of their efforts now so they just keep doing what they're comfortable with. If it was a year or two, people would've upgraded straight away.
Academia doesn't have unlimited budget to do migrations that don't solve any pressing problems. I guess now there is a pressing problem, but more likely people will maintain the old version of pip locally.
They don't pay them to work 40+ hrs/week, but often they expect them to work 50+ hrs/week. That's what I've seen in Sweden, at least, though the pay there is okay.
Well in the US, postgrad education is a genuinely interesting proposition.
* Very low pay.
* Good benefits in a nation with poor safety nets.
* Tuition waivers along the lines of $10-100K/year.
* When the Dr. says jump, you ask how high.
If you view education as an investment, it isn't necessarily bad compared to an ordinary job. But it's kind of like a FAANG company; your experience depends on who you report to.
Sure, the low pay is commensurate with...uh...some sort of ephemeral opportunities in the future.
But seriously, you're right. Grad students do grunt work, that's how it goes. And if an academic Python2 library is widely-used, porting it is important grunt work.
Surely, no serious researcher would let an important tool rot, right?
The problem is that because of the way the incentives are set up, it is not important to anyone involved.
What is important to the grad students is to produce research papers and to fulfill their mandatory obligations (teaching, project deliverables). And most grad students, even in CS, are not professional software developers anyway. Good luck convincing capable grad student candidates to join your group to do boring software maintenance for horrible pay and no job security.
What is important to the professors, who decide what the grad students will work on, is again to produce research papers, fulfill their mandatory obligations (teaching, project deliverables) and to continually file for grants. Spending grad student time on porting and maintaining libraries does not help with that. In the worst case your grad student is spending their time maintaining a tool that a competing group's grad students are using to churn out papers, beating you to publications and grants.
What is important for the funding agencies is flashy new research in the current hot topics. I never saw a funding agency that would even consider paying a grad student, let alone a full software engineer salary, to port an academic tool from Python2 to Python3 or do all the other maintenance you need to do on production codebases; nor do most universities even have salary classes and positions for that.
As a result, in the many years I spent in academia, I saw many important research tools rot (both software and large hardware testbeds). The solution is not grad students, but to have fully paid software engineer positions in academia. But realistically that is not going to happen.
I was getting a tour of a lab from a grad student, and I was told a desk was full of 3.5" floppies with data from old research. I said "Wait, what? Old floppies won't retain data indefinitely!" and got shushed -- she didn't want to wind up responsible for trying to do data recovery on hundreds of floppy disks, which would do jack for getting her to her Ph.D.
Rot is a very big part of what happens to a lot of information.
From limited exposure, these codebases are often in truly awful shape. They've endured years or decades of having been hacked up just enough to finish someone's thesis or dissertation with no concern at all towards maintenance.
It should be no surprise that such a poorly engineered process produces awful results.
Then they are asking the wrong question: instead of asking "do I have to upgrade?", they need to be asking "why shouldn't I upgrade?"
Keeping current with industry is the only way they can stay relevant in the modern world, particularly in IT and related fields where anyone with a computer at home can do the same research as someone at a university.
“ It's kind of disappointing that developers aren't self-aware enough to understand that they are essentially screwing themselves by not moving quickly in dropping "legacy" support.
"Python 3 will be available in 2008 but we understand that you won't use it for at least 5 years."
There's an entire class of developers who won't upgrade until they absolutely have to. There's also a class of administrator that won't upgrade their current PC browser from IE6-IE8 until they absolutely have to. Developers are basically screwing themselves by not drawing a firm line.”
There isn't really a good solution. Forced migrations always create a decision point, at which point people may gather their resentments and move to something else. But if you don't eventually force a change, you have to live with all of the language's historical mistakes.
My shop was interested in migrating long ago, but we had to wait until languages and frameworks (such as Django) supported 3. On projects where that wasn't a risk, we moved to Python 3 around 2014 or so. But: I totally get that for some shops, the feasibility of migrating was low.
idk. I think there's an alternate universe somewhere in which Python just got hard forked by the community, and the python foundation lost its influence over the direction of the language. The split today would be even larger, or Python 3 would have become an intellectual curiosity rarely used in the real world.
Not at all. It's only too generous if there is a low-cost way to incrementally transition to 3, as is normally the case with language updates.
Python 2->3 was a poorly managed update that did not follow normal upgrade rules, so the "normal" rules don't apply.
It is still often extremely expensive to convert python2 to python3. Normally you could upgrade in small steps, maybe a file or library at a time, instead of changing everything simultaneously including all transitive dependencies. That problem continues to be denied, so python2 continues to be used. I say that as someone who has converted code from 2 to 3. Python3 is fine; it's the huge unnecessary transition cost that is not. It's gotten a little better, but not a lot better.
This makes it worse. Now it's even harder to incrementally update, making it even harder to switch to 3.
The main issue I ran into was string handling. Autoconversion handled most of the rest, but because byte-strings and regular strings were interchangeable in Py2, random places that weren't even touched by the autoconversion now suddenly have a "b'abc'" where they should have "abc", and fail in very non-obvious ways.
Just some hack around that would probably already help.
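The failure mode in miniature (a sketch with Python 3 semantics; the Python 2 behaviour is noted in the comments):

    data = b"abc"               # e.g. read from a socket or a binary file
    print(data == "abc")        # Python 2: True; Python 3: False, with no error
    print("got: %s" % data)     # Python 2: "got: abc"; Python 3: "got: b'abc'"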
I had to change my public APIs in a backwards-incompatible way. I had a method something like obj.to_string("abc") to convert the object into a string in 'abc' format.
I also supported things like obj.to_string("abc.gz") to get the record in gzip-compressed form.
I also had a from_string() -> obj function.
You can see the problem. I had to change to_string() so it didn't allow compression (breaking backwards compatibility) and add a to_bytes() for that functionality, and add type dispatching in the from_string() code to support either byte or strings, with different code paths.
And change all the open() calls to use "b", and add type checks on user-passed-in file objects to insist on reading bytes (if not isinstance(user_file.read(0), bytes): raise TypeError("Must be open in binary mode")), because all of the underlying parsers are in C and the needless decode/encode step adds overhead.
Oh, and re-write the C extension so it handles both Python 2 and Python 3.
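A condensed sketch of the kind of dispatching this ends up requiring; _parse_bytes is a hypothetical stand-in for the C-backed parser described above:

    def from_string(data):
        # Accept both types during the transition; normalize to bytes,
        # since the underlying parser is C code that expects bytes.
        if isinstance(data, str):
            data = data.encode("utf-8")
        elif not isinstance(data, bytes):
            raise TypeError("expected str or bytes, got %r" % type(data))
        return _parse_bytes(data)  # hypothetical C extension entry point

    def check_binary_mode(user_file):
        # read(0) returns b"" for binary-mode files and "" for text-mode
        # files, without consuming any input.
        if not isinstance(user_file.read(0), bytes):
            raise TypeError("Must be open in binary mode")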
> If you started only making new stuff in Python 3 in 2010, 2 years after Python 3 came out, how much Python 2 would you have to convert?
In many organizations there was never a time where they could start writing new code in Python 3. They needed to write code that was compatible with their existing python 2 code, and the only way to do that is to continue to write new code in python 2. Rinse, lather, repeat.
This is why the failure to provide a gradual transition was so bad. When I write new code in Python I use python3, but that assumes that there are python3 modules available that I need.
If you have infinite money this is not a problem. But I think we should be sympathetic to the people who do not have infinite money and have never been given a realistic upgrade path from 2 to 3. The 2to3 program is not a workable solution for many.
Not until the year 2015, with Python 3.5 around the corner.
That's when the interpreter finally got the minimum support required to make code compatible with both, and linters improved enough, and some libraries started being ported.
six 1.0 was in 2011.[1] 50% of the top 200 packages were compatible by the end of 2012.[2] And there were features you could use in 2008 to make the eventual conversion easier.
> 50% of the top 200 packages were compatible by the end of 2012.
Which meant you usually could not convert, since ALL dependencies had to be converted. The chance of doing that successfully then with 300 libraries (including transitive dependencies) was approximately (0.5)^300, which is practically 0.
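Taking the comment's independence assumption at face value, the arithmetic does check out:

    # Back-of-the-envelope: 300 dependencies, each with a coin-flip chance
    # of being Python 3 compatible, all needed simultaneously.
    print(0.5 ** 300)   # ~4.9e-91, effectively zero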
Around 2015 is when lots of people found all their dependencies were compatible with Python 3, or had been abandoned and replaced. How much work they had to do depended on whether they had put any effort into compatibility over the preceding 7 years, or even just followed the recommendations for Python 2. Writing 100% compatible code wasn't practical in 2008, but distinguishing bytes from text was.
I joined an AI company in 2018 that had been building its whole prototype on Python 2.7 since 2017. I had to spend 2 weeks doing the migration myself and hand a merge request out of the blue to their lead engineers; otherwise I am pretty sure they would be having a meeting tomorrow (24/1/2021) to figure out how to migrate.
What about code people wrote before 2010 that were perfectly fine? Are you going to have people rewrite research algorithms whose original authors have long graduated?
Just because industry has a habit of rewriting the whole stack every five years on account of make-work job security doesn't make foundational scientific algorithms change.
Except the Python folks produced tools that did almost all of the work for you. 2to3 has worked for an overwhelming number of use cases. And in fact, most code requires very few changes to begin with.
If this academic code isn't well understood or well tested, it's probably not as valuable as you might think.
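For context, 2to3 shipped with CPython itself (via lib2to3, since deprecated in newer Pythons) and could be driven programmatically; a minimal sketch, with the filename as a hypothetical placeholder:

    # Equivalent to running "2to3 -w -f print legacy_script.py"; this is
    # essentially how the 2to3 console script is implemented in CPython.
    import sys
    from lib2to3.main import main

    # -w rewrites the file in place; -f restricts it to the "print" fixer.
    sys.exit(main("lib2to3.fixes", args=["-w", "-f", "print", "legacy_script.py"]))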
2to3 does 95% of the job. The other 5% requires manually fixing up all the bits it missed, and until you do that, your codebase will be subtly (or not so subtly) broken. The most common problem I saw was related to iterating over dictionary keys.
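That dictionary pitfall in miniature (a sketch; the error message is CPython 3's):

    d = {"a": 0, "b": 1}
    try:
        for k in d.keys():       # Python 3: a live view, not a list copy
            if d[k] == 0:
                del d[k]         # was safe in Python 2, where keys() returned a list
    except RuntimeError as exc:
        print(exc)               # "dictionary changed size during iteration"

    # The portable fix: snapshot the keys first.
    d = {"a": 0, "b": 1}
    for k in list(d):
        if d[k] == 0:
            del d[k]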
That 5% that requires manually fixing up is the sticking point. You still need to audit every line of the codebase, and each line that gets missed is a guaranteed bug introduced to your codebase by the conversion. This is not much of an issue for small scripts or tiny programs. It is an issue for big applications. This migration really highlights (yet again), the dangers of using interpreted languages at scale. With no compiler to pick up errors, no typechecking by default etc., identifying all of the remaining faults is a huge task.
Like it or not, this is a huge risk to a business. There is a risk of introducing vast quantities of bugs, and there a huge developer cost to performing the migration.
For the record, I have migrated several medium-sized codebases with 2to3 and python-modernize. Because these were internal tools with defined inputs and outputs, it was trivial to validate that the behaviour was unchanged after the conversion. But for most projects this will not be the case.
The 2 to 3 conversion will be a textbook case of what not to do for many decades to come. For the many billions it will cost for worldwide migration efforts, the interpreter could have retained two string types and handled interpreting both old and new scripts. The cost would have been several orders of magnitude less.
I migrated a large codebase with 2to3 and six. It took me half a day. 2to3 did roughly 60-75% of the work and six cleaned up almost all of the rest. It's not foolproof, but it is surely better than nothing.
It's perhaps also worth noting that I did this in late 2017. Your experience likely varies depending on when you attempted it.
In the migration I was a part of (probably one of the larger ones, period), I think something like 90% of code could be migrated by automation. And of the remaining 10, most of it needed only trivial human oversight.
That remaining two percent had a lot of painful things (truly, I have some stories), but "the overwhelming number of use cases" was trivial.
Not really. For example, R ships with or depends on a lot of FORTRAN libraries. I doubt they have changed much at all in decades. There is no talk of a breaking FORTRAN language change that would require rewriting this perfectly functional code with a stable interface.
> Normally you could upgrade in small steps, maybe a file or library at a time, instead of changing everything simultaneously including all transitive dependencies.
This was absolutely possible. Via the path of upgrading your code (or your dependencies code, in any arbitrary order) to be compatible with both py2 and py3 (via, say, six) and then once all code was compatible in either direction, flipping the switch.
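A minimal sketch of that dual-compatible style, using only real __future__ and six features:

    from __future__ import absolute_import, division, print_function, unicode_literals

    import six

    def dump(mapping):
        # six.iteritems() maps to dict.iteritems() on Py2 and dict.items() on Py3.
        for key, value in six.iteritems(mapping):
            print("%s=%s" % (key, value))

    def is_text(value):
        # six.string_types is (str, unicode) on Py2 and (str,) on Py3.
        return isinstance(value, six.string_types)

Once every module in the codebase (and its dependencies) was written like this, "flipping the switch" meant nothing more than changing the interpreter.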
It was always dangerously hard to validate that the whole codebase was actually compatible, and that there wasn’t some path through the code that would make things break (in production, on the customers’ site, while handling all their most critical data).
I can think of exactly one language that bungled the upgrade path worse, and it’s Perl 6, which they finally renamed after 19 years of stringing people along like it would be the next big thing.
I generally agree with your statement (the version flip is scary), but that's a different complaint than GGPs, which was more that you couldn't incrementally migrate your source files. The version flip to Java N+1 can be scary and cause issues too (and sure, it's perhaps less likely to cause issues, but don't worry, they both do!)
The length of this dual language transition period is a travesty and an incalculable waste of money and time. They dragged this out for more than a decade. That’s the problem, people just got used to there being two versions. Should have just ripped the bandaid off way sooner.
There were too many breakages (and performance regressions in the early 3.x versions) to force a quicker transition, which would likely result in no one upgrading at all.
One option could have been to spread the breaking changes over multiple versions, but given the bad state of version/dependency management in Python, this would have likely been a clusterfuck too.
Best option would have probably been to have a longer period of RCs and only release 3.0 with the performance regressions fixed. The myths around the slowness of 3.x (especially for scientific libraries) stuck around for a very long time after they were fixed.
It's not a question of generous or not generous. If the work to migrate to 3.x was too great before, this won't change that -- they will just manually install libraries without pip. Or find another tool.
I suspect many python 2 shops will maintain the old and move to something else for the new. Maybe the "new" thing will be Python 3, but I expect this will give other languages opportunity, because people do have emotions, rational or not.
This was announced 13 years ago. That's enough time to get an entire computer science doctorate and write your own alternative package manager from scratch, if you really need it so much.
If Pip dropping py2 support today kills your app, it wasn't fragile, you were negligent.
Who are the people that are creating a fragile world here? The ones who didn’t rewrite their code to be compatible with the existing software called python, or the ones who didn’t rewrite their code to be compatible with the existing software that uses python?
I don’t have any rights to python. If I chose not to rewrite my code for python3, I am increasing the fragility of the world.
However the same is true for the python maintainers. If the maintainers were interested in maximum anti-fragility, Python3 should have been a fork.
I’m not making any sort of moral judgement here. But I think it’s obvious, from devs not rewriting their software, from Python being willing to make breaking changes, and from everybody blaming each other (for a decade now), that the goal here wasn’t really to create long-lasting tools.
Python2 code still works. You just can't use a convenient package manager to grab libraries for it from a managed repository anymore. When has C ever had that?
Python 2 has some value Python 3 doesn't have. E.g. there are Python 2 builds for really old platforms. There are even some considerably exotic platforms which are not so old and not deprecated but still don't have Python 3.
commit b71c7dc9ddd6997be49ed6aaabf99a067e2c0388
Author: Victor Stinner <victor.stinner@gmail.com>
Date: Fri Oct 10 11:55:41 2014 +0200
Issue #22591: Drop support of MS-DOS
Drop support of MS-DOS, especially of the DJGPP compiler (MS-DOS port of GCC).
Today is a sad day. Good bye MS-DOS, good bye my friend :'-(
Not a direct response to your question, but given even Windows XP isn't supported by Python 3 as of 2015 (just 1 year after XP extended support ended), it does seem like Python does have a fairly nontrivial set of requirements for the underlying platform.
One thing that might be important to note about this: to the extent that pip is a client for PyPI, a strategy of freezing the last supported version of pip indefinitely may not work. It will presumably always be able to install Python 2 wheels from a local wheelhouse, but it's worth being prepared for a future where `python2 -m pip install <x>` stops working.
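If you do depend on that continuing to work, one hedged precaution is to pin the final Python 2 compatible toolchain and keep your wheels locally (pip 20.3.x and setuptools 44.x were the last release lines supporting Python 2; the wheelhouse path and package name below are placeholders):

    # Freeze the last pip/setuptools that still support Python 2.
    python2 -m pip install "pip<21" "setuptools<45"

    # Mirror what you need now, then install later without touching PyPI.
    python2 -m pip download --dest ./wheelhouse requests
    python2 -m pip install --no-index --find-links ./wheelhouse requests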
According to the contribution graph (https://github.com/ccrisan/motioneye/graphs/contributors) it was started in 2013, when Python 3.3 had already been released. So what were the reasons the project started with Python 2 in the first place?
not in my neck of science python (biomed/neuroscience; computer vision). We've been substantially more aggressive about moving to 3 than other python neighborhoods! Look at the top projects at https://python3statement.org/, for example.
Hell, numpy 1.20.x is dropping support for anything below 3.7!
I started doing science python in 2015 on 3.5.x, never had to touch 2.x at all.
Same in my neck of the woods (engineering R&D). Python is very popular and everybody I know is using Python 3 unless they are supporting an outdated codebase, where they are usually in process of migrating.
There's another reason for using containers or pretested distributions. I just wrote an article for LWN about SciPy¹. To get what was supposed to be a clean install of the newest release, I created a virtual environment and used Pip to install the SciPy components. It was broken out of the box. IPython did not work because a module that it depends on, that does tab completion, had been recently upgraded and was incompatible with it. I mean it crashed frequently, not just that it didn’t do tab completion. To get a working system, I had to install an older version of the library.
So people seek solutions that avoid the dependency hell that comes from maintainers releasing things at will, with nobody testing the combination of things that they nevertheless market as “SciPy”, which is supposed to be a bunch of stuff that works together. And whenever I install anything from the Python world, I am amazed when it actually works without the need to spend hours with Google or staring at sources trying to resolve conflicts. One of the many nice things about Julia is that their package system works.
So yes, once you manage to put together a working installation, you naturally want to encase that precious thing in the bubble of some kind of container, to protect it from an upgrade of some piece that will break everything. If you work with a lot of Python things, you may have a dozen copies of the same libraries, and a handful of Python binaries, taking up space on your drive. The concept of dynamic linking was supposed to save us from this. But the haphazard habits endemic to the Python community seem to have left us with the worst of all possible worlds.
Docker is the best thing to ever happen to scientific computing yet there is so much griping from the old guard. Trying to make a hermetic, CI/CD, within-machine-epsilon reproducible computer vision pipeline opened my eyes to just how un-reproducible your average machine learning project really is. Especially those coming from academia.
"Oh the PhD pinned to a nightly version of some dep and the AUC score plummets when you try to upgrade and customer wants this latest model in production on airgapped machines last week? Aight, stick their Conda env in a container, call it with `docker run`, and sort it out later"
Horrifying? Yes. Solving the problem on short notice? Also yes.
Plenty of more pedestrian uses but that was an actual situation where docker's reproducibility saved my bacon.
Yes. When your calculation uses dozens of files and a change in any of them might change the results, I don’t see how there is any alternative to some form of containerization. Even if you could generate a statically linked binary, if it’s science you want to distribute the sources for everything as well.
I notice that most people seem to reach for Docker automatically, though. There are other technologies, such as systemd containers, that might be better in some situations.
I work on a physics project. The data acquisition system was made on a tiny budget and most of the code is Python 2.5.2. It hasn’t been touched in 12 years. We’re stuck with Windows 7 running on some machines because there are no Python 2.5 libraries that support NI-DAQmx, and NI does not have any Traditional NI DAQ releases for Windows 10. Obviously good solutions involve porting to Linux and airgapping the DAQ, but the simplest solution I like is updating the codebase. I’m one engineer with loads of other responsibilities. Python is popular with scientists. I don’t begrudge the scientists. I begrudge the awful decision by the Python developers to disregard backwards compatibility.
No one will do it after support in the tools is killed off either. People will use the last versions of the tools from wherever they can get them, and workarounds for any issues that are discovered will become part of the folklore surrounding the code.
In what sense is it tautological? I can think of a lot of things that are important that won’t get done, healthcare improvements being one. Software on the other hand is typically fluid enough that someone will rewrite it.
I think it’s slightly more meaningful, because not all dreams can be achieved. If software isn’t getting ported from Python 2 to Python 3, it almost certainly isn’t because it’s too hard to do.
I mean, huge chunks of the scientific community run on MATLAB or Fortran. (Also... are you actually managing to get new students to install Python 2 on their laptops, somehow? And no enterprising undergrad is just like "I will just make this work on Python 3 because that's easier"?)
Ehh, I absolutely did that. Heck, for a class I reverse engineered a robot's bluetooth control API because coding up computer vision was easier in Python than in MATLAB (which had a provided "SDK" that some TA had written years prior).
I spent less time on that than it took to debug a missing comma at the end of a line in some MATLAB code (for a different assignment in a later class) that led to an incorrect matrix operation and completely failing code. I do not like MATLAB.
I think it’s a matter of experience. I’ve used MATLAB for nearly a decade and can whip up optimized bit banging, data acquisition, and signal processing code in a matter of minutes. When I try the same in python I spend days trying to figure out what the “right” solution to something is, try 3-4 options, find the issues with each, and by that point I’m burnt out and tell myself it isn’t worth it to try to replace MATLAB.
And the fact that we still periodically trap the efforts of thousands of smart people and bury them alive is insane. When told that something will only live 5 years, we turn it down. When told that something 5 years old is now obsolete, we celebrate.
The definition of insanity is doing the same thing over and over again and expecting a different result. (c)
Having worked with a huge Python 2 codebase recently, it's fine[1] but just becomes painful regarding dependencies. It just means you'll never have the latest versions of things.
> Since RPython is built on top of Python2 and that is extremely unlikely to change, the Python2 version of PyPy will be around “forever”, i.e. as long as PyPy itself is around.
A lot of research has been released with assets written in now defunct versions of Python. It is a shame that the barrier to reproduction just went up a lot.
One would hope that language designers would make their languages monotonically increasing to preserve compatibility, and otherwise fork into a new project when they lose confidence in previous features.
I work in CGI and I'm happy to see Python moving forward; I'm just sad that CGI still lacks broad Python 3 support. This industry was top notch 10 years ago, but its inability to handle the Python 3 migration for years now, despite Python's ubiquitous usage, makes me realize it's no longer an industry-driving domain.
You can; it's called __future__, six, and 2/3-compatible idioms. I've been writing code (begrudgingly) like this for years. The test suite runs on 2.7-3.6 no problem, albeit annoying to write vs pure py3.
How would that work? I assume you'd still need to declare which version you want to run as, and if any library you want to import weren't compatible, then either you'd be where you are right now, or it would bring a lot of backward/forward incompatibility issues with unicode/bytes/strings/syntax, etc.
In that case you wouldn't need a compatible interpreter, you'd just specify the interpreter you want in the shebang. Python does not usually use shebangs because you're supposed to run in a virtualenv, not shebang your files to your system python.
Common Lisp was a choice that many teams chose to ignore for whatever reasons. The price of perpetual upgrades is apparently cheaper than starting with a complete and stable language.
I suspect most Python users simply don't know anything else. That's how they got introduced to programming, and unless someone holds their hand again, they will never use anything else.
Also, many devs who work in Python in fact hate it. Python is so popular, it's foisted onto people.
In some jobs, you have to interact with Python from time to time, even if you don't develop in it.
I only rarely use Python (Ruby, Go, R and Bash usually do whatever I need) and I've never gotten my head around the whole Python 2 vs 3 thing. For a language often touted to be the best language for novice programmers, I think it's odd to expect them not to get caught with their metaphorical pants down in that dichotomy.
Edit: I should probably clarify. I have nothing against Python, quite the contrary, I've just always found that version 2 vs 3 thing odd.
Yet on my newest and best Ubuntu 20.04, when you type 'python' you are still running Python 2, which I don't use. Will it ever be possible to have 'python' mean Python 3 by default? It's annoying.
Let 'python' be Python 3, and 'python2' be Python 2 instead.
I never understood this logic. It's two characters, and it's a pain to port once you go from print statements to actual logging. Also coming from C I'm used to print functions.
Python code, with its absence of begin/end {} block statements, is easy to read and looks beautiful. The additional () in print takes away some part of that beauty.
I don’t have a problem ideologically with the maintainers doing this since 2.7 is EOL, but making a breaking change like this over the weekend was kind of cruel.
Sure, but there are plenty of things out there that are just pulling the latest version by default (in our case it was the Vagrant ansible_local provisioner...)
Good. Django moved the needle to improving python3 adoption just by moving their docs to include only python3 snippets. So this is another important step.
I like that type hinting isn't mandatory; I don't like that it's unenforced. GDScript is an example of a Python-influenced language that does it right.
Seriously. If I expressed my opinions on the regressive attitude Python 2 lovers have shown in the last 13 years, I'd break this website's code of conduct.
I've certainly seen that, but it can be quite variable depending on the specific circumstances of a field, department, project, etc. My team hired a recent graduate, and he's completely up to date, to the point I'm the old dog learning new tricks from him.
Then I know someone who's a researcher at the university, and they have no funds to buy computers, so she provides her own and is quite incensed about it. And the money for refactoring and updating stuff is zero. For any lab equipment connected to a computer, the computer is as old as the equipment.
And then everything in between.
In my view the vast majority of scientists using Python are using it at a level where they could practically switch to Python 3 just by cleaning up their print statements.
And an amusing anecdote... by the time my thesis project was in full swing, the OS that I had been using (MS-DOS running hardware that I had built) was largely obsolete. But I sure as hell wasn't going to change horses in mid stream, so I persisted with the old stuff.
This is the reason why we see so many graduates that don't have the skills to do the job they trained for.
Universities need to keep up to date rather than just saying "but it still works" and sticking their heads in the sand, ignoring what the rest of the industry is doing to move forward around them.
Experience with version X of software package Y is only relevant for a fleeting amount of time. Universities can and should teach software development fundamentals, not experience with specific versions of things.
Plot twist : it's not an IT faculty.
I know how to code, Professors don't. I don't get paid enough to fix decades of errors. They had some smart people who created the code and they were smart enough to leave as soon as they got their degrees.
I've experienced that, even with IT faculty. For the most part, faculty don't code. They are too busy with teaching, research, and all the overhead and university politics that goes with it. Their students code, and the stuff they develop hangs around forever because it's in a publication and might be cited, but there's no funding to maintain it after the research grant ends. You're absolutely right about that.
Experience with the problem of "version X of software package Y is deprecated, we need to upgrade to version Z instead" is a skill that will always be relevant. For everyone who uses computers, not just computer scientists.
The fact that universities are apparently incapable of learning such a basic life skill is terrifying. It's akin to an auto mechanic who doesn't know how to perform an oil change.
While I agree that students should be learning Python 3, I don't think it makes a big difference either, unless there's a larger problem with the school's curriculum. What happens when Python 4 comes out?
Schools should be teaching the fundamentals of programming, which apply across languages. I don't want to say "the language doesn't matter", because COBOL would be a poor choice, but Python 2 vs 3? Whatever.
The difference is, if they aren't being taught with up-to-date tooling, their knowledge of that tooling is worth exactly zero the moment they finish that course.
And let's not forget that with Python 2 not being supported anymore, even getting it installed and running in the first place is going to get harder and harder.
> The difference is, if they aren't being taught with up-to-date tooling, their knowledge of that tooling is worth exactly zero the moment they finish that course.
So what? Students should not be paying tens of thousands of dollars to learn tooling that will be obsolete within a decade anyway.
All else being equal, sure, start kids off with the most recent tools available. But it wouldn't be near the top of my priority list.
Never claimed this is how it should be. I'm telling you how it is, and I don't plan to lose my job by nagging senior staff and telling them how they should do their jobs.
That's one way to do it. I wish people would embrace the idea that an ecosystem moving forward does not mean that you need to move forward. It creates so much needless pressure in people's minds. In the absence of security considerations, updating software might actually be considered harmful.
They usually have some software created by some students in the 2000s. These people took their degrees and left years ago. Nobody will get paid to migrate this software and you can't get a degree for working on rewriting code. That's it. I was making backups of floppy diskettes last month with some obscure software. Windows XP is still king in many places because there are no drivers for ancient scientific hardware on Windows 10, and Linux is "black magic".
> and you can't get a degree for working on rewriting code.
Which is sad, really, because we really could use more people who can do that. In fact, I really wish that every CS program had a requirement to take an existing code base and do material work on it; it's not like anybody's going to graduate from school and then get hired to do greenfield development, so it'd be kind of nice if they were actually trained for the things that they're going to do in practice.
If only universities had some way of getting funding from, I don't know, something like teaching courses, that could be used to pay people to maintain and update the software they depend on.
I just finished a graduate diploma at a local university. A single subject had over 100 students, each paying $3500 for that subject alone, yet the uni claimed it didn't have the staff/resources to update that course (10 hours of face-to-face class time) each session. I find it hard to believe that running one session consumed so much of the $350,000 paid by students that nothing was left over to fund updating the materials.
The simple reality is there’s no economic reason to do that.
That $350,000 goes towards staff, facilities, and research projects.
If they had the choice between funding new cutting-edge research or migrating some obscure legacy code, the new research will always win. It’s not even a hard choice.
And that just makes you wonder why they expect students to keep paying for courses when they won't even do the basics to ensure those courses are relevant and useful.
My experience in university was that digital was an absolute hodge podge of part timers writing things in 4 month intervals who are then replaced with other part timers writing things in 4 month intervals.
A few full time people, but for university systems, not learning materials or examples or research.
At present I have no choice but to use Python 2.x, as Maya and Houdini (animation software) still ship with Python 2.x. The industry has known for years about the EOL, but it's going at a glacial pace due to issues with PySide / PyQt and Python 3. https://vfxpy.com/
That's just because scientific code isn't all that important. If it were, it would be maintained, and it would be upgraded. There's no reason to maintain or upgrade any code most scientists write, so it'll molder forever, and nobody will notice when it finally becomes impossible to run.
I still can't get why they replaced: `print "abc"` with `print("abc")`.
The Python REPL was my go-to shell calculator, and it is quite a hassle to put parentheses around what I would like to print; for me that was the main drawback of Python 3.
But I don't write production code in python, it is used only for small shell scripts, or like calculator.
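For what it's worth, the usual rationale for the change is that a function composes in ways a statement cannot; a few real Python 3 examples:

    import functools
    import sys

    # Keyword arguments replace the old "print >>sys.stderr, x" and
    # trailing-comma special cases of the Python 2 statement.
    print("a", "b", "c", sep="-", end="!\n")

    # As an ordinary function, print can be passed around or partially applied.
    log_error = functools.partial(print, file=sys.stderr)
    log_error("something went wrong")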
Just like Visual BASIC 6.0, dropped for Visual BASIC.Net and Visual C#.Net because of progress. You can't expect a language to last forever or support to last forever.
Now Python 4.0 is coming out, and soon Python 3.X will lose support.
Dirty secret: There is plenty of multi-million dollar VB 6.0 software still running out there in the world.
I think it's not even possible to compile it without a Windows XP VM but the output still runs on Windows 10.
Microsoft honestly should have provided a compatible version of Visual Basic that would run in the .NET environment instead of VB.NET. That would have allowed much more software to move over than what actually happened.
I don't know if you are being carefully specific about "compiling" as opposed to generally using the tooling (and maybe using P-code), but you can run Visual Basic 6 on newer versions of Windows. I have a machine with it set up, actually, as a researcher friend of mine inherited such a codebase. Here is an example set of instructions, not that I have read or vetted them (I don't remember what guide I had read to set it up).
I can run Visual Studio 6 just fine on Windows 10 - the only thing I needed to do was not install certain features and disable the old Java version install/check, and that's it. Windows is insane regarding backwards compatibility.
Microsoft's OS division has essentially committed to supporting VB6 apps forever. I imagine that RedHat and etc will do that with Python 2, too much of the world is running on it.
(VisualBasic, like Python was the most popular programming language of its day and found its way into a lot of niches which are not developer-centric.)