This is awesome in terms of avoiding all of the weird things when a person typed pip rather than pip3 and the module didn't seem to get installed anywhere. That said, watching Perl try to kill perl5 with perl6 (unsuccessfully) and Python try to kill Python 2 with Python 3 (more successfully), it struck me how ridiculous it is that open-source languages have to put up with this. Clearly "major" numbers are insufficient; the only real answer is to rename the entire freaking language when you make incompatible changes to it.
I think a lot of important lessons got learned in both cases. Clearly perl6 should have had a different name. But I think python2->python3 could've been much less painful if they'd known to prioritize single-codebase compatibility from the very beginning. I think you can see that lesson applied with e.g. Rust editions, which as far as I can tell have been a complete success.
Hindsight might very well be easy in this case, but I cannot help finding the Python developers ridiculously naïve in how they handled this, and foolish to have even begun it in the first place.
The minor improvements of Python 3 did not warrant breaking backwards compatibility, and most could have been handled with opt-in directives in a way that would not have broken it.
The very swarms of users that chanted "just upgrade", as if that did not incur a significant cost, also seemed ridiculously naïve to me, not understanding the real cost that large projects face in having to rewrite very extensive codebases and deal with the potential regressions that that might involve.
Everything about the switch, from its very conception to its execution, was handled in a veritably disastrous way by the team, which really did not seem to appreciate even a fraction of what is obviously involved with projects that have millions of lines of code and would of course rather not have to rewrite it all.
This is why many projects such as Linux, Windows, Rust, Cobol, Fortran, C, and C++ take backwards compatibility quite seriously. Serious enterprises do not like to invest in something if it means that 10 years later they will have to rewrite their entire codebase again.
Even on my home computer, I simply do not have the time to rewrite the many Python 2 scripts that I have written over the years that run my computer. It is cumbersome enough that once in a while part of my desktop stops functioning because my distribution removed a Python 2 library which I had relied upon as a system library and that I now have to install as a user library; hitherto that has been quite easily fixed.
I have migrated tons of Python codebases from 2 to 3, I guess starting with the release of Python 3.4, which was when Python 3 reached a kind of production readiness (and had also gained enough trust; IIRC it had also reestablished compatibility in some parts).
I think the incompatibilities between Python 2 and Python 3 fell into two categories:
1. Trivial and totally avoidable API changes by the Python developers (like `iteritems()` being renamed to `items()`, with the Python 2 `items()` behaviour removed from the language). The bet on the python-dev side was that `2to3` would take care of these, and there they totally underestimated that libraries couldn't and wouldn't just make a Python 3 migration in lockstep with the primary Python release. (A sketch of the shim this forced on libraries follows below the list.)
2. Change to unicode-strings by default with a clear distinction between unicode-strings and byte-buffers for all data encoded in any other fashion.
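To make change no. 1 concrete, here is the kind of compatibility shim that single-codebase libraries ended up carrying (a sketch; the `six` library shipped an equivalent `six.iteritems`):

    def iteritems(d):
        """Iterate over (key, value) pairs on both Python 2 and 3."""
        try:
            return d.iteritems()    # Python 2: lazy iterator
        except AttributeError:
            return iter(d.items())  # Python 3: items() is already a lazy view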
Most people on Python 3 nowadays won't actually know how beneficial change no. 2 was overall for the health of the Python ecosystem and the stability of their codebases. But it was also the tricky part of the migration for codebases that did a lot of string / file-content plumbing (mercurial being a prominent example).
Change no. 1 was a PITA and a lot of it could have been avoided, but it wasn't a huge problem. The huge problem for the ecosystem was the unicode change, but I don't think anyone questions its usefulness (except maybe Armin Ronacher, who is the most prominent voice with a dislike for it).
Well, a breaking change, but harmless in most instances; the other way around would be more harmful. Also, that was, I think, a `from __future__ import` option, so you could enable it while on Python 2.7 on a per-file basis.
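If memory serves, the per-file opt-in looked like this on 2.7:

    # at the top of a Python 2.7 module:
    from __future__ import unicode_literals

    s = "abc"    # now a unicode object, as in Python 3
    b = b"abc"   # byte strings must be marked explicitly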
> The minor improvements of Python 3 did not warrant breaking backwards compatibility, and most could have been handled with opt-in directives in a way that would not have broken it.
There were large changes to fundamental parts of the type system, as well as to core types. Pretending this isn't the case betrays ignorance, or at the very least cherry-picking.
How would you have handled the string/bytes split in a way that’s backwards compatible? Or the removal of old-style classes?
Let's not pretend that py3's string changes weren't fundamentally wrong and didn't create years of issues from trying to decode, as utf-8, things that could properly be arbitrary sacks of bytes.
So my answer is that it was a deeply misconceived change that shouldn't have been made at all, let alone been taken as the cornerstone of a "necessary" break in backward compatibility.
The string changes were both necessary and correct. There is a difference between bytes and strings, and treating them as the same led to so many issues. Thank god I’ve not seen a UnicodeDecodeError in decades.
You're not making an argument about backward compatibility here, you're making a strong claim that representing text as a sequence of Unicode code points is fundamentally wrong. I have never heard anyone make this point before, and I am inclined to disagree, but I'm curious what your reasoning is for it.
Indeed, representing text as a sequence of Unicode code points is fundamentally wrong.
There are no operations on sequences of Unicode code points that are more correct than an analogous operation on bytes.
(Everyone's favourite example, length, actually becomes less correct—a byte array's length at least corresponds to the amount of space one might have to allocate for it in a particular encoding. A length in codepoints is absolutely meaningless both technically and linguistically. And this is, for what little it's worth, close to the only operation you can do on a string without imposing additional restrictions about its context.)
Uppercasing/lowercasing cannot be done on Unicode code points, because that fails to handle things like "ﬁ" -> "FI", where the uppercased version does not consist of the same number of Unicode code points. Slicing and splitting cannot be done on Unicode code points because they may separate a character from a subsequent combining character. "startswith" cannot be done on Unicode code points because some distinct code points need to be treated as equivalent. These are pretty much the same problems you have when you perform those same operations on bytes. You might encounter those problems in fewer cases when you perform operations on code points rather than on bytes, but you won't have solved the problems entirely.
Worse, you'll have pushed the problematic cases out of the realm of obviously wrong and not sensible to do, into subtly wrong and will break down the line in ways that will be hard to recognize and debug.
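These failure modes are easy to reproduce in Python 3 itself; a quick illustration (a sketch, results from a recent 3.x):

    import unicodedata

    "\ufb01".upper()                           # 'FI': one code point becomes two
    len("\xc5")                                # 1 (precomposed A-with-ring)
    len(unicodedata.normalize("NFD", "\xc5"))  # 2 (base letter plus combining ring)
    "\xe9" == "e\u0301"                        # False, though both render identically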
None of those operations are correct on Unicode codepoints. Your statement is only just barely tenable if you only care about well-edited and normalized formal prose in common Western languages.
> There are no operations on sequences of Unicode code points that are more correct than an analogous operation on bytes.
Wow. I wonder how you arrived at this point. You can't, for example, truncate a UTF-8 byte array without the risk of producing a broken string. But this is only the start. Here are two strings, six letters each, one in NFC, the other in NFD, and their byte-length in UTF-8:
"Åström" is 8 bytes in UTF-8
"Åström" is 10 bytes in UTF-8
If your software tells the user that one is eight and the other is 10 letters long, it is not "less correct". It is incorrect. Further, if searching for "Åström" won't find "Åström", your software is less useful than it could be if it knew Unicode. (And it's sad how often software gets this wrong.)
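In Python terms, what "knowing Unicode" buys you here is essentially one library call (a sketch using the standard unicodedata module):

    import unicodedata

    nfc = "\xc5str\xf6m"                     # "Åström" in NFC: 6 code points, 8 UTF-8 bytes
    nfd = unicodedata.normalize("NFD", nfc)  # same text in NFD: 8 code points, 10 bytes

    nfc == nfd                                # False: naive comparison misses the match
    unicodedata.normalize("NFC", nfd) == nfc  # True: normalize before comparing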
> If your software tells the user that one is eight and the other is 10 letters long, it is not "less correct". It is incorrect.
In fact, if the software tells you that either of the strings is either 8 or 10 letters long, then either way the software is incorrect - those are both obviously 6-letter strings.
Now, does UTF-8 help you discover that they are 6-letter strings better than other representations? There are certainly text-oriented libraries that can do that, but not ones that simply count code points - they must have an understanding of all of Unicode. Even worse, the question "how many letters does this string have" is not generally meaningful - there are plenty of perfectly valid Unicode strings for which this question doesn't have a meaningful answer.
However, the question "how many unicode code points does this string have" is almost never of interest. You either care about some notion of unique glyphs, or you care about byte lengths.
> then either way the software is incorrect - those are both obviously 6 letter strings.
What I wanted to get at is that in Unicode, I have a chance to count letters to some useful degree. Why should I consider starting at byte-arrays?
> there are plenty of perfectly valid Unicode strings for which this question doesn't have a meaningful answer.
I don't get it. Why does the existence of degenerate cases invalidate the usefulness of a Unicode lib? If I want to know how many letters are in a string, I can probably get a useful answer from a Unicode lib. Not for all edge-cases, but I can decide on the trade-offs. If I have a byte-array, I start at a lower level.
> What I wanted to get at is that in Unicode, I have a chance to count letters to some useful degree.
You do not. You merely happen to get the right answer by coincidence in some cases, same as bytes-that-probably-are(n't)-ASCII. To throw your own words back at you:
"Åström" is 6 code points in Unicode
"Åström" is 8 code points in Unicode
If your software tells the user that one is six and the other is 8 letters long, it is not "less correct". It is incorrect. Further, if searching for "Åström" won't find "Åström", your software is less useful than it could be if it knew text. (And it's sad how often software gets this wrong.)
You can't truncate a sequence of Unicode codepoints without the risk of producing a broken string, either. What do you get if you truncate "Åström" after the first "o"? What do you get if you truncate 🇨🇦 after the first codepoint?
Normalization is not a real solution unless you restrict yourself to working with well-edited formal prose in common Western languages.
Sorry, we're mixing two layers. Of course, if I truncate a string, it may lose its meaning. And having accents fall off is problematic. But it's not the same as truncating a byte-array, because then an invalid sequence of bytes may result.
Stop treating these cases as equivalent. They're not.
They are equivalent. The only reason you find it problematic that a sequence of bytes is "invalid" (read: can't be decoded in your preferred encoding) is because you've manufactured the problem.
In the end, the only layer at which it really matters whether your byte sequence can be decoded is the font renderer, and just being valid utf-8 isn't good enough for it either.
> In the end, the only layer at which it really matters whether your byte sequence can be decoded is the font renderer
Ok that explains how we ended up here. I'm considering some other common uses! A search-index for example greatly profits from being able to normalize representations and split words.
Here's the thing: I don't want to work in UTF8. I want to work in Unicode. Big difference. Because tracking the encoding of my strings would increase complexity. So at the earliest convenience, I validate my assumptions about encoding and let a lower layer handle it from then on.
I understand you're arguing about some sort of equivalency between byte-arrays and Unicode strings. Sure there are half-baked ways to do word-splitting on a byte-array. But why do you consider that a viable option? Under what circumstances would you do that?
How would this look if strings were byte-arrays? How would `normalize()`, `lower()`, and `split()` know what encoding to use?
The way I see it: If the encoding is implicit, you have global state. If it's explicit, you have to pass the encoding. Both is extra state to worry about. When the passed value is a Unicode string, this question doesn't come up.
It looks pretty much the same, except that you assume the input is already in your library's canonical encoding (probably utf-8 nowadays).
I realize this sounds like a total cop-out, but when the use-case is destructively best-effort tokenizing an input string using library functions, it doesn't really matter whether your internal encoding is utf-32 or utf-8. I mean, under the hood, normalize still has to map arbitrary-length sequences to arbitrary-length sequences even when working with utf-32 (see: unicodedata.normalize("NFKC", "a\u0301 ﬃ") == "\xe1 ffi").
So on the happy path, you don't see much of a difference.
The main observable difference is that if you take input without decoding it explicitly, then the always-decode approach has already crashed long before reaching this function, while the assume-the-encoding approach probably spouts gibberish at this point. And sure, there are plenty of plausible scenarios where you'd rather get the crash than subtly broken behaviour. But ... I don't see this reasonably being one of them, considering that you're apparently okay with discarding all \W+.
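Concretely, the two approaches I am contrasting look something like this (a minimal sketch; the function names are mine):

    import re
    import unicodedata

    def tokens_decode_first(raw):
        # always-decode approach: crashes early on undecodable input
        text = raw.decode("utf-8")  # raises UnicodeDecodeError on bad bytes
        text = unicodedata.normalize("NFKC", text).lower()
        return [t for t in re.split(r"\W+", text) if t]

    def tokens_assume_encoding(raw):
        # assume-the-encoding approach: never crashes, may spout gibberish
        text = raw.decode("utf-8", errors="replace")
        text = unicodedata.normalize("NFKC", text).lower()
        return [t for t in re.split(r"\W+", text) if t]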
I agree with you. I wish Python 3 had strings as byte sequences mainly in UTF-8 as Python 2 had once and Go has now. Then things would be kept simple in Japan.
Python 3 feels cumbersome. To handle a raw input as a string, you must decode it in some encoding first. It is a fragile process. It would be adequate to treat the input bytes transparently and put an optional stage to convert other encodings to UTF-8 if necessary.
So in one case, the text becomes corrupted and unreadable (i.e. loses its meaning), and in the other, it becomes corrupted and unreadable. What's the difference?
Having "accents fall off" has gotten people murdered [0]. Accents aren't things peppered in for effect, they turn letters into different letters, spelling different words. Analogously, imagine that a bunch of software accidentally turned every "d" into a "c" because some committee halfway around the world decided "d" should be composed of the "c" and "|" glyphs. That's the kind of text corruption that regularly happens in other languages when dealing with text at the code point layer.
[0] https://languagelog.ldc.upenn.edu/nll/?p=73 . Note that this is Turkish, which has the "dotted i" problem, meaning that this was more than likely a .toupper() gone wrong rather than a truncation issue.
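For a concrete flavour of the toupper problem: Python 3's built-in str casing is locale-independent, so (illustrative snippet, results from a recent 3.x):

    "i".upper()            # 'I': correct for English, wrong for Turkish (should be 'İ')
    "\u0130".lower()       # an 'i' followed by a combining dot above, not a plain 'i'
    len("\u0130".lower())  # 2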
The difference is that for truncating, I can work within Unicode to deal with the situation. I can accept the possibility of mutilated letters, I can convert to NFC, I can truncate on word-boundaries, I have choice.
If I have a byte-array, I can do none of these things short of implementing a good chunk of Unicode. If I truncate, I risk ending up with an invalid UTF-8 string. End of story.
And what is wrong with an invalid UTF-8 string? Why were you truncating the string in the first place?
Basically, I believe the point here is that a Unicode-aware truncation should be done in a Unicode-aware truncate method. There is no good reason to parse a string as UTF-8 ahead of time - just keep it as a blob of bytes until you need to do something "texty" with it. It is the truncate-at-word-boundaries() method that should interpret the bytes as UTF-8 and fail if they are not valid. Why parse it sooner?
> If I have a byte-array, I can do none of these things short of implementing a good chunk of Unicode. If I truncate, I risk ending up with an invalid UTF-8 string.
Yes, and? You can have an invalid sequence of Unicode code points too, such as an unpaired surrogate (something Python's text model actually abuses to store "invalid Unicode" in a special, non-standard way).
If you truncate at the byte level, you are just truncating "between code points"; it's a closer granularity than at the code point layer, so you can also convert to NFC, truncate on word boundaries, etc. You just need to ignore the parts of the UTF-8 string that are invalid; which isn't difficult, because UTF-8 is self-synchronizing.
> All functions that return `bytes` continue to do so unless specifically opted in on a per file basis, then they return `unicode`.
Nothing in py2 returns bytes. They all return strings. That is the issue. What about subclasses or type wrappers? What about functions that return bytes or utf8 strings? How would you handle code that then calls “.startswith()” on a returned string/bytes value?
A language pragma that fundamentally alters a built in type across all the code you have in a program is never going to work and pushes the burden onto library authors to support a large matrix of different behaviours and types.
It would make the already ridiculous py2 str/bytes situation even more ridiculous.
> They would obviously not be removed and still be available but depræcated.
Having two almost separate object models in the same language is rather silly.
> Nothing in py2 returns bytes. They all return strings. That is the issue.
No, that is not an issue, that is semantics.
What one calls it does not change the behavior. And besides, the system could perfectly well be designed so that this pragma changes whether `str` is synonymous with `bytes` or with `unicode`, depending on its state.
> What about subclasses or type wrappers? What about functions that return bytes or utf8 strings? How would you handle code that then calls “.startswith()” on a returned string/bytes value?
You would know which is which by using the pragma or not.
Not using the pragma defaults to the old behavior, as said, one only receives the new, breaking behavior, when one opts in.
Python could even support always opting in by a configuration file option for those that really want it and don't want to add the pragma at the top of every file.
> A language pragma that fundamentally alters a built in type across all the code you have in a program is never going to work and pushes the burden onto library authors to support a large matrix of different behaviours and types.
Opposed to the burden they already had of maintaining a 2 and 3 version?
Any new code can of course always return `unicode` rather than `str` which in this scheme is normally `bytes` but becomes `unicode` with the pragma.
> It would make the already ridiculous py2 str/bytes situation even more ridiculous.
> Having two almost separate object models in the same language is rather silly.
Yes, it is, and you will find that most languages are full of such legacy things that no new code uses but are simply for legacy purposes.
“It is silly.” turns out to be a rather small price to pay to achieve “We have not broken backwards compatibility.”
I don’t really have the time or inclination to continue arguing, but I will point out that you say all this as though the approach the team took failed. It worked. The ecosystem is on py3.
You can imagine some world with a crazy context-dependent string/bytes type. Cool. In reality this would have caused endless confusion, especially with beginners and the scientific community, and likely killed the language or at the very least made the language a shadow of what it is now.
They made the right choice given the outcome. Anything else is armchair postulation that was discussed previously and outright rejected for obvious reasons.
Because they're doing everything they can to force py2 to go away. It's not that it's dying a natural death out of disuse. Exhibit A is everyone else in this post still wanting to use it.
If you think strings "work" under py3, my guess is you've never had to deal with all the edge cases, especially across all 3 major desktop platforms. Possibly because your applications are limited in scope. (You're definitely not writing general-purpose libraries that guarantee correctness for a wide variety of usage.) Most things Python treats as Unicode text by default (file contents, file paths, command-line arguments, stdio streams, etc.) are not guaranteed to contain only Unicode. They can have invalid Unicode mixed into them, either accidentally or intentionally, breaking programs needlessly.
Consider a content-agnostic program (like `cat`, `printf`, etc.): with a decent standard library implementation, you would expect it to be able to pass arbitrary data through just fine. But it doesn't, because Python insists on treating arguments as Unicode strings rather than as raw data, and it behaves worse on Python 3 than Python 2. You really have to go out of your way to make it work correctly—and the solution is often pretty much to just ditch strings in many places and deal with bytes as much as possible... i.e., you realize Unicode strings were the wrong data type. But since you're still forced to deal with them in some ways, you get the worst of both worlds; that increases the complexity dramatically, and it becomes increasingly painful to ensure your program still works correctly as it evolves.
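Here's roughly what that ends up looking like for an argv-passthrough program (a sketch, POSIX-flavoured; os.fsencode recovers the bytes that the surrogateescape decoding smuggled in):

    import os
    import sys

    # Python 3 decodes sys.argv with the surrogateescape error handler,
    # representing undecodable bytes as lone surrogates.
    for arg in sys.argv[1:]:
        data = os.fsencode(arg)  # round-trip back to the original bytes
        sys.stdout.buffer.write(data + b"\n")  # raw bytes, bypassing the text layer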
I say all these because I've run into these and dealt with them, and it's become clear to me that others who love Unicode strings just haven't gone very far in trying to use them. Often this seems to be because they (a) are writing limited-scope programs rather than libraries, (b) confine themselves to nice, sanitized systems & inputs, and/or (c) take an "out-of-sight -> out-of-mind" attitude towards issues that don't immediately crop up on their systems & inputs.
> You're definitely not writing general-purpose libraries that guarantee correctness for a wide variety of usage.
At the risk of sounding like a dick, I’m a member of the Django technical board and have been involved with its development for quite a while. Is that widely used or general purpose enough?
If you want a string then it needs to be a valid string with a known encoding (not necessarily utf8). If you want to pass through any data regardless of its contents then you use bytes. They are two very different things with very different use cases.
If I read a file as utf8 I want it to error if it contains garbage, non-text contents because the decoding failed. Any other way pushes the error down later into your system to places that assume a string contains a string but it’s actually arbitrary bytes. We did this in py2 and it was a nightmare.
I concede that it’s convenient to ignore the difference in some circumstances, but differentiating between bytes/str has a lot of advantages and makes Python code more resilient and easier to read.
> I’m a member of the Django technical board and have been involved with its development for quite a while. Is that widely used or general purpose enough?
That's not quite what I was saying here. Note I said "wide variety of usage", not "widely used". Django is a web development framework—and its purpose is very clear and specific: to build a web app. Crucially, a web framework knows what its encoding constraints are at its boundaries, and it is supposed to enforce them. For example, HTTP headers are known to be ASCII, HTML files have <meta ...> tags to declare encodings, etc. So if a user says (say) "what if I want to output non-ASCII in the headers?", your response is supposed to be "we don't let you do that because that's actually wrong". Contrast this with platform I/O, where the library is supposed to work transparently without any knowledge of any encoding (or lack thereof) for the data it deals with, because that's a higher-level concern and you don't expect the library to impose artificial constraints of its own.
"If I read a book as Russian, I want it to error if it contains French, non-Russian contents because the decoding failed. Any other way pushes the error down later into your system to readers that assume a Russian passage contains Russian but it's actually arbitrary text. We did this in War and Peace and it was a nightmare."
“If I expect a delivery of war and peace in English, I want it to error if I actually receive a stone tablet containing Neanderthal cave paintings thrown through my window at night”. They are two very different things, even if they both contain some form of information.
You are engaged in some deep magical thinking about encodings, to believe that knowing the encoding of a so-called string allows you to perform any operations on it more correctly than on a sack of bytes. (Fewer, in fact—at least the length of a byte array has any meaning at all.)
It's an easy but very confused mistake to make if the text you work with is limited to European languages and Chinese.
> You are engaged in some deep magical thinking about encodings, to believe that knowing the encoding of a so-called string allows you to perform any operations on it more correctly than on a sack of bytes.
Not really. How would “.toupper()” work on a raw set of bytes, which would either contain an MP3 file or UTF8 encoded text?
Every single operation on a string-that-might-not-be-a-string-really would have to be fallible, which is a terrible interface to have for the happy path.
How would slicing work? I want the first 4 characters of a given string. That’s completely meaningless without an encoding (not that it means much with it).
How would concatenation work? I’m not saying Python does this, but concatenating two graphemes together doesn’t necessarily create a string with len() == 2.
How would “.startswith()” work with regards to grapheme clusters?
Text is different from bytes. There’s extra meaning and information attached to an arbitrary stream of 1s and 0s that allows you to do things you wouldn’t have been able to do if your base type is “just bytes”.
Sure you could make all of these return garbage if your “string” is actually an mp3 file, aka the JavaScript way, but... why?
> Not really. How would “.toupper()” work on a raw set of bytes, which would either contain an MP3 file or UTF8 encoded text?
It doesn't. It doesn't work with Unicode either. No, not "would need giant tables", literally doesn't work—you need to know whether your text is Turkish.
> How would slicing work? I want the first 4 characters of a given string. That’s completely meaningless without an encoding.
It's meaningless with an encoding: what are the first four characters of "áíúéó"? Do you expect "áí"? What are the first four characters of "ﷺ"? Trick question, that's one unicode codepoint.
At least with bytes you know that your result after slicing four bytes will fit in a 4-byte buffer.
> How would concatenation work? I’m not saying Python does this, but concatenating two graphemes together doesn’t necessarily create a string with len() == 2.
It doesn't work with Unicode either. I'm sure you've enjoyed the results of concatenating a string with an RTL marker with unsuspecting text.
It gets worse if we remember try to ascribe linguistic meaning to the text. What's the result of concatenating "ranch dips" with "hit singles"?
> How would “.startswith()” work with regards to grapheme clusters?
It doesn't. "🇨" is a prefix of "🇨🇦"; "i" is not a prefix of "ij".
> Text is different from bytes. There’s extra meaning and information attached to an arbitrary stream of 1s and 0s that allows you to do things you wouldn’t have been able to before.
None of the distinctions you're trying to make are tenable.
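A few of those, reproduced in Python 3 (illustrative, from a recent 3.x):

    import unicodedata

    s = unicodedata.normalize("NFD", "áíúéó")  # 5 letters, 10 code points
    s[:4]                     # 'áí': four code points are only two letters
    "🇨🇦".startswith("🇨")     # True, though 🇨 alone is not "part of" the flag
    "ĳ".startswith("i")       # False: the ij ligature is a single code point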
It is not clear to me whether there is a material difference here. Any text string is a sequence of bytes for which some interpretation is intended, and many meaningful operations on those bytes will not be meaningful unless that interpretation is taken into account.
The problem that you have raised here seems to be one of what alphabet or language is being used, but that issue cannot even arise without taking the interpretation into account. If you want alphabet-aware, language-aware, spelling-aware or grammar-aware operators, these will all have to be layered on top of merely byte-aware operations, and this cannot be done without taking into account the intended interpretation of the bytes sequence.
Note that it is not unusual to embed strings of one language within strings written in another. I do not suppose it would be surprising to see some French in a Russian-language War and Peace.
This implies that you should have types for every intended use of a text string. This is, in fact, a sensible approach, reasonably popular in languages with GADTs, even if a bit cumbersome to apply universally.
A type to specify encoding alone? Totally useless. You can just as well implement those operations on top of a byte string assuming the encoding and language &c., as you can implement those operations on top of a Unicode sequence assuming language and culture &c..
To implement any of the above, while studiously avoiding anything making explicit the fact that the interpretation of the bytes as a sequence of glyphs is an intended, necessary and separable step on the way, would be bizarre and tendentious.
I see you have been editing your post concurrently with my reply:
> You can just as well implement those operations on top of a byte string assuming the encoding and language &c., as you can implement those operations on top of a Unicode sequence assuming language and culture &c..
Of course you can (though maybe not "just as well"), but that does not mean it is the best way to do so, and certainly not that it is "totally useless" to implement the decoding as a separate step. Separation of concerns is a key aspect of software engineering.
> To implement any of the above, while studiously avoiding anything making explicit the fact that the interpretation of the bytes as a sequence of glyphs is an intended, necessary and separable step on the way, would be bizarre and tendentious.
Codepoints are not glyphs. Nor are any useful operations generally performed on glyphs in the first place. Almost all interpretable operations you might want to do are better conceived of as operating as substrings of arbitrary length, rather than glyphs, and byte substrings do this better than unicode codepoint sequences anyway.
So I contest the position that interpreting bytes as a glyph sequence is a viable step at all.
Fair enough, codepoints, but the issue remains the same: you keep asserting that it is pointless - harmful, actually - to make use of this one particular interpretation from the hierarchy that exists, without offering any valid justification for why this one particular interpretation must be avoided, while both lower-level and higher-level interpretations are useful (necessary, even.)
Going back to the post I originally replied to, how would going down to a bytes view avoid the problems you see?
Let me rephrase. Codepoints are even less useful than abstract glyphs, cf. https://manishearth.github.io/blog/2017/01/14/stop-ascribing... (I don't agree 100% with the write-up, and in particular I would say that working on EGCs is still just punting the problem one more layer without resolving it; see some of my other posts in this thread. But it makes an attempt at clarifying the issue here.)
The choice of the bytes view specifically is just that it's the most popular view from which you can achieve one specific primitive: figuring out how much space a (sub)string occupies in whatever representation you store it in. A byte length achieves this. Of course, a length in bits or in utf-32 code units also achieves this, but I've found it rather uncommon to use utf-32 as a transfer encoding. So we need at least one string type with this property.
Other than this one particular niche, a codepoint view doesn't do much worse at most tasks. But it adds a layer of complexity while also not actually solving any of the problems you'd want it to. In fact, it papers over many of them, making it less obvious that the problems are still there to a team of eurocentric developers ... up until emoji suddenly become popular.
Now, I can understand the appeal of making your immediate problems vanish and leaving it for your successors, but I hope we can agree that it's not in good taste.
While all the facts in this post appear correct, they do not seem to me to amount to an argument either for the proposition that an implementation at the utf-8 level is uniquely harmful, or that a bytes-level approach avoids these problems.
For example, working with the utf-8 view does not somehow foreclose on knowing how much memory a (sub)string occupies, and it certainly does not follow that, because this involves regarding the string as a sequence of bytes, this is the only way to regard it.
For another, let's consider a point from the linked article: "One false assumption that’s often made is that code points are a single column wide. They’re not. They sometimes bunch up to form characters that fit in single “columns”. This is often dependent on the font, and if your application relies on this, you should be querying the font." How does taking a bytes view make this any less of a potential problem?
Is a team of eurocentric developers likely to do any better working with bytes? Their misconceptions would seem to be at a higher level of abstraction than either bytes or utf-8.
You are claiming that taking a utf-8 view is an additional layer of complexity, but how does it simplify things to do all your operations at the byte level? Using utf-8 is more complex than using ascii, but that is beside the point: we have left ascii behind and replaced it with other, more capable abstractions, and it is a universal principle of software engineering that we should make use of abstractions, because they simplify things. It is also quite widely acknowledged that the use of types reduces the scope for error (every high-level language uses them.)
The burden of proof is on showing that the unicode view is, in your words, a more capable abstraction. My thesis is that it is not. This is not because it necessarily does anything worse (though it does). It must simply do something better. If there were actually anything at all it did better—well, I still wouldn't necessarily want it as a default but it would be a defensible abstraction.
The heart of the matter is that a Unicode codepoint sequence view of a string has no real use case.
There is no "universal principle" that we use abstractions always, regardless of whether they fit the problem; that's cargo-culting. An abstraction that does no work is, ceteris paribus, worse than not having it at all.
> The burden of proof is on showing that the unicode view is, in your words, a more capable abstraction. My thesis is that it is not.
The quote, as you presented it, leaves open the question: more capable than what? Well, there's no doubt about it if you go back to my original post: more capable than ascii. Up until now, as far as I can tell, your thesis has not been that unicode is less capable than ascii, but if that's what your argument hangs on, go ahead - make that case.
What your thesis has been, up to this point, is that manipulating text as bytes is better, to the extent that doing it as unicode is harmful.
> It must simply do something better. If there were actually anything at all it did better...
It is amusing that you mentioned the burden of proof earlier, because what you have completely avoided doing so far is justify your position that manipulating bytes is better - for example, you have not answered any of the questions I posed in my previous post.
> The heart of the matter is that a Unicode codepoint sequence view of a string has no real use case.
Here we have another assertion presented without justification.
> There is no "universal principle" that we use abstractions always, regardless of whether they fit the problem...
It is about as close as anything gets to a universal principle in software engineering, and if you want to disagree on that, go ahead, I'm ready to defend that point of view.
>... that's cargo-culting.
How about presenting an actual argument, instead of this bullshit?
Furthermore, you could take that statement out of my previous post, and it would do nothing to support the thesis you had been pushing up to that point. You seem to be seeking anything in my words that you think you can argue against, without regard to relevance - but in doing so, you might be digging a deeper hole.
> An abstraction that does no work is, ceteris paribus, worse than not having it at all.
Your use of a Latin phrase does not alter the fact that you are still making unsubstantiated claims.
Put it this way: claim a use-case you believe the unicode view does better on than an array of bytes. Since you're making the positive claim, this should be easy.
I guarantee you there will be a quick counterexample to demonstrate that the claimed use-case is incorrect. There always is.
You may review the gish gallop in the other branch of this thread for inspiration.
I agree with this, but I would make one important tweak: make the new behavior opt-out, instead of opt-in, with a configuration file option for switching the default.
You're still breaking code by default this way, but no one would have trouble updating.
My concern is that, if you don't make the preferred behavior clear, a lot of people would simply never adopt it. I don't think that Python's userbase in particular is going to spend time reading documentation on best practices.
I do believe that such a trivial change would indeed be fine. If one can go to the effort of installing the new version, one can add one line in a configuration file to depend upon old behavior.
I think some modular approach could have solved the incompatibility issue, such as "from future import ...". Shorthands could have been invented to define everything in a single line.
Perl 5 has similar flags ("use strict"), and Racket takes it even further, letting "#lang racket/gui" define the whole fucking language of the rest of the file. Having the language be choosable by the user is against the "zen of python", I guess. In other words: such an attempt does not feel "pythonic".
No, it’s the same language but with different semantics around a specific type. That’s not a different language and code can co-exist with a bit of thought.
Every language goes through this at some point in its development: flaws that limit future development have to be fixed. Should every language rename itself and split its community at that point? That seems like an extreme response to a common problem.
That people can make an initial plan that is self-consistent, logical, and foresees and provides for all future use-cases is a basic tenet of waterfall-style development. The history of software engineering does not uphold that principle. Why would it be different for language designers?
Yes, yes it is. And, like "Perl 6" and to a lesser extent "C++", that name is misleading (and therefore bad), because there is already a different language called "Python" (respectively "Perl", "C"), with significant superficial similarities that it could be confused with.
Please note that the misleading part of Perl 6 has been fixed by renaming it to the Raku Programming Language (https://raku.org using the #rakulang tag on social media).
> How would you have handled the string/bytes split in a way that’s backwards compatible?
My understanding is that the corresponding types are available in both 2 and 3; they're just named differently. The one that differs is "string". So you could have had some kind of mode directive at the top of the file which controlled which version that file was in, and allowed files from 2 and 3 to run together.
Actually think about it. bytes is str in Python 2. There is no bytes type in py2. How would a per-file directive (of all things) help?
What if one function running in “py2 mode” returned a string-that-is-actually-bytes, how would a function in “py3 mode” consume it? What would the type be? If different, how would it be detected or converted? What if it returned a utf8 string OR bytes? What if that py3 function then passed it to a py2 function - would it become a string again? Would you have two string types - py2string that accepts anything and py3string that only works with utf8? How would this all work with C modules?
> What if one function running in “py2 mode” returned a string-that-is-actually-bytes, how would a function in “py3 mode” consume it? What would the type be?
It would be bytes. Because py2 string === py3 bytes.
> What if that py3 function then passed it to a py2 function - would it become a string again?
Yes
> Would you have two string types - py2string that accepts anything and py3string that only works with utf8?
Yes. You already have those two types in python3. bytes and string. You'd just alias those as string and utf8 or whatever you want to call it in python2.
> How would this all work with C modules?
They'd have to specify which mode they were working with too.
But all this would require huge rewrites of code and would never be backward compatible. You’re trading “py2 vs py3” with “py2 mode vs py3 mode”.
So you’d have some magic code that switches py2str to bytes. Which means every py3 caller has to cast bytes into a string to do anything useful with it, because returning strings is the most common case. Then that code has to be removed when the code it’s calling is updated to py3 mode. Which is basically the blue/green issue you see with async functions but way, way worse.
Then you’d need to handle subclasses, wrappers of bytes/str, returning collections of strings across py2/py3 boundaries (would these be copies? Different types? How would type(value[0]) work?), ending up with mixed lists/dicts of bytes and strings depending on the function context, etc etc.
It would become an absolute complete clusterfuck of corner cases that would have killed the language outright.
> You’re trading “py2 vs py3” with “py2 mode vs py3 mode”.
Yes, that's the whole point. Because compatible modes allow for a gradual transition. Which in practice allows for a much faster transition, because you don't have to transition everything at once (which puts some people off transitioning entirely - making things infinitely harder for everyone else).
Languages like Rust (editions) and JavaScript (strict mode) have done this successfully and relatively painlessly.
> So you’d have some magic code that switches py2str to bytes. Which means every py3 caller has to cast bytes into a string to do anything useful with it, because returning strings is the most common case. Then that code has to be removed when the code it’s calling is updated to py3 mode. Which is basically the blue/green issue you see with async functions but way, way worse.
Well yes, you'd still have to upgrade your code. That goes with a major version bump. But it would allow you to do it on a library-by-library basis rather than forcing you to wait until every dependency has a v3 version. Have that one dependency that keeps you stuck on v2? No problem: upgrade everything else and wrap that one lib in conversion code.
> Then you’d need to handle subclasses, wrappers of bytes/str, returning collections of strings across py2/py3 boundaries (would these be copies? Different types? How would type(value[0]) work?), ending up with mixed lists/dicts of bytes and strings depending on the function context, etc etc.
I'm not sure I understand the problem here. The types themselves are the same between python 2 and 3 (or could have been). It's just the labels that refer to them that are different. A subclass of string in python 2 code would just be a subclass of bytes in python 3 code.
The problem with this approach is that they wanted to reuse the `str` name, which requires a big "flag day", where it switches meaning and compatibility is effectively impossible across that boundary (without ugly hacks).
What they could have done instead would have been to just rename `str` to `bytes`, but retain a deprecated `str` alias that pointed to `bytes`.
That would keep old scripts running indefinitely, while hopefully spewing enough warnings that any maintained libraries and scripts would make the transition.
Eventually they could remove `str` entirely (though I'd personally be against it), but that would still give an actual transition period where everything would be seamlessly compatible.
Same thing with literals: deprecate bare strings, and transition to having to pick explicitly between `b"foo"` and `u"foo"`. Eventually consider removing bare strings entirely. DO NOT just change the meaning of bare strings while removing the ability to pick the default explicitly (in contrast, 3.0 removed `u"asdf"`, and it was only reintroduced several versions later).
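A toy sketch of the alias idea in pure Python (hypothetical, of course; the real change would live in C):

    import warnings

    class str(bytes):  # hypothetical: keep "str" as a deprecated alias for bytes
        def __new__(cls, *args, **kwargs):
            warnings.warn("str is deprecated; write bytes or unicode explicitly",
                          DeprecationWarning, stacklevel=2)
            return super().__new__(cls, *args, **kwargs)

    s = str(b"legacy code keeps running")  # works, but emits a DeprecationWarning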
What made me personally lose faith in the Python Core team wasn't that Guido made an old mistake a long time ago. It wasn't that they wanted to fix it. It was the absolutely bone-headed way that they prioritized aesthetics over the migration story.
> Would you have two string types - py2string that accepts anything and py3string that only works with utf8?
Yes. A single naive one for py2, and two separate ones for py3: bytes and unicode. All casting between the two would have to be made explicit.
> How would this all work with C modules?
In non-strict mode, you'd be able to use either py2 strings or py3 bytes with these, and gradually move all modules to strict mode which requires bytes.
And then, gradually after a decade or so attempt to get rid of all py2 types.
> How would you have handled the string/bytes split in a way that’s backwards compatible? Or the removal of old-style classes?
I'm not sure it's the best way to handle it, but I would have been fine with:
from __python2__ import *
for full backward compatibility; or, more explicitly:
from __python2__ import ascii_strings, old_style_classes, print_statement, ...
As the parent poster mentions, several other popular languages and systems (C++, Java, etc.) have done a pretty decent job preserving backward compatibility, for good reason: it saves millions of hours of human effort. It's embarrassing and disappointing that Python simply blew it with the Python 2 to 3 transition.
Maybe we could still evolve pypi to support a compatibility layer to allow easy mixing of python2 and python3 code, but I get the feeling that Python 3 has poisoned the well.
When I was learning Python 6 years ago I was the only one using Python 3 in my group because I use arch linux. It was very basic code and everyone basically solved the same problem. Everyone else's code didn't work on my machine because print is not a statement in Python 3.
That's just plain stupid. Just print a warning and add a python2 flag that hides the warning. Don't release a major version because of something trivial like this.
Python gave everyone 12 years to deal with version 3 being the way forward. There are many fundamental changes.
The fact that people seem to complain exclusively after Python 2's end of life a year ago feels a little telling. Perl's community rofl-stomped their previous vision for Perl 6. The Python community wasn't vocal about this being a bad change. Rather the opposite: very loud support.
Keep in mind, I dislike Python either way, but I'm not one of the devs that complains about continuing education requirements, or language adding things over each 10 year period. I can work in Python just fine, but that doesn't mean it feels nice & hygienic to use for me personally.
The Python core devs did not have the time or motivation to support the old codepaths in the CPython runtime, and the legacy code was getting in the way of a lot of longtime wants and needs for improving performance, runtime maintainability, language ergonomics, and the standard library. They also specifically increased the major revision number to signal their intent to move on from that legacy.
But you kind of addressed this in your own spiel: hindsight is exceedingly easy. They didn't realize how inadequate their migration tooling was, or how very entrenched Python 2 was in various places. It's hard when you don't know what you don't know and you're highly motivated by hopeful aspirations.
>The Python core devs did not have the time or motivation to support the old codepaths in the CPython runtime, and the legacy code was getting in the way of a lot of longtime wants and needs for improving performance, runtime maintainability, language ergonomics, and the standard library.
They could have fixed most of this legacy code without changing the external user-facing API so much.
It's an open source project. Is there really much of a difference between "I'm not going to work on this system because it's terrible" and "I'm forking this system and I'm not going to support the previous version"?
In both cases you can say "well someone else will just come along and support it", and for py2 they did, for a bit. In fact I believe you can still pay if you happen to want py2 support.
But if you're not paying, you're saying "hey, this thing you work on and provide to me for free - why are you working on it in the way you want rather than the way I want??"
> In both cases you can say "well someone else will just come along and support it", and for py2 they did, for a bit
Was python-2 handed off to new maintainers? News to me.
> why are you working on it in the way you want rather than the way I want
Is "it" python-2 or python-3?
This isn't users demanding py3 devs support py2 - it's users asking that devs who no longer want to support py2 to hand it off to those that do, rather than blocking it.
When I say "the developers of X could have done Y better" I don't mean that they owe it to me in any way to have done so.
I'm just judging their technical decision making. They are perfectly entitled to delete the whole project and start a new one and I have absolutely no right to say they shouldn't.
But I do have a right to critique their decisions from a technical standpoint.
Yes it's interesting if this is cited as one of the motivations.
That's a problem of a language being oriented around a single implementation. Is it even defined by this implementation?
Compare to eg. C or C++.
Diversity and interoperability are important, as they are significant contributors to longevity.
I do like that you've used the term "API", as I think that sums it up. Rather than thinking of "Python" as a language agreed upon by multiple implementors, the behaviour here is that of a "library" with an "API".
It would have taken considerable effort, regardless. This cost was offset onto the development teams in companies doing migrations. It was the decision made and if you don't like it, consider using another programming language. Perhaps consider that it's an open source project with a lot of contributors essentially working for free.
How many downstream, Python dependent companies were funding its development? Everyone is entitled to their opinion. But if they build their system on a platform outside their control then they'll have to roll with the changes, fork it, or move to a different platform / fork.
I don’t think hindsight can be claimed here. It was a decision that was not made from ignorance. The Python developers chose to sacrifice backwards compatibility. Other languages do not typically make such choices and if they do they make updating codebases relatively easy.
Nothing about python versioning is easy. It’s a disaster and the key reason I do not start any projects in python.
> The Python developers chose to sacrifice backwards compatibility.
And it is quite clear that that choice was not based on accurate estimates and insights.
The original e.o.l. was laughably short and then had to be doubled. It was quite clear they based their choice on the assumption that consumers would all have switched to 3 at a time when 2 was still used by 80%.
They made that choice based on what can only be seen as complete ignorance of the cost of rewriting software.
Right now, the biggest reason to drop Python 2 for most serious consumers is not any of the improvements that Python 3 brings, but that it is e.o.l..
> Other languages do not typically make such choices and if they do they make updating codebases relatively easy.
I want to understand what was so hard about porting code from Python 2 to 3. I ported a few tens of thousands of lines of Python 2 code to Python 3 and it was pretty trivial. In my experience the only thing that made porting hard was when a package you depended on had not been ported to Python 3 yet. But maybe my experience does not reflect some other cases. Can you elaborate on what was so hard about porting code from Python 2 to 3?
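For context, the mechanical part of such a port is mostly changes of this sort, which the stock 2to3 tool automates (a runnable Python 3 sketch, with the Python 2 forms in comments):

    # print "hello"                  # Python 2 print statement
    print("hello")                   # Python 3 function call

    d = {"a": 1}
    # d.has_key("a")                 # Python 2 only
    "a" in d                         # works in both

    # for k, v in d.iteritems():    # Python 2 only
    for k, v in d.items():           # Python 3 (lazy view)
        pass

    try:
        int("x")
    # except ValueError, e:          # Python 2 syntax
    except ValueError as e:          # Python 3 (and 2.6+)
        pass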
How do I regression test five different pieces of DAQ hardware? My best plan is to pull them from working systems and deal with them missing. I don’t think it’s a good use of resources to buy extra DAQ cards just for a regression test bed.
Regardless of that, moving from python 2.5 to 2.7 is not trivial because not all used libraries were even updated to 2.7 from 2.5. Some that were broke backwards compatibility. How far do I have to bend backward just to get in the right place to update to python 3? I see many comments trivializing the effort needed to update to python 3 because they know of narrow use cases and expect large amounts of resources to maintain code. That isn’t the reality for most users.
The hardship of porting from 2 to 3 very much depends on how critical the software is. Porting 1000 lines of python 2 that deals with files encoded in various ways where it’s impossible to test all edge cases and where a failure might lead to huge liability charges is hard not because it’s hard to run 2to3 and do some random tests, but because you don’t know what you have missed. And still, a 300k lines of code project might be fine to just run 2to3 on and then find the bugs as you go. It’s a matter of context.
And the opposite -- tons of little unimportant scripts sitting around that add a lot of value as a whole, but just aren't worth rewriting because of the poor decision-making of the Python developer team....
I have Python code dating back to when I was an undergrad. It's sad to see the Python team decide to nuke that. My C code from then (mostly) runs fine still.
The team decided to externalize a massive cost onto its community without much benefit. That was sad to see at the time, and it continues to be sad to see.
As someone who does a decent part of development in python, I'd say you are using the wrong language, if you can't test your edge cases and have huge liabilities.
Python code is inherently almost-untestable and fragile. These days, when coding something critical and non-trivial, I choose a memory-safe language with static typing and type inference, ADTs, pattern matching and try to write simple yet pure functional code with well defined semantics, that works almost by definition.
> If you have a liability situation, maybe you could work to rectify it
Well yes, sure, of course.
And like you said, maybe Python isn't the right language in the first place for mission-critical life-is-on-the-line software.
But if you have already gotten yourself into a position where some piece of your business infrastructure is dependent on an obscure bit of hard-to-port-to-Python-3-and-maintain-exact-behaviour Python code, then it is exactly the "2to3 transition that's shaking up your house of cards", no?
And, furthermore, like you said, if you find yourself in this position, you should be looking at some other language entirely rather than porting to Py3, eh?
Note that I am not against using python in mission-critical code.
I was referring to untestable code with a myriad of edge-cases, in which case you have a problem that will surface sooner or later, be it 2to3 transition or something else.
If the code is truly static, you can ignore the transition and deprecation. Otherwise you should probably work on documentation/testing/refactoring and/or porting to another language.
2to3 transition was handled badly, up to about 2.7 and 3.4 or so, but the pains described here seem mostly self-inflicted, and I don't see it as an argument against the needed changes.
These are exactly the concerns of serious enterprises that the Python developers have missed, and what made them seem as though they were hobbyists who had never dealt with software that actually powers infrastructure.
Python was intended for education originally. It's possible that some uses are just too far outside that wheelhouse to expect it to work well forever. Doubtful I'll ever write desktop GUIs in PHP for example, though it appears some have already done it.
The standard library of 2 already came with many facilities that go well beyond that.
They targeted business; it came to be adopted by business; and then they were surprised that business was not enthusiastic about updating currently working code with all the potential regressions and downtime that might come from it.
Could you explain how "Python was intended for education originally."?
As I recall, Python was designed for the Amoeba operating system, and drew on experience from implementing ABC; ABC was definitely designed for education.
But ABC != Python. Checking now, the first Usenet post for Python 0.9 says:
> Python can be used instead of shell, Awk or Perl scripts, to write prototypes of real applications, or as an extension language of large systems, you name it.
There are certainly some cases where even the smallest backward incompatible change would cause serious problems on some systems. Thanks for giving an example, instead of just downvoting.
The problem was that you could only port once all the libraries you use had ported, but libraries didn't want to commit to abandoning Python 2 quickly.
Agreed, that was my experience as well: the hardest part was not changing our codebase but depending on packages that had not been ported to Python 3 yet.
> The Python core devs did not have the time or motivation to support the old codepaths
Then it sounds like they didn't want to be python devs anymore; good luck on their new project..
Instead they held onto the reins and drove python into the ground so that their new code could devour the remains of the old.
> They didn't realize how inadequate their migration tooling was
A shame then that they decided that migration was mandatory. They don't need to know either; they just have to encourage users to migrate, rather than force them to. Saying "They didn't realize ... how very entrenched Python 2 is" is basically saying "we didn't think we'd encounter (significant) resistance". Their "hopeful aspirations" were that everybody (that mattered) would be on board, which is why they didn't bother to ask..
There are a billion blog posts about the python core developers acknowledging their mistakes and saying they would handle future changes much differently.
This post might be true, but it's roughly 10 years late in terms of hitting the intended audience. Everyone gets this now, and "beating a dead horse" might be an understatement
Some dead horses need a serious beating every now and then to remind people that they can resurrect if you're not careful. All of the lessons the python team did not put into practice were well known at the time, but they knew better and here we are.
The day after tomorrow, someone who still needs to learn this lesson will make breaking changes to some API, framework, language or OS; maybe we'll get to them in time.
The lessons have not arrived at the current Python cabal. They just deprecated unittest and are seriously considering breaking parts of the C-API again.
For the people who work at the right companies this will generate many billable hours for no gain.
For others it will be a lot of unpaid work again. At this stage Python should be forked.
I seriously (I mean seriously) thought about forking Py2 (Tauthon is great BTW) but then I found out that PyPy has a Python2 mode and will for the foreseeable future. Just to be clear: PyPy runs Python 2 code, and always will. (As far as I know. Although it occurs to me that I have no idea what it's like if you're trying to work with the C API.)
(Also I got into Prolog, but that's another story.)
Apparently not everyone gets it, seeing that many are arguing against it, and every time this subject lands there are many ignorant users that say “Just upgrade your code.” as if that be free.
Python 2.7 still works as a binary. You can vendor all your requirements. The rug is not being pulled out from anyone, we’re looking at 10+ years of this.
PyPI is a mostly volunteer-only endeavor, so it’s tough to support stuff forever. And even there, older pips will still work!
Python 2 still works! It’s still there! Nobody is taking it away from you in any real sense. But Python developers don’t want to continue developing in that environment so are choosing to not handle it for future stuff.
Python 2 works. You can use it forever if you want. Nobody is forcing you to upgrade... except if you want the free labor from the community. And you have had years and years and years.
> Serious enterprises do not like to invest in something if it mean that 10 years later they would have to rewrite their entire codebase again.
Python 2 to Python 3 was nothing like rewriting an entire codebase. Most of the difficulty was if you depended on a package that only supported Python 2, other than that it was pretty easy to port a Python 2 codebase to Python 3. If you have millions of lines of code it might take more time understandably, but still it was nothing like rewriting a whole codebase.
> The very swarms of users that chanted “just upgrade” as if that not incur a significant cost also seemed ridiculously naïve to me, not understanding the real cost that large projects have by having to rewrite very extensive codebases and dealing the potential regressions that that might involve.
And yet we haven't heard of this being an actual, real problem, or are there any high profile examples?
I had to migrate multiple small projects (~10k loc) myself. That should be the typical use case for python (power law etc.) The whole thing took about half an hour per 1000 loc, and I had more than 10 years to plan it.
> The minor improvements of Python 3 did not warrant breaking backwards compatibility and most could have been handled in a way that would not break it opt-in directives.
There were serious issues in Python 2 that could not be fixed in any backward compatible way, and would have made further progress forward impossible.
It wasn't done lightly and a lot of smart people thought about it for a long time.
---
And your old Python 2 scripts will continue to work forever, so I'm not quite sure what your beef is.
Progress needs to be made, and sometimes dropping support for stuff you no longer want to spend time supporting makes sense.
That said, I still think the situation was mishandled for this reason: py3 is basically another language, similar to py2. Calling it py3 is an exercise in marketing - instead of creating a new language to compete with py2 (along with all similar languages e.g. Julia), the existing py2 community was leveraged/shamed* into supporting the new thing, and most importantly, py2 was killed by its maintainers (rather than handed off) so it couldn't compete with py3, and so that users would be forced to move somewhere else - py3 being the easiest.
If it had properly been a new language, they could have taken more liberties (compat breaking) to fix issues, like a single official package manager. And migration to py3 would have been more by consent, than by force.
Very much this. It's a separate language that, if it hadn't been pushed by BDFL and co., if it had appeared as an independent project (like e.g. Stackless Python or something), would have had to live or die on its own merits.
- - - -
An additional aspect that I see as an old Python user is the "poisoning of the well" of the inclusive and welcoming spirit of the community. We (I'm speaking as a Pythonista here) have had problems with this in the past (remember how grouchy effbot could be? He's a sweet person IRL though.)
We made great progress and got a lot of acceptance in the educational and academic worlds.
Now just read this very thread and you'll find so many people making curt dismissive comments to folks who aren't on board with Python 3.
I still love and respect GvR (I once, with his permission, gave him a hug!) even though I think he messed up with this 2->3 business (and in any event, the drama around language innovation eventually pushed him to resign, as we all know.) He's a human being. And a pretty good one.
I guess what I'm trying to say is Python 3 won. Let us (all of us) be gracious about it.
While I also find the timeline totally reasonable, I think most "I don't have the time" complaints are probably less about being able to finish it in time, and more about wanting to spend time doing something other than rewriting otherwise finished or stable code to satisfy a backwards-incompatible change.
> It’s open source, you can fund some program to keep supporting python 2.
No, actually, you can't - last time I checked, they were specifically threatening[0][1][others] to sue anyone who tried to continue developing Python 2, for trademark infringement (despite that they are the ones using the trademark for something other than what it got its reputation from).
Written by someone who, for some reason, did not decide to maintain their own fork of python2. If time isn't free, why is it expected of maintainers to support other companies' lifestyles with their own time?
If you don't like the laws, are you a hypocrite for not starting your own country?
Arguments in this thread seem to miss a discrepancy:
"We don't want to support py2, and so why should we? Our time isn't free and we do what we want!"
"We know you don't want to migrate your py2 code, but you have to."
Forks aren't easy, especially when you get no support from the "official" python-2 maintainers. At the very least, a fork would not own the name.
Here's a question - why isn't python-3 a fork of python? Answer: because forks are hard, and the devs wanted to keep all the momentum/resources of python-2.
The fork comment is not meant to be a realistic suggestion; it just points out that there is work needed to maintain compatibility. The thing is, you can't both complain about the time it takes to migrate your project _and_ expect maintainers to spend an incommensurate amount of time maintaining stuff for you, free of charge.
I understand that some people and companies are now caught between a rock and a hard place right now. But honestly, that rock has been coming for 12 years now, and the alternative is to put other people in that situation.
Sure, but the assumption that py3 devs must do the work is being used to dismiss the idea and suggest people are entitled.
py3 devs don't need to do the work, they just need to hand it off.
> The thing is, you can't both..
Yes you can, if "maintaining" is handing it off, as opposed to the straw man of forcing py3 devs to do it. Why do the gatekeepers only allow for themselves to do the work?
> that rock has been coming for 12 years
notice is not consent.
> the alternative is to put other people in that situation
12 years is enough time to hand off to people who are happy to maintain py2. But there was no choice given.
I've already tried running old C++ projects and every time something breaks, so it's not as clear cut as you make it out to be.
Some things in Python 2 were not fixable by keeping it backwards compatible. Print as a statement? Sure. But strings/byte arrays, no way.
Of course they could have made the Py2 implementation less broken and less stupid (yes please do use ASCII as the default, ignore the existence of unicode, be trigger-happy about errors, etc)
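To make the unfixable part concrete, here's a minimal sketch (mine, not from the thread) of the text/bytes boundary Python 3 enforces:

    raw = b"caf\xc3\xa9"          # UTF-8 bytes, e.g. read from a socket
    text = raw.decode("utf-8")    # explicit decode at the boundary
    assert text == "café"
    assert text.encode("utf-8") == raw
    # In Python 3, text + raw raises TypeError; Python 2 would silently
    # coerce via ASCII and blow up later on non-ASCII data.

No backwards-compatible change to Python 2's str could have introduced that hard separation without breaking implicit coercions everywhere.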
The whole string/bytearray disaster could have been prevented if strings had always been UTF-8 encoded. That way strings and bytearrays could continue to be different views on the same data. The great divide between byte- and string-data was completely pointless, especially in 2008 when python3 was started, because by that time UTF-8 had already been firmly established for at least a decade (it would have been an excusable design fault only in the 1990s).
> by that time UTF-8 was already firmly established for at least a decade
For text files, maybe, but various APIs like the Windows API and the Java String API still use UTF-16.
UTF-8 dependence is also a major pain for many where the local character set conflicts with UTF-8. For example, there's still a lot of Japanese files out there in SJIS that need to be decoded accordingly. The country of Myanmar officially switched to unicode less than two years ago so if you still need to operate on older data, you're going to need to support their old character set.
UTF-8 as a fixed encoding only works if you manage to write mappers from and to alternative character sets for practically any language outside US English. Instead of breaking compatibility with most libraries, python3 would have broken compatibility with most libraries and a few countries instead.
Just like the rest of the world has to deal with three countries refusing to switch to metric, python3 needed to deal with countries refusing to switch to UTF8.
> UTF-8 as a fixed encoding only works if you manage to write mappers from and to alternative character sets for practically any language outside US English.
Huh? I've been using UTF-8 exclusively for string data for around 20 years in C and C++ and never had to deal with language specifics (this is also true for non-European languages; we need to deal with various East Asian languages, and Arabic for instance). You need to convert from and to operating-system-specific encodings when talking to OS APIs (like UTF-16 on Windows), but that's it (and this is not language specific; code pages are an "8-bit pseudo-ASCII" thing that's irrelevant when working with UTF encodings).
When dealing with "vintage" text files with older language-specific encodings, you need to know the encoding/codepage used in those files anyway, and do the conversion from and to UTF-8 while writing or reading such files. Those conversions shouldn't be hardwired into the "string class".
> UTF-8 as a fixed encoding only works if you manage to write mappers from and to alternative character sets for practically any language outside US English.
From a European perspective, this sounds very unlikely. Sure, you may have to deal with deprecated _encodings_, but I’d like to hear about mainstream languages with writing derived from the Latin alphabet that aren’t supported by UTF-8.
I don't buy the "discouragement" part there; if anything they could have made it mandatory or at least set it to UTF-8.
> For example, there's still a lot of Japanese files out there in SJIS that need to be decoded accordingly.
Yes, but you would have had to work on those cases anyway, and ASCII would have made it blow up anyway. But convert it to UTF-8/16 and it works.
EDIT: the reason is apparently that "(setdefaultencoding) will allow these to work for me, but won't necessarily work for people who don't use UTF-8. The default of ASCII ensures that assumptions of encoding are not baked into code"
Really. I can't explain my anger at how this is such an idiotic excuse. Yes, your program will fail if you use Latin-1 encoding, duh. Configure your environment correctly and it will work. Sounds like the kind of pedantry that made Guido quit over the walrus operator
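(For anyone who never saw it, the Python 2 hack that quoted rationale is arguing against looked roughly like this; site.py deletes setdefaultencoding at startup precisely to discourage it:

    # Python 2 only; widely cargo-culted, widely regretted
    import sys
    reload(sys)                      # resurrects the deleted function
    sys.setdefaultencoding("utf-8")  # implicit str<->unicode coercion now uses UTF-8
)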
Another typically ungrateful and entitled comment about opensource on HN. Color me surprised.
> Do these men think that time is free?
Do you think they have endless time to plan a migration with minute detail for every possible usecase?
Users have had about a decade to migrate their codebases and stop writing new projects in Python 2. Do you need another decade? Or are you personally going to take over the maintenance of the python2 runtime?
Does anybody actually pay the core dev team for support? Do you? Does your company? Have they been coordinating all these years with the core devs and are unhappy with the result they paid for? I kinda doubt it.
It would be really nice if people were just thankful for all the free stuff they got and built their enterprises on.
> Do you think they have endless time to plan a migration with minute detail for every possible usecase?
My point is that for these small changes from 2 to 3, there should have never been a migration to begin with.
It's not an accusation of lack of effort; it's an accusation of ignorance on their part.
The migration has not only cost everyone else time and money; it has cost them time and money that was better spent elsewhere.
It has been a net detriment to all parties, including them, because they severely underestimated the cost of rewriting software and dealing with the regressions it might lead to.
I will damned well call a man foolish for pointing a gun at his foot and getting shot in it because he underestimated how easily the trigger would go off by accident, instead of being thankful that he was willing to put in the effort to aim it at his foot.
You may be right or blinded by hindsight. Personally I'd rather see more breaking changes in languages like PHP that have a lot of baggage holding them back.
It’s the 21st century. Questions like “do these men think that time is free?” have no place in this century. Why not be accurate and say ‘people’? You chose to use a number of big words when simple ones would suffice. Why not take the same care in addressing people fairly?
For many non-native speakers, 'man' tends to be used as 'person' instead of 'male', probably because the translation of many common idioms involving 'man' uses a neutral word in their language.
For example, when translating between English and Romanian, 'man' often gets translated to 'om', which doesn't imply a gender in modern Romanian.
Even in English, 'man' is sometimes used without a gendered connotation. For example, if I say 'man is evil', I am unlikely to be referring to males, but rather people. Similarly, 'hey, man!' is not reserved for males.
I know that, but take a moment to read OP’s finer use of the language and more precise word choice. Every single word is perfectly precise, right down to the tone.
That was a pointlessly gendered comment and has no place in our industry. You can keep defending it but I’m not going to stop calling it out.
Is that so? I find I used various open phrases such as “Python developers” without going into the exact semantics of which ones, as I'm sure many objected to it, or “serious enterprises” without naming them, an incomplete list of programming languages, and so forth.
It was certainly an informal statement I made, not a formal specification.
There was nothing gendered about that statement, and most do not seem to have interpreted it as such, nor was it so intended by me.
You don’t have to get defensive, just be aware. The times have changed, the world is different, and our default language absolutely has to adapt.
Language adapts or dies. That’s how English got here. You can adapt too - it’s not as hard as you’re making it out to be. Heck, if you spent an eighth of the time thinking about inclusivity as you do about individual words, I wouldn’t have had to say this.
However, again, I’m glad I said something and regardless of what you claim, I’m not going to stop calling these kinds of grammatical monstrosities out. Language is important. Full stop.
My point is that, on an international site where English is only used as a means of communication, you should generally be more sensitive to cultural differences in the use of a language such as English. It is often used as a common language between people who don't speak English natively, and so idioms and nuances from their own languages seep into this common English.
The finer points about the semantics of a word such as 'man'/'men' and when it can be taken to refer to people unambiguously vs when it may accidentally imply you are talking about adult males are likely to be lost on a non-native speaker, especially if they come from a culture/language where this distinction and its implications are not subjects of general interest. Even if they are well-versed in the use of English in general.
So it's better to follow HN guidelines and assume the best intentions where meaning is unclear, instead of calling people out on their use of English.
Now, if you know for a fact that the GP is a native English speaker, and especially if you know that they are American, then what I'm saying is not very relevant.
I get your point completely and I’m glad you shared it. When I was in University, I worked with ESL (English as a second language) students and they were always really happy to hear about idiomatic quirks like that. I should rethink how I approach this online. I don’t want to be patronizing because lots of non-native speakers have better written English than I do, but I’ll think about it and find a new line.
“Men are evil.” can also refer to humans in general.
I just searched for the phrase and it's about half split between either meaning from context inference. Yet, the meaning pertaining to the species comes mostly from discussions by educated philosophers, and the other half is annoying identity politics arguments about why one's North American dating life is disappointing, — not exactly the audience I am ever interested in reaching, frankness be.
I'm simply disputing the claim that “Men are evil.” would be construed by English speakers to automatically refer to males.
The reason I'm not what you call “kind” is simply because this is how English works, and how it has always worked and how English speakers would interpret and parse that word.
I see no reason to avoid using a word in a perfectly acceptable, current, and historic use simply because you find that it has a different, secondary use. You call that “not being kind”. I call it “You don't own the English language any more than I do.”
You may speak as you will. I do not deny that the current usage of the word “man” has acquired a secondary meaning of “adult male human” as opposed to its historical meaning of “human”, and if you wish to use it as such, then I'm confident I can usually discriminate by context. I merely ask that I be allowed the same and speak as I will and use the word in its original meaning, which obviously still sees current use.
> You may speak as you will. I do not deny that the current usage of the word “man” has acquired a secondary meaning of “adult male human” as opposed to its historical meaning of “human”, and if you wish to use it as such, then I'm confident I can usually discriminate by context.
To be fair, while I consider your original wording to be pretty clear, this is wrong. According to Wikipedia, the word 'man' has adopted the meaning of 'adult male human' as its primary meaning starting with Middle English, when it displaced Old English 'wer'. There are still uses where it retains the much older meaning, but its primary meaning today is 'adult male human', and has been for a good few hundred years.
The way I look at it, the usage therein of the word “man” to specifically discriminate sex is very rare but definitely occurs. What does occur is the use of the word “man” to refer to a specific individual, which would typically be male, but in most cases where the word “man” is used indeterminately to refer to a class, it seems to be used without regard to sex.
Apart from that the most common usage seems to simply be vocatively as address, which is also gender neutral.
I would agree that it is rare, outside of compounds, to use the word “man” in a determinate sense for a female man, such as “that man over there” which would mostly be used in a military context, but in an indeterminate context to speak of “a man in general” or “men in general”, the most common usage from context seems to be sexless to this day.
Reading through the first 100 results, I see it mostly used to refer specifically to adult male individuals, or to "a man" meaning specifically an adult male ("would've flipped out if a weird man said some creepy remarks"). There are some uses where it may or may not be gender neutral ("you are a Spammier man than I" - may refer to a man or a woman, but it is probably used because the author is male; a woman might have written "a Spammier woman than I" instead, while also addressing both men and women).
There are also clear cases where "a man" is used to refer to "a human", such as "wheat growing taller than a man".
Rather more interestingly, if you instead search for "men", you'll see that it is used essentially exclusively to mean "adult males". The only exception I found was "and because the greed of a few men is such that they think it is necessary that they own everything", and even there I'm not sure.
> Reading through the first 100 results, I see it mostly used to refer specifically to adult male individuals, or to "a man" meaning specifically an adult male ("would've flipped out if a weird man said some creepy remarks"). There are some uses where it may or may not be gender neutral ("you are a Spammier man than I" - may refer to a man or a woman, but it is probably used because the author is male; a woman might have written "a Spammier woman than I" instead, while also addressing both men and women).
I disagree; the first uses of “man” in an indeterminate sense are these:
> down the economy, Here is the truth the republicans feel uncomfortable with a black man in the with house and a lot of voters are riding the republicans coat tail
> someday you might ask me to help you move. Or, to kill a man. # Leonard: I'll doubt he'll ask you to kill a man
> say, in 35 years of working I have almost always had at least one man who I felt " wrong " about. (the exception? Disney Studios!
> boyfriend, well husband, but either way would've flipped out if a weird man said some creepy remarks regarding me at a christmas party. To me this says
I have specifically included up till your reference, which was the first of an indeterminate usage of the word “man” that by implication is most likely gendered, whereas all the others are most likely not.
So there are three sexless ones before the first gendered one.
I would argue that the one about 'a black man in the white house' was in fact gendered, though it is somewhat debatable. It was referencing Barack Obama specifically. If there had been a black woman president, the phrase would definitely have been written to specifically say 'a black woman in the white house'. On the other hand, if it had been written before either a black man or a black woman had (tried to) become president, it may have still used 'man' in a genderless way.
> You may speak as you will. I do not deny that the current usage of the word “man” has acquired a secondary meaning of “adult male human” as opposed to its historical meaning of “human”, and if you wish to use it as such, then I'm confident I can usually discriminate by context.
That’s not even the same structure as ‘all men are evil.’ Instead what you wrote is gendered and thus completely inaccurate.
So again, you could have used ‘people’ to be respectful and inclusive but you’re choosing to stick with ‘man’ because that’s what you know.
That’s unkind. You know that this is an issue within our community but you are fully choosing to go against the norms because of ‘your language’?
I’m sorry but I thought we could have a conversation. This many replies in and I realize that you don’t actually have much sympathy, understanding or even basic caring.
> That’s not even the same structure as ‘all men are evil.’
Indeed it is not. I merely separately disagreed with that the statement “All men are evil.” would also by necessity be interpreted as such. Either can be, depending on context, but this is not such a context.
> Instead what you wrote is gendered and thus completely inaccurate.
You seem to be of the minority that has interpreted it as such. I would not quickly use votes for an argument except when they pertain to popular opinion, and this is a matter of which interpretation is more common.
I certainly didn't mean any gendered statement, and I also believe that most readers did not read any gender into it.
> So again, you could have used ‘people’ to be respectful and inclusive but you’re choosing to stick with ‘man’ because that’s what you know.
I could, and you could also change your language to avoid any and all possible ambiguities that would not be a problem in practice due to the power of contextual inference.
You seem to ask that this specific word be given special treatment above all others.
> That’s unkind. You know that this is an issue within our community but you are fully choosing to go against the norms because of ‘your language’?
Such as here, the word “our community” is quite vague. You used the word “our” which is ambiguous in English as it's unclear whether it includes the listener or not, and on top of that also what it includes.
I can however perfectly well infer from context that this is an “our” that includes the listener, and can make a reasonable guess to the extent of the “community” you refer to.
Finally, I do not know that it is “an issue” and I certainly do not know that there are “norms” about this. It very much seems that the majority sides with me on this issue given the votes, at least here. I do not believe I am going against any norms, not that I would consider an argumentum ad populum a strong one, but you were the one that raised it here.
> I’m sorry but I thought we could have a conversation. This many replies in and I realize that you don’t actually have much sympathy, understanding or even basic caring.
Well, frankness be, it seems from your language as though your default expectation is that your arbitrary whims, at least on this particular issue, should be accommodated, and that everyone who disagrees with you is unkind or lacks sympathy.
You call it a conversation, but it seems as though you started it from the assumption that you are right, and everyone who disagrees is wrong.
> Be better. It’s easy.
It is your opinion that this is better, indeed. Not everyone has to agree with you on that matter, and not everyone does.
Nobody ever has to agree with me and I’m proud to be a minority of one.
However, you’re a beautiful writer and beautiful writers can cause immeasurable pain. I’ll always speak out in case another minority of one feels pain but is too ??? to speak out.
Seriously, take good care. This has been a wonderful thread and again, you’re a really beautiful writer. :)
Not terribly pertinent, then. One is more likely to fall into conversations about mundane topics with uneducated people than to stumble upon existential conversations with educated philosophers, even though the latter might produce a large corpus.
One would also think that “man is evil” would be preferred by the erudite philosopher to the more ambiguous “men are evil”, although one can never overestimate the fondness that an educated person might have towards pedantry, frankly.
> Not terribly pertinent, then. One is more likely to fall into conversations about mundane topics with uneducated people than to stumble upon existential conversations with educated philosophers, even though the latter might produce a large corpus.
“Mundane people” is an entirely different segment than “raging identity politics aficionados complaining about their romantic life”.
The common man on the street will think nothing ill of the word being used as such, even when he be a blue collar construction worker, and will normally interpret it as intended.
I have never met such a raging identity politics aficionado in real life. I would assume not living in the U.S.A., where most of them seem to be centred, reduces my chances. But even there, it seems to be a rather small segment that is isolated to weblogs, as even newspaper columns do not seem to find it mainstream enough to dedicate segments to it.
I'd gander that if I were to find myself in New York and strike a conversation with a blue collar local and say something such as “A beautiful city isn't it? all these millions of men, working as an organized beehive.”, that he'll not interpret me wrongly or even think much of it.
>I'd gander that if I were to find myself in New York and strike a conversation with a blue collar local and say something such as “A beautiful city isn't it? all these millions of men, working as an organized beehive.”, that he'll not interpret me wrongly or even think much of it.
Actually I think there's a very good chance she'll object.
The problem is that in your mind, males are the "default" human, and using sexist language reinforces this. This is not a recent opinion confined to "raging identity politics aficionados" or "weblogs" - at this point it's the wrong side of history for the better part of half a century. Consider this piece of satire by Douglas Hofstadter, written in 1985, which substitutes racist language for sexist language in a precisely analogous way:
> Actually I think there's a very good chance she'll object.
If you mean to suggest that this position runs across gender lines, then I very much object and find that a naive, but common, assumption.
It reminds me of a Canadian act that sought to introduce the word “fisherwoman” as a sign of good faith to the female fishermen, but it revealed that, overwhelmingly, the fishermen, male or female, did not like this change and found the word to sound silly.
I have noticed no correlation with the gender as to what position one takes on this, as many females as males seem to either favor, or object to, innovations such as “chairwoman” or “councilwoman”.
> The problem is that in your mind, males are the "default" human
No, that would be in the mind of those that read the word “man” and must compulsively attach a gender to a statement containing it.
I've certainly noticed that those so interested in gender language police invariably seem incapable of abstractly thinking of a person without attaching a gender thereto.
> and using sexist language reinforces this
The sexist history is to take the word that has always simply meant “human” and give it a gendered, ageist meaning. — you reverse the history of the word here.
> at this point it's the wrong side of history for the better part of half a century.
What would you mean with “wrong side of history”? It is undeniable that the meaning of the word “man” to mean “human” is the original meaning of the word and that the secondary usage to mean “adult male human” is a later innovation.
No, you missed the point entirely. The point is that you pictured this "blue collar local" as a man, as evidenced by your use of the pronoun "he". Don't tell me that it's about the word "man" and its historical role to mean "human".
>I've certainly noticed that those so interested in gender language police invariably seem incapable of abstractly thinking of a person without attaching a gender thereto.
The irony. Next time say "they" instead of "he".
> No, you missed the point entirely. The point is that you pictured this "blue collar local" as a man, as evidenced by your use of the pronoun "he".
No I didn't. The pronoun “he” in English is also very often used to refer to an indeterminate, hypothetical person of irrelevant and unspecified sex.
I didn't picture him as anything in particular, given that I am partially aphantasic and never draw mental pictures of such scenarios.
> The irony. Next time say "they" instead of "he".
There is no irony here; you infer that he is male because of the pronoun and I find such usage to not be universal at all.
The pronoun “he” has a very long history in English for use with a hypothetical person, from which the listener is not meant to infer any particular gender. It is also true that some use the pronoun “they” in that case, but that is not a universal behavior and either may be encountered.
Use of “she” for such hypothetical persons has also seen recent use, and was probably innovated deliberately; some authors deliberately alternate both in even distribution.
All of this is how the English language is used by different speakers. I am not telling you which is better and how you should use it; I am telling you that if you are denying that all have currency, you are but certainly being willfully ignorant because you do not like the descriptive truth about how English is used by its speakers.
Well, the other perspective would be that you should stop using, for human beings in general, a word that has for so long been used only for the adult male members thereof.
My perspective is that context is usually sufficient and that this is not the only word in English that is used as such. I never find such passionate debates about the word “chess” for instance which can be used for every game that descended from the Indian game, the European variant specifically, or simply any exercise of great tactical planning.
Such interesting objectivity men are awarded when politics not be in play.
I agree with you and the sentiment but your tone really discredits the argument. Instead of putting others down it's better to assume good faith, and educate in an elevating way. This way feminism gets a good reputation.
>Rust editions, which as far as I can tell have been a complete success.
Rust's commitment to backward compatibility is certainly extremely commendable, but I don't think the language went through anything resembling the switch from Python 2 to Python 3 in terms of breakage.
Some of the changes in Python 3 are very fundamental. Imagine if Rust had shipped without String/str and they were added after the fact, would Rust manage to avoid splitting the ecosystem? That's an open question as far as I'm concerned.
And I also hope that we never find out. Rust's fundamentals have proven to be very solid so far. Having things like OsStrings (something missing from most programming languages, including Python 3 AFAIK) shows a great amount of foresight and understanding of the problem space. Contrast that with Go which seems very intent on completely ignoring 30 years of programming language evolution.
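(Worth noting, as a hedge on the "missing from Python 3" part: Python 3 does cover some of this ground via the surrogateescape error handler, even without a dedicated OsString type. A quick sketch, assuming a UTF-8 filesystem encoding:

    import os
    # arbitrary non-UTF-8 bytes in a filename survive the round-trip through str
    raw = b"caf\xe9"                  # latin-1 bytes, not valid UTF-8
    name = os.fsdecode(raw)           # decodes with errors="surrogateescape"
    assert os.fsencode(name) == raw   # lossless round-trip

It's not a separate type, though, so the text-vs-OS-string distinction is easy to lose.)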
Single codebase compatibility meaning that you can have python2 and python3 code in the same application? Isn't that significantly harder with an interpreted language or am I missing something?
It's a mostly solved problem with Racket. What they probably should have done was have python 2 code somehow declare itself as python 2 (Racket does this with #lang at the top of files). Then, just have a python 2 compatibility layer that works in two steps. First step is to compile/parse it into a similar form as python 3. Additionally, provide a small python 2 runtime which provides different versions of the functions which changed from 2 to 3. I think the two steps are important because some stuff is easier to solve via compilation like "print" while other stuff may be only possible at runtime like strings being unicode.
You would still have some differences which can't be papered over, but it would have made writing code that works in both python 2 and 3 much easier.
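For illustration, a tiny slice of what such a runtime shim might look like; these function names are hypothetical, not anyone's actual proposal:

    def iteritems(d):
        # py2's dict.iteritems() expressed on top of py3's items() view
        return iter(d.items())

    def py2_text(b, encoding="latin-1"):
        # py2 str was bytes; give ported code an explicit, lossless text view
        return b.decode(encoding) if isinstance(b, bytes) else b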
Single-codebase compatibility means that the same code can run under both Python 2 and Python 3.
The initial expectation was that translation tools would solve this problem, but it didn’t really work out that way. Adding language features and library shims to make it possible to write pidgin Python that would run under either version meant that you could migrate libraries and parts of large codebases one at a time until the whole thing ran under Python 3.
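A minimal example of such pidgin Python, runnable unmodified under both 2.7 and 3.x:

    from __future__ import print_function, unicode_literals

    try:
        from urllib.parse import urlparse   # Python 3 location
    except ImportError:
        from urlparse import urlparse       # Python 2 location

    print(urlparse("https://example.com/x").netloc)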
That's the main working solution I found: code works in both versions.
The problem is that it's way trickier than it should be. Had they made that relatively easy, the Python 2->3 transition would have had a much smoother "normal" upgrade process.
Python2 was poised to take over the world and be the next Java. 3 is losing ground to Node, Go, Rust, even Lua. 3 is a really fun and productive language to work in as long as you don't need to think about bytes.
It causes issues with running low overhead multithreaded code. I get by with the multiprocessing library, but then again I don't have a lot of threads (10 is the most I've ever needed), some people want to run hundreds of "light threads" depending on the type of programming that you are looking to do.
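The multiprocessing pattern being described is roughly this (a sketch; the worker count of 10 mirrors the comment):

    from multiprocessing import Pool

    def square(n):
        return n * n

    if __name__ == "__main__":
        # processes, not threads, so the GIL doesn't serialize the work
        with Pool(processes=10) as pool:
            print(pool.map(square, range(20)))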
I am still not sold on Rust editions and think in the long run they won't be much different from solutions on other platforms.
There will come the day when something will change semantics, or require different kinds of runtime support across editions, and then the headaches of how to link binary crates from different editions will start.
Editions to me appear only to work, provided everything is compiled from source code with the same compiler, aware of all editions that came into use.
Read the damn Editions RFC. The community agreed that no semantics or ABI breaking changes will land in Rust,
EVER.
This is not a lesson from Python, but from C++, which introduces breaking changes every single release, which are much smaller than Python's but still a pain in million-LOC code bases.
If that ever happens, it was agreed that the result would be a different language, with a different name.
That is, editions don’t have this problem because the FUD that you are trying to spread every single time this issue comes up cannot happen, by design.
Your argument “Rust editions don’t solve this problem because they don’t handle semantic or ABI changes” is false,
because in Rust there CANNOT be any semantics or ABI changes, and editions handle this situation just fine.
In the context of Python 2 vs 3 this argument makes even less sense, because editions allow combining libraries from different editions in a forward and backward compatible way without issues. Source: I work on multiple >500kLOC Rust code bases and one >1 million LOC, and they all use crates from all editions, and mix & match them, without any issues, doing LTO across them, using dynamic libraries, and all possible combinations of binaries across 4 major platforms.
The problem is there; the fact that you choose to ignore the expectations of C and C++ enterprise developers using binary libraries across language versions is another matter.
You call it FUD, I call it hand waving.
I want Rust to succeed and one day be available as an official Visual Studio installer language, but apparently unwelcome points of view are always dismissed as FUD and attacks.
When cargo does finally support binary crates, I will be glad to be proven wrong when linking 4 editions together into the same binary, just like I do with DLLs and COM today.
I think you misunderstood. C++ doesn't even have an official ABI, nevermind having a stable one. ABI changes can and do happen in many C++ implementations (and there is no compatibility across implementations - you can't link a library compiled with clang to one compiled with MSVC). You can't generally expect to link together libraries compiled with different major versions of the same toolchain, though this may be supported by some toolchains.
Instead, Rust has defined an ABI and has committed to never breaking that ABI. Editions support API-level changes, but the ABI won't change.
Rust has not defined an ABI. You're misunderstanding how the edition mechanism works. Each compiler knows how to turn source code of any given edition into the same internal IR, but that's purely internal. You still cannot compile an rlib with Rust compiler version 1.X and use it in a program compiled with Rust compiler version 1.Y. You can compile an rlib with Rust compiler version 1.Z that uses edition 2015 and use it in a program compiled with Rust compiler version 1.Z that uses edition 2018.
Rust actually supports multiple ABIs and you can pick which one to use.
The one I use for maximum portability is the C ABI defined in the ISO C standard and in the platform docs (e.g. the Itanium ABI, specified in the x86 psABI document on Linux).
I didn’t choose to ignore that. I compiled a Rust binary library 4 years ago; it still works without recompiling today on a dozen operating systems and toolchains that did not exist back then.
Try doing the same with C++.
I really believe you when you say that you are clueless about how to do this, since you don’t seem to have any idea about what you are talking about, and all your posts start with a disclaimer about that.
But at this point the only thing I have to tell you is RTFM. Doing this is easy. Hacker News isn't a “Rust for illiterates” support group. Go and read the book.
There's no ABI compatibility between different Rust compiler versions as it is, so I don't see how editions will break a compatibility that doesn't exist.
Python is the best example I can point to for how important it is to get the versioning and dependency management story right.
Python is one of the most "accessible" languages in terms of the actual programming experience, but making a project reproducible is a nightmare. There doesn't seem to be a real "right way" to manage dependencies, and getting a project running often starts with figuring out how the author decided to encapsulate or virtualize the environment their project runs in, since changing your system python for one project can break another.
I know it's an older language, so many lessons have been learned, but after working with Rust, or even NPM it seems amazing that developers tolerate this situation.
Re: Dependencies - There are at least two well known, well supported, rock solid ways of managing dependencies that are in very common use in the python deployment world.
1. Containers - That's it. You control everything in it.
2. virtualenv - Every environment comes with its own version of python and its own set of packages and versions of those packages. Add virtualenvwrapper and you can create/switch trivially between them.
Both pair with some combination of requirements.txt (which lets you dial in, with great precision, each of the libraries you need, and is trivially created in < 50msec with `pip freeze`).
It's been at least 2 years since I've run into an issue with python and dependencies that wasn't solved by either of those approaches.
I mean I think you can get a workable setup, it just seems really clunky to me. Like you cobble together some solution out of pip, docker, venv, and if you're jumping into someone else's project you better hope they documented it (wait I have to call `source` on which file?).
It's a far cry from being able to download any git repo and call `cargo build`/`cargo run` or `npm install`/`npm run` with confidence that it's just going to work.
I guess part of it depends on the teams/people you work with. I agree - that it would be nice in the python world if we all just agreed, "virtualenv + requirements.txt - Done." - Instead, as you noted, the python ecosystem has split into venv, pyenv, pipenv, Poetry, Conda, ....
Where I work - life is simple. You build your project in a virtualenv so it only has the libraries it needs, generate a clean requirements.txt, and check it into git - everyone can run it, and, because we have day-1 onboarding to teach everyone virtualenv/virtualenvwrapper, the first thing a person does before installing the application is mkvirtualenv.
I see a lot of references to Poetry here - but I've never been able to interest any of our senior developers into looking at it - they are pretty happy with our existing system.
To be honest, I do expect to find a project provided with a pipenv/poetry setup, just like I would expect a haskell codebase to have a cabal/stack setup, and a java codebase to have a maven/gradle setup.
It is true that most recent languages ship with these from day 1, but ecosystems rarely lack this kind of stuff. I mean, even my vim has a package manager nowadays.
As for whether you want to salvage old code that isn't provided with package management, it's up to you. But you would have this kind of problem with any old, unmaintained codebase.
> It's been at least 2 years since I've run into a issue with python and dependencies that wasn't solved by both of those approaches.
The problem is that this leaves us with 2 years of documentation that's reliable and addresses easily-solved problems, and 28 years of everything else that will confuse anyone new to the language. Not ideal when accessibility is one of the language's primary selling points.
One of the major problems with fixing design choices or odd behaviours in software is that all of the old threads and posts don't just disappear, and people are now going to be led down paths that are not only so convoluted and ridiculous that they were eventually changed, but often paths that don't even work any more.
It's very very tough to fix that problem retroactively.
Forget containers. The actual "right" way to manage python dependencies in a project is Poetry. It's very solid and super reliable and uses virtualenvs internally.
Major Payne: Want me to show you a little trick to take your mind off that arm?
[Marine nods and Payne grabs the private's pinky finger]
Major Payne: Now you might feel a little pressure.
[Major Payne breaks the Marine's pinky]
Marine Private: AUGGGGH! My finger, my finger!
Major Payne: Works every time.
====
That's kind of how I feel about Docker. Before, you had a problem. With Docker, you have a new, bigger problem (and most of your old problem hasn't gone away; it's just been masked for a while).
On a more serious note, most uses of Docker that I've seen push problems back, and have accumulating technical debt (with interest).
* Robust systems shouldn't be tied to pinned versions. If your code works with PostgreSQL 9.6.19, and doesn't work with 9.6.20 or 9.6.18, that's usually the sign of something going very, very wrong.
* In particular, robust systems should always work with the latest versions of libraries. In most cases, they should work with stock versions of libraries too (whatever comes with Ubuntu LTS, Fedora, or similar). It's okay if you have one or two dependencies in a system beyond that, but if it's a messy web, that's a sign of something going very, very wrong.
* Even if that's not happening, as much as I appreciate having decoupled, independent teams, your whole system should work with the same versions of tools and libraries. If one microservice only works with PostgreSQL 11.10, and another with 12.7, that's a sign of something having gone way off the rails.
These aren't hard-and-fast rules -- exceptional circumstances come up (e.g. if you're porting Python 2->Python 3, everything might not land at the same time) -- but these should be rare enough to be individually approved (and usually NOT approved) by your chief architect/architecture council/CTO/however you structure this thing.
For the most part, I've seen Docker act as an enabler of bad practices:
* Each developer can have an identical install, so version dependencies creep in
* Each team has their own container, and it's easy for versions and technologies to diverge
* With per-team setups, you end up with an uncontrollable security perimeter, since you need to apply patches to a half-dozen different versions of the same library (or worse, libraries performing the same function)
The docker/microservices/etc. mode of operating gives a huge short-term productivity boost, but I haven't actually seen a case on teams I've been on where the benefits outweigh the long-term costs. That's not to say they don't exist, but they're in the minority.
For the most part, I use Python virtual environments and similar, but by the time you hit docker, I back away.
> What are the issues with using docker to solve this problem ?
Docker alone doesn't solve the problem and neither does pip unless you take extra steps.
Here's a common use case to demonstrate the issue:
I open source a web app written in Flask and push it to GitHub today with a requirements.txt file that only has top level dependencies (such as Flask, SQLAlchemy, etc.) included, all pinned down to their exact patch version.
You come in 3 months from now and clone the project and run docker-compose build.
At this point in time you're going to get different versions than I had 3 months ago for many sub-dependencies. This could result in broken builds. This happened multiple times with Celery and its sub-dependency Vine, and with Flask and its sub-dependency Werkzeug.
So the answer is simple, right? Just pip freeze your requirements.txt file. That works, but now you have 100 dependencies in this file when really only about 8 of them are top level dependencies. It becomes a nightmare to maintain that as a human. You basically need to become a human dependency resolution machine, tracing every pinned package back to the top-level dependency that pulled it in.
Fortunately pip has an answer to this with the -c flag but for such a big problem it's not very well documented or talked about.
It is a solvable problem though: you can keep a separate lock file with pip without using any external tools, and the solution works with and without Docker. I have an example of it in this Docker Flask example repo https://github.com/nickjj/docker-flask-example#updating-depe..., but it'll work without Docker too.
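The shape of that setup, with illustrative (not recommended) package names and versions:

    # requirements.txt -- only the ~8 top-level dependencies, hand-maintained
    flask==1.1.2
    celery==5.0.5

    # constraints.txt -- the full `pip freeze` output, which pins
    # sub-dependencies like vine and werkzeug without promoting them

    # install both together:
    #   pip install -r requirements.txt -c constraints.txt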
Throwing more technology on a problem means more complexity and more things that can go wrong. Doing it once or twice is fine, but the complexity increases exponentially.
Also, Docker is not a universal and secure solution. It works great as an "universal server executable format and/or programming environment" on Linux, but less so on Windows, macOS and especially FreeBSD.
Imagine you get a new phone with a new phone number, and you have a problem because some people still contact you on your old number. So instead of getting everyone to use the same number, you hire someone to take both of your phones and forward you the messages from each one.
Yes, at some level it solves your problem, but it adds a lot of complexity which doesn't need to exist. You also now depend on someone in the middle who takes effort to manage, and might not do exactly what you want.
If you are developing software that you run on your own servers, none. It works fine.
If you are developing open source software that people can install on their machines, it's a terrible solution. In that case you should package it correctly and distribute it via pip, so people can easily install it on their systems.
> it struck me how ridiculous it is that open source languages have to put up with this. Clearly "major" numbers are insufficient, the only real answer is to rename the entire freaking language when you make incompatible changes to it.
Perl and Python are the only two examples of this to my knowledge: most open source languages do fine introducing breaking changes in major versions.
The question is why Perl and Python had such problems while for example NodeJS, PHP (comparable webserver scripting languages) have had no such issues.
I wonder is it anything to do with the areas they're used in (Python & Perl are popular local/cli scripting languages in addition to web—has Bash had similar version woes?), or is it purely that the changes they made were more significantly breaking than others'? That's probably true of Perl6/Raku at least.
Perl never had a problem (at least with the Perl 5 and Perl 6 distinction) besides marketing. No one has ever been confused why their perl script doesn't work because their perl was version 6 instead of 5, and they never will be. No one has ever had to worry about writing a perl program to be compatible with both, unlike python, where you never know if "/usr/bin/python" is 2 or 3.
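Which is why so many dual-era python scripts carried a guard like this near the top (a sketch; it runs under either interpreter):

    import sys
    if sys.version_info[0] < 3:
        sys.stderr.write("This script requires Python 3, got %s\n"
                         % sys.version.split()[0])
        sys.exit(1)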
Marketing is important. I did have people ask me if there was any point to learning Perl 5 when Perl 6 was just around the corner, and people buying the butterfly book instead of the llama book for the same reason.
Then the Python people made the same mistake. Beginners learned Python 3 and then had trouble with App Engine or some other platform. The most popular question in Python forums for many years was whether someone should learn 2 or 3. Some probably just went with Go instead.
In some fields, Python is embedded as an interpreter into major binary platforms or commercial apps.
So in many of these cases the end user doesn't have a choice to use Python 3 until it's on offer.
And the vendor has usually integrated Python at a binary level into C code; that's why they provide a Python API.
The answer could even be "Red Hat Enterprise Linux 6"; consider that Python 2 is the default in this OS, which ended official support only at the end of last year. Many enterprises _chose_ this platform for its longevity, along with 3rd party vendors of commercial software.
Likely a key factor is leaving support in place for the old version. If they had immediately deprecated python 2, users would have quickly updated their packages to a supported version.
Java and PHP I could see as being similar to the Python as they all shared the pain of widespread adoption by vendors that were reluctant to update (Java: enterprise organisational internal, PHP4: bad cheap webhosts, Python2: everyone?).
With Node 0.12 though, I don't see it. io.js was a pretty momentary internal political issue that many users didn't even register on their radars. It certainly didn't have any long-lived impact on version adoption within the community.
And: the important point, they've all had very successful major bumps since. So even if there are pains, they can be overcome. There's nothing fundamentally un-doable about major version releases for open-source languages.
A better comparison would be to Typescript which is a breaking change from JS but is branded differently. I'd love native typescript in the browser but browser vendors aren't going to go pushing a massive breaking change like that.
PHP6, similar to ES4/ESX, was a language version that ended up in spec hell. Nothing to do with release woes (neither was ever released), so I don't see any real relevance to this discussion.
PHP7 on the other hand was a pretty seamless migration from PHP5, and PHP8 looks likely to be similar.
The general point is that the original commenter was posing this as some fundamental issue of open-source languages: clearly there are plenty of examples of success, so it can't be.
> This is awesome in terms of avoiding all of the weird things when a person typed pip rather than pip3 and module didn't seem to get installed anywhere
This won't change that at all; pip/pip3 is a distro packaging thing, and any distro that packages legacy python2 as "python" and Python 3 as "python3" will probably continue packaging legacy pip-for-python2 as "pip" and pip-for-python3 as "pip3".
I like using `<python_executable> -m pip` to avoid all ambiguity about what python version I'm running pip with/installing things for. Usually `python3 -m pip`, or `python3.8 -m pip`.
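You can also sanity-check which interpreter a given pip belongs to, since the output names both (the path and versions here are illustrative):

    python3.8 -m pip --version
    # pip 20.3.4 from /usr/lib/python3.8/site-packages/pip (python 3.8)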
I like using pyenv to manage Python versions which will then always symlink "pip" and "python" to whichever version is my system default, directory tree default, shell default, virtualenv etc.
The PEP [0] has been revised since then, and the current recommendation regarding /usr/bin/python is "equivalent to python2 OR equivalent to python3 OR not available at all OR configurable by user/sysadmin".
`pip` is still going to work - all the weird mistakes are going to keep happening. It's just that no new pip releases will support 2.7 - existing installs will keep working.
For the record, Perl6 has been officially renamed to Raku.[1]
IIRC the language known as "Perl", version 5, when it gets to the point where it bumps its major version, will skip 6 and go right to Perl 7.
Yes, but it took so many years of confusing people not deeply familiar with the situation, who assumed perl6 would be a better perl5 (as perl5 was a better perl4, perl4 a better perl3, etc.), while in reality it was a very different beast.
It would've taken 45 seconds of reading the Wikipedia article to figure out that Perl 5 and Perl 6 are different languages. Java went from 1.4 to 5. With standardized languages like C/C++ you have to keep track of not only the toolchain version but also the standard version (like C99, C++11, etc.). Programmers routinely keep track of these just fine.
Renaming to a completely different name is not necessary; everyone understands that a major version breaks compatibility. Python3 is still very close to Python2 both in syntax and spirit.
But there was a sort of broken promise from the Python creators: Python3 was almost like Python2, but every library author had to review and repackage their libraries anyway.
At that point, Python 3 should have been unambiguously incompatible with Python 2:
- the only allowed file extension should be py3
- all environment variables should have been duplicated with a "3" (it shouldn't read or modify Python 2 env vars)
- all installation folders should have been duplicated with "3"
- all tools like pip should be suffixed with "3"
- and most importantly, it shouldn't try to optimistically run previous Python2 code or previous v2 tools
The mistake was that you could use "pip" or "python" in bash scripts/shell, and not know if python2 or python3 were going to run.
Still today, you can run "python" in a recent version of Ubuntu or Fedora, and it will be Python 3. Only "python3" should be possible. Distros are repeating the same mistake as with Python 2, and we will struggle again with Python 4, if there ever is a Python 4.
Many headaches wouldn't have happened if "python" was reserved to Python 2.
Pro tip to language and distro maintainers: make the major version part of the language name and executable, from version 1.
That would have made cross-version codebases impossible, and that's what ultimately allowed migrating. One-shot migrations were not convenient, or successful, or even effectively feasible for complex enough projects.
What allowed the migration was community experiment in cross-version sources, as well as reintroduction of "compatibility" features into Python 3.
I don't buy this argument at all. Any sufficiently complex Python 2 project does not work with Python 3 without modification; there is _no_ cross-version compatibility.
And if you wanted to try that yourself anyway, changing all .py to .py3 in a directory is one unix command... It could easily have been part of a 2to3 tool.
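For the record, that "one unix command" is roughly this (a quick sketch; it doesn't recurse into subdirectories):

    for f in *.py; do mv "$f" "${f%.py}.py3"; done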
> I don't buy this argument at all. Any sufficiently complex Python 2 project does not work with Python 3 without modification; there is _no_ cross-version compatibility.
I was personally and solely responsible for migrating a >250kLOC project from Python 2 to Python 3; doing so without cross-version compatibility would not have been feasible. We literally picked the earliest P3 version we decided to support based on cross-compatibility features.
The issue is third-party libraries: they need to simultaneously support both versions of Python during the transition period. If you unilaterally migrate to v3, you break lots of existing projects. On the other hand, if you stay v2-only, you're holding up your dependent projects' migration efforts.
I understand the benefit in theory, but in practice you had two options:
- the codebase had to be modified to work on both versions at the same time
- you had to maintain two versions in different branches for a while
Is there any data to show which option was most often chosen amongst all PyPI packages? I suspect that the second option was more popular for the most important packages of the ecosystem.
The first option was by far the most popular and was used for years (including pip), only recently packages started dropping python 2 support. I'm not even aware of any packages which went with the second.
> I suspect that the second option was more popular for the most important packages of the ecosystem
You're 100% wrong. The few packages which decided on option 2 early on (e.g. dateutil) ended up having to roll back to option 1 because it was such a pain in the ass, both for the maintainer and for downstream users. The migration only really started happening once 2.7 dropped, projects like Six[0] started appearing, and the community started ignoring 2to3 and building up experience with cross-version projects and idioms (e.g. [1], [2])
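For anyone who never lived through it, the cross-version idioms looked something like this (a minimal sketch using six; the dict is just for illustration):

    from __future__ import print_function
    import six

    d = {"a": 1, "b": 2}
    for key, value in six.iteritems(d):    # dict.iteritems() on 2, dict.items() on 3
        print(key, value)

    if isinstance(key, six.string_types):  # str/unicode on 2, str on 3
        print("it's text")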
That is not an elegant solution. That way you would forever have 3 in the name, even for future versions of Python (e.g. python4, which would not break compatibility with old source code, so a .py4 extension wouldn't make sense).
In your shell you don't run "java8" or "java11", you just run "java", and then it's a matter of which version of the JDK you have in your PATH. The same with all other language interpreters and compilers: you don't run gcc9 or node14. Why do something different for python?
Really, the mistake of python3 was to break compatibility with past programs. A lot of changes could have been done more gradually: in the first version require a __future__ import, then gradually deprecate the old features, and then remove them completely, making the new way the default and no longer requiring the __future__ import. And I think that will be the way for the next python versions, so in theory we will never have the same problem again.
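That opt-in mechanism did exist; a sketch of what the gradual path looks like in practice:

    # Python 2.7 code opting in to Python 3 semantics, feature by feature
    from __future__ import print_function, division, unicode_literals

    print(1 / 2)   # 0.5 even on Python 2, thanks to "division"
    s = "hello"    # a unicode string on Python 2, thanks to "unicode_literals"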
Also, to me it was an error for distros to continue packaging python2 as python. Other distributions, like ArchLinux, switched everything to python3 as the default a long time ago; it's mostly Ubuntu that continued to ship python2 as python, which led a lot of programmers to rely on it. It would make sense for the unversioned command to refer to the latest version, not the legacy one.
Why try to imagine how it should have worked, when it shouldn't have happened in the first place? Python 3 was invented because Guido felt it looked nice, and the economic value of all the labor that went into pleasing him is likely equal to that of a small country.
OTOH PHP managed to migrate 4→5→7, and knew when to back out of v6. JS managed to migrate to ES6.
I think Python's failure is unique. It needlessly broke too much back-compat at once, provided too few benefits to make up for it, and let everyone drag their feet on the upgrade for a decade.
"We have nearly endless money and therefore we can make any new feature by treating older versions of our langage as bytecode and write transpilers for it".
Regarding renaming the framework: it did and did not work with .NET Framework and .NET Core. There are a lot of issues still there (and their branding decision now is to go ahead with ".NET"... which is also not good).
However, I do not disagree. Renaming is the right thing. A version number is easily omitted.
Perl 6 was never meant to 'kill' perl 5. It's a completely separate language, has been from the beginning, and it's been renamed Raku recently. Unlike with python, perl devs realized even 20 years ago that there was too much legacy perl 5 code for 'replacement' to be practical. The result is that perl 5 is very backwards compatible, and Raku is at the very least an interesting language worthy of some attention.
Compare this to Python 2/3. It's basically an incompatible fork that doesn't add enough for many projects to consider upgrading, and adds the overhead of having to worry about two versions. All it really accomplished was guaranteeing that "Python 4" will never, EVER be a thing.
Perl 6 *was* to be the next version of Perl. That was the intent when the whole effort started in 2000. That it didn't turn out that way has many causes. Enough has been said about that already. But to say that it was a completely separate language from the beginning is historically incorrect in my opinion.
I disagree. Perl 6 changed fundamental low-level syntax and semantics in the language. The 4 -> 5 transition in contrast was mostly syntax compatible, and in fact Perl 4 scripts are out there in the wild running on the Perl 5 interpreter just fine. Yes, it should've been called "Larry's next crazy experiment language" from the start.
The most that was ever promised about the 5->6 transition was that there'd be ways of using 5 modules in 6 (which more or less works for 'pure-perl' 5 modules, within reason).
> It is our belief that if Perl culture is designed right, Perl will be able to evolve into the language we need 20 years from now. It’s also our belief that only a radical rethinking of both the Perl language and its implementation can energize the community in the long run.
> (which more or less works for 'pure-perl' 5 modules, within reason)
Are you misunderstanding what has been achieved?
Using the "Best First" view of replies to the 2008 PerlMonks question What defines "Pure Perl"?[2]
> "Pure Perl" refers to not using the C-extensions ("XS") and thus, not requiring a working C compiler setup.
Inline::Perl5 lets Raku code use Perl modules as if they were Raku modules, XS or pure perl.[3]
Not all of course. Some make no sense whatsoever in Raku (eg source filters). Some don't yet work but could if someone cared to deal with issues that arise. But if you're thinking that Rakudo only imports pure perl modules for the above definition of "pure perl", please know that Rakudo is light years ahead of that due to niner's amazing work. And if you mean some other definition of "pure perl" it would help me if you shared it. :)
Agree that it did change low level syntax. What was promised initially was that a "use v5" in a lexical scope, would switch to a Perl 5 compatible parser. This project was started, but became pre-empted by the Inline::Perl5 module, which now allows you to use 99.9% of CPAN's Perl 5 modules (even with XS components) seamlessly in Raku. And yes, that is stable enough to be used in production.
The more accurate thing would be to say it ended up killing all perl, which is something the Python 2/3 transition hasn't done to Python, warts and all.
What difference could that make? There are lots of languages shipped in various distributions that are no longer 'living' languages in the sense that they have little mindshare and little new development is done in them. Perl wasn't one of those languages and now it is. If you want to comment on that or perhaps dispute it, sure. But the 'delete perl and see what happens' thing is beside-the-point nitpickery.
Perl has "little mindshare" and "little new development" in the same way as Bash. It's there, people who understand unix will reach for it when it's appropriate and it's as indispensable.
I want to know which distro removed perl because that's quite a drastic step and am interested in studying it. Sorry if that offends you.
It doesn't offend me, it just isn't related to the point I was making. You replied to me as if that sort of test is relevant and I don't think it is. The analogy to bash is similarly not suitable for this kind of argument - bash has never had any ambition to be a general purpose programming language, perl very much did. Unfortunately for perl, perl's own efforts in that regard pretty much eliminated the possibility of that ever happening. Python's trajectory hasn't been that, even with the travails of the 2/3 transition.
'look at how perl did things!' is just a really strange approach in a discussion about the Python 2/3 thing. That operation was successful, the patient died.
The pip/pip3 thing is partly the fault of distros and partly due to lazy training and insufficient tooling for python.
The only distro I'm aware of which tries to protect users shooting themselves in the foot with pip is Gentoo. Most of the others will happily let you "sudo pip install" stuff and lead people to think that's the correct way to do things.
Unfortunately pipx has come too late. Pipx provides a real way for users to install arbitrary python tools, but too many docs out there tell users to use pip. Then you've got all the users who want to "play around" with python and install libraries. Even things like jupyter have crap support for virtualenvs and make it far too easy for users to have all their projects in a single env. It's a mess.
Regular users should never have been exposed to pip2/pip3. They should never have even been interacting with the OS python interpreter. Pip should only exist in a "global" context to support bootstrapping virtualenvs and nothing else. Poetry does a lot of this right.
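Concretely, the workflow regular users should have been taught from day one looks something like this (pipx and the venv module both ship today; httpie is just an example tool):

    # tools get their own isolated environments
    pipx install httpie

    # per-project libraries go into a virtualenv, never the OS python
    python3 -m venv .venv
    .venv/bin/pip install requests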
I'm in an awkward chain of dependencies. You see, in my industry there are very few players, and the current version of their software depends on much functionality and rather exacting specifications from products which are made using ArcGIS 10.2.1. Not 10.2. Not 10.2.2.
This is a very conservative hunk of software because this is not a "move fast and break things" industry. This is a "people die when we screw up" industry. So they haven't moved from ArcGIS 10.2.1 for our version. Change is coming but they have to be careful.
So ArcGIS 10.2.1 comes with a python install of 2.7.2 (I am pretty sure it is .2). And you are very strongly advised NOT TO UPGRADE THAT in big bold letters on the relevant help pages. So I need to use 2.7.2 to manipulate some stuff out of ArcGIS 10.2.1 to do "stuff."
I work for Esri, but I specifically work on ArcGIS Pro which has shipped with python 3.x for quite a while. (since before I joined the company anyway)
ArcGIS 10.2.1 comes with 2.7.5, [1] and is very, very old. It was released in Jan 2014 and hit EOL in 2019. [2]
All not-EOL versions of ArcGIS Server support python 3.x. But if you're using ArcMap, upgrading to a later version will not move you to python 3.x: you'll have to migrate to ArcGIS Pro for that. (note for the unaware: this is not a "well just buy the more expensive thing" post. ArcMap and ArcGIS Pro share the same license.) I recommend migrating to Pro for this and other reasons, but I understand that a lot of orgs don't have that on their roadmap.
Out of curiosity, is there a technical reason why your org isn't using a supported version, or is it an "ain't broke" situation? I'd ask about migrating from ArcMap to Pro, but if you're on 10.2.1... yeah.
It can be "very, very old" in its context. If a new release comes out every year, support duration is 5 years and you're 2 years past EOL, I'd agree that the software can be called "very, very old".
And that's where things go off the rails. A release that is only supported for five years is a hobby project.
The kind of software projects that make the world go round continue past the lives of their original authors and can easily span decades. 5 years is just enough for the original shake-out.
I mean, I am sure we would do that if any of the user companies wanted to pay the price for it. None of them do. You'd probably multiply the price by at least 10 (I would say add at least +1.5 to the multiplier for every year of support, and that is probably highly conservative).
Totally agreed, but check upthread for a comment that literally says this tool is unsuitable for work where lives depend on it. And for telcos that's pretty much a given.
Yes, ok, thanks, the Python foundation will immediately spend a lot of time and money to support releases for 2 decades. How could they not know!? /s
That out of the way, you're right that some niches require much longer cycles, but that's the big big biiiig advantage of FOSS: downstream can maintain it for as long as they wish. As you said, things got shaken out by the community basically for free, and if some serious software is so, so serious that upgrading and retesting/certifying is somehow more expensive than trying to airgap an EOLed pile of libs (while at the same time it needs support), then the stakeholders can do it.
No sarcasm needed, Red Hat happily invests time and money in supporting it until 2024.
That might have been an easy way to provide upstream releases too, had the Python maintainers not been intent on using deprecations as an instrument to get the community moving.
That strategy doesn't work very well, however, as we saw when TLS 1.2 was held back from Python 2.
> A release that is only supported for five years is a hobby project.
I believe what you are trying to say is that you chose the wrong tool for the job. It is really condescending to lash out at other projects like that just because they don't share the same needs as you. They owe you nothing. Python is free and open source; just fork the damn language spec and support it yourself.
The discussion is about a specific version of ArcGIS that only supports a version of python that reached its end of life, so I beg to differ. My point stands: if you had the need for a really long supported tech, in the order of decades, choosing a tech based on a dependency that had a clear life span of 5 years or so was a poor choice. If, still, you need that, you can support it yourself, by forking your own version, but python owes ArcGIS nothing.
A finished software product that gets regular updates is old by the time it turns 7, I'd agree on that.
Dependencies on libraries are a different story; there are only so many ways you can implement a piece of functionality, and some of these happen to be decades old!
Yes, the technical reason was outlined above. Our Big Software Vendor in the field depends on specifically the products of 10.2.1. They're extraordinarily clear on that.
So I assume that the geodatabases, the address locators, and such are all the sticking points in terms of the "products" of ArcMap. I suspect that sometimes their software dips into that specific version of arcpy, as well.
Until the vendor hops, we cannot hop. It's a problem that will be eventually fixed, but I bring it up because many people here at HN think of greenfield development often, and also in industries where tons of iteration are met with approval.
That sounds very familiar. My company deployed a patch so that their webmaps continued working after Flash reached end of life. We still use 10.2.1 for most things. Thankfully I have ArcGIS Pro on my laptop. I had to move heaven and earth to get it.
But that’s beyond conservative, Python3 isn’t new, it’s more than a decade old at this point.
I can understand that in some industries you don't easily upgrade, because of testing and certifications. It's however also possible to put off upgrades for so long that it becomes reckless. If you’re stuck on Python 2.7 in 2021, and your code could kill people, you’re putting people in danger by being reckless.
I've also seen people claiming the need for a stable platform as a reason not to upgrade. Well, running Windows Server 2008 in 2021 you indeed have a "stable" platform, but it should come as no surprise when modern protocols and encryption aren't working.
> If you’re stuck on Python 2.7 in 2021, and your code could kill people, you’re putting people in danger by being reckless.
It's comments like this that make me very sad for the software industry as a whole.
No, he's not being reckless and he's not putting people in danger. The people who make breaking changes to languages, APIs, operating systems and libraries are the ones that put people in danger and that cause others to stop upgrading because they quite reasonably fear breaking changes that could have far reaching consequences.
The price of backwards incompatibility is poorly understood.
Breaking changes to languages, APIs, operating systems and libraries are not putting lives at risk. All of the above have expressed their compatibility guarantees in clear terms. No one advertises Python as suitable for life-critical applications. It's the responsibility of the person who chooses to use it for life-critical applications to understand its change lifecycle and factor that into their evaluation.
While I agree that in general we should care more about backwards compatibility, don't put the blame on the maintainers for bad choices in life-critical domains. Most domains are not life-critical, and it's entirely reasonable to make and meet a set of compatibility guarantees which applies to the domains you're working in.
> No one advertises Python as suitable for life-critical applications.
I think you just relegated python to the status of 'toy language'.
But that's the wrong attitude. Just like everything will sooner or later be connected to the internet, which is why it matters that it is secure, with 'software is eating the world' pretty much the mantra of HN, you should realize that any piece of software, no matter how trivial, has the potential to affect lives, sometimes in the most direct way, with the ultimate price of someone else's life as the consequence of our collective failure.
And that's because of the very nature of software: it gets re-used all the time. One person's application is another person's library, one person's application is another person's service, and so on.
Leaky abstractions are bad enough, leaky broken abstractions can sooner or later turn out to be dangerous.
Of course whoever deploys the software has the responsibility to check it for suitability. But in principle any language, OS or library has the potential to be applied in ways the original author did not foresee and if that potential can extend to life critical or life affecting purposes then that should come with a similar standard of care.
How many machine learning systems are written in Python? How many of those make decisions every day regarding people's lives, some of those life critical? Are you seriously suggesting that all those people made the wrong decision in picking the language the application is written in?
If so then you are most likely at odds about this with just about every developer out there who - rather than arguing with their manager about whether language 'x', 'y' or 'z' is suitable for the job will simply be told what language is on the menu and to do the job in that one based on availability of programmers and whatever is 'hip' at the moment. These decisions are not necessarily made in the most rational manner and whether you agree or disagree with that in principle it is the reality that we live in.
And so 'toy' languages such as Python, PHP, Perl, Visual Basic, FileMaker and so on are all going to be used for mission critical stuff, whether you add a disclaimer or not.
Better be aware of that before you throw your brainchild out into the world: you own it.
If the actual expectation of every piece of software published is that the author is responsible for every harebrained choice made with it, then nothing would ever get published. That simply isn't practical.
My responsibility is to meet whatever guarantees I make when I publish a tool, and to open source the code if I plan to stop maintaining it such that it no longer meets the original guarantees.
If you want to lay at my feet the consequences of everyone who chooses to use it, whether it's an individual dev or some executive mandating its use to their whole org, I think you need to reconsider your perspective.
The general attitude to software development is why we really should not be calling it software engineering. More like software plumbing. And even plumbers tend to carry liability insurance.
If you have a plumber fix your toilet, and multiple years later it breaks, and you go to them expecting their insurance to pay out for a replacement, how do you think that conversation will go?
If you hire an engineer to build you a bridge, and years later you are jealous that every other city has two-level bridges, do you imagine they will rebuild it for you for free?
Python 3 was always going to be backwards incompatible. That was the point. It was also why you were given a decade to migrate.
This is the same stupidity you see in hospitals, with medical equipment running Windows 7 or older. The companies knew when they sold the equipment that it would have a service life beyond the EOL date of the OS, yet made no plans to provide updates. The customers were also given all the details, could have asked about the support lifecycle before buying, and apparently also didn't care.
The sad part about the software industry is how little some people care about providing their customers with the updates and support plan they should be entitled to.
I think our industry has too much sympathy for companies running obsolete software for decades because they can't be bothered to spend on keeping their software current and up to date. If it's mission critical and obsolete, you messed up as a company. If you allowed that to fester for a decade plus, you've been negligent.
Now a lot of python stuff isn't really that mission or safety critical. So, a hopelessly outdated red hat server that hasn't been updated in years running some software developed over a decade ago may be convenient to keep up and running but if it's mission critical you could have acted by now. The good news is, you haven't updated in years in any case so why start caring about that now? Either way, it's your problem; own it and deal with it or not.
The source and binaries are still there and will still work a decade down the road. A lot of banks do just fine running software written in languages that stopped being fashionable half a century ago, where the developers have long retired/died, suppliers have long disappeared. Running their stuff on (typically emulated by now) hardware and OS that has gone out of production decades before the end of last century. Same thing. If push comes to shove, you can pay people to fix things for you. Banks do this all the time. But expect to pay for the privilege. Or just fix it properly, finally.
Saying that it's been more than a decade is not exactly fair. Yes, the language was out then, but it took a while for even basic practices like 2-and-3-compatible code bases to emerge, and for the most popular libraries to support 3. Even 5 years ago, maybe even 3 years ago, the "is it compatible with py3 yet" page still had some non-trivial gaps.
So it’s not like it has been feasible for most projects to migrate for a full decade, it’s only been a few years.
Well, the good news is, Python 2.7.2 still works, as does pip 20.
If you haven't already, I'd recommend setting up a PyPI proxy and backing it up. I don't know what PyPI's policy is around keeping old packages or old versions of the package manager available, but I have learned the hard way from working on other legacy software stacks that things have a way of disappearing from the Internet when you're not looking.
At my current company we're in an awkward situation with python 2.7.1 and 2.7.5. I've found that an easy solution to circular dependencies caused by the new resolver is installing pip==20.2.4 and an older setuptools (I don't have the version memorized) before doing anything.
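Roughly like this, presumably (the setuptools bound is my assumption: setuptools 45 was the first release to drop Python 2, so anything below that should be safe):

    python -m pip install 'pip==20.2.4' 'setuptools<45'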
That's a version of python that does not send SNI when opening TLS connections, so sites that don't have a dedicated IP address for the domain cannot support TLS with such clients, and most sites that do have a dedicated IP address won't support TLS "just because" (no SNI - get self signed dummy cert or closed connection).
So the first thing I would have to do with such Python is monkeypatch ssl.
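One classic workaround, assuming you can get pyOpenSSL and a recent-enough urllib3 installed at all, is to route TLS through pyOpenSSL (which does SNI) instead of the old stdlib ssl:

    # make urllib3 (and anything built on it, like requests) do SNI via pyOpenSSL
    import urllib3.contrib.pyopenssl
    urllib3.contrib.pyopenssl.inject_into_urllib3()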
It's telling that there are a number of commercially available Python libraries targeting enterprise customers for which the Python 3 version is free, but there's a hefty license fee for the Python 2 version. Maybe that's an appropriate tax on sluggish software/IT departments.
You assume they will move forward, when they most likely won’t. It’s not a tax, it’s a product targeted at a guaranteed line of revenue. If 3.x were ever deprecated, they would start charging for that version. There probably aren’t enough users for their 3.x version yet, so it’s free to try and gain adoption.
That is quite an interesting OSS funding method: charge for legacy patches and support which no one else will provide. It is likely a boring venture, but that is the price.
Potential security vulnerabilities? If I have a library and it works on 2 & 3, but I stopped working on keeping it 2-compliant because 2 is no longer supported, then I will never bring a single fix to 2, even security fixes.
Due to code divergence it may not even be easy for me to tell whether the issue reproduces on 2.
That creates a good business model for someone to come along and charge a premium to fix security bugs in old code. I think it’s more likely something like that will happen, than everyone moving their code to work on 3.x.
There have been several threads on COBOL here on this forum, and the anecdotal consensus is that COBOL developers don't actually get paid much more than developers in any other language.
It sounds like the tax is from choosing to use python at all. Either pay to keep legacy codebases or pay to update. Given a choice between two evils, I choose another language.
Similar to how we dealt with IE6 back in the day: not supported by default, and any time spent on workarounds for IE6 was billed at double the hourly rate, and mind you, these often required a lot of time.
Quickly made a lot of our clients untick the IE6 requirement from the wishlist.
I've managed to avoid Python 2 for nearly 3 years now in my projects. I think it is time to get rid of it for good and use the new freedom to improve the tooling.
It is so ironic that in Python the most "unpythonic" thing is the version management of the language and libraries etc itself.
Things like python poetry, venv etc. are great steps in the right direction, but there should be an official solution that "just works" and is predictable.
When I give python courses I usually skip dependency management and explain it at the very end, because it would probably (rightfully so) break the will of everybody involved. It shouldn't be like that.
I think that currently the biggest nightmare when it comes to packaging is that installations are usually not self-contained. If you can create a virtualenv or run Poetry, your headaches are over and you'll have a very pleasant experience. I don't think I've ever had a problem running stuff with Poetry and the way it does locking.
All the problems come before that point, with multiple installations making it so you don't know which installation you're using, which version, etc.
Not sure what I’m doing “wrong,” but I have no nightmares packaging Python, it all just works. I suppose some ecosystems have done it better, but no complaints from me.
Are you on Linux? People with Macs and Windows seem to be having more problems, because on Linux there's usually one canonical Python installation, whereas on Mac you usually have one from brew, one from Apple, maybe one you downloaded from the site, and whenever you try to install things it's a gamble which Python was used.
Yes, Linux. Also, I know how to use a shebang, which command, install to user, and/or pythonX.Y -m ... to avoid problems. Perhaps newbies don’t get enough of that advice?
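For anyone following along, that advice boils down to a handful of habits (the paths and package name here are illustrative):

    head -1 script.py    # check the shebang, e.g. #!/usr/bin/env python3
    which python3        # see which interpreter is first on PATH
    python3.8 -m pip install --user requests   # pip tied to one exact interpreter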
I try my stuff on macOS without issue, but perhaps I'm not there enough to have big problems.
The worst thing that happened to Python was that Linux distros not only shipped it but used it pervasively for scripting. That made replacing Py 2 a huge problem.
Until recently it was easier to set up for Python dev on Windows because you could download a clean Python and not worry about broken Pythons on your path.
Microsoft wrecked it all last year by including a system Python, and they'll probably play games to keep it in front of your Python on the PATH, so you will be typing pip3.7, pip3.8, pip-3.6-pypy, etc. for the rest of your life.
You are partially right. Distros like Debian/Ubuntu used Python 2 extensively as a system dependency, so replacing /usr/bin/python became a big undertaking. But since at least 2009 it has been easy to install alternative versions and use them with virtualenv. pyenv has made it even easier since then, and made it platform agnostic.
Yet no matter how much I try to educate users, some of them still feel entitled to use the system Python, and I find it disheartening to try to support those people.
It would be nice if virtualenv type of functionality was just a first class citizen for Python. Basically if you could just specify the root of your project and have ./site-packages/ automatically included in the path or similar. Yes of course you can just update sys.path but it would be neat to do that with no code necessary. Or do it with a config file like how PlatformIO does library management.
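A hand-rolled version of that is only a couple of lines today (hypothetical layout with a site-packages directory next to the entry point):

    import os
    import sys

    # prepend <project root>/site-packages to the import path
    _root = os.path.dirname(os.path.abspath(__file__))
    sys.path.insert(0, os.path.join(_root, "site-packages"))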
I wonder if it would have helped to move the executable from /usr/bin/python to /usr/sbin/python . Or /usr/sbin/python-for-distro-use-only-dont-run-apps-with-this ...
In the last 2 years or so Win 10 has gotten aggressive about restoring configuration changes (e.g. the fruits of all that telemetry research.)
I have found that my PATH entry reverted at least once after I tried to hide the system Python, and I also wonder whether Microsoft will use their 'monkey patch the standard library' approach to foreclose any attempt to stop the insanity.
It’s not even in System32, it’s in %localappdata%\WindowsApps. Which is also where other appx packages you install put their PATH shortcuts. Upside is, you can disable the behavior (by binary) in Settings, but then you have to hope that isn’t overridden next time someone needs to bump up their Python download numbers.
Since the Python2->3 migration discussion is being had again here, I'd just like to advertise my compatibility library that can help with migrating old projects: https://github.com/mbarkhau/lib3to6
I'm more annoyed by the "Drop support for Python 3.5" part than the first one. There are a few codebases I maintain where I already had to pin the pip version because the machine runs python 3.4, and then pin other package versions that dropped support for older python3 versions.
Why can't you ship your own interpreter to the machine? It's a mistake to ship python source code out to some distro maintained interpreter. Think of your interpreter, dependencies and source code as one atomic unit.
You can, of course, think of it this way; and you probably should. But it also means more work, and you might as well update your distro if you go down that path. I'm reluctant to build & ship my own python for a very simple reason: you then take on the roles & responsibilities of the distribution maintainers. I've taken over these roles in the past, and I recognize that this is hard, thankless work. You have to weigh whether or not you can do better than the major distributions (you probably can't), and whether it's even worth your time.
Python 3.5 is the latest you can get on Debian Stretch (oldstable) from official repositories. Stretch is still in the Long Term Support program until around 2022. Dropping support for Python 3.5 means you can't use latest pip on Stretch anymore.
I have a background in systems engineering, working on a commercial operating system. I code in several languages. To me, it's just a given that adoption extends the lifetime of something, even if it is not under active development. There's a strange phenomenon with Python in particular, where it is upsetting for some reason to be told that Python 2.x will still be in use after we retire. To me this is just the natural way things work.

There is so much stuff out there using Python 2.x where there's no programmer around to do porting work, and because things work, there's no reason to invest. Normally when this sort of thing has happened to, say, a language like VB, it doesn't really matter, because the end result is something compiled with a runtime. As long as the environment the app runs in has consistent behavior (app compat), there's nothing to do. But with Python, which isn't precompiled, you need the toolchain and dependencies.

Python 2.x isn't going anywhere, so I think Pip will ultimately have to keep and "freeze" support for Python 2.x, rather than not supporting it. If you are reading this and infuriated, perhaps ask some old hands you know, instead of taking my word for it. Flash is perhaps the closest example I can think of. Flash was "over" ages ago but browsers have had to keep support for an awful long time.
I give their decision a week before they revert and come up with an alternative.
Flash is a terrible example because there is literally no place for flash apps to go except be rewritten from scratch in some entirely different platform. Python 2 programs simply get ported to python 3. And we are already ten years into python 3, so yes, python 2 is going away at least as much as say Perl 4 or MySQL 3 or Oracle 7 has gone away. Which is pretty damn away.
Also, they aren't withdrawing pip from being available for python 2. You just use an older version of pip. Programs that need pip's newer features won't install on python 2 anyway. Because it's really going away very quickly now.
AFAIK even the most modern Perl 5.32 runs Perl 4 programs just fine, so "porting" programs from 4 to 5 was "trivial" in the strongest sense.
Perl 7 finally is going to break this compatibility, at least according to the current plan, but the "porting" still is supposed to be trivial, in most cases just adding a pragma or two. 27 years after Perl 5 was released...
Old versions of pip do support Python 2.x. Nothing is preventing you from using them.
There is also a straightforward way of "compiling" your code - install all your dependencies into a virtualenv, tar up that virtualenv, and untar it to the same path on the next system. I've seen this system used to work around pip and Python upgrade complexities that already exist, even within Python 2 alone, and it works fine. Then you're insulated from both changes to pip and changes to the actual packages you're trying to install from pip.
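A sketch of that flow (paths illustrative; the untar target must match the original path because the venv's scripts hard-code absolute shebangs):

    virtualenv /opt/myapp/venv
    /opt/myapp/venv/bin/pip install -r requirements.txt
    tar -czf myapp-venv.tar.gz -C /opt/myapp venv
    # on the target machine:
    tar -xzf myapp-venv.tar.gz -C /opt/myapp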
Edit to answer your question because someone downvoted me and now HN is preventing me from replying: I don't think you'll need to do anything special. The common case, honestly, is that you already have some version of pip shipped with your system (e.g., you're using Python and pip from your OS) - just keep using that version, because by definition it's a Python 2-compatible pip. The official Python 2 binaries bundle a version of pip which is also by definition Python 2-compatible.
If you need to upgrade pip for some reason, you might need to specify a version constraint on it, e.g.
pip install -U 'pip<21'
But you also might not, because there's a way for packages to declare what Python versions they're compatible with, and so pip can take that into account when deciding what to resolve. (The trouble is that very old versions of pip, such as those bundled with LTS-age OSes, don't know how to do that, so they're going to try to upgrade to the newest possible version of pip. Less-old versions of pip running on Python 2 should, I believe, not upgrade to an incompatible version.)
That's correct, and I've verified it with modern versions of pip for Python 2.7 and 3.x. `pip2.7 install -U pip` just grabs pip 20.3.4 and installs it, ignoring pip 21.0.
pip and the legacy easy_install both access the "simple" list for a package when determining download options. This is a basic HTML page with links to all public versions of a package. Here's pip's: https://pypi.org/simple/pip/
These links contain Python version specifier information, which pip can use to select an appropriate version. For pip 21.0, that specifier is ">=3.6", so any modern pip will know it can't be used on a Python prior to 3.6. It will therefore fall back to the nearest version that provides a compatible specifier.
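Concretely, the entries on that page look roughly like this (abridged; the attribute comes from PEP 503, and its value is HTML-escaped in the real page):

    <a href="https://files.pythonhosted.org/packages/.../pip-21.0.tar.gz"
       data-requires-python="&gt;=3.6">pip-21.0.tar.gz</a>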
Looks like this was implemented starting in Pip 9.0 (at the end of 2016). From experience (my product is written in Python), there are plenty of enterprise installs that still use much older versions of pip than this. Those will grab pip 21.0 or newer (I just confirmed with a copy of pip 8).
I know for us, that'll be important to document. We still support Python 2.7 in part due to slow-moving enterprise installs, and I'm sure we're not alone in that. So geofft's example for forcing installs to <21 is exactly what a not-insignificant number of people will ultimately need to do.
Thanks for testing! And just to rephrase for others, this is a one-time (per environment/per virtualenv) thing - you want to upgrade pip to a version new enough to know not to upgrade too much, and then you're fine. Once you run the command I gave, a later "pip install -U pip" will keep you on 20.x and not upgrade to 21.
Don't forget about native dependencies. Any Python module that embeds native libraries needs to remain binary compatible with the host system. And because Python doesn't control this in any way whatsoever, it's a lottery whether things break on a current system, let alone an obsolete virtualenv one where the chances are even higher.
Would I have to do anything special aside from installing an older version? I’m not familiar with the back end of pip - in particular whether it talks to a package server or not.
It does talk to a package server, but you'll want to grab all the source code/wheels for the relevant packages before they are no longer available anyway, especially as many projects are no longer going to be available under Python 2.
Interesting. I imagine what may have to happen is someone will (perhaps for profit) do all of the heavy lifting here so companies can use their server for old packages.
I get expecting that everyone will do this on their own, I just don’t think it will happen that way.
The default backend of pip is pypi. Pypi does not have a policy of removing previous releases (though package authors do have the ability to do this I believe). There are packages going back a really long time. You can find releases for Python 2.6 (EOL 2013) that were published in 2012 still on pypi (and accessible by pip).
In addition, paid extended support for Python 2.7 already exists from 3rd-party vendors (I know of ActivePython). Python 2 EOL just means you don't get free updates and support.
PyPI may exist and hold old packages, but even that will eventually bitrot, and the API that the old version of pip uses to download files may no longer exist in the future (as an example).
For those cases you will want to make sure you have a local mirror.
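Pointing pip at such a mirror is one small config file (the hostname is made up; devpi and bandersnatch are common tools for actually running a mirror):

    # ~/.config/pip/pip.conf (or /etc/pip.conf)
    [global]
    index-url = https://pypi-mirror.example.internal/simple/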
I've definitely worked on projects where people want to just have it working and leave it working until after I'm long gone. I've had clients that I'm sure are still on Java 6, CentOS 5, maybe older, and will be for years to come.
But the Python communities I've interacted with seem to care deeply about the ecosystem, updated libraries, new features. Web apps, data science, etc. Is my experience here warped? Are there people here seeing Python get used for the kind of stuff that might live as long as Cobol without moving forward with the language?
One of the companies I used to work at had a several million line python 2 codebase supporting their product stack. They have decided to stick with python 2 at least for the foreseeable future, because the work of porting to python 3 is seen as too expensive and risky. This is a company that sold for over $2 billion to a much larger, much more enterprisey company. I wouldn’t be even remotely surprised if they’re still on python 2 in another ten years.
I think the largest chunk of Python usage is what you’re talking about. But there’s still a significant number of people (and importantly, companies) just using it as a scripting language to move around and parse data or files. They’ve gotta do it with something after all, so why not Python. And like other people were saying, these are the types of scripts that don’t get updated.
For future projects, sure. For ongoing ones, maybe. For projects that are working I'm not touching... why?
A tool is a tool. When I bought a new drill I used it to put together the playhouse for my kids rather than tearing down my fence just so I could put it back together.
Self-contained containers (like Singularity) have come up to fill this gap. For a few years now we have been investing in creating these containers at the end of every project, as it is becoming increasingly hard to count on dependencies remaining available online indefinitely.
That is a fabulous idea. I would love it if that became the de facto for newly published scientific papers - here is the source code and here is the container that we used to run it. Ultimate reproducibility.
> The error here is you think the people doing this even have programmers to port to 3.x or admins with the wherewithal to make sure an older version of pip is used.
What wherewithal does that take? PyPI will just provide the last version of pip that supports Python 2. It takes no extra effort to not install a Python 3 version of pip on Python 2.
> Nobody wants to keep Python 2.x in use or force the Pip maintainers to keep 2.x support.
I'm sure there are people who do want to do the former. And they can, but it has no bearing on the latter.
Well, yes, it's possible some future change to PyPI could break pip, but that would also break all versions of pip before the change, so there'd have to be a large transition window (because PyPI is how you update pip, too; if you break existing clients, no one can get the upgrade to the version that supports the backend changes), so it's not going to be sudden and without warning.
Hell, several of our projects are stuck on pip 9 because our requirements files have dependency conflicts that 10+ choke on but 9 happens to resolve to the right combination.
There isn't a separate set of servers. The same servers (https://pypi.org) are used for all Python versions, and they keep old versions of packages around. If everything works as it should (which, to be fair, it might not :) ), pip running on Python 2 ought to find the latest Python 2-compatible version of packages, including the latest Python 2-compatible version of itself if you ask it to upgrade itself.
It's theoretically possible that the API used could change in a way that old versions of pip can't handle, but a) the API is super simple and there isn't a strong reason to do this and b) doing that would break the ability of old versions of pip to upgrade themselves - that is, for released versions of Python with bundled pip to work - so they're very unlikely to break the existing interface, even if they add a new one.
Still, since PyPI is a free service with no support contracts, and especially because packages hosted on PyPI are uploaded by the individual volunteer authors who have the ability to delete old versions of their own packages, businesses who depend on Python 2 would be well-advised to download/mirror what they need locally to ensure continuity of business.
My understanding is that Pip uses PyPI so, yeah, if PyPI changes their APIs/URLs, this last version of Pip that supports Python 2 will stop working and then deployments will start breaking.
I've found 3 broad classes of code bases when migrating:
1. Trivial or easy. Probably started using best 2/3 compatibility practices early on, or the nature of the code lends itself to 2to3.
2. Not able to switch without effort, but able to be slowly ported to 2/3 with six and the like.
3. Hopelessly intractable. Either too many bad practices in string handling, or tied to a legacy system. No way to port it without risking subtle bugs. Usually this is due to playing fast and loose with string types, even though we have known better since well before python 3 (a tiny illustration below).
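The classic trap in that third category, in miniature (same source, silently different meaning):

    data = b"abc"
    # Python 2: data[0] == "a"  (a one-character str)
    # Python 3: data[0] == 97   (an int)
    # Code that mixed bytes and text "worked" on 2 and quietly changed on 3.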
I don't really use Python much any more, and haven't for a couple of years. I may have some Python projects coming up later this year though. What's the current recommended way to deal with Python versions and isolating dependencies? Are there any good tools that simplify virtual environments &c or am I best off just installing my required Python and packages in a Docker container?
Is pipenv still being patched? When I moved to Poetry, I did because there were outstanding issues for pipenv that made it bug out with my setup (pretty sure it was a WSL thing?) but the maintainer hadn’t committed anything in months and wasn’t accepting PRs.
It is popular, but it is not standard by any means. There are many voices against it in the Python community who prefer Poetry, or who have stuck with setuptools because there is no mature replacement yet.
> Every language has breaking changes at some point.
This is where you are quite wrong.
Many languages and other software projects take backwards compatibility quite seriously.
Backwards compatibility is far more interesting to many customers than the quality of the language itself; the idea that one would have to rewrite one's codebase again later, with all the potential regressions that might involve, is not an appealing prospect.
Planes would crash if the software that runs airports had to do this.
Is Go mainstream enough? They haven't made any breaking changes so far.
Which brings me to another point: not all breaking changes are equal. Sometimes, a change to a syntactic element is needed. But if there's an identical replacement for it, and it can be updated mechanically, there shouldn't be a problem (in theory).
Making constructions or the standard library do things (subtly) differently without a fall-back to the original functionality is a much bigger problem: then you have to inspect every occurrence and verify if it still works under all conditions.
That's a change in a library though, not in the language.
And if you want to never have changes in libraries, including for fixing security bugs, you'll end up with stuff like mysql_real_escape_string.
I guess you missed reading the comment I replied to.
"Making constructions or the standard library do things (subtly) differently without a fall-back to the original functionality is a much bigger problem: then you have to inspect every occurrence and verify if it still works under all conditions. "
Those are specific implementations; we are talking about languages, and in regards to C there are definitely a collection of breaking changes across ISO revisions.
Even minor ones aren't so minor, if the code happens to rely on them all over the place.
Perhaps of minor things that nobody actually used or to fix soundness issues.
What Python did is far beyond that; it knowingly broke what everyone was using, with the full knowledge that almost no Python 2 codebase could run under 3 unmodified.
Gcc will still run c89, c++98, and even fortran66 code. There has never been any suggestion that support for these older versions be removed.
Packaging Python 2 & 3 together, like gcc does, would have had another big advantage: it would have meant that early on I could assume everyone had Python 3. There are still plenty of Macs with 2 but not 3 by default.
Well, there hasn't been a C standard after c89 so it's hard to move forward. In theory there is a c99 standard but it was never adopted by MSVC so good luck with that (that is a long story on its own).
On the C++ side, C++ didn't get a new spec until C++11 (2011), and then it took a decade for compilers to fully support it. Imagine how long it takes to roll out a language/library change when compilers themselves are on a 3+ year release schedule.
Either way, the C++ world is heavy on proprietary extensions and tools. There is little care about standards. The standard itself is hard to consider a standard; all it does is suggest tens of new features (some of which are impossible to implement) that may be added one by one over the decade.
Hmm, is there any other major blocker in modern MSVC (which supports C11) to it being considered C99 compatible, other than variable-length local arrays? (Designated initializers came into C++20, so they were forced to put those in.) Blocking variable-sized arrays might've been a blessing in disguise from a security standpoint.
That's not really a compiler limitation though (since in C++ mode it will happily handle it) but rather a validation/mode thing. Since they now support C11, I'm sure that part is removed if you select C11 mode?
I literally just ran a script that will only work on python 2.3 or older (as in it won't run on python 2.4). I don't need to worry about pip support because I'm pretty sure the script is older than pip, but it's not like old code magically disappears and it's replaced with newer code.
Don't worry, at one point running the code will become more complicated than writing new one, until you reach that point you can carry on (constantly monitoring for vulnerabilities that have been fixed in newer python versions of course).
At this point, I swear, the main source of difficulty in Python 2->3 is all the foot-dragging in migrating to Python 3. I wouldn't be at all surprised to discover that vastly more Python 2 code was written in the past twelve years than in the preceding eight.
> the main source of difficulty in Python 2->3 is all the foot-dragging in migrating to Python 3.
My main difficulty was having targets that ran python 2 and wouldn't get an official python 3 package unless a miracle happened. Meanwhile most systems supporting python 3 also had a python 2 package. Ended up porting a few scripts to a statically linked c++ binary, those should keep on working until we get 128 bit systems that drop support for 64 bit binaries.
If you don't have a pipeline to update those static binaries, you're just giving yourself a different form of headache 5 years from now when you need to apply a bug fix.
That's not a deployment problem though. And if everything you did to build those binaries failed, you can still run those binaries. You aren't out of business.
We're talking about this versus the alternative of potentially not being able to deploy anything. Then you're dead.
In both cases you needed a build pipeline anyway...unless you're operating in a world where "it works on my machine" is a good enough answer.
With static linking you mostly have all the pieces updatable independently. Can move to new distro and keep the binaries. Can update some binaries with a bug fix to some parser library without releasing other binaries. Etc. except for glibc and stuff like 64-bit inode values.
That's very true, but making "static distributions" of Python apps is a fairly hacky/annoying problem with lots of room for subtle breakage. In practice one static package per major distro version seems advisable.
Curiously Linux, despite having an extremely stable user space interface, suffers from a high degree of user space incompatibility. This starts with glibc and its components and ends with graphical toolkits. Go seems to be very successful because it ditches all of that crud and goes for the stable interface instead.
I'm pretty sure that I was quite blatantly suggesting that Python fails to meet suitability. That doesn't mean we don't use it...because people like it. But it adds complexity and "business risk", whether you like it or not.
In some spaces (ML) you're going to have to ship some Python, but what I was saying was that I would rather not.
My role the past few years has been some mix of deployment engineering & systems engineering and the conclusion that I quickly came to was that deploying scripting languages like Python/Ruby/etc is the realm of assholes. ...which was a funny lesson because mainly through my career I've worked in smaller scale shops using almost exclusively scripting languages.
A friend of mine uses Python in his lab. Plenty of people there were still writing Python 2 code when they were learning the language two years ago (Feb 2019); in other words, they started from scratch with Python 2 that recently.
I don't think migration was given enough serious consideration until Python 3.3 in 2012, and even then there were some baffling and massively unnecessary migration-hostile decisions, like not allowing a separate PYTHON3PATH envvar.
Anyway, how hard would it have been to have a "from __past__ import old_strings" that could have worked for the first few releases to allow the single biggest issue to be smoothed over universally and then fixed file by file under the Python 3 environment? With that in place a lot of shops could have just migrated on day 1 and then iteratively worked to finish the job rather than delaying for so long.
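For what it's worth, the mechanism that actually shipped ran in the opposite direction: from Python 2.6 on, you could opt in to the new string semantics file by file. A minimal sketch of that real, shipped feature (the hypothetical "from __past__ import old_strings" would have been its mirror image, hosted in the Python 3 interpreter):

    # Opts this one file into Python 3 string semantics while running on Python 2.
    from __future__ import unicode_literals

    s = "abc"       # unicode on Python 2, str on Python 3
    b = b"abc"      # explicit byte string on both versions
    assert isinstance(s, type(u""))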
And some of the other changes should not have happened.
Personally I still have a sour taste from the urllib migration. All the imports were moved around for no apparent reason, completely breaking every usage of urllib and every "import urllib", with no fix in sight.
The solution came with six a million years later, adding a hundred aliases like six.moves.urllib.somefunction that dispatch to the right place.
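For readers who never used it, a minimal sketch of the six.moves shim in question (six is a real compatibility library; the URL here is a placeholder):

    # These imports resolve to urllib/urllib2 on Python 2 and to
    # urllib.request / urllib.parse on Python 3.
    from six.moves.urllib.parse import urlencode
    from six.moves.urllib.request import urlopen

    query = urlencode({"q": "python"})
    response = urlopen("http://example.com/search?" + query)  # placeholder URL
    body = response.read()   # bytes on both versions
    response.close()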
Right but it poisoned the well; people had already had four years to kick the tires and conclude "yeah nothing works, my dependencies aren't ready; this seems completely not worth it."
Saying that people sat on their butt for 8 years is insulting now? Come on.
At the end of the day, they had anywhere from 8 to 12 years to migrate. In the tech world, that's millennia. Most products and services will give you months when they deprecate or sunset.
I'm just saying that criticizing individual people or groups is irrelevant when it was so widespread. Clearly the problems are/were systemic, so it's more productive to focus on the systemic causes than blaming individuals.
Sure, but it would have to be a single executable with 2 interpreters inside of it to support some sort of incremental upgrade path, which is what the comment I replied to is talking about (they are asking for something that didn't get made). And then you have to decide if they can talk to each other or what.
Oh, a module level 2/3 split, or even finer than that. Okay, I agree that that would have helped the transition, at the cost of being a pain to implement (if it's even reasonable, ex. I'm not sure what to do if a py2 function passes a string to a py3 function)
Wouldn't have had to be the whole interpreter, really just a flag with single file scope which makes bare strings be interpreted as bytes instead of str, and maybe some other similar translations in stdlib functions.
That's exactly the point: some software will never be ported. That's why it's such a bad idea for languages to create a break point like the geniuses of Python decided to do.
No, there's an implicit assumption that you need to do maintenance on software you run. People who don't budget for it are just denying reality.
There's 0 difference between the states that were putting a call out for COBOL programmers to deal with their disaster of an unemployment system and the professor running some python script from 2003. Their lack of responsible ownership is their own fault.
Your comparison couldn't be worse. COBOL is having problems not because of the language, but because of the code created 50 years ago. For all its disadvantages, COBOL is in fact one of the most successful languages of all time because it has allowed businesses to run their software for the last 50 years. That's why it won't go away, even though no one in their right mind would write even a single new line of software in COBOL.
Python decided instead that their customers were of no value, and all the millions of lines of existing Python shouldn't continue to run, even though it would be trivial for them to continue supporting 2.x syntax along with 3.x. I think it is one of the most insane decisions ever made by a mainstream programming language.
There are still new Cobol versions being released (I think the current standard is Cobol 2018), along with new tooling, presumably because it's used by companies that care about maintenance even if they are unwilling/unable to switch to a more popular language. Compare that with Python 2 where you hear about lots of companies complaining about end-of-life issues or missing migration tools, yet somehow none of them seem willing to do anything about it.
> C came into being in the years 1969-1973, in parallel with the early development of the Unix operating system; the most creative period occurred during 1972. Another spate of changes peaked between 1977 and 1979
But let's do you a favour and consider 1972; that makes C a 48-year-old systems programming language, surpassed only by NEWP (1961) as the oldest systems programming language still in use in 2021.
Maybe it is about time to start talking about C the same way people talk about COBOL.
Your comment doesn't feel like it's doing me any favours. It makes me feel like not commenting on HN any more.
My last interaction, which felt similar, and put me off commenting for a while, was with someone who claimed that "most concert pianists and serious competitors have absolutely gigantic hands" then moved the goalposts around so they could be Right.
It's true that you need to budget for maintenance. It is a serious societal problem that maintenance is underfunded.
But that means that any language revision needs to work hard at making the transition easy and incremental. 2to3 was and is laughable; at no point was it a reasonable solution.
It's okay to say "we need maintenance money". But no one has an unlimited budget. People complain about the 2->3 transition because (1) it was extraordinarily steep, (2) it was not justified, and (3) its huge costs are repeatedly denied.
If anything, having such a long runway could cause the issue of people not migrating. 12 years presents no urgency in software, code is regularly rewritten every few years.
Outside of mere observational contribution to this topic, are you playing devil's advocate because it is easy, or do you actually believe that this is a real excuse for the Python community?
I'm not making excuses for them, I'm saying the upgrade path was too long. It's the same reason people don't care about climate change - many of them won't see the results of their efforts now so they just keep doing what they're comfortable with. If it was a year or two, people would've upgraded straight away.
Academia doesn't have unlimited budget to do migrations that don't solve any pressing problems. I guess now there is a pressing problem, but more likely people will maintain the old version of pip locally.
They don't pay them to work 40+ hrs/week, but often they expect them to work 50+ hrs/week. That's what I've seen in Sweden, at least, though the pay there is okay.
Well in the US, postgrad education is a genuinely interesting proposition.
* Very low pay.
* Good benefits in a nation with poor safety nets.
* Tuition waivers along the lines of $10-100K/year.
* When the Dr. says jump, you ask how high.
If you view education as an investment, it isn't necessarily bad compared to an ordinary job. But it's kind of like a FAANG company; your experience depends on who you report to.
Sure, the low pay is commensurate with...uh...some sort of ephemeral opportunities in the future.
But seriously, you're right. Grad students do grunt work, that's how it goes. And if an academic Python2 library is widely-used, porting it is important grunt work.
Surely, no serious researcher would let an important tool rot, right?
The problem is that because of the way the incentives are set up, it is not important to anyone involved.
What is important to the grad students is to produce research papers and to fulfill their mandatory obligations (teaching, project deliverables). And most grad students, even in CS, are not professional software developers anyway. Good luck convincing capable grad student candidates to join your group to do boring software maintenance for horrible pay and no job security.
What is important to the professors, who decide what the grad students will work on, is again to produce research papers, fulfill their mandatory obligations (teaching, project deliverables) and to continually file for grants. Spending grad student time on porting and maintaining libraries does not help with that. In the worst case your grad student is spending their time maintaining a tool that a competing group's grad students are using to churn out papers, beating you to publications and grants.
What is important for the funding agencies is flashy new research in the current hot topics. I never saw a funding agency that would even consider paying a grad student, let alone a full software engineer salary, to port an academic tool from Python2 to Python3 or do all the other maintenance you need to do on production codebases; nor do most universities even have salary classes and positions for that.
As a result, in the many years I spent in academia, I saw many important research tools rot (both software and large hardware testbeds). The solution is not grad students, but to have fully paid software engineer positions in academia. But realistically that is not going to happen.
I was getting a tour of a lab from a grad student, and I was told a desk was full of 3.5" floppies with data from old research. I said "Wait, what? Old floppies won't retain data indefinitely!" and got shushed -- she didn't want to wind up responsible for trying to do data recovery on hundreds of floppy disks, which would do jack for getting her to her Ph.D.
Rot is a very big part of what happens to a lot of information.
From limited exposure, these codebases are often in truly awful shape. They've endured years or decades of having been hacked up just enough to finish someone's thesis or dissertation with no concern at all towards maintenance.
It should be no surprise that such a poorly engineered process produces awful results.
Then they are asking the wrong question: instead of asking "do I have to upgrade?", they need to be asking "why shouldn't I upgrade?"
Keeping current with industry is the only way they can stay relevant in the modern world, particularly in IT and related fields where anyone with a computer at home can do the same research as someone at a university.
“ It's kind of disappointing that developers aren't self-aware enough to understand that they are essentially screwing themselves by not moving quickly in dropping "legacy" support.
"Python 3 will be available in 2008 but we understand that you won't use it for at least 5 years."
There's an entire class of developers who won't upgrade until they absolutely have to. There's also a class of administrator that won't upgrade their current PC browser from IE6-IE8 until they absolutely have to. Developers are basically screwing themselves by not drawing a firm line.”
There isn't really a good solution. Forced migrations always create a decision point, at which point people may gather their resentments and move to something else. But if you don't eventually force a change, you have to live with all of the language's historical mistakes.
My shop was interested in migrating long ago, but we had to wait until languages and frameworks (such as Django) supported 3. On projects where that wasn't a risk, we moved to Python 3 around 2014 or so. But: I totally get that for some shops, the feasibility of migrating was low.
idk. I think there's an alternate universe somewhere in which Python just got hard forked by the community, and the python foundation lost its influence over the direction of the language. The split today would be even larger, or Python 3 would have become an intellectual curiosity rarely used in the real world.
Not at all. It's only too generous if there is a low-cost way to incrementally transition to 3, as is normally the case with language updates.
Python 2->3 was a poorly managed update that did not follow normal upgrade rules, so the "normal" rules don't apply.
It is still often extremely expensive to convert python2 to python3. Normally you could upgrade in small steps, maybe a file or library at a time, instead of changing everything simultaneously including all transitive dependencies. That problem continues to be denied, so python2 continues to be used. I say that as someone who has converted code from 2 to 3. Python3 is fine; it's the huge unnecessary transition cost that is not. It's gotten a little better, but not a lot better.
This makes it worse. Now it's even harder to incrementally update, making it even harder to switch to 3.
The main issue I ran into was string handling. Autoconversion handled most of the rest, but because byte-strings and regular strings were interchangeable in Py2, random places that weren't even touched by the autoconversion now suddenly have a "b'abc'" where they should have "abc", and fail in very non-obvious ways.
Just some hack around that would probably already help.
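The failure mode in miniature (a sketch with Python 3 semantics; the Python 2 behaviour is noted in the comments):

    data = b"abc"               # e.g. read from a socket or a binary file
    print(data == "abc")        # Python 2: True; Python 3: False, with no error
    print("got: %s" % data)     # Python 2: "got: abc"; Python 3: "got: b'abc'"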
I had to change my public APIs in a backwards-incompatible way. I had a method something like obj.to_string("abc") to convert the object into a string in 'abc' format.
I also supported things like obj.to_string("abc.gz") to get the record in gzip-compressed form.
I also had a from_string() -> obj function.
You can see the problem. I had to change to_string() so it didn't allow compression (breaking backwards compatibility) and add a to_bytes() for that functionality, and add type dispatching in the from_string() code to support either byte or strings, with different code paths.
And change all the open() calls to use "b", and add type checks on user-passed-in file objects to insist on reading bytes (if not isinstance(user_file.read(0), bytes): raise TypeError("Must be open in binary mode")), because all of the underlying parsers are in C and the needless decode/encode step adds overhead.
Oh, and re-write the C extension so it handles both Python 2 and Python 3.
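A condensed sketch of the kind of dispatching this ends up requiring; _parse_bytes is a hypothetical stand-in for the C-backed parser described above:

    def from_string(data):
        # Accept both types during the transition; normalize to bytes,
        # since the underlying parser is C code that expects bytes.
        if isinstance(data, str):
            data = data.encode("utf-8")
        elif not isinstance(data, bytes):
            raise TypeError("expected str or bytes, got %r" % type(data))
        return _parse_bytes(data)  # hypothetical C extension entry point

    def check_binary_mode(user_file):
        # read(0) returns b"" for binary-mode files and "" for text-mode
        # files, without consuming any input.
        if not isinstance(user_file.read(0), bytes):
            raise TypeError("Must be open in binary mode")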
> If you started only making new stuff in Python 3 in 2010, 2 years after Python 3 came out, how much Python 2 would you have to convert?
In many organizations there was never a time where they could start writing new code in Python 3. They needed to write code that was compatible with their existing python 2 code, and the only way to do that is to continue to write new code in python 2. Rinse, lather, repeat.
This is why the failure to provide a gradual transition was so bad. When I write new code in Python I use python3, but that assumes that there are python3 modules available that I need.
If you have infinite money this is not a problem. But I think we should be sympathetic to the people who do not have infinite money and have never been given a realistic upgrade path from 2 to 3. The 2to3 program is not a workable solution for many.
Not until the year 2015, with Python 3.5 around the corner.
That's when the interpreter finally got the minimum support required to make code compatible with both, and linters improved enough, and some libraries started being ported.
six 1.0 was in 2011.[1] 50% of the top 200 packages were compatible by the end of 2012.[2] And there were features you could use in 2008 to make the eventual conversion easier.
> 50% of the top 200 packages were compatible by the end of 2012.
Which meant you usually could not convert, since ALL dependencies had to be converted. The chance of doing that successfully then with 300 libraries (including transitive dependencies) was approximately (0.5)^300, which is practically 0.
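Taking the comment's independence assumption at face value, the arithmetic does check out:

    # Back-of-the-envelope: 300 dependencies, each with a coin-flip chance
    # of being Python 3 compatible, all needed simultaneously.
    print(0.5 ** 300)   # ~4.9e-91, effectively zero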
Around 2015 is when lots of people found all their dependencies were compatible with Python 3, or had been abandoned and replaced. How much work they had to do depended on whether they had put any effort into compatibility over the preceding 7 years, or even just followed the recommendations for Python 2. Writing 100% compatible code wasn't practical in 2008, but distinguishing bytes from text was.
I joined an AI company in 2018 that had been building its whole prototype on Python 2.7 since 2017. I had to spend 2 weeks doing the migration myself and hand a merge request out of the blue to their lead engineers; otherwise I am pretty sure they would be having a meeting tomorrow (24/1/2021) to figure out how to migrate.
What about code people wrote before 2010 that were perfectly fine? Are you going to have people rewrite research algorithms whose original authors have long graduated?
Just because industry has a habit of rewriting the whole stack every five years on account of make-work job security doesn't make foundational scientific algorithms change.
Except the Python folks produced tools that did almost all of the work for you. 2to3 has worked for an overwhelming number of use cases. And in fact, most code requires very few changes to begin with.
If this academic code isn't well understood or well tested, it's probably not as valuable as you might think.
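For context, 2to3 shipped with CPython itself (via lib2to3, since deprecated in newer Pythons) and could be driven programmatically; a minimal sketch, with the filename as a hypothetical placeholder:

    # Equivalent to running "2to3 -w -f print legacy_script.py"; this is
    # essentially how the 2to3 console script is implemented in CPython.
    import sys
    from lib2to3.main import main

    # -w rewrites the file in place; -f restricts it to the "print" fixer.
    sys.exit(main("lib2to3.fixes", args=["-w", "-f", "print", "legacy_script.py"]))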
2to3 does 95% of the job. The other 5% requires manually fixing up all the bits it missed, and until you do that, your codebase will be subtly (or not so subtly) broken. The most common problem I saw was related to iterating over dictionary keys.
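That dictionary pitfall in miniature (a sketch; the error message is CPython 3's):

    d = {"a": 0, "b": 1}
    try:
        for k in d.keys():       # Python 3: a live view, not a list copy
            if d[k] == 0:
                del d[k]         # was safe in Python 2, where keys() returned a list
    except RuntimeError as exc:
        print(exc)               # "dictionary changed size during iteration"

    # The portable fix: snapshot the keys first.
    d = {"a": 0, "b": 1}
    for k in list(d):
        if d[k] == 0:
            del d[k]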
That 5% that requires manually fixing up is the sticking point. You still need to audit every line of the codebase, and each line that gets missed is a guaranteed bug introduced to your codebase by the conversion. This is not much of an issue for small scripts or tiny programs. It is an issue for big applications. This migration really highlights (yet again), the dangers of using interpreted languages at scale. With no compiler to pick up errors, no typechecking by default etc., identifying all of the remaining faults is a huge task.
Like it or not, this is a huge risk to a business. There is a risk of introducing vast quantities of bugs, and there a huge developer cost to performing the migration.
For the record, I have migrated several medium-sized codebases with 2to3 and python-modernize. Because these were internal tools with defined inputs and outputs, it was trivial to validate that the behaviour was unchanged after the conversion. But for most projects this will not be the case.
The 2 to 3 conversion will be a textbook case of what not to do for many decades to come. For the many billions it will cost for worldwide migration efforts, the interpreter could have retained two string types and handled interpreting both old and new scripts. The cost would have been several orders of magnitude less.
I migrated a large codebase with 2to3 and six. It took me half a day. 2to3 did roughly 60-75% of the work and six cleaned up almost all of the rest. It's not foolproof, but it is surely better than nothing.
It's perhaps also worth noting that I did this in late 2017. Your experience likely varies depending on when you attempted it.
In the migration I was a part of (probably one of the larger ones, period), I think something like 90% of code could be migrated by automation. And of the remaining 10, most of it needed only trivial human oversight.
That remaining two percent had a lot of painful things (truly, I have some stories), but "the overwhelming number of use cases" was trivial.
Not really. For example, R ships with or depends on a lot of FORTRAN libraries. I doubt they have changed much at all in decades. There is no talk of a breaking FORTRAN language change that would require rewriting this perfectly functional code with a stable interface.
> Normally you could upgrade in small steps, maybe a file or library at a time, instead of changing everything simultaneously including all transitive dependencies.
This was absolutely possible. Via the path of upgrading your code (or your dependencies code, in any arbitrary order) to be compatible with both py2 and py3 (via, say, six) and then once all code was compatible in either direction, flipping the switch.
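A minimal sketch of that dual-compatible style, using only real __future__ and six features:

    from __future__ import absolute_import, division, print_function, unicode_literals

    import six

    def dump(mapping):
        # six.iteritems() maps to dict.iteritems() on Py2 and dict.items() on Py3.
        for key, value in six.iteritems(mapping):
            print("%s=%s" % (key, value))

    def is_text(value):
        # six.string_types is (str, unicode) on Py2 and (str,) on Py3.
        return isinstance(value, six.string_types)

Once every module in the codebase (and its dependencies) was written like this, "flipping the switch" meant nothing more than changing the interpreter.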
It was always dangerously hard to validate that the whole codebase was actually compatible, and that there wasn’t some path through the code that would make things break (in production, on the customers’ site, while handling all their most critical data).
I can think of exactly one language that bungled the upgrade path worse, and it’s Perl 6, which they finally renamed after 19 years of stringing people along like it would be the next big thing.
I generally agree with your statement (the version flip is scary), but that's a different complaint than GGPs, which was more that you couldn't incrementally migrate your source files. The version flip to Java N+1 can be scary and cause issues too (and sure, it's perhaps less likely to cause issues, but don't worry, they both do!)
The length of this dual language transition period is a travesty and an incalculable waste of money and time. They dragged this out for more than a decade. That’s the problem, people just got used to there being two versions. Should have just ripped the bandaid off way sooner.
There were too many breakages (and performance regressions in the early 3.x versions) to force a quicker transition, which would likely result in no one upgrading at all.
One option could have been to spread the breaking changes over multiple versions, but given the bad state of version/dependency management in Python, this would have likely been a clusterfuck too.
Best option would have probably been to have a longer period of RCs and only release 3.0 with the performance regressions fixed. The myths around the slowness of 3.x (especially for scientific libraries) stuck around for a very long time after they were fixed.
It's not a question of generous or not generous. If the work to migrate to 3.x was too great before, this won't change that -- they will just manually install libraries without pip. Or find another tool.
I suspect many python 2 shops will maintain the old and move to something else for the new. Maybe the "new" thing will be Python 3, but I expect this will give other languages opportunity, because people do have emotions, rational or not.
This was announced 13 years ago. That's enough time to get an entire computer science doctorate and write your own alternative package manager from scratch, if you really need it so much.
If Pip dropping py2 support today kills your app, it wasn't fragile, you were negligent.
Who are the people that are creating a fragile world here? The ones who didn’t rewrite their code to be compatible with the existing software called python, or the ones who didn’t rewrite their code to be compatible with the existing software that uses python?
I don’t have any rights to python. If I chose not to rewrite my code for python3, I am increasing the fragility of the world.
However the same is true for the python maintainers. If the maintainers were interested in maximum anti-fragility, Python3 should have been a fork.
I’m not making any sort of moral judgement here. But I think it’s obvious, from devs not rewriting their software, from Python being willing to make breaking changes, and from everybody blaming each other (for a decade now), that the goal here wasn’t really to create long-lasting tools.
Python2 code still works. You just can't use a convenient package manager to grab libraries for it from a managed repository anymore. When has C ever had that?
Python 2 has some value Python 3 doesn't have. E.g. there are Python 2 builds for really old platforms. There are even some considerably exotic platforms which are not so old and not deprecated but still don't have Python 3.
commit b71c7dc9ddd6997be49ed6aaabf99a067e2c0388
Author: Victor Stinner <victor.stinner@gmail.com>
Date: Fri Oct 10 11:55:41 2014 +0200
Issue #22591: Drop support of MS-DOS
Drop support of MS-DOS, especially of the DJGPP compiler (MS-DOS port of GCC).
Today is a sad day. Good bye MS-DOS, good bye my friend :'-(
Not a direct response to your question, but given even Windows XP isn't supported by Python 3 as of 2015 (just 1 year after XP extended support ended), it does seem like Python does have a fairly nontrivial set of requirements for the underlying platform.
One thing that might be important to note about this: to the extent that pip is a client for PyPI, a strategy of freezing the last supported version of pip indefinitely may not work. It will presumably always be able to install Python 2 wheels from a local wheelhouse, but it's worth being prepared for a future where `python2 -m pip install <x>` stops working.
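If you do depend on that continuing to work, one hedged precaution is to pin the final Python 2 compatible toolchain and keep your wheels locally (pip 20.3.x and setuptools 44.x were the last release lines supporting Python 2; the wheelhouse path and package name below are placeholders):

    # Freeze the last pip/setuptools that still support Python 2.
    python2 -m pip install "pip<21" "setuptools<45"

    # Mirror what you need now, then install later without touching PyPI.
    python2 -m pip download --dest ./wheelhouse requests
    python2 -m pip install --no-index --find-links ./wheelhouse requests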
According to the contribution graph (https://github.com/ccrisan/motioneye/graphs/contributors) it was started in 2013, when Python 3.3 had already been released. So what were the reasons the project started with Python 2 in the first place?
not in my neck of science python (biomed/neuroscience; computer vision). We've been substantially more aggressive about moving to 3 than other python neighborhoods! Look at the top projects at https://python3statement.org/, for example.
Hell, numpy 1.20.x is dropping support for anything below 3.7!
I started doing science python in 2015 on 3.5.x, never had to touch 2.x at all.
Same in my neck of the woods (engineering R&D). Python is very popular and everybody I know is using Python 3 unless they are supporting an outdated codebase, where they are usually in process of migrating.
There's another reason for using containers or pretested distributions. I just wrote an article for LWN about SciPy¹. To get what was supposed to be a clean install of the newest release, I created a virtual environment and used Pip to install the SciPy components. It was broken out of the box. IPython did not work because a module that it depends on, that does tab completion, had been recently upgraded and was incompatible with it. I mean it crashed frequently, not just that it didn’t do tab completion. To get a working system, I had to install an older version of the library.
So people seek solutions that avoid the dependency hell that comes from maintainers releasing things at will, with nobody testing the combination of things that they nevertheless market as “SciPy”, which is supposed to be a bunch of stuff that works together. And whenever I install anything from the Python world, I am amazed when it actually works without the need to spend hours with Google or staring at sources trying to resolve conflicts. One of the many nice things about Julia is that their package system works.
So yes, once you manage to put together a working installation, you naturally want to encase that precious thing in the bubble of some kind of container, to protect it from an upgrade of some piece that will break everything. If you work with a lot of Python things, you may have a dozen copies of the same libraries, and a handful of Python binaries, taking up space on your drive. The concept of dynamic linking was supposed to save us from this. But the haphazard habits endemic to the Python community seem to have left us with the worst of all possible worlds.
Docker is the best thing to ever happen to scientific computing yet there is so much griping from the old guard. Trying to make a hermetic, CI/CD, within-machine-epsilon reproducible computer vision pipeline opened my eyes to just how un-reproducible your average machine learning project really is. Especially those coming from academia.
"Oh the PhD pinned to a nightly version of some dep and the AUC score plummets when you try to upgrade and customer wants this latest model in production on airgapped machines last week? Aight, stick their Conda env in a container, call it with `docker run`, and sort it out later"
Horrifying? Yes. Solving the problem on short notice? Also yes.
Plenty of more pedestrian uses but that was an actual situation where docker's reproducibility saved my bacon.
Yes. When your calculation uses dozens of files and a change in any of them might change the results, I don’t see how there is any alternative to some form of containerization. Even if you could generate a statically linked binary, if it’s science you want to distribute the sources for everything as well.
I notice that most people seem to reach for Docker automatically, though. There are other technologies, such as systemd containers, that might be better in some situations.
I work on a physics project. The data acquisition system was made on a tiny budget and most of the code is Python 2.5.2. It hasn’t been touched in 12 years. We’re stuck with Windows 7 running on some machines because there are no Python 2.5 libraries that support NI-DAQmx, and NI does not have any Traditional NI DAQ releases for Windows 10. Obviously good solutions involve porting to Linux and airgapping the DAQ, but the simplest solution I like is updating the codebase. I’m one engineer with loads of other responsibilities. Python is popular with scientists. I don’t begrudge the scientists. I begrudge the awful decision by the Python developers to disregard backwards compatibility.
No one will do it after support in the tools is killed off either. People will use the last versions of the tools from wherever they can get them, and workarounds for any issues that are discovered will become part of the folklore surrounding the code.
In what sense is it tautological? I can think of a lot of things that are important that won’t get done, healthcare improvements being one. Software on the other hand is typically fluid enough that someone will rewrite it.
I think it’s slightly more meaningful, because not all dreams can be achieved. If software isn’t getting ported from Python 2 to Python 3, it almost certainly isn’t because it’s too hard to do.
I mean, huge chunks of the scientific community run on MATLAB or Fortran. (Also... are you actually managing to get new students to install Python 2 on their laptops, somehow? And no enterprising undergrad is just like "I will just make this work on Python 3 because that's easier"?)
Ehh, I absolutely did that. Heck, for a class I reverse engineered a robot's bluetooth control API because coding up computer vision was easier in Python than in MATLAB (which had a provided "SDK" that some TA had written years prior).
I spent less time on that than it took to debug a missing comma at the end of a line in some MATLAB code (for a different assignment in a later class) that led to an incorrect matrix operation and completely failing code. I do not like MATLAB.
I think it’s a matter of experience. I’ve used MATLAB for nearly a decade and can whip up optimized bit banging, data acquisition, and signal processing code in a matter of minutes. When I try the same in python I spend days trying to figure out what the “right” solution to something is, try 3-4 options, find the issues with each, and by that point I’m burnt out and tell myself it isn’t worth it to try to replace MATLAB.
And the fact that we still periodically trap the efforts of thousands of smart people and bury them alive is insane. When told that something will only live 5 years, we turn it down. When told that something 5 years old is now obsolete, we celebrate.
The definition of insanity is doing the same thing over and over again and expecting a different result. (c)
Having worked with a huge Python 2 codebase recently, it's fine[1] but just becomes painful regarding dependencies. It just means you'll never have the latest versions of things.
> Since RPython is built on top of Python2 and that is extremely unlikely to change, the Python2 version of PyPy will be around “forever”, i.e. as long as PyPy itself is around.
A lot of research has been released with assets written in now defunct versions of Python. It is a shame that the barrier to reproduction just went up a lot.
One would hope that language designers would make their languages monotonically increasing to preserve compatibility, and otherwise fork into a new project when they lose confidence in previous features.
I work in CGI and I'm happy to see Python moving forward; I'm just sad that CGI still lacks broad Python 3 support. This industry was top notch 10 years ago, but its inability to handle the Python 3 migration for years now, despite Python's ubiquitous usage, makes me realize it's no longer an industry-driving domain.
You can; it's called __future__, six, and 2/3-compatible idioms. I've been writing code (begrudgingly) like this for years. The test suite runs on 2.7-3.6 no problem, albeit annoying to write vs pure py3.
How would that work? I assume you'd still need to declare which version you want to run as, and if any library you want to import weren't compatible, then either you'd be where you are right now, or it would bring a lot of backward/forward incompatibility issues with unicode/bytes/strings/syntax, etc.
In that case you wouldn't need a compatible interpreter, you'd just specify the interpreter you want in the shebang. Python does not usually use shebangs because you're supposed to run in a virtualenv, not shebang your files to your system python.
Common Lisp was a choice that many teams chose to ignore for whatever reasons. The price of perpetual upgrades is apparently cheaper than starting with a complete and stable language.
I suspect most Python users simply don't know anything else. That's how they got introduced to programming, and unless someone holds their hand again, they will never use anything else.
Also, many devs who work in Python in fact hate it. Python is so popular, it's foisted onto people.
In some jobs, you have to interact with Python from time to time, even if you don't develop in it.
I only rarely use Python (Ruby, Go, R and Bash usually do whatever I need) and I've never gotten my head around the whole Python 2 vs 3 thing. For a language often touted to be the best language for novice programmers, I think it's odd to expect them not to get caught with their metaphorical pants down in that dichotomy.
Edit: I should probably clarify. I have nothing against Python, quite the contrary, I've just always found that version 2 vs 3 thing odd.
Yet on my newest and best Ubuntu 20.04, when you type 'python' you are still running Python 2, which I don't use. Will it ever be possible to have 'python' mean Python 3 by default? It's annoying.
Let 'python' be Python 3, and 'python2' be Python 2 instead.
I never understood this logic. It's two characters, and it's a pain to port once you go from print statements to actual logging. Also coming from C I'm used to print functions.
Python code, with its absence of begin/end {} block statements, is easy to read and looks beautiful. The additional () in print takes away some part of that beauty.
I don’t have a problem ideologically with the maintainers doing this since 2.7 is EOL, but making a breaking change like this over the weekend was kind of cruel.
Sure, but there are plenty of things out there that are just pulling the latest version by default (in our case it was the Vagrant ansible_local provisioner...)
Good. Django moved the needle to improving python3 adoption just by moving their docs to include only python3 snippets. So this is another important step.
I like that type hinting isn't mandatory; I don't like that it's unenforced. GDScript is an example of a Python-influenced language that does it right.
Seriously. If I expressed my opinions on the regressive attitude Python 2 lovers have shown in the last 13 years, I'd break this website's code of conduct.
I've certainly seen that, but it can be quite variable depending on the specific circumstances of a field, department, project, etc. My team hired a recent graduate, and he's completely up to date, to the point I'm the old dog learning new tricks from him.
Then I know someone who's a researcher at the university, and they have no funds to buy computers, so she provides her own and is quite incensed about it. And the money for refactoring and updating stuff is zero. For any lab equipment connected to a computer, the computer is as old as the equipment.
And then everything in between.
In my view the vast majority of scientists using Python are using it at a level where they could practically switch to Python 3 just by cleaning up their print statements.
And an amusing anecdote... by the time my thesis project was in full swing, the OS that I had been using (MS-DOS running hardware that I had built) was largely obsolete. But I sure as hell wasn't going to change horses in mid stream, so I persisted with the old stuff.
This is the reason why we see so many graduates that don't have the skills to do the job they trained for.
Universities need to keep up to date rather than just saying "but it still works" and sticking their heads in the sand, ignoring what the rest of the industry is doing to move forward around them.
Experience with version X of software package Y is only relevant for a fleeting amount of time. Universities can and should teach software development fundamentals, not experience with specific versions of things.
Plot twist : it's not an IT faculty.
I know how to code, Professors don't. I don't get paid enough to fix decades of errors. They had some smart people who created the code and they were smart enough to leave as soon as they got their degrees.
I've experienced that, even with IT faculty. For the most part, faculty don't code. They are too busy with teaching, research, and all the overhead and university politics that goes with it. Their students code, and the stuff they develop hangs around forever because it's in a publication and might be cited, but there's no funding to maintain it after the research grant ends. You're absolutely right about that.
Experience with the problem of "version X of software package Y is deprecated, we need to upgrade to version Z instead" is a skill that will always be relevant. For everyone who uses computers, not just computer scientists.
The fact that universities are apparently incapable of learning such a basic life skill is terrifying. It's akin to an auto mechanic who doesn't know how to perform an oil change.
While I agree that students should be learning Python 3, I don't think it makes a big difference either, unless there's a larger problem with the school's curriculum. What happens when Python 4 comes out?
Schools should be teaching the fundamentals of programming, which apply across languages. I don't want to say "the language doesn't matter", because COBOL would be a poor choice, but Python 2 vs 3? Whatever.
The difference is, if they aren't being taught with up-to-date tooling, their knowledge of that tooling is worth exactly zero the moment they finish that course.
And let's not forget that with Python 2 not being supported anymore, even getting it installed and running in the first place is going to get harder and harder.
> The difference is, if they aren't being taught with up-to-date tooling, their knowledge of that tooling is worth exactly zero the moment they finish that course.
So what? Students should not be paying tens of thousands of dollars to learn tooling that will be obsolete within a decade anyway.
All else being equal, sure, start kids off with the most recent tools available. But it wouldn't be near the top of my priority list.
Never claimed this is how it should be. I'm telling you how it is, and I don't plan to lose my job by nagging senior staff and telling them how they should do their jobs.
That's one way to do it. I wish people would embrace the idea that an ecosystem moving forward does not mean that you need to move forward. It creates so much needless pressure in people's minds. In the absence of security considerations, updating software might actually be considered harmful.
They usually have some software created by some students in the 2000s. These people took their degrees and left years ago. Nobody will get paid to migrate this software and you can't get a degree for working on rewriting code. That's it. I was making backups of floppy diskettes last month with some obscure software. Windows XP is still king in many places because there are no drivers for ancient scientific hardware on Windows 10, and Linux is "black magic".
> and you can't get a degree for working on rewriting code.
Which is sad, really, because we really could use more people who can do that. In fact, I really wish that every CS program had a requirement to take an existing code base and do material work on it; it's not like anybody's going to graduate from school and then get hired to do greenfield development, so it'd be kind of nice if they were actually trained for the things that they're going to do in practice.
If only universities had some way of getting funding from, I don't know, something like teaching courses, that could be used to pay people to maintain and update the software they depend on.
I just finished a graduate diploma at a local university. A single subject had over 100 students, each paying $3500 for that subject alone, yet the uni claimed it didn't have the staff/resources to update that course (10 hours of face-to-face class time) each session. I find it hard to believe that running one session consumed so much of the $350,000 paid by students that nothing was left over to fund updating the materials.
The simple reality is there’s no economic reason to do that.
That $350,000 goes towards staff, facilities, and research projects.
If they had the choice between funding new cutting-edge research or migrating some obscure legacy code, the new research will always win. It’s not even a hard choice.
And that just makes you wonder why they expect students to keep paying for courses when they won't even do the basics to ensure those courses are relevant and useful.
My experience in university was that digital was an absolute hodge podge of part timers writing things in 4 month intervals who are then replaced with other part timers writing things in 4 month intervals.
A few full time people, but for university systems, not learning materials or examples or research.
At present I have no choice but to use Python 2.x, as Maya and Houdini (animation software) still ship with Python 2.x. The industry has known for years about the EOL, but it's going at a glacial pace due to issues with PySide / PyQt and Python 3. https://vfxpy.com/
That's just because scientific code isn't all that important. If it were, it would be maintained, and it would be upgraded. There's no reason to maintain or upgrade any code most scientists write, so it'll molder forever, and nobody will notice when it finally becomes impossible to run.
I still can't get why they replaced: `print "abc"` with `print("abc")`.
The Python REPL was my go-to shell calculator, and it is quite a hassle to put parentheses around what I would like to print; for me that was the main drawback of Python 3.
But I don't write production code in python, it is used only for small shell scripts, or like calculator.
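For what it's worth, the usual rationale for the change is that a function composes in ways a statement cannot; a few real Python 3 examples:

    import functools
    import sys

    # Keyword arguments replace the old "print >>sys.stderr, x" and
    # trailing-comma special cases of the Python 2 statement.
    print("a", "b", "c", sep="-", end="!\n")

    # As an ordinary function, print can be passed around or partially applied.
    log_error = functools.partial(print, file=sys.stderr)
    log_error("something went wrong")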
Just like Visual BASIC 6.0, dropped for Visual BASIC.Net and Visual C#.Net because of progress. You can't expect a language to last forever or support to last forever.
Now Python 4.0 is coming out, and soon Python 3.X will lose support.
Dirty secret: There is plenty of multi-million dollar VB 6.0 software still running out there in the world.
I think it's not even possible to compile it without a Windows XP VM but the output still runs on Windows 10.
Microsoft honestly should have provided a compatible version of Visual Basic that would run in the .NET environment instead of VB.NET. That would have allowed much more software to move over than what actually happened.
I don't know if you are being carefully specific about "compiling" as opposed to generally using the tooling (and maybe using P-code), but you can run Visual Basic 6 on newer versions of Windows. I have a machine with it set up, actually, as a researcher friend of mine inherited such a codebase. Here is an example set of instructions, not that I have read or vetted them (I don't remember what guide I had read to set it up).
I can run Visual Studio 6 just fine on Windows 10 - the only thing I needed to do was not install certain features and disable the old Java version install/check, and that's it. Windows is insane regarding backwards compatibility.
Microsoft's OS division has essentially committed to supporting VB6 apps forever. I imagine that RedHat and etc will do that with Python 2, too much of the world is running on it.
(VisualBasic, like Python was the most popular programming language of its day and found its way into a lot of niches which are not developer-centric.)