
> Only for the specific case of input bytes already in the language's internal encoding (which granted will be common as most inputs would be ascii or utf-8) and with the same ownership constraints as the input, and that's mostly enabled by Rust's ownership model.

Except of course on operating systems where text I/O is done entirely in UTF-16. Say, Windows.
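The Windows point can be made concrete with a small Python sketch. The same text has different byte layouts in UTF-16-LE (the little-endian UTF-16 used by Windows' wide-character APIs) and in UTF-8. A UTF-8 string type therefore cannot borrow a UTF-16 buffer and has to transcode into a new allocation. The helper name `transcode_utf16le_to_utf8` is made up for illustration.

```python
def transcode_utf16le_to_utf8(raw: bytes) -> bytes:
    """Hypothetical helper: re-encode UTF-16-LE bytes as UTF-8.

    The result is a brand-new buffer; none of `raw` can be reused as-is.
    """
    return raw.decode("utf-16-le").encode("utf-8")


text = "h\u00e9llo w\u00f6rld"             # "héllo wörld", 11 code points
utf8_bytes = text.encode("utf-8")          # 9 ASCII bytes + 2 two-byte sequences
utf16_bytes = text.encode("utf-16-le")     # 2 bytes per code point, no BOM

print(len(utf8_bytes), len(utf16_bytes))   # 13 22
print(transcode_utf16le_to_utf8(utf16_bytes) == utf8_bytes)  # True
```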

Since Python strings have no fixed encoding, but choose "the most efficient one" (heuristically) when decoding, they can cope better than a fixed UTF-8 encoding in these cases.
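For reference, PEP 393 (CPython 3.3+) stores each `str` with 1, 2 or 4 bytes per code point, chosen from the widest code point in the string. The sketch below measures this with `sys.getsizeof`. It relies on CPython implementation details, so other interpreters such as PyPy may report something different. On CPython it should report 1, 1, 2 and 4 for the four sample characters.

```python
import sys


def bytes_per_char(ch: str) -> int:
    """Extra bytes CPython needs per additional copy of `ch` in a str.

    Two freshly built strings are compared so the fixed object header cancels out.
    """
    return (sys.getsizeof(ch * 2000) - sys.getsizeof(ch * 1000)) // 1000


# ASCII, Latin-1, a BMP character (Greek Omega) and an astral (non-BMP) emoji.
for ch in ["a", "\u00e9", "\u03a9", "\U0001F600"]:
    print(ascii(ch), bytes_per_char(ch))
```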

>> Python does not.

> Python doesn't generally do no-alloc/0-copy operations so that's not overly surprising.

Indeed. Even when the encoding does not change, the string will always be copied. One could imagine an API that avoids the copy, though, to optimize all those cases where the memory is already owned by a shim in the runtime.
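A quick way to observe the copy is to decode a mutable `bytearray` and then change it. A `memoryview` over the same buffer (Python's zero-copy view for bytes-like objects) sees the change. The decoded `str` does not, even though ASCII input needs no transcoding. The function name is just for illustration.

```python
def decode_then_mutate() -> tuple[str, bytes]:
    """Decode a bytearray, then modify the bytearray afterwards."""
    buf = bytearray(b"hello")
    text = buf.decode("ascii")  # ASCII in, ASCII out: no transcoding needed
    view = memoryview(buf)      # zero-copy view sharing buf's memory
    buf[0] = ord("j")           # mutate the original bytes in place
    return text, view.tobytes()


text, viewed = decode_then_mutate()
print(text)    # hello   (the str kept its own copy)
print(viewed)  # b'jello' (the memoryview shares the buffer)
```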



> Since Python strings have no fixed encoding, but choose "the most efficient one" (heuristically) when decoding, they can cope better than a fixed UTF-8 encoding in these cases.

That is wrong. Python can never pick the most efficient encoding unless you decode from latin1.


PEP 393


That's why I said it can use the most appropriate encoding in the latin1 case. Before that it would never have the right encoding.
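The Latin-1 case is special because that codec maps each byte value `b` straight to code point `b`. The decoded string therefore always fits CPython's one-byte-per-character layout, the same width as the input. A small sketch checking both properties follows. The size measurement again relies on `sys.getsizeof` and CPython's PEP 393 layout, and `decode_latin1` is just an illustrative wrapper.

```python
import sys


def decode_latin1(data: bytes) -> str:
    """Decode as Latin-1: each byte value b becomes the code point b."""
    return data.decode("latin-1")


data = bytes(range(256))  # every possible byte value
text = decode_latin1(data)

# One code point per input byte, with the same numeric value.
print([ord(ch) for ch in text] == list(data))  # True

# Storage cost per extra character (CPython-specific); the header cancels out.
per_char = (sys.getsizeof(text * 3) - sys.getsizeof(text * 2)) // len(text)
print(per_char)  # 1
```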



