On Wed, Apr 13, 2016 at 2:15 AM, Ethan Furman <ethan at stoneleaf.us> wrote:
> On 04/11/2016 04:43 PM, Victor Stinner wrote:
>>
>> On 11 Apr 2016 11:11 PM, "Ethan Furman" wrote:
>
>>> So my concern in such a case is what happens if we pass this SE
>>> string somewhere else: a UTF-8 file, or over a socket, or into a
>>> database? Does this have issues that we wouldn't face if we just used
>>> bytes?
>>
>> "SE string" are returned by os.listdir(str), os.walk(str),
>> os.getenv(str), sys.argv[int], ... since Python 3.3. Nothing new under
>> the sun.
>
> So when we pass a bytes object in, Python (on posix) converts that to a
> string using surrogateescape, gets back strings from the os, and encodes
> them back to bytes, again using surrogateescape?
>
>> Trying to encode a surrogate to ascii, latin1 or utf8 raise an encoding
>> error.
>
> latin1? I thought latin1 had a code point for 0-255, so how could using it
> raise an encoding error?

Latin-1 / ISO-8859-1 defines a character for every byte, so any byte string will *decode*. It only defines 256 characters as having equivalent bytes, though, so *encoding* any other character (such as a lone surrogate produced by surrogateescape) fails.

ChrisA
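A minimal Python 3 sketch of the decode/encode asymmetry described above. The helper `try_encode` and the sample byte strings are illustrative only, not taken from the thread; the surrogateescape example assumes a UTF-8 filesystem encoding, as is typical on Linux.

```python
def try_encode(s: str, codec: str) -> str:
    """Hypothetical helper: 'ok' if s encodes strictly with codec, else the error name."""
    try:
        s.encode(codec)
        return "ok"
    except UnicodeEncodeError as exc:
        return type(exc).__name__

# Latin-1 maps every byte 0x00-0xFF to U+0000-U+00FF, so decoding never fails.
all_bytes = bytes(range(256))
assert all_bytes.decode("latin-1").encode("latin-1") == all_bytes

# Encoding only works for those 256 code points.
print(try_encode("caf\u00e9", "latin-1"))  # ok: U+00E9 is in Latin-1
print(try_encode("\u20ac", "latin-1"))     # UnicodeEncodeError: the euro sign is not

# An "SE string": undecodable bytes become lone surrogates U+DC80..U+DCFF.
raw = b"caf\xe9"  # Latin-1 bytes, not valid UTF-8
se = raw.decode("utf-8", errors="surrogateescape")
print(repr(se))  # 'caf\udce9'

# A lone surrogate cannot be encoded strictly by ascii, latin-1 or utf-8.
for codec in ("ascii", "latin-1", "utf-8"):
    print(codec, try_encode(se, codec))

# Only the same error handler recovers the original bytes.
assert se.encode("utf-8", errors="surrogateescape") == raw
```

So an SE string written to a UTF-8 file or socket with the default strict error handler raises rather than silently emitting bad data; passing it through unchanged requires using surrogateescape again on the way out.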