On Wed, Feb 10, 2016 at 12:41:08PM +1100, Chris Angelico wrote:
> On Wed, Feb 10, 2016 at 12:37 PM, Steve Dower <python at stevedower.id.au> wrote:
> > I really don't like the idea of not being able to use bytes in cross
> > platform code. Unless it's become feasible to use Unicode for lossless
> > filenames on Linux - last I heard it wasn't.
>
> It has, but only in Python 3 - anyone who needs to support 2.7 and
> arbitrary bytes in filenames can't use Unicode strings.

Are you sure? Unless I'm confused, which I may be, I don't think you can
specify file names with arbitrary bytes in Python 3.

Writing, and reading, filenames including odd bytes works in Python 2.7:

[steve@ando ~]$ python -c 'open("/tmp/abc\xD8\x01", "w").write("Hello World\n")'
[steve@ando ~]$ ls /tmp/abc*
/tmp/abc??
[steve@ando ~]$ python -c 'print open("/tmp/abc\xD8\x01", "r").read()'
Hello World
[steve@ando ~]$

And I can read the file using bytes in Python 3:

[steve@ando ~]$ python3.3 -c 'print(open(b"/tmp/abc\xD8\x01", "r").read())'
Hello World
[steve@ando ~]$

But Unicode fails:

[steve@ando ~]$ python3.3 -c 'print(open("/tmp/abc\xD8\x01", "r").read())'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/abcØ\x01'

What Unicode string does one need to give in order to open file
b"/tmp/abc\xD8\x01"?

I think one would need to find a valid unicode string which, when
encoded to UTF-8, gives the byte sequence \xD8\x01, but since that's
half of a surrogate pair it is an illegal UTF-8 byte sequence. So I
don't think it can be done.

Am I mistaken?

--
Steve
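
The question above can be probed directly by asking Python which str it maps those bytes to. The following is a minimal sketch, not part of the original post. It assumes Python 3.1 or later and decodes the bytes explicitly as UTF-8 with the "surrogateescape" error handler from PEP 383. On POSIX systems os.fsdecode() and os.fsencode() apply the same handler, using the filesystem encoding. The variable names raw and name are illustrative only.

```python
# Assumption: UTF-8 is the filesystem encoding; it is given explicitly
# here so the result does not depend on the locale.
raw = b"/tmp/abc\xD8\x01"

# 0xD8 is not valid UTF-8 in this position, so the surrogateescape
# handler maps it to the lone surrogate U+DCD8 instead of raising.
name = raw.decode("utf-8", "surrogateescape")

# ascii() is used because printing a lone surrogate directly would
# normally fail on a UTF-8 terminal.
print(ascii(name))  # '/tmp/abc\udcd8\x01'

# Encoding with the same handler restores the original bytes exactly.
assert name.encode("utf-8", "surrogateescape") == raw
```

If that assumption holds on the system in question, passing "/tmp/abc\udcd8\x01" to open() in Python 3 would be expected to reach the same file as the bytes path, although the post does not test this.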