Wed Feb 10 05:18:15 EST 2016 · https://mail.python.org/pipermail/python-dev/2016-February/143244.html

On Wed, Feb 10, 2016 at 12:41:08PM +1100, Chris Angelico wrote:
> On Wed, Feb 10, 2016 at 12:37 PM, Steve Dower <python at stevedower.id.au> wrote:
> > I really don't like the idea of not being able to use bytes in cross
> > platform code. Unless it's become feasible to use Unicode for lossless
> > filenames on Linux - last I heard it wasn't.
> 
> It has, but only in Python 3 - anyone who needs to support 2.7 and
> arbitrary bytes in filenames can't use Unicode strings.

Are you sure? Unless I'm confused, which I may be, I don't think you 
can specify file names with arbitrary bytes in Python 3.

Writing, and reading, filenames including odd bytes works in Python 2.7:

[steve at ando ~]$ python -c 'open("/tmp/abc\xD8\x01", "w").write("Hello World\n")'
[steve at ando ~]$ ls /tmp/abc*
/tmp/abc??
[steve at ando ~]$ python -c 'print open("/tmp/abc\xD8\x01", "r").read()'
Hello World

[steve at ando ~]$

And I can read the file using bytes in Python 3:

[steve at ando ~]$ python3.3 -c 'print(open(b"/tmp/abc\xD8\x01", "r").read())'
Hello World

[steve at ando ~]$

But Unicode fails:

[steve at ando ~]$ python3.3 -c 'print(open("/tmp/abc\xD8\x01", "r").read())'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/abcØ\x01'

What Unicode string does one need to give in order to open file 
b"/tmp/abc\xD8\x01"? I think one would need to find a valid unicode 
string which, when encoded to UTF-8, gives the byte sequence \xD8\x01, 
but since that's half of a surrogate pair it is an illegal UTF-8 byte 
sequence. So I don't think it can be done.
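
One way to check this reasoning is a minimal sketch like the following, which assumes the filesystem encoding is UTF-8 and uses the "surrogateescape" error handler from PEP 383 explicitly rather than going through open(). The variable names are only illustrative. It suggests that strict UTF-8 does reject the bytes, but that a str containing a lone surrogate can still round-trip to them:

```python
raw = b"/tmp/abc\xD8\x01"

# Strict UTF-8 decoding rejects the sequence: 0xD8 announces a two-byte
# character, but 0x01 is not a valid continuation byte.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# "surrogateescape" maps each undecodable byte 0xNN to the lone
# surrogate U+DCNN; here 0xD8 becomes U+DCD8 and 0x01 stays U+0001.
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes.
roundtrip = name.encode("utf-8", "surrogateescape")

print(strict_ok, ascii(name), roundtrip == raw)
```

If that holds on a given system, the str form of the name should be the one os.fsdecode() produces for those bytes, assuming a UTF-8 locale.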

Am I mistaken?

-- 
Steve

[Python-Dev] Windows: Remove support of bytes filenames in the os module?