Lino Mastrodomenico wrote:
> Let's suppose that I use Python 2.x or something else to create a file
> with name b'\xff'. My (Linux) system has a sane configuration and the
> filesystem encoding is UTF-8, so it's an invalid name but the kernel
> will blindly accept it anyway.
>
> With this PEP, Python 3.1 listdir() will convert b'\xff' to the string '\udcff'.

One question that really bothers me about this proposal is the following:

Assume a UTF-8 locale. A file named b'\xff', being an invalid UTF-8 sequence, will be converted to the half-surrogate '\udcff'. However, a file named b'\xed\xb3\xbf', a valid[1] UTF-8 sequence, will also be converted to '\udcff'. Those are quite different POSIX pathnames; how will Python know which one it was when I later pass '\udcff' to open()?

A poster hinted at this question, but I haven't seen it answered yet.

[1] I'm assuming that it's valid UTF-8 because it passes through Python 2.5's '\xed\xb3\xbf'.decode('utf-8'). I don't claim to be a UTF-8 expert.
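For anyone who wants to poke at the two byte strings from the question, here is a minimal sketch. It assumes a Python 3 interpreter and the 'surrogateescape' error handler, which is the name the PEP 383 handler ended up with in CPython; it reflects the shipped behaviour, not necessarily the draft under discussion here. The decode call only approximates what listdir() does for str paths under a UTF-8 filesystem encoding.

```python
# Assumption: Python 3 with the 'surrogateescape' error handler and a
# UTF-8 filesystem encoding; bytes.decode stands in for listdir().

invalid = b'\xff'                    # not valid UTF-8 at all
encoded_surrogate = b'\xed\xb3\xbf'  # UTF-8-style encoding of U+DCFF

# Decode each name roughly the way listdir() would for str paths.
name1 = invalid.decode('utf-8', 'surrogateescape')
name2 = encoded_surrogate.decode('utf-8', 'surrogateescape')

# ascii() shows the lone surrogates as \u escapes.
print(ascii(name1))
print(ascii(name2))

# Encoding back with the same handler recovers the original bytes.
assert name1.encode('utf-8', 'surrogateescape') == invalid
assert name2.encode('utf-8', 'surrogateescape') == encoded_surrogate
```

In a Python 3 interpreter the two names do not collide: Python 3's UTF-8 codec rejects encoded surrogates (unlike the Python 2.5 codec mentioned in the footnote), so each of the three bytes in b'\xed\xb3\xbf' is escaped separately and the result is '\udced\udcb3\udcbf' rather than '\udcff'.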