Martin:
> Indeed, that would be consistent. I deliberately want to leave this
> out of PEP 277. On Unix, things are not that clear - as Jack points
> out, readlink() and getcwd() also need consideration.

Linux and MacOSX use UTF-8 and should probably be treated as such,
i.e. I want to open("äöü"), not open("äöü".encode("utf-8")).

One interesting tidbit is that MacOSX requires Unicode filenames to be in NFD.
I don't know whether anybody agreed on a standard normal form for Linux.

> In this terrain, Windows has the cleaner API (they consider file names
> as character strings, not as byte strings), so doing the right thing
> is easier.

Byte strings are perfectly OK if they have a common encoding (meaning
UTF-8, in some accepted normal form). Character strings are bad if
their interpretation, or indeed their usability, changes with the
presence of some random environment variable / registry entry /
whatever. Under these constraints, calling it a character string vs.
a byte string, and/or using it as such, is a matter of programmers'
convenience.

-- 
Matthias Urlichs
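
To make the normal-form point concrete, here is a minimal sketch of how the same visible name "äöü" yields different UTF-8 byte strings in NFC and NFD. It assumes a modern Python 3 interpreter and the standard `unicodedata` module, neither of which is part of the original discussion; `same_filename` is a hypothetical helper introduced only for illustration.

```python
import unicodedata

# "äöü" written as three precomposed code points (this is the NFC form).
name = "\u00e4\u00f6\u00fc"

nfc = unicodedata.normalize("NFC", name)  # precomposed: ä ö ü
nfd = unicodedata.normalize("NFD", name)  # decomposed: base letter + U+0308 each

# The two forms display identically but encode to different byte strings.
nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")


def same_filename(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two names after normalizing both
    to one agreed form, so NFC and NFD spellings compare equal."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(len(nfc), len(nfd))              # code point counts
print(len(nfc_bytes), len(nfd_bytes))  # UTF-8 byte counts
print(nfc == nfd, same_filename(nfc, nfd))
```

A byte-level comparison of two such names fails unless both sides agree on a normal form, which is why "UTF-8, in some accepted normal form" matters if file names are treated as byte strings.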