> The way I see it, to fix this we have 2 basic choices when a Unicode object
> is passed as a filename:
>
> * we call the Unicode versions of the CRTL. That is the choice that I
> prefer. I understand that it won't work on Win95, but I think that needs
> to be worked-around.

By using "Unicode versions" of an API, you are making the code Windows-specific anyway. So I wonder whether it might be better to use the plain API instead of the CRTL; I also wonder how difficult it actually is to do "the right thing all the time".

On NT, the file system is defined in terms of Unicode, so passing Unicode in and out is definitely the right thing (*). On Win9x, the file system uses some platform-specific encoding, which means that using that encoding is the right thing. On Unix, there is no established convention, but UTF-8 was invented exactly to deal with Unicode in Unix file systems, so that might be an appropriate choice (**).

So I'm in favour of supporting Unicode on all file system APIs; that does include os.listdir(). For 2.1, that may be a bit much given that a beta has already been released; so only accepting Unicode on input is what we can do now.

Regards,
Martin

(*) Converting to the current MBCS might be lossy, and it might not support all file names. The "ASCII only" approach of 2.0 was precisely taken to allow getting it right later; I strongly discourage any approach that attempts to drop the restriction in a way that does not allow getting it right later.

(**) At least, that is the best bet. Many Unix installations use some other encoding in their file names; if Unicode becomes more common, most likely installations will also use UTF-8 on their file systems. Unless it can be established what the file system encoding is, returning Unicode from os.listdir is probably not the right thing.
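The per-platform policy in this message (pass Unicode through on NT, use the platform encoding on Win9x, use UTF-8 on Unix as the best bet) can be sketched in present-day Python 3 as below. This is a minimal illustration under stated assumptions, not how Python 2.1 or any later release actually implemented it: `filename_encoding` and `prepare_filename` are hypothetical helpers, the platform labels are made up (real `sys.platform` reports `"win32"` on both NT and Win9x), and `cp1252` stands in for "the current MBCS" so the lossiness demo from footnote (*) also runs off Windows.

```python
from typing import Optional, Union

# Hypothetical platform labels for illustration only; sys.platform cannot
# distinguish NT from Win9x, so a real implementation would need another test.
WINNT, WIN9X, UNIX = "winnt", "win9x", "unix"


def filename_encoding(platform: str) -> Optional[str]:
    """Return the codec to apply to a Unicode filename, or None to pass it through."""
    if platform == WINNT:
        return None      # NT file system is defined in terms of Unicode
    if platform == WIN9X:
        return "mbcs"    # ANSI code page; this codec exists only on Windows builds
    return "utf-8"       # Unix: no established convention, UTF-8 is the best bet


def prepare_filename(name: str, platform: str) -> Union[str, bytes]:
    """Convert a Unicode filename for the given platform, raising rather than losing data."""
    codec = filename_encoding(platform)
    if codec is None:
        return name
    # errors="strict" makes an unrepresentable name fail loudly instead of
    # silently becoming a different (possibly wrong) file name.
    return name.encode(codec, "strict")


if __name__ == "__main__":
    print(prepare_filename("caf\u00e9.txt", WINNT))  # passed through unchanged
    print(prepare_filename("caf\u00e9.txt", UNIX))   # b'caf\xc3\xa9.txt'

    # Footnote (*): a narrow code page cannot hold every name, and a
    # "replace" conversion maps distinct names onto the same bytes.
    print("\u65e5\u672c.txt".encode("cp1252", "replace"))  # b'??.txt'

    # Footnote (**): names stored in some other encoding may not decode as UTF-8.
    try:
        b"caf\xe9.txt".decode("utf-8")
    except UnicodeDecodeError:
        print("not valid UTF-8; returning Unicode here would mean guessing")
```

Choosing `errors="strict"` reflects the concern in footnote (*): an approach that silently substitutes characters would make it hard to "get it right later", whereas an exception keeps the restriction visible.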