Guido van Rossum <guido@python.org> writes:

> But shouldn't it return Unicode whenever there are filenames in the
> directory that can't represented as ASCII?

Unfortunately, on Windows, there is no way to find out: If you use the
ANSI function (which covers not only ASCII, but the user's full code
page), and you have a file name not representable in this code page,
the system returns a file name that contains question marks.

Of course, you could always use the Win32 Wide API (Unicode) function,
and convert the pure-ASCII strings into byte strings.

That gives a number of options:
- always return Unicode for Unicode directory argument,
- return Unicode only for non-ASCII, and only for Unicode argument,
- return Unicode only for non-ASCII, regardless of Unicode argument,
- return Unicode only for non-MBCS (again depending or not depending
  on whether the argument is Unicode).

In the third case, if you have a non-representable file name, you
currently get a string like "??????.txt", whereas you then get
u"\uabcd\uefgh...txt".

What might be worse: If the file name is representable in "mbcs", yet
outside ASCII, you currently get a "good" byte string, and you get a
Unicode string under option three.

So the MBCS options sound better. Unfortunately, testing whether a
string encodes as MBCS might be expensive.

> Hm, I don't know if I'd like os.listdir() to have an encoding
> argument. Sounds like the wrong solution somehow.

I don't like that, either.

> > Oh yes, the same reasoning would hold for readlink(), getcwd()
> > and any other call that returns filenames.
>
> Ditto.

For readlink, if you trust FileSystemDefaultEncoding, you could return
a Unicode object if you find non-ASCII in the link contents.

For getcwd, you again have the issue of reliably detecting non-ASCII
if you use the ANSI function; if you use the Wide function, you again
have the choice of returning Unicode only if non-ASCII, or only if
non-MBCS.

Regards,
Martin
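To make the difference between the "non-ASCII" and "non-MBCS" options
concrete, here is a minimal sketch, not the actual os.listdir()
implementation. It is written in present-day Python, where str is the
Unicode type and bytes is the byte string. The helper names
narrow_if_encodable and lossy_ansi_name are hypothetical, and "cp1252"
is only an assumed stand-in for the user's code page, since the "mbcs"
codec exists only on Windows.

```python
def narrow_if_encodable(name, codec="ascii"):
    """Return name as a byte string if codec can represent it fully,
    otherwise return the Unicode string unchanged."""
    try:
        return name.encode(codec)
    except UnicodeEncodeError:
        return name


def lossy_ansi_name(name, codec="cp1252"):
    """Imitate what the ANSI function reports: characters outside the
    code page are replaced by question marks."""
    return name.encode(codec, errors="replace")


# Names as the Wide API would report them (illustrative data only).
names = ["readme.txt", "caf\u00e9.txt", "\u65e5\u672c\u8a9e.txt"]

# "Unicode only for non-ASCII": the second and third names stay Unicode.
ascii_policy = [narrow_if_encodable(n, "ascii") for n in names]

# "Unicode only for non-MBCS": only the third name stays Unicode.
codepage_policy = [narrow_if_encodable(n, "cp1252") for n in names]

# What the ANSI function would give for the same directory.
ansi_view = [lossy_ansi_name(n) for n in names]
```

Under these assumptions, the second name is the case described above
as possibly worse: it is a "good" byte string under the code-page
policy, but becomes a Unicode string under the ASCII policy. The
encode attempt per name is also where the cost concern about the MBCS
test would show up.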