> Unfortunately, on Windows, there is no way to find out: If you use the
> ANSI function (which not only covers ASCII, but the full user's code
> page), and you have a file name not representable in this code page,
> the system returns a file name that contains question marks.
>
> Of course, you could always use the Win32 Wide API (unicode) function,
> and convert the pure-ASCII strings into byte strings. That gives a
> number of options:
> - always return Unicode for Unicode directory argument,
> - return Unicode only for non-ASCII, and only for Unicode argument,
> - return Unicode only for non-ASCII, regardless of Unicode argument,
> - return Unicode only for non-MBCS (again depending or not depending
>   on whether the argument is Unicode).
>
> In the third case, if you have a non-representable file name, you
> currently get a string like "??????.txt", whereas you then get
> u"\uabcd\uefgh...txt". What might be worse: If the file name is
> representable in "mbcs", yet outside ASCII, you currently get a "good"
> byte string, and you get a Unicode string under option three.

Why is getting Unicode worse than getting MBCS?  #3 looks right to me...

> So the MBCS options sound better. Unfortunately, testing whether a
> string encodes as MBCS might be expensive.

I still don't fully understand MBCS.  I know there's a variable
assignment of codes to the upper half of the 8-bit space, based on a
user setting.  But is that always a simple mapping to 128 non-ASCII
characters, or are there multi-byte codes that expand the total
character set to more than 256?

> For readlink, if you trust FileSystemDefaultEncoding, you could return
> a Unicode object if you find non-ASCII in the link contents.

What is FileSystemDefaultEncoding and when can you trust it?

> For getcwd, you again have the issue of reliably detecting non-ASCII
> if you use the ANSI function; if you use the Wide function, you again
> have the choice of returning Unicode only if non-ASCII, or only if
> non-MBCS.
Wide + Unicode (if non-ASCII) sounds good to me.  The fewer places an
app has to deal with MBCS the better, it seems to me.

--Guido van Rossum (home page: http://www.python.org/~guido/)
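
For illustration only, here is a rough sketch of the two behaviours
discussed above, written against modern Python string types. The helper
names name_from_wide and name_from_ansi are hypothetical and are not
part of any real API, and cp1252 is just an assumed example of a user's
ANSI code page. The first helper models "Wide + Unicode (if non-ASCII)";
the second models the question marks the ANSI function produces for
names outside the code page.

```python
def name_from_wide(name):
    """Hypothetical helper: a name obtained via the Wide API.

    Pure-ASCII names come back as byte strings; anything else is
    returned unchanged as a Unicode string.
    """
    try:
        return name.encode("ascii")
    except UnicodeEncodeError:
        return name


def name_from_ansi(name, codepage="cp1252"):
    """Hypothetical helper: roughly what the ANSI function yields.

    Characters the (assumed) code page cannot represent are replaced
    by question marks, so the original name cannot be recovered.
    """
    return name.encode(codepage, "replace")


if __name__ == "__main__":
    for candidate in ("readme.txt", "caf\u00e9.txt", "\u03b1\u03b2.txt"):
        print(repr(name_from_wide(candidate)), repr(name_from_ansi(candidate)))
```

Note that under this policy a name such as "caf\u00e9.txt", which the
code page can represent, still comes back as Unicode from the Wide
path. That is the trade-off raised in the quoted text.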