Guido van Rossum <guido@python.org> writes:

> Aha!  So MBCS is not an encoding: it's an indirection for a variety of
> encodings.  (Is there a way to find out what the encoding is?)

Correct. In Python, locale.getdefaultlocale()[1] returns the encoding;
the underlying API function is GetACP, and Python uses it as

    PyOS_snprintf(encoding, sizeof(encoding), "cp%d", GetACP());

There is a second indirection, the "OEM code page", which Windows uses:
- for on-disk FAT short file names,
- for the cmd.exe window

Python currently offers no access to GetOEMCP().

> Do you mean that the condition on
>
> #if defined(HAVE_LANGINFO_H) && defined(CODESET)
>
> is reliably false on Windows?  Otherwise _locale.setlocale() could set
> it.

Correct. nl_langinfo is a Sun invention (I believe) which made it into
Posix; Microsoft ignores it.

> So as long as they use 8-bit it's not our problem, right.  Another
> reason to avoid producing Unicode without a clue that the app expects
> Unicode (alas).  (BTW I find a Unicode argument to os.listdir() a
> sufficient clue.  IOW os.listdir(u".") should return Unicode.)

Indeed, that would be consistent. I deliberately want to leave this out
of PEP 277. On Unix, things are not that clear - as Jack points out,
readlink() and getcwd() also need consideration.

> > Ok, I'll update the PEP.
>
> To what?  (It would be bad if I convinced you at the same time you
> convinced me of the opposite. :-)

I haven't changed anything yet, and I won't. In this terrain, Windows
has the cleaner API (they consider file names as character strings, not
as byte strings), so doing the right thing is easier.

Regards,
Martin
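
P.S. For illustration only, here is a minimal sketch of how both code pages could be queried from Python, assuming an interpreter that ships the ctypes module (which is newer than this mail). The helper names codepage_name and windows_code_pages are hypothetical; GetACP and GetOEMCP are the Win32 functions mentioned above, and the "cp%d" formatting mirrors the PyOS_snprintf call.

```python
import sys


def codepage_name(cp):
    """Format a Windows code page number as a codec name, e.g. 1252 -> "cp1252"."""
    return "cp%d" % cp


def windows_code_pages():
    """Return (ansi_name, oem_name) on Windows, or None on other platforms.

    Hypothetical helper: it calls the Win32 functions GetACP() and
    GetOEMCP() through ctypes, so it only does real work on Windows.
    """
    if sys.platform != "win32":
        return None
    import ctypes  # imported lazily; ctypes.windll exists only on Windows

    kernel32 = ctypes.windll.kernel32
    return codepage_name(kernel32.GetACP()), codepage_name(kernel32.GetOEMCP())


if __name__ == "__main__":
    pages = windows_code_pages()
    if pages is None:
        print("not running on Windows; no ANSI/OEM code pages to report")
    else:
        print("ANSI code page: %s, OEM code page: %s" % pages)
```

On a non-Windows system the function simply returns None; the two names it returns on Windows depend on the system locale and often differ from each other.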