David Hopwood wrote:
> On Windows, file system pathnames can contain arbitrary Unicode characters
> (well, almost). Despite the existence of "ANSI" filesystem APIs, and
> regardless of what 'sys.getfilesystemencoding()' returns, the underlying
> file system encoding for NTFS and FAT filesystems is UTF-16LE.
>
> Thus, either:
>  - the fact that sys.getfilesystemencoding() returns a non-Unicode encoding
>    on Windows is a bug, or
>  - any program that relies on sys.getfilesystemencoding() being able to
>    encode arbitrary Windows pathnames has a bug.
>
> We need to decide which of these is the case.

There is a third option:
 - the operating system has a bug

It is actually this option that rules out the other two.
sys.getfilesystemencoding() returns "mbcs" on Windows, which means
CP_ACP. The file system encoding is an encoding that converts a
file name into a byte string. Unfortunately, on Windows, there are
file names which cannot be converted into a byte string in a
standard manner. This is an operating system bug (or mis-design;
they should have chosen UTF-8 as the byte encoding of file names,
instead of making it depend on the system locale, but they of course
did so for backwards compatibility with Windows 3.1 and 9x).

As a side note: every encoding in Python is a Unicode encoding;
so there aren't any "non-Unicode encodings".

Programs that rely on sys.getfilesystemencoding() being able to
represent arbitrary file names on Windows might have a bug;
programs that rely on sys.getfilesystemencoding() being able
to encode all elements of sys.path do not (at least not for
Python 2.5 and earlier).

Regards,
Martin
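To illustrate the point about file names that cannot be converted into a byte string, here is a minimal sketch, not part of the original message. It assumes cp1252 as a stand-in for CP_ACP on a Western European system, since the "mbcs" codec exists only on Windows and its result depends on the system locale. The helper name `can_encode` and the sample file names are hypothetical.

```python
def can_encode(name, encoding):
    """Return True if `name` survives a strict encode/decode round trip."""
    try:
        return name.encode(encoding).decode(encoding) == name
    except UnicodeEncodeError:
        return False

# Assumption: cp1252 stands in for CP_ACP; the real code page is
# locale-dependent and "mbcs" is available only on Windows.
ANSI_STAND_IN = "cp1252"

latin_name = "r\u00e9sum\u00e9.txt"    # accented Latin letters, present in cp1252
greek_name = "\u03b1\u03b2\u03b3.txt"  # Greek letters, absent from cp1252

for name in (latin_name, greek_name):
    for encoding in (ANSI_STAND_IN, "utf-8", "utf-16-le"):
        print(ascii(name), encoding, can_encode(name, encoding))
```

Under these assumptions the Greek name is the only one that fails, and only for the locale-dependent code page; both UTF-8 and UTF-16LE can represent every name in the sample.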