I would like to discuss Unicode on the Windows platform, and how it relates to the MBCS encoding that Windows uses.

My main goal here is to ensure that Unicode on Windows can make a round-trip to and from native Unicode stores. As an example, let's take the registry: a Windows user should be able to read a Unicode value from the registry, then write it back. The value written back should be _identical_ to the value read. Ditto for the file system: if the filesystem is Unicode, then I would expect the following code:

    import os
    for fname in os.listdir("."):
        f = open(fname + ".tmp", "w")

to create filenames on the filesystem with the exact base name, even when the base name contains non-ASCII characters.

However, the Unicode patches do not appear to make this possible. open() uses PyArg_ParseTuple(args, "s..."); PyArg_ParseTuple() will automatically convert a Unicode object to UTF-8, so we end up passing a UTF-8 encoded string to the C runtime fopen function.

The end result of all this is that we end up with UTF-8 encoded names in the registry and on the file system. It does not seem possible to get a true Unicode string onto either the file system or into the registry.

Unfortunately, I'm not experienced enough to know the full ramifications, but it _appears_ that on Windows the default "unicode to string" translation should be done via the WideCharToMultiByte() API. This will then pass an MBCS-encoded string to Windows, and the "right thing" should magically happen. Unfortunately, MBCS encoding is dependent on the current locale (i.e., one MBCS sequence will mean completely different things depending on the locale). I don't see a portability issue here, as the documentation could state that "Unicode->ASCII conversions use the most appropriate conversion for the platform. If the platform is not Unicode aware, then UTF-8 will be used."

This issue is the final one before I release the win32reg module.

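The mismatch described above can be sketched in a few lines. This is only an illustration, written in present-day Python rather than the Python of this thread, and it rests on assumptions that are not in the post: cp1252 and cp1251 stand in for whatever ANSI ("MBCS") code page the current locale selects, and the helper name encode_for_platform is hypothetical.

```python
# Hypothetical sketch: cp1252 and cp1251 stand in for the locale-dependent
# ANSI ("MBCS") code page on Windows; none of these names come from the post.

def encode_for_platform(name, ansi_codepage="cp1252"):
    """Return the bytes a UTF-8 default and an MBCS-style default would produce."""
    return name.encode("utf-8"), name.encode(ansi_codepage)

name = "caf\u00e9"  # "cafe" with an acute accent on the final letter

utf8_bytes, ansi_bytes = encode_for_platform(name)

# An ANSI API interprets whatever bytes it receives in the current code page,
# so UTF-8 input comes back as a different (mangled) name.
seen_by_ansi_api = utf8_bytes.decode("cp1252")   # 'caf\xc3\xa9', not the original

# Bytes encoded in the platform code page survive the round trip.
round_tripped = ansi_bytes.decode("cp1252")      # equal to name

# The same bytes mean something else under another locale's code page.
seen_in_cyrillic_locale = ansi_bytes.decode("cp1251")  # 'caf\u0439'
```

On an actual Windows machine, Python's "mbcs" codec selects the current ANSI code page, so it could replace the hard-coded code page names used here.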
It seems _critical_ to me that if Python supports Unicode and the platform supports Unicode, then Python Unicode values must be capable of being passed to the platform. For the win32reg module I could quite possibly hack around the problem, but the more general problem (illustrated by the open() example above) still remains...

Any thoughts?

Mark.