Mark Hammond wrote:
>
> I understand the issue of "default Unicode encoding" is a loaded one,
> however I believe with the Windows' file system we may be able to use a
> default.
>
> Windows provides 2 versions of many functions that accept "strings" - one
> that uses "char *" arguments, and another using "wchar *" for Unicode.
> Interestingly, the "char *" versions of function almost always support
> "mbcs" encoded strings.
>
> To make Python work nicely with the file system, we really should handle
> Unicode characters somehow. It is not too uncommon to find the "program
> files" or the "user" directory have Unicode characters in non-english
> version of Win2k.
>
> The way I see it, to fix this we have 2 basic choices when a Unicode object
> is passed as a filename:
> * we call the Unicode versions of the CRTL.
> * we auto-encode using the "mbcs" encoding, and still call the non-Unicode
>   versions of the CRTL.
>
> The first option has a problem in that determining what Unicode support
> Windows 95/98 have may be more trouble than it is worth. Sticking to purely
> ascii versions of the functions means that the worst thing that can happen
> is we get a regular file-system error if an mbcs encoded string is passed on
> a non-Unicode platform.
>
> Does anyone have any objections to this scheme or see any drawbacks in it?
> If not, I'll knock up a patch...

Hmm... the problem with MBCS is that it is not one encoding,
but can be many things. I don't know if this is an issue
(can there be more than one encoding per process ? is the
encoding a user or system setting ? does the CRT know which
encoding to use/assume ?), but the Unicode approach sure
sounds a lot safer.

Also, what would os.listdir() return ? Unicode strings or 8-bit
strings ?

--
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Pages:                           http://www.lemburg.com/python/
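
To make the "mbcs is not one encoding" concern concrete, here is a minimal sketch of the second option from the quoted message (auto-encoding a Unicode filename before handing it to the "char *" functions). The helper name encode_filename and its encoding parameter are hypothetical and appear in neither message. The two code pages in the usage lines are only illustrative stand-ins for whatever "mbcs" happens to resolve to on a given machine.

```python
import sys


def encode_filename(name, encoding=None):
    """Hypothetical helper: turn a filename into bytes for a narrow (char *) API.

    Byte strings pass through untouched. Unicode strings are encoded strictly,
    so a character the target code page cannot represent raises
    UnicodeEncodeError instead of being silently replaced.
    """
    if isinstance(name, bytes):
        return name
    if encoding is None:
        # Assumption: "mbcs" is used on Windows (the codec only exists there);
        # elsewhere fall back to whatever Python reports for the file system.
        if sys.platform == "win32":
            encoding = "mbcs"
        else:
            encoding = sys.getfilesystemencoding()
    return name.encode(encoding)


# The same Unicode name yields different bytes under different code pages,
# which is why the result depends on what "mbcs" means on a given system.
name = "B\u00e4r.txt"
print(encode_filename(name, "cp1252"))
print(encode_filename(name, "cp437"))
```

With strict encoding, a name that does not fit the active code page fails at the encode step, before any file-system call is made. Calling the Unicode versions of the functions would avoid that dependency on the code page altogether.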