Mark Hammond wrote:
>
> Sorry, I notice I didn't answer your specific question:
>
> > Also, what would os.listdir() return ? Unicode strings or 8-bit
> > strings ?
>
> This would not change.
>
> This is what my testing shows:
>
> * I can switch to a German locale, and create a file using the keystrokes
> "`atest`o". The "`" is the dead-char so I get an umlaut over the first and
> last characters.
>
> * os.listdir() returns '\xe0test\xf2' for this file.
>
> * That same string can be passed to "open" etc to open the file.
>
> * The only way to get that string to a Unicode object is to use the
> encodings "Latin1" or "mbcs". Of them, "mbcs" would have to be safer, as at
> least it has a hope of handling non-latin characters :)
>
> So - assume I am passed a Unicode object that represents this filename. At
> the moment we simply throw that exception if we pass that Unicode object to
> open(). I am proposing that "mbcs" be used in this case instead of the
> default "ascii"
>
> If nothing else, my idea could be considered a "short-term" solution. If
> ever it is found to be a problem, we can simply move to the unicode APIs,
> and nothing would break - just possibly more things _would_ work :)

Sounds like a good idea. We'd only have to assure that whatever
os.listdir() returns can actually be used to open the file, but that
seems to be the case... at least for Latin-1 chars (I wonder how
well this behaves with Japanese chars).

--
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Pages:                           http://www.lemburg.com/python/
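
For readers following the thread, here is a minimal sketch of the decoding behaviour discussed above. It is written for present-day Python 3, where 8-bit strings are `bytes`, and it is not the code under discussion in the thread. The helper `decode_filename` is hypothetical and exists only for illustration. The "mbcs" codec is available only on Windows, so the sketch falls back to "latin-1" elsewhere. Under Latin-1 the bytes 0xE0 and 0xF2 map to "à" and "ò".

```python
# The 8-bit name that os.listdir() is reported to return in the message above.
raw_name = b"\xe0test\xf2"

# The default "ascii" codec rejects the name, which is the failure
# the proposal wants to avoid.
try:
    raw_name.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True


def decode_filename(raw, encodings=("mbcs", "latin-1")):
    """Hypothetical helper: return (text, encoding) for the first codec that works.

    "mbcs" exists only on Windows; on other platforms the codec lookup
    raises LookupError and the next candidate is tried.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except (LookupError, UnicodeDecodeError):
            continue
    raise ValueError("none of the candidate encodings could decode the name")


text, used = decode_filename(raw_name)
print(ascii_failed, used, ascii(text))
```

Which codec ends up being used depends on the platform, and on Windows the result of "mbcs" depends on the active ANSI code page, which is why the question about Japanese characters remains open here.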