On 30 Apr 2009, at 05:52, Martin v. Löwis wrote:

>> How do I get a printable unicode version of these path strings if they
>> contain non-unicode data?
>
> Define "printable". One way would be to use a regular expression,
> replacing all codes in a certain range with a question mark.

What I mean by printable is that the string must be valid unicode that I can print to a UTF-8 console or place as text in a UTF-8 web page.

I think your PEP gives me a string that will not encode to the valid UTF-8 that the world outside Python expects. Did I get this point wrong?

>> I'm guessing that an app has to understand that filenames come in
>> two forms, unicode and bytes, if it's not utf-8 data. Why not simply
>> return a string if it's valid utf-8 and otherwise return bytes?
>
> That would have been an alternative solution, and the one that 2.x
> uses for listdir. People didn't like it.

In our application we are running Fedora with the assumption that the filenames are UTF-8. When Windows systems FTP files to our system, the filenames are in CP-1251(?) and not valid UTF-8.

What we have to do is detect these non-UTF-8 filenames and get the users to rename them.

Having an algorithm that says "if it's a string, no problem; if it's bytes, deal with the exceptions" seems simple.

How do I do this detection with the PEP proposal? Do I end up using the byte interface and doing the utf-8 decode myself?

Barry
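
For illustration only, here is a minimal Python 3 sketch of the two detection routes the closing questions point at. It is not taken from the PEP text or from this thread. The helper names `split_by_utf8`, `has_escaped_bytes` and `printable` are hypothetical, and the second and third helpers assume the `surrogateescape` error handler that PEP 383 later shipped with, under which undecodable bytes become lone surrogates that refuse to encode as strict UTF-8.

```python
def split_by_utf8(raw_names):
    """Bytes route: decode each name yourself and keep the failures as bytes."""
    good, bad = [], []
    for raw in raw_names:
        try:
            good.append(raw.decode("utf-8"))
        except UnicodeDecodeError:
            bad.append(raw)  # not valid UTF-8, ask the user to rename it
    return good, bad


def has_escaped_bytes(name):
    """String route: a name holding escaped bytes fails a strict UTF-8 encode."""
    try:
        name.encode("utf-8")
    except UnicodeEncodeError:
        return True
    return False


def printable(name):
    """Replace anything that cannot be encoded, so the result is safe to print."""
    return name.encode("utf-8", "replace").decode("utf-8")
```

With the bytes route, the input would typically come from something like `os.listdir(b".")`, which returns `bytes` names; with the string route, the names returned by `os.listdir(".")` can be passed to `has_escaped_bytes` directly.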