Barry Scott wrote:
>
> On 30 Apr 2009, at 05:52, Martin v. Löwis wrote:
>
>>> How do I get a printable Unicode version of these path strings if they
>>> contain non-Unicode data?
>>
>> Define "printable". One way would be to use a regular expression,
>> replacing all codes in a certain range with a question mark.
>
> What I mean by printable is that the string must be valid Unicode
> that I can print to a UTF-8 console or place as text in a UTF-8
> web page.
>
> I think your PEP gives me a string that will not encode to
> valid UTF-8 that the world outside of Python likes. Did I get this
> point wrong?
>
>>> I'm guessing that an app has to understand that filenames come in two
>>> forms, unicode and bytes, if it's not UTF-8 data. Why not simply return
>>> a string if it's valid UTF-8, otherwise return bytes?
>>
>> That would have been an alternative solution, and the one that 2.x uses
>> for listdir. People didn't like it.
>
> In our application we are running Fedora with the assumption that the
> filenames are UTF-8. When Windows systems FTP files to our system
> the filenames are in CP-1251(?) and not valid UTF-8.
>
> What we have to do is detect these non-UTF-8 filenames and get the
> users to rename them.
>
> Having an algorithm that says if it's a string, no problem, if it's
> bytes, deal with the exceptions, seems simple.
>
> How do I do this detection with the PEP proposal?
> Do I end up using the byte interface and doing the UTF-8 decode
> myself?
>
What do you do currently?

The PEP just offers a way of reading all filenames as Unicode, if that's
what you want. So what if the strings can't be encoded to normal UTF-8?
The filenames aren't valid UTF-8 anyway! :-)
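
For what it's worth, here is a minimal sketch of how the detection could look with the PEP's approach. It assumes the error handler is spelled "surrogateescape", as it is in Python 3.1 and later, and that the filesystem encoding is UTF-8. The helper names `is_valid_utf8_name` and `printable` are made up for illustration and are not part of the PEP. The idea is that an undecodable byte turns into a lone surrogate in the str, and a strict UTF-8 encode refuses lone surrogates, so a failed encode flags exactly the names that need renaming.

```python
def is_valid_utf8_name(name: str) -> bool:
    """Return True if a str filename encodes as strict UTF-8.

    Names carrying escaped (undecodable) bytes contain lone surrogates,
    which the strict UTF-8 encoder rejects.
    """
    try:
        name.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


def printable(name: str) -> str:
    """Return a displayable form, with each escaped byte shown as '?'."""
    return name.encode("utf-8", "replace").decode("utf-8")


# Simulate a filename uploaded from a CP-1251 system: these bytes are
# not valid UTF-8, so each one is escaped as a lone surrogate.
raw = "файл".encode("cp1251") + b".txt"
name = raw.decode("utf-8", "surrogateescape")

if not is_valid_utf8_name(name):
    print("please rename:", printable(name))

# The original bytes are still recoverable, so the file can be opened.
assert name.encode("utf-8", "surrogateescape") == raw
```

So you would not need the bytes interface just to detect the bad names. You only need the strict encode to fail, and the escaped string still round-trips to the original bytes when you want to touch the file.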