On 30 Apr 2009, at 21:06, Martin v. Löwis wrote:

>>>> How do I get a printable unicode version of these path strings if
>>>> they contain non-unicode data?
>>>
>>> Define "printable". One way would be to use a regular expression,
>>> replacing all codes in a certain range with a question mark.
>>
>> What I mean by printable is that the string must be valid unicode
>> that I can print to a UTF-8 console or place as text in a UTF-8
>> web page.
>>
>> I think your PEP gives me a string that will not encode to
>> valid UTF-8 that the world outside of python likes. Did I get this
>> point wrong?
>
> You are right. However, if your *only* requirement is that it should
> be printable, then this is fairly underspecified. One way to get
> a printable string would be this function
>
> def printable_string(unprintable):
>     return ""

Ha ha! Indeed this works, but I would have to try to turn enough of the
string into a reasonable hint at the name of the file, so the user has
some chance of knowing what is being reported.

> This will always return a printable version of the input string...
>
>> In our application we are running fedora with the assumption that the
>> filenames are UTF-8. When Windows systems FTP files to our system
>> the files are in CP-1251(?) and not valid UTF-8.
>
> That would be a bug in your FTP server, no? If you want all file names
> to be UTF-8, then your FTP server should arrange for that.

Not a bug; it's the lack of a feature. We use ProFTPd, which has just
implemented what is required. I forget the exact details (they are at
work), but when the ftp client asks for the FEAT of the ftp server, the
server can say use UTF-8. Supporting that in the server was apparently
non-trivial.

>> Having an algorithm that says if it's a string no problem, if it's
>> a byte deal with the exceptions seems simple.
>>
>> How do I do this detection with the PEP proposal?
>> Do I end up using the byte interface and doing the utf-8 decode
>> myself?
>
> No, you should encode using the "strict" error handler, with the
> locale encoding. If the file name encodes successfully, it's correct,
> otherwise, it's broken.

O.k. I understand.

Barry
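
For readers following the thread, the detection Martin describes can be sketched roughly as below. This is only an illustration, not code from the PEP or from either poster: the helper names `is_valid_filename` and `printable_hint` are hypothetical, and it assumes the error handler is available under the name `surrogateescape`, as it is in Python 3.1 and later.

```python
import sys


def is_valid_filename(name, encoding=None):
    """Return True if `name` encodes cleanly with the "strict" handler.

    Hypothetical helper. A name that was decoded from undecodable bytes
    carries lone surrogates, which "strict" refuses to encode.
    """
    if encoding is None:
        # Assumption: the file system encoding stands in for the
        # "locale encoding" mentioned in the thread.
        encoding = sys.getfilesystemencoding()
    try:
        name.encode(encoding, "strict")
    except UnicodeEncodeError:
        return False
    return True


def printable_hint(name, encoding=None):
    """Return a printable approximation of a possibly broken file name.

    Hypothetical helper. It recovers the original bytes, then decodes
    them again with undecodable bytes replaced by U+FFFD, so most of the
    name stays readable.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    raw = name.encode(encoding, "surrogateescape")
    return raw.decode(encoding, "replace")


# A Latin-1 style byte (0xE9) inside a name that is read as UTF-8.
broken = b"caf\xe9.txt".decode("utf-8", "surrogateescape")
good = "café.txt"
```

With these definitions, `is_valid_filename(broken, "utf-8")` is false while `is_valid_filename(good, "utf-8")` is true, and `printable_hint(broken, "utf-8")` keeps the readable part of the name with a replacement character in place of the bad byte. That gives the user a hint at the file name rather than an empty string.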