On Apr 28, 2009, at 6:46 AM, Hrvoje Niksic wrote:

> Are you proposing to unconditionally encode file names as
> iso8859-15, or to do so only when undecodeable bytes are encountered?

For what it is worth, what we have previously planned to do for the Tahoe project is the second of these: decode using some 1-byte encoding such as iso-8859-1, iso-8859-15, or windows-1252 only when attempting to decode the bytes with the encoding the local system claims to use has failed.

> If you switch to iso8859-15 only in the presence of undecodable
> UTF-8, then you have the same round-trip problem as the PEP: both
> b'\xff' and b'\xc3\xbf' will be converted to u'\u00ff' without a
> way to unambiguously recover the original file name.

Why do you say that? It seems to work as I expected here:

    >>> '\xff'.decode('iso-8859-15')
    u'\xff'
    >>> '\xc3\xbf'.decode('iso-8859-15')
    u'\xc3\xbf'
    >>>
    >>> '\xff'.decode('cp1252')
    u'\xff'
    >>> '\xc3\xbf'.decode('cp1252')
    u'\xc3\xbf'

Regards,

Zooko
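For illustration, here is a minimal sketch of the "fall back only on failure" scheme described above, written in Python 3 (bytes/str) rather than the Python 2 of the session. The function name `decode_filename` and its defaults (UTF-8 as the local encoding, iso-8859-15 as the fallback) are hypothetical choices for this sketch, not Tahoe's actual code.

```python
def decode_filename(raw: bytes, local_encoding: str = "utf-8",
                    fallback: str = "iso-8859-15") -> str:
    """Hypothetical helper: decode with the local encoding, and use a
    1-byte fallback encoding only if that first attempt fails."""
    try:
        return raw.decode(local_encoding)
    except UnicodeDecodeError:
        # With the default iso-8859-15 fallback every byte value maps to a
        # character, so this step does not raise. (Python's cp1252 codec, by
        # contrast, leaves a few bytes such as 0x81 undefined.)
        return raw.decode(fallback)


if __name__ == "__main__":
    for raw in (b"\xff", b"\xc3\xbf"):
        # Under this UTF-8-first scheme both inputs come back as U+00FF.
        print(raw, "->", ascii(decode_filename(raw)))
    # Decoding unconditionally with iso-8859-15, as in the session above,
    # keeps the two inputs distinct.
    print(ascii(b"\xc3\xbf".decode("iso-8859-15")))
```

In this sketch the fallback applies only to names that are not valid in the local encoding, so b'\xc3\xbf' (valid UTF-8) and b'\xff' (invalid UTF-8) both decode to '\xff'. The interactive session instead decodes both inputs with iso-8859-15 or cp1252 unconditionally, which yields two different strings.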