On May 1, 2009, at 9:42 PM, Zooko O'Whielacronx wrote:
> Yep, I reversed the order of encode() and decode(). However, my whole
> statement was utterly wrong and shows that I still didn't fully get it
> yet. I have flip-flopped again and currently think that PEP 383 is
> useless for this use case and that my original plan [1] is still the
> way to go. Please let me know if you spot a flaw in my plan or a
> ridiculousity in my requirements, or if you see a way that PEP 383 can
> help me.

If I were designing a new system such as this, I'd probably just go for utf-8b *always*. That is, set the filesystem encoding to utf-8b. The end. All files always keep the same bytes when transferred between Unix systems. Thus, the 99% of the world that uses either Windows or a UTF-8 locale gets useful filenames inside tahoe. The other 1% of the world that uses something like latin-1, EUC_JP, etc. on their local system sees mojibake filenames in tahoe, but will see the same filename that they put in when they take it back out. GNOME has used only UTF-8 for filename display for a few years now, for example, so this isn't exactly an unheard-of position to take...

But if you don't do that, then I still don't see what purpose your requirements serve. If I have two systems, one with a UTF-8 locale and one with a Latin-1 locale, why should transmitting filenames from system 1 to system 2 through tahoe preserve the raw bytes, but doing the reverse *not* preserve the raw bytes? (All byte sequences are valid in latin-1, remember, so they'll all decode into unicode without error, and then be re-encoded in UTF-8...) This seems rather a useless behavior to me.

James
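
P.S. A minimal sketch of the two behaviors discussed above, assuming Python 3's `surrogateescape` error handler (the one PEP 383 defines) as a stand-in for utf-8b. The filename and variable names are made up for illustration; this is not tahoe code.

```python
# Hypothetical filename as stored on a Latin-1 system: "cafe" with e-acute.
name_on_latin1_system = b"caf\xe9"  # not valid UTF-8

# Option 1: utf-8b always. The undecodable byte becomes a lone surrogate,
# and encoding with the same handler gives back the original bytes.
as_utf8b_text = name_on_latin1_system.decode("utf-8", "surrogateescape")
round_tripped = as_utf8b_text.encode("utf-8", "surrogateescape")
assert round_tripped == name_on_latin1_system

# Option 2: decode with the sender's locale, encode with the receiver's.
# Latin-1 decoding never fails, so nothing signals that bytes will change.
as_unicode = name_on_latin1_system.decode("latin-1")
name_on_utf8_system = as_unicode.encode("utf-8")
assert name_on_utf8_system != name_on_latin1_system
```

With the first option the name displays as mojibake on a UTF-8 system but the bytes survive; with the second the name displays correctly but the raw bytes differ between the two systems.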