On 30 Sep, 09:37 pm, guido at python.org wrote:
>On Tue, Sep 30, 2008 at 11:42 AM, <glyph at divmod.com> wrote:
>>There are other ways to glean this knowledge; for example, looking at
>>the 'iocharset' or 'nls' mount options supplied to mount various
>>filesystems.

>I know we could do a better job, but absent anyone who knows what
>they're doing we've chosen a fairly conservative approach. I certainly
>hope that someone will contribute some mean encoding-guessing code to
>the stdlib that users can use. I'm not sure if I'll ever endorse doing
>this automatically in io.open(), though I'd be fine with a convention
>like passing encoding="guess".

I think the conservative approach is actually correct, or rather, as
close to correct as it is possible to get in this mess.

Inspecting these fantastically obscure options is only likely to be
helpful in a tool which tries to correct filesystem encoding errors on
legacy data. I wouldn't even know about them if I hadn't written
several such tools (well, just little scripts, really) in the past. I
was just verifying that I wasn't missing some "right way" which would
let someone else do the guesswork for me.

In reality, you have two options for filesystem encoding on Linux:

  * UTF-8
  * fall in a well and die

The OS will happily let you create a completely nonsensical environment
where no application can possibly do anything reasonable: set LC_ALL to
KOI8R, mount your USB keychain as Shift_JIS and your windows partition
as ISO-8859-8. Of course nobody would actually _do_ this, because they
want things to work, so everything is gradually evolving to a default
of UTF-8 everywhere.

In practice, however, there are still problems with CIFS/SMB shares
where other clients have different ideas about encoding. I've
experienced this most commonly when sharing with Macs, which have very
particular and different ideas about normalization, as has already been
discussed in this thread.
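
To make the normalization point concrete, here is a rough sketch (not anything proposed in this thread) of how two visually identical names can differ byte-for-byte in UTF-8, and of one conservative way to decode raw filename bytes. The helper names same_filename and decode_filename are made up for illustration, and the "surrogateescape" error handler is assumed to be available, as it is in current Python 3.

```python
import unicodedata


def same_filename(a: str, b: str) -> bool:
    """Compare two names after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def decode_filename(raw: bytes) -> str:
    """Conservative decode (hypothetical helper): strict UTF-8 first, and
    only if that fails keep the undecodable bytes as surrogate escapes so
    the original bytes can be recovered later. No encoding is guessed."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("utf-8", "surrogateescape")


# Same visible name, two Unicode spellings.
composed = "caf\u00e9"      # 'e with acute' as one precomposed code point
decomposed = "cafe\u0301"   # plain 'e' followed by a combining acute accent

# The two spellings are different strings and different UTF-8 byte sequences.
names_differ = composed != decomposed
bytes_differ = composed.encode("utf-8") != decomposed.encode("utf-8")

# A Latin-1 encoded name is not valid UTF-8; the fallback keeps the bytes.
legacy = decode_filename(b"caf\xe9")
roundtrip = legacy.encode("utf-8", "surrogateescape")
```

Comparing after normalization sidesteps the composed/decomposed mismatch without renaming anything on disk, and the fallback decode never guesses an encoding; it only preserves bytes it cannot interpret.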