What's the correct way to deal with filenames in a Unicode environment?

Consider this:

    >>> import site
    >>> site.encoding
    'latin-1'
    >>> a = "abc\xe4\xfc\xdf.txt"
    >>> u = unicode (a, "latin-1")
    >>> uu = u.encode ("utf-8")
    >>> open(a, "w")
    <open file 'abcäüß.txt', mode 'w' at 0x823c2a0>
    >>> open(u, "w")
    <open file 'abcäüß.txt', mode 'w' at 0x823a1e8>
    >>> open(uu, "w")
    <open file 'abcÃ¤Ã¼Ã.txt', mode 'w' at 0x81d6160>

If I change my site's default encoding back to ascii, the second open fails:

    >>> import site
    >>> site.encoding
    'ascii'
    >>> a = "abc\xe4\xfc\xdf.txt"
    >>> u = unicode (a, "latin-1")
    >>> uu = u.encode ("utf-8")
    >>> open(a, "w")
    <open file 'abcäüß.txt', mode 'w' at 0x822b448>
    >>> open(u, "w")
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    UnicodeError: ASCII encoding error: ordinal not in range(128)
    >>> open(uu, "w")
    <open file 'abcÃ¤Ã¼Ã.txt', mode 'w' at 0x822d260>

as I expect it should. The third open is a problem as well: even though it succeeds with either default encoding, the file it creates is named with the raw UTF-8 bytes rather than 'abcäüß.txt'. (Why doesn't it fail when the default encoding is ascii?) My thought is that before a plain string or a unicode string is used as a filename, it should first be coerced to a unicode string with the default encoding, something like:

    import site, types

    if type(fname) == types.StringType:
        # plain string: decode it to unicode with the site's default encoding
        fname = unicode(fname, site.encoding)
    elif type(fname) == types.UnicodeType:
        # unicode string: encode it to a plain (byte) string
        fname = fname.encode(site.encoding)
    else:
        raise TypeError, ("unrecognized type for filename: %s" % type(fname))

Is that the correct approach? Apparently Python's file object doesn't do this under the covers. Should it?

Thx,
Skip
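For readers on a current Python, the following is a minimal Python 3 sketch (not the 2001-era API used in the sessions above) of why the third open() produced a different name: the same text encoded as Latin-1 and as UTF-8 yields different byte strings, and reading the UTF-8 bytes back as Latin-1 reproduces the garbled name. The helper coerce_filename and its default of "latin-1" are hypothetical, chosen only to mirror the proposal above; they are not part of any library.

```python
# The filename from the sessions above, as text (a Python 3 str is Unicode).
name = "abc\xe4\xfc\xdf.txt"  # "abcäüß.txt"

# Two different byte encodings of the same name.
latin1_bytes = name.encode("latin-1")  # one byte per character
utf8_bytes = name.encode("utf-8")      # ä, ü and ß each become two bytes

# Decoding the UTF-8 bytes as Latin-1 gives the garbled name seen in the
# third open() call ("abcÃ¤Ã¼Ã" followed by an invisible control character).
mojibake = utf8_bytes.decode("latin-1")


def coerce_filename(fname, encoding="latin-1"):
    """Hypothetical helper: return a filename as a Unicode str.

    Byte strings are decoded with the given encoding (an assumption: the
    right choice depends on the file system), str values are returned
    unchanged, and any other type raises TypeError.
    """
    if isinstance(fname, bytes):
        return fname.decode(encoding)
    if isinstance(fname, str):
        return fname
    raise TypeError("unrecognized type for filename: %s" % type(fname).__name__)
```

Python 3 later standardized this kind of conversion with os.fsencode() and os.fsdecode(), which use the encoding reported by sys.getfilesystemencoding() instead of a caller-supplied one.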