>>>>> "Martin" == Martin v Loewis <martin@v.loewis.de> writes:

>> What's the correct way to deal with filenames in a Unicode
>> environment? Consider this:
>>
>> >>> import site
>> >>> site.encoding
>> 'latin-1'

Martin> Setting site.encoding is certainly the wrong thing to do. How
Martin> can you know all users of your system use latin-1?

Why is setting site.encoding appropriate to your environment at the time
you install Python wrong? I can't know that all users of my system (whatever
the definition of "my system" is) will use latin-1. Somewhere along the
way I have to make some assumptions, however.

On any given computer I assume the people who install Python will set
site.encoding appropriate to their environment.

The example I used was latin-1 simply because the folks I'm working with
are in Austria and they came up with the example. I assume the best
default encoding for them is latin-1.

The application writers themselves will have no problem restricting
internal filenames to be ascii. I assume if users want to save files
of their own, they will choose characters from the Unicode character set
they use most frequently.

So, my example used latin-1. I could just as easily have chosen something
else.

Martin> On my system, the following works fine

Martin> >>> import locale ; locale.setlocale(locale.LC_ALL,"")
Martin> 'LC_CTYPE=de_DE;LC_NUMERIC=de_DE;LC_TIME=de_DE;LC_COLLATE=C;LC_MONETARY=de_DE;LC_MESSAGES=de_DE;LC_PAPER=de_DE;LC_NAME=de_DE;LC_ADDRESS=de_DE;LC_TELEPHONE=de_DE;LC_MEASUREMENT=de_DE;LC_IDENTIFICATION=de_DE'
Martin> >>> a = "abc\xe4\xfc\xdf.txt"
Martin> >>> u = unicode (a, "latin-1")
Martin> >>> open(u, "w")
Martin> <open file 'abcäüß.txt', mode 'w' at 0x8173e88>

Martin> On Unix, your best bet for file names is to trust the user's
Martin> locale settings. If you do that, open will accept Unicode
Martin> objects.

Martin> What is your locale?

The above setlocale call prints

'LC_CTYPE=en_US;LC_NUMERIC=en_US;LC_TIME=en_US;LC_COLLATE=en_US;LC_MONETARY=en_US;LC_MESSAGES=en_US;LC_PAPER=en;LC_NAME=en;LC_ADDRESS=en;LC_TELEPHONE=en;LC_MEASUREMENT=en;LC_IDENTIFICATION=en'

I can't get to the machines in Austria right now to see how their locales
are set, though I suspect they haven't fiddled with their LC_* environment,
because they are having the problems I described.

>> Is that the correct approach? Apparently Python's file object
>> doesn't do this under the covers. Should it?

Martin> No. There is no established convention, on Unix, how to do
Martin> non-ASCII file names. If anything, following the user's locale
Martin> setting is the most reasonable thing to do; this should be in
Martin> sync with how the user's terminal displays characters. The Python
Martin> installation's default encoding is almost useless, and shouldn't
Martin> be changed.

Martin> On Windows, things are much better, since there is a notion of
Martin> Unicode file names in the system.

This suggests to me that the Python docs need some introductory material
on this topic. It appears to me that the two people in the Python
community who live and breathe this stuff are you, Martin, and Marc-André.
For most of the rest of us, especially if we've never consciously written
code for consumption outside an ascii environment, the whole thing just
looks like a quagmire.

Skip
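
For readers on current Python versions, here is a minimal sketch of the approach Martin recommends above: encode a Unicode file name with the codeset taken from the user's locale instead of a site-wide default. It targets Python 3 rather than the Python 2 used in this thread, and `encode_filename` is a hypothetical helper name, not something from the original messages.

```python
import locale


def encode_filename(name, encoding=None):
    """Encode a text file name to bytes (hypothetical helper).

    When no encoding is given, use the one implied by the user's
    locale settings rather than a fixed site-wide default.
    """
    if encoding is None:
        # getpreferredencoding() consults the user's locale preferences
        # (the LC_* / LANG environment on Unix).
        encoding = locale.getpreferredencoding()
    return name.encode(encoding)


# Explicit encoding, matching the latin-1 example in the thread.
latin1_name = encode_filename("abc\xe4\xfc\xdf.txt", "latin-1")
```

The resulting bytes can be passed to `open()`. Python 3 also accepts `str` file names directly and encodes them itself with the file system encoding (see `os.fsencode`), so a helper like this is only needed when the bytes form is wanted explicitly.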