On approximately 4/29/2009 1:28 PM, came the following characters from the keyboard of Martin v. Löwis:
>>>>>>>> C. File on disk with the invalid surrogate code, accessed via the
>>>>>>>> str interface, no decoding happens, matches in memory the file on disk
>>>>>>>> with the byte that translates to the same surrogate, accessed via the
>>>>>>>> bytes interface. Ambiguity.
>>> What does that mean? What specific interface are you referring to to
>>> obtain file names?
>> os.listdir("")
>>
>> os.listdir(b"")
>>
>> So I guess I'd better suggest that a specific, equivalent directory name
>> be passed in either bytes or str form.
>
> [Leaving the issue of the empty string apparently having different
> meanings aside ...]
>
> Ok. Now I understand the example. So you do
>
> os.listdir("c:/tmp")
> os.listdir(b"c:/tmp")
>
> and you have a file in c:/tmp that is named "abc\uDC10".
>
>> So what you are saying here is that Python doesn't use the "A" forms of
>> the Windows APIs for filenames, but only the "W" forms, and uses lossy
>> decoding (from MS) to the current code page (which can never be UTF-8 on
>> Windows).
>
> Actually, it does use the A form, in the second listdir example. This,
> in turn (inside Windows), uses the lossy CP_ACP encoding. You get back
> a byte string; the listdirs should give
>
> ["abc\uDC10"]
> [b"abc?"]
>
> (not quite sure about the second - I only guess that CP_ACP will replace
> the half surrogate with a question mark).
>
> So where is the ambiguity here?

None. But not everyone can read all the Python source code to try to understand it; they expect the documentation to help them avoid that. Because the documentation is lacking in this area, it makes your concisely stated PEP rather hard to understand. Thanks for clarifying the Windows behavior, here. A little more clarification in the PEP could have avoided lots of discussion.
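
The two listdir calls discussed above can be tried out with a minimal sketch like the one below. It assumes Python 3.2 or later (for os.fsencode). The helper names list_both_ways and lossy_encode are hypothetical, and the cp1252 "replace" encoding only imitates a lossy code-page conversion with Python's own codec; it does not call the Windows "A" API, so it illustrates the guess about the question mark rather than confirming it.

```python
import os
import tempfile


def list_both_ways(directory):
    """Return (str_names, bytes_names) for the same directory."""
    # str argument in, list of str names out.
    str_names = sorted(os.listdir(directory))
    # bytes argument in, list of bytes names out.
    bytes_names = sorted(os.listdir(os.fsencode(directory)))
    return str_names, bytes_names


def lossy_encode(name, codepage="cp1252"):
    """Imitate a lossy code-page conversion using Python's own codec.

    Assumption: cp1252 stands in for the system code page (CP_ACP).
    """
    return name.encode(codepage, errors="replace")


with tempfile.TemporaryDirectory() as tmp:
    # A plain ASCII name, so the listing looks the same on every platform.
    open(os.path.join(tmp, "abc.txt"), "w").close()
    str_names, bytes_names = list_both_ways(tmp)
    print(str_names)    # ['abc.txt']
    print(bytes_names)  # [b'abc.txt']

# The lone surrogate has no cp1252 byte, so "replace" substitutes '?'.
print(lossy_encode("abc\udc10"))  # b'abc?'
```

The sketch deliberately avoids creating a file whose name contains a lone surrogate, since whether that is possible depends on the platform and file system.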
It would seem that a PEP, proposed to modify a poorly documented (and therefore likely poorly understood) area, should be educational about the status quo, as well as presenting the suggested change. Or is it the Python philosophy that the PEPs should be as incomprehensible as possible, to generate large discussions?

--
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking