On Thu, Sep 16, 2010 at 10:56:56AM -0700, Guido van Rossum wrote:
> On Thu, Sep 16, 2010 at 10:46 AM, Martin (gzlist) <gzlist at googlemail.com> wrote:
> > On 16/09/2010, Guido van Rossum <guido at python.org> wrote:
> >>
> >> In all cases I can imagine where such polymorphic functions make
> >> sense, the necessary and sufficient assumption should be that the
> >> encoding is a superset of 7-bit(*) ASCII. This includes UTF-8, all
> >> Latin-N variants, and AFAIK also the popular CJK encodings other than
> >> UTF-16. This is the same assumption made by Python's byte type when
> >> you use "character-based" methods like lower().
> >
> > Well, it depends on exactly what you're doing; it's pretty easy to go wrong:
> >
> > Python 3.2a2+ (py3k, Sep 16 2010, 18:43:45) [MSC v.1500 32 bit (Intel)] on win32
> > Type "help", "copyright", "credits" or "license" for more information.
> >>>> import os, sys
> >>>> os.path.split("C:\\十")
> > ('C:\\', '十')
> >>>> os.path.split("C:\\十".encode(sys.getfilesystemencoding()))
> > (b'C:\\\x8f', b'')
> >
> > Similar things can catch out web developers once they step outside the
> > percent encoding.
>
> Well, that character is not 7-bit ASCII. Of course things will go
> wrong there. That's the whole point of what I said, isn't it?
>
You were talking about encodings that are supersets of 7-bit ASCII. I think
Martin was demonstrating a byte string in an encoding that is a superset of
7-bit ASCII being fed to a stdlib function, which then went wrong.

-Toshio
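A minimal sketch that reproduces Martin's result on any platform. It assumes (the thread does not say so) that the Windows filesystem encoding in his session was cp932 (Shift_JIS); the b'C:\\\x8f' output is consistent with that. In cp932, '十' encodes to the two bytes 0x8F 0x5C, and 0x5C is the ASCII backslash, so a byte-level path function sees a trailing separator. The sketch calls ntpath directly instead of os.path so that Windows path rules apply even on other systems.

```python
import ntpath  # Windows path semantics; importable on any platform

path = "C:\\十"

# Splitting the str gives the expected result.
assert ntpath.split(path) == ("C:\\", "十")

# Assumption: the filesystem encoding in Martin's session was cp932.
# The trail byte of '十' is 0x5C, i.e. the ASCII backslash.
cp932_bytes = path.encode("cp932")
assert cp932_bytes == b"C:\\\x8f\\"
assert ntpath.split(cp932_bytes) == (b"C:\\\x8f", b"")  # same as in the post

# Decoding first and splitting the str avoids the problem.
assert ntpath.split(cp932_bytes.decode("cp932")) == ("C:\\", "十")

# UTF-8 never uses bytes below 0x80 inside a multibyte sequence,
# so the byte-level split lands on the right boundary.
assert ntpath.split(path.encode("utf-8")) == (b"C:\\", b"\xe5\x8d\x81")

# The same overlap affects bytes.lower(): 'ア' is 0x83 0x41 in cp932,
# and 0x41 is ASCII 'A', so lower() rewrites the trail byte.
assert "ア".encode("cp932") == b"\x83A"
assert "ア".encode("cp932").lower() == b"\x83a"
```

cp932 maps every byte 0x00-0x7F to the same ASCII character, yet its double-byte characters can have trail bytes in the 0x40-0x7E range. Byte-level code that scans for ASCII delimiters or case-folds ASCII letters can therefore misfire on it, while UTF-8 is safe because every byte of a multibyte sequence is 0x80 or above.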