On 16/09/2010, Guido van Rossum <guido at python.org> wrote:
>
> In all cases I can imagine where such polymorphic functions make
> sense, the necessary and sufficient assumption should be that the
> encoding is a superset of 7-bit(*) ASCII. This includes UTF-8, all
> Latin-N variant, and AFAIK also the popular CJK encodings other than
> UTF-16. This is the same assumption made by Python's byte type when
> you use "character-based" methods like lower().

Well, depends on what exactly you're doing, it's pretty easy to go wrong:

Python 3.2a2+ (py3k, Sep 16 2010, 18:43:45) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os, sys
>>> os.path.split("C:\\十")
('C:\\', '十')
>>> os.path.split("C:\\十".encode(sys.getfilesystemencoding()))
(b'C:\\\x8f', b'')

Similar things can catch out web developers once they step outside the
percent encoding.

Martin
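
For anyone wanting to reproduce the session above off Windows, here is a rough sketch. It assumes the filesystem encoding on that machine was cp932 (the post doesn't say, but the b'\x8f' in the output is consistent with it), and it uses ntpath directly so the Windows splitting rules apply on any platform. The helper split_bytes_path is hypothetical, just one possible way of avoiding the problem by splitting as text.

```python
import ntpath  # Windows path rules, importable on any platform

# Assumption: the filesystem encoding in the original session was cp932.
ENCODING = "cp932"

text_path = "C:\\十"
byte_path = text_path.encode(ENCODING)

# In cp932 the character encodes to two bytes, and the second one is
# 0x5C, which is also the ASCII backslash.
encoded_char = "十".encode(ENCODING)
print(encoded_char)

# Splitting the str works on characters; splitting the bytes works on
# raw byte values, so the trail byte 0x5C is taken for a separator.
str_split = ntpath.split(text_path)
bytes_split = ntpath.split(byte_path)
print(str_split)
print(bytes_split)


def split_bytes_path(path, encoding=ENCODING):
    """Hypothetical helper: decode, split as text, then re-encode."""
    head, tail = ntpath.split(path.decode(encoding))
    return head.encode(encoding), tail.encode(encoding)


safe_split = split_bytes_path(byte_path)
print(safe_split)
```

The point is only that "superset of ASCII" in the sense of "ASCII text encodes to the same bytes" is not enough for byte-wise splitting; the encoding would also need to never reuse ASCII byte values inside multibyte sequences, which UTF-8 guarantees and cp932 does not.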