2012/11/7 Alexandre Vassalotti <alexandre at peadrop.com>:
> The Unicode code points in the U+DC00-DFFF range (low surrogate area) can't
> be encoded in UTF-8. Quoting from RFC 3629:
>
> The definition of UTF-8 prohibits encoding character numbers between U+D800
> and U+DFFF, which are reserved for use with the UTF-16 encoding form (as
> surrogate pairs) and do not directly represent characters.
>
>
> It looks like this test was doing something specific with regards to this.
> So, I am curious as well about this change.

os.fsencode() uses the surrogateescape error handler (PEP 383) on UNIX.

>>> os.fsencode('\udcf1\udcea\udcf0\udce8\udcef\udcf2')
b'\xf1\xea\xf0\xe8\xef\xf2'

I replaced this arbitrary string (and other similar constant strings)
with support.FS_NONASCII, which is more portable (it should be available
on all locale encodings... except ASCII) and documented.

I rewrote test_cmd_line_script.test_non_ascii() (and other tests) in
Python 3.4 to use support.FS_NONASCII. This change should improve code
coverage on heterogeneous environments.

Victor
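
P.S. To illustrate the surrogateescape behaviour mentioned above without depending on the local filesystem encoding, here is a minimal sketch that passes the codec and error handler explicitly instead of calling os.fsencode(). The helper names (unescape_bytes, escape_text, strict_utf8_fails) are made up for this example and are not part of the test suite or of test.support; UTF-8 is assumed as the encoding.

```python
def unescape_bytes(raw: bytes) -> str:
    # Each byte that is not valid UTF-8 becomes a lone low surrogate:
    # byte 0xNN is mapped to U+DCNN.
    return raw.decode('utf-8', 'surrogateescape')


def escape_text(text: str) -> bytes:
    # The reverse mapping: U+DCNN is turned back into the single byte 0xNN.
    return text.encode('utf-8', 'surrogateescape')


def strict_utf8_fails(text: str) -> bool:
    # With the default (strict) handler, lone surrogates cannot be encoded
    # to UTF-8, as the RFC 3629 quote says.
    try:
        text.encode('utf-8')
    except UnicodeEncodeError:
        return True
    return False


raw = b'\xf1\xea\xf0\xe8\xef\xf2'
text = unescape_bytes(raw)

print(ascii(text))              # the escaped form, as lone surrogates
print(escape_text(text))        # round-trips to the original bytes
print(strict_utf8_fails(text))  # strict UTF-8 rejects the surrogates
```

The round trip is what makes such strings usable as file names on UNIX, but only when the locale encoding is compatible, which is presumably why a constant like this is less portable than support.FS_NONASCII.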