
Wed Nov 7 23:47:13 CET 2012 · https://mail.python.org/pipermail/python-dev/2012-November/122595.html

2012/11/7 Alexandre Vassalotti <alexandre at peadrop.com>:
> The Unicode code points in the U+DC00-DFFF range (low surrogate area) can't
> be encoded in UTF-8. Quoting from RFC 3629:
>
> The definition of UTF-8 prohibits encoding character numbers between U+D800
> and U+DFFF, which are reserved for use with the UTF-16 encoding form (as
> surrogate pairs) and do not directly represent characters.
>
>
> It looks like this test was doing something specific with regards to this.
> So, I am curious as well about this change.

os.fsencode() uses the surrogateescape error handler (PEP 383) on UNIX.

>>> os.fsencode('\udcf1\udcea\udcf0\udce8\udcef\udcf2')
b'\xf1\xea\xf0\xe8\xef\xf2'
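
The same round trip can be reproduced without depending on the current locale by naming the codec explicitly. This is only a sketch: it assumes UTF-8 stands in for the filesystem encoding, and the variable names are illustrative.

```python
# Bytes that are not valid UTF-8 (same bytes as in the example above).
raw = b'\xf1\xea\xf0\xe8\xef\xf2'

# Decoding with surrogateescape maps each undecodable byte 0xXX
# to the lone surrogate U+DCXX instead of raising an error.
text = raw.decode('utf-8', 'surrogateescape')

# Encoding with the same handler turns each U+DCXX back into byte 0xXX.
round_trip = text.encode('utf-8', 'surrogateescape')

# Without the handler, lone surrogates cannot be encoded in UTF-8,
# which is the restriction quoted from RFC 3629.
try:
    text.encode('utf-8')
    strict_fails = False
except UnicodeEncodeError:
    strict_fails = True
```

Here text equals '\udcf1\udcea\udcf0\udce8\udcef\udcf2', round_trip equals raw, and the strict encode raises UnicodeEncodeError.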

I replaced this arbitrary string (and other similar constant strings)
with support.FS_NONASCII, which is more portable (it should be encodable
in every locale encoding except ASCII) and is documented.
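
To illustrate the idea of a portable non-ASCII sample, here is a rough sketch of how such a character could be chosen. It is not the actual test.support code: the helper name pick_fs_nonascii and the candidate list are hypothetical, and the only assumption is that a usable character must survive os.fsencode() followed by os.fsdecode().

```python
import os

# Hypothetical candidates covering several legacy encodings
# (Latin, Turkish, Polish, Greek, Cyrillic, Hebrew, Arabic, Thai, euro sign).
CANDIDATES = (
    '\u00e6', '\u0130', '\u0141', '\u03c6', '\u041a',
    '\u05d0', '\u062a', '\u0e01', '\u20ac',
)

def pick_fs_nonascii(candidates=CANDIDATES):
    """Return the first candidate that round-trips through the
    filesystem encoding, or None if none does (e.g. ASCII locale)."""
    for ch in candidates:
        try:
            if os.fsdecode(os.fsencode(ch)) == ch:
                return ch
        except UnicodeError:
            # The filesystem encoding cannot represent this character.
            continue
    return None

sample = pick_fs_nonascii()
```

A test could then be skipped when sample is None, rather than failing on a filesystem encoding that cannot represent any of the candidates.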

I rewrote test_cmd_line_script.test_non_ascii() (and other tests) in
Python 3.4 to use support.FS_NONASCII.

This change should improve code coverage across heterogeneous environments.

Victor


[Python-Dev] cpython: Issue #16218: skip test if filesystem doesn't support required encoding