
Tue Nov 30 17:40:19 CET 2010 · https://mail.python.org/pipermail/python-dev/2010-November/106187.html

On Mon, Nov 29, 2010 at 2:38 PM, Alexander Belopolsky
<alexander.belopolsky at gmail.com> wrote:
..
>> Still, if it's not detrimental and it's not difficult to support,
>> then why do you care?
>
> It is difficult to support.  A fix for issue10557 would be much
> simpler if we did not support non-European digits.  I now added a
> patch that handles non-ascii digits, so you can see what's involved.
> Note that when Unicode Consortium inevitably adds more Nd characters
> to the non-BMP planes, we will have to add surrogate pairs' support to
> this code.
>

It turns out that this did in fact happen:

# Newly assigned in Unicode 3.1.0 (March, 2001)
..
1D7CE..1D7FF  ; 3.1 #  [50] MATHEMATICAL BOLD DIGIT ZERO..MATHEMATICAL MONOSPACE DIGIT NINE

See http://unicode.org/Public/UNIDATA/DerivedAge.txt
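
As a rough cross-check of the DerivedAge.txt entry, the sketch below walks the quoted range with the standard unicodedata module and confirms that every code point is a decimal digit (category Nd). The names FIRST, LAST, math_digits, all_nd and values are illustrative only, and the snippet assumes a Python 3 interpreter where chr() accepts code points above U+FFFF.

```python
import unicodedata

# Range copied from the DerivedAge.txt entry quoted above.
FIRST, LAST = 0x1D7CE, 0x1D7FF

# Every code point in the range, as a one-character string.
math_digits = [chr(cp) for cp in range(FIRST, LAST + 1)]

# True if each character is classified as a decimal digit (Nd).
all_nd = all(unicodedata.category(ch) == "Nd" for ch in math_digits)

# The digit value the database assigns to each character.
values = [unicodedata.digit(ch) for ch in math_digits]

print(len(math_digits), all_nd)
print(values[:10])
```

The range covers five mathematical styles of ten digits each, so the digit values simply cycle through 0..9 five times.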

And of course,

>>> unicodedata.digit('\U0001D7CE')
0

but

>>> int('\U0001D7CE')
..
UnicodeEncodeError: 'decimal' codec can't encode character '\ud835' ..

on a narrow Unicode build.  (Note the character reported in the error message!)
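
To see where the '\ud835' in that message comes from, here is a small sketch that computes the UTF-16 surrogate pair for U+1D7CE by hand. The helper utf16_surrogates is hypothetical (it is not part of Python or of any patch on the tracker). The final line assumes Python 3.3 or later, where narrow builds no longer exist and int() sees the character as a single code point.

```python
import unicodedata


def utf16_surrogates(ch):
    """Return the (high, low) UTF-16 surrogate code units for a non-BMP character.

    Hypothetical helper, written only to illustrate the encoding arithmetic.
    """
    cp = ord(ch)
    if cp < 0x10000:
        raise ValueError("character is in the BMP; no surrogates needed")
    offset = cp - 0x10000
    # Top 10 bits go into the high surrogate, bottom 10 bits into the low one.
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


digit_zero = "\U0001D7CE"
high, low = utf16_surrogates(digit_zero)

print(unicodedata.name(digit_zero))
print(hex(high), hex(low))

# On Python 3.3+ this conversion succeeds; a narrow build of that era
# saw two separate code units and failed on the first one.
print(int(digit_zero))
```

The high surrogate is what a narrow build stores as the first of two code units, which is why the error names that half rather than the digit itself.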

If you think non-ASCII digits are not difficult to support, please
contribute to the following tracker issues:

http://bugs.python.org/issue10581
(Review and document string format accepted in numeric data type constructors)

http://bugs.python.org/issue10557
(Malformed error message from float())

http://bugs.python.org/issue10435
(Document unicode C-API in reST - Specifically, PyUnicode_EncodeDecimal)

http://bugs.python.org/issue8646
(PyUnicode_EncodeDecimal is undocumented)

http://bugs.python.org/issue6632
(Include more fullwidth chars in the decimal codec)

and, back to the issue of user confusion:

http://bugs.python.org/issue652104 [closed/invalid]
(int(u"\u1234") raises UnicodeEncodeError; reported by Guido van Rossum)


[Python-Dev] Python and the Unicode Character Database