On Mon, Nov 29, 2010 at 2:38 PM, Alexander Belopolsky <alexander.belopolsky at gmail.com> wrote:
..
>> Still, if it's not detrimental and it's not difficult to support,
>> then why do you care?
>
> It is difficult to support. A fix for issue10557 would be much
> simpler if we did not support non-European digits. I now added a
> patch that handles non-ascii digits, so you can see what's involved.
> Note that when Unicode Consortium inevitably adds more Nd characters
> to the non-BMP planes, we will have to add surrogate pairs' support to
> this code.
>

It turns out that this has in fact already happened:

# Newly assigned in Unicode 3.1.0 (March, 2001)
..
1D7CE..1D7FF  ; 3.1 #  [50] MATHEMATICAL BOLD DIGIT ZERO..MATHEMATICAL MONOSPACE DIGIT NINE

See http://unicode.org/Public/UNIDATA/DerivedAge.txt

And of course,

>>> unicodedata.digit('\U0001D7CE')
0

but

>>> int('\U0001D7CE')
..
UnicodeEncodeError: 'decimal' codec can't encode character '\ud835' ..

on a narrow Unicode build. (Note the character reported in the error
message: it is the high surrogate of the pair, not the digit itself!)

If you think non-ASCII digits are not difficult to support, please
contribute to the following tracker issues:

http://bugs.python.org/issue10581 (Review and document string format accepted in numeric data type constructors)
http://bugs.python.org/issue10557 (Malformed error message from float())
http://bugs.python.org/issue10435 (Document unicode C-API in reST - Specifically, PyUnicode_EncodeDecimal)
http://bugs.python.org/issue8646 (PyUnicode_EncodeDecimal is undocumented)
http://bugs.python.org/issue6632 (Include more fullwidth chars in the decimal codec)

And, going back to the issue of user confusion:

http://bugs.python.org/issue652104 [closed/invalid] (int(u"\u1234") raises UnicodeEncodeError by Guido van Rossum)
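Why does the narrow-build traceback name '\ud835' rather than the digit? A narrow (UCS-2) build stores a non-BMP character as two UTF-16 code units, so code that walks code units meets the high surrogate first. The sketch below reproduces the standard UTF-16 surrogate arithmetic for U+1D7CE; `utf16_surrogates` is a hypothetical helper written for illustration, not CPython's internal code.

```python
from typing import Tuple


def utf16_surrogates(cp: int) -> Tuple[int, int]:
    """Split a supplementary-plane code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = cp - 0x10000             # 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits -> low (trail) surrogate
    return high, low


high, low = utf16_surrogates(0x1D7CE)  # MATHEMATICAL BOLD DIGIT ZERO
print(hex(high), hex(low))             # 0xd835 0xdfce

# Cross-check against the codec: UTF-16-BE produces the same two code units.
assert '\U0001D7CE'.encode('utf-16-be') == bytes.fromhex('d835dfce')
```

The high surrogate 0xD835 is exactly the character the 'decimal' codec reported, which is why the error message is so confusing.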
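Supporting non-ASCII digits in numeric constructors amounts to mapping every character with a Unicode decimal value (general category Nd) to its ASCII counterpart before parsing. The following is a minimal sketch of that idea, assuming a wide string representation; it is not the patch attached to issue10557, and `to_ascii_digits` is a hypothetical helper.

```python
import unicodedata


def to_ascii_digits(s: str) -> str:
    """Replace each Unicode decimal digit (category Nd) with its ASCII digit."""
    out = []
    for ch in s:
        d = unicodedata.decimal(ch, None)  # None for characters with no decimal value
        out.append(str(d) if d is not None else ch)
    return ''.join(out)


s = '\U0001D7CF\U0001D7CE'               # MATHEMATICAL BOLD DIGIT ONE, ZERO
print(to_ascii_digits(s))                # 10
print(unicodedata.digit('\U0001D7CE'))   # 0
print(int(s))                            # 10 on Python 3.3+ (PEP 393, no narrow builds)
```

Note that the sketch uses `unicodedata.decimal` rather than `unicodedata.digit`: characters such as superscript digits have a digit value but no decimal value, and `int()` does not accept them.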