On Sun, Nov 28, 2010 at 6:43 PM, Steven D'Aprano <steve at pearwood.info> wrote:
..
>> is more important than to assure users that once their program
>> accepted some text as a number, they can assume that the text is
>> ASCII.
>
> Seems like a pretty foolish assumption, if you ask me, pretty much akin to
> assuming that if string.isalpha() returns true that string is ASCII.
>

It is not to 99.9% of Python users whose code is written for 2.x.  Their
strings are byte strings, and string.isdigit() does imply ASCII even if
string.isalpha() does not in many locales.

..
> The fact that this is (apparently) only being raised now means that it isn't
> actually a problem in real life.  I'd even say that it's a feature, and that
> if Python didn't support non-Arabic numerals, it should.
>

I raised this problem because I found a bug that is related to this
feature.  The bug is also a regression from 2.x.  In 2.7:

>>> float(u'1234\xa1')
..
ValueError: invalid literal for float(): 1234?

The last character is lost, but the error message is still meaningful.
In 3.x, however:

>>> float('1234\xa1')
..
ValueError

See http://bugs.python.org/issue10557

While investigating this issue, I found that by the time the string gets
to the number parser (_Py_dg_strtod), all non-ASCII characters have been
dropped by PyUnicode_EncodeDecimal(), so it cannot produce a meaningful
diagnostic.  Of course, PyUnicode_EncodeDecimal() can be fixed by making
it pass non-ASCII characters through as UTF-8 bytes, but I was wondering
whether preserving the ability to parse exotic numerals is worth the effort.
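
For concreteness, this is the kind of "exotic numerals" support in
question.  The session below is only a sketch from a 3.x interpreter
(using ARABIC-INDIC DIGITs as an arbitrary example); exact output may
differ between versions:

>>> float('\u0661\u0662\u0663')   # ARABIC-INDIC DIGIT ONE, TWO, THREE
123.0
>>> int('\u0661\u0662\u0663')
123

As far as I can tell, characters with a Unicode decimal-digit value are
mapped to their ASCII equivalents before _Py_dg_strtod ever runs, which
is why these parse while '1234\xa1' only produces the unhelpful
ValueError above.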