On 11/29/2010 10:19 AM, M.-A. Lemburg wrote:
> Nick Coghlan wrote:
>> On Mon, Nov 29, 2010 at 9:02 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>>> If we would go down that road, we would also have to disable other
>>> Unicode features based on locale, e.g. whether to apply non-ASCII
>>> case mappings, what to consider whitespace, etc.
>>>
>>> We don't do that for a good reason: Unicode is supposed to be
>>> universal and not limited to a single locale.
>>
>> Because parsing numbers is about more than just the characters used
>> for the individual digits. There are additional semantics associated
>> with digit ordering (for any number) and decimal separators and
>> exponential notation (for floating point numbers) and those vary by
>> locale. We deliberately chose to make the builtin numeric parsers
>> unaware of all of those things, and assuming that we can simply parse
>> other digits as if they were their ASCII equivalents and otherwise
>> assume a C locale seems questionable.
>
> Sure, and those additional semantics are locale dependent, even
> between ASCII-only locales. However, that does not apply to the
> basic building blocks, the decimal digits themselves.
>
>> If the existing semantics can be adequately defined, documented and
>> defended, then retaining them would be fine. However, the language
>> reference needs to define the behaviour properly so that other
>> implementations know what they need to support and what can be chalked
>> up as being just an implementation accident of CPython. (As a point in
>> the plus column, both decimal.Decimal and fractions.Fraction were able
>> to handle the '١٢٣٤.٥٦' example in a manner consistent with the int
>> and float handling)
>
> The support is built into the C API, so there's not really much
> surprise there.
>
> Regarding documentation, we'd just have to add that numbers may
> be made up of a Unicode code point in the category "Nd".
>
> See http://www.unicode.org/versions/Unicode5.2.0/ch04.pdf, section
> 4.6 for details....
>
> """
> Decimal digits form a large subcategory of numbers consisting of those digits that can be
> used to form decimal-radix numbers. They include script-specific digits, but exclude
> characters such as Roman numerals and Greek acrophonic numerals. (Note that <1, 5> = 15 =
> fifteen, but <I, V> = IV = four.) Decimal digits also exclude the compatibility subscript or
> superscript digits to prevent simplistic parsers from misinterpreting their values in context.
> """
>
> int(), float() and long() (in Python2) are such simplistic
> parsers.

Since you are the knowledgeable advocate of the current behavior, perhaps
you could open an issue and propose a doc patch, even if not .rst formatted.

--
Terry Jan Reedy
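
For readers following the thread, the behaviour under discussion can be checked with a short sketch. It assumes Python 3 (where long() no longer exists); the helper name parse_report is made up for illustration and is not part of any proposed patch. It shows that int() accepts code points in category "Nd", while a superscript digit ("No") and a Roman numeral ("Nl") are rejected.

```python
import unicodedata
from decimal import Decimal


def parse_report(text):
    """Hypothetical helper: return (general categories, int value or None)."""
    categories = [unicodedata.category(ch) for ch in text]
    try:
        value = int(text)
    except ValueError:
        # int() refuses anything that is not a decimal digit (category Nd).
        value = None
    return categories, value


# ARABIC-INDIC DIGIT ONE..FOUR, the integer part of the thread's example.
arabic_indic = "\u0661\u0662\u0663\u0664"
# The full example from the thread: the same digits, '.', then FIVE and SIX.
arabic_indic_decimal = "\u0661\u0662\u0663\u0664.\u0665\u0666"

print(parse_report(arabic_indic))   # all four code points are Nd
print(parse_report("\u00b2"))       # SUPERSCRIPT TWO is No, not Nd
print(parse_report("\u2163"))       # ROMAN NUMERAL FOUR is Nl, not Nd
print(float(arabic_indic_decimal))
print(Decimal(arabic_indic_decimal))
```

Note that only the digits are interpreted this way; the decimal separator in the string is still the ASCII '.', which matches the point made above that separators and other locale-dependent semantics are left alone.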