[MAL]
> > > As you may have noticed, the Unicode objects provide
> > > new methods .islower(), .isupper() and .istitle(). Finn Bock
> > > mentioned that Java also provides .isdigit() and .isspace().
> > >
> > > Question: should Unicode also provide these character
> > > property methods: .isdigit(), .isnumeric(), .isdecimal()
> > > and .isspace() ? Plus maybe .digit(), .numeric() and
> > > .decimal() for the corresponding decoding ?

[Guido]
> > What would be the difference between isdigit, isnumeric, isdecimal?
> > I'd say don't do more than Java. I don't understand what the
> > "corresponding decoding" refers to. What would "3".decimal() return?

[MAL]
> These originate in the Unicode database; see
>
> ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.html
>
> Here are the descriptions:
>
> """
> 6
> Decimal digit value
> normative
> This is a numeric field. If the
> character has the decimal digit
> property, as specified in Chapter
> 4 of the Unicode Standard, the
> value of that digit is represented
> with an integer value in this field
> 7
> Digit value
> normative
> This is a numeric field. If the
> character represents a digit, not
> necessarily a decimal digit, the
> value is here. This covers digits
> which do not form decimal radix
> forms, such as the compatibility
> superscript digits
> 8
> Numeric value
> normative
> This is a numeric field. If the
> character has the numeric
> property, as specified in Chapter
> 4 of the Unicode Standard, the
> value of that character is
> represented with an integer or
> rational number in this field. This
> includes fractions as, e.g., "1/5" for
> U+2155 VULGAR FRACTION
> ONE FIFTH Also included are
> numerical values for compatibility
> characters such as circled
> numbers.
>
> u"3".decimal() would return 3. u"\u2155".
>
> Some more examples from the unicodedata module (which makes
> all fields of the database available in Python):
>
> >>> unicodedata.decimal(u"3")
> 3
> >>> unicodedata.decimal(u"²")
> 2
> >>> unicodedata.digit(u"²")
> 2
> >>> unicodedata.numeric(u"²")
> 2.0
> >>> unicodedata.numeric(u"\u2155")
> 0.2
> >>> unicodedata.numeric(u'\u215b')
> 0.125

Hm, very Unicode centric. Probably best left out of the general
string methods. Isspace() seems useful, and an isdigit() that is only
true for ASCII '0' - '9' also makes sense.

What about "123".isdigit()? What does Java say? Or do these only
apply to single chars there? I think "123".isdigit() should be true
if "abc".islower() is true.

> > > Similar APIs are already available through the unicodedata
> > > module, but could easily be moved to the Unicode object
> > > (they cause the builtin interpreter to grow a bit in size
> > > due to the new mapping tables).
> > >
> > > BTW, string.atoi et al. are currently not mapped to
> > > string methods... should they be ?
> >
> > They are mapped to int() c.s.
>
> Hmm, I just noticed that int() et friends don't like
> Unicode... shouldn't they use the "t" parser marker
> instead of requiring a string or tp_int compatible
> type ?

Good catch. Go ahead.

--Guido van Rossum (home page: http://www.python.org/~guido/)
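
The distinction between the three database fields discussed in this thread can be sketched with the standard unicodedata module. This is a minimal illustration written for a current Python 3 interpreter, where str is already Unicode; the helper name numeric_properties is hypothetical and not part of any proposal above. Results come from the Unicode database bundled with the interpreter, so they may differ from the session quoted in the message.

```python
import unicodedata


def numeric_properties(ch):
    """Return (decimal, digit, numeric) for a single character.

    Each entry is None when the character does not carry that property.
    numeric_properties is a hypothetical helper for illustration only.
    """
    return (
        unicodedata.decimal(ch, None),  # field 6: decimal digit value
        unicodedata.digit(ch, None),    # field 7: digit value
        unicodedata.numeric(ch, None),  # field 8: numeric value (a float)
    )


if __name__ == "__main__":
    # An ASCII digit, VULGAR FRACTION ONE FIFTH, VULGAR FRACTION ONE EIGHTH
    for ch in ("3", "\u2155", "\u215b"):
        print(repr(ch), unicodedata.name(ch), numeric_properties(ch))

    # The whole-string predicates raised in the discussion
    print("123".isdigit(), "abc".islower())
```

Passing a default as the second argument avoids the ValueError that these functions raise for characters lacking the property, which makes the "is it defined at all" question visible alongside the decoded value.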