Guido van Rossum <guido@python.org> writes:

> Thanks! But now we have a diverging set of isxxx methods for 8-bit
> strings and Unicode. I really don't know what the equivalent of these
> (ispunct, iscntrl, isgraph, isprint) is in Unicode -- maybe MAL or MvL
> know?

I don't think there is an "official" mapping between these categories
and Unicode character categories. I believe an "intuitive" relationship
would be:

ispunct: Punctuation (Pc, Pd, Ps, Pe, Pi, Pf, Po)
iscntrl: Other, control (Cc); perhaps other Other categories
isprint: Letters (L*), Marks (M*), Numbers (N*), Separators (Z*),
         perhaps informative categories (Symbol, Punctuation)
isgraph: everything isprint, except Separators

Another approach is to use the classification found in other
libraries, such as Qt, Perl, or Win32 (GetStringTypeW). Marcin
Kowalczyk presented his intuition in

http://mail.nl.linux.org/linux-utf8/2000-09/msg00076.html

but some of his classification was challenged later on; I guess glibc
would be just another library to draw classifications from.

> Unicode also has a wider definition of digits; do we want to
> extend isxdigit() for that? (Probably not, but I'm not sure.)

Certainly not. We have to remember the common use for these, which is
in computer stuff. There, hexdigit is 0..9{a..f|A..F}.

Regards,
Martin
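
For illustration only, the "intuitive" relationship described in this message could be sketched with the standard unicodedata module roughly as follows. This is a minimal sketch, not an agreed-upon API: the module-level function names are hypothetical, the "perhaps" cases are resolved by assumption (iscntrl covers only Cc; isprint includes Symbol and Punctuation), and isxdigit is deliberately kept ASCII-only.

```python
import string
import unicodedata


def ispunct(ch):
    # Punctuation: Pc, Pd, Ps, Pe, Pi, Pf, Po (every category starting with "P").
    return unicodedata.category(ch).startswith("P")


def iscntrl(ch):
    # Other, control (Cc) only. Assumption: the remaining Other
    # categories (Cf, Cs, Co, Cn) are left out.
    return unicodedata.category(ch) == "Cc"


def isprint(ch):
    # Letters, Marks, Numbers, Separators. Assumption: Symbol (S*) and
    # Punctuation (P*) are included as well.
    return unicodedata.category(ch)[0] in "LMNZSP"


def isgraph(ch):
    # Everything isprint accepts, except Separators (Z*).
    return isprint(ch) and not unicodedata.category(ch).startswith("Z")


def isxdigit(ch):
    # ASCII only: 0..9, a..f, A..F. Unicode digits are not extended here.
    return len(ch) == 1 and ch in string.hexdigits
```

Under this sketch, a space is printable but not graphic, and a character such as "$" (category Sc) is not punctuation, which differs from what C's ispunct reports in the "C" locale.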