Showing content from https://mail.python.org/pipermail/python-dev/2002-May/024723.html below:

[Python-Dev] String module

Martin v. Loewis martin@v.loewis.de
31 May 2002 00:05:44 +0200
Guido van Rossum <guido@python.org> writes:

> Thanks!  But now we have a diverging set of isxxx methods for 8-bit
> strings and Unicode.  I really don't know what the equivalent of these
> (ispunct, iscntrl, isgraph, isprint) is in Unicode -- maybe MAL or MvL
> know?  

I don't think there is an "official" mapping between these categories
and Unicode character categories. I believe an "intuitive"
relationship would be:

ispunct: Punctuation (Pc, Pd, Ps, Pe, Pi, Pf, Po)
iscntrl: Other, control (Cc); perhaps other Other
isprint: Letters (L*), Marks (M*), Numbers (N*), Separators (Z*),
         perhaps informative categories (Symbol, Punctuation)
isgraph: everything isprint, except Separators
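
A minimal sketch of the intuitive mapping above, assuming Python 3 and its
standard unicodedata module. The function names mirror the C-style
predicates and are hypothetical helpers, not an existing API; the
`informative` flag is likewise an assumption, standing in for the "perhaps"
in the isprint line. The "perhaps other Other" part of iscntrl is left out.

```python
import unicodedata


def ispunct(ch):
    # Pc, Pd, Ps, Pe, Pi, Pf and Po are exactly the categories starting with "P".
    return unicodedata.category(ch).startswith("P")


def iscntrl(ch):
    # Only "Other, control" (Cc); the remaining C* categories are not included.
    return unicodedata.category(ch) == "Cc"


def isprint(ch, informative=True):
    # Letters, Marks, Numbers, Separators; optionally Symbols and Punctuation.
    major = unicodedata.category(ch)[0]
    return major in "LMNZ" or (informative and major in "SP")


def isgraph(ch, informative=True):
    # Everything isprint accepts, except Separators (Z*).
    return isprint(ch, informative) and unicodedata.category(ch)[0] != "Z"
```

Under this mapping a space (Zs) counts as isprint but not isgraph, and "$"
(Sc, a Symbol) is not ispunct, which differs from the C locale's ispunct().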

Another approach is to use the classification found in other
libraries, such as Qt, Perl, or Win32 (GetStringTypeW).

Marcin Kowalczyk presented his intuition in

http://mail.nl.linux.org/linux-utf8/2000-09/msg00076.html

but some of his classification was challenged later on; I guess glibc
would be just another library to draw classifications from.

> Unicode also has a wider definition of digits; do we want to
> extend isxdigit() for that?  (Probably not, but I'm not sure.)

Certainly not. We have to remember the common use for these, which is
in computer stuff. There, hexdigit is 0..9{a..f|A..F}.
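
A short sketch of that ASCII-only definition, assuming Python 3; the
`isxdigit` helper is hypothetical, while `string.hexdigits` is the standard
library constant "0123456789abcdefABCDEF".

```python
import string


def isxdigit(s):
    # True only for non-empty strings made of 0..9, a..f, A..F.
    return len(s) > 0 and all(c in string.hexdigits for c in s)
```

With this definition, fullwidth digits such as "\uff11\uff12" are rejected
even though Python 3's str.isdigit() accepts them, which is the narrower
behaviour argued for here.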

Regards,
Martin



