"M.-A. Lemburg" wrote: > > Fredrik Lundh wrote: > > how about this plan: > > > > -- you add a Py_UNICODE_ALPHA to unicodeobject.h asap, > > which does exactly that (or I can do that, if you prefer). > > (and maybe even a Py_UNICODE_ALNUM) > > Ok, I'll add Py_UNICODE_ISALPHA and Py_UNICODE_ISALNUM > (first with approximations of the sort you give above and > later with true implementations using tables in unicodectype.c) > on Monday... gotta run now. > > > -- I change SRE to use that asap. > > > > -- you, I, or someone else add a better implementation, > > some other day. I've just looked into this... the problem here is what to consider as being "alpha" and what "numeric". I could add two new tables for the characters with category 'Lo' (other letters, not cased) and 'Lm' (letter modifiers) to match all letters in the Unicode database, but those tables have some 5200 entries (note that there are only 804 lower case letters and 686 upper case ones). Note that there seems to be no definition of what is to be considered alphanumeric in Unicode. The only quote I found was in http://www.w3.org/TR/xslt#convert which says: """ Alphanumeric means any character that has a Unicode category of Nd, Nl, No, Lu, Ll, Lt, Lm or Lo. """ Here's what the glibc has to say about these chars: /* Test for any wide character for which `iswupper' or 'iswlower' is true, or any wide character that is one of a locale-specific set of wide-characters for which none of `iswcntrl', `iswdigit', `iswpunct', or `iswspace' is true. */ extern int iswalpha __P ((wint_t __wc)); Question: Should I go ahead and add the Lo and Lm tables to unicodectype.c ? Pros: standards confrom Cons: huge in size -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/