[M.-A. Lemburg]

>"M.-A. Lemburg" wrote:
>>
>> Fredrik Lundh wrote:
>> > how about this plan:
>> >
>> > -- you add a Py_UNICODE_ALPHA to unicodeobject.h asap,
>> > which does exactly that (or I can do that, if you prefer).
>> > (and maybe even a Py_UNICODE_ALNUM)
>>
>> Ok, I'll add Py_UNICODE_ISALPHA and Py_UNICODE_ISALNUM
>> (first with approximations of the sort you give above and
>> later with true implementations using tables in unicodectype.c)
>> on Monday... gotta run now.
>>
>> > -- I change SRE to use that asap.
>> >
>> > -- you, I, or someone else add a better implementation,
>> > some other day.
>
>I've just looked into this... the problem here is what to
>consider as being "alpha" and what "numeric".
>
>I could add two new tables for the characters with category 'Lo'
>(other letters, not cased) and 'Lm' (letter modifiers)
>to match all letters in the Unicode database, but those
>tables have some 5200 entries (note that there are only 804 lower
>case letters and 686 upper case ones).

In JDK1.3, Character.isLetter(..) and Character.isDigit(..) are
documented as:

http://java.sun.com/j2se/1.3/docs/api/java/lang/Character.html#isLetter(char)
http://java.sun.com/j2se/1.3/docs/api/java/lang/Character.html#isDigit(char)
http://java.sun.com/j2se/1.3/docs/api/java/lang/Character.html#isLetterOrDigit(char)

I guess that java uses the extra huge tables.

regards,
finn
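
To make the table-size question above concrete, here is a rough sketch in Python that counts code points per letter category using the standard unicodedata module. The helper names (count_letter_categories, is_alpha, is_alnum) are made up for illustration and are not the Py_UNICODE_* macros discussed in the thread. Treating "alnum" as letters plus category 'Nd' is just one possible reading of "numeric", not a settled definition. The counts depend on the Unicode version bundled with the interpreter, so they will not match the figures quoted above.

```python
import sys
import unicodedata
from collections import Counter


def count_letter_categories():
    """Count code points in each Unicode letter category (Lu, Ll, Lt, Lm, Lo)."""
    counts = Counter()
    for cp in range(sys.maxunicode + 1):
        cat = unicodedata.category(chr(cp))
        if cat.startswith("L"):
            counts[cat] += 1
    return counts


def is_alpha(ch):
    """Hypothetical check: any letter category, including 'Lo' and 'Lm'."""
    return unicodedata.category(ch).startswith("L")


def is_alnum(ch):
    """Hypothetical check: a letter or a decimal digit ('Nd').

    Assumption: other numeric categories ('Nl', 'No') are left out here.
    """
    return is_alpha(ch) or unicodedata.category(ch) == "Nd"


if __name__ == "__main__":
    # Print the per-category totals for this interpreter's Unicode database.
    for cat, n in sorted(count_letter_categories().items()):
        print(cat, n)
```

Running the script prints one line per letter category, which gives a feel for how much larger the 'Lo' table is than the cased-letter tables.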