Guido van Rossum <guido@python.org> writes:

> Yes, but all the non-ASCII has to be represented as Unicode strings.
> I.e. no Latin-1 in 8-bit strings!

Exactly. This might still cause problems for inspect and other
introspective tools. For ASCII identifiers, I agree that using byte
strings is sensible, for best backwards compatibility.

> Really? I thought Unicode's isalpha() was built on the Unicode text
> database?

It isn't if it has a "usable wchar_t", see unicodeobject.h:

#if defined(HAVE_USABLE_WCHAR_T) && defined(WANT_WCTYPE_FUNCTIONS)

#include <wctype.h>

#define Py_UNICODE_ISSPACE(ch) iswspace(ch)
...

I was missing the part that it also requires active selection of
wctype functions - that is probably a feature that is never used. So
it is better than I thought: isletter might vary across builds on the
same platform, but likely never varies in practice.

Regards,
Martin
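
To illustrate the database-driven case discussed above, here is a minimal
sketch (not taken from the Python sources) that compares str.isalpha() with a
classification done purely through the unicodedata module. It assumes a
current Python 3 build that does not route these checks through the platform's
wctype functions; is_letter_by_database and LETTER_CATEGORIES are hypothetical
names introduced only for this example.

```python
import unicodedata

# General categories that the Unicode character database treats as letters.
LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo"}


def is_letter_by_database(ch):
    """Hypothetical helper: classify one character via unicodedata only."""
    return unicodedata.category(ch) in LETTER_CATEGORIES


# A mix of ASCII and non-ASCII letters, plus a digit, a space and an underscore.
samples = ["a", "Z", "\u00e9", "\u00df", "\u03a9", "1", " ", "_"]

for ch in samples:
    # Print both answers side by side so any disagreement is easy to spot.
    print(repr(ch), ch.isalpha(), is_letter_by_database(ch))
```

If the two columns ever disagreed on some build, that would suggest the
character predicates were coming from somewhere other than the Unicode
database, which is the kind of build-dependent variation described above.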