Showing content from https://mail.python.org/pipermail/python-dev/2000-May/003922.html below:

[Python-Dev] Unicode comparisons & normalization

Just van Rossum just@letterror.com
Wed, 3 May 2000 10:03:16 +0100

After quickly browsing through the unicode.org URLs I posted earlier, I
reach the following (possibly wrong) conclusions:

- there is a script- and language-independent canonical form (but automatic
normalization is indeed a bad idea)
- ideally, unicode comparisons should follow the rules from
http://www.unicode.org/unicode/reports/tr10/ (But it seems hardly realistic
for 1.6, if at all...)
- this would indeed mean that it's possible for u == v even though type(u)
is type(v) and len(u) != len(v). However, I don't see how this would
collapse /F's world, as the two strings are at most semantically
equivalent. Their physical difference is real, and still follows the
a-string-is-a-sequence-of-characters rule (!).
- there may be additional customized language-specific sorting rules. I
currently don't see how to implement that without some global variable.
- the sorting rules are very complicated, and should be implemented by
calculating "sort keys". If I understood it correctly, these can take up to
4 bytes per character in their most compact form. Still, for this to be
somewhat speed-efficient, the keys need to be cached...
- u.find() may need an alternative API, which returns a (begin, end) tuple,
since the match may not have the same length as the search string... (This
is tricky, since you need the begin and end indices in the non-canonical
form...)
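
To make the u == v and u.find() points concrete, here is a minimal sketch
in present-day Python. It is not the TR10 collation algorithm: it only
checks canonical equivalence, using unicodedata.normalize from the standard
library. The names canonical_key, canonically_equal and find_canonical are
hypothetical helpers invented for this illustration, and the brute-force
search is an assumption chosen for clarity, not speed.

```python
import unicodedata


def canonical_key(s):
    # Canonical decomposition (NFD). This only stands in for a real
    # collation sort key; it is an assumption of this sketch.
    return unicodedata.normalize("NFD", s)


def canonically_equal(u, v):
    # True when u and v are canonically equivalent, even if their
    # physical lengths differ.
    return canonical_key(u) == canonical_key(v)


def _is_boundary(s, i):
    # A slice may start or stop at i only if it does not cut a base
    # character off from the combining marks that follow it.
    return i == 0 or i == len(s) or unicodedata.combining(s[i]) == 0


def find_canonical(haystack, needle):
    # Return (begin, end) indices into the ORIGINAL, non-normalized
    # haystack, or None if there is no canonically equivalent match.
    target = canonical_key(needle)
    n = len(haystack)
    for begin in range(n + 1):
        if not _is_boundary(haystack, begin):
            continue
        for end in range(begin, n + 1):
            if not _is_boundary(haystack, end):
                continue
            piece = canonical_key(haystack[begin:end])
            if piece == target:
                return (begin, end)
            if len(piece) > len(target):
                break  # NFD length only grows as the slice grows
    return None


composed = "caf\u00e9"      # 4 code points, precomposed e-acute
decomposed = "cafe\u0301"   # 5 code points, e + combining acute
span = find_canonical("un cafe\u0301 noir", "\u00e9")
```

Here composed and decomposed compare unequal as raw sequences and have
different lengths, yet canonically_equal(composed, decomposed) is true. The
span returned for the one-character needle covers two characters of the
haystack, which is why a bare start index is not enough. Since
canonical_key is a pure function of its argument, its results could be
cached (for example with functools.lru_cache) along the lines of the
sort-key caching mentioned above.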

Just
