Scott David Daniels wrote:
> I naïvely wrote:
> >Could we perhaps use a comparison that, in effect, did:
> >    def uni_equal(first, second):
> >        if first == second:
> >            return True
> >        return first.normalize() == second.normalize()
> >That is, take advantage of the fact that normalization is often
> >unnecessary for "trivial" reasons.
>
> [...]

Before we start considering how to make unicode.__eq__ act
encoding-insensitively[1], I think we need to consider whether that's
really the behavior we want. In some ways, this seems like
case-insensitive equality to me: it's certainly a useful operation,
but I don't think it should be the object's built-in notion of
equality.

  - I think people will be confused if s1==s2 but s1[0]!=s2[0].
  - Sometimes you might *want* to distinguish different encodings of
    the "same" string; a "normalized" equality test makes that very
    difficult.

And if you *do* want unicode objects to act normalized, then I think
the right way to do it is to normalize them at creation time. Then
all the right hash/eq/cmp behavior just falls out.

But since some people may want to distinguish different encodings of
the same string, I think the most sensible alternative is to add a new
subclass of unicode -- something like "normalized_unicode". It would
normalize itself at construction time; and when combined with other
unicode strings (e.g., by +), the result would be normalized (so
unicode + normalized_unicode -> normalized_unicode). It's possible
that the normalized class would be more useful to people (and
therefore more widely used?), but the non-normalized version would
still be available for people who want it.

(Or we could just leave things as they are now, and force people to do
any normalization themselves. :) )

-Edward

[1] I don't think that "encoding" is the right technical term here,
but I'm not sure what the right term is. I mean insensitive to the
difference between separated diacritics and unified diacritics.
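A minimal sketch of the quoted uni_equal idea, written for modern Python 3, where str plays the role of the old unicode type. Strings have no normalize() method, so this assumes unicodedata.normalize with the NFC form (the message doesn't say which form is intended). The last line illustrates the first objection: two strings can compare "equal" under normalization while their first characters differ.

```python
import unicodedata


def uni_equal(first: str, second: str, form: str = "NFC") -> bool:
    """Compare two strings, falling back to Unicode normalization.

    Assumption: NFC is the normalization form; the quoted code just
    calls a hypothetical normalize() method.
    """
    # Fast path: identical code point sequences are trivially equal,
    # so normalization is skipped entirely.
    if first == second:
        return True
    # Slow path: compare normalized forms, so a precomposed character
    # and its decomposed equivalent compare equal.
    return unicodedata.normalize(form, first) == unicodedata.normalize(form, second)


composed = "\u00e9"      # "é" as one code point (U+00E9)
decomposed = "e\u0301"   # "e" + COMBINING ACUTE ACCENT (U+0301)

print(composed == decomposed)           # False: different code points
print(uni_equal(composed, decomposed))  # True: identical after NFC
print(composed[0] == decomposed[0])     # False: "é" vs "e"
```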
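The normalize-at-construction subclass could look roughly like this in Python 3. The class name comes from the message; the NFC form and the choice to re-normalize after concatenation are assumptions (joining "e" and a combining accent can create a new composable pair at the seam). Because the value is normalized once on creation, the inherited str __eq__ and __hash__ need no special-casing.

```python
import unicodedata


class normalized_unicode(str):
    """Hypothetical str subclass that normalizes itself at construction.

    Assumption: NFC is the target normalization form.
    """

    FORM = "NFC"

    def __new__(cls, value=""):
        # Normalize once, at creation; ordinary str equality and hashing
        # then behave "normalized" automatically.
        return super().__new__(cls, unicodedata.normalize(cls.FORM, str(value)))

    def __add__(self, other):
        # normalized_unicode + str -> normalized_unicode.
        if not isinstance(other, str):
            return NotImplemented
        # str.__add__ concatenates without re-dispatching to __radd__;
        # the result is re-normalized because the seam may compose.
        return normalized_unicode(str.__add__(self, other))

    def __radd__(self, other):
        # str + normalized_unicode -> normalized_unicode.
        if not isinstance(other, str):
            return NotImplemented
        return normalized_unicode(str.__add__(other, self))


a = normalized_unicode("\u00e9")   # precomposed "é"
b = normalized_unicode("e\u0301")  # decomposed; NFC composes it
print(a == b, hash(a) == hash(b))  # True True

c = normalized_unicode("e") + "\u0301"
print(type(c).__name__, c == a)    # normalized_unicode True
```

A plain str on the left also works: since normalized_unicode is a str subclass that defines __radd__, Python tries its reflected method first, so "e" + normalized_unicode("\u0301") also yields a normalized "é".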