> Before we start considering how it's possible to make unicode.__equal__
> act encoding-insensitively[1], I think we need to consider whether
> that's really the behavior we want.  In some ways, this seems like
> case-insensitive equality to me: it's certainly a useful operation, but
> I don't think it should be the object's builtin notion of equality:
>   - I think people will be confused if s1==s2 but s1[0]!=s2[0].
>   - Sometimes you might *want* to distinguish different encodings of
>     the "same" string; a "normalized" equality test makes that very
>     difficult.

Right.  Couldn't have said it better myself.

> And if you *do* want unicode objects to act normalized, then I think
> that the right way to do it is to normalize them at creation time.
> Then all the right hash/eq/cmp stuff just falls out.

Exactly.

> But since some people may want to distinguish different encodings of
> the same string, I think that the most sensible alternative is to add
> a new subclass of unicode -- something like "normalized_unicode."  It
> would normalize itself at construction time; and when combined with
> other unicode strings (e.g. by +), the result would be normalized (so
> unicode+normalized_unicode -> normalized_unicode).  It's possible that
> the normalized unicode class would be more useful to people (and
> therefore more widely used?), but the non-normalized version would
> still be available for people who want it.

Works for me.  I recommend that someone try this approach as a user
subclass first -- this should be easy enough, right?

> (or we could just leave things as they are now, and force people to do
> any normalization themselves. :) )

Do we even have normalization code in core Python?

--Guido van Rossum (home page: http://www.python.org/~guido/)
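
For reference, such a user subclass is easy to sketch today: the stdlib's
unicodedata.normalize() provides the normalization primitive.  Below is a
minimal illustration in modern Python, with str standing in for unicode;
the class name NormalizedStr and the choice of NFC form are assumptions
made for illustration, not anything decided in this thread.

    # Illustrative sketch only: NormalizedStr and the NFC choice are
    # assumptions, not part of any actual proposal from this thread.
    import unicodedata

    class NormalizedStr(str):
        """A str subclass that normalizes (to NFC) at construction time,
        so ==, hash(), and comparisons all fall out consistently."""

        def __new__(cls, value=""):
            return super().__new__(cls,
                                   unicodedata.normalize("NFC", str(value)))

        # Concatenation with plain strings yields a normalized result,
        # i.e. str + NormalizedStr -> NormalizedStr, as suggested above.
        def __add__(self, other):
            return NormalizedStr(str(self) + str(other))

        def __radd__(self, other):
            return NormalizedStr(str(other) + str(self))

    # "e\u0301" (e + combining acute) and "\u00e9" (precomposed e-acute)
    # are distinct as plain strings but equal once normalized:
    assert "e\u0301" != "\u00e9"
    assert NormalizedStr("e\u0301") == NormalizedStr("\u00e9")
    assert isinstance("abc" + NormalizedStr("x"), NormalizedStr)

Because normalization happens in __new__ rather than in __eq__, equality,
hashing, and ordering stay mutually consistent for free -- exactly the
property the thread argues for.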