On Wednesday, August 14, 2002, at 02:13, Guido van Rossum wrote:

> Note that normalization doesn't belong in the codecs (except perhaps
> as a separate Unicode->Unicode codec, since codecs seem to be useful
> for all string->string transformations). It's a separate step that
> the application has to request; only the app knows whether a
> particular Unicode string is already normalized or not, and whether
> the expense is useful for the app, or not.

I don't like this, I don't like it at all. Python jumps through hoops
to make 'jack' and u'jack' compare identical and be interchangeable as
dict keys and what have you, and now suddenly I find out that there are
two ways to say u'jäck' and they won't compare equal. Not good.

I sympathise with the fact that this is difficult (although I still
don't understand why: whereas when you want to create the decomposed
version I can imagine there are N! ways to notate a character with N
combining chars, I would think there's one and only one way to write a
combined character), but that shouldn't stop us from at least planning
to fix this.

And I don't think the burden should fall on the application. That same
reasoning could have been followed for making ascii and the
unicode-ascii-subset compare equal: the application will know it has to
convert ascii to unicode before comparing.
--
- Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -
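
A minimal sketch of the comparison problem described above, written
against the unicodedata.normalize() call found in later Python
releases (it was not available at the time of this thread); the two
spellings of u'jäck' are taken from the message itself.

# -*- coding: utf-8 -*-
import unicodedata

# Two spellings of the same name: the precomposed character U+00E4
# versus 'a' followed by U+0308 (COMBINING DIAERESIS).
composed = u'j\u00e4ck'
decomposed = u'ja\u0308ck'

# Without normalization the strings compare unequal, and therefore
# also behave as different dict keys -- the problem raised above.
print(composed == decomposed)              # False
print({composed: 1} == {decomposed: 1})    # False

# Normalizing both sides to the same form (NFC here) makes them equal.
print(unicodedata.normalize('NFC', composed) ==
      unicodedata.normalize('NFC', decomposed))   # True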