M.-A. Lemburg <mal@lemburg.com> wrote:
> Just a small note on the subject of a character being atomic
> which seems to have been forgotten by the discussing parties:
>
> Unicode itself can be understood as a multi-word character
> encoding, just like UTF-8. The reason is that Unicode entities
> can be combined to produce single display characters (e.g.
> u"e"+u"\u0301" will print "é" in a Unicode-aware renderer).
> Slicing such a combined Unicode string will have the same
> effect as slicing UTF-8 data.

really? does it result in a decoder error? or does it just result
in a rendering error, just as if you slice off any trailing
character without looking...

> It seems that most Latin-1 proponents seem to have single
> display characters in mind. While the same is true for
> many Unicode entities, there are quite a few cases of
> combining characters in Unicode 3.0 and the Unicode
> normalization algorithm uses these as basis for its
> work.

do we support automatic normalization in 1.6?

</F>
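A minimal sketch of the distinction raised in the reply, written for modern Python 3 rather than the 1.6 interpreter under discussion. The `unicodedata.normalize` call shown here is from later Python versions and is not meant to describe what 1.6 offered. It shows that slicing a base-plus-combining-mark string only drops the accent, while slicing the UTF-8 bytes of "é" mid-sequence produces a genuine decoder error. Variable names are illustrative.

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points,
# one displayed character in a Unicode-aware renderer.
combined = "e" + "\u0301"
print(len(combined))  # 2

# Slicing the Unicode string never raises; it just drops the accent,
# leaving a plain "e" (a rendering-level change, not a decoding error).
head = combined[:1]
print(repr(head))  # 'e'

# Slicing the UTF-8 bytes of a precomposed "é" can cut a multi-byte
# sequence in half, which does fail when the bytes are decoded.
utf8 = "\u00e9".encode("utf-8")  # two bytes: 0xC3 0xA9
decode_failed = False
try:
    utf8[:1].decode("utf-8")
except UnicodeDecodeError as exc:
    decode_failed = True
    print("decoder error:", exc.reason)

# NFC normalization composes "e" + U+0301 into the single code point U+00E9.
nfc = unicodedata.normalize("NFC", combined)
print(len(nfc), hex(ord(nfc)))  # 1 0xe9
```

The sliced combining sequence is still valid text, just rendered differently. The truncated byte string is not valid UTF-8 at all.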