On Thu, Apr 30, 2015 at 11:03 AM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Note that even if you have a UTF-8 input source, some users are likely
> to be surprised because IIRC Python doesn't canonicalize in its
> codecs; that is left for higher-level libraries. Linux UTF-8 is
> usually NFC normalized, while Mac UTF-8 is NFD normalized.
>
> > >> u'\xce\xb1'
>
> Note that that is perfectly legal Unicode.

It's legal Unicode, but it doesn't mean what he typed in. This means:

'\xce' LATIN CAPITAL LETTER I WITH CIRCUMFLEX
'\xb1' PLUS-MINUS SIGN

but the original input was:

'\u03b1' GREEK SMALL LETTER ALPHA

ChrisA
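
For anyone who wants to check this at a Python 3 prompt, here is a minimal sketch. It is not from the original message, and the variable names are only illustrative. It assumes the quoted two-character string came from the UTF-8 bytes of alpha being read one byte per character, which is what a Latin-1 decode does. For the NFC/NFD point it uses "e" with an acute accent as an assumed example.

```python
import unicodedata

# What the user typed: GREEK SMALL LETTER ALPHA.
alpha = "\u03b1"

# Its UTF-8 encoding is the two bytes 0xCE 0xB1.
utf8_bytes = alpha.encode("utf-8")
assert utf8_bytes == b"\xce\xb1"

# Assumption: the bytes were read one byte per character (Latin-1),
# which gives the two-character string from the quoted text.
misread = utf8_bytes.decode("latin-1")
assert misread == "\xce\xb1"

# Look up the official names of what was seen and what was meant.
misread_names = [unicodedata.name(ch) for ch in misread]
alpha_name = unicodedata.name(alpha)
print(misread_names)
print(alpha_name)

# Decoding the same bytes as UTF-8 recovers the original character.
assert utf8_bytes.decode("utf-8") == alpha

# The codecs do not normalize: NFC and NFD spellings of the same text
# stay distinct through an encode/decode round trip.
nfc = unicodedata.normalize("NFC", "e\u0301")  # one code point, U+00E9
nfd = unicodedata.normalize("NFD", "\u00e9")   # "e" + U+0301
assert nfc != nfd
assert nfd.encode("utf-8").decode("utf-8") == nfd

# Comparing after explicit normalization makes them equal again.
assert unicodedata.normalize("NFC", nfd) == nfc
```

The last assertion is the part that is "left for higher-level libraries": code that compares or hashes text from mixed sources would need to pick one normalization form and apply it itself.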