On Wed, Jul 2, 2008 at 11:22 AM, Jeroen Ruigrok van der Werven <asmodai at in-nomine.org> wrote: > -On [20080702 19:42], Guido van Rossum (guido at python.org) wrote: >>Yes. At least in the sense that \Uxxxxxxxx gets translated to a >>surrogate pair, and that the UTF-8 codec supports surrogate pairs in >>both directions. It's been like this for a long time. What else would >>you expect from UTF-16 support? > > Well, unless I misunderstand things, a Python 3 compiled with the default > Unicode option gives this: > >>>> len("\N{MUSICAL SYMBOL G CLEF}") > 2 > > Whereas a Python 3 with --with-wide-unicode gives: > > >>>> len("\N{MUSICAL SYMBOL G CLEF}") > 1 > > This, of course, causes problems with splitting, finding, and so on. Understood. > So that > means that a Python 3 with only 2 byte Unicode support is not to be > used/recommended for Unicode outside of the BMP. I disagree. Instead, I would say that such code needs to be aware of surrogates. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4