Thu Nov 25 06:39:30 CET 2010 · https://mail.python.org/pipermail/python-dev/2010-November/105997.html

On 11/24/2010 3:06 PM, Alexander Belopolsky wrote:

> Any non-trivial text processing is likely to be broken in the presence of
> surrogates.  Producing them on input is just trading a known issue for
> an unknown one.  Processing surrogate pairs in Python code is hard.
> Software that has to support non-BMP characters will most likely be
> written for a wide build and contain subtle bugs when run under a
> narrow build.  Note that my latest proposal does not abolish
> surrogates outright.  Users who want them can still use something like
> "surrogateescape"  error handler for non-BMP characters.

It seems to me that what you are asking for is an alternate, optional,
utf-8-bmp codec that would raise an error, in either direction, for
non-BMP characters. Then, as you suggest, anyone who is not prepared for
surrogates can simply disallow them.
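
To make the idea concrete, here is a rough sketch of what such a codec pair might look like. It is only an illustration: no "utf-8-bmp" codec exists in the standard library, the function names encode_utf8_bmp and decode_utf8_bmp are made up for this example, and the code assumes a Python (3.3 or later) where a str is indexed by code point rather than by UTF-16 code unit.

```python
BMP_MAX = 0xFFFF  # highest code point in the Basic Multilingual Plane


def encode_utf8_bmp(text: str) -> bytes:
    """Encode to UTF-8, refusing any character outside the BMP.

    Hypothetical helper; "utf-8-bmp" is not a registered codec name.
    """
    for pos, ch in enumerate(text):
        if ord(ch) > BMP_MAX:
            raise UnicodeEncodeError(
                "utf-8-bmp", text, pos, pos + 1,
                "non-BMP character not allowed",
            )
    return text.encode("utf-8")


def decode_utf8_bmp(data: bytes) -> str:
    """Decode UTF-8, refusing any 4-byte (non-BMP) sequence.

    Hypothetical helper; "utf-8-bmp" is not a registered codec name.
    """
    data = bytes(data)
    # Validate as ordinary UTF-8 first, so malformed input raises the
    # usual UnicodeDecodeError.
    text = data.decode("utf-8")
    # In valid UTF-8, a byte >= 0xF0 can only be the lead byte of a
    # 4-byte sequence, which always encodes a code point above U+FFFF.
    for pos, byte in enumerate(data):
        if byte >= 0xF0:
            raise UnicodeDecodeError(
                "utf-8-bmp", data, pos, pos + 4,
                "non-BMP character not allowed",
            )
    return text
```

With something like this, encode_utf8_bmp("caf\u00e9") behaves like the ordinary utf-8 encoder, while encode_utf8_bmp("\U0001F600") raises UnicodeEncodeError. Decoding the UTF-8 bytes of a non-BMP character raises UnicodeDecodeError instead of handing the caller a surrogate pair.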

-- 
Terry Jan Reedy

[Python-Dev] len(chr(i)) = 2?