
Showing content from https://mail.python.org/pipermail/python-dev/2010-November/105997.html below:

[Python-Dev] len(chr(i)) = 2?

Terry Reedy tjreedy at udel.edu
Thu Nov 25 06:39:30 CET 2010
On 11/24/2010 3:06 PM, Alexander Belopolsky wrote:

> Any non-trivial text processing is likely to be broken in the presence
> of surrogates.  Producing them on input is just trading a known issue
> for an unknown one.  Processing surrogate pairs in Python code is hard.
> Software that has to support non-BMP characters will most likely be
> written for a wide build and contain subtle bugs when run under a
> narrow build.  Note that my latest proposal does not abolish
> surrogates outright.  Users who want them can still use something like
> the "surrogateescape" error handler for non-BMP characters.
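The length-2 result in the subject line comes from how a narrow build stores a non-BMP character: as a UTF-16 surrogate pair. The sketch below runs on Python 3.3 or later, where PEP 393 removed the narrow/wide distinction. It shows the arithmetic behind a pair and emulates the length a narrow build would report. The helper names are hypothetical and chosen only for illustration.

```python
from typing import Tuple


def to_surrogate_pair(cp: int) -> Tuple[int, int]:
    """Split a supplementary-plane code point into a UTF-16 (high, low) pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("U+%04X is not a non-BMP code point" % cp)
    offset = cp - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits -> low surrogate
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def narrow_len(s: str) -> int:
    """Length a narrow (UTF-16) build would report: the number of UTF-16 code units.

    Assumes s contains no lone surrogates (those cannot be encoded strictly).
    """
    return len(s.encode("utf-16-le")) // 2


ch = chr(0x1F600)
print(len(ch))         # 1 on Python 3.3+ (no narrow builds)
print(narrow_len(ch))  # 2: one surrogate pair
print([hex(u) for u in to_surrogate_pair(0x1F600)])  # ['0xd83d', '0xde00']
```

Any code that slices, indexes, or reverses strings on a narrow build has to keep these pairs together by hand, which is the difficulty the quoted message describes.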

It seems to me that what you are asking for is an alternate, optional
utf-8-bmp codec that would raise an error, in either direction
(encoding or decoding), for non-BMP characters. Then, as you suggest,
if one is not prepared for surrogates, they are not allowed.
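A minimal sketch of such a codec, assuming Python 3.3 or later and the standard `codecs` registry. The name `utf_8_bmp` and every helper below are hypothetical; no such codec ships with Python. It delegates to the built-in UTF-8 codec and raises for any character above U+FFFF in both directions, while the strict UTF-8 codec underneath keeps rejecting lone surrogates.

```python
import codecs

# Hypothetical codec name taken from the proposal; Python ships no such codec.
CODEC_NAME = "utf_8_bmp"
_REASON = "non-BMP character not allowed"


def _first_non_bmp_index(text):
    """Index of the first character above U+FFFF, or -1."""
    for i, ch in enumerate(text):
        if ord(ch) > 0xFFFF:
            return i
    return -1


def _first_non_bmp_offset(data):
    """Byte offset of the first well-formed 4-byte UTF-8 sequence, or -1.

    In UTF-8, exactly the characters U+10000..U+10FFFF take four bytes,
    and their lead byte is always in the range 0xF0..0xF4.
    """
    for j, b in enumerate(data):
        if 0xF0 <= b <= 0xF4:
            try:
                data[j:j + 4].decode("utf-8")
            except UnicodeDecodeError:
                continue  # malformed: leave it to the UTF-8 decoder below
            return j
    return -1


def bmp_encode(text, errors="strict"):
    i = _first_non_bmp_index(text)
    if i >= 0:
        raise UnicodeEncodeError(CODEC_NAME, text, i, i + 1, _REASON)
    # Lone surrogates are still rejected by the strict UTF-8 encoder.
    return codecs.utf_8_encode(text, errors)


def bmp_decode(data, errors="strict"):
    data = bytes(data)  # accept memoryview/bytearray input
    j = _first_non_bmp_offset(data)
    if j >= 0:
        raise UnicodeDecodeError(CODEC_NAME, data, j, j + 4, _REASON)
    # final=True: treat the input as complete (stateless decoding).
    return codecs.utf_8_decode(data, errors, True)


def _search(name):
    # Older Pythons pass hyphens through unchanged; 3.9+ maps them to "_".
    if name.replace("-", "_") == CODEC_NAME:
        return codecs.CodecInfo(bmp_encode, bmp_decode, name=CODEC_NAME)
    return None


codecs.register(_search)
```

Once registered, `"caf\u00e9".encode("utf-8-bmp")` gives the same bytes as plain UTF-8, while `"\U0001F600".encode("utf-8-bmp")` raises `UnicodeEncodeError`. Scanning the bytes before decoding lets the decode error point at the exact offset of the offending 4-byte sequence rather than a character index.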

-- 
Terry Jan Reedy

