Showing content from https://mail.python.org/pipermail/python-dev/2011-November/114322.html below:

[Python-Dev] Unicode exception indexing

Antoine Pitrou solipsis at pitrou.net
Thu Nov 3 20:29:50 CET 2011
On Thu, 03 Nov 2011 18:14:42 +0100
martin at v.loewis.de wrote:
> There is a backwards compatibility issue with PEP 393 and Unicode exceptions:
> the start and end indices: are they Py_UNICODE indices, or code point indices?
> 
> On the one hand, these indices are used in formatting error messages such as
> "codec can't encode character \u%04x in position %d", suggesting they are
> regular indices into the string (counting code points).
> 
> On the other hand, they are used by error handlers to look up the character,
> and existing error handlers (including the ones we have now) use
> PyUnicode_AsUnicode to find the character. This suggests that the indices
> should be Py_UNICODE indices, for compatibility (and they currently do
> work in this way).

But what about error handlers written in Python?
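
To make the question concrete, here is a minimal sketch of an error handler written in Python. The handler name "hex_replace" and its output format are made up for illustration; only codecs.register_error and the start, end and object attributes of UnicodeEncodeError are standard. The handler slices exc.object with exc.start and exc.end, so it only works as intended if those are regular string (code point) indices.

```python
import codecs


def hex_replace(exc):
    """Hypothetical handler: replace unencodable characters with <U+XXXX>."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    # exc.start and exc.end are used here as indices into exc.object,
    # the str that was being encoded.
    bad = exc.object[exc.start:exc.end]
    replacement = "".join("<U+%04X>" % ord(ch) for ch in bad)
    # Return the replacement text and the position at which to resume encoding.
    return replacement, exc.end


codecs.register_error("hex_replace", hex_replace)

# A string with one non-BMP character between two ASCII letters.
result = "a\U0001F600b".encode("ascii", errors="hex_replace")
```

If the indices counted Py_UNICODE units instead, a handler like this would slice at the wrong offsets for any string containing non-BMP characters on a build where Py_UNICODE is two bytes wide.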

> The indices can only be different if the string is a UCS-4 string, and
> Py_UNICODE is a two-byte type (i.e. on Windows).
> 
> So what should it be?

I'd say let's do the Right Thing and accept the small compatibility
breach (surrogates on UCS-2 builds).
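
As a rough illustration of how the two kinds of index diverge, the sketch below compares the code point index of a character with the number of UTF-16 code units that precede it. Encoding to UTF-16 is used here only as a stand-in for a two-byte Py_UNICODE representation; the variable names are illustrative.

```python
# One non-BMP character (U+1F600) sits between two ASCII letters.
text = "a\U0001F600b"

# Index of "b" counted in code points.
code_point_index = text.index("b")

# Index of "b" counted in UTF-16 code units: the non-BMP character
# occupies two units (a surrogate pair), so the count is one higher.
utf16_index = len(text[:code_point_index].encode("utf-16-le")) // 2

print(code_point_index, utf16_index)
```

The two values agree for any string that contains only BMP characters, which is why the difference only shows up with surrogates.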

Regards

Antoine.

