On Thu, Nov 3, 2011 at 12:29 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Thu, 03 Nov 2011 18:14:42 +0100
> martin at v.loewis.de wrote:
>> There is a backwards compatibility issue with PEP 393 and Unicode exceptions:
>> the start and end indices: are they Py_UNICODE indices, or code point indices?
>>
>> On the one hand, these indices are used in formatting error messages such as
>> "codec can't encode character \u%04x in position %d", suggesting they
>> are regular indices into the string (counting code points).
>>
>> On the other hand, they are used by error handlers to lookup the character,
>> and existing error handlers (including the ones we have now) use
>> PyUnicode_AsUnicode to find the character. This suggests that the indices
>> should be Py_UNICODE indices, for compatibility (and they currently do
>> work in this way).
>
> But what about error handlers written in Python?
>
>> The indices can only be different if the string is an UCS-4 string, and
>> Py_UNICODE is a two-byte type (i.e. on Windows).
>>
>> So what should it be?
>
> I'd say let's do the Right Thing and accept the small compatibility
> breach (surrogates on UCS-2 builds).

+1

-- 
--Guido van Rossum (python.org/~guido)
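
For illustration only, here is a minimal sketch of the two index interpretations discussed above, assuming Python 3.3 or later (where PEP 393 is in effect and `str` is indexed by code point). The helper `utf16_index`, the handler `question_mark_handler`, the list `seen`, and the error-handler name `demo_qmark` are hypothetical names made up for this example; only `codecs.register_error` and the `start`/`end` attributes of `UnicodeEncodeError` come from the standard library.

```python
import codecs


def utf16_index(text, cp_index):
    """Map a code point index to the matching UTF-16 code unit index.

    Characters above U+FFFF take two code units (a surrogate pair), which
    is how a two-byte Py_UNICODE build would have counted them.
    """
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in text[:cp_index])


# Records the (start, end) pairs the codec passes to the handler.
seen = []


def question_mark_handler(exc):
    """Hypothetical error handler written in Python."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    seen.append((exc.start, exc.end))
    # Replace the offending range and resume encoding at exc.end.
    return ("?" * (exc.end - exc.start), exc.end)


codecs.register_error("demo_qmark", question_mark_handler)

# One non-BMP character (U+1F600) followed later by a BMP character (U+00E9).
text = "a\U0001F600b\u00e9"
encoded = text.encode("ascii", errors="demo_qmark")

print(encoded)               # bytes with both non-ASCII characters replaced
print(seen)                  # indices as reported to the Python handler
print(utf16_index(text, 3))  # where U+00E9 would sit in UTF-16 code units
```

With code point indices, the handler can slice `exc.object[exc.start:exc.end]` directly; the two interpretations only diverge for positions after a character outside the Basic Multilingual Plane.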