>> On the one hand, these indices are used in formatting error messages such as >> "codec can't encode character \u%04x in position %d", suggesting they >> are regular >> indices into the string (counting code points). >> >> On the other hand, they are used by error handlers to lookup the character, >> and existing error handlers (including the ones we have now) use >> PyUnicode_AsUnicode to find the character. This suggests that the indices >> should be Py_UNICODE indices, for compatibility (and they currently do >> work in this way). > > But what about error handlers written in Python? I'm working on a patch where an C error handler using PyUnicodeEncodeError_GetStart gets a different value than a Python error handler accessing .start. The _GetStart/_GetEnd functions would take the value from the exception object, and adjust it before returning it. The implementation is fairly straight-forward, just a little expensive (in the case of non-BMP strings on Windows). Regards, Martin
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4