[Tim]
> ...
> If some internal Unicode operation decided to allocate a short Unicode
> string, but freed it before filling in any of the string bytes, I
> suppose the keepalive optimization would retain a Unicode object with
> uninitialized str space on the unicode free list.  A subsequent
> _PyUnicode_New could grab that and try to boost its ->str size.  That
> could explain it.

And that turned out to be the case.  One example (there are more) is in
PyUnicode_DecodeASCII():  the local

    PyUnicodeObject *v;

gets initialized:

    v = _PyUnicode_New(size);

Suppose size is 1.  Suppose the string coming in is "\xc8".  The first
iteration of the loop sets an "ordinal not in range(128)" error, and
jumps to onError:

 onError:
    Py_XDECREF(v);

unicode_dealloc() then stuffs v on unicode_freelist, and because the
length "is small" (size == 1), v->str[] is retained, still holding
uninitialized heap trash.

A later _PyUnicode_New() grabs this off unicode_freelist, decides to
boost the str space via unicode_resize(), and the latter blows up in the

    unicode_latin1[unicode->str[0]] == unicode)) {

check because unicode->str[0] happens to be gigantically negative on
Jeremy's box, and the preceding

    unicode->str[0] < 256 &&

check is too weak.

Or is there an implicit assumption that Py_UNICODE is always an unsigned
type (in which case, why isn't the literal 256U?; and in which case, it
doesn't seem to be true on Jeremy's box).
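
To make the weakness of the "< 256" check concrete, here is a minimal sketch in C.  It is not code from unicodeobject.c: the type demo_unicode_t and both function names are hypothetical, and the typedef assumes a platform where the character type ends up signed, as seems to be the case on Jeremy's box.  It only shows that an upper-bound test alone lets a negative value through, while comparing as unsigned rejects it.

```c
#include <assert.h>  /* used only by the checks in the usage note */

/* Hypothetical stand-in for Py_UNICODE on a platform where it is signed. */
typedef int demo_unicode_t;

/* Weak check: tests only the upper bound, so any negative value passes
   and would then be used as an (out-of-bounds) table index. */
int weak_is_latin1(demo_unicode_t ch)
{
    return ch < 256;
}

/* Stricter check: converting to unsigned maps negative values to very
   large ones, so a single comparison enforces 0 <= ch < 256. */
int strict_is_latin1(demo_unicode_t ch)
{
    return (unsigned int)ch < 256U;
}
```

With these definitions, assert(weak_is_latin1(-123456)) holds, which is exactly the problem, while assert(!strict_is_latin1(-123456)) and assert(strict_is_latin1(200)) both hold.  This addresses only the index check; the uninitialized v->str[] left on the free list is a separate issue that the sketch does not touch.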