[Tim]
>> So what if MAL amended his suggestion to
>>
>>     reject signed 2-byte wchar_t value as not-usable
>>            +++++++
>> ?

[M.-A. Lemburg]
> That would not solve the problem.

Then what is the problem, specifically?  I thought you agreed with
Martin that a signed 32-bit type doesn't hurt, since the sign bit
remains clear then in all cases of Unicode data.

> Note that we have proper conversion routines that allow
> converting between wchar_t and Py_UNICODE. These routines must
> be used for conversions anyway (even if Py_UNICODE and wchar_t
> happen to be the same type), so from a programmer perspective
> changing Py_UNICODE to be unsigned won't be noticed and we
> don't lose anything much.
>
> Again, I don't see the point in using a signed type for data
> that doesn't have any concept of signed values. It's just
> bad design and we shouldn't try to go down the same route
> if we don't have to.

Why Martin favors wchar_t when possible isn't clear to me; neither is
why there would be an intractable problem if wchar_t happens to be a
signed type wider than 2 bytes.

> The Unicode implementation has always defined Py_UNICODE to
> be an unsigned type; see the Unicode PEP 100:
>
> """
> Internal Format
>
>     The internal format for Unicode objects should use a Python
>     specific fixed format <PythonUnicode> implemented as 'unsigned
>     short' (or another unsigned numeric type having 16 bits). Byte
>     order is platform dependent.
>
>     ...
>
>     The configure script should provide aid in deciding whether
>     Python can use the native wchar_t type or not (it has to be a
>     16-bit unsigned type).
> """
>
> Python can also deal with UCS4 now, but the concept remains the
> same.

Well, it doesn't have to be a 16-bit type either, even in a UCS2
build, and we had a long argument about that one before, because a
particular Cray system didn't have any 16-bit type and the Unicode
code wasn't working there.
That got repaired when I rewrote the few bits of code that assumed
"exactly 16 bits" to live with the weaker "at least 16 bits".  In this
iteration, Martin agreed that a signed 16-bit wchar_t can be rejected.
The question remaining is what actual problem exists when there's a
signed wchar_t exceeding 16 bits.  Since Jeremy is running on exactly
such a system, and the tests pass for him, there's no *obvious*
problem with it (the segfault he hit was due to a bug that read
uninitialized memory, and that's been fixed).