Nicholas Bastin wrote:
> If this is the case, then we're clearly misleading users. If the
> configure script says UCS-2, then as a user I would assume that
> surrogate pairs would *not* be encoded, because I chose UCS-2, and it
> doesn't support that.

What do you mean by that? That the interpreter crashes if you try to
store a low surrogate into a Py_UNICODE?

> I would assume that any UTF-16 string I would
> read would be transcoded into the internal type (UCS-2), and information
> would be lost. If this is not the case, then what does the configure
> option mean?

It tells you whether you have the two-octet form of the Universal
Character Set, or the four-octet form.

Regards,
Martin
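To illustrate why storing UTF-16 data in two-octet units loses nothing, here is a small standalone sketch of the surrogate-pair arithmetic. A supplementary code point (U+10000 to U+10FFFF) is split into a high and a low surrogate, each fitting in 16 bits, and can be recombined exactly. The helper names `to_surrogate_pair` and `from_surrogate_pair` are hypothetical and are not part of CPython's C API. Note also that Python 3.3 and later (PEP 393) no longer have separate narrow/wide builds, so this shows only the encoding arithmetic, not how any interpreter stores strings internally.

```python
from __future__ import annotations


def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into high/low surrogates.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary-plane code points need a surrogate pair")
    offset = code_point - 0x10000      # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high (leading) surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low (trailing) surrogate
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into the original code point.

    Hypothetical helper for illustration only.
    """
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    cp = 0x1D11E  # MUSICAL SYMBOL G CLEF, outside the Basic Multilingual Plane
    high, low = to_surrogate_pair(cp)
    print(hex(high), hex(low))  # prints "0xd834 0xdd1e"
    # Cross-check against Python's own UTF-16 codec (big-endian, no BOM).
    assert chr(cp).encode("utf-16-be") == high.to_bytes(2, "big") + low.to_bytes(2, "big")
    # The round trip is exact: no information is lost in two 16-bit units.
    assert from_surrogate_pair(high, low) == cp
```

Each of the two code units fits in a two-octet storage unit, so a two-octet build can hold UTF-16 input unchanged. What such a build lacks is a single storage unit per supplementary character, not the ability to represent it.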