On May 6, 2005, at 3:42 PM, James Y Knight wrote:

> On May 6, 2005, at 2:49 PM, Nicholas Bastin wrote:
>> If this is the case, then we're clearly misleading users. If the
>> configure script says UCS-2, then as a user I would assume that
>> surrogate pairs would *not* be encoded, because I chose UCS-2, and it
>> doesn't support that. I would assume that any UTF-16 string I would
>> read would be transcoded into the internal type (UCS-2), and
>> information would be lost. If this is not the case, then what does
>> the configure option mean?
>
> It means all the string operations treat strings as if they were
> UCS-2, but that in actuality, they are UTF-16. Same as the case in the
> windows APIs and Java. That is, all string operations are essentially
> broken, because they're operating on encoded bytes, not characters,
> but claim to be operating on characters.

Well, this is a completely separate issue/problem. The internal
representation is UTF-16, and should be stated as such. If the
built-in methods actually don't work with surrogate pairs, then that
should be fixed.

--
Nick
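James's point, that string operations on a UCS-2-labelled (narrow) build count UTF-16 code units rather than characters, can be illustrated with a small sketch. It assumes Python 3, where `str` already holds code points, so it simulates the narrow-build storage by encoding to UTF-16-LE. The helper names `utf16_code_units` and `code_points` are hypothetical and not part of any CPython API.

```python
import struct


def utf16_code_units(text: str) -> list[int]:
    """Return the 16-bit code units a narrow (UTF-16 internal) build would store."""
    data = text.encode("utf-16-le")
    # Each code unit is 2 bytes, little-endian.
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def is_high_surrogate(unit: int) -> bool:
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit: int) -> bool:
    return 0xDC00 <= unit <= 0xDFFF


def code_points(units: list[int]) -> list[int]:
    """Combine surrogate pairs back into code points, as a surrogate-aware operation would."""
    result = []
    i = 0
    while i < len(units):
        u = units[i]
        if is_high_surrogate(u) and i + 1 < len(units) and is_low_surrogate(units[i + 1]):
            result.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            # BMP character (or an unpaired surrogate) passes through unchanged.
            result.append(u)
            i += 1
    return result


# 'a', U+1D11E MUSICAL SYMBOL G CLEF (outside the BMP), 'b'
s = "a\U0001D11Eb"
units = utf16_code_units(s)
print(len(units))               # 4: a UCS-2-style length counts code units
print(len(code_points(units)))  # 3: the actual number of characters
print([hex(u) for u in units])  # ['0x61', '0xd834', '0xdd1e', '0x62']
```

Slicing by code unit (for example `units[:2]`) would end on a lone high surrogate `0xd834`, which is the kind of breakage Nick says the built-in methods should be fixed to avoid.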