[Martin von Löwis]

> Manuel Huesser <sylphaleya at hta.fhz.ch> writes:
>
> > Yep Unicode supports less characters than there are possible with
> > utf-8 (ucs range = 2 ** 31).
> >
> > so there is no possibility to support the full range of the ucs
> > character set with python?
>
> The ucs range (for UCS-4) is *not* 2**31; it is 17*2**16. It was 2**32
> in ISO/IEC 10646:1993 (I believe), but it got constrained in 10646:2000.

I think UCS-4 is (or at least was) defined for 2**31 code points only.
I do not know why the sign bit was excluded (maybe to avoid problems
with negative values for code points?), but if you consider the logic
of UTF-8, you will see that one more full byte would be needed to
support the 32nd bit. This does not mean it was the reason; I do not
know.

UTF-16 has 17*2**16 code points. I have not studied the legal verses
recently, but my overall impression is that UTF-16 has been more or
less integrated into UCS-2 in more recent Unicode versions, and made
official. I do not know exactly what UCS-2 means nowadays, as it does
not really exist anymore as originally defined (with the intent of
being fixed width). Unless UCS-2 is 2**16 - 2**11 code points? The
surrogate areas cannot sensibly be part of it, at least nowadays.

Hmph! I should really read the recent legal texts when I dive into
such areas... :-)

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard
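
The figures quoted in this exchange can be checked with a few lines of Python. This is only a sketch: the constant names and the helper `utf8_payload_bits` are made up for illustration, and the helper assumes the original UTF-8 layout (sequences of up to six bytes, as in RFC 2279), not the later four-byte limit.

```python
# Code point counts mentioned in the thread (plain arithmetic).
UNICODE_CODE_POINTS = 17 * 2**16               # 17 planes of 65536 code points
UCS4_31_BIT_RANGE = 2**31                      # the 31-bit code space
SURROGATE_CODE_POINTS = 2**11                  # U+D800 through U+DFFF
BMP_WITHOUT_SURROGATES = 2**16 - SURROGATE_CODE_POINTS


def utf8_payload_bits(length):
    """Payload bits carried by a UTF-8 sequence of `length` bytes.

    Hypothetical helper, assuming the original six-byte scheme: the lead
    byte spends `length` one-bits plus a zero-bit on the prefix, and each
    continuation byte carries 6 payload bits.
    """
    if length == 1:
        return 7                               # 0xxxxxxx
    return (7 - length) + 6 * (length - 1)


if __name__ == "__main__":
    print(hex(UNICODE_CODE_POINTS))            # size of the 17-plane range
    print(BMP_WITHOUT_SURROGATES)              # 2**16 - 2**11
    for n in range(1, 7):
        print(n, utf8_payload_bits(n))
```

Under that assumption a six-byte sequence carries exactly 31 bits, so reaching a 32nd bit would take a seventh byte, which matches the remark about the logic of UTF-8.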