Guido van Rossum writes:

 > I see nothing wrong with having the language's fundamental data types
 > (i.e., the unicode object, and even the re module) to be defined in
 > terms of codepoints, not characters, and I see nothing wrong with
 > len() returning the number of codepoints (as long as it is advertised
 > as such).

In fact, the Unicode Standard, Version 6, goes farther (to code units):

    2.7 Unicode Strings

    A Unicode string data type is simply an ordered sequence of code
    units. Thus a Unicode 8-bit string is an ordered sequence of 8-bit
    code units, a Unicode 16-bit string is an ordered sequence of
    16-bit code units, and a Unicode 32-bit string is an ordered
    sequence of 32-bit code units.

    Depending on the programming environment, a Unicode string may or
    may not be required to be in the corresponding Unicode encoding
    form. For example, strings in Java, C#, or ECMAScript are Unicode
    16-bit strings, but are not necessarily well-formed UTF-16
    sequences.

    (p. 32).
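To make the distinction between code points and code units concrete, here is a minimal sketch. It assumes Python 3.3 or later, where len() on a str counts code points; the helper name code_unit_counts and the sample string are only illustrative and come from neither the quoted message nor the Standard. The last part shows a string that is a valid sequence of 16-bit code units but not well-formed UTF-16, which is the situation the Standard describes for Java, C#, and ECMAScript.

```python
def code_unit_counts(s):
    """Return the length of s in code points and in 8-, 16-, and 32-bit code units.

    Assumes Python 3.3+, where len(str) counts code points.
    """
    return {
        "code points": len(s),
        "8-bit code units": len(s.encode("utf-8")),
        "16-bit code units": len(s.encode("utf-16-le")) // 2,
        "32-bit code units": len(s.encode("utf-32-le")) // 4,
    }


# 'a' (U+0061), 'e' with acute accent (U+00E9), and U+1F40D, which lies
# outside the Basic Multilingual Plane and needs a surrogate pair in UTF-16.
sample = "a\u00e9\U0001F40D"
counts = code_unit_counts(sample)

# A lone high surrogate is one 16-bit code unit, but it is not
# well-formed UTF-16, so the strict codec refuses to encode it.
lone_surrogate = "\ud83d"
try:
    lone_surrogate.encode("utf-16-le")
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False

# The "surrogatepass" error handler emits the code unit anyway.
raw_units = lone_surrogate.encode("utf-16-le", "surrogatepass")
```

For the sample string the four counts differ (3 code points, 7 8-bit units, 4 16-bit units, 3 32-bit units), so "the length of a string" depends on which unit is advertised.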