> I think you want to use codePointCount() to count the Unicode code points.
> length() returns Unicode code units.
>
> As http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Character.html explains:
>
> In the J2SE API documentation, Unicode code point is used for character
> values in the range between U+0000 and U+10FFFF, and Unicode code unit is
> used for 16-bit char values that are code units of the UTF-16 encoding.

So you would like to contribute a function codePointCount to Python's
standard library? Go ahead.

Regards,
Martin
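
For illustration only, here is a rough sketch of what such a function might look like in Python. The names code_point_count and utf16_code_unit_count are hypothetical and are not part of the standard library. The sketch assumes a Python 3 str, where len() normally counts code points already, and additionally treats a high surrogate followed by a low surrogate as a single code point, which is the case that matters when a string holds UTF-16 code units.

```python
def utf16_code_unit_count(text):
    """Return the number of 16-bit code units in the UTF-16 encoding of text.

    Hypothetical helper. "surrogatepass" lets lone surrogates through so
    that strings holding raw UTF-16 code units can be measured too.
    """
    return len(text.encode("utf-16-le", "surrogatepass")) // 2


def code_point_count(text):
    """Return the number of Unicode code points in text.

    Hypothetical helper. A high surrogate (U+D800..U+DBFF) immediately
    followed by a low surrogate (U+DC00..U+DFFF) is counted as one code
    point; an unpaired surrogate is counted on its own.
    """
    count = 0
    i = 0
    n = len(text)
    while i < n:
        unit = ord(text[i])
        is_high = 0xD800 <= unit <= 0xDBFF
        if is_high and i + 1 < n and 0xDC00 <= ord(text[i + 1]) <= 0xDFFF:
            i += 2  # skip both halves of the surrogate pair
        else:
            i += 1
        count += 1
    return count
```

With these definitions, a string such as "a\U0001F600" has two code points but three UTF-16 code units, which is the distinction between codePointCount() and length() drawn in the quoted text.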