2014-02-03 Phil Thompson <phil at riverbankcomputing.com>:
> For example, a string created with a maxchar of 255 (i.e. a Latin-1 string)
> must contain at least one character in the range 128-255, otherwise you get
> an assertion failure.

Yes, that is what PEP 393 specifies.

> As it stands, when converting Latin-1 strings in my C extension module I
> must first check each character and specify a maxchar of 127 if the string
> happens to only contain ASCII characters.

Use PyUnicode_FromKindAndData(PyUnicode_1BYTE_KIND, latin1_str, length),
which scans the buffer and computes the maximum character (and so the exact
kind) for you.

> What is the reasoning behind the checks being so strict?

Different Python functions rely on the exact kind to compare strings. For
example, if you search for a Latin-1 substring in an ASCII string, the search
returns immediately instead of scanning the string: a substring that contains
a non-ASCII character cannot be found in an ASCII string.

The main reason is in PEP 393 itself: a string must be compact so that it
does not waste memory.

Victor
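A minimal sketch of the PyUnicode_FromKindAndData suggestion above, driven from Python through ctypes rather than from a C extension. The helper name from_latin1 is hypothetical, the constant 1 is assumed to match the PyUnicode_1BYTE_KIND enum value in CPython's headers, and str.isascii() assumes Python 3.7 or later. It only illustrates that the caller does not need to pre-scan the buffer for ASCII-only content.

```python
import ctypes

# Assumed value of the PyUnicode_1BYTE_KIND enum constant in CPython's C API.
PyUnicode_1BYTE_KIND = 1

# Bind the C API function exported by the running interpreter.
_from_kind_and_data = ctypes.pythonapi.PyUnicode_FromKindAndData
_from_kind_and_data.argtypes = [ctypes.c_int, ctypes.c_char_p, ctypes.c_ssize_t]
_from_kind_and_data.restype = ctypes.py_object  # the call returns a new reference


def from_latin1(data: bytes) -> str:
    """Hypothetical helper: build a str from Latin-1 bytes with no pre-scan."""
    # The length is passed explicitly, so the buffer is not treated as
    # NUL-terminated.
    return _from_kind_and_data(PyUnicode_1BYTE_KIND, data, len(data))


ascii_only = from_latin1(b"abc")        # every byte is below 128
with_accent = from_latin1(b"caf\xe9")   # 0xE9 is U+00E9 in Latin-1

# Both calls use the same kind argument; the function decides internally
# whether the result can be stored as an ASCII string.
print(ascii(ascii_only), ascii_only.isascii())
print(ascii(with_accent), with_accent.isascii())
```

In a C extension the equivalent call is the one given in the reply, with the buffer pointer and length taken from the module's own data.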