On 03-02-2014 3:35 pm, Victor Stinner wrote:
> 2014-02-03 Phil Thompson <phil at riverbankcomputing.com>:
>> For example, a string created with a maxchar of 255 (i.e. a Latin-1
>> string) must contain at least one character in the range 128-255,
>> otherwise you get an assertion failure.
>
> Yes, that is the specification of PEP 393.
>
>> As it stands, when converting Latin-1 strings in my C extension module I
>> must first check each character and specify a maxchar of 127 if the
>> string happens to contain only ASCII characters.
>
> Use PyUnicode_FromKindAndData(PyUnicode_1BYTE_KIND, latin1_str,
> length), which computes the kind for you.
>
>> What is the reasoning behind the checks being so strict?
>
> Different Python functions rely on the exact kind to compare strings.
> For example, if you search for a Latin-1 substring in an ASCII string,
> the search returns immediately instead of scanning the string: a
> Latin-1 string cannot be found in an ASCII string.
>
> The main reason is in PEP 393 itself: a string must be compact so as
> not to waste memory.
>
> Victor

Are you saying that code will fail if a particular Latin-1 string just happens not to contain any character greater than 127? I would be very surprised if that were the case. If it isn't the case, then I think that particular check shouldn't be made.

Phil
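To make the two sides of this exchange concrete, here is a minimal standalone C sketch (no Python headers, so it does not call the real C API). `latin1_maxchar` is a hypothetical helper performing the per-character check Phil describes before calling `PyUnicode_New()`; according to Victor, `PyUnicode_FromKindAndData()` does this for you. `find_latin1` is a hypothetical, simplified search that assumes, as PEP 393 does, that each string's maxchar is exact, and uses that to return early the way Victor describes. It is an illustration of the idea, not CPython's actual search code.

```c
#include <stddef.h>

/* Hypothetical helper: choose the maxchar to pass to PyUnicode_New() for a
   Latin-1 buffer. An all-ASCII buffer must be created with 127, not 255. */
static unsigned int latin1_maxchar(const unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (s[i] > 127)
            return 255;   /* at least one non-ASCII character: Latin-1 kind */
    }
    return 127;           /* pure ASCII */
}

/* Hypothetical sketch of the short-circuit Victor mentions. It trusts the
   caller-supplied maxchar values to be exact. Returns the index of the first
   match, or -1 if the needle does not occur in the haystack. */
static long find_latin1(const unsigned char *hay, size_t hay_len,
                        unsigned int hay_max,
                        const unsigned char *needle, size_t needle_len,
                        unsigned int needle_max)
{
    if (needle_max > hay_max)
        return -1;        /* needle needs a wider range: cannot be present */
    if (needle_len > hay_len)
        return -1;
    for (size_t i = 0; i + needle_len <= hay_len; i++) {
        size_t j = 0;
        while (j < needle_len && hay[i + j] == needle[j])
            j++;
        if (j == needle_len)
            return (long)i;
    }
    return -1;
}
```

This is why an exact maxchar matters for correctness, not just for memory. If the pure-ASCII needle `"hello"` were tagged with maxchar 255, `find_latin1` would return -1 when searching an ASCII haystack such as `"say hello"`, even though the text is present. With the correct tag of 127, it finds the match at index 4.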