On Wed, Jun 04, 2014 at 01:14:04PM +0000, Steve Dower wrote:

> I agree with Daniel. Directly indexing into text suggests an
> attempted optimization that is likely to be incorrect for a set of
> strings.

I'm afraid I don't understand this argument. The language semantics says
that a string is an array of code points. Every index relates to a single
code point, no code point extends over two or more indexes. There's a 1:1
relationship between code points and indexes. How is direct indexing
"likely to be incorrect"?

e.g.

    s = "---ÿ---"
    offset = s.index('ÿ')
    assert s[offset] == 'ÿ'

That cannot fail with Python's semantics.

[Aside: it does fail in Python 2, showing that the idea that "strings are
bytes" is fatally broken. Fortunately Python has moved beyond that.]

> Splitting, regex, concatenation and formatting are really the
> main operations that matter, and MicroPython can optimize their
> implementation of these easily enough for O(N) indexing.

Really? Well, it will be a nice experiment. Fortunately MicroPython runs
under Linux as well as on embedded systems (a clever decision, by the
way) so I look forward to seeing how their internal-utf8 implementation
stacks up against CPython's FSR implementation.

Out of curiosity, when the FSR was proposed, did anyone consider an
internal UTF-8 representation? If so, why was it rejected?

-- 
Steven
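For readers following the thread, here is a rough Python 3 sketch of what code-point indexing over an internal UTF-8 buffer involves, and why it is O(N). The helper name `utf8_index` is hypothetical and made up for this illustration; it is not MicroPython's or CPython's actual implementation, and it ignores negative indexes and slices.

```python
def utf8_index(buf: bytes, i: int) -> str:
    """Return the i-th code point of a valid UTF-8 buffer.

    Hypothetical helper for illustration only. It scans from the
    start of the buffer, so the cost grows with the index: O(N).
    """
    if i < 0:
        raise IndexError("negative indexes are not handled in this sketch")
    count = -1      # index of the code point currently being scanned
    start = None    # byte offset where code point i begins
    for pos, byte in enumerate(buf):
        # Bytes of the form 0b10xxxxxx are continuation bytes;
        # every other byte starts a new code point.
        if byte & 0xC0 != 0x80:
            count += 1
            if count == i:
                start = pos
            elif count == i + 1:
                return buf[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("string index out of range")
    # Code point i is the last one in the buffer.
    return buf[start:].decode("utf-8")


s = "---ÿ---"
buf = s.encode("utf-8")
offset = s.index("ÿ")

# Code-point semantics: the index always lands on a whole character.
assert s[offset] == "ÿ"
assert utf8_index(buf, offset) == "ÿ"

# Byte semantics: the same offset yields only the first byte of "ÿ".
assert buf[offset:offset + 1] != "ÿ".encode("utf-8")
```

The last assertion mirrors the Python 2 failure mentioned in the aside: indexing the encoded bytes returns one byte of a two-byte sequence rather than the character.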