On 5 June 2014 22:01, Paul Sokolovsky <pmiscml at gmail.com> wrote:

> All these changes are what let me dream on and speculate on
> possibility that Python4 could offer an encoding-neutral string type
> (which means based on bytes)

To me, an "encoding neutral string type" means roughly "characters are atomic", and the best representation we have for a "character" is a Unicode code point. Through any interface that provides "characters", each individual "character" (code point) is indivisible.

To me, Python 3 has exactly an "encoding-neutral string type". It also has a bytes type that is just that - bytes which can represent anything at all. It might be the UTF-8 representation of a string, but you have the freedom to manipulate it however you like - including making it no longer valid UTF-8.

Whilst I think O(1) indexing of strings is important, I don't think it's as important as the property that "characters" are indivisible, and I would be quite happy for MicroPython to use UTF-8 as the underlying string representation (or some more clever thing, several ideas in this thread) so long as:

1. It maintains a string type that presents code points as indivisible elements;

2. The performance consequences of using UTF-8 are documented, as well as any optimisations, tricks, etc. that are used to overcome those consequences (and what impact, if any, they would have if code written for MicroPython was run in CPython).

Cheers,

Tim Delaney
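To illustrate the distinction drawn above, here is a minimal sketch in plain Python 3. The sample string and the helper `codepoint_at` are hypothetical names made up for this example; the helper is not MicroPython's implementation, only one naive way to index code points in UTF-8 data, shown to make the O(n) cost visible.

```python
# Illustration only: str presents code points, bytes presents raw data.
text = "naïve €5"
encoded = text.encode("utf-8")

# Indexing a str yields a whole code point, never a fragment of one.
third = text[2]

# The bytes object is longer, because some code points need several bytes.
lengths = (len(text), len(encoded))

# Bytes may be cut anywhere, including in the middle of a code point,
# which leaves data that is no longer valid UTF-8.
broken = encoded[:3]
try:
    broken.decode("utf-8")
    still_valid = True
except UnicodeDecodeError:
    still_valid = False


def codepoint_at(buf: bytes, index: int) -> str:
    """Hypothetical helper: return the index-th code point of valid UTF-8
    data by scanning from the start, which costs O(n) rather than O(1)."""
    if index < 0:
        raise IndexError("negative index not supported in this sketch")
    count = -1
    start = None
    for pos, byte in enumerate(buf):
        # Continuation bytes look like 0b10xxxxxx; any other byte
        # begins a new code point.
        if byte & 0xC0 != 0x80:
            count += 1
            if count == index:
                start = pos
            elif count == index + 1:
                return buf[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("code point index out of range")
    return buf[start:].decode("utf-8")
```

With a representation like this, `codepoint_at(encoded, i)` and `text[i]` agree for every valid index, but the first has to walk the buffer each time. That is the kind of performance consequence point 2 asks to have documented.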